## Theoretical Concept

Logistic Regression is a supervised machine learning algorithm used when the target variable is categorical. Don't let the 'Regression' in the name fool you: the algorithm is normally used for classification.

Logistic Regression can be considered a variation of linear regression in which a sigmoid function is applied to the linear output to form the hypothesis function.

\begin{equation*}
h(x) = \mathrm{sigmoid}(m^{T}x + c)
\end{equation*}


Here, m is the weight vector.<br>
x is the feature vector. <br>
c is the bias.

<center>$\mathrm{sigmoid}(z) = \frac{1}{1 + e^{-z}}$</center>
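
As a minimal sketch of the two formulas above, the hypothesis can be written in a few lines of NumPy. The function names `sigmoid` and `hypothesis` and the toy values of `m`, `x` and `c` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(x, m, c):
    """h(x) = sigmoid(m . x + c) for a single feature vector x."""
    return sigmoid(np.dot(m, x) + c)

# Toy values (assumed for illustration only)
m = np.array([0.5, -0.25])   # weight vector
x = np.array([2.0, 4.0])     # feature vector
c = 0.0                      # bias

# Here m . x + c = 1.0 - 1.0 + 0.0 = 0, and sigmoid(0) = 0.5
h_value = hypothesis(x, m, c)
```

Large positive inputs push the output towards 1 and large negative inputs towards 0, which is why the output can be read as a probability for class 1.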

## Mathematical Intuition

The cost function that we use for logistic regression, where the prediction h(x) lies between 0 and 1 and the target y is either 0 or 1, is: <br>
<br>
<center>$ J = -y\log(h(x)) - (1 - y)\log(1 - h(x)) $</center>
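
The cost above can be sketched in NumPy as follows. Two things here are assumptions rather than part of the original notebook: the cost is averaged over all examples, and the predictions are clipped by a small `eps` so that `log(0)` never occurs. The name `log_loss_cost` is hypothetical.

```python
import numpy as np

def log_loss_cost(y, h, eps=1e-12):
    """Mean of -y*log(h) - (1 - y)*log(1 - h) over all examples.

    y   : array of true labels (0 or 1)
    h   : array of predicted probabilities in (0, 1)
    eps : clipping constant (an assumption, added to avoid log(0))
    """
    h = np.clip(h, eps, 1.0 - eps)
    return np.mean(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h))

y_true = np.array([1.0, 0.0])

# Confident and correct predictions give a small cost ...
good_cost = log_loss_cost(y_true, np.array([0.9, 0.1]))
# ... while confident and wrong predictions are penalised heavily.
bad_cost = log_loss_cost(y_true, np.array([0.1, 0.9]))
```

Only one of the two terms is active for each example: the first when y = 1 and the second when y = 0.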

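Gradient descent works the same way as in linear regression: at every step the weights and the bias are moved against the gradient of the cost, scaled by the learning rate, until convergence (or, as in the model below, for a fixed number of iterations).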
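
For the cost averaged over $n$ examples, the gradients take the form $\frac{\partial J}{\partial m} = \frac{1}{n} X^{T}(h - y)$ and $\frac{\partial J}{\partial c} = \frac{1}{n} \sum (h - y)$. The sketch below runs a few update steps on a tiny hand-made dataset to show the cost going down. The helper names and the toy data `X_toy`, `y_toy` are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_cost(X, y, m, c):
    """Cost averaged over all examples."""
    h = sigmoid(X @ m + c)
    return np.mean(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h))

def gradients(X, y, m, c):
    """Gradients of the mean cost with respect to the weights m and the bias c."""
    n = X.shape[0]
    error = sigmoid(X @ m + c) - y      # h - y for every example
    return X.T @ error / n, error.sum() / n

# Toy data (assumed): one feature, labels switch from 0 to 1 as it grows
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([0.0, 0.0, 1.0, 1.0])

m = np.zeros(1)
c = 0.0
learning_rate = 0.1

cost_before = mean_cost(X_toy, y_toy, m, c)
for _ in range(100):
    df_m, df_c = gradients(X_toy, y_toy, m, c)
    m = m - learning_rate * df_m        # step against the gradient
    c = c - learning_rate * df_c
cost_after = mean_cost(X_toy, y_toy, m, c)
```

With all parameters at zero every prediction is 0.5, so the starting cost is log(2); the updates then lower it step by step.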

## Datasets

We use the same iris dataset that we used in <a href="https://github.com/RitwickSV/ML-Models-From-Scratch/blob/main/Classification/K%20Nearest%20Neighbors.ipynb">KNN</a> classification.

## Importing modules

In [30]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings( "ignore" )
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

## Model for Logistic Regression

In [31]:
class LogReg:
    """Binary logistic regression from scratch, trained with batch gradient descent."""

    def __init__(self, learning_rate, num_iterations):
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations

    # Model training
    def fit(self, X, Y):
        self.X = X
        self.Y = Y
        self.examples, self.features = np.shape(X)

        # Initialise the weights and the bias to 0
        self.M = np.zeros(self.features)
        self.c = 0

        # Gradient descent for a fixed number of iterations
        for i in range(self.num_iterations):

            # Predicted probabilities: sigmoid(X.M + c)
            H = 1 / (1 + np.exp( - (self.X.dot(self.M) + self.c)))

            # Calculate the gradients of the cost with respect to M and c
            tmp = ( H - self.Y.T )
            tmp = np.reshape( tmp, self.examples )
            df_M = np.dot( self.X.T, tmp ) / self.examples
            df_c = np.sum( tmp ) / self.examples

            # Update the weights and the bias
            self.M = self.M - self.learning_rate * df_M
            self.c = self.c - self.learning_rate * df_c

        return self

    # Model prediction
    def predict(self, X):
        H = 1 / ( 1 + np.exp( - ( X.dot( self.M ) + self.c ) ) )
        # Predict class 1 when the probability exceeds 0.5, otherwise class 0
        Y = np.where( H > 0.5, 1, 0 )
        return Y

We have finished building our logistic regression model, which is configured by the learning rate and the number of iterations.
Let's train it and compare it with the `LogisticRegression` model from `sklearn.linear_model`.

In [32]:
iris = load_iris()
X = iris.data
Y = iris.target

#Split data
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0 )


In [33]:
# Training both the models
model = LogReg( learning_rate = 0.01, num_iterations = 10000 )
model.fit( X_train, Y_train )

model1 = LogisticRegression()    
model1.fit( X_train, Y_train)

LogisticRegression()

In [34]:
# Prediction on test set
Y_pred = model.predict( X_test )    
Y_pred1 = model1.predict( X_test )

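Let's compare the models' performance. Keep in mind that `LogReg` is a binary classifier that only ever predicts 0 or 1, while the iris target has three classes (0, 1 and 2), so it cannot be expected to match a multiclass model on this data.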

In [35]:
# measure performance as the percentage of correctly classified test examples
accuracy = accuracy_score( Y_test, Y_pred ) * 100
accuracy1 = accuracy_score( Y_test, Y_pred1 ) * 100

print( "Accuracy on test set by our model     :  ", accuracy )
print( "Accuracy on test set by sklearn model   :  ", accuracy1 )

Accuracy on test set by our model     :   43.333333333333336
Accuracy on test set by sklearn model   :   100.0
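
The gap is largely structural: the from-scratch model outputs only 0 or 1, whereas the iris labels are 0, 1 and 2, and scikit-learn's `LogisticRegression` handles more than two classes itself. One common way to extend a binary model is one-vs-rest: train one binary classifier per class and pick the class with the highest score. The sketch below illustrates the idea on synthetic data using NumPy only. All names (`fit_binary`, `fit_one_vs_rest`, `predict_one_vs_rest`) and the three-cluster demo data are assumptions for illustration; it is not the method used in this notebook, and it was not run on iris.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary(X, y, learning_rate=0.1, num_iterations=2000):
    """Binary logistic regression by batch gradient descent (y holds 0/1)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(num_iterations):
        error = sigmoid(X @ w + b) - y
        w = w - learning_rate * (X.T @ error) / n
        b = b - learning_rate * error.mean()
    return w, b

def fit_one_vs_rest(X, y):
    """Train one 'this class vs. all others' binary model per class."""
    classes = np.unique(y)
    params = [fit_binary(X, (y == k).astype(float)) for k in classes]
    return classes, params

def predict_one_vs_rest(X, classes, params):
    """Pick, for each example, the class whose binary model scores highest."""
    scores = np.column_stack([sigmoid(X @ w + b) for w, b in params])
    return classes[np.argmax(scores, axis=1)]

# Synthetic demo data (assumed): three well-separated 2D clusters
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X_demo = np.vstack([ctr + 0.3 * rng.standard_normal((20, 2)) for ctr in centers])
y_demo = np.repeat([0, 1, 2], 20)

classes, params = fit_one_vs_rest(X_demo, y_demo)
pred = predict_one_vs_rest(X_demo, classes, params)
demo_accuracy = np.mean(pred == y_demo)
```

An alternative for a like-for-like binary comparison would be to keep only two of the three iris classes before splitting the data.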
