# Logistic regression :

In this notebook we will be seeing how to implement logistic regression in two steps : 
- First to gain a knowledge on how things work under the hood we will be doing that from scratch without the help of any ML library 

- Then we will be applying the same thing using scickit learn.

## 1. load the data to be used:

In [1]:
import pandas as pd 
import numpy as np
from sklearn.model_selection import train_test_split

In [2]:
df = pd.read_csv("diabetes.csv")

In [3]:
df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [4]:
#Let's check if we have any wrong types to convert 
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 108 entries, 0 to 107
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               108 non-null    int64  
 1   Glucose                   108 non-null    int64  
 2   BloodPressure             108 non-null    int64  
 3   SkinThickness             108 non-null    int64  
 4   Insulin                   108 non-null    int64  
 5   BMI                       108 non-null    float64
 6   DiabetesPedigreeFunction  108 non-null    float64
 7   Age                       108 non-null    int64  
 8   Outcome                   108 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 7.7 KB


In [5]:
# Let's split our dataset
x = df.iloc[:,:-1].values
y = df.iloc[:,-1:].values
# Splitting dataset into train and test set
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size = 0.2, random_state = 0 )

## 2. Logistic regression from scratch:


In [6]:
# We need to first define the logistic regression model 
class Logistic_regression_scratch() :
    def __init__( self, learning_rate, iterations ) :        
        self.learning_rate = learning_rate        
        self.iterations = iterations
          
    # Function for model training    
    def fit( self, X, Y ) :        
        # no_of_training_examples, no_of_features        
        self.m, self.n = X.shape        
        # weight initialization        
        self.W = np.zeros( self.n )        
        self.b = 0        
        self.X = X        
        self.Y = Y
          
        # gradient descent learning
        for i in range( self.iterations ) :            
            self.update_weights()            
        return self
      
    # Helper function to update weights in gradient descent
      
    def update_weights( self ) :           
        A = 1 / ( 1 + np.exp( - ( self.X.dot( self.W ) + self.b ) ) )
          
        # calculate gradients        
        tmp = ( A - self.Y.T )        
        tmp = np.reshape( tmp, self.m )        
        dW = np.dot( self.X.T, tmp ) / self.m         
        db = np.sum( tmp ) / self.m 
          
        # update weights    
        self.W = self.W - self.learning_rate * dW    
        self.b = self.b - self.learning_rate * db
          
        return self
      
    # Hypothetical function  h( x ) 
      
    def predict( self, X ) :    
        Z = 1 / ( 1 + np.exp( - ( X.dot( self.W ) + self.b ) ) )        
        Y = np.where( Z > 0.5, 1, 0 )        
        return Y

In [7]:
# let's calculate the accuracy 
def get_accuracy(y_pred,Y_test):   
    correctly_classified = 0
    for count in range( np.size( Y_pred ) ) :  

        if Y_test[count] == Y_pred[count] :            
            correctly_classified = correctly_classified + 1
    acc = ( correctly_classified / len(Y_pred) ) * 100
    print( "Accuracy on testset = {:.2f} ".format(acc))
    return acc


In [8]:
# let's train our model 
model = Logistic_regression_scratch( learning_rate = 0.01, iterations = 1000 )
model.fit( X_train, Y_train )    

# Prediction on test set
Y_pred = model.predict( X_test )    
  

In [9]:
scratch_acc = get_accuracy(Y_pred,Y_test)

Accuracy on testset = 54.55 


## 3. Logistic regression using SKlearn:

In [10]:
from sklearn.linear_model import LogisticRegression
sk_model = LogisticRegression()

In [11]:
sk_model.fit(X_train, Y_train)

  return f(*args, **kwargs)
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


LogisticRegression()

In [12]:
y_pred2 = sk_model.predict(X_test)

In [13]:
acc2 = get_accuracy(y_pred2,Y_test)

Accuracy on testset = 54.55 
