Implement Gradient Descent For Neural Network (or Logistic Regression)

Predicting whether a person will buy life insurance based on their age and affordibility, using logistic regression

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline

In [2]:
df = pd.read_csv('insurance_data.csv')
df.head()

   age  affordibility  bought_insurance
0   22              1                 0
1   25              0                 0
2   47              1                 1
3   52              0                 0
4   46              1                 1


Split train and test set

In [3]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df[['age','affordibility']],df.bought_insurance,test_size=0.2)

Preprocessing: divide age by 100 so that age and affordibility both fall in the 0 to 1 range

In [4]:
X_train_scaled = X_train.copy()
X_train_scaled['age'] = X_train_scaled['age'] / 100

X_test_scaled = X_test.copy()
X_test_scaled['age'] = X_test_scaled['age'] / 100

Model Building: First build a model in keras/tensorflow and see what weights and bias values it comes up with. We will then try to reproduce the same weights and bias in our plain Python implementation of gradient descent. The network is a single neuron with two inputs (age and affordibility) and a sigmoid activation, with weights initialized to 1 and bias initialized to 0

In [6]:
model = keras.Sequential([
    keras.layers.Dense(1, input_shape=(2,), activation='sigmoid', kernel_initializer='ones', bias_initializer='zeros')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(X_train_scaled, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x21477ec0090>

Evaluate the model on test set

In [7]:
model.evaluate(X_test_scaled,y_test)



[1.0918368101119995, 0.1666666716337204]

In [8]:
model.predict(X_test_scaled)



array([[0.76833874],
       [0.6438246 ],
       [0.56308746],
       [0.7701051 ],
       [0.7805038 ],
       [0.56553376]], dtype=float32)

In [9]:
y_test

20    0
7     1
21    0
0     0
11    0
12    0
Name: bought_insurance, dtype: int64
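
The two numbers returned by model.evaluate (loss and accuracy) can be reproduced by hand from the predictions and labels printed above. The sketch below is only an illustration: it hard-codes the six values copied from the outputs above instead of calling the model, and it assumes the usual 0.5 threshold for turning a probability into a class label.

```python
import numpy as np

# Values copied from the model.predict and y_test outputs above
probs = np.array([0.76833874, 0.6438246, 0.56308746,
                  0.7701051, 0.7805038, 0.56553376])
y_true = np.array([0, 1, 0, 0, 0, 0])

# Assumed decision rule: probability above 0.5 means class 1
labels = (probs > 0.5).astype(int)
accuracy = (labels == y_true).mean()

# Binary cross-entropy averaged over the six test rows
loss = -np.mean(y_true * np.log(probs) + (1 - y_true) * np.log(1 - probs))

print(labels)
print(accuracy, loss)
```

Every probability is above 0.5, so every row is labelled 1 and only the one positive example is classified correctly, which gives the accuracy of about 0.167 reported by model.evaluate.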

Now get the value of weights and bias from the model

In [10]:
coef, intercept = model.get_weights()

In [11]:
coef, intercept

(array([[0.99500257],
        [0.9950031 ]], dtype=float32),
 array([-0.00499889], dtype=float32))
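
The weights above sit almost exactly 0.005 away from their initial values (1, 1, 0). One plausible explanation is that each early Adam step moves every parameter by roughly the learning rate, so 5 epochs give about 5 × 0.001. The sketch below is a plain NumPy version of the textbook Adam update, not the Keras implementation. It assumes the usual defaults (learning rate 0.001, beta1 0.9, beta2 0.999, epsilon 1e-7), one full-batch update per epoch, and uses only the five rows shown by df.head() as stand-in data.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Stand-in data: the five rows from df.head(), with age divided by 100
X = np.array([[0.22, 1], [0.25, 0], [0.47, 1], [0.52, 0], [0.46, 1]], dtype=float)
y = np.array([0, 0, 1, 0, 1], dtype=float)

# Same starting point as the Keras layer: weights = 1, bias = 0
params = np.array([1.0, 1.0, 0.0])  # w1, w2, bias
lr, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-7  # assumed Adam defaults
m = np.zeros(3)  # running mean of gradients
v = np.zeros(3)  # running mean of squared gradients

for t in range(1, 6):  # five full-batch steps, one per epoch
    err = sigmoid(X @ params[:2] + params[2]) - y
    grad = np.array([np.mean(err * X[:, 0]), np.mean(err * X[:, 1]), np.mean(err)])
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)

print(params)
```

With these assumptions the parameters end up close to 0.995, 0.995 and -0.005, which suggests the Keras model above has barely started training after 5 epochs.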

In [12]:
import math

def sigmoid(x):
    # Logistic function for a single scalar input
    return 1/(1+math.exp(-x))
sigmoid(18)

0.9999999847700205

In [13]:
X_test

    age  affordibility
20   21              1
7    60              0
21   26              0
0    22              1
11   28              1
12   27              0


Instead of model.predict, write our own prediction function that uses w1,w2 and bias

In [14]:
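def prediction_function(age, affordibility):
    # coef[0] and coef[1] are w1 and w2, intercept is the bias; age must already be scaled
    weighted_sum = coef[0]*age + coef[1]*affordibility + intercept
    # weighted_sum is a one-element array; .item() turns it into a Python float for math.exp
    return sigmoid(weighted_sum.item())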

prediction_function(.47, 1)

0.811173816944548

In [15]:
prediction_function(.18, 1)

0.7629831470727845

Now we start implementing gradient descent in plain Python. Again, the goal is to come up with the same w1, w2 and bias that the keras model calculated. We want to show how keras/tensorflow could compute these values internally using gradient descent

First, write a couple of helper routines: sigmoid and log_loss

In [16]:
def sigmoid_numpy(X):
   return 1/(1+np.exp(-X))

sigmoid_numpy(np.array([12,0,1]))

array([0.99999386, 0.5       , 0.73105858])

In [17]:
def log_loss(y_true, y_predicted):
    epsilon = 1e-15
    y_predicted_new = [max(i,epsilon) for i in y_predicted]
    y_predicted_new = [min(i,1-epsilon) for i in y_predicted_new]
    y_predicted_new = np.array(y_predicted_new)
    return -np.mean(y_true*np.log(y_predicted_new)+(1-y_true)*np.log(1-y_predicted_new))
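
The two list comprehensions in log_loss clamp each prediction into [epsilon, 1 - epsilon] so that np.log never receives 0. As a sketch of an alternative, the same clamping can be done in one vectorized call with np.clip. The name log_loss_clipped is only illustrative and is not used elsewhere in this notebook.

```python
import numpy as np

def log_loss_clipped(y_true, y_predicted, epsilon=1e-15):
    # Clamp predictions away from exactly 0 and 1 so both logs stay finite
    y_pred = np.clip(np.asarray(y_predicted, dtype=float), epsilon, 1 - epsilon)
    y_true = np.asarray(y_true, dtype=float)
    # Mean binary cross-entropy over all samples
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# A prediction of exactly 1.0 would otherwise produce log(0) in the second term
result = log_loss_clipped([1, 0, 1], [0.9, 0.1, 1.0])
print(result)
```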

All right, now it is time to implement our final gradient descent function!

In [24]:
class myNN:
    def __init__(self):
        self.w1 = 1 
        self.w2 = 1
        self.bias = 0
        
    def fit(self, X, y, epochs, loss_thresold):
        self.w1, self.w2, self.bias = self.gradient_descent(X['age'],X['affordibility'],y, epochs, loss_thresold)
        print(f"Final weights and bias: w1: {self.w1}, w2: {self.w2}, bias: {self.bias}")
        
    def predict(self, X_test):
        weighted_sum = self.w1*X_test['age'] + self.w2*X_test['affordibility'] + self.bias
        return sigmoid_numpy(weighted_sum)

    def gradient_descent(self, age,affordability, y_true, epochs, loss_thresold):
        w1 = w2 = 1
        bias = 0
        rate = 0.5
        n = len(age)
        for i in range(epochs):
            weighted_sum = w1 * age + w2 * affordability + bias
            y_predicted = sigmoid_numpy(weighted_sum)
            loss = log_loss(y_true, y_predicted)
            
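            # Gradients of the mean log loss: average of (prediction error * input) for each weight
            w1d = (1/n)*np.dot(np.transpose(age),(y_predicted-y_true)) 
            w2d = (1/n)*np.dot(np.transpose(affordability),(y_predicted-y_true)) 

            # The bias gradient is the average prediction error
            bias_d = np.mean(y_predicted-y_true)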
            w1 = w1 - rate * w1d
            w2 = w2 - rate * w2d
            bias = bias - rate * bias_d
            
            if i%50==0:
                print (f'Epoch:{i}, w1:{w1}, w2:{w2}, bias:{bias}, loss:{loss}')
            
            if loss<=loss_thresold:
                print (f'Epoch:{i}, w1:{w1}, w2:{w2}, bias:{bias}, loss:{loss}')
                break

        return w1, w2, bias
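
The expressions for w1d, w2d and bias_d are the partial derivatives of the mean log loss with respect to w1, w2 and bias. A quick way to gain confidence in them is to compare against finite differences. The sketch below does that with standalone helper functions (loss_fn, analytic_grad and numeric_grad are illustrative names, not part of the class above) and uses the five rows from df.head() as stand-in data.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss_fn(params, X, y):
    # Mean binary cross-entropy for params = [w1, w2, bias]
    p = sigmoid(X @ params[:2] + params[2])
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_grad(params, X, y):
    # Same formulas as w1d, w2d and bias_d in gradient_descent
    err = sigmoid(X @ params[:2] + params[2]) - y
    return np.array([np.mean(err * X[:, 0]), np.mean(err * X[:, 1]), np.mean(err)])

def numeric_grad(params, X, y, h=1e-6):
    # Central finite difference, one parameter at a time
    grad = np.zeros_like(params)
    for k in range(len(params)):
        step = np.zeros_like(params)
        step[k] = h
        grad[k] = (loss_fn(params + step, X, y) - loss_fn(params - step, X, y)) / (2 * h)
    return grad

# Stand-in data: the five rows from df.head(), with age divided by 100
X = np.array([[0.22, 1], [0.25, 0], [0.47, 1], [0.52, 0], [0.46, 1]], dtype=float)
y = np.array([0, 0, 1, 0, 1], dtype=float)
params = np.array([1.0, 1.0, 0.0])  # same starting point as gradient_descent

g_analytic = analytic_grad(params, X, y)
g_numeric = numeric_grad(params, X, y)
print(g_analytic)
print(g_numeric)
```

If the two printed vectors agree to several decimal places, the update rule in gradient_descent is following the true gradient of the loss.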

In [25]:
customModel = myNN()
customModel.fit(X_train_scaled, y_train, epochs=8000, loss_thresold=0.4631)

Epoch:0, w1:0.9843176051259479, w2:0.9782314810689069, bias:-0.07870011234405219, loss:0.6120670994658153
Epoch:50, w1:1.3312978063627392, w2:1.5261568349141035, bias:-1.1681984677212294, loss:0.5146253470224637
Epoch:100, w1:1.8172715370737973, w2:1.8845040018397852, bias:-1.6689810943237045, loss:0.48940563341318255
Epoch:150, w1:2.2935033503998845, w2:2.0633056350670764, bias:-2.0215257319986026, loss:0.47391556154730724
Epoch:197, w1:2.7215161849725575, w2:2.1519826794995778, bias:-2.278979703881067, loss:0.4629039176480335
Final weights and bias: w1: 2.7215161849725575, w2: 2.1519826794995778, bias: -2.278979703881067


In [26]:
coef, intercept

(array([[0.99500257],
        [0.9950031 ]], dtype=float32),
 array([-0.00499889], dtype=float32))

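In this run the two sets of values do not match: the plain Python implementation ended at w1 ≈ 2.72, w2 ≈ 2.15 and bias ≈ -2.28, while the keras weights are still close to their initial values (1, 1, 0) because the keras model was trained for only 5 epochs. Training the keras model for many more epochs should move its weights much further from the starting point, although a different optimizer (adam versus plain gradient descent) may still not land on identical values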

In [27]:
X_test_scaled

     age  affordibility
20  0.21              1
7   0.60              0
21  0.26              0
0   0.22              1
11  0.28              1
12  0.27              0


(1) Predict using custom model

In [28]:
customModel.predict(X_test_scaled)

20    0.609336
7     0.343876
21    0.172019
0     0.615795
11    0.653629
12    0.175930
dtype: float64

(2) Predict using tensorflow model

In [29]:
model.predict(X_test_scaled)



array([[0.76833874],
       [0.6438246 ],
       [0.56308746],
       [0.7701051 ],
       [0.7805038 ],
       [0.56553376]], dtype=float32)

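Above you can compare predictions from our own custom model and the tensorflow model. In this run they differ noticeably (for example about 0.61 versus 0.77 for the first test row), which is consistent with the keras model having been trained for only 5 epochs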
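
One way to quantify how close the two sets of predictions are is to look at the largest absolute difference between them. The sketch below hard-codes the values copied from the two outputs above; the tolerance of 0.05 is an arbitrary choice for illustration.

```python
import numpy as np

# Values copied from customModel.predict and model.predict above
custom_preds = np.array([0.609336, 0.343876, 0.172019,
                         0.615795, 0.653629, 0.175930])
keras_preds = np.array([0.76833874, 0.6438246, 0.56308746,
                        0.7701051, 0.7805038, 0.56553376])

abs_diff = np.abs(custom_preds - keras_preds)
max_diff = abs_diff.max()

# Arbitrary tolerance: treat predictions within 0.05 as "almost the same"
close_enough = np.allclose(custom_preds, keras_preds, atol=0.05)

print(abs_diff)
print(max_diff, close_enough)
```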