Implement Gradient Descent For Neural Network (or Logistic Regression)

Predicting whether a person would buy life insurance based on their age and affordability, using logistic regression

In [8]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline

In [9]:
df = pd.read_csv("insurance_data.csv")
df.head()

Unnamed: 0,age,affordibility,bought_insurance
0,22,1,0
1,25,0,0
2,47,1,1
3,52,0,0
4,46,1,1




Split train and test set


In [10]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df[['age','affordibility']],df.bought_insurance,test_size=0.2)

Preprocessing: Scale the data so that age and affordibility are in the same range. affordibility is already 0 or 1, so we divide age by 100 to bring it into roughly the same range.

In [11]:
X_train_scaled = X_train.copy()
X_train_scaled['age'] = X_train_scaled['age'] / 100

X_test_scaled = X_test.copy()
X_test_scaled['age'] = X_test_scaled['age'] / 100

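Model Building: First build a model in Keras/TensorFlow and see what weights and bias values it comes up with. We will then try to reproduce the same weights and bias in our plain Python implementation of gradient descent. Below is the architecture of our simple neural network: a single neuron with two inputs (age and affordibility) and a sigmoid activation, with the weights initialized to 1 and the bias to 0.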

In [12]:
model = keras.Sequential([
    keras.layers.Dense(1, input_shape=(2,), activation='sigmoid', kernel_initializer='ones', bias_initializer='zeros')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(X_train_scaled, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x2017bf23a10>

Evaluate the model on the test set

In [13]:
model.evaluate(X_test_scaled,y_test)



[0.7606955170631409, 0.5]

In [14]:
model.predict(X_test_scaled)



array([[0.5630873 ],
       [0.8002732 ],
       [0.8096446 ],
       [0.8096446 ],
       [0.7629826 ],
       [0.81570315]], dtype=float32)

In [15]:
y_test

21    0
22    1
27    0
4     1
10    0
24    1
Name: bought_insurance, dtype: int64
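
The two numbers returned by model.evaluate are the binary cross-entropy loss and the accuracy. As a sanity check, both can be recomputed by hand from the predictions and labels printed above. The sketch below simply copies those printed values into arrays (the names y_pred and y_true are only for illustration) and assumes the usual 0.5 decision threshold for accuracy.

```python
import numpy as np

# Predictions printed by model.predict(X_test_scaled) above
y_pred = np.array([0.5630873, 0.8002732, 0.8096446,
                   0.8096446, 0.7629826, 0.81570315])
# Labels printed by y_test above
y_true = np.array([0, 1, 0, 1, 0, 1])

# Binary cross-entropy: average negative log-likelihood of the true labels
loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Accuracy, assuming a prediction above 0.5 counts as class 1
accuracy = np.mean((y_pred > 0.5).astype(int) == y_true)

print(loss, accuracy)
```

Every prediction is above 0.5, so the model predicts 1 for all six rows and gets only the three positive rows right, which is where the accuracy of 0.5 comes from.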

Now get the values of the weights and bias from the model

In [16]:
coef, intercept = model.get_weights()

In [17]:
coef, intercept

(array([[0.9950018],
        [0.9950011]], dtype=float32),
 array([-0.00499922], dtype=float32))

In [18]:
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

sigmoid(18)

0.9999999847700205

In [19]:
X_test

Unnamed: 0,age,affordibility
21,26,0
22,40,1
27,46,1
4,46,1
10,18,1
24,50,1


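Instead of model.predict, write our own prediction function that uses w1, w2 and bias. Note that age has to be passed in scaled form (divided by 100), just like the data the model was trained on.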

In [20]:
def prediction_function(age, affordibility):
    weighted_sum = coef[0]*age + coef[1]*affordibility + intercept
    return sigmoid(weighted_sum)

prediction_function(.47, 1)

0.8111733969782455

In [21]:
prediction_function(.18, 1)

0.7629826728010388
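
The same idea extends to a whole table of rows at once. The following is a minimal sketch, not the way Keras does it internally: it copies the weights and bias printed by model.get_weights() above and the six X_test rows (with age divided by 100) into NumPy arrays, and the helper name predict_all is made up for this example.

```python
import numpy as np

# Weights and bias copied from the model.get_weights() output above
coef = np.array([[0.9950018], [0.9950011]])
intercept = np.array([-0.00499922])

# The six X_test rows shown above: [age / 100, affordibility]
X = np.array([[0.26, 0], [0.40, 1], [0.46, 1],
              [0.46, 1], [0.18, 1], [0.50, 1]])

def predict_all(X, coef, intercept):
    # Weighted sum for every row at once, followed by the sigmoid
    z = X @ coef + intercept
    return 1 / (1 + np.exp(-z))

preds = predict_all(X, coef, intercept)
print(preds)
```

The printed values should agree with the model.predict output shown earlier, up to float32 rounding.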

Now we start implementing gradient descent in plain Python. Again, the goal is to come up with the same w1, w2 and bias that the Keras model calculated. We want to show how Keras/TensorFlow would have computed these values internally using gradient descent.

First write a couple of helper routines: sigmoid and log_loss

In [22]:
def sigmoid_numpy(X):
    return 1 / (1 + np.exp(-X))

sigmoid_numpy(np.array([12,0,1]))

array([0.99999386, 0.5       , 0.73105858])
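
One thing to be aware of: for very large negative inputs, np.exp(-X) overflows and NumPy emits a RuntimeWarning, even though the final result is still correct. That is not a problem for this small dataset. If it ever matters, a numerically safer variant could look like the sketch below. The name stable_sigmoid is hypothetical, and it assumes the input is an array rather than a single number.

```python
import numpy as np

def stable_sigmoid(x):
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) is at most 1, so this form cannot overflow
    out[pos] = 1 / (1 + np.exp(-x[pos]))
    # For x < 0, use the equivalent form exp(x) / (1 + exp(x)), where exp(x) < 1
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1 + exp_x)
    return out

print(stable_sigmoid(np.array([12, 0, 1])))
print(stable_sigmoid(np.array([-1000.0, 1000.0])))
```

For ordinary inputs it returns the same values as sigmoid_numpy.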

In [23]:
def log_loss(y_true, y_predicted):
    # Keep predictions strictly between 0 and 1 so that np.log never sees 0
    epsilon = 1e-15
    y_predicted_new = np.clip(np.asarray(y_predicted), epsilon, 1 - epsilon)
    # Binary cross-entropy, averaged over all samples
    return -np.mean(y_true*np.log(y_predicted_new)+(1-y_true)*np.log(1-y_predicted_new))
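
To see why the epsilon clipping in log_loss is needed, consider predictions that are exactly 0 or 1 and wrong. The sketch below uses two made-up rows purely for illustration and compares the loss with and without clipping.

```python
import numpy as np

# Two made-up rows where the prediction is exactly wrong
y_true = np.array([1, 0])
y_pred = np.array([0.0, 1.0])

# Without clipping, log(0) is -inf, so the loss is infinite
with np.errstate(divide="ignore", invalid="ignore"):
    raw_loss = -np.mean(y_true * np.log(y_pred)
                        + (1 - y_true) * np.log(1 - y_pred))

# With clipping, the loss is large but finite
epsilon = 1e-15
clipped = np.clip(y_pred, epsilon, 1 - epsilon)
clipped_loss = -np.mean(y_true * np.log(clipped)
                        + (1 - y_true) * np.log(1 - clipped))

print(raw_loss, clipped_loss)
```

An infinite loss would make the early-stopping comparison and the printed training log useless, so a large finite value is the more practical choice.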

All right, now comes the time to implement our final gradient descent function! Yay!

In [24]:
def gradient_descent(age, affordability, y_true, epochs, loss_threshold):
    # Start from the same initial values as the Keras layer: weights 1, bias 0
    w1 = w2 = 1
    bias = 0
    rate = 0.5  # learning rate
    n = len(age)
    for i in range(epochs):
        # Forward pass: weighted sum followed by sigmoid
        weighted_sum = w1 * age + w2 * affordability + bias
        y_predicted = sigmoid_numpy(weighted_sum)
        loss = log_loss(y_true, y_predicted)

        # Gradients of the log loss with respect to w1, w2 and bias
        w1d = (1/n)*np.dot(np.transpose(age),(y_predicted-y_true))
        w2d = (1/n)*np.dot(np.transpose(affordability),(y_predicted-y_true))
        bias_d = np.mean(y_predicted-y_true)

        # Move each parameter a small step against its gradient
        w1 = w1 - rate * w1d
        w2 = w2 - rate * w2d
        bias = bias - rate * bias_d

        print(f'Epoch:{i}, w1:{w1}, w2:{w2}, bias:{bias}, loss:{loss}')

        # Stop early once the loss is low enough
        if loss <= loss_threshold:
            break

    return w1, w2, bias
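
The derivative formulas inside gradient_descent can be checked numerically. The sketch below is one way to do that, under some assumptions: it uses only the five rows shown by df.head() earlier (with age divided by 100), starts from w1 = w2 = 1 and bias = 0, and compares the analytic gradients against central finite differences. The helper names sigmoid, log_loss and loss_at are local to this example.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def log_loss(y, p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# The five rows shown by df.head() above, with age divided by 100
age = np.array([0.22, 0.25, 0.47, 0.52, 0.46])
afford = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0])

w1, w2, bias = 1.0, 1.0, 0.0

def loss_at(w1, w2, bias):
    return log_loss(y, sigmoid(w1 * age + w2 * afford + bias))

# Analytic gradients: the same formulas used in gradient_descent
p = sigmoid(w1 * age + w2 * afford + bias)
analytic = np.array([np.mean(age * (p - y)),
                     np.mean(afford * (p - y)),
                     np.mean(p - y)])

# Numerical gradients via central finite differences
h = 1e-6
numeric = np.array([
    (loss_at(w1 + h, w2, bias) - loss_at(w1 - h, w2, bias)) / (2 * h),
    (loss_at(w1, w2 + h, bias) - loss_at(w1, w2 - h, bias)) / (2 * h),
    (loss_at(w1, w2, bias + h) - loss_at(w1, w2, bias - h)) / (2 * h),
])

print(analytic)
print(numeric)
```

If the two printed arrays agree to several decimal places, the update rule is following the true gradient of the log loss.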

In [25]:
gradient_descent(X_train_scaled['age'],X_train_scaled['affordibility'],y_train,1000, 0.4631)

Epoch:0, w1:0.9789255572476006, w2:0.9482987365959945, bias:-0.11346806616222031, loss:0.7030794182859887
Epoch:1, w1:0.963590595213761, w2:0.9056915375296681, bias:-0.21249333296360345, loss:0.6732963792582041
Epoch:2, w1:0.9535017133724556, w2:0.8715428574986177, bias:-0.2983752826168001, loss:0.6513085857923913
Epoch:3, w1:0.9480717785959931, w2:0.8449931706408026, bias:-0.37261875521453625, loss:0.6353178125877158
Epoch:4, w1:0.9466834429407487, w2:0.825079155915613, bias:-0.43678043861765936, loss:0.6237664421419936
Epoch:5, w1:0.9487369079929006, w2:0.8108293944697493, bias:-0.49235635719393195, loss:0.61540188144373
Epoch:6, w1:0.9536799215257891, w2:0.8013282318458299, bias:-0.540713333395793, loss:0.6092716375223087
Epoch:7, w1:0.9610227546708602, w2:0.7957510533517764, bias:-0.5830566053661478, loss:0.6046810569493173
Epoch:8, w1:0.9703425213294389, w2:0.7933784177731138, bias:-0.6204225402172903, loss:0.6011389466988173
Epoch:9, w1:0.9812809188679797, w2:0.7935965904614277, 

(3.7940414031207776, 1.3655715053240072, -2.363988659720271)
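
To get a feel for what these learned values mean, they can be plugged back into the prediction formula for the six test rows printed earlier. This is only a sketch: the arrays below are copied from the X_test and y_test outputs above, age is divided by 100 as in the preprocessing step, and a 0.5 threshold is assumed.

```python
import numpy as np

# Values returned by gradient_descent above
w1, w2, bias = 3.7940414031207776, 1.3655715053240072, -2.363988659720271

# Test rows and labels printed earlier
age = np.array([26, 40, 46, 46, 18, 50]) / 100
afford = np.array([0, 1, 1, 1, 1, 1])
y_true = np.array([0, 1, 0, 1, 0, 1])

# Sigmoid of the weighted sum, then a 0.5 threshold
prob = 1 / (1 + np.exp(-(w1 * age + w2 * afford + bias)))
pred = (prob > 0.5).astype(int)
accuracy = np.mean(pred == y_true)

print(prob)
print(pred, accuracy)
```

On these six rows the sketch gets five right. With such a tiny test set, that figure should be read as an illustration rather than a reliable accuracy estimate.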

In [26]:
coef, intercept

(array([[0.9950018],
        [0.9950011]], dtype=float32),
 array([-0.00499922], dtype=float32))

Note that these values do not match the ones returned by our gradient_descent function. The Keras model above was trained for only 5 epochs, so its weights (about 0.995, 0.995 and -0.005) have barely moved from their initial values of 1, 1 and 0, whereas the plain Python loop was allowed up to 1000 epochs with a learning rate of 0.5. Training the Keras model for many more epochs should bring the two sets of values much closer.