Implement Gradient Descent For Neural Network(or logestic Regression)

Predicting if a person would buy life insurace based on his age using logistic regression

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline

In [2]:
df = pd.read_csv('insurance_data.csv')
df.head()

Unnamed: 0,age,affordibility,bought_insurance
0,22,1,0
1,25,0,0
2,47,1,1
3,52,0,0
4,46,1,1


Split train and test set

In [3]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(df[['age','affordibility']],df.bought_insurance,test_size=0.2)

Preprocessing: Scale the data so that both age and affordibility are in same scaling range

In [4]:
x_train_scaled = x_train.copy()
x_train_scaled['age'] = x_train_scaled['age'] / 100

x_test_scaled = x_test.copy()
x_test_scaled['age'] = x_test_scaled['age'] / 100

Mode Building: First build a mode in keras/ tensorflow and see what weights and bias values it comes up with. We will than try to reproduce same weights and bias in our plain python implementatio of gradient descent. Below is the architecture of our simple neural neural network.

In [8]:
model = keras.Sequential([
    keras.layers.Dense(1, input_shape=(2,), activation='sigmoid', kernel_initializer='ones', bias_initializer='zeros')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(x_train_scaled, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x293d4724b50>

Evaluate the model on test set

In [9]:
model.evaluate(x_test_scaled,y_test)



[0.42845407128334045, 0.6666666865348816]

In [10]:
model.predict(x_test_scaled)



array([[0.8081063 ],
       [0.5458833 ],
       [0.83159196],
       [0.62536925],
       [0.81420255],
       [0.8216112 ]], dtype=float32)

In [11]:
y_test

23    1
18    0
9     1
3     0
14    1
25    1
Name: bought_insurance, dtype: int64

Now get the value of weights and biias from the model

In [12]:
coef, intercept = model.get_weights()

In [13]:
coef, intercept

(array([[0.9950013],
        [0.9950007]], dtype=float32),
 array([-0.00499935], dtype=float32))

In [14]:
def sigmoid(x):
    import math
    return 1 / (1+math.exp(-x))
sigmoid(18)

0.9999999847700205

In [15]:
x_test

Unnamed: 0,age,affordibility
23,45,1
18,19,0
9,61,1
3,52,0
14,49,1
25,54,1


Instead of model.predict, write our own prediction function that uses w1,w2 and bias

In [16]:
def prediction_function(age,affordibility):
    weighted_sum = coef[0]*age+coef[1]*affordibility + intercept
    return sigmoid(weighted_sum)
prediction_function(.47,1)

0.8111732874217008

In [17]:
prediction_function(.18,1)

0.7629825865697422

Now we start implementing gradient descent in plain python. Again the goal is to come up with same w1,w2 and bias that keras model calculated. We want to show how keras / tensorflow would have computed these values internally using gradient descent

First write couple of helper routines such as sigmoid and log_loss

In [18]:
def sigmoid_numpy(x):
    return 1/(1+np.exp(-x))
sigmoid_numpy(np.array([12,0,1]))

array([0.99999386, 0.5       , 0.73105858])

In [20]:
def log_loss(y_true,y_predicted):
    epsilon = 1e-15
    y_predicted_new = [max(i,epsilon) for i in y_predicted]
    y_predicted_new = [min(i,1-epsilon) for i in y_predicted_new]
    y_predicted_new = np.array(y_predicted_new)
    return -np.mean(y_true*np.log(y_predicted_new)+(1-y_true)*np.log(1-y_predicted_new))

All right now comes the time to implement our final gradient descent function !! yay !!

In [21]:
def gradient_descent(age,affordability,y_true,epochs,loss_thresold):
    w1 = w2 = 1
    bias = 0
    rate = 0.5
    n = len(age)
    for i in range(epochs):
        weighted_sum = w1 * age + w2 * affordability + bias
        y_predicted = sigmoid_numpy(weighted_sum)
        loss = log_loss(y_true,y_predicted)
        
        w1d = (1/n)*np.dot(np.transpose(age),(y_predicted-y_true))
        w2d = (1/n)*np.dot(np.transpose(affordability),(y_predicted-y_true))
        
        bias_d = np.mean(y_predicted - y_true)
        w1 = w2 - rate * w1d
        w2 = w2 - rate * bias_d
        
        print(f'Epoch:{i},w1:{w1},w2:{w2},bias:{bias},loss:{loss}')
        
        if loss<=loss_thresold:
            break
    return w1, w2, bias

In [26]:
gradient_descent(x_train_scaled['age'],x_train_scaled['affordibility'],y_train,1000, 0.4631)

Epoch:0,w1:0.9711250444579651,w2:0.8611921700372231,bias:0,loss:0.7944138124272032
Epoch:1,w1:0.8357427562762106,w2:0.731274501087097,bias:0,loss:0.768771411880678
Epoch:2,w1:0.7110836180760427,w2:0.6144748296032445,bias:0,loss:0.7426146930489979
Epoch:3,w1:0.5994686409298327,w2:0.5105197414225773,bias:0,loss:0.7228931633383879
Epoch:4,w1:0.5004502547877832,w2:0.4187315153108265,bias:0,loss:0.7087162744102495
Epoch:5,w1:0.4132520971442131,w2:0.3382103441538193,bias:0,loss:0.6990189132119904
Epoch:6,w1:0.3369161015815538,w2:0.26793109307155144,bias:0,loss:0.6927988723634777
Epoch:7,w1:0.2703938983061735,w2:0.20682543984011292,bias:0,loss:0.689181120590792
Epoch:8,w1:0.21262073331333475,w2:0.15384363963398762,bias:0,loss:0.6874436281766326
Epoch:9,w1:0.1625684084963438,w2:0.10799520656732486,bias:0,loss:0.6870165833345155
Epoch:10,w1:0.11927837000521287,w2:0.06837179987475021,bias:0,loss:0.6874661330606838
Epoch:11,w1:0.08187890963459504,w2:0.03415696204958412,bias:0,loss:0.6884716508472

(-0.15424328598453202, -0.18182360914041498, 0)

In [27]:
coef, intercept

(array([[0.9950013],
        [0.9950007]], dtype=float32),
 array([-0.00499935], dtype=float32))