Implement Gradient Descent For Neural Network (or Logistic Regression)

Predicting whether a person will buy life insurance based on their age and affordability, using logistic regression

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline

In [2]:
df = pd.read_csv("insurance_data.csv")
df.head()

   age  affordibility  bought_insurance
0   22              1                 0
1   25              0                 0
2   47              1                 1
3   52              0                 0
4   46              1                 1




Split into train and test sets


In [3]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df[['age','affordibility']],df.bought_insurance,test_size=0.2)
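train_test_split shuffles the rows randomly, and because no random_state is passed above, every run draws a different split, so the accuracies and learned weights further down will vary from run to run. A minimal sketch on a hypothetical 28-row stand-in frame (made-up values, not insurance_data.csv) showing that fixing random_state makes the split repeatable; the seed 42 and the helper name split are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for insurance_data.csv: 28 rows of made-up values
df = pd.DataFrame({
    "age": np.arange(20, 48),
    "affordibility": np.tile([0, 1], 14),
    "bought_insurance": np.tile([0, 0, 1, 1], 7),
})

def split(seed):
    # Same call as in the notebook, plus a fixed random_state
    return train_test_split(df[["age", "affordibility"]], df.bought_insurance,
                            test_size=0.2, random_state=seed)

X_train_a, X_test_a, y_train_a, y_test_a = split(seed=42)
X_train_b, X_test_b, y_train_b, y_test_b = split(seed=42)

print(len(X_train_a), len(X_test_a))          # 22 6
print(X_test_a.index.equals(X_test_b.index))  # True: same seed, same test rows
```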

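Preprocessing: Scale the data so that both age and affordibility are in the same range. Affordibility is already 0 or 1, so dividing age by 100 brings typical ages into a comparable 0 to 1 range.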

In [4]:
X_train_scaled = X_train.copy()
X_train_scaled['age'] = X_train_scaled['age'] / 100

X_test_scaled = X_test.copy()
X_test_scaled['age'] = X_test_scaled['age'] / 100
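Dividing by 100 is a fixed scaling that assumes ages stay below 100. A common alternative, not used anywhere else in this notebook (the later cells assume the /100 scaling), is min-max scaling with scikit-learn's MinMaxScaler, fitted on the training rows only and then reused on the test rows. A minimal sketch on a hypothetical split built from rows printed in this notebook:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical split built from rows shown by df.head() and X_test in this notebook
X_train = pd.DataFrame({"age": [22, 25, 47, 52, 46], "affordibility": [1, 0, 1, 0, 1]})
X_test = pd.DataFrame({"age": [18, 56], "affordibility": [1, 1]})

# Approach used in this notebook: a fixed divisor
X_train_div = X_train.copy()
X_train_div["age"] = X_train_div["age"] / 100

# Alternative: learn min and max of age from the training rows only...
scaler = MinMaxScaler()
X_train_mm = X_train.copy()
X_train_mm["age"] = scaler.fit_transform(X_train[["age"]]).ravel()

# ...then reuse them on the test rows, so test values can fall outside [0, 1]
X_test_mm = X_test.copy()
X_test_mm["age"] = scaler.transform(X_test[["age"]]).ravel()

print(X_train_div["age"].tolist())  # [0.22, 0.25, 0.47, 0.52, 0.46]
print(X_train_mm["age"].round(3).tolist())
print(X_test_mm["age"].round(3).tolist())
```

Fitting the scaler on the training rows only keeps information from the test set out of the preprocessing step.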

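Model Building: First, build a model in Keras/TensorFlow and see what weights and bias values it comes up with. We will then try to reproduce the same weights and bias in our plain Python implementation of gradient descent. The code below defines the architecture of our simple neural network: a single neuron with a sigmoid activation that takes age and affordibility as its two inputs, which is exactly logistic regression.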

In [5]:
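model = keras.Sequential([
    keras.Input(shape=(2,)),  # two inputs: age and affordibility
    # One neuron with a sigmoid activation, i.e. logistic regression.
    # Weights start at 1 and bias at 0 so the plain-Python version below can start from the same point.
    keras.layers.Dense(1, activation='sigmoid', kernel_initializer='ones', bias_initializer='zeros')
])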

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(X_train_scaled, y_train, epochs=5)

Epoch 1/5
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 890ms/step - accuracy: 0.4545 - loss: 0.7334
Epoch 2/5
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 62ms/step - accuracy: 0.4545 - loss: 0.7330
Epoch 3/5
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step - accuracy: 0.4545 - loss: 0.7326
Epoch 4/5
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.4545 - loss: 0.7321
Epoch 5/5
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step - accuracy: 0.4545 - loss: 0.7317


<keras.src.callbacks.history.History at 0x24792e6c150>
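In equation form, the single Dense unit above computes the logistic regression model below, where $\sigma$ is the sigmoid function. The kernel_initializer='ones' and bias_initializer='zeros' arguments fix the starting point at $w_1 = w_2 = 1$ and $b = 0$.

```latex
z = w_1 \cdot \text{age} + w_2 \cdot \text{affordibility} + b, \qquad
\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
w_1^{(0)} = w_2^{(0)} = 1,\; b^{(0)} = 0
```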

Evaluate the model on the test set

In [6]:
model.evaluate(X_test_scaled,y_test)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 165ms/step - accuracy: 0.6667 - loss: 0.6505


[0.6504672169685364, 0.6666666865348816]

In [7]:
model.predict(X_test_scaled)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 56ms/step


array([[0.8245093 ],
       [0.76298267],
       [0.8157031 ],
       [0.8096446 ],
       [0.8216113 ],
       [0.8096446 ]], dtype=float32)

In [8]:
y_test

5     1
10    0
24    1
4     1
25    1
27    0
Name: bought_insurance, dtype: int64
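Why is the test accuracy 0.6667 when the weights have barely moved? With a sigmoid output and binary_crossentropy, Keras resolves 'accuracy' to binary accuracy, which counts a prediction as 1 when its probability exceeds 0.5. Every probability printed above lies between about 0.76 and 0.82, so the model predicts 1 for all six rows, and four of the six labels are 1. The training accuracy of 0.4545 has the same explanation: it is simply the share of training rows labeled 1. A minimal check using the printed values:

```python
import numpy as np

# Probabilities printed by model.predict(X_test_scaled) above
probabilities = np.array([0.8245093, 0.76298267, 0.8157031, 0.8096446, 0.8216113, 0.8096446])
# Labels printed by y_test above (row order 5, 10, 24, 4, 25, 27)
y_true = np.array([1, 0, 1, 1, 1, 0])

# Binary accuracy: threshold the probabilities at 0.5, then count matches
predicted_labels = (probabilities > 0.5).astype(int)
accuracy = np.mean(predicted_labels == y_true)

print(predicted_labels)  # [1 1 1 1 1 1]
print(accuracy)          # 0.6666666666666666
```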

Now get the values of the weights and bias from the model.

In [9]:
coef, intercept = model.get_weights()

In [10]:
coef, intercept

(array([[0.9950013],
        [0.995001 ]], dtype=float32),
 array([-0.00499932], dtype=float32))
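Why did the weights only move from 1 to about 0.995, and the bias from 0 to about -0.005? optimizer='adam' uses Adam with Keras' default learning rate of 0.001, and each epoch here is a single batch (the progress bars show 1/1). When a parameter's gradient keeps the same sign, Adam moves it by roughly the learning rate per step, almost regardless of the gradient's size, so 5 steps give about 5 × 0.001 = 0.005. A simplified sketch of the textbook Adam update, assuming a constant, hypothetical gradient and the default hyperparameters (Keras arranges the bias correction slightly differently internally); adam_steps is an illustrative name:

```python
import math

def adam_steps(param, grad, steps, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Apply `steps` textbook Adam updates to one scalar, feeding the same gradient every step."""
    m = v = 0.0
    for t in range(1, steps + 1):
        m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
        v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
        # With a constant gradient, m_hat / sqrt(v_hat) is about 1, so each step is about lr
        param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param

# Hypothetical gradients of very different sizes give almost the same result
print(adam_steps(1.0, grad=0.05, steps=5))   # close to 0.995
print(adam_steps(1.0, grad=3.0, steps=5))    # close to 0.995
print(adam_steps(0.0, grad=0.02, steps=5))   # close to -0.005
```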

In [11]:
import math

def sigmoid(x):
    # Scalar sigmoid: squashes any real number into the range (0, 1)
    return 1 / (1 + math.exp(-x))

sigmoid(18)

0.9999999847700205

In [12]:
X_test

    age  affordibility
5    56              1
10   18              1
24   50              1
4    46              1
25   54              1
27   46              1


Instead of calling model.predict, write our own prediction function that uses w1, w2 and bias.

In [13]:
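def prediction_function(age, affordibility):
    # w1*age + w2*affordibility + bias, using the weights Keras learned;
    # .item() turns the 1-element array into a plain float so math.exp gets a scalar
    weighted_sum = (coef[0]*age + coef[1]*affordibility + intercept).item()
    return sigmoid(weighted_sum)

prediction_function(.47, 1)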

0.8111733421999793

In [14]:
prediction_function(.18, 1)

0.7629826512432166
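prediction_function scores one row at a time. A minimal sketch of the vectorized form, which computes sigmoid(X @ coef + intercept) for every test row in one matrix product; the weights and test rows are copied from the outputs above (rounded float32 values), so it should reproduce the model.predict output up to float rounding.

```python
import numpy as np

# Weights and bias printed by model.get_weights() above
coef = np.array([[0.9950013], [0.995001]])
intercept = np.array([-0.00499932])

# X_test rows printed above, with age divided by 100 as in the preprocessing step
X_test_scaled = np.array([
    [0.56, 1], [0.18, 1], [0.50, 1], [0.46, 1], [0.54, 1], [0.46, 1],
])

def sigmoid_numpy(X):
    return 1 / (1 + np.exp(-X))

# One matrix product computes w1*age + w2*affordibility + bias for all rows at once
probabilities = sigmoid_numpy(X_test_scaled @ coef + intercept)  # shape (6, 1)
print(probabilities.ravel())
```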

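Now we start implementing gradient descent in plain Python. Again, the goal is to come up with the same w1, w2 and bias that the Keras model calculated, and to show how Keras/TensorFlow computes such values internally using gradient descent. Keep in mind that the model above was compiled with optimizer='adam', a variant of gradient descent, and trained for only 5 epochs, whereas the plain gradient descent below uses a fixed learning rate of 0.5 for up to 1000 epochs, so the final numbers will not match exactly.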

First, write a couple of helper routines: sigmoid_numpy (a vectorized sigmoid) and log_loss.

In [15]:
def sigmoid_numpy(X):
    # Vectorized sigmoid: applies 1 / (1 + e^-x) element-wise to arrays or Series
    return 1/(1+np.exp(-X))

sigmoid_numpy(np.array([12,0,1]))

array([0.99999386, 0.5       , 0.73105858])
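sigmoid_numpy evaluates np.exp(-X), which overflows to inf with a RuntimeWarning for large negative inputs such as -1000; the result still rounds to 0, but the warning is noisy in long training runs. A hedged sketch of a common branch-wise variant (the name stable_sigmoid is hypothetical and the notebook does not use it); scipy.special.expit is a ready-made alternative.

```python
import numpy as np

def stable_sigmoid(X):
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    positive = X >= 0
    # For x >= 0, exp(-x) <= 1, so this branch cannot overflow
    out[positive] = 1 / (1 + np.exp(-X[positive]))
    # For x < 0, use the equivalent form exp(x) / (1 + exp(x)); exp(x) < 1, so no overflow
    exp_x = np.exp(X[~positive])
    out[~positive] = exp_x / (1 + exp_x)
    return out

print(stable_sigmoid(np.array([12, 0, 1])))
print(stable_sigmoid(np.array([-1000.0, 1000.0])))  # no overflow warning
```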

In [16]:
def log_loss(y_true, y_predicted):
    # Clip predictions away from exactly 0 and 1 so np.log never sees 0
    epsilon = 1e-15
    y_predicted_new = np.clip(np.asarray(y_predicted, dtype=float), epsilon, 1 - epsilon)
    # Binary cross-entropy averaged over all rows
    return -np.mean(y_true*np.log(y_predicted_new)+(1-y_true)*np.log(1-y_predicted_new))
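log_loss implements the binary cross-entropy below (ignoring the epsilon clipping). Differentiating it through the sigmoid, using $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, collapses the per-row gradient to $\hat{y}_i - y_i$. That is where the w1d, w2d and bias_d formulas in gradient_descent come from; $\eta$ corresponds to rate.

```latex
L = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\Big]

\frac{\partial L}{\partial z_i}
  = -\frac{1}{n}\left[\frac{y_i}{\hat{y}_i} - \frac{1 - y_i}{1 - \hat{y}_i}\right]\hat{y}_i(1 - \hat{y}_i)
  = \frac{1}{n}(\hat{y}_i - y_i)

\frac{\partial L}{\partial w_1} = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\,\text{age}_i, \quad
\frac{\partial L}{\partial w_2} = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\,\text{affordibility}_i, \quad
\frac{\partial L}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)

w_1 \leftarrow w_1 - \eta\,\frac{\partial L}{\partial w_1}, \quad
w_2 \leftarrow w_2 - \eta\,\frac{\partial L}{\partial w_2}, \quad
b \leftarrow b - \eta\,\frac{\partial L}{\partial b}
```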

All right, now comes the time to implement our final gradient descent function! Yay!

In [17]:
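def gradient_descent(age, affordability, y_true, epochs, loss_threshold):
    # Same starting point as the Keras model: weights = 1, bias = 0
    w1 = w2 = 1
    bias = 0
    rate = 0.5  # learning rate
    n = len(age)
    for i in range(epochs):
        # Forward pass: weighted sum, then sigmoid, then log loss
        weighted_sum = w1 * age + w2 * affordability + bias
        y_predicted = sigmoid_numpy(weighted_sum)
        loss = log_loss(y_true, y_predicted)

        # Gradients of the log loss with respect to w1, w2 and bias
        w1d = (1/n)*np.dot(age, (y_predicted-y_true))
        w2d = (1/n)*np.dot(affordability, (y_predicted-y_true))
        bias_d = np.mean(y_predicted-y_true)

        # Step each parameter against its gradient
        w1 = w1 - rate * w1d
        w2 = w2 - rate * w2d
        bias = bias - rate * bias_d

        # loss was computed before this epoch's update; the weights printed are after it
        print(f'Epoch:{i}, w1:{w1}, w2:{w2}, bias:{bias}, loss:{loss}')

        # Stop early once the loss is low enough
        if loss <= loss_threshold:
            break

    return w1, w2, bias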
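Before trusting the w1d, w2d and bias_d formulas, one can compare them with a central finite-difference estimate of the loss slope. A minimal gradient-check sketch, using the five rows shown by df.head() (age divided by 100) as toy data; loss_at, analytic_gradients and numeric_gradients are hypothetical helper names, and the two gradient estimates should agree to many decimal places.

```python
import numpy as np

def sigmoid_numpy(X):
    return 1 / (1 + np.exp(-X))

def log_loss(y_true, y_predicted):
    y_predicted = np.clip(y_predicted, 1e-15, 1 - 1e-15)
    return -np.mean(y_true * np.log(y_predicted) + (1 - y_true) * np.log(1 - y_predicted))

def loss_at(w1, w2, bias, age, affordability, y_true):
    return log_loss(y_true, sigmoid_numpy(w1 * age + w2 * affordability + bias))

def analytic_gradients(w1, w2, bias, age, affordability, y_true):
    # Same formulas as w1d, w2d and bias_d in gradient_descent
    n = len(age)
    error = sigmoid_numpy(w1 * age + w2 * affordability + bias) - y_true
    return np.array([np.dot(age, error) / n, np.dot(affordability, error) / n, np.mean(error)])

def numeric_gradients(w1, w2, bias, age, affordability, y_true, h=1e-6):
    # Central differences: (L(p + h) - L(p - h)) / (2h) for each parameter p
    def loss(a, b, c):
        return loss_at(a, b, c, age, affordability, y_true)
    return np.array([
        (loss(w1 + h, w2, bias) - loss(w1 - h, w2, bias)) / (2 * h),
        (loss(w1, w2 + h, bias) - loss(w1, w2 - h, bias)) / (2 * h),
        (loss(w1, w2, bias + h) - loss(w1, w2, bias - h)) / (2 * h),
    ])

# Toy data: the five rows shown by df.head(), with age divided by 100
age = np.array([0.22, 0.25, 0.47, 0.52, 0.46])
affordability = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
y_true = np.array([0.0, 0.0, 1.0, 0.0, 1.0])

# Check at the starting point used by gradient_descent (w1 = w2 = 1, bias = 0)
analytic = analytic_gradients(1.0, 1.0, 0.0, age, affordability, y_true)
numeric = numeric_gradients(1.0, 1.0, 0.0, age, affordability, y_true)
max_difference = np.max(np.abs(analytic - numeric))
print(analytic, numeric, max_difference)
```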

In [18]:
gradient_descent(X_train_scaled['age'],X_train_scaled['affordibility'],y_train,1000, 0.4631)

Epoch:0, w1:0.9730107808384282, w2:0.9448358772697057, bias:-0.1297635684840324, loss:0.733398669288911
Epoch:1, w1:0.9523123704874401, w2:0.8993433767269012, bias:-0.24291470236225088, loss:0.6949990436123248
Epoch:2, w1:0.9373944591510792, w2:0.8629451763708244, bias:-0.34088139295067477, loss:0.6666256014135518
Epoch:3, w1:0.9276267308727877, w2:0.8347747513761317, bias:-0.42537054048700623, loss:0.6460202208408456
Epoch:4, w1:0.9223355641927409, w2:0.813819777827982, bias:-0.4981722645765692, loss:0.6311935366113732
Epoch:5, w1:0.9208621274173802, w2:0.799039468429279, bias:-0.5610158544198987, loss:0.620530613146006
Epoch:6, w1:0.92259867515373, w2:0.7894436952684669, bias:-0.6154826989758739, loss:0.612797075390602
Epoch:7, w1:0.9270064314976109, w2:0.7841369663598711, bias:-0.6629658579691459, loss:0.6070898816753098
Epoch:8, w1:0.933620625363704, w2:0.7823364760902722, bias:-0.7046613775739041, loss:0.6027688433521677
Epoch:9, w1:0.9420478069825752, w2:0.7833737107592609, bias:

(3.880241356748665, 1.464159011652884, -2.541404715196541)

In [19]:
coef, intercept

(array([[0.9950013],
        [0.995001 ]], dtype=float32),
 array([-0.00499932], dtype=float32))
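To see what the plain gradient descent weights actually do, here is a sketch that plugs the values returned above into the same sigmoid, scores the six test rows printed earlier with a 0.5 threshold, and solves w1*age + w2*affordibility + bias = 0 for age to get the decision boundary (multiplying by 100 to undo the scaling). All numbers are copied from this run's outputs; a different train/test split gives different values.

```python
import numpy as np

# Values returned by gradient_descent(...) in this run (copied from the output above)
w1, w2, bias = 3.880241356748665, 1.464159011652884, -2.541404715196541

# Test rows printed earlier: age / 100 and affordibility, plus the y_test labels
X_test_scaled = np.array([[0.56, 1], [0.18, 1], [0.50, 1], [0.46, 1], [0.54, 1], [0.46, 1]])
y_test = np.array([1, 0, 1, 1, 1, 0])

# Same forward pass as gradient_descent, then a 0.5 threshold
weighted_sum = w1 * X_test_scaled[:, 0] + w2 * X_test_scaled[:, 1] + bias
probabilities = 1 / (1 + np.exp(-weighted_sum))
predicted = (probabilities > 0.5).astype(int)
accuracy = np.mean(predicted == y_test)

# Decision boundary: solve w1*age + w2*affordibility + bias = 0 for age, then undo the /100 scaling
boundary_age = {aff: -(w2 * aff + bias) / w1 * 100 for aff in (0, 1)}

print(predicted, accuracy)
print(boundary_age)
```

With these weights, a person with affordibility = 1 is predicted to buy from about age 28, and a person with affordibility = 0 only from about age 65.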