# Training a Shallow Neural Network
The following code implements a shallow neural network with backpropagation in NumPy and compares it with an equivalent model trained with scikit-learn.

## 1 Data Loading & Cleaning
The data set contains credit card debt information about 10,000 customers and whether they defaulted or not.

In [1]:
# Importing libraries
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

In [2]:
# Loading the data
df = pd.read_csv('Default.csv')
df.head()

Unnamed: 0,default,student,balance,income
0,No,No,729.526495,44361.625074
1,No,Yes,817.180407,12106.1347
2,No,No,1073.549164,31767.138947
3,No,No,529.250605,35704.493935
4,No,No,785.655883,38463.495879


In [3]:
# Encoding the 'default' and 'student' columns as 0 (No) or 1 (Yes)
df['default'] = df['default'].apply(lambda x: 0 if x == 'No' else 1)
df['student'] = df['student'].apply(lambda x: 0 if x == 'No' else 1)

In [4]:
df.head()

Unnamed: 0,default,student,balance,income
0,0,0,729.526495,44361.625074
1,0,1,817.180407,12106.1347
2,0,0,1073.549164,31767.138947
3,0,0,529.250605,35704.493935
4,0,0,785.655883,38463.495879


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   default  10000 non-null  int64  
 1   student  10000 non-null  int64  
 2   balance  10000 non-null  float64
 3   income   10000 non-null  float64
dtypes: float64(2), int64(2)
memory usage: 312.6 KB


In [6]:
scaler = StandardScaler()
df[['balance','income']] = scaler.fit_transform(df[['balance','income']])
df

Unnamed: 0,default,student,balance,income
0,0,0,-0.218835,0.813187
1,0,1,-0.037616,-1.605496
2,0,0,0.492410,-0.131212
3,0,0,-0.632893,0.164031
4,0,0,-0.102791,0.370915
...,...,...,...,...
9995,0,0,-0.255990,1.460366
9996,0,0,-0.160044,-1.039014
9997,0,0,0.020751,1.883565
9998,0,0,1.516742,0.236363
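
For reference, `StandardScaler` subtracts each column's mean and divides by its population standard deviation (`ddof=0`). The sketch below checks this on a small made-up frame; the values are illustrative and are not taken from `Default.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data for illustration only (not rows from Default.csv)
toy = pd.DataFrame({"balance": [700.0, 800.0, 1100.0, 500.0],
                    "income": [44000.0, 12000.0, 32000.0, 36000.0]})

scaled_sklearn = StandardScaler().fit_transform(toy)

# Manual z-score using the population standard deviation (ddof=0)
scaled_manual = ((toy - toy.mean()) / toy.std(ddof=0)).to_numpy()

print(np.allclose(scaled_sklearn, scaled_manual))
```

Note that pandas' `std()` defaults to `ddof=1`, so the `ddof=0` argument is needed for the two results to agree.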


In [7]:
Y = df['default'].to_numpy().reshape(-1,1)
X = df.drop(columns=['default']).to_numpy()

In [8]:
print("Shape of Y:",Y.shape)
print("Shape of X:",X.shape)

Shape of Y: (10000, 1)
Shape of X: (10000, 3)


In [9]:
X = X.T
Y = Y.T

print("Shape of Y:",Y.shape)
print("Shape of X:",X.shape)

Shape of Y: (1, 10000)
Shape of X: (3, 10000)


## 2 Training a Shallow Neural Network Using Scikit-learn
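The following code trains a shallow neural network with 4 neurons in its hidden layer using scikit-learn. The solver is configured as plain full-batch gradient descent (`batch_size=m`, `shuffle=False`, `momentum=0`, `alpha=0`, learning rate 0.01) so that it can be compared with the implementation in Section 3.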

In [10]:
from sklearn.neural_network import MLPClassifier

In [11]:
m = Y.shape[1]
mlp = MLPClassifier(hidden_layer_sizes=[4], activation='tanh', solver='sgd',
                    alpha=0, learning_rate_init=0.01, max_iter=2000,
                    batch_size=m, shuffle=False, momentum=0, verbose=True)
mlp.fit(X.T, Y.ravel())  # 1-D labels avoid scikit-learn's column-vector warning


Iteration 1, loss = 1.24796562
Iteration 2, loss = 1.23658263
Iteration 3, loss = 1.22534232
Iteration 4, loss = 1.21424296
Iteration 5, loss = 1.20328285
Iteration 6, loss = 1.19246025
Iteration 7, loss = 1.18177350
Iteration 8, loss = 1.17122089
Iteration 9, loss = 1.16080077
Iteration 10, loss = 1.15051149
Iteration 11, loss = 1.14035140
Iteration 12, loss = 1.13031889
Iteration 13, loss = 1.12041235
Iteration 14, loss = 1.11063018
Iteration 15, loss = 1.10097080
Iteration 16, loss = 1.09143265
Iteration 17, loss = 1.08201418
Iteration 18, loss = 1.07271385
Iteration 19, loss = 1.06353014
Iteration 20, loss = 1.05446156
Iteration 21, loss = 1.04550660
Iteration 22, loss = 1.03666380
Iteration 23, loss = 1.02793169
Iteration 24, loss = 1.01930883
Iteration 25, loss = 1.01079379
Iteration 26, loss = 1.00238515
Iteration 27, loss = 0.99408151
Iteration 28, loss = 0.98588150
Iteration 29, loss = 0.97778373
Iteration 30, loss = 0.96978686
Iteration 31, loss = 0.96188954
Iteration 32, los

In [12]:
print("\nWeights and biases")
print("W1:",mlp.coefs_[0].T)
print("b1:",mlp.intercepts_[0].reshape(-1,1))
print("W2:",mlp.coefs_[1].T)
print("b2:",mlp.intercepts_[1].reshape(-1,1))


Weights and biases
W1: [[ 0.70461747 -0.76116886 -0.22569531]
 [-0.1695387  -0.45054328 -0.26953396]
 [ 0.04849962  0.16145835 -0.74656021]
 [ 0.56635342  0.28708787  0.38779696]]
b1: [[ 0.61733431]
 [-0.04359204]
 [-0.21404018]
 [-0.13622335]]
W2: [[-1.1888251   0.23708056  0.51935621 -0.38401234]]
b2: [[-1.61228508]]
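
To compare the fit with the hand-written implementation in Section 3, the final loss and the number of iterations can be read from the estimator's `loss_`, `loss_curve_` and `n_iter_` attributes. The sketch below uses synthetic data, a fixed `random_state` and a reduced `max_iter`, all of which are assumptions made for illustration; in the notebook the same attributes would be read from `mlp`.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (an assumption; not the Default data set)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0).astype(int)

demo = MLPClassifier(hidden_layer_sizes=[4], activation='tanh', solver='sgd',
                     alpha=0, learning_rate_init=0.01, max_iter=50,
                     batch_size=X_demo.shape[0], shuffle=False, momentum=0,
                     random_state=0)
demo.fit(X_demo, y_demo)  # may emit a ConvergenceWarning with so few iterations

print("Iterations run:", demo.n_iter_)
print("Final loss:", demo.loss_)
print("Last 3 losses:", demo.loss_curve_[-3:])
```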


## 3 Training a Shallow Neural Network Using Backpropagation
The following code implements backpropagation to train a shallow neural network with 4 neurons in its hidden layer.
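
For reference, the equations that the code below implements can be written out as follows. This is a sketch in the code's own notation, assuming a tanh hidden layer, a sigmoid output and the binary cross-entropy loss, with $X$ of shape $(3, m)$ and $Y$ of shape $(1, m)$.

Forward pass and loss:

$$
\begin{aligned}
Z_1 &= W_1 X + b_1, & A_1 &= \tanh(Z_1),\\
Z_2 &= W_2 A_1 + b_2, & A_2 &= \sigma(Z_2) = \frac{1}{1+e^{-Z_2}},\\
\mathcal{L} &= -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log a_2^{(i)} + \left(1-y^{(i)}\right)\log\left(1-a_2^{(i)}\right)\right].
\end{aligned}
$$

Backward pass, where column $i$ of $dZ_k$ is the derivative of example $i$'s loss with respect to its pre-activation, and $dW_k$, $db_k$ are gradients of the mean loss $\mathcal{L}$:

$$
\begin{aligned}
dZ_2 &= A_2 - Y, & dW_2 &= \frac{1}{m}\, dZ_2 A_1^{\top}, & db_2 &= \frac{1}{m}\sum_{i=1}^{m} dZ_2^{(i)},\\
dZ_1 &= \left(W_2^{\top} dZ_2\right)\odot\left(1 - A_1^{2}\right), & dW_1 &= \frac{1}{m}\, dZ_1 X^{\top}, & db_1 &= \frac{1}{m}\sum_{i=1}^{m} dZ_1^{(i)}.
\end{aligned}
$$

Each parameter $\theta \in \{W_1, b_1, W_2, b_2\}$ is then updated as $\theta \leftarrow \theta - \alpha\, d\theta$ with learning rate $\alpha$.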

In [13]:
# Initialising the parameters of the neural network
W1 = np.random.rand(4,3)
b1 = np.zeros((4,1))
W2 = np.random.rand(1,4)
b2 = np.zeros((1,1))
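
`np.random.rand` draws from [0, 1), so every initial weight is positive and each run starts from a different point. As an alternative, the sketch below shows a seeded, zero-centred uniform initialisation with a Glorot-style bound. The helper `init_params`, its arguments and the bound are assumptions for illustration and are not what the notebook uses.

```python
import numpy as np

def init_params(n_x, n_h, n_y, seed=0):
    """Hypothetical helper: seeded, zero-centred uniform initialisation."""
    rng = np.random.default_rng(seed)
    # Glorot-style bounds based on each layer's fan-in and fan-out
    bound1 = np.sqrt(6 / (n_x + n_h))
    bound2 = np.sqrt(6 / (n_h + n_y))
    W1 = rng.uniform(-bound1, bound1, size=(n_h, n_x))
    b1 = np.zeros((n_h, 1))
    W2 = rng.uniform(-bound2, bound2, size=(n_y, n_h))
    b2 = np.zeros((n_y, 1))
    return W1, b1, W2, b2

# 3 input features, 4 hidden neurons, 1 output, as in this notebook
W1_demo, b1_demo, W2_demo, b2_demo = init_params(n_x=3, n_h=4, n_y=1, seed=42)
print(W1_demo.shape, b1_demo.shape, W2_demo.shape, b2_demo.shape)
```

Fixing the seed makes the training run repeatable, which helps when comparing results across runs.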

In [14]:
# Updating parameters using full-batch gradient descent
n_iter = 777  # number of gradient descent steps
lr = 0.01     # learning rate
loss = []     # loss recorded at every iteration

for i in range(n_iter):
    # Forward propagation
    Z1 = W1@X + b1
    A1 = np.tanh(Z1)
    Z2 = W2@A1 + b2
    A2 = 1/(1+np.exp(-Z2))

    # Back propagation
    dZ2 = A2-Y
    dW2 = 1/m*(dZ2@A1.T)
    db2 = 1/m*np.sum(dZ2,axis=1,keepdims=True)
    dZ1 = (W2.T@dZ2)*(1-A1**2)  # derivative of tanh(Z1) is 1 - A1**2
    dW1 = 1/m*(dZ1@X.T)
    db1 = 1/m*np.sum(dZ1,axis=1,keepdims=True)

    # Gradient descent step
    W2 -= lr*dW2
    b2 -= lr*db2
    W1 -= lr*dW1
    b1 -= lr*db1

    # Binary cross-entropy of the predictions made before this update
    current_loss = -1/m*(Y@np.log(A2).T+(1-Y)@np.log(1-A2).T)
    loss.append(current_loss.item())

print("Last iteration:",i+1)
print("Losses:",np.array(loss[-11:]))  # losses of the last 11 iterations

print("\nWeights and biases")
print("W1:",W1)
print("b1:",b1)
print("W2:",W2)
print("b2:",b2)

Last iteration: 777
Losses: [0.16626296 0.16614036 0.16601809 0.16589614 0.16577452 0.16565323
 0.16553226 0.16541161 0.16529127 0.16517126 0.16505157]

Weights and biases
W1: [[0.51426612 0.38412565 0.86649269]
 [0.66441821 0.2498951  0.76796333]
 [0.93864872 0.79024567 0.43656679]
 [0.70329638 0.8004334  0.13971921]]
b1: [[-0.1655859 ]
 [-0.6611979 ]
 [ 0.20686082]
 [-0.65955053]]
W2: [[ 0.07468513  0.64029642 -0.40869096  0.75230841]]
b2: [[-1.59494196]]
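
A direct way to check the backpropagation formulas is to compare them with finite-difference gradients. The sketch below does this on small synthetic data; the helper names (`loss_fn`, `backprop`, `numerical_grad`), the data and the step size `eps` are assumptions made for illustration.

```python
import numpy as np

def forward(params, X):
    W1, b1, W2, b2 = params
    A1 = np.tanh(W1 @ X + b1)
    A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))
    return A1, A2

def loss_fn(params, X, Y):
    # Mean binary cross-entropy
    _, A2 = forward(params, X)
    return float(-np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)))

def backprop(params, X, Y):
    # Same gradient formulas as the training loop above
    W1, b1, W2, b2 = params
    m = Y.shape[1]
    A1, A2 = forward(params, X)
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1**2)
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return [dW1, db1, dW2, db2]

def numerical_grad(params, X, Y, eps=1e-6):
    # Central differences, perturbing one parameter entry at a time
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            plus = loss_fn(params, X, Y)
            p[idx] = old - eps
            minus = loss_fn(params, X, Y)
            p[idx] = old  # restore the original value
            g[idx] = (plus - minus) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(0)
X_chk = rng.normal(size=(3, 20))                        # 3 features, 20 examples
Y_chk = rng.integers(0, 2, size=(1, 20)).astype(float)  # random 0/1 labels
params = [rng.random((4, 3)), np.zeros((4, 1)),
          rng.random((1, 4)), np.zeros((1, 1))]

analytic = backprop(params, X_chk, Y_chk)
numeric = numerical_grad(params, X_chk, Y_chk)
max_diff = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print("Largest gradient discrepancy:", max_diff)
```

A discrepancy many orders of magnitude smaller than the gradients themselves suggests the analytic gradients are consistent with the loss.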


## 4 Conclusion
The similar loss values in Sections 2 and 3 after the same number of iterations indicate that the custom gradient descent implementation is correct. The weights and biases differ because the two models start from different random initialisations and the loss function of a shallow neural network has multiple maxima and minima, so gradient descent can settle on different parameter values.
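
One concrete reason why different weights can give the same loss is symmetry: reordering the hidden neurons, or flipping the sign of one neuron's incoming and outgoing weights (since tanh is an odd function), leaves the network's output unchanged. The sketch below illustrates this with made-up parameters; none of the values come from the models above.

```python
import numpy as np

def forward(W1, b1, W2, b2, X):
    A1 = np.tanh(W1 @ X + b1)
    return 1 / (1 + np.exp(-(W2 @ A1 + b2)))

# Made-up inputs and parameters for illustration
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(3, 5))
W1 = rng.normal(size=(4, 3))
b1 = rng.normal(size=(4, 1))
W2 = rng.normal(size=(1, 4))
b2 = rng.normal(size=(1, 1))

out = forward(W1, b1, W2, b2, X_demo)

# Reorder the hidden neurons: permute rows of W1, b1 and columns of W2
perm = np.array([2, 0, 3, 1])
out_perm = forward(W1[perm], b1[perm], W2[:, perm], b2, X_demo)

# Flip the sign of the second hidden neuron: tanh(-z) = -tanh(z)
sign = np.array([[1.0], [-1.0], [1.0], [1.0]])
out_flip = forward(sign * W1, sign * b1, W2 * sign.T, b2, X_demo)

print(np.allclose(out, out_perm), np.allclose(out, out_flip))
```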