# Performance evaluation

Adapting the logic to 4 layers with a more complex dataset was not difficult, but at first I consistently
got poor results. The first problem was in the activation function, where we continuously ran into overflow
errors. This was solved quite easily by normalizing our data, since several of the inputs were too large for
our code.
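
The overflow can be reproduced on a tiny example. The sketch below is only an illustration: the first row, the weight values, and the helper `overflows` are made up for the demonstration, while the sigmoid and the min-max scaling follow the code later in this notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def overflows(inputs, weights):
    """Return True if the sigmoid forward pass overflows for these inputs."""
    with np.errstate(over="raise"):  # turn the overflow warning into an exception
        try:
            sigmoid(inputs @ weights)
        except FloatingPointError:
            return True
    return False

# Columns: age, resting bp, cholesterol, max heart rate.
# The first row is hypothetical; the second matches the first row of the dataset preview.
raw = np.array([[60.0, 180.0, 400.0, 150.0],
                [40.0, 140.0, 289.0, 172.0]])
weights = np.full((4, 1), -1.0)  # hypothetical weights at the edge of the [-1, 1) init range

scaled = (raw - raw.min(0)) / np.ptp(raw, axis=0)  # min-max scale each column to [0, 1]

print(overflows(raw, weights))     # exp(790) does not fit in a float64
print(overflows(scaled, weights))  # inputs in [0, 1] keep the exponent small
```

With raw inputs the weighted sum reaches the hundreds, so `np.exp` runs past the float64 limit (roughly `exp(709)`). After scaling, every input lies between 0 and 1 and the exponent stays small.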

Secondly, and most importantly, the results didn't seem to actually improve until I drastically scaled down our
change in weights through the introduction of a learning rate. We were likely overshooting our optimum
weights at every update step and getting stuck in a pit where our NN suddenly became far too confident by
jumping to very large weights. Because our error scaling shrinks the update for confident outputs, the NN did
not want to budge much from that point on.
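
To see why an overconfident network stops moving, here is a minimal sketch of the error scaling on its own. The output values are invented for illustration; the formula is the same `error * output * (1 - output)` used in the backpropagation code.

```python
import numpy as np

def sigmoid_deriv_from_output(out):
    # Derivative of the sigmoid, written in terms of the sigmoid's output
    return out * (1 - out)

# Hypothetical predictions that are all wrong (the target is 0), with growing confidence
outputs = np.array([0.5, 0.9, 0.999, 0.999999])
target = 0.0

errors = target - outputs
deltas = errors * sigmoid_deriv_from_output(outputs)

for out, err, delta in zip(outputs, errors, deltas):
    print(f"output={out:<9} error={err:+.6f} delta={delta:+.8f}")
```

The error grows as the prediction gets more confidently wrong, but the delta that drives the weight update shrinks towards zero. Once large weights push the outputs close to 0 or 1, the updates become tiny, which matches the "stuck" behaviour described above.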

After introducing both of these fixes, though, our NN performs very well on our heart disease
dataset: the mean absolute error on the training data falls from about 0.49 to about 0.054 over 60,000 iterations.

# Code for heart disease dataset
This is my attempt at coding a machine learning algorithm from scratch to predict the probability of
heart disease using the dataset from https://www.kaggle.com/datasets/mexwell/heart-disease-dataset

## Initialize the data

In [1]:
import numpy as np
import pandas as pd

np.random.seed(1)

In [2]:
data = pd.read_csv('../data/heart_statlog_cleveland_hungary_final.csv')

In [3]:
data

Unnamed: 0,age,sex,chest pain type,resting bp s,cholesterol,fasting blood sugar,resting ecg,max heart rate,exercise angina,oldpeak,ST slope,target
0,40,1,2,140,289,0,0,172,0,0.0,1,0
1,49,0,3,160,180,0,0,156,0,1.0,2,1
2,37,1,2,130,283,0,1,98,0,0.0,1,0
3,48,0,4,138,214,0,0,108,1,1.5,2,1
4,54,1,3,150,195,0,0,122,0,0.0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...
1185,45,1,1,110,264,0,0,132,0,1.2,2,1
1186,68,1,4,144,193,1,0,141,0,3.4,2,1
1187,57,1,4,130,131,0,0,115,1,1.2,2,1
1188,57,0,2,130,236,0,2,174,0,0.0,2,1


In [4]:
X = np.array(data.drop(columns = ['target']))

In [5]:
X

array([[40. ,  1. ,  2. , ...,  0. ,  0. ,  1. ],
       [49. ,  0. ,  3. , ...,  0. ,  1. ,  2. ],
       [37. ,  1. ,  2. , ...,  0. ,  0. ,  1. ],
       ...,
       [57. ,  1. ,  4. , ...,  1. ,  1.2,  2. ],
       [57. ,  0. ,  2. , ...,  0. ,  0. ,  2. ],
       [38. ,  1. ,  3. , ...,  0. ,  0. ,  1. ]])

In [6]:
y = np.array(pd.DataFrame(data['target']))

In [7]:
y

array([[0],
       [1],
       [0],
       ...,
       [1],
       [1],
       [0]], dtype=int64)

## Normalize the data

In [None]:
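X = (X - X.min(0)) / np.ptp(X, axis=0) # Min-max scale each column to [0, 1]; np.ptp(X, axis=0) replaces X.ptp(0), which NumPy 2.0 removed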
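
One edge case worth knowing about: if a column is constant, its range is 0 and the division above produces NaN. The sketch below guards against that. The helper name `min_max_scale` and the demo values are my own assumptions, not part of the notebook above.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column to [0, 1]; constant columns become all zeros (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = np.ptp(X, axis=0)
    # Avoid dividing by zero for columns where every value is the same
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (X - col_min) / safe_range

demo = np.array([[40.0, 1.0, 140.0],
                 [49.0, 1.0, 160.0],
                 [37.0, 1.0, 130.0]])  # the middle column is constant
scaled = min_max_scale(demo)
print(scaled)
```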

## Initialize weights

In [8]:
weights0 = 2 * np.random.rand(11, 8) - 1 # 11 Inputs into 8 hidden layer neurons
weights1 = 2 * np.random.rand(8, 1) - 1 # 8 hidden layer neurons into 1 output neuron

## Define non-linear activation function

In [9]:
# If deriv = True, x must already be the output of Sigmoid for the derivative to be correct
def Sigmoid(x, deriv = False):
    if(deriv):
        return x * (1-x)
    return 1 / (1+np.exp(-x))
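
The `deriv` branch is easy to misuse, so here is a small sanity check, offered as a sketch: it compares `Sigmoid(out, deriv = True)` against a finite-difference estimate of the slope. The test points and step size `eps` are arbitrary choices.

```python
import numpy as np

def Sigmoid(x, deriv = False):
    if(deriv):
        return x * (1-x)
    return 1 / (1+np.exp(-x))

z = np.linspace(-4, 4, 9)  # raw pre-activation values
out = Sigmoid(z)

analytic = Sigmoid(out, deriv = True)  # correct: pass the sigmoid output
eps = 1e-6
numeric = (Sigmoid(z + eps) - Sigmoid(z - eps)) / (2 * eps)  # central difference

wrong = Sigmoid(z, deriv = True)  # incorrect: passing the raw input instead

print(np.allclose(analytic, numeric, atol = 1e-6))
print(np.allclose(wrong, numeric, atol = 1e-6))
```

The first comparison agrees and the second does not, which is why the layers (`l1`, `l2`) rather than the raw dot products are passed in during backpropagation.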

## Make prediction

In [10]:
l0 = X
l1 = Sigmoid(np.dot(X, weights0)) # Each row represents one data point, each column 1 hidden neuron
l2 = Sigmoid(np.dot(l1, weights1)) # Each row represents estimated probability of heart disease

## Backpropagation

In [11]:
l2_error = y - l2 # Global error
l2_delta = l2_error * Sigmoid(l2, deriv = True) # Error scaled inversely with confidence

l1_error = np.dot(l2_delta, weights1.T) # Error contribution of l1, each row is one data point, each column 1 neuron
l1_delta = l1_error * Sigmoid(l1, deriv = True)

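# No learning rate is applied in this walkthrough; the full code below scales each update by learning_rate
weights1 += np.dot(l1.T, l2_delta) # Change by sum of (input * scaled error)
weights0 += np.dot(l0.T, l1_delta)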
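
The matrix shapes are the part of backpropagation that is easiest to get wrong, so this sketch traces them on random stand-in data. Everything here is illustrative: the 5 random rows, the names `X_demo`, `y_demo`, `w0`, `w1`, and the `grad_*` variables are assumptions, and only the 11-8-1 layer sizes come from the cells above.

```python
import numpy as np

def Sigmoid(x, deriv = False):
    if(deriv):
        return x * (1-x)
    return 1 / (1+np.exp(-x))

rng = np.random.default_rng(0)
X_demo = rng.random((5, 11))              # 5 stand-in data points, 11 features
y_demo = rng.integers(0, 2, size=(5, 1))  # 5 stand-in labels (0 or 1)
w0 = 2 * rng.random((11, 8)) - 1
w1 = 2 * rng.random((8, 1)) - 1

# Forward pass
l0 = X_demo
l1 = Sigmoid(l0 @ w0)  # (5, 8): one row per data point, one column per hidden neuron
l2 = Sigmoid(l1 @ w1)  # (5, 1): one prediction per data point

# Backward pass
l2_delta = (y_demo - l2) * Sigmoid(l2, deriv = True)     # (5, 1)
l1_delta = (l2_delta @ w1.T) * Sigmoid(l1, deriv = True) # (5, 8)

# Each update has the same shape as the weight matrix it is added to
grad_w1 = l1.T @ l2_delta  # (8, 1)
grad_w0 = l0.T @ l1_delta  # (11, 8)

for name, arr in [("l1", l1), ("l2", l2), ("l2_delta", l2_delta),
                  ("l1_delta", l1_delta), ("grad_w1", grad_w1), ("grad_w0", grad_w0)]:
    print(name, arr.shape)
```

Multiplying by the transposed layer (`l1.T`, `l0.T`) sums the per-data-point contributions, so each update ends up with exactly the shape of its weight matrix regardless of how many rows the dataset has.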

# Full Code

In [25]:
import numpy as np
import pandas as pd

data = pd.read_csv('../data/heart_statlog_cleveland_hungary_final.csv')
X = np.array(data.drop(columns = ['target']))
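X = (X - X.min(0)) / np.ptp(X, axis=0) # Min-max scale each column to [0, 1]; np.ptp(X, axis=0) replaces X.ptp(0), which NumPy 2.0 removed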
y = np.array(pd.DataFrame(data['target']))

np.random.seed(1)

weights0 = 2 * np.random.rand(11, 8) - 1 # 11 Inputs into 8 hidden layer neurons
weights1 = 2 * np.random.rand(8, 8) - 1 # 8 hidden layer neurons into 8 hidden layer neurons
weights2 = 2 * np.random.rand(8, 1) - 1 # 8 hidden layer neurons into 1 output neuron

# If deriv = True, x must already be the output of Sigmoid for the derivative to be correct
def Sigmoid(x, deriv = False):
    if(deriv):
        return x * (1-x)
    return 1 / (1+np.exp(-x))

learning_rate = 0.01

for j in range(60000):
    l0 = X
    l1 = Sigmoid(np.dot(X, weights0)) # Each row represents one data point, each column 1 hidden neuron
    l2 = Sigmoid(np.dot(l1, weights1)) # Each row represents one data point, each column 1 neuron of the second hidden layer
    l3 = Sigmoid(np.dot(l2, weights2)) # Each row represents estimated probability of heart disease
    
    l3_error = y - l3 # Global error
    if (j % 5000) == 0:
        print(np.mean(np.abs(l3_error)))
    l3_delta = l3_error * Sigmoid(l3, deriv = True) # Error scaled inversely with confidence
    
    l2_error = np.dot(l3_delta, weights2.T) # Error contribution of l2, each row is one data point, each column 1 neuron
    l2_delta = l2_error * Sigmoid(l2, deriv = True)

    l1_error = np.dot(l2_delta, weights1.T) # Error contribution of l1, each row is one data point, each column 1 neuron
    l1_delta = l1_error * Sigmoid(l1, deriv = True)

    # change by sum of (Input * Scaled-error)
    weights2 += learning_rate * np.dot(l2.T, l3_delta)
    weights1 += learning_rate * np.dot(l1.T, l2_delta) 
    weights0 += learning_rate * np.dot(l0.T, l1_delta)

0.48611925259154537
0.14876355531697366
0.11195342676674461
0.09416482814483164
0.08381843936216314
0.08104814581689802
0.0758853105643381
0.08149091626322562
0.06182989881075459
0.05776433855307814
0.05457669699531732
0.05410418256510413
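
The numbers printed above are the mean absolute error on the training data. To read the predictions as yes/no answers instead, one option is to threshold them and count matches. This is a sketch only: the helper `accuracy`, the 0.5 threshold, and the toy values are my own assumptions, and in the notebook it would be called as `accuracy(l3, y)` after the loop.

```python
import numpy as np

def accuracy(probs, labels, threshold = 0.5):
    """Fraction of predictions on the correct side of the threshold (hypothetical helper)."""
    preds = (probs >= threshold).astype(int)
    return float(np.mean(preds == labels))

# Toy values for illustration only, not results from the heart disease data
probs = np.array([[0.92], [0.08], [0.61], [0.35]])
labels = np.array([[1], [0], [0], [0]])

print(accuracy(probs, labels))  # 3 of the 4 toy predictions match their labels
```

Since the network is trained and scored on the same rows, either number describes how well it fits the training data rather than how well it would do on new patients.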
