# Implementing Backpropagation

Now we've seen that the error term for the output layer is:

$\delta_{k} = (y_{k} - \hat{y}_{k}) f'(a_{k})$

and the error term for the hidden layer is

$\delta_{j} = \sum (w_{jk} - \delta_{k}) f'(h_{j})$

For now we'll only consider a simple network with one hidden layer and one output unit. Here's the general algorithm for updating the weights with backpropagation:

1. Set the weights steps for each layer to zero
    - The input to hidden weights $\Delta w_{ij} = 0$
    - The hidden to output weights $\Delta W_j = 0$


2. For each record in the training data:
    - Make a foward pass through the network, calculating the output $\hat{y}$
    - Calculate the error gradient in the output unit, $\delta^o = (y - \hat y) f'(z)$ where $z = \sum_j W_j a_j$, the input to the output unit.
    - Propagate the errors to the hidden layer $\delta^h_j = \delta^o W_j f'(h_j)$
    - Update the weight steps:
        - $ΔW_j = ΔW_j +δ^o * a_j$
        - $Δw_{ij} = Δw_{ij} + δ^h_j * a_i$
        
        
3. Update the weights, where $\eta$ is the learning rate and mm is the number of records:

    - $W_j = W_j + \eta \Delta W_j/m$
    - $w_{ij} = w_{ij} + \eta \Delta w_{ij}/m$
    
    
4. Repeat 1-3 For $e$ epochs

In [21]:
import numpy as np
import pandas as pd

In [22]:
# Read Data
admissions = pd.read_csv('intro_to_neural_network_2.csv')

In [23]:
# Make dummy variables for rank, one hot encoding
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)

# Standarize features to zero mean and a standard deviation of 1
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:,field] = (data[field]-mean)/std
    
print(data[:10])

   admit       gre       gpa  rank_1  rank_2  rank_3  rank_4
0      0 -1.798011  0.578348       0       0       1       0
1      1  0.625884  0.736008       0       0       1       0
2      1  1.837832  1.603135       1       0       0       0
3      1  0.452749 -0.525269       0       0       0       1
4      0 -0.586063 -1.208461       0       0       0       1
5      1  1.491561 -1.024525       0       1       0       0
6      1 -0.239793 -1.077078       1       0       0       0
7      0 -1.624876 -0.814312       0       1       0       0
8      1 -0.412928  0.000263       0       0       1       0
9      0  0.972155  1.392922       0       1       0       0


In [24]:
# Split 90% to Training and 10% Testing
np.random.seed(21)
sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
data, test_data = data.loc[sample], data.drop(sample)

# Split into features and targets
features, targets = data.drop('admit', axis=1), data['admit']
features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']

print(features[:5])

print ("\n Targets: \n")

print(targets[:5])

          gre       gpa  rank_1  rank_2  rank_3  rank_4
106  0.972155  0.446965       1       0       0       0
9    0.972155  1.392922       0       1       0       0
61  -0.239793 -0.183673       0       0       0       1
224  1.837832 -1.287291       0       1       0       0
37  -0.586063 -1.287291       0       0       1       0

 Targets: 

106    1
9      0
61     0
224    0
37     0
Name: admit, dtype: int64


In [25]:
# Helper Functions

def sigmoid(x): # x is dot product of weights and inputs
    return 1/(1 + np.exp(-x))

In [26]:
np.random.seed(21)

#Hyper Parameters
n_hidden = 2 # Number of Hidden Units
epochs = 900
learnrate = 0.005

# Dimension of Neural Network
n_records, n_features = features.shape #(Row, Column)
last_loss = None

#Initialize Random Weights
weights_input_hidden = np.random.normal(scale=1 / n_features ** .5,
                                        size=(n_features, n_hidden))
weights_hidden_output = np.random.normal(scale=1 / n_features ** .5,
                                         size=n_hidden)

for e in range(epochs):
    
    #Initialize delta weights as zeros
    del_w_input_hidden = np.zeros(weights_input_hidden.shape)
    del_w_hidden_output = np.zeros(weights_hidden_output.shape)
    
    # For each entry
    for x, y in zip(features.values, targets):
        
        ## Foward Pass
        
        # Calculate Output
        hidden_input = np.dot(x, weights_input_hidden)
        hidden_output = sigmoid(hidden_input)
        output = sigmoid(np.dot(hidden_output, weights_hidden_output))
        
        ## Backward Pass
        
        # Calculate the Error of Prediction
        error = y - output

        # Calculate error term for the output unit
        output_error_term = error * output * (1 - output)

        # Calculate the hidden layer's contribution to the error
        hidden_error = np.dot(output_error_term, weights_hidden_output)
        
        # Calculate the error term for the hidden layer
        hidden_error_term = hidden_error * hidden_output * (1 -hidden_output)
        
        # Update the change in weights
        del_w_hidden_output += output_error_term * hidden_output
        del_w_input_hidden += hidden_error_term * x[:,None] 

    # Update weights  (don't forget to division by n_records or number of samples)
    weights_input_hidden += learnrate * del_w_input_hidden / n_records
    weights_hidden_output += learnrate * del_w_hidden_output / n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        
        # Recalculate for Printing Purposes
        hidden_output = sigmoid(np.dot(x, weights_input_hidden))
        out = sigmoid(np.dot(hidden_output,
                             weights_hidden_output))
        
        #mean square error 
        loss = np.mean((out - targets) ** 2)

        if last_loss and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

# Calculate accuracy on test data
hidden = sigmoid(np.dot(features_test, weights_input_hidden))
out = sigmoid(np.dot(hidden, weights_hidden_output))
predictions = out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))


Train loss:  0.25135725242598617
Train loss:  0.24996540718842886
Train loss:  0.24862005218904654
Train loss:  0.24731993217179746
Train loss:  0.24606380465584848
Train loss:  0.24485044179257162
Train loss:  0.2436786320186832
Train loss:  0.24254718151769536
Train loss:  0.24145491550165465
Train loss:  0.24040067932493367
Prediction accuracy: 0.725
