# Backpropagation

Before we introduced hidden layers, we computed the error only at the output layer. Now that we have a hidden layer, we again use the chain rule, this time to find the error with respect to the weights connecting the input layer to the hidden layer.

Since we know the error at the output, we can work backwards to the hidden layer. It is like flipping the network over.

Let's assume that we have an error

\begin{equation*}
\delta^{0}_k
\end{equation*}

attributed to each output unit k (the superscript marks the output layer).
The error attributed to hidden unit j is the sum of the output errors scaled by the weights between the hidden and output layers, multiplied by the derivative of the activation function at the hidden unit's input h_j:

\begin{equation*}
\delta^{h}_j = \sum_k W_{jk}\delta^{0}_k f'(h_j)
\end{equation*}
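
As a minimal sketch of this formula, assuming sigmoid activations (so that f'(h_j) = a_j(1 - a_j)) and made-up numbers for a layer with three hidden units and two output units. The names `W`, `delta_o`, and `h` are illustrative and do not come from the notebook:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical values: 3 hidden units (index j), 2 output units (index k)
W = np.array([[0.1, -0.3],
              [0.4, 0.2],
              [-0.5, 0.6]])          # W[j, k]: weight from hidden unit j to output unit k
delta_o = np.array([0.05, -0.02])    # error term of each output unit k
h = np.array([0.2, -0.1, 0.7])       # input to each hidden unit j

a = sigmoid(h)
f_prime = a * (1 - a)                # derivative of the sigmoid at h_j

# delta_h[j] = sum_k W[j, k] * delta_o[k] * f'(h_j)
delta_h = np.dot(W, delta_o) * f_prime
print(delta_h)
```

Each hidden unit receives a share of every output error, weighted by how strongly it is connected to that output.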

The gradient descent step is the same as before, but it now uses the hidden-layer error:
\begin{equation*}
\Delta w_{ij} = \eta\delta^h_jx_i
\end{equation*}



Here w_ij is the weight between input i and hidden unit j, x_i is the input value, and η is the learning rate.

More generally, for any layer we can write this as

\begin{equation*}
\Delta w_{ij} = \eta\delta_{output}V_{in}
\end{equation*}



where delta output is the error term of the layer's output units and
V_in is the vector of inputs to the layer.
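
In NumPy, this update for a whole layer is an outer product of the layer's inputs and the error terms of its units. A minimal sketch with made-up numbers (the names `v_in` and `delta_out` are illustrative assumptions):

```python
import numpy as np

eta = 0.5                               # learning rate
v_in = np.array([0.5, 0.1, -0.2])       # inputs to the layer (index i)
delta_out = np.array([0.01, -0.03])     # error terms of the layer's units (index j)

# delta_w[i, j] = eta * delta_out[j] * v_in[i]
delta_w = eta * np.outer(v_in, delta_out)

# Same result through broadcasting: v_in as a column times delta_out as a row
delta_w_broadcast = eta * delta_out * v_in[:, None]
print(delta_w)
```

The resulting array has one row per input and one column per unit, which matches the shape of the weight matrix it updates.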

Let's see this with an example.

In [1]:
import numpy as np

In [2]:
def sigmoid(x):
    return 1/(1+np.exp(-x))


In [3]:
x = np.array([0.5,0.1,-0.2])
target = 0.6
learnrate = 0.5

In [4]:
weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])

weights_hidden_output = np.array([0.1, -0.3])

Now that we have all the essentials, we can move on to the forward pass.

In [5]:
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)

In [6]:
output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)

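Let's pause and take a step back. The forward pass is done, so we now work backwards: first the error at the output, then the error terms for each layer, and finally the weight steps.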

In [7]:
error = target - output


In [8]:
output_error_term = error*output*(1-output)
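
The factor `output*(1-output)` in the error term above is the derivative of the sigmoid, f'(h) = f(h)(1 - f(h)). A quick numerical check of that identity, as a sketch (the helper `sigmoid_prime` is introduced here for illustration and is not part of the notebook):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    # Hypothetical helper: analytic derivative of the sigmoid
    s = sigmoid(x)
    return s * (1 - s)

h = np.linspace(-3, 3, 7)
eps = 1e-6
# Central finite-difference approximation of the derivative
numeric = (sigmoid(h + eps) - sigmoid(h - eps)) / (2 * eps)
print(np.allclose(numeric, sigmoid_prime(h)))
```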

In [9]:
hidden_error_term = np.dot(output_error_term,weights_hidden_output)* \
                    hidden_layer_output * (1 - hidden_layer_output)

In [10]:
delta_w_h_o = learnrate * output_error_term * hidden_layer_output

In [11]:
delta_w_i_h = learnrate * hidden_error_term * x[:, None]


In [12]:
print(delta_w_h_o)
print()
print(delta_w_i_h)

[0.00804047 0.00555918]

[[ 1.77005547e-04 -5.11178506e-04]
 [ 3.54011093e-05 -1.02235701e-04]
 [-7.08022187e-05  2.04471402e-04]]
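
One way to sanity-check these weight steps is to compare them with a finite-difference estimate of the gradient. The sketch below assumes the squared error E = (target - output)^2 / 2, which the notebook does not state explicitly but which is consistent with the error terms used above. Under that assumption each weight step should equal -learnrate * dE/dw. The function `forward_error` is a hypothetical helper.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5
w_i_h = np.array([[0.5, -0.6],
                  [0.1, -0.2],
                  [0.1, 0.7]])
w_h_o = np.array([0.1, -0.3])

def forward_error(w_i_h, w_h_o):
    # Assumed loss: E = 0.5 * (target - output)^2
    hidden = sigmoid(np.dot(x, w_i_h))
    output = sigmoid(np.dot(hidden, w_h_o))
    return 0.5 * (target - output) ** 2

# Analytic weight steps, computed the same way as in the cells above
hidden = sigmoid(np.dot(x, w_i_h))
output = sigmoid(np.dot(hidden, w_h_o))
output_error_term = (target - output) * output * (1 - output)
hidden_error_term = output_error_term * w_h_o * hidden * (1 - hidden)
delta_w_h_o = learnrate * output_error_term * hidden
delta_w_i_h = learnrate * hidden_error_term * x[:, None]

# Finite-difference estimate of -learnrate * dE/dw for every weight
eps = 1e-6
numeric_h_o = np.zeros_like(w_h_o)
for j in range(w_h_o.size):
    step = np.zeros_like(w_h_o)
    step[j] = eps
    grad = (forward_error(w_i_h, w_h_o + step)
            - forward_error(w_i_h, w_h_o - step)) / (2 * eps)
    numeric_h_o[j] = -learnrate * grad

numeric_i_h = np.zeros_like(w_i_h)
for i in range(w_i_h.shape[0]):
    for j in range(w_i_h.shape[1]):
        step = np.zeros_like(w_i_h)
        step[i, j] = eps
        grad = (forward_error(w_i_h + step, w_h_o)
                - forward_error(w_i_h - step, w_h_o)) / (2 * eps)
        numeric_i_h[i, j] = -learnrate * grad

print(np.allclose(delta_w_h_o, numeric_h_o), np.allclose(delta_w_i_h, numeric_i_h))
```

If the two estimates disagree after a change to the backward pass, the usual suspects are a missing derivative factor or a transposed weight matrix.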


## Implementing in Gradient Descent

Now that we have implemented backpropagation for a single step, let us see how to use it in gradient descent.

The steps are as follows:

1. Set the weight steps for each layer to zero
    1. The input to hidden weight steps = 0
    2. The hidden to output weight steps = 0
2. For each record in the training data:
    1. Make a forward pass through the network and calculate the output.
    2. Calculate the error gradient in the output unit.
    3. Propagate the error to the hidden layer.
    4. Update the weight steps:
        * \begin{equation*}
            \Delta W_{j} = \Delta W_{j} + \delta^0a_j 
            \end{equation*}
        * \begin{equation*}
            \Delta w_{ij} = \Delta w_{ij} + \delta^h_ja_i
            \end{equation*}
3. Update the weights with the accumulated steps, scaled by the learning rate and divided by the number of records.
4. Repeat for e epochs
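
The steps above can be sketched as a single function that makes one pass over the data. This is a minimal sketch on made-up data, not the notebook's implementation: the function name `run_epoch`, the helper `mse`, and the toy arrays are assumptions for illustration. The averaged update at the end mirrors the training loop later in this section.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def run_epoch(features, targets, w_i_h, w_h_o, learnrate):
    """One pass over the data; returns updated copies of both weight arrays."""
    # Start with zero weight steps for each layer
    step_i_h = np.zeros(w_i_h.shape)
    step_h_o = np.zeros(w_h_o.shape)

    for x, y in zip(features, targets):
        hidden = sigmoid(np.dot(x, w_i_h))                  # forward pass
        output = sigmoid(np.dot(hidden, w_h_o))
        delta_o = (y - output) * output * (1 - output)      # output error term
        delta_h = delta_o * w_h_o * hidden * (1 - hidden)   # propagated to hidden layer
        step_h_o += delta_o * hidden                        # accumulate weight steps
        step_i_h += delta_h * x[:, None]

    # Apply the accumulated steps, averaged over the records
    n_records = len(features)
    return (w_i_h + learnrate * step_i_h / n_records,
            w_h_o + learnrate * step_h_o / n_records)

# Toy data: 4 records, 3 features (made up for illustration)
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))
targets = np.array([0, 1, 1, 0])
w_i_h = rng.normal(scale=0.5, size=(3, 2))
w_h_o = rng.normal(scale=0.5, size=2)

def mse(w_i_h, w_h_o):
    out = sigmoid(np.dot(sigmoid(np.dot(features, w_i_h)), w_h_o))
    return np.mean((out - targets) ** 2)

loss_before = mse(w_i_h, w_h_o)
# Repeat for a number of epochs
for e in range(200):
    w_i_h, w_h_o = run_epoch(features, targets, w_i_h, w_h_o, learnrate=0.5)
loss_after = mse(w_i_h, w_h_o)
print(loss_before, loss_after)
```

Returning new arrays instead of updating in place keeps the function free of side effects, which makes it easier to test.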


## Data Cleanup

In [13]:
import numpy as np
import pandas as pd

admissions = pd.read_csv('binary.csv')

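# Make dummy variables for rank
# dtype=float keeps the dummy columns numeric (recent pandas defaults to bool)
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank', dtype=float)], axis=1)
data = data.drop('rank', axis=1)

# Standardize features to zero mean and unit standard deviation
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    # Replace the column outright so an integer column can become float
    data[field] = (data[field]-mean)/std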
    
# Split off random 10% of the data for testing
np.random.seed(21)
sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
# .ix was removed in pandas 1.0; .loc selects the sampled rows by index label
data, test_data = data.loc[sample], data.drop(sample)

# Split into features and targets
features, targets = data.drop('admit', axis=1), data['admit']
features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']
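
To try the cleanup steps without the CSV file, here is a sketch that runs them on a small synthetic frame. The column names follow the cell above, but every value is made up, and the split uses label-based `.loc` indexing, which works in current pandas releases.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for binary.csv (all values are made up)
rng = np.random.default_rng(21)
n = 50
admissions = pd.DataFrame({
    'admit': rng.integers(0, 2, size=n),
    'gre': rng.integers(300, 800, size=n),
    'gpa': rng.uniform(2.0, 4.0, size=n).round(2),
    'rank': rng.integers(1, 5, size=n),
})

# One-hot encode rank; dtype=float keeps every feature column numeric
dummies = pd.get_dummies(admissions['rank'], prefix='rank', dtype=float)
data = pd.concat([admissions.drop('rank', axis=1), dummies], axis=1)

# Standardize gre and gpa to zero mean and unit standard deviation
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data[field] = (data[field] - mean) / std

# Hold out a random 10% of the rows, selecting by index label with .loc
sample = rng.choice(data.index.to_numpy(), size=int(len(data) * 0.9), replace=False)
train_data, test_data = data.loc[sample], data.drop(sample)

features, targets = train_data.drop('admit', axis=1), train_data['admit']
print(features.shape, test_data.shape)
```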

In [None]:
def sigmoid(x):
    return 1/(1 + np.exp(-x))

In [None]:
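# Optional workaround: .ix was removed in pandas 1.0, so pinning an older
# release is only needed if the code still uses .ix instead of .loc
print(pd.__version__)
! pip install pandas==0.25.1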

In [None]:
# Hyperparameters
n_hidden = 2  # number of hidden units
epochs = 900
learnrate = 0.005

n_records, n_features = features.shape
last_loss = None

In [None]:
# Initialize weights
weights_input_hidden = np.random.normal(scale=1 / n_features ** .5,
                                        size=(n_features, n_hidden))
weights_hidden_output = np.random.normal(scale=1 / n_features ** .5,
                                         size=n_hidden)

In [None]:
for e in range(epochs):
    del_w_input_hidden = np.zeros(weights_input_hidden.shape)
    del_w_hidden_output = np.zeros(weights_hidden_output.shape)
    for x, y in zip(features.values, targets):
        ## Forward pass ##
        # Calculate the output
        hidden_input = np.dot(x,weights_input_hidden)
        hidden_output = sigmoid(hidden_input)
        output = sigmoid(np.dot(hidden_output,weights_hidden_output))

        ## Backward pass ##
        # Calculate the network's prediction error
        error =  y - output

        # Error term of the output unit, then propagate it to the hidden layer
        output_error_term = error * output * (1-output)
        hidden_error = np.dot(output_error_term,weights_hidden_output)
        hidden_error_term = hidden_error * hidden_output * (1-hidden_output)
        
        # Accumulate the change in weights
        del_w_hidden_output += output_error_term * hidden_output
        del_w_input_hidden += hidden_error_term * x[:,None]

    # Update weights with the average step over all records
    weights_input_hidden += learnrate * del_w_input_hidden / n_records
    weights_hidden_output += learnrate * del_w_hidden_output / n_records

    # Print the mean squared error on the whole training set
    if e % (epochs / 10) == 0:
        # Use all training records here, not just the last x from the loop
        hidden_output = sigmoid(np.dot(features.values, weights_input_hidden))
        out = sigmoid(np.dot(hidden_output,
                             weights_hidden_output))
        loss = np.mean((out - targets) ** 2)

        if last_loss and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

# Calculate accuracy on test data
hidden = sigmoid(np.dot(features_test, weights_input_hidden))
out = sigmoid(np.dot(hidden, weights_hidden_output))
predictions = out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))
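
The per-record loop in the training cell can also be expressed with matrix operations. The following is a sketch on made-up data, not the notebook's implementation. It checks that the vectorized weight steps agree with the ones accumulated record by record. The array names `X`, `H`, `d_o`, and `d_h` are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))                 # 6 records, 3 features (made up)
y = np.array([0, 1, 0, 1, 1, 0])
w_i_h = rng.normal(scale=0.5, size=(3, 2))
w_h_o = rng.normal(scale=0.5, size=2)

# Record-by-record accumulation, as in the training loop
loop_i_h = np.zeros(w_i_h.shape)
loop_h_o = np.zeros(w_h_o.shape)
for x, t in zip(X, y):
    hidden = sigmoid(np.dot(x, w_i_h))
    output = sigmoid(np.dot(hidden, w_h_o))
    delta_o = (t - output) * output * (1 - output)
    delta_h = delta_o * w_h_o * hidden * (1 - hidden)
    loop_h_o += delta_o * hidden
    loop_i_h += delta_h * x[:, None]

# Vectorized version: all records at once
H = sigmoid(X @ w_i_h)                      # hidden outputs, shape (6, 2)
out = sigmoid(H @ w_h_o)                    # network outputs, shape (6,)
d_o = (y - out) * out * (1 - out)           # output error terms, shape (6,)
d_h = d_o[:, None] * w_h_o * H * (1 - H)    # hidden error terms, shape (6, 2)
vec_h_o = H.T @ d_o                         # summed over records, shape (2,)
vec_i_h = X.T @ d_h                         # summed over records, shape (3, 2)

print(np.allclose(loop_h_o, vec_h_o), np.allclose(loop_i_h, vec_i_h))
```

The matrix products sum over the record axis, which is what the `+=` accumulation does in the loop.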
