# Backpropagation

This notebook was created as part of the Udacity Deep Learning Nanodegree. See more at [Udacity](https://br.udacity.com/course/deep-learning-nanodegree-foundation--nd101).

In [66]:
import numpy as np

## Auxiliary methods

### Sigmoid $f(x)$
The `sigmoid` method calculates the sigmoid activation. Sigmoid is defined by the following math: $h = 1 / (1 + e^{-x})$

### Sigmoid derivative $f'(h)$
`sigmoid_prime` is the derivative of the sigmoid, used to calculate the error term of each neural network layer. It takes the sigmoid output $h$ as its argument and is defined by the following math: $f'(h) = h (1 - h)$

In [67]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(h):
    return h * (1 - h)
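
As a quick sanity check, the sketch below compares `sigmoid_prime` with a central finite-difference estimate of the sigmoid's slope. The helper `numerical_derivative` and the step size `eps` are illustrative assumptions and not part of the original notebook. Note that `sigmoid_prime` expects the sigmoid output $h$, not the raw input $x$.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(h):
    # h is the sigmoid output, i.e. h = sigmoid(x)
    return h * (1 - h)

def numerical_derivative(f, x, eps=1e-6):
    # hypothetical helper: central finite-difference estimate of f'(x)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

xs = np.array([-2.0, 0.0, 0.5, 3.0])
analytic = sigmoid_prime(sigmoid(xs))
numeric = numerical_derivative(sigmoid, xs)

# largest disagreement between the two estimates; should be tiny
max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)
```

Calling `sigmoid_prime(x)` on the raw input instead of `sigmoid_prime(sigmoid(x))` would give a different, incorrect slope.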


## Data
For this example we use a simple array of values as input and a single scalar value as target. The goal is just to show one way to calculate backpropagation.

In [68]:
# Data
x = np.array([0.5, 0.1, -0.2])

# target
target = 0.6

## Network hyperparameters
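Hyperparameters are settings chosen before training to tune the network, such as the learning rate. The initial weights of both layers are defined in the same cell.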

In [69]:
learnrate = 0.5

weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])

weights_hidden_output = np.array([0.1, -0.3])

## Forward pass

This network has 3 layers: an input layer, a hidden layer and an output layer.

### Input layer
The input layer is composed of 3 features. The features do not represent anything in particular; they are just for study purposes.

### Hidden layer
The hidden layer is defined by 2 sigmoid neurons (`weights_input_hidden` has shape 3x2).

### Output layer
The output layer is defined by 1 sigmoid neuron (`weights_hidden_output` holds one weight per hidden neuron).

In [70]:
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)

output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)
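
To make the shapes concrete, the sketch below repeats the forward pass with the values defined above and inspects each intermediate result. With 3 inputs and a 3x2 weight matrix, the hidden layer yields 2 activations, which the 2 output weights reduce to a single scalar prediction.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.5, 0.1, -0.2])
weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])
weights_hidden_output = np.array([0.1, -0.3])

hidden_layer_input = np.dot(x, weights_input_hidden)   # one weighted sum per hidden neuron
hidden_layer_output = sigmoid(hidden_layer_input)      # shape (2,)
output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)                      # single scalar prediction

print(hidden_layer_input)         # weighted sums, approximately 0.24 and -0.46
print(hidden_layer_output.shape)
print(np.ndim(output))
```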

## Backwards pass

### Error math

* Output error term: $\delta^o = (y - \hat{y}) f'(h)$

* Hidden error term: $\delta^h = \sum [W\delta^o]f'(h)$

* Delta weights from hidden to output: $\Delta W = \eta \delta^o a$

* Delta weights from inputs to hidden: $\Delta w = \eta \delta^h x_i$

### Formula legend

* y: Target value

* $\hat{y}$: Neural network output (prediction)

* $f'(h)$: Derivative of the activation function, with h as the function parameter

* W: Upper case W, in this case, is the weight from the hidden to the output layer

* $\delta^o$: Error term for output layer

* $\delta^h$: Error term for hidden layer

* $\eta$: learning rate

* a: Hidden layer output

* $x_i$: Neural network input
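
The formulas above can be motivated with a short derivation. This is a sketch that assumes the network minimizes the squared error of a single prediction; the notebook itself only states the resulting formulas, so the loss $E$ and the indices $i$, $j$ are introduced here for illustration.

$$E = \frac{1}{2}(y - \hat{y})^2, \qquad \hat{y} = f\Big(\sum_j W_j a_j\Big)$$

Applying the chain rule to a hidden-to-output weight gives

$$\frac{\partial E}{\partial W_j} = -(y - \hat{y})\, f'(h)\, a_j = -\delta^o a_j$$

and to an input-to-hidden weight (with $a_j = f(\sum_i w_{ij} x_i)$)

$$\frac{\partial E}{\partial w_{ij}} = -\delta^o\, W_j\, f'(h_j)\, x_i = -\delta^h_j x_i$$

A gradient descent step moves each weight against its gradient, scaled by the learning rate, which recovers the update rules:

$$\Delta W_j = -\eta \frac{\partial E}{\partial W_j} = \eta\, \delta^o a_j, \qquad \Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}} = \eta\, \delta^h_j x_i$$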

In [71]:
# Calculate output error
error = target - output

# Calculate error term for output layer
output_error_term = error * sigmoid_prime(output)

# Calculate error term for hidden layer
hidden_error_term = weights_hidden_output * output_error_term * sigmoid_prime(hidden_layer_output)

# Calculate change in weights for hidden layer to output layer
delta_w_h_o = learnrate * output_error_term * hidden_layer_output

# Calculate change in weights for input layer to hidden layer
delta_w_i_h = learnrate * hidden_error_term * x[:, None]
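
One way to gain confidence in these weight changes is a numerical gradient check. The sketch below assumes the loss is half the squared error of the single prediction (the notebook does not state the loss explicitly) and uses a hypothetical helper, `numerical_gradient`, to estimate $-\eta \, \partial E / \partial W$ by central differences. If backpropagation is implemented correctly, both estimates should agree with `delta_w_h_o` and `delta_w_i_h`.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(h):
    return h * (1 - h)

x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5
weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])
weights_hidden_output = np.array([0.1, -0.3])

def loss(w_i_h, w_h_o):
    # assumed loss: half the squared error of the single prediction
    hidden = sigmoid(np.dot(x, w_i_h))
    return 0.5 * (target - sigmoid(np.dot(hidden, w_h_o))) ** 2

def numerical_gradient(f, w, eps=1e-6):
    # hypothetical helper: central differences, one weight at a time
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[idx] += eps
        w_minus[idx] -= eps
        grad[idx] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return grad

# Backpropagation, following the same steps as the cell above
hidden_layer_output = sigmoid(np.dot(x, weights_input_hidden))
output = sigmoid(np.dot(hidden_layer_output, weights_hidden_output))
output_error_term = (target - output) * sigmoid_prime(output)
hidden_error_term = (weights_hidden_output * output_error_term
                     * sigmoid_prime(hidden_layer_output))
delta_w_h_o = learnrate * output_error_term * hidden_layer_output
delta_w_i_h = learnrate * hidden_error_term * x[:, None]

# Gradient descent step estimated numerically: -learnrate * dE/dW
num_w_h_o = -learnrate * numerical_gradient(
    lambda w: loss(weights_input_hidden, w), weights_hidden_output)
num_w_i_h = -learnrate * numerical_gradient(
    lambda w: loss(w, weights_hidden_output), weights_input_hidden)

print(np.allclose(delta_w_h_o, num_w_h_o, atol=1e-8))
print(np.allclose(delta_w_i_h, num_w_i_h, atol=1e-8))
```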

In [72]:
print('Change in weights for hidden layer to output layer:')
print(delta_w_h_o)

Change in weights for hidden layer to output layer:
[ 0.00804047  0.00555918]


In [73]:
print('Change in weights for input layer to hidden layer:')
print(delta_w_i_h)

Change in weights for input layer to hidden layer:
[[  1.77005547e-04  -5.11178506e-04]
 [  3.54011093e-05  -1.02235701e-04]
 [ -7.08022187e-05   2.04471402e-04]]


# Backpropagation exercise 

In [99]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib notebook

## Data

We are using a dataset of school admissions. This dataset has three feature columns:

* GRE score
* GPA
* rank of the undergraduate school (numbered 1 through 4)

In [100]:
data = pd.read_csv('./data.csv')
data.head(3)

|   | admit | gre | gpa | rank |
|---|---|---|---|---|
| 0 | 0 | 380 | 3.61 | 3 |
| 1 | 1 | 660 | 3.67 | 3 |
| 2 | 1 | 800 | 4.0 | 1 |


### Standardize data

In [101]:
# transform categorical data into dummy values
data = pd.get_dummies(data, columns=[ 'rank' ])
data.head(3)

|   | admit | gre | gpa | rank_1 | rank_2 | rank_3 | rank_4 |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 380 | 3.61 | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | 1 | 660 | 3.67 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | 1 | 800 | 4.0 | 1.0 | 0.0 | 0.0 | 0.0 |
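
The effect of `pd.get_dummies` is easier to see on a tiny frame. The sketch below uses made-up values rather than the admissions data. Passing `dtype=float` is an assumption made here to reproduce the 0.0 / 1.0 representation shown above, since the default dtype of the dummy columns differs between pandas versions.

```python
import pandas as pd

# toy frame with hypothetical values, not the admissions data
toy = pd.DataFrame({'gpa': [3.6, 3.7, 4.0], 'rank': [3, 3, 1]})

# each distinct rank value becomes its own 0.0 / 1.0 column
encoded = pd.get_dummies(toy, columns=['rank'], dtype=float)

print(encoded.columns.tolist())
print(encoded)
```

Only rank values that actually occur in the data get a column, so this toy frame produces `rank_1` and `rank_3` but no `rank_2` or `rank_4`.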


In [102]:
# standardize value columns to zero mean and unit standard deviation
columns = ['gre', 'gpa']
for col in columns:
    mean, std = data[col].mean(), data[col].std()
    data[col] = (data[col] - mean) / std

data.head(3)

|   | admit | gre | gpa | rank_1 | rank_2 | rank_3 | rank_4 |
|---|---|---|---|---|---|---|---|
| 0 | 0 | -1.798011 | 0.578348 | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | 1 | 0.625884 | 0.736008 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | 1 | 1.837832 | 1.603135 | 1.0 | 0.0 | 0.0 | 0.0 |
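
After this transformation each standardized column should have a mean of roughly 0 and a standard deviation of roughly 1. The sketch below checks this on a toy column with made-up scores. It also illustrates that pandas' `std()` uses the sample standard deviation (`ddof=1`) by default, which differs from NumPy's default.

```python
import numpy as np
import pandas as pd

# toy column with hypothetical GRE-like scores
raw = [380.0, 660.0, 800.0, 640.0]
toy = pd.DataFrame({'gre': raw})

mean, std = toy['gre'].mean(), toy['gre'].std()
same_as_ddof1 = np.isclose(std, np.std(raw, ddof=1))  # pandas default is ddof=1

toy['gre'] = (toy['gre'] - mean) / std

print(same_as_ddof1)
print(toy['gre'].mean())  # approximately 0
print(toy['gre'].std())   # approximately 1
```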


### Splitting dataset into train data and test data

In [103]:
np.random.seed(21)

feature = np.random.choice(data.index, int(len(data)*0.9), replace=False)
train, test = data.loc[feature], data.drop(feature)

In [104]:
train.index

Int64Index([106,   9,  61, 224,  37, 242, 313,  52, 347, 239,
            ...
            118, 235, 325, 371, 137,  28, 346, 271, 338, 240],
           dtype='int64', length=360)

In [105]:
test.index

Int64Index([ 48,  50,  80,  84,  98, 110, 120, 122, 133, 148, 169, 184, 188,
            202, 204, 207, 229, 233, 236, 238, 241, 246, 248, 253, 260, 261,
            268, 274, 291, 304, 309, 312, 315, 317, 328, 356, 368, 375, 386,
            396],
           dtype='int64')
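
The split relies on sampling index labels without replacement and dropping them from the full frame. The sketch below repeats the idea on a small toy frame (made-up values, with the hypothetical name `train_idx`) and confirms that the two parts are disjoint and together cover every row. It uses label-based `.loc` indexing.

```python
import numpy as np
import pandas as pd

# toy frame standing in for the admissions data (hypothetical values)
toy = pd.DataFrame({'admit': [0, 1] * 10, 'gre': np.arange(20.0)})

np.random.seed(21)
# sample 90% of the index labels without replacement
train_idx = np.random.choice(toy.index, int(len(toy) * 0.9), replace=False)
train, test = toy.loc[train_idx], toy.drop(train_idx)

print(len(train), len(test))                  # 18 2
print(set(train.index) & set(test.index))     # set()
```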

### Splitting data into features and targets

In [106]:
train_feature, train_target = train.drop('admit', axis='columns'), train['admit']
test_feature, test_target = test.drop('admit', axis='columns'), test['admit']

In [107]:
train_feature.head(3)

|   | gre | gpa | rank_1 | rank_2 | rank_3 | rank_4 |
|---|---|---|---|---|---|---|
| 106 | 0.972155 | 0.446965 | 1.0 | 0.0 | 0.0 | 0.0 |
| 9 | 0.972155 | 1.392922 | 0.0 | 1.0 | 0.0 | 0.0 |
| 61 | -0.239793 | -0.183673 | 0.0 | 0.0 | 0.0 | 1.0 |


In [108]:
train_target.head(3)

106    1
9      0
61     0
Name: admit, dtype: int64

In [109]:
test_feature.head(3)

|   | gre | gpa | rank_1 | rank_2 | rank_3 | rank_4 |
|---|---|---|---|---|---|---|
| 48 | -1.278605 | -2.390908 | 0.0 | 0.0 | 0.0 | 1.0 |
| 50 | 0.452749 | 1.235263 | 0.0 | 0.0 | 1.0 | 0.0 |
| 80 | 0.972155 | -1.287291 | 0.0 | 0.0 | 0.0 | 1.0 |


In [110]:
test_target.head(3)

48    0
50    0
80    0
Name: admit, dtype: int64

### Neural network implementation

#### Sigmoid Math:

* Sigmoid: $h = 1 / (1 + e^{-x})$
* Sigmoid derivative: $f'(h) = h (1 - h)$

In [111]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(h):
    return h * (1 - h)

#### Forward math:
* Forward pass: $\hat{y} = f(\sum xw)$

In [112]:
# forward pass
def forward_pass(inputs, weights):
    product_of_input_weight = np.dot(inputs, weights)
    return sigmoid(product_of_input_weight)
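
Because `np.dot` also accepts a 2-D array of records, the same `forward_pass` can score a whole batch in one call. The sketch below uses random stand-in inputs and weights (hypothetical shapes: 4 records, 6 features, 2 hidden units) and compares the batched result with applying the function one record at a time.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_pass(inputs, weights):
    return sigmoid(np.dot(inputs, weights))

# hypothetical stand-in data: 4 records, 6 features, 2 hidden units
rng = np.random.RandomState(0)
batch = rng.normal(size=(4, 6))
w_input_hidden = rng.normal(size=(6, 2))
w_hidden_output = rng.normal(size=2)

# one record at a time
single = np.array([forward_pass(forward_pass(x, w_input_hidden), w_hidden_output)
                   for x in batch])

# all records in one call
batched = forward_pass(forward_pass(batch, w_input_hidden), w_hidden_output)

print(batched.shape)                  # (4,)
print(np.allclose(single, batched))
```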

#### Error math:

* output error: $\delta ^o = (y - \hat{y})f'(h)$
* hidden error: $\delta ^h = \sum[W \delta^o] f'(h)$

In [113]:
# Errors
def output_error(target, prediction):
    return (target - prediction) * sigmoid_prime(prediction)

def hidden_error(output_error, hidden_output_weights, hidden_output):
    return np.dot(output_error, hidden_output_weights) * sigmoid_prime(hidden_output)

In [119]:
# Hyperparameters
n_hidden = 2  # number of hidden units
epochs = 900
learnrate = 0.005

n_records, n_features = train_feature.shape
last_loss = None

In [115]:
np.random.seed(21)

# Initialize weights
weights_input_hidden = np.random.normal(scale=1 / n_features ** .5,
                                        size=(n_features, n_hidden))
weights_hidden_output = np.random.normal(scale=1 / n_features ** .5,
                                         size=n_hidden)

In [120]:
for e in range(epochs):
    # we have to initialize with zeros
    del_w_input_hidden = np.zeros(weights_input_hidden.shape)
    del_w_hidden_output = np.zeros(weights_hidden_output.shape)
    
    for x, y in zip(train_feature.values, train_target):
        # forward pass
        hidden_output = forward_pass(x, weights_input_hidden)
        output = forward_pass(hidden_output, weights_hidden_output)
        
        # error
        output_error_term = output_error(y, output)
        hidden_error_term = hidden_error(output_error_term, weights_hidden_output, hidden_output)
        
        # Update the change in weights
        del_w_hidden_output += output_error_term * hidden_output
        del_w_input_hidden += hidden_error_term * x[:,None]
    
    # Update weights
    weights_input_hidden += learnrate * del_w_input_hidden / n_records
    weights_hidden_output += learnrate * del_w_hidden_output / n_records
    
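    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        # Caution: x is still the last record from the loop above, so `out` is
        # a single prediction that gets compared with every training target
        hidden_output = sigmoid(np.dot(x, weights_input_hidden))
        out = sigmoid(np.dot(hidden_output,
                             weights_hidden_output))
        loss = np.mean((out - train_target) ** 2)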

        if last_loss and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss
        

('Train loss: ', 0.25132580840265539)
('Train loss: ', 0.24993500999954013)
('Train loss: ', 0.24859067381566899)
('Train loss: ', 0.24729154485719615)
('Train loss: ', 0.2460363809490447)
('Train loss: ', 0.24482395458725709)
('Train loss: ', 0.24365305458773509)
('Train loss: ', 0.24252248754356173)
('Train loss: ', 0.24143107910289252)
('Train loss: ', 0.24037767507913652)
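
The per-record loop can also be written with matrix operations. The sketch below is an illustration on random stand-in data (hypothetical shapes: 5 records, 6 features, 2 hidden units) and is not the notebook's training code. It accumulates the weight changes for one epoch both ways, checks that they match, and computes the mean squared error over every record in the set.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(h):
    return h * (1 - h)

# hypothetical stand-in data: 5 records, 6 features, 2 hidden units
rng = np.random.RandomState(21)
features = rng.normal(size=(5, 6))
targets = rng.randint(0, 2, size=5)
w_i_h = rng.normal(scale=1 / 6 ** .5, size=(6, 2))
w_h_o = rng.normal(scale=1 / 6 ** .5, size=2)

# Per-record accumulation, mirroring the training loop
del_w_i_h = np.zeros(w_i_h.shape)
del_w_h_o = np.zeros(w_h_o.shape)
for x, y in zip(features, targets):
    hidden = sigmoid(np.dot(x, w_i_h))
    out = sigmoid(np.dot(hidden, w_h_o))
    out_err = (y - out) * sigmoid_prime(out)
    hid_err = out_err * w_h_o * sigmoid_prime(hidden)
    del_w_h_o += out_err * hidden
    del_w_i_h += hid_err * x[:, None]

# Same accumulation with matrix operations over all records
hidden_all = sigmoid(np.dot(features, w_i_h))                  # shape (5, 2)
out_all = sigmoid(np.dot(hidden_all, w_h_o))                   # shape (5,)
out_err_all = (targets - out_all) * sigmoid_prime(out_all)     # shape (5,)
hid_err_all = out_err_all[:, None] * w_h_o * sigmoid_prime(hidden_all)
vec_w_h_o = np.dot(out_err_all, hidden_all)                    # shape (2,)
vec_w_i_h = np.dot(features.T, hid_err_all)                    # shape (6, 2)

# Mean squared error over every record in the set
loss = np.mean((out_all - targets) ** 2)

print(np.allclose(del_w_h_o, vec_w_h_o), np.allclose(del_w_i_h, vec_w_i_h))
print(loss)
```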


In [121]:
# Calculate accuracy on test data
hidden = sigmoid(np.dot(test_feature, weights_input_hidden))
out = sigmoid(np.dot(hidden, weights_hidden_output))
predictions = out > 0.5
accuracy = np.mean(predictions == test_target)
print("Prediction accuracy: {:.3f}".format(accuracy))

Prediction accuracy: 0.725
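
The accuracy computation thresholds each network output at 0.5 and counts how often the resulting boolean matches the 0/1 label. A minimal sketch with made-up outputs and labels:

```python
import numpy as np

# hypothetical network outputs and true labels
out = np.array([0.2, 0.7, 0.55, 0.4])
labels = np.array([0, 1, 0, 0])

predictions = out > 0.5                      # boolean array: admit or not
accuracy = np.mean(predictions == labels)    # fraction of matching entries
print("Prediction accuracy: {:.3f}".format(accuracy))  # Prediction accuracy: 0.750
```

Here the third record is predicted as admitted (0.55 > 0.5) although its label is 0, so 3 of 4 predictions are correct.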
