# Gradient Descent

In [2]:
import numpy as np

# data found at http://www.ats.ucla.edu/stat/data/binary.csv
points = np.genfromtxt('binary.csv', delimiter=',', names=True)

points[-5:-1]

array([( 0.,  620.,  4.  ,  2.), ( 0.,  560.,  3.04,  3.),
       ( 0.,  460.,  2.63,  2.), ( 0.,  700.,  3.65,  2.)], 
      dtype=[('admit', '<f8'), ('gre', '<f8'), ('gpa', '<f8'), ('rank', '<f8')])

## Data preparation

The `rank` column is categorical, so it needs to be encoded with dummy variables. A dummy variable (also known as an indicator variable or Boolean indicator) takes the value 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome. Here, each distinct rank value gets its own 0/1 column.
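To make the encoding concrete, here is a minimal sketch on a made-up four-row frame. The `toy` frame and its values are hypothetical and only stand in for the real `binary.csv` data; `dtype=int` is passed so the columns print as 0/1.

```python
import pandas as pd

# Hypothetical toy data; the real ranks come from binary.csv
toy = pd.DataFrame({"rank": [1, 3, 2, 3]})

# One 0/1 indicator column per distinct rank value
dummies = pd.get_dummies(toy["rank"], prefix="rank", dtype=int)
print(dummies)

# Every row has exactly one indicator set
print(dummies.sum(axis=1).tolist())
```

Each row ends up with a single 1 in the column matching its rank, so the model can learn a separate weight per rank instead of treating rank as a number.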

In [3]:
import pandas as pd

admissions = pd.read_csv('binary.csv')

# Make dummy variables for rank (dtype=float keeps every column numeric)
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank', dtype=float)], axis=1)
data = data.drop('rank', axis=1)

We'll also need to standardize the GRE and GPA data, which means scaling the values so that they have zero mean and a standard deviation of 1. This is necessary because the sigmoid function saturates for inputs of large magnitude, whether very negative or very positive. The gradient there is close to zero, which means the gradient descent step shrinks toward zero too. Since the raw values are fairly large (GRE scores are in the hundreds), we would otherwise have to be very careful about how we initialize the weights, or the gradient descent steps will die off and the network won't train.
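The saturation effect can be checked numerically. The sketch below evaluates the sigmoid derivative, `sigmoid(h) * (1 - sigmoid(h))`, at a few illustrative inputs; the helper name `sigmoid_prime` and the chosen values are assumptions made for this example, with 620 standing in for a raw GRE-scale input.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    # Derivative of the sigmoid, written in terms of its output
    s = sigmoid(x)
    return s * (1 - s)

# Illustrative inputs: standardized-scale values versus a raw GRE-scale value
for h in [0.0, 2.0, 10.0, 620.0]:
    print(h, sigmoid_prime(h))
```

The derivative peaks at 0.25 for an input of 0 and falls off quickly; at an input of 620 it underflows to exactly zero in double precision, so no weight update would get through.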

In [4]:
# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:,field] = (data[field]-mean)/std
    
# Split off random 10% of the data for testing
np.random.seed(42)
sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
# .loc replaces the .ix indexer, which was removed in pandas 1.0
data, test_data = data.loc[sample], data.drop(sample)

# Split into features and targets (the test targets are used for the accuracy check)
features, targets = data.drop('admit', axis=1), data['admit']
features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']

## Training loop

In [5]:
def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# Use the same seed to make debugging easier
np.random.seed(42)

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)
print(weights)

# Neural Network hyperparameters
epochs = 1000
learnrate = 0.5

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # Loop through all records, x is the input, y is the target

        # Calculate the output
        output = sigmoid(np.dot(weights, x))

        # Calculate the error
        error = y - output

        # Calculate the change in weights: error * sigmoid'(h) * x
        error_gradient = error * output * (1 - output)
        del_w += error_gradient * x

    # Update weights with the average step over all records
    weights += learnrate * del_w / n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        if last_loss and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss


# Calculate accuracy on test data
test_out = sigmoid(np.dot(features_test, weights))
predictions = test_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))

[ 0.2027827  -0.05644616  0.26441774  0.62177434 -0.09559271 -0.09558601]
Train loss:  0.262213022107
Train loss:  0.21175289463
Train loss:  0.204042375284
Train loss:  0.201976685332
Train loss:  0.201201048563
Train loss:  0.200846619991
Train loss:  0.200664256824
Train loss:  0.200562892968
Train loss:  0.2005034469
Train loss:  0.200467204385
Prediction accuracy: 0.750
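The per-record loop in the training cell can also be written as a single matrix operation. The sketch below is not part of the original notebook: it uses small synthetic arrays (`X`, `y`, `w` are hypothetical stand-ins for `features.values`, `targets`, and `weights`) to check that both forms accumulate the same weight change.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Synthetic stand-ins for features.values, targets and weights (shapes only)
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.integers(0, 2, size=8).astype(float)
w = rng.normal(scale=1 / 3**0.5, size=3)

# Per-record accumulation, mirroring the training loop
del_w_loop = np.zeros_like(w)
for x_i, y_i in zip(X, y):
    out_i = sigmoid(np.dot(w, x_i))
    del_w_loop += (y_i - out_i) * out_i * (1 - out_i) * x_i

# Same quantity computed for all records at once
out = sigmoid(X @ w)                       # shape (n_records,)
error_term = (y - out) * out * (1 - out)   # shape (n_records,)
del_w_vec = error_term @ X                 # shape (n_features,)

print(np.allclose(del_w_loop, del_w_vec))
```

The vectorized form avoids the Python-level loop over records, which usually matters once the dataset grows beyond a few hundred rows.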


# Multilayer Network

Implement a forward pass through a 4x3x2 (4 units/nodes in the input layer, 3 units in the hidden layer, 2 units in the output layer) network, with sigmoid activation functions for both layers.

Things to do:

* Calculate the input to the hidden layer.
* Calculate the hidden layer output.
* Calculate the input to the output layer.
* Calculate the output of the network.

In [6]:
import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

# Network size
N_input = 4
N_hidden = 3
N_output = 2

np.random.seed(42)
# Make some fake data (inputs)
X = np.random.randn(4)

weights_in_hidden = np.random.normal(0, scale=0.1, size=(N_input, N_hidden))
weights_hidden_out = np.random.normal(0, scale=0.1, size=(N_hidden, N_output))


# Forward pass through the network

hidden_layer_in = X     # the input layer passes the inputs to the hidden layer
hidden_layer_out = sigmoid(np.dot(hidden_layer_in, weights_in_hidden))   # calculate the output for the hidden layer

print('Hidden-layer Output:')
print(hidden_layer_out)

output_layer_in = hidden_layer_out   # the inputs for the output layers are the output from the hidden layer
output_layer_out = sigmoid(np.dot(output_layer_in, weights_hidden_out))  # calculate the output for the output layer

print('Output-layer Output:')
print(output_layer_out)

Hidden-layer Output:
[ 0.41492192  0.42604313  0.5002434 ]
Output-layer Output:
[ 0.49815196  0.48539772]
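The same two matrix products extend to several input records at once. This is a minimal sketch under assumed names: `X_batch`, `W1`, and `W2` are hypothetical and use a fresh random generator, so the numbers will not match the output above; only the shapes are the point.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

N_input, N_hidden, N_output = 4, 3, 2
rng = np.random.default_rng(42)

# Hypothetical batch of 5 records, one row per record
X_batch = rng.normal(size=(5, N_input))
W1 = rng.normal(0, 0.1, size=(N_input, N_hidden))
W2 = rng.normal(0, 0.1, size=(N_hidden, N_output))

hidden = sigmoid(X_batch @ W1)   # shape (5, N_hidden)
output = sigmoid(hidden @ W2)    # shape (5, N_output)

print(hidden.shape, output.shape)
```

Keeping records along the first axis means the weight matrices keep the `(inputs, outputs)` layout used in the cell above, so no transposes are needed.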


            INPUT LAYER

        +-------------------+
        |                   |
        | X[1]              |
        |                   |                  HIDDEN LAYER
        |                   |
     i  +-------------------+
                        |                      +----------------+
                        +---------------------->                |
                                               | sigmoid(X * w) |
                        +---------------------->                |          OUTPUT LAYER
     n                  |                      |                |
        +-------------------+                  +----------------+
        |                   |                        |                     +-----------------+
        | X[2]              |                        +--------------------->                 |
        |                   |                                              | sigmoid(X * w)  |           o
        |                   |                        +--------------------->                 +------>
     p  +-------------------+                        |                     |                 |
                        |                      +----------------+          +-----------------+           u
                        +---------------------->                |
                                               |                |
                        +---------------------->                |                                        t
                        |                      |                |
     u  +-------------------+                  +----------------+
        |                   |                        |                     +-----------------+           p
        | X[3]              |                        +--------------------->                 |
        |                   |                                              |                 |
        |                   |                        +--------------------->                 +----->     u
        +-------------------+                        |                     |                 |
     t                  |                      +---------------+           +-----------------+
                        +---------------------->               |                                         t
                                               |               |
                        +---------------------->               |                                         s
                        |                      |               |
       +--------------------+                  +---------------+
     s |                    |
       |  X[4]              |
       |                    |
       |                    |
       +--------------------+




