# PS3: Deep learning

In this problem set, you will experiment with fully-connected neural networks.

To start with, let's load the "breast cancer" data set from scikit-learn:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import os
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
y = np.matrix(data.target).T
X = np.matrix(data.data)
M = X.shape[0]
N = X.shape[1]

# Normalize each input feature
def normalize(X):
    M = X.shape[0]
    XX = X - np.tile(np.mean(X,0),[M,1])
    XX = np.divide(XX, np.tile(np.std(XX,0),[M,1]))
    return XX

XX = normalize(X)
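
The same per-feature standardization can also be written with NumPy broadcasting instead of `np.tile`. The sketch below is only an illustration: the name `normalize_broadcast` and the `eps` guard against constant features are assumptions, not part of the assignment code. It returns a plain `ndarray`, so wrap the result in `np.matrix` if later cells rely on `*` meaning matrix multiplication.

```python
import numpy as np

def normalize_broadcast(X, eps=1e-12):
    """Standardize each column of X to zero mean and unit variance.

    Hypothetical helper: `eps` (an assumption) avoids division by zero
    for a feature that is constant across all examples.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)        # shape (N,), broadcast across rows
    sigma = X.std(axis=0)      # population std, the np.std default
    return (X - mu) / np.maximum(sigma, eps)

# Small synthetic check (made-up numbers, not the breast cancer data)
demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z = normalize_broadcast(demo)
```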

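Next, let's represent a fully-connected neural network by two lists, W and b, containing the weight matrix and bias vector for each layer. Index 0 of each list is an empty placeholder, so W[l] and b[l] hold the parameters of layer l.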

In [None]:
h1 = 10
h2 = 5
h3 = 3
W = [[]]
b = [[]]

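# units lists the number of units in each layer, e.g. [N,h,1] means N input units, h hidden units, and 1 output unit.
# Appends to the global lists W and b, so reset them to [[]] before re-initializing.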
def initializer(no_of_layers, units):
    for i in range(0,no_of_layers-1):
        W.append(np.random.normal(0,0.1,[units[i],units[i+1]]))
        b.append(np.random.normal(0,0.1,[units[i+1],1]))
        
        
initializer(5,[N,h1,h2,h3,1])

L = len(W)-1

def act(z):
    return 1/(1+np.exp(-z))

#activation derivative
def actder(z):
    az = act(z)
    prod = np.multiply(az,1-az)
    return prod

def ff(x,W,b):
    L = len(W)-1
    a = x
    for l in range(1,L+1):
        z = W[l].T*a+b[l]
        a = act(z)
    return a

def loss(y,yhat):
    return -((1-y) * np.log(1-yhat) + y * np.log(yhat))
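
The cross-entropy loss above can evaluate `np.log(0)` when the sigmoid output saturates to exactly 0 or 1 in floating point, which produces `inf`. A common workaround is to clip the prediction first. The sketch below assumes a hypothetical name `safe_loss` and an arbitrary tolerance `eps`; neither comes from the assignment.

```python
import numpy as np

def safe_loss(y, yhat, eps=1e-12):
    """Binary cross-entropy with yhat clipped away from 0 and 1 (eps is an assumption)."""
    yhat = np.clip(yhat, eps, 1 - eps)
    return -((1 - y) * np.log(1 - yhat) + y * np.log(yhat))

# A confidently wrong prediction stays finite instead of becoming inf
confident_wrong = safe_loss(1, 0.0)
```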

## Question 1

Write Python code to separate $\texttt{X},\texttt{y}$ randomly into a training set containing 80% of the data and a validation set consisting of the remaining 20% of the data.

In [19]:
order = np.random.permutation(range(0,len(X)))
split_index = int(len(X)*0.8)
X_train = XX[order[0:split_index]]
y_train = y[order[0:split_index]]
X_validate = XX[order[split_index:]]
y_validate = y[order[split_index:]]
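
The split above changes on every run because the permutation is not seeded. If reproducible experiments are wanted, one option is to generate the index sets from a seeded generator. This is a sketch under assumptions: `split_indices` and its `seed` parameter are hypothetical names introduced here.

```python
import numpy as np

def split_indices(n_examples, train_fraction=0.8, seed=0):
    """Return disjoint (train_idx, val_idx) arrays that together cover 0..n_examples-1."""
    rng = np.random.default_rng(seed)      # seeded, so the split is repeatable
    order = rng.permutation(n_examples)
    cut = int(n_examples * train_fraction)
    return order[:cut], order[cut:]

# 569 is the number of rows in the breast cancer data set
train_idx, val_idx = split_indices(569)
```

The resulting arrays can be used in place of `order[0:split_index]` and `order[split_index:]`.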

## Question 2

Beginning with the training code we wrote together in class, write Python code to execute backpropagation with mini-batch size 1 on the training set, and plot the training loss and validation loss as a function of training iteration. Show the plot in this sheet.

In [1]:
alpha = 0.01

def ff(L,X_t,split_index,y_t=None,type="trainer"):
    loss_this_iter = 0
    order = np.random.permutation(split_index)

    for i in range(0,split_index):
        # Grab the pattern order[i]

        x_this = X_t[order[i],:].T
        # y_t.all is a bound method and therefore always truthy; test for None explicitly
        if y_t is not None:
            y_this = y_t[order[i],0]

        # Feed forward step

        a = [x_this]
        z = [[]]
        delta = [[]]
        dW = [[]]
        db = [[]]
        
        for l in range(1,L+1):
            z.append(W[l].T*a[l-1]+b[l])
            a.append(act(z[l]))
            # Just to give arrays the right shape for the backprop step
            delta.append([]); dW.append([]); db.append([])
            
        loss_this_pattern = loss(y_this, a[L][0,0])
        loss_this_iter = loss_this_iter + loss_this_pattern
        
        if type == "trainer":
            backprop(L,a,z,delta,dW,db,y_this)

    return loss_this_iter
        
        
# Backprop step
def backprop(L,a,z,delta,dW,db,y_this):        
    delta[L] = a[L] - y_this
    for l in range(L,0,-1):
        db[l] = delta[l].copy()
        dW[l] = a[l-1] * delta[l].T
        if l > 1:
            delta[l-1] = np.multiply(actder(z[l-1]), W[l] * delta[l])

    for l in range(1,L+1):            
        W[l] = W[l] - alpha * dW[l]
        b[l] = b[l] - alpha * db[l]

#runs max_iter passes over the training set, recording training and validation loss after each pass
def trainer(W,b,L,X_t,y_t,split_index,X_v,y_v,max_iter=1000):
    #loss in training set every iteration
    loss_every_iter_train = []
    
    #loss in validation set every iteration
    loss_every_iter_val = []
    
    for iter in range(0, max_iter):
        
        #for training set
        loss_this_iter = ff(L,X_t,split_index,y_t,"trainer")
        loss_every_iter_train.append(loss_this_iter)
        
        #for validation set (evaluation only: any mode other than "trainer" skips backprop)
        loss_this_iter = ff(L,X_v,X_v.shape[0],y_v,"validate")
        loss_every_iter_val.append(loss_this_iter)
    
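    #for training set
    plt.plot(range(0,len(loss_every_iter_train)),loss_every_iter_train)
    plt.xlabel("Iteration")
    plt.ylabel("Training loss (summed over patterns)")
    plt.show()
    
    #for validation set
    plt.plot(range(0,len(loss_every_iter_val)),loss_every_iter_val)
    plt.xlabel("Iteration")
    plt.ylabel("Validation loss (summed over patterns)")
    plt.show()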
    print(loss_every_iter_val)

print("Please Wait...")
trainer(W,b,L,X_train,y_train,split_index,X_validate,y_validate,100)


Please Wait...


NameError: name 'W' is not defined
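
One way to gain confidence in a hand-written backprop step is a finite-difference gradient check. The sketch below re-implements the same recurrences on plain NumPy arrays (using `@` rather than `np.matrix` multiplication) for a toy network. The helper names `forward`, `ce_loss`, `backprop_grads`, and `numeric_grad_W`, the layer sizes, and the step `h` are illustrative assumptions, not part of the assignment code.

```python
import numpy as np

def act(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W, b):
    """Return activations a[0..L] and pre-activations z[1..L] (index 0 of z unused)."""
    a, z = [x], [None]
    for l in range(1, len(W)):
        z.append(W[l].T @ a[l - 1] + b[l])
        a.append(act(z[l]))
    return a, z

def ce_loss(y, yhat):
    return float(-((1 - y) * np.log(1 - yhat) + y * np.log(yhat)))

def backprop_grads(x, y, W, b):
    """Analytic gradients for one pattern (sigmoid layers, cross-entropy loss)."""
    L = len(W) - 1
    a, z = forward(x, W, b)
    dW, db = [None] * (L + 1), [None] * (L + 1)
    delta = a[L] - y
    for l in range(L, 0, -1):
        db[l] = delta
        dW[l] = a[l - 1] @ delta.T
        if l > 1:
            s = act(z[l - 1])
            delta = (s * (1 - s)) * (W[l] @ delta)
    return dW, db

def numeric_grad_W(x, y, W, b, l, i, j, h=1e-6):
    """Central-difference estimate of d loss / d W[l][i, j]."""
    old = W[l][i, j]
    W[l][i, j] = old + h
    up = ce_loss(y, forward(x, W, b)[0][-1][0, 0])
    W[l][i, j] = old - h
    down = ce_loss(y, forward(x, W, b)[0][-1][0, 0])
    W[l][i, j] = old                      # restore the weight
    return (up - down) / (2 * h)

# Toy network and pattern (sizes and values chosen only for illustration)
rng = np.random.default_rng(0)
units = [4, 3, 1]
W_toy = [None] + [rng.normal(0, 0.5, (units[i], units[i + 1])) for i in range(2)]
b_toy = [None] + [rng.normal(0, 0.5, (units[i + 1], 1)) for i in range(2)]
x_toy, y_toy = rng.normal(size=(4, 1)), 1.0

dW_toy, _ = backprop_grads(x_toy, y_toy, W_toy, b_toy)
max_err = max(
    abs(dW_toy[l][i, j] - numeric_grad_W(x_toy, y_toy, W_toy, b_toy, l, i, j))
    for l in (1, 2)
    for i in range(W_toy[l].shape[0])
    for j in range(W_toy[l].shape[1])
)
```

If the analytic and numeric gradients agree, `max_err` should be tiny relative to the gradient magnitudes; a large value usually points to a shape or indexing mistake in the backward pass.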

## Question 3

Perform several experiments with different numbers of layers and different numbers of hidden units. Demonstrate the phenomenon of overtraining, make a table showing the training and validation set performance of each of your models, and make a recommendation about which model is best based on validation set performance.

In [None]:
# h1 = 10
# h2 = 5

# W = [[]]
# b = [[]]

# initializer(4,[N,h1,h2,1])

# L = len(W)-1

# trainer(W,b,L,X_train,y_train,split_index,X_validate,y_validate,100)

*Results table and discussion go here.*
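
For the table, it may help to reduce each run's loss history to a few numbers. The sketch below assumes `trainer` has been adapted to return its two loss lists (as written it only plots and prints them). `summarize_run` and its field names are hypothetical, and the loss curves are made up purely to exercise the helper.

```python
import numpy as np

def summarize_run(name, train_losses, val_losses):
    """Return one table row: final losses plus the iteration with the lowest validation loss."""
    best = int(np.argmin(val_losses))
    return {
        "model": name,
        "final_train": float(train_losses[-1]),
        "final_val": float(val_losses[-1]),
        "best_val": float(val_losses[best]),
        "best_iter": best,
        # Validation loss ending above its minimum is a sign of overtraining
        "overtrained": bool(val_losses[-1] > val_losses[best]),
    }

# Made-up curves, not real results
toy_train = [5.0, 3.0, 2.0, 1.0, 0.5]
toy_val = [6.0, 4.0, 3.5, 3.8, 4.2]
row = summarize_run("toy", toy_train, toy_val)
```

Overtraining shows up as training loss that keeps falling while validation loss turns upward after `best_iter`.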

## Question 4

Modify the backpropagation procedure to use mini-batches of a few different sizes such as 10, 20, and 40. Take care that each mathematical operation is efficient (avoid any for loops over the examples in a mini-batch). Repeat your experiments and report the results. Do you observe any differences in terms of accuracy and number of iterations to converge?

In [None]:
# Code goes here
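
A possible starting point for the vectorized version is sketched below. It treats the examples of a mini-batch as columns of one matrix, so every layer is a single matrix product. It uses plain NumPy arrays with `@` rather than `np.matrix`. The function name `minibatch_step`, the averaging of gradients over the batch, and the toy data are assumptions made for illustration.

```python
import numpy as np

def act(z):
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_step(Xb, yb, W, b, alpha=0.01):
    """One gradient step on a mini-batch, with no Python loop over examples.

    Xb: (B, N) inputs, yb: (B, 1) targets. W[l] has shape
    (units[l-1], units[l]) and b[l] has shape (units[l], 1); index 0 is unused.
    Updates W and b in place and returns the mean batch loss before the update.
    """
    L, B = len(W) - 1, Xb.shape[0]
    a = [Xb.T]                                  # columns are examples
    for l in range(1, L + 1):
        z = W[l].T @ a[l - 1] + b[l]            # b[l] broadcasts over the batch
        a.append(act(z))
    yhat, yT = a[L], yb.T
    batch_loss = float(np.mean(-((1 - yT) * np.log(1 - yhat) + yT * np.log(yhat))))

    delta = yhat - yT                           # shape (1, B)
    for l in range(L, 0, -1):
        dW = a[l - 1] @ delta.T / B             # gradient averaged over the batch
        db = delta.mean(axis=1, keepdims=True)
        if l > 1:
            s = a[l - 1]                        # sigmoid output of layer l-1
            delta = (s * (1 - s)) * (W[l] @ delta)   # uses W[l] before its update
        W[l] = W[l] - alpha * dW
        b[l] = b[l] - alpha * db
    return batch_loss

# Toy data and network (synthetic, for illustration only)
rng = np.random.default_rng(0)
units = [5, 4, 1]
W_toy = [None] + [rng.normal(0, 0.1, (units[i], units[i + 1])) for i in range(2)]
b_toy = [None] + [rng.normal(0, 0.1, (units[i + 1], 1)) for i in range(2)]
Xb_toy = rng.normal(size=(20, 5))
yb_toy = (Xb_toy[:, :1] > 0).astype(float)
losses = [minibatch_step(Xb_toy, yb_toy, W_toy, b_toy, alpha=0.5) for _ in range(200)]
```

Because the gradient is averaged rather than summed, the same `alpha` gives steps of comparable size across batch sizes, which makes the comparison between 10, 20, and 40 easier to interpret.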

*Results table and discussion go here.*

## Question 5

Modify the model to use the ReLU activation function in the hidden layers rather than logistic sigmoid. Repeat your experiments and report the results. Do you observe any differences in terms of accuracy and number of iterations to converge?

In [None]:
# Code goes here
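
As a rough sketch of the change, ReLU and its derivative can replace `act` and `actder` in the hidden layers while the output layer keeps the logistic sigmoid, so the loss and the output-layer delta stay the same. The names `relu`, `relu_der`, and `forward_relu`, and the toy weights, are assumptions introduced for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_der(z):
    # 1 where z > 0, else 0 (the value at exactly z == 0 is a convention)
    return (z > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_relu(x, W, b):
    """Feed forward with ReLU in the hidden layers and sigmoid at the output."""
    L = len(W) - 1
    a = x
    for l in range(1, L + 1):
        z = W[l].T @ a + b[l]
        a = sigmoid(z) if l == L else relu(z)
    return a

z_demo = np.array([[-2.0], [0.0], [3.0]])
relu_out = relu(z_demo)
relu_grad = relu_der(z_demo)

# Toy network (synthetic weights) to exercise the forward pass
rng = np.random.default_rng(0)
W_toy = [None, rng.normal(0, 0.1, (3, 2)), rng.normal(0, 0.1, (2, 1))]
b_toy = [None, np.zeros((2, 1)), np.zeros((1, 1))]
out = forward_relu(z_demo, W_toy, b_toy)
```

In the backward pass, the hidden-layer delta would then multiply by `relu_der(z[l-1])` where the sigmoid version multiplies by `actder(z[l-1])`.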

*Results table and discussion go here.*