# Neural Networks

In this exercise you will learn how to implement a feedforward neural network and train it with backpropagation.

In [20]:
import numpy as np
from numpy.random import multivariate_normal
from numpy.random import uniform
from scipy.stats import zscore

We define two helper functions, "init_toy_data" and "init_model", to create a simple data set to work on and a two-layer neural network.

First, we create toy data with categorical labels by sampling from different multivariate normal distributions for each class. 

In [19]:
def init_toy_data(num_samples,num_features, num_classes, seed=3):
    # num_samples: number of samples *per class*
    # num_features: number of features (excluding bias)
    # num_classes: number of class labels
    # seed: random seed
    np.random.seed(seed)
    X=np.zeros((num_samples*num_classes, num_features))
    y=np.zeros(num_samples*num_classes)
    for c in range(num_classes):
        # initialize multivariate normal distribution for this class:
        # choose a mean for each feature
        means = uniform(low=-10, high=10, size=num_features)
        # choose a variance for each feature
        var = uniform(low=1.0, high=5, size=num_features)
        # for simplicity, all features are uncorrelated (covariance between any two features is 0)
        cov = var * np.eye(num_features)
        # draw samples from normal distribution
        X[c*num_samples:c*num_samples+num_samples,:] = multivariate_normal(means, cov, size=num_samples)
        # set label
        y[c*num_samples:c*num_samples+num_samples] = c
    return X,y


In [14]:
X,y = init_toy_data(10,3,2)
print(X)
print(zscore(X,axis=0))

[[ 0.87161482  3.40442005 -4.77587701]
 [ 0.18344707  4.06926559 -5.52450275]
 [ 2.553425    6.05443298 -6.99528586]
 [ 0.3099948   4.26993708 -0.52119376]
 [ 2.72970654  0.85631895 -5.34968533]
 [ 0.65720002  1.62912085 -6.53962305]
 [-0.77004398  4.66909701 -0.99961344]
 [ 0.7359411   5.49983895 -5.70863867]
 [ 2.31571515  3.67111076 -5.8282157 ]
 [-0.07683167  1.50280437  0.04954387]
 [-2.39407907  0.46682477 -8.03658564]
 [-2.73608999 -1.62504729 -8.28919778]
 [-5.17933271 -0.57120207 -6.4571155 ]
 [-0.61417112  1.08346785 -5.17963337]
 [ 0.3202574   1.26215957 -2.27963095]
 [-4.60907573  0.80190444 -6.97226085]
 [-0.32659072 -3.84225973 -5.88595166]
 [ 0.74290915 -0.95334167 -1.80425265]
 [-1.64444549 -0.36778486 -6.03736941]
 [ 0.98482956  0.11564225 -6.85822626]]
[[ 0.56327438  0.72058119  0.09191445]
 [ 0.23165606  0.98605791 -0.21556086]
 [ 1.37371494  1.77874688 -0.81964042]
 [ 0.29263763  1.06618721  1.83939655]
 [ 1.45866252 -0.29689051 -0.14375991]
 [ 0.45995092  0.0116938

In [10]:
def init_model(input_size,hidden_size,num_classes, seed=3):
    # input size: number of input features
    # hidden_size: number of units in the hidden layer
    # num_classes: number of class labels, i.e., number of output units
    np.random.seed(seed)
    model = {}
    # initialize weight matrices and biases randomly
    model['W1'] = uniform(low=-1, high=1, size=(input_size, hidden_size))
    model['b1'] = uniform(low=-1, high=1, size=hidden_size)
    model['W2'] = uniform(low=-1, high=1, size=(hidden_size, num_classes))
    model['b2'] = uniform(low=-1, high=1, size=num_classes)
    return model

In [180]:
# create toy data
X,y= init_toy_data(2,4,3) # 2 samples per class; 4 features, 3 classes
# Normalize data
X = zscore(X, axis=0)
print('X: ' + str(X))
print('y: ' + str(y))

X: [[ 0.39636145  1.09468144 -0.89360845  0.91815536]
 [ 0.94419323 -0.94027869  1.22268078  1.29597409]
 [-1.41577399  1.15477931 -0.62099631  0.08323307]
 [-1.35264614 -0.13598976 -1.14221784  0.26928935]
 [ 0.9352123   0.38225626  1.419864   -1.51152157]
 [ 0.49265316 -1.55544856  0.01427781 -1.0551303 ]]
y: [0. 0. 1. 1. 2. 2.]


We now initialise our neural network with one hidden layer consisting of $10$ units and an output layer consisting of $3$ units. It expects (any number of) training samples with $4$ features. We do not apply any activation functions yet. The following figure shows a graphical representation of this neural network.
<img src="nn.graphviz.png"  width="30%" height="30%">
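
As a quick check of the sizes stated above, the number of trainable parameters can be worked out by hand: each layer contributes one weight per input-output pair plus one bias per output unit. The helper below is a hypothetical illustration and is not part of the exercise code.

```python
# Hypothetical helper (not part of the exercise code): count the trainable
# parameters of a network with a single hidden layer.
def count_parameters(input_size, hidden_size, num_classes):
    hidden_params = input_size * hidden_size + hidden_size    # W1 and b1
    output_params = hidden_size * num_classes + num_classes   # W2 and b2
    return hidden_params + output_params

n_params = count_parameters(input_size=4, hidden_size=10, num_classes=3)
print(n_params)  # 4*10 + 10 + 10*3 + 3 = 83
```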

In [181]:
# initialize model
model = init_model(input_size=4, hidden_size=10, num_classes=3)

print('model: ' + str(model))
print('model[\'W1\'].shape: ' + str(model['W1'].shape))
print('model[\'W2\'].shape: ' + str(model['W2'].shape))
print('model[\'b1\'].shape: ' + str(model['b1'].shape))
print('model[\'b2\'].shape: ' + str(model['b2'].shape))
# total number of parameters: all entries of both weight matrices and both bias vectors
print('number of parameters: ' + str(model['W1'].size + model['W2'].size +
     model['b1'].size + model['b2'].size))

model: {'W1': array([[ 0.10159581,  0.41629565, -0.41819052,  0.02165521,  0.78589391,
         0.79258618, -0.74882938, -0.58551424, -0.89706559, -0.11838031],
       [-0.94024758, -0.08633355,  0.2982881 , -0.44302543,  0.3525098 ,
         0.18172563, -0.95203624,  0.11770818, -0.48149511, -0.16979761],
       [-0.43294984,  0.38627584, -0.11909256, -0.68626452,  0.08929804,
         0.56062953, -0.38727294, -0.55608423, -0.22405748,  0.8727673 ],
       [ 0.95199084,  0.34476735,  0.80566822,  0.69150174, -0.24401192,
        -0.81556598,  0.30682181,  0.11568152, -0.27687047, -0.54989099]]), 'b1': array([-0.18696017, -0.0621195 , -0.46152884, -0.41641445, -0.0846272 ,
        0.72106783,  0.17250581, -0.43302428, -0.44404499, -0.09075585]), 'W2': array([[-0.58917931, -0.59724258,  0.02807012],
       [-0.82554126, -0.03282894, -0.27564758],
       [ 0.41537324,  0.49349245,  0.38218584],
       [ 0.37836083, -0.25279975,  0.33626961],
       [-0.32030267,  0.14558774, -0.34838568]

<b>Exercise 1</b>: Implement softmax layer.

Implement the softmax function given by 

$\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{J} e^{x_j}}$,

where $J$ is the total number of classes, i.e., the length of **x**.

Note: Implement the function such that it takes a matrix X of shape (N, J) as input rather than a single instance **x**; N is the number of instances.

In [197]:
def softmax(X):
    #######################################
    # INSERT YOUR CODE HERE
    #######################################
    # exponentiate once (as floats), then normalize each row so that it sums to 1
    exp_X = np.exp(np.asarray(X, dtype=float))
    return exp_X / np.sum(exp_X, axis=1, keepdims=True)
    

Check if everything is correct.

In [198]:
x = np.array([[0.1, 0.7],[0.7,0.4]])
exact_softmax = np.array([[ 0.35434369,  0.64565631],
                         [ 0.57444252,  0.42555748]])
sm = softmax(x)
difference = np.sum(np.abs(exact_softmax - sm))
try:
    assert difference < 0.000001   
    print("Testing successful.")
except AssertionError:
    print("Tests failed.")

Testing successful.
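
The softmax definition above exponentiates the raw inputs, so very large values overflow to infinity and produce NaN. A common remedy is to subtract each row's maximum before exponentiating. This leaves the result unchanged because the shift cancels in the fraction. The following is a minimal sketch of that idea; the name stable_softmax is a hypothetical choice and not part of the exercise.

```python
import numpy as np

def stable_softmax(X):
    # Hypothetical variant: shift each row by its maximum so that the
    # largest exponent is exp(0) = 1 and np.exp cannot overflow.
    X = np.asarray(X, dtype=float)
    shifted = X - np.max(X, axis=1, keepdims=True)
    exp_shifted = np.exp(shifted)
    return exp_shifted / np.sum(exp_shifted, axis=1, keepdims=True)

# the second row would overflow with a naive np.exp(1000.0)
logits = np.array([[0.1, 0.7], [1000.0, 1001.0]])
probs = stable_softmax(logits)
print(probs)
```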


<b>Exercise 2</b>: Implement the forward propagation algorithm for the model defined above.

The activation function of the hidden neurons is a Rectified Linear Unit, $relu(x)=\max(0,x)$, applied element-wise to the hidden units.
The activation function of the output layer is the softmax function (as implemented in Exercise 1).

The function should return both the activations of the hidden units after applying the $relu$ activation function (shape: $(N, num\_hidden)$) and the softmax model output (shape: $(N, num\_classes)$).

In [201]:
def forward_prop(X,model):
    ###############################################
    # INSERT YOUR CODE HERE                       #
    ###############################################
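    # hidden layer: affine transformation followed by element-wise ReLU
    a1 = np.maximum(0, X @ model['W1'] + model['b1'])
    # output layer: affine transformation followed by row-wise softmax
    z2 = a1 @ model['W2'] + model['b2']
    out = softmax(z2)
    return a1, out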

In [202]:
acts,probs = forward_prop(X, model)
correct_probs = np.array([[0.22836388, 0.51816433, 0.25347179],
                            [0.15853289, 0.33057078, 0.51089632],
                            [0.40710319, 0.41765056, 0.17524624],
                            [0.85151353, 0.03656425, 0.11192222],
                            [0.66016592, 0.19839791, 0.14143618],
                            [0.70362036, 0.08667923, 0.20970041]])

# the difference should be very small.
difference =  np.sum(np.abs(probs - correct_probs))

try:
    assert probs.shape==(X.shape[0],len(set(y)))
    assert difference < 0.00001   
    print("Testing successful.")
except AssertionError:
    print("Tests failed.")

Testing successful.


In [3]:
pip install keras

Collecting keras
  Using cached Keras-2.4.3-py2.py3-none-any.whl (36 kB)
Installing collected packages: keras
Successfully installed keras-2.4.3
Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install tensorflow

Collecting tensorflow
  Using cached tensorflow-2.5.0-cp38-cp38-win_amd64.whl (422.6 MB)
Collecting termcolor~=1.1.0
  Using cached termcolor-1.1.0.tar.gz (3.9 kB)
Collecting gast==0.4.0
  Using cached gast-0.4.0-py3-none-any.whl (9.8 kB)
Collecting google-pasta~=0.2
  Using cached google_pasta-0.2.0-py3-none-any.whl (57 kB)
Collecting absl-py~=0.10
  Using cached absl_py-0.13.0-py3-none-any.whl (132 kB)
Collecting protobuf>=3.9.2
  Using cached protobuf-3.17.3-py2.py3-none-any.whl (173 kB)
Collecting keras-nightly~=2.5.0.dev
  Using cached keras_nightly-2.5.0.dev2021032900-py2.py3-none-any.whl (1.2 MB)
Collecting numpy~=1.19.2
  Using cached numpy-1.19.5-cp38-cp38-win_amd64.whl (13.3 MB)
Collecting keras-preprocessing~=1.1.2
  Using cached Keras_Preprocessing-1.1.2-py2.py3-none-any.whl (42 kB)
Collecting tensorboard~=2.5
  Using cached tensorboard-2.5.0-py3-none-any.whl (6.0 MB)
Collecting astunparse~=1.6.3
  Using cached astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting h5py~=

<b>Exercise 3:</b> How would you train the neural network defined above? Which loss function would you use? You do not need to implement this.
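
One possible answer, sketched here under assumptions rather than given as the reference solution: minimize the mean cross-entropy between the softmax output and the true labels with gradient descent, obtaining the gradients by backpropagation. The names softmax_rows, loss_and_grads, X_demo, y_demo, the learning rate of 0.1 and the 200 steps are all hypothetical choices for illustration. Labels are assumed to be integer class indices.

```python
import numpy as np

def softmax_rows(Z):
    # row-wise softmax, shifted by the row maximum for numerical stability
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def loss_and_grads(X, y, params):
    # Mean cross-entropy loss and its gradients for a ReLU/softmax network.
    N = X.shape[0]
    # forward pass
    z1 = X @ params['W1'] + params['b1']
    a1 = np.maximum(0, z1)
    probs = softmax_rows(a1 @ params['W2'] + params['b2'])
    loss = -np.mean(np.log(probs[np.arange(N), y]))
    # backward pass: gradient of the loss w.r.t. the output pre-activations
    dz2 = probs.copy()
    dz2[np.arange(N), y] -= 1.0
    dz2 /= N
    grads = {'W2': a1.T @ dz2, 'b2': dz2.sum(axis=0)}
    # propagate through the ReLU: gradient flows only where z1 > 0
    dz1 = (dz2 @ params['W2'].T) * (z1 > 0)
    grads['W1'] = X.T @ dz1
    grads['b1'] = dz1.sum(axis=0)
    return loss, grads

# small random problem with the same layer sizes as above (4 -> 10 -> 3)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 4))
y_demo = np.array([0, 0, 1, 1, 2, 2])
params = {'W1': rng.uniform(-1, 1, (4, 10)), 'b1': np.zeros(10),
          'W2': rng.uniform(-1, 1, (10, 3)), 'b2': np.zeros(3)}

losses = []
for step in range(200):
    loss, grads = loss_and_grads(X_demo, y_demo, params)
    losses.append(loss)
    for name in params:
        params[name] -= 0.1 * grads[name]  # plain gradient descent step

print(losses[0], losses[-1])
```

In practice one would use mini-batches and a held-out validation set; the loop above uses the full batch only to keep the sketch short.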

<b>Part 2 (Neural Net using Keras)</b>

Instead of implementing the model learning ourselves, we can use the neural network library Keras for Python (https://keras.io/). Keras is an abstraction layer that builds on top of either Theano or Google's TensorFlow. Please install Keras and TensorFlow or Theano for this lab.

<b>Exercise 4:</b>
Implement the same model as above using Keras:

* 1 hidden layer with 10 units
* softmax output layer with 3 units
* 4 input features
    
Compile the model using categorical cross-entropy (also referred to as 'softmax loss') as the loss function, and use categorical cross-entropy together with categorical accuracy as metrics for runtime evaluation during training.
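
To make the accuracy metric concrete: categorical accuracy is the fraction of samples whose most probable predicted class matches the one-hot label. The NumPy sketch below illustrates the computation on made-up values. The function and the arrays are hypothetical and independent of Keras.

```python
import numpy as np

def categorical_accuracy(y_true_onehot, y_prob):
    # fraction of rows where the predicted class (argmax of the
    # probabilities) equals the true class (argmax of the one-hot label)
    true_classes = np.argmax(y_true_onehot, axis=1)
    predicted_classes = np.argmax(y_prob, axis=1)
    return np.mean(true_classes == predicted_classes)

y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3],   # predicted class 1, true class 2
                   [0.2, 0.2, 0.6]])
print(categorical_accuracy(y_true, y_prob))  # 3 of 4 correct -> 0.75
```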

In [2]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Activation

# define the model 
################################################
# INSERT YOUR CODE HERE                        #
################################################

model = Sequential()
model.add(Dense(10,input_dim = 4,activation = 'relu'))
model.add(Dense(3,activation='softmax'))

# compile the model
################################################
# INSERT YOUR CODE HERE                        #
################################################
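# track categorical cross-entropy and categorical accuracy during training, as the task asks
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['categorical_crossentropy', 'categorical_accuracy'])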

You can inspect the current network at any time with the summary method. The layers are accessible via model.layers, and each layer's weights can be obtained with its get_weights method. Check whether your model is as expected.

In [6]:
# Check model architecture and initial weights.

#############################################
# INSERT YOUR CODE HERE                     #
#############################################
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_2 (Dense)              (None, 10)                50        
_________________________________________________________________
dense_3 (Dense)              (None, 3)                 33        
Total params: 83
Trainable params: 83
Non-trainable params: 0
_________________________________________________________________


In [17]:
for layer in model.layers:
    print(layer.get_weights())

[array([[-0.34277642, -0.5813798 ,  0.4183488 , -0.46621323, -0.4852491 ,
        -0.3190562 , -0.4091835 , -0.00275517, -0.15084428,  0.21051723],
       [-0.3190904 ,  0.63825154,  0.6236813 ,  0.475106  , -0.12454623,
        -0.40144983, -0.41297394, -0.32373804, -0.19516036, -0.3453104 ],
       [-0.16077918, -0.39110193,  0.15646493, -0.51652646, -0.44150442,
         0.3281082 ,  0.07540148,  0.05318654,  0.3246246 , -0.6261151 ],
       [-0.14124262, -0.6382946 , -0.3699037 , -0.36522517, -0.02580839,
        -0.07660466,  0.4276508 , -0.09739679, -0.16129503,  0.3242061 ]],
      dtype=float32), array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)]
[array([[ 0.42243457,  0.49078906, -0.10360295],
       [-0.37795016, -0.5292926 , -0.06808525],
       [-0.08380789, -0.13367432, -0.24949497],
       [ 0.2637216 , -0.4528955 , -0.16099262],
       [ 0.47686303, -0.38811514, -0.04353267],
       [ 0.17042887,  0.307509  ,  0.44068336],
       [-0.17505306,  0.20978212,  

<b>Exercise 5:</b> Train the model on the toy data set generated below: 

Hints: 

* Keras expects one-hot-coded labels 

* Don't forget to normalize the data
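
Both hints can be illustrated with plain NumPy. This is only a sketch with hypothetical helper names and made-up data. One-hot coding turns each class index into an indicator row, and normalization rescales every feature column to zero mean and unit variance.

```python
import numpy as np

def one_hot(labels, num_classes):
    # Hypothetical helper: row i is the indicator vector of labels[i].
    # Labels may be stored as floats (e.g. 0., 1., 2.), so cast to int first.
    return np.eye(num_classes)[np.asarray(labels).astype(int)]

def standardize(X):
    # zero mean and unit (population) standard deviation per feature column
    return (X - X.mean(axis=0)) / X.std(axis=0)

y_demo = np.array([0., 0., 1., 2.])
print(one_hot(y_demo, 3))

X_demo = np.array([[1., 10.], [2., 20.], [3., 60.]])
X_norm = standardize(X_demo)
print(X_norm.mean(axis=0), X_norm.std(axis=0))
```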

In [34]:
X, y = init_toy_data(1000,4,3, seed=3)

In [27]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
print('Data instance before normalization: '+str(X[0]))
X = sc.fit_transform(X)
print('Data instance after normalization: '+str(X[0]))

Data instance before normalization: [-0.32467846  3.98578199 -4.76683151  0.15729264]
Data instance after normalization: [ 0.52324084  1.27080131  0.27083063 -0.85289313]


In [35]:
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder()
print('Labels before encoding : ' + str(y))
y = ohe.fit_transform(y.reshape(-1,1)).toarray()
print('Labels after encoding : ' + str(y))

Labels before encoding : [0. 0. 0. ... 2. 2. 2.]
Labels after encoding : [[1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 ...
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]]


In [36]:
model.fit(X, y, epochs=10, batch_size=64)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1f32602b3a0>