## Compact MLP

This script illustrates the architecture of an MLP and is meant to help in understanding the matrix-wise multiplication of input vectors/activations and weights. In addition, the backpropagation algorithm is implemented according to the formulas given in chapter 4.5 of the lecture notes.

In [1]:
import numpy as np

## Helper functions

In [2]:
def print_value(val, name='var'):
    # 'name' replaces the original parameter 'str', which shadowed the builtin
    print('\n %s is of shape %r and has values\n %r' % (name, val.shape, val))


def sig(z):
    return 1/(1+np.exp(-z))


def dsig(z):
    return sig(z)*(1-sig(z))


## Class NeuralNetwork

Implementation of an MLP with two layers, where the weights and biases are passed as arguments to the constructor. The backpropagation is not implemented yet.

In [3]:
class NeuralNetwork:
    """
    simple MLP class (for illustration purposes only)
    """
        
    def __init__(self, W1, b1, W2, b2):
        """
        constructor

        Arguments:
        W1, W2 -- weight matrices of the two layers
        b1, b2 -- biases of the two layers
        """
        self.W1 = W1
        self.b1 = b1
        self.W2 = W2
        self.b2 = b2


    def activation_function(self, z):
        """
        apply activation function defined above
        """
        return sig(z)
        
        
    def propagate(self, x):
        """
        predicted outcome for x
        """
        #first layer
        z1 = self.W1 @ x + self.b1
        a1 = self.activation_function(z1)
        
        #second layer
        z2 = self.W2 @ a1 + self.b2
        a2 = self.activation_function(z2)

        self.a1 = a1 #access from outside
        y_pred = a2
        
        return y_pred


    def back_propagate(self):
        return 0
 
    
    def cost_funct(self, y_pred):
        """
        MSE loss with y == 0
        """
        #in case we have a batch
        m = y_pred.shape[1] 

        cost = np.sum(y_pred**2)/(2*m)
        
        return cost
        

## Propagation step

Observe in detail the shapes of the input vector, the weights and biases, and the corresponding activations.
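As a quick orientation, the shapes can be traced with placeholder arrays before looking at real values. The sketch below assumes the layer sizes used in this notebook (3 inputs, 4 hidden units, 3 outputs); the names `n_in`, `n_hidden` and `n_out` are illustrative and do not appear elsewhere in the notebook. The activation function is left out because it acts elementwise and does not change any shape.

```python
import numpy as np

# Layer sizes assumed from the arrays used in this notebook
n_in, n_hidden, n_out = 3, 4, 3

x = np.zeros((n_in, 1))            # one sample as a column vector
W1 = np.zeros((n_hidden, n_in))    # maps input -> hidden layer
b1 = np.zeros((n_hidden, 1))
W2 = np.zeros((n_out, n_hidden))   # maps hidden -> output layer
b2 = np.zeros((n_out, 1))

z1 = W1 @ x + b1                   # (4, 3) @ (3, 1) -> (4, 1)
z2 = W2 @ z1 + b2                  # (3, 4) @ (4, 1) -> (3, 1)

print('z1:', z1.shape, ' z2:', z2.shape)
```

The inner dimensions of each product must agree, which is why `W1` has as many columns as `x` has rows.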

In [4]:
#IMPORTANT: define all arrays as float, i.e. write '1.0' rather than '1' (or pass dtype=np.float64)
x = np.array([[1, -1, -0.5]]).T

W1 = np.array([[1, -1, -0.5],[1, 0.5, -1],[-0.2, 0.3, 0.5], [-1, 1, 1]])
b1 = np.array([[-1., -1., 1., 1.]]).T

W2 = np.array([[1, 0.5, -1, -1],[1, 0.5, -1, -0.5],[1, 1, -0.2, 0.3]])
b2 = np.array([[0.3, -0.4, -1]]).T

NNet = NeuralNetwork(W1, b1, W2, b2)

y_pred = NNet.propagate(x)
L = NNet.cost_funct(y_pred)

print_value(NNet.a1,'a1')
print_value(y_pred,'a2')
print_value(L,'L')


 a1 is of shape (4, 1) and has values
 array([[0.77729986],
       [0.5       ],
       [0.5621765 ],
       [0.18242552]])

 a2 is of shape (3, 1) and has values
 array([[0.64168794],
       [0.49347802],
       [0.55467851]])

 L is of shape () and has values
 0.4814761106026716


## Propagation step (batch)

Implement a copy of the above cell but with a full batch (minimum size 2)
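For a batch, the samples are stored as columns of `X`, and the bias column vector is broadcast over all columns. The following is a minimal sketch with small made-up arrays (`W`, `b` and `X` here are illustrative and unrelated to the weights of this notebook); it checks that one matrix product over the batch gives the same result as processing each column separately.

```python
import numpy as np

# Illustrative values only
W = np.array([[1.0, -1.0],
              [0.5,  2.0]])
b = np.array([[1.0],
              [-1.0]])
X = np.array([[1.0, 2.0,  3.0],
              [0.0, 1.0, -1.0]])   # 3 samples, one per column

# Batch version: b of shape (2, 1) is broadcast across the 3 columns
Z = W @ X + b

# Reference: apply the single-sample formula to each column in turn
Z_cols = np.hstack([W @ X[:, [j]] + b for j in range(X.shape[1])])

batch_matches = np.allclose(Z, Z_cols)
print(Z.shape, batch_matches)
```

Using distinct samples in the batch, as here, makes it easier to see that each column is treated independently.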

In [9]:
### START YOUR CODE ###

X = np.array([[1, -1, -0.5],[1,-1,-0.5]]).T
 
### END YOUR CODE ###

W1 = np.array([[1, -1, -0.5],[1, 0.5, -1],[-0.2, 0.3, 0.5], [-1, 1, 1]])
b1 = np.array([[-1., -1., 1., 1.]]).T

W2 = np.array([[1, 0.5, -1, -1],[1, 0.5, -1, -0.5],[1, 1, -0.2, 0.3]])
b2 = np.array([[0.3, -0.4, -1]]).T

NNet = NeuralNetwork(W1, b1, W2, b2)

y_pred = NNet.propagate(X)
L = NNet.cost_funct(y_pred)

print_value(NNet.a1,'a1')
print_value(y_pred,'a2')
print_value(L,'L')


 a1 is of shape (4, 2) and has values
 array([[0.77729986, 0.77729986],
       [0.5       , 0.5       ],
       [0.5621765 , 0.5621765 ],
       [0.18242552, 0.18242552]])

 a2 is of shape (3, 2) and has values
 array([[0.64168794, 0.64168794],
       [0.49347802, 0.49347802],
       [0.55467851, 0.55467851]])

 L is of shape () and has values
 0.4814761106026716


## Backpropagation step

Implement the backpropagation step using the set of formulas in equation 9, chapter 4.5 of the lecture notes.
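For orientation, the chain rule for this two-layer network can be written out as follows. This is a sketch derived from the forward pass used in this notebook, not a copy of equation 9; the notation ($\delta^{[l]}$ for the derivative of the cost with respect to $z^{[l]}$, $\odot$ for the elementwise product, $\sigma$ for the sigmoid) is an assumption and may differ from the lecture notes.

$$
\begin{aligned}
\delta^{[2]} &= \frac{\partial L}{\partial a^{[2]}} \odot \sigma'\!\left(z^{[2]}\right), &
\frac{\partial L}{\partial W^{[2]}} &= \delta^{[2]} \left(a^{[1]}\right)^{T}, &
\frac{\partial L}{\partial b^{[2]}} &= \delta^{[2]}, \\
\delta^{[1]} &= \left( \left(W^{[2]}\right)^{T} \delta^{[2]} \right) \odot \sigma'\!\left(z^{[1]}\right), &
\frac{\partial L}{\partial W^{[1]}} &= \delta^{[1]} x^{T}, &
\frac{\partial L}{\partial b^{[1]}} &= \delta^{[1]}.
\end{aligned}
$$

In the code, $\delta^{[2]}$ and $\delta^{[1]}$ correspond to `dL_dz2` and `dL_dz1`. Each gradient has the same shape as the parameter it belongs to, which is a useful sanity check.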


In [12]:
x = np.array([[1., -1., -0.5]]).T

W1 = np.array([[1, -1, -0.5],[1, 0.5, -1],[-0.2, 0.3, 0.5], [-1, 1, 1]])
b1 = np.array([[-1., -1., 1., 1.]]).T

W2 = np.array([[1, 0.5, -1, -1],[1, 0.5, -1, -0.5],[1, 1, -0.2, 0.3]])
b2 = np.array([[0.3, -0.4, -1]]).T

#here we rewrite the formulas explicitly outside NNet to have easier access to all values
#forward pass
z1 = W1 @ x + b1
a1 = sig(z1)

#second layer
z2 = W2 @ a1 + b2
a2 = sig(z2)

#backward: derivative of the cost L = sum(a2**2)/(2*m) with respect to a2; for a single sample m = 1, so it is simply a2
dL_da2 = a2

### START YOUR CODE ###
dg2_dz = dsig(z2)
dL_dz2 = dg2_dz * dL_da2
dL_dW2 = dL_dz2 @ a1.T
dL_db2 = np.sum(dL_dz2, axis=1,keepdims=True)
dL_da1 = W2.T @ dL_dz2

dg1_dz = dsig(z1)
dL_dz1 = dg1_dz * dL_da1
dL_dW1 = dL_dz1 @ x.T
dL_db1 = np.sum(dL_dz1, axis=1, keepdims=True)
### END YOUR CODE ###

print_value(dL_dW2,'dL_dW2')
print_value(dL_db2,'dL_db2')
print_value(dL_dW1,'dL_dW1')
print_value(dL_db1,'dL_db1')


 dL_dW2 is of shape (3, 4) and has values
 array([[0.11468266, 0.0737699 , 0.08294341, 0.02691502],
       [0.09587878, 0.06167426, 0.06934364, 0.02250192],
       [0.10649885, 0.06850564, 0.07702452, 0.02499436]])

 dL_db2 is of shape (3, 1) and has values
 array([[0.1475398 ],
       [0.12334851],
       [0.13701128]])

 dL_dW1 is of shape (4, 3) and has values
 array([[ 0.07060937, -0.07060937, -0.03530469],
       [ 0.06811386, -0.06811386, -0.03405693],
       [-0.07341948,  0.07341948,  0.03670974],
       [-0.02507311,  0.02507311,  0.01253655]])

 dL_db1 is of shape (4, 1) and has values
 array([[ 0.07060937],
       [ 0.06811386],
       [-0.07341948],
       [-0.02507311]])


## Backpropagation step (batch)

Implement a copy of the above cell but with a full batch (minimum size 2). Use the formulas in equation 10, chapter 4.5.5 of the lecture notes.
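One way to sanity-check a batch implementation is to compare it with the average of the single-sample gradients, since the cost used in this notebook carries a factor 1/m. The sketch below is an illustration under assumptions: the helper `grads` is a hypothetical name, the weights and inputs are random rather than the values used in the cells, and the target is taken to be y = 0 as in `cost_funct`.

```python
import numpy as np

def sig(z):
    return 1 / (1 + np.exp(-z))

def dsig(z):
    s = sig(z)
    return s * (1 - s)

def grads(X, W1, b1, W2, b2):
    """Gradients of L = sum(a2**2)/(2*m) for a batch X of shape (n_in, m)."""
    m = X.shape[1]
    z1 = W1 @ X + b1
    a1 = sig(z1)
    z2 = W2 @ a1 + b2
    a2 = sig(z2)
    dz2 = dsig(z2) * a2 / m                    # dL/dz2, target y == 0
    dW2 = dz2 @ a1.T
    db2 = np.sum(dz2, axis=1, keepdims=True)   # sum over the batch columns
    dz1 = dsig(z1) * (W2.T @ dz2)
    dW1 = dz1 @ X.T
    db1 = np.sum(dz1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2

# Random illustrative parameters with the layer sizes of this notebook
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(3, 4)), rng.normal(size=(3, 1))
X = rng.normal(size=(3, 5))                    # 5 distinct samples

batch = grads(X, W1, b1, W2, b2)
per_sample = [grads(X[:, [j]], W1, b1, W2, b2) for j in range(X.shape[1])]
mean = [sum(g[k] for g in per_sample) / X.shape[1] for k in range(4)]

all_match = all(np.allclose(b, s) for b, s in zip(batch, mean))
print('batch gradient equals mean of per-sample gradients:', all_match)
```

With a batch of identical samples, as used in the cell that follows, the batch gradients coincide with the single-sample gradients for the same reason.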


In [13]:
### START YOUR CODE ###
X = np.array([[1, -1, -0.5],[1,-1,-0.5]]).T
### END YOUR CODE ###

W1 = np.array([[1, -1, -0.5],[1, 0.5, -1],[-0.2, 0.3, 0.5], [-1, 1, 1]])
b1 = np.array([[-1., -1., 1., 1.]]).T

W2 = np.array([[1, 0.5, -1, -1],[1, 0.5, -1, -0.5],[1, 1, -0.2, 0.3]])
b2 = np.array([[0.3, -0.4, -1]]).T

#here we rewrite the formulas explicitly outside NNet to have easier access to all values
#forward pass
z1 = W1 @ X + b1
a1 = sig(z1)

#second layer
z2 = W2 @ a1 + b2
a2 = sig(z2)

#backward: this is the derivative of the cost function (1/m due to size of batch)
m = a2.shape[1]
dL_da2 = a2/m

### START YOUR CODE ###
dg2_dz = dsig(z2)
dL_dz2 = dg2_dz * dL_da2 
dL_dW2 = dL_dz2 @ a1.T
dL_db2 = np.sum(dL_dz2, axis=1, keepdims=True)
dL_da1 = W2.T @ dL_dz2

dg1_dz = dsig(z1)
dL_dz1 = dg1_dz * dL_da1
dL_dW1 = dL_dz1 @ X.T
dL_db1 = np.sum(dL_dz1, axis=1, keepdims=True)
### END YOUR CODE ###

print_value(dL_dW2,'dL_dW2')
print_value(dL_db2,'dL_db2')
print_value(dL_dW1,'dL_dW1')
print_value(dL_db1,'dL_db1')


 dL_dW2 is of shape (3, 4) and has values
 array([[0.11468266, 0.0737699 , 0.08294341, 0.02691502],
       [0.09587878, 0.06167426, 0.06934364, 0.02250192],
       [0.10649885, 0.06850564, 0.07702452, 0.02499436]])

 dL_db2 is of shape (3, 1) and has values
 array([[0.1475398 ],
       [0.12334851],
       [0.13701128]])

 dL_dW1 is of shape (4, 3) and has values
 array([[ 0.07060937, -0.07060937, -0.03530469],
       [ 0.06811386, -0.06811386, -0.03405693],
       [-0.07341948,  0.07341948,  0.03670974],
       [-0.02507311,  0.02507311,  0.01253655]])

 dL_db1 is of shape (4, 1) and has values
 array([[ 0.07060937],
       [ 0.06811386],
       [-0.07341948],
       [-0.02507311]])


## Numerical gradient check

We calculate the derivatives numerically and compare the results with our analytical calculation.
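The cell in this section uses a forward difference, (L(p + eps) - L(p)) / eps, whose error shrinks linearly with eps. A common alternative is the central difference, (L(p + eps) - L(p - eps)) / (2 eps), whose error shrinks quadratically. The sketch below shows that variant on a toy loss; `numerical_grad` and `loss_fn` are hypothetical names introduced only for this illustration and are not part of the notebook.

```python
import numpy as np

def numerical_grad(loss_fn, P, eps=1e-6):
    """Central-difference gradient of loss_fn() with respect to the array P.

    loss_fn takes no arguments and must read P each time it is called,
    because P is perturbed in place and restored afterwards.
    """
    grad = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        old = P[idx]
        P[idx] = old + eps
        L_plus = loss_fn()
        P[idx] = old - eps
        L_minus = loss_fn()
        P[idx] = old                            # restore the original value
        grad[idx] = (L_plus - L_minus) / (2 * eps)
    return grad

# Toy check: for L = sum(P**2)/2 the exact gradient is P itself
P = np.array([[1.0, -2.0],
              [0.5,  3.0]])
g = numerical_grad(lambda: np.sum(P**2) / 2, P)
max_err = np.max(np.abs(g - P))
print(max_err)
```

Applied to the network, `loss_fn` could wrap a call to `propagate` followed by `cost_funct`, with `P` being one of the arrays held by `NNet`. The arrays must have a float dtype, otherwise the in-place perturbation by `eps` would be lost.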

In [14]:
eps = 1e-6

#choose value to use (x or X) depending on type of input (single sample or batch)
x_n = x

y_pred = NNet.propagate(x_n)
L = NNet.cost_funct(y_pred)

dL_dW2_n = np.zeros(W2.shape)
for r in range(W2.shape[0]):
    for c in range(W2.shape[1]):
        NNet.W2[r,c] += eps
        y_pred = NNet.propagate(x_n)
        L_eps = NNet.cost_funct(y_pred)
        NNet.W2[r,c] -= eps

        dL_dW2_n[r,c] = (L_eps-L)/eps

print_value(dL_dW2_n,'dL_dW2 (numerical check)')

dL_db2_n = np.zeros(b2.shape)
for r in range(b2.shape[0]):
    NNet.b2[r,0] += eps
    y_pred = NNet.propagate(x_n)
    L_eps = NNet.cost_funct(y_pred)
    NNet.b2[r,0] -= eps

    dL_db2_n[r,0] = (L_eps-L)/eps

print_value(dL_db2_n,'dL_db2 (numerical check)')

dL_dW1_n = np.zeros(W1.shape)
for r in range(W1.shape[0]):
    for c in range(W1.shape[1]):
        NNet.W1[r,c] += eps
        y_pred = NNet.propagate(x_n)
        L_eps = NNet.cost_funct(y_pred)
        NNet.W1[r,c] -= eps

        dL_dW1_n[r,c] = (L_eps-L)/eps

print_value(dL_dW1_n,'dL_dW1 (numerical check)')

dL_db1_n = np.zeros(b1.shape)
for r in range(b1.shape[0]):
    NNet.b1[r,0] += eps
    y_pred = NNet.propagate(x_n)
    L_eps = NNet.cost_funct(y_pred)
    NNet.b1[r,0] -= eps

    dL_db1_n[r,0] = (L_eps-L)/eps

print_value(dL_db1_n,'dL_db1 (numerical check)')

#check the difference
print('\nthe respective maximum differences:')
print(np.max(np.abs(dL_dW2_n - dL_dW2)))
print(np.max(np.abs(dL_db2_n - dL_db2)))
print(np.max(np.abs(dL_dW1_n - dL_dW1)))
print(np.max(np.abs(dL_db1_n - dL_db1)))


 dL_dW2 (numerical check) is of shape (3, 4) and has values
 array([[0.11468267, 0.0737699 , 0.08294341, 0.02691502],
       [0.0958788 , 0.06167427, 0.06934365, 0.02250192],
       [0.10649887, 0.06850565, 0.07702453, 0.02499436]])

 dL_db2 (numerical check) is of shape (3, 1) and has values
 array([[0.1475398 ],
       [0.12334855],
       [0.13701131]])

 dL_dW1 (numerical check) is of shape (4, 3) and has values
 array([[ 0.07060935, -0.07060939, -0.03530469],
       [ 0.06811386, -0.06811386, -0.03405693],
       [-0.07341947,  0.07341948,  0.03670974],
       [-0.02507312,  0.0250731 ,  0.01253655]])

 dL_db1 (numerical check) is of shape (4, 1) and has values
 array([[ 0.07060935],
       [ 0.06811386],
       [-0.07341947],
       [-0.02507312]])

the respective maximum differences:
1.9276619286912045e-08
3.196706895025603e-08
1.7845128250093545e-08
1.779303084037398e-08
