# Problem 5.1.2 
Consider binary classification with the case of 2 features and 3 data points (m=3):

Let $X = \begin{bmatrix} 1 & 2 & 4 \\ -2 & -5 & -8 \end{bmatrix}, Y = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}$

Assume that layer 1 has 2 units and that layer 2 has 1 units with parameter matrices:

$W^{[1]} = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & -0.5 \end{bmatrix}, b^{[1]} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix},  W^{[2]} = \begin{bmatrix} -1 & 1  \end{bmatrix}, b^{[2]} = \begin{bmatrix} -0.1  \end{bmatrix}$

Assume activation functions $f^{[1]}  (z)=log⁡(1+e^z$) and $f^{[2]}  (z)=\frac{1}{1+e^{-z}}$ and binary cross entropy loss function.


In [51]:
import numpy as np

In [52]:
# inputs
X = np.array([[1,2,4],[-2,-5,-8]])
Y = np.array([[0,1,0]])
W1 = np.array([[0.5,0.5],[0.5,-0.5]])
b1 = np.array([[0.5],[0.5]])
W2 = np.array([[-1,1]])
b2 = np.array([[-0.1]])
m = X.shape[1]

**(a)**	Compute the value of the loss function for the above $W^{[1]}, b^{[1]},  W^{[2]}, b^{[2]}$


In [53]:
# forward propagation
# layer 1
Z1 = np.dot(W1,X)+b1
A1 = np.log(1+np.exp(Z1))
print("Z1: \n{}".format(Z1))
print("A1: \n{}".format(A1))
# layer2
Z2 = np.dot(W2,A1)+b2
A2 = 1/(1+np.exp(-Z2))
print("Z2: {}".format(Z2))
print("A2: {}".format(A2))
# compute loss
L = -np.mean(Y*np.log(A2+1e-16)+(1-Y)*np.log(1-A2+1e-16))
print("Loss: {}".format(L))

Z1: 
[[ 0.  -1.  -1.5]
 [ 2.   4.   6.5]]
A1: 
[[0.69314718 0.31326169 0.20141328]
 [2.12692801 4.01814993 6.50150231]]
Z2: [[1.33378083 3.60488824 6.20008903]]
A2: [[0.79146534 0.97352927 0.99797486]]
Loss: 2.5988645452351697


**(b)**	Perform 1  epoch of training using Gradient Descent with learning rate of 0.1 and recompute the loss function with the updated $W^{[1]}, b^{[1]},  W^{[2]}, b^{[2]}$

In [54]:
# back propagation
# derivative of loss
dLdA2 = -1/m*(Y/A2 - (1-Y)/(1-A2))
print("dLdA2: {}".format(dLdA2))
# layer 2
dA2dZ2 = A2 - np.square(A2)
print("dA2dZ2: {}".format(dA2dZ2))
dLdZ2 = dLdA2*dA2dZ2
print("dLdZ2: {}".format(dLdZ2))
grad_W2L = np.dot(dLdZ2,A1.T)
grad_b2L = np.sum(dLdZ2,axis=1,keepdims=True)
print("grad_W2L: {}".format(grad_W2L))
print("grad_b2L: {}".format(grad_b2L))
# layer 1
dLdA1 = np.dot(W2.T,dLdZ2)
print("dLdA1: \n{}".format(dLdA1))
dA1dZ1 = 1 - np.square(A1)
print("dA1dZ1: \n{}".format(dA1dZ1))
dLdZ1 = dLdA1*dA1dZ1
print("dLdZ1: \n{}".format(dLdZ1))
grad_W1L = np.dot(dLdZ1,X.T)
grad_b1L = np.sum(dLdZ1,axis=1,keepdims=True)
print("grad_W1L: \n{}".format(grad_W1L))
print("grad_b1L: \n{}".format(grad_b1L))
# gradient descent epoch 1
alpha = 0.1
# update parameters
W1 = W1 - alpha*grad_W1L
b1 = b1 - alpha*grad_b1L
W2 = W2 - alpha*grad_W2L
b2 = b2 - alpha*grad_b2L
print("W1 update: \n{}".format(W1))
print("b1 update: \n{}".format(b1))
print("W2 update: {}".format(W2))
print("b2 update: {}".format(b2))
# forward propagation
# layer 1
Z1 = np.dot(W1,X)+b1
A1 = np.log(1+np.exp(Z1))
print("Z1: \n{}".format(Z1))
print("A1: \n{}".format(A1))
# layer2
Z2 = np.dot(W2,A1)+b2
A2 = 1/(1+np.exp(-Z2))
print("Z2: {}".format(Z2))
print("A2: {}".format(A2))
# recompute loss
L = -np.mean(Y*np.log(A2+1e-16)+(1-Y)*np.log(1-A2+1e-16))
print("Loss epoch 1: {}".format(L))

dLdA2: [[  1.59845531  -0.34239683 164.59763786]]
dA2dZ2: [[0.16504796 0.02577003 0.00202104]]
dLdZ2: [[ 0.26382178 -0.00882358  0.33265829]]
grad_W2L: [[0.24710503 2.6884541 ]]
grad_b2L: [[0.58765649]]
dLdA1: 
[[-0.26382178  0.00882358 -0.33265829]
 [ 0.26382178 -0.00882358  0.33265829]]
dA1dZ1: 
[[  0.51954699   0.90186712   0.95943269]
 [ -3.52382276 -15.14552884 -41.26953229]]
dLdZ1: 
[[-1.37067811e-01  7.95769407e-03 -3.19163235e-01]
 [-9.29661195e-01  1.33637742e-01 -1.37286519e+01]]
grad_W1L: 
[[ -1.39780536   2.78765303]
 [-55.5769933  111.02034885]]
grad_b1L: 
[[ -0.44827335]
 [-14.52467535]]
W1 update: 
[[  0.63978054   0.2212347 ]
 [  6.05769933 -11.60203489]]
b1 update: 
[[0.54482734]
 [1.95246753]]
W2 update: [[-1.0247105   0.73115459]]
b2 update: [[-0.15876565]]
Z1: 
[[  0.74213848   0.71821492   1.33407191]
 [ 31.21423664  72.07804062 118.99954394]]
A1: 
[[  1.13153837   1.11539374   1.5678804 ]
 [ 31.21423664  72.07804062 118.99954394]]
Z2: [[21.5041675  51.39846893 85.

**(c)**	Compute the prediction based on input feature matrix X above after the 1 epoch.

In [55]:
# forward propagation
# layer 1
Z1 = np.dot(W1,X)+b1
A1 = np.log(1+np.exp(Z1))
print("Z1: \n{}".format(Z1))
print("A1: \n{}".format(A1))
# layer2
Z2 = np.dot(W2,A1)+b2
A2 = 1/(1+np.exp(-Z2))
print("Z2: {}".format(Z2))
print("A2: {}".format(A2))
# prediction
Y_pred = np.round(A2)
print("Prediction: {}".format(Y_pred))

Z1: 
[[  0.74213848   0.71821492   1.33407191]
 [ 31.21423664  72.07804062 118.99954394]]
A1: 
[[  1.13153837   1.11539374   1.5678804 ]
 [ 31.21423664  72.07804062 118.99954394]]
Z2: [[21.5041675  51.39846893 85.24167363]]
A2: [[1. 1. 1.]]
Prediction: [[1. 1. 1.]]


**(d)**	Compute the accuracy of the prediction in (c) when compared against the actual Y specified above.

In [46]:
# compute accuracy
accuracy = np.mean(np.absolute(Y-Y_pred)<1e-7)
print("accuracy: {}".format(accuracy))

accuracy: 0.3333333333333333
