# Homework #2: Simple Neural Network Implementation using Numpy

In [46]:
import numpy as np
np.set_printoptions(suppress=True,precision=5)

### Define the Sigmoid Function and its Derivative
- Construct a function returning a sigmoid function:
$ \sigma(x) = \frac{1}{1 + e^{-x}} $
- Construct a function returning the derivative of a sigmoid function:
$ \frac{d\sigma(x)}{dx} = \sigma(x)(1 - \sigma(x)) $

In [47]:
def sigmoid(x):
    return 1/(1+np.exp(-x))
#i did not define derivative of sigmiod and write 1-sigmoid instead

### Initialize Weights
Build an array of three weights (3x1 array â€“ think why these dimensions!) and initialize their value randomly. (It is good practice to use weights with normal distribution of $ \mu = 0 $ and  $ \sigma = \frac{1}{3}  $ )

In [48]:
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0],
              [0],
              [1],
              [1]])
weights = np.random.normal(0,1/3,4) #we want to have a (output layer, input layer + 1) matrix of weights (+1 because of bias)
#in this case since the output layer has one node, we can use a vector of size 4 (3 nodes in the input layer + 1 bias)

X = np.hstack((X,np.ones((X.shape[0],1)))) # we add ones at the end of X to account for bias

# we want to use gradiant decent<br>

so we need to use the formula:
$$
\theta _{t+1} = \theta _k - \alpha \nabla L
$$
### we need to find $\frac{dL}{dw^l_{ij}}$
1. use chain rule
$$
\frac{\partial L}{\partial w^l_{ij}} = \frac{\partial L}{\partial O^l_j}\frac{\partial O^l_j}{\partial w^l_{ij}}\\
$$
2. we need to find each derivative <br> <br>
we will start with $\frac{\partial L}{\partial O^l_i}$ (we assume $l$ is the output layer):
$$ L = \frac{1}{2}\sum_i (O^l_i - y_i)^2 $$
$$\boxed{\frac{\partial L}{\partial O^l_i} = (O^l_i - y_i)}$$

now we will find $\frac{\partial O^l_i}{\partial w^l_{ij}}$:

$$z^l_j = \sum_i w^l_{ij}O^{l-1}_{i}$$
$$O^l_j = \sigma(\sum_i w^l_{ij}O^{l-1}_i) = \sigma(z^l_j)$$
$$\frac{\partial O^l_j}{\partial w^l_{ij}} = \frac{\partial O^l_j}{\partial z^l_{j}}\frac{\partial z^l_{j}}{\partial w^l_{ij}}$$
$$\frac{\partial O^l_j}{\partial z^l_{j}} = \sigma'(z^l_{j})\quad \quad
\frac{\partial z^l_{j}}{\partial w^l_{ij}} = O^{l-1}_{i}$$
$${\big \Downarrow}$$
$$\boxed{\frac{\partial O^l_j}{\partial w^l_{ij}} = O^{l-1}_{i}\sigma'(z^l_{j})}$$
now we just combine them together to get:

$$\frac{\partial L}{\partial w^l_{ij}} = \frac{\partial L}{\partial O^l_j}\frac{\partial O^l_j}{\partial w^l_{ij}} = (O^l_j - y_i)(O^{l-1}_{i}\sigma'(z^l_{j})) = O^{l-1}_{i}(O^l_j - y_i)\sigma(z^l_{j})(1-\sigma(z^l_{j}))$$
$$\boxed{\frac{\partial L}{\partial w^l_{ij}} = O^{l-1}_{i}(O^l_j - y_i)\sigma(z^l_{j})(1-\sigma(z^l_{j}))}
$$

notice:
we have 0 hidden layers so $l = 2$ <br>
we also define $O^1_4 = 1$ which means it defines the bias

### Training the Neural Network
Create a loop, iterating 1000 times (equal to the desired number of learning steps). For each iteration, calculate the difference between the network prediction and the real value of y. Multiply that difference with the sigmoid derivative and use the dot product of this number with the input layer to update your weights for the next iteration.

In [49]:
def prediction(X,weights):
    return sigmoid(X @ weights)
def loss(y_pred,y_true):
    return 0.5*np.sum((y_pred-y_true[:,0])**2)
def train(X,y,weights,epoch=1000,learning_rate=1):
    print("starting training:")
    for i in range(epoch):
        y_pred = prediction(X,weights)
        sigmoid_z = y_pred
        dL_dw = X*(y_pred-y[:,0])*sigmoid_z*(1-sigmoid_z)
        weights = weights - learning_rate*np.sum(dL_dw,axis=1)
        if (i % 200 == 0):
            print("epoch =",i)
            print("Loss =",loss(y_pred,y))
    return weights

In [50]:
model = train(X,y,weights) 
print("-----------")
print("trained model:")  
y_pred = prediction(X,model)

print('Loss =',loss(y_pred,y))
print("prediction =",y_pred)
print("ground truth =",y[:,0])

starting training:
epoch = 0
Loss = 0.674335077116371
epoch = 200
Loss = 0.009084894139190383
epoch = 400
Loss = 0.004161735796729394
epoch = 600
Loss = 0.0026720308653233583
epoch = 800
Loss = 0.0019604128874348603
-----------
trained model:
Loss = 0.001545275635414038
prediction = [0.02892 0.03373 0.97462 0.97827]
ground truth = [0 0 1 1]
