## Day 1:
- Artificial Neuron
- Feedforward neural network
- Measuring Loss 

In [1]:
import numpy as np

def relu(z):
    return max(z,0)

def sigmoid(z):
    return 1/(1+np.exp(-z))

def neuron(x,w, activation):
    if len(x)+1!=len(w):
        raise ValueError("The input and the weights supplied should have compatible dimensions")
    # w[0] is the bias term; w[1:] are the weights paired with the inputs
    z = w[0]
    for data,wt in zip(x,w[1:]):
        z+=data*wt
    return activation(z)

In [2]:
data = [3,8,9,11]
weights = [6,7,8,9,10]
neuron(data,weights,relu)

282
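The same neuron can also be written with a dot product instead of an explicit loop. The following is a minimal sketch, not part of the original notebook: the name `neuron_vec` is hypothetical, and it assumes that `w[0]` is the bias term, which is what the length check `len(x)+1 == len(w)` suggests.

```python
import numpy as np

def neuron_vec(x, w, activation):
    """Single neuron: activation(bias + weights . inputs).

    Assumption: w[0] is the bias and w[1:] are the input weights.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if x.shape[0] + 1 != w.shape[0]:
        raise ValueError("w must have exactly one more entry than x")
    # Weighted sum of the inputs plus the bias
    z = w[0] + np.dot(w[1:], x)
    return activation(z)

# Inputs and weights from the example cell, with a scalar ReLU as activation
out = neuron_vec([3, 8, 9, 11], [6, 7, 8, 9, 10], lambda z: max(z, 0.0))
```

Here the weighted sum of the inputs is 276 and the bias contributes 6, so `out` is 282.0.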

**Coding a forward pass**

The forward pass of a feedforward neural network can be written as a sequence of matrix multiplications, each followed by adding a bias vector and applying an activation function.

![](./imgs/nn.png)

This network can be described as follows:

- Input vector = $X = (x_1, x_2)$
$$X = \begin{bmatrix}
x_1\\
x_2\\
\end{bmatrix}
$$
- Weight Matrix (hidden layer) = $$W = \begin{bmatrix}
W_{11}&W_{21}\\
W_{12}&W_{22}\\
W_{13}&W_{23}\\
W_{14}&W_{24}\\
\end{bmatrix}
$$
*Note: the subscripts match the weight labels in the figure, so $W_{ij}$ multiplies input $x_i$ in the sum for hidden unit $j$.

- Bias/offset Matrix (hidden layer) = $$
W_0 = \begin{bmatrix}
W_{01}\\
W_{02}\\
W_{03}\\
W_{04}\\
\end{bmatrix}
$$

Now the forward pass for the hidden layer can be described as 

$$W \times X + W_0= Z = \begin{bmatrix}
z_{1}\\
z_{2}\\
z_3\\
z_4\\
\end{bmatrix}
$$

Applying the activation function $f$ element-wise to the vector $Z$ completes the forward pass for the hidden layer.

$$f(W \times X + W_0)= f(Z) = f(\begin{bmatrix}
z_{1}\\
z_{2}\\
z_3\\
z_4\\
\end{bmatrix}) = 
\begin{bmatrix}
f(z_{1})\\
f(z_{2})\\
f(z_3)\\
f(z_4)\\
\end{bmatrix}
$$
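As a worked instance of the hidden-layer step, assume the example values used in the code cells of this notebook ($X = (3, 14)$, the $4 \times 2$ matrix $W$ and the bias vector $W_0$ defined there) and take $f$ to be ReLU, $f(z) = \max(z, 0)$:

$$
W \times X + W_0 =
\begin{bmatrix}
1&0\\
0&1\\
-1&0\\
0&-1\\
\end{bmatrix}
\begin{bmatrix}
3\\
14\\
\end{bmatrix}
+
\begin{bmatrix}
-1\\
-1\\
-1\\
-1\\
\end{bmatrix}
=
\begin{bmatrix}
2\\
13\\
-4\\
-15\\
\end{bmatrix},
\qquad
f(Z) =
\begin{bmatrix}
2\\
13\\
0\\
0\\
\end{bmatrix}
$$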

For the output layer:

- The weight matrix is $$V = \begin{bmatrix}
V_{11}&V_{21}&V_{31}&V_{41}\\
V_{12}&V_{22}&V_{32}&V_{42}\\
\end{bmatrix}
$$

- The bias/offset matrix is $$V_0 = \begin{bmatrix}
V_{01}\\
V_{02}\\
\end{bmatrix}
$$

The rest of the forward pass can be described as follows:

$$\mathrm{softmax}(V \times f(Z) + V_0) = \mathrm{softmax}(U) = \begin{bmatrix} 
\frac{e^{u_1}}{e^{u_1}+e^{u_2}}\\
\frac{e^{u_2}}{e^{u_1}+e^{u_2}}\\
\end{bmatrix}
$$
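Computing $e^{u}$ directly can overflow when the entries of $U$ are large. A common remedy, sketched below, is to subtract the largest entry before exponentiating, which leaves the result unchanged because the common factor cancels in the ratio. The name `stable_softmax` is hypothetical and the column-vector layout is an assumption chosen to match the math above.

```python
import numpy as np

def stable_softmax(u):
    """Softmax over axis 0 (each column is one observation)."""
    u = np.asarray(u, dtype=float)
    # Shift by the column maximum so the largest exponent is exp(0) = 1
    shifted = u - u.max(axis=0, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)

# Entries this large would overflow np.exp without the shift
p = stable_softmax(np.array([[1000.0], [1001.0]]))
```

The entries of `p` are finite, sum to 1, and the larger input gets the larger probability.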


Let's look at some code that implements the math above.

In [3]:
X = np.matrix([[3,14]])
W = np.matrix([[1,0],
             [0,1],
             [-1,0],
             [0,-1]])
W0 = np.matrix([[-1,-1,-1,-1]])
V = np.matrix([[1,1,1,1],
             [-1,-1,-1,-1]])
V0 = np.matrix([[0,2]])

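def relu(z):
    # element-wise max(z, 0); returns a new matrix instead of modifying z in place
    return np.maximum(z, 0)

def softmax(z):
    # exponentiate, then normalize each column so it sums to 1
    z = np.exp(z)
    z = z/z.sum(axis=0)
    return z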

In [4]:
X.shape

(1, 2)

In [5]:
X

matrix([[ 3, 14]])

In [6]:
X = X.reshape((2,1))

In [7]:
X

matrix([[ 3],
        [14]])

In [8]:
W.shape

(4, 2)

In [9]:
W0.shape

(1, 4)

In [10]:
W0 = W0.reshape((4,1))

In [11]:
Z = W@X+W0

In [12]:
Z

matrix([[  2],
        [ 13],
        [ -4],
        [-15]])

In [13]:
Z.shape

(4, 1)

In [14]:
V.shape

(2, 4)

In [15]:
V@Z

matrix([[-4],
        [ 4]])

In [16]:
V0.shape

(1, 2)

In [17]:
V0 = V0.reshape((2,1))

In [18]:
V@Z+V0

matrix([[-4],
        [ 6]])

In [19]:
V@relu(Z)+V0

matrix([[ 15],
        [-13]])

In [20]:
relu(V@relu(Z)+V0)

matrix([[15],
        [ 0]])

In [21]:
class NN():
    def __init__(self,X):
        self.W = np.matrix([[1,0],
                             [0,1],
                             [-1,0],
                             [0,-1]])
        self.W0 = np.matrix([[-1,-1,-1,-1]]).reshape((4,1))
        self.V = np.matrix([[1,1,1,1],
                         [-1,-1,-1,-1]])
        self.V0 = np.matrix([[0,2]]).reshape((2,1))
        self.X = X
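    def _relu(self,z):
        # element-wise max(z, 0) without modifying z in place
        return np.maximum(z, 0)
    def forward(self):
        # use the parameters stored on the instance, not the notebook globals
        Z = self.W@self.X+self.W0
        f_Z = self._relu(Z)
        U = self.V@f_Z+self.V0
        # ReLU is applied at the output here; the math above uses softmax
        f_U = self._relu(U)
        return f_U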

In [22]:
model = NN(X)

In [23]:
model.forward()

matrix([[15],
        [ 0]])
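The cells above apply ReLU at the output layer, while the math earlier in this section ends with a softmax. The sketch below shows the softmax variant under some assumptions: it uses plain `np.ndarray` vectors rather than `np.matrix`, the function name `forward` and its argument order are hypothetical, and the parameter values are copied from the cells above.

```python
import numpy as np

def forward(x, W, W0, V, V0):
    """Hidden layer with ReLU, output layer with softmax."""
    z = W @ x + W0              # hidden pre-activations, shape (4,)
    h = np.maximum(z, 0)        # ReLU, element-wise
    u = V @ h + V0              # output pre-activations, shape (2,)
    e = np.exp(u - u.max())     # shift by the max for numerical stability
    return e / e.sum()          # probabilities that sum to 1

W = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]], dtype=float)
W0 = np.array([-1, -1, -1, -1], dtype=float)
V = np.array([[1, 1, 1, 1], [-1, -1, -1, -1]], dtype=float)
V0 = np.array([0, 2], dtype=float)
x = np.array([3, 14], dtype=float)

probs = forward(x, W, W0, V, V0)
```

With these values the output pre-activations are 15 and -13, so nearly all of the probability mass lands on the first class.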

## Measuring loss

**Mean squared loss**
$MSL = \frac{1}{N}\sum^{N}_{i = 1}(\hat{y}_i-y_i)^2$

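**Cross entropy loss**
For an observation $o$ with $m$ classes, the loss is defined as
$CEL = -\sum^{m}_{c=1}y_{o,c}\log(p_{o,c})$
where $y_{o,c}$ is 1 if $c$ is the correct class for $o$ (0 otherwise) and $p_{o,c}$ is the predicted probability of class $c$.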

In [24]:
y_hat_reg = [10,12,11.5,9.8]
y_reg = [9,11,13,10]
y_hat_cls = [[0.3,0.3,0.4],[0.9,0.05,0.05],[0.10,0.8,0.10],[0.20,0.10,0.70]]
y_cls = [[0,0,1],[1,0,0],[1,0,0],[0,0,1]]

In [27]:
def mse(y,y_hat):
    err = 0
    for actual,pred in zip(y,y_hat):
        err+=(actual-pred)**2
    return err/len(y)      

In [28]:
mse(y_reg,y_hat_reg)

1.0725
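The same quantity can be computed without a loop. This is a minimal sketch assuming NumPy is available; `mse_vec` is a hypothetical name, and the data are the regression values defined above.

```python
import numpy as np

def mse_vec(y, y_hat):
    """Mean of the squared differences between predictions and targets."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean((y_hat - y) ** 2))

value = mse_vec([9, 11, 13, 10], [10, 12, 11.5, 9.8])
```

The squared errors are 1, 1, 2.25 and 0.04, so `value` is 4.29 / 4, approximately 1.0725.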

In [29]:
import math
def ce(y,y_hat):
    # mean cross entropy over all observations
    err = 0 
    for actuals,preds in zip(y,y_hat):
        for actual,pred in zip(actuals,preds):
            err+=actual*math.log(pred)
    # the leading minus sign from the definition makes the loss non-negative
    return -err/len(y)

In [30]:
ce(y_cls,y_hat_cls)

0.9202278211161898
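A vectorized version of the cross entropy, averaged over observations, might look like the sketch below. It keeps the leading minus sign from the definition, so the loss is non-negative. The name `cross_entropy` and the `eps` clipping, which guards against `log(0)`, are assumptions added for illustration; the data are the classification values defined above.

```python
import numpy as np

def cross_entropy(y, p, eps=1e-12):
    """Mean cross entropy; rows are observations, columns are classes."""
    y = np.asarray(y, dtype=float)
    # Clip predicted probabilities away from 0 so the log stays finite
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    per_observation = -np.sum(y * np.log(p), axis=1)
    return float(per_observation.mean())

y_cls = [[0, 0, 1], [1, 0, 0], [1, 0, 0], [0, 0, 1]]
y_hat_cls = [[0.3, 0.3, 0.4], [0.9, 0.05, 0.05], [0.10, 0.8, 0.10], [0.20, 0.10, 0.70]]
loss = cross_entropy(y_cls, y_hat_cls)
```

The third observation dominates the result: the model assigns only 0.10 to the correct class, which alone contributes about 2.30 before averaging.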