## Lecture 8: Introduction to feedforward neural networks

**Motivation**

The traditional linear and non-linear classifiers that we have seen so far map the data into a **fixed** feature representation. For example, in non-linear classification the classifier had the form:

$\text{sign}(\theta \cdot \phi(x))$, where $\phi(x)$ was always a fixed feature transformation. In neural networks we try to learn both:

1. The feature transformation $\phi(x)$
2. The ML task (Classification or Regression)
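Here is a minimal sketch of the contrast. `phi_fixed` is a hypothetical quadratic feature map with no parameters. `phi_learned` computes $f(Wx + W_0)$ with $f$ = ReLU, and a neural network would learn its weights `W` and `W0` together with `theta`. All numeric values are arbitrary assumptions, not trained weights.

```python
import numpy as np

def phi_fixed(x):
    # Hypothetical fixed feature map (for illustration only): the same
    # quadratic features for every dataset, nothing here is learned
    x1, x2 = x
    return np.array([x1, x2, x1**2, x2**2, x1 * x2])

def phi_learned(x, W, W0):
    # Parameterized feature map f(W x + W0) with f = ReLU;
    # in a neural network W and W0 are learned from data along with theta
    return np.maximum(W @ x + W0, 0)

def classify(theta, features):
    # Linear classifier sign(theta . phi(x)) on top of either feature map
    return np.sign(theta @ features)

x = np.array([3.0, 14.0])

# Fixed features: only theta is a parameter (values are arbitrary)
theta_fixed = np.array([1.0, -1.0, 0.0, 0.0, 0.0])
print(classify(theta_fixed, phi_fixed(x)))  # -1.0, since 3 - 14 < 0

# Learned features: W, W0 and theta are all parameters (values are arbitrary)
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
W0 = np.array([-1.0, -1.0, -1.0, -1.0])
theta_learned = np.array([1.0, 1.0, 1.0, 1.0])
print(classify(theta_learned, phi_learned(x, W, W0)))  # 1.0
```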

**Neural Network Units, Introduction to deep neural networks**

Refer to the slide deck `./decks/Neural Networks.pptx`. Slides 1 to 26 describe activation functions, how data flows through the network, etc.

The remaining slides explain what parameter learning means for neural networks and build intuition about loss functions and data flow. Refer to `./excel/Numerical Examples.xlsx` for a more detailed explanation.

**Coding a forward pass**

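The forward pass of a feedforward neural network can be represented as a sequence of matrix multiplications. Each multiplication is followed by adding a bias vector and applying an element-wise activation function.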
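The forward pass can be sketched as a loop over layers. The `layers` list of `(weights, bias, activation)` tuples and the 2-3-1 network below are hypothetical and serve only as an illustration. Inputs and activations are column vectors.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def identity(z):
    return z

def forward(x, layers):
    # x is a column vector; each layer maps it to the next layer's column vector
    a = x
    for weights, bias, activation in layers:
        z = weights @ a + bias   # matrix multiplication plus offset
        a = activation(z)        # element-wise non-linearity
    return a

# Hypothetical 2 -> 3 -> 1 network with arbitrary weights
layers = [
    (np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]),
     np.array([[0.0], [1.0], [-1.0]]),
     relu),
    (np.array([[1.0, 1.0, 1.0]]),
     np.array([[0.5]]),
     identity),
]

x = np.array([[2.0], [1.0]])
print(forward(x, layers))  # [[4.]]
```

To add a layer, you append one more tuple. The loop itself stays the same.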

![](./imgs/nn.png)

This network can be described as follows:

- Input vector = $X = (x_1, x_2)^T$, a column vector
- Weight matrix (hidden layer) = $$W = \begin{bmatrix}
W_{11} & W_{21}\\
W_{12} & W_{22}\\
W_{13} & W_{23}\\
W_{14} & W_{24}
\end{bmatrix}
$$
*Note: the subscripts map to the weights in the figure. $W_{ij}$ is the weight from input $x_i$ to hidden unit $j$, so row $j$ of $W$ holds the weights into hidden unit $j$.*

- Bias/offset matrix (hidden layer) = $$
W_0 = \begin{bmatrix}
W_{01}\\
W_{02}\\
W_{03}\\
W_{04}
\end{bmatrix}
$$

The forward pass through the hidden layer can then be written as:

$$W \times X + W_0= Z = \begin{bmatrix}
z_{1}\\
z_{2}\\
z_3\\
z_4\\
\end{bmatrix}
$$
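A quick numerical check of this equation, assuming the example values `X`, `W` and `W0` from the code later in this lecture. Here they are stored as column vectors rather than as transposed rows. The matrix product agrees with writing out each hidden unit as $z_j = W_{1j} x_1 + W_{2j} x_2 + W_{0j}$. This follows from row $j$ of $W$ being $(W_{1j}, W_{2j})$.

```python
import numpy as np

X = np.array([[3], [14]])                # column vector (x1, x2)
W = np.array([[1, 0],
              [0, 1],
              [-1, 0],
              [0, -1]])                  # row j holds (W_1j, W_2j)
W0 = np.array([[-1], [-1], [-1], [-1]])  # column vector of offsets

# Matrix form: Z = W X + W0, shape (4, 1)
Z = W @ X + W0

# Unit-by-unit form: z_j = W_1j * x1 + W_2j * x2 + W_0j
Z_loop = np.array([[W[j, 0] * X[0, 0] + W[j, 1] * X[1, 0] + W0[j, 0]]
                   for j in range(4)])

print(Z.ravel())                 # values: 2, 13, -4, -15
print(np.array_equal(Z, Z_loop))  # True
```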

Applying the activation function $f$ element-wise to $Z$ completes the forward pass through the hidden layer.

$$f(W \times X + W_0)= f(Z) = f(\begin{bmatrix}
z_{1}\\
z_{2}\\
z_3\\
z_4\\
\end{bmatrix}) = 
\begin{bmatrix}
f(z_{1})\\
f(z_{2})\\
f(z_3)\\
f(z_4)\\
\end{bmatrix}
$$
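Because $f$ acts on each entry independently, applying it to $Z$ takes a single vectorized call. This sketch assumes $Z = (2, 13, -4, -15)$, the value obtained from the example inputs in the code later in this lecture. It shows ReLU, the activation used in that code. It also shows the logistic sigmoid, included only as another example of an element-wise $f$.

```python
import numpy as np

Z = np.array([[2], [13], [-4], [-15]])  # hidden pre-activations

def relu(z):
    # max(z, 0) applied to every entry
    return np.maximum(z, 0)

def sigmoid(z):
    # 1 / (1 + e^(-z)) applied to every entry
    return 1.0 / (1.0 + np.exp(-z))

fZ_relu = relu(Z)
fZ_sigmoid = sigmoid(Z)

# Element-wise: the vectorized result matches applying f entry by entry
assert all(fZ_relu[j, 0] == max(Z[j, 0], 0) for j in range(4))

print(fZ_relu.ravel())   # values: 2, 13, 0, 0
print(fZ_sigmoid.shape)  # (4, 1), the same shape as Z
```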

For the output layer:

- The weight matrix is $$V = \begin{bmatrix}
V_{11} & V_{21} & V_{31} & V_{41}\\
V_{12} & V_{22} & V_{32} & V_{42}
\end{bmatrix}
$$

- The bias/offset matrix is $$V_0 = \begin{bmatrix}
V_{01}\\
V_{02}
\end{bmatrix}
$$
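The shapes of these matrices have to chain together. $W$ maps the 2 inputs to 4 hidden units, and $V$ maps those 4 hidden values to 2 outputs. The sketch below checks only the shapes, so the zero values are placeholders. Since $f$ is element-wise, $f(Z)$ has the same shape as $Z$, and the sketch omits it.

```python
import numpy as np

n_inputs, n_hidden, n_outputs = 2, 4, 2

# Placeholder zero matrices with the shapes implied by the definitions above
X = np.zeros((n_inputs, 1))
W = np.zeros((n_hidden, n_inputs))
W0 = np.zeros((n_hidden, 1))
V = np.zeros((n_outputs, n_hidden))
V0 = np.zeros((n_outputs, 1))

Z = W @ X + W0   # (4, 2) @ (2, 1) + (4, 1) -> (4, 1)
U = V @ Z + V0   # (2, 4) @ (4, 1) + (2, 1) -> (2, 1)
print(Z.shape, U.shape)  # (4, 1) (2, 1)

# An output matrix with the wrong inner dimension cannot consume 4 hidden values
mismatch_raised = False
try:
    np.zeros((n_outputs, 3)) @ Z
except ValueError as err:
    mismatch_raised = True
    print("shape error:", err)
```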

The rest of the forward pass can be described as follows:

$$\text{softmax}(V \times f(Z) + V_0) = \text{softmax}(U) = \begin{bmatrix}
\frac{e^{u_1}}{e^{u_1}+e^{u_2}}\\
\frac{e^{u_2}}{e^{u_1}+e^{u_2}}
\end{bmatrix}
$$
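The softmax formula can be checked numerically. This sketch runs the full forward pass with a softmax output. It uses the example values from the code later in this lecture, although that code applies ReLU at the output instead. It also relies on a property of softmax: subtracting the same constant from every $u_k$ leaves the result unchanged. This is the usual way to keep `np.exp` from overflowing. The names `softmax_direct` and `softmax_stable` are introduced here for illustration.

```python
import numpy as np

# Example values (the same ones used in the code later in this lecture)
X = np.array([[3], [14]])
W = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]])
W0 = np.array([[-1], [-1], [-1], [-1]])
V = np.array([[1, 1, 1, 1], [-1, -1, -1, -1]])
V0 = np.array([[0], [2]])

def relu(z):
    return np.maximum(z, 0)

def softmax_direct(u):
    # e^{u_k} / (e^{u_1} + e^{u_2}), exactly as in the formula above
    e = np.exp(u)
    return e / e.sum(axis=0)

def softmax_stable(u):
    # Subtracting max(u) multiplies numerator and denominator by the same
    # factor e^{-max(u)}, so the result is unchanged but np.exp cannot overflow
    e = np.exp(u - u.max(axis=0))
    return e / e.sum(axis=0)

U = V @ relu(W @ X + W0) + V0  # (u1, u2) = (15, -13)
p = softmax_stable(U)
print(p.ravel())               # first entry close to 1, second close to 0
assert np.allclose(p, softmax_direct(U))

# With large scores the direct version overflows (inf / inf = nan),
# while the shifted version stays finite
big = np.array([[1000.0], [999.0]])
print(softmax_stable(big).ravel())  # about 0.731 and 0.269
```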


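Let's look at some code that implements the above math. Note that it applies ReLU at both layers, including the output layer, so `softmax` is defined but not used here.

```python
import numpy as np

# Input stored as a row vector; X.T turns it into a (2, 1) column vector
X = np.array([[3, 14]])

# Hidden layer: weights (4x2) and offsets (stored as a 1x4 row, transposed below)
W = np.array([[1, 0],
              [0, 1],
              [-1, 0],
              [0, -1]])
W0 = np.array([[-1, -1, -1, -1]])

# Output layer: weights (2x4) and offsets (stored as a 1x2 row, transposed below)
V = np.array([[1, 1, 1, 1],
              [-1, -1, -1, -1]])
V0 = np.array([[0, 2]])

def relu(z):
    # Element-wise max(z, 0); returns a new array instead of modifying z in place
    return np.maximum(z, 0)

def softmax(z):
    # Column-wise softmax; subtracting the column max avoids overflow
    # in np.exp without changing the result
    e = np.exp(z - z.max(axis=0))
    return e / e.sum(axis=0)

# Hidden layer f(W X + W0), then output layer relu(V f(Z) + V0)
relu(V @ relu(W @ X.T + W0.T) + V0.T)
```

```
array([[15],
       [ 0]])
```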

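The same logic can also be encapsulated in a Python class.

```python
class NN():

    def __init__(self, W, W0, V, V0, X):
        # Store the parameters of both layers and the input
        self.W = W
        self.W0 = W0
        self.V = V
        self.V0 = V0
        self.X = X

    def relu(self, z):
        # Element-wise max(z, 0) without modifying z in place
        return np.maximum(z, 0)

    def forward(self):
        # Hidden layer, then output layer, both using self.relu
        hidden = self.relu(self.W @ self.X.T + self.W0.T)
        return self.relu(self.V @ hidden + self.V0.T)
```

```python
model = NN(W, W0, V, V0, X)
model.forward()
```

```
array([[15],
       [ 0]])
```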
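Because the forward pass uses matrix products, the same computation can process several inputs at once. Stacking inputs as rows of `X` turns `X.T` into a matrix with one column per input. NumPy then broadcasts the bias columns `W0.T` and `V0.T` across those columns. The sketch below assumes the same parameter values as above. The second and third input rows are made up for illustration. `forward` is a hypothetical free-function version of `NN.forward` that takes the input as an argument.

```python
import numpy as np

# Same parameters as in the code above
W = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]])
W0 = np.array([[-1, -1, -1, -1]])
V = np.array([[1, 1, 1, 1], [-1, -1, -1, -1]])
V0 = np.array([[0, 2]])

def relu(z):
    return np.maximum(z, 0)

def forward(X):
    # X has shape (n, 2), one input per row; the result has shape (2, n),
    # one column of outputs per input. W0.T and V0.T are column vectors
    # that NumPy broadcasts across the n columns.
    return relu(V @ relu(W @ X.T + W0.T) + V0.T)

# First row is the lecture's example input; the other two are made up
X_batch = np.array([[3, 14],
                    [1, 1],
                    [-2, 5]])
out = forward(X_batch)
print(out)  # columns: (15, 0), (0, 2), (5, 0)

# Each column matches a forward pass on that single input
for i in range(len(X_batch)):
    assert np.array_equal(out[:, [i]], forward(X_batch[[i]]))
```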