## Lecture 8: Introduction to feedforward neural networks

**Motivation**

The traditional linear and non-linear classifiers we have seen so far map the data into a **fixed** feature representation. For example, in non-linear classification the classifier had the form:

$\text{sign}(\theta \cdot \phi(x))$, where $\phi(x)$ was a fixed feature transformation chosen before training. In neural networks we try to learn both:

1. The feature transformation $\phi(x)$
2. The ML task (Classification or Regression)
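The sketch below makes the contrast concrete. It assumes a hand-picked quadratic map `phi_fixed` for the classical case and a one-hidden-layer ReLU map `phi_learned` (parameters `W`, `W0`) for the neural-network case. Both names and all numbers are illustrative assumptions, not taken from the lecture. In the classical setup only `theta` would be trained. In the neural network, `W` and `W0` would be trained along with `theta`.

```python
import numpy as np

def phi_fixed(x):
    # Hand-designed feature map: chosen before training and never changed
    return np.array([x[0], x[1], x[0] ** 2, x[1] ** 2, x[0] * x[1]])

def phi_learned(x, W, W0):
    # Parameterised feature map ReLU(W x + W0); W and W0 would be learned
    return np.maximum(W @ x + W0, 0)

def classify(theta, features):
    # The same linear decision rule sign(theta . phi(x)) on top of either map
    return np.sign(theta @ features)

x = np.array([1.0, -2.0])

# Classical: only theta is a parameter
theta_fixed = np.array([0.5, 0.5, 1.0, -1.0, 0.0])
print(classify(theta_fixed, phi_fixed(x)))   # -1.0

# Neural network: W, W0 and theta are all parameters
# (random values stand in for trained ones here)
rng = np.random.default_rng(0)
W, W0 = rng.normal(size=(4, 2)), rng.normal(size=4)
theta_nn = rng.normal(size=4)
print(classify(theta_nn, phi_learned(x, W, W0)))
```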

**Neural Network Units, Introduction to deep neural networks**

Refer to the slides in `./decks/Neural Networks.pptx`. Slides 1 to 26 describe activation functions, the flow of data through the network, and related ideas.

The remaining slides build intuition about what parameter learning means for neural networks, about loss functions, and about how data flows through the network. Refer to `./excel/Numerical Examples.xlsx` for more detailed numerical examples.

**Coding a forward pass**

The forward pass of a feedforward neural network can be written as a sequence of matrix multiplications. Each multiplication is followed by a bias addition and an element-wise activation function.

![](./imgs/nn.png)

This network can be described as follows:

- Input (column) vector = $X = (x_1, x_2)^T$
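- Weight Matrix (hidden layer) = $$W = \begin{bmatrix}
W_{11}&W_{21}\\
W_{12}&W_{22}\\
W_{13}&W_{23}\\
W_{14}&W_{24}\\
\end{bmatrix}
$$
*Note: the subscripts match the weight labels in the figure. With this layout, $W_{ij}$ (row $j$, column $i$) multiplies input $x_i$ when computing hidden unit $z_j$.*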

- Bias/offset Matrix (hidden layer) = $$
W_0 = \begin{bmatrix}
W_{01}\\
W_{02}\\
W_{03}\\
W_{04}\\
\end{bmatrix}
$$

The first step of the hidden layer's forward pass is a matrix multiplication plus the bias:

$$W \times X + W_0= Z = \begin{bmatrix}
z_{1}\\
z_{2}\\
z_3\\
z_4\\
\end{bmatrix}
$$
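As a check on the shapes, the block below computes the hidden-layer pre-activation with plain NumPy arrays. It uses the same weights and input ($X = (3, 14)^T$) as the code cells later in these notes. It uses `np.array` rather than `np.matrix`, because NumPy now recommends regular arrays. The loop at the end shows that each entry of $Z$ is just a weighted sum of the inputs plus a bias.

```python
import numpy as np

X = np.array([[3], [14]])                # input column vector, shape (2, 1)
W = np.array([[1, 0],
              [0, 1],
              [-1, 0],
              [0, -1]])                  # hidden-layer weights, shape (4, 2)
W0 = np.array([[-1], [-1], [-1], [-1]])  # hidden-layer biases, shape (4, 1)

Z = W @ X + W0                           # (4, 2) @ (2, 1) -> (4, 1), then add biases
print(Z.ravel().tolist())                # [2, 13, -4, -15]

# The same result one hidden unit at a time:
# z_j = sum_i W[j, i] * x_i + W0[j]  (array row j, column i)
Z_loop = [sum(W[j, i] * X[i, 0] for i in range(2)) + W0[j, 0] for j in range(4)]
print([int(v) for v in Z_loop])          # [2, 13, -4, -15]
```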

Applying the activation function $f$ element-wise to $Z$ completes the forward pass for the hidden layer:

$$f(W \times X + W_0)= f(Z) = f(\begin{bmatrix}
z_{1}\\
z_{2}\\
z_3\\
z_4\\
\end{bmatrix}) = 
\begin{bmatrix}
f(z_{1})\\
f(z_{2})\\
f(z_3)\\
f(z_4)\\
\end{bmatrix}
$$
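Because $f$ acts on each entry separately, NumPy's element-wise functions can apply it to the whole vector at once. The sketch below assumes ReLU and the logistic sigmoid as example choices of $f$. It applies them to the $Z$ that the code cells later in these notes produce.

```python
import numpy as np

Z = np.array([[2], [13], [-4], [-15]])   # hidden-layer pre-activations, shape (4, 1)

def relu(z):
    # max(z, 0) entry by entry; returns a new array and leaves z untouched
    return np.maximum(z, 0)

def sigmoid(z):
    # 1 / (1 + e^{-z}) entry by entry
    return 1.0 / (1.0 + np.exp(-z))

print(relu(Z).ravel().tolist())   # [2, 13, 0, 0]
print(sigmoid(Z).shape)           # (4, 1): the shape of Z is preserved
```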

For the output layer:

- The weight matrix is $$V = \begin{bmatrix}
V_{11}&V_{21}&V_{31}&V_{41}\\
V_{12}&V_{22}&V_{32}&V_{42}\\
\end{bmatrix}
$$

- The bias/offset matrix is $$V_0 = \begin{bmatrix}
V_{01}\\
V_{02}\\
\end{bmatrix}
$$

The rest of the forward pass can be described as follows:

$$\text{softmax}(V \times f(Z) + V_0) = \text{softmax}(U) = \begin{bmatrix} 
\frac{e^{u_1}}{e^{u_1}+e^{u_2}}\\
\frac{e^{u_2}}{e^{u_1}+e^{u_2}}\\
\end{bmatrix}
$$
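The output layer can be sketched the same way. This sketch assumes ReLU as the hidden activation $f$ and reuses the weights from the code cells later in these notes. The `softmax` here subtracts the largest entry before exponentiating. This leaves the result unchanged, because the common factor $e^{-\max_k u_k}$ cancels between numerator and denominator, but it prevents overflow when some $u_k$ is large.

```python
import numpy as np

f_Z = np.array([[2], [13], [0], [0]])   # ReLU of the hidden pre-activations Z
V = np.array([[1, 1, 1, 1],
              [-1, -1, -1, -1]])        # output weights, shape (2, 4)
V0 = np.array([[0], [2]])               # output biases, shape (2, 1)

def softmax(u):
    # e^{u_k} / sum_j e^{u_j}, computed column by column
    e = np.exp(u - u.max(axis=0))       # shift by the column max for numerical stability
    return e / e.sum(axis=0)

U = V @ f_Z + V0                        # (2, 4) @ (4, 1) + (2, 1) -> (2, 1)
probs = softmax(U)
print(U.ravel().tolist())               # [15, -13]
print(probs.sum())                      # probabilities sum to 1 (up to rounding)
```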


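Let's look at some code that implements the math above. Note that the final cells apply ReLU at the output layer instead of the softmax used in the equations.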

In [1]:
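import numpy as np
X = np.matrix([[3,14]])           # input as a 1x2 row; reshaped to a (2, 1) column below
W = np.matrix([[1,0],
               [0,1],
               [-1,0],
               [0,-1]])           # hidden-layer weights, shape (4, 2)
W0 = np.matrix([[-1,-1,-1,-1]])   # hidden-layer biases; reshaped to (4, 1) below
V = np.matrix([[1,1,1,1],
               [-1,-1,-1,-1]])    # output-layer weights, shape (2, 4)
V0 = np.matrix([[0,2]])           # output-layer biases; reshaped to (2, 1) below

def relu(z):
    # Element-wise max(z, 0); returns a new matrix instead of overwriting z in place
    return np.maximum(z, 0)

def softmax(z):
    # Column-wise softmax; subtracting the column max leaves the result unchanged
    # but keeps np.exp from overflowing for large inputs
    z = np.exp(z - z.max(axis=0))
    return z / z.sum(axis=0)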
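A ReLU written as `z[z<0]=0` writes zeros into the array it receives, so the caller's variable changes too. `np.maximum(z, 0)` returns a new array instead. The two function names below are hypothetical and exist only to contrast the behaviours.

```python
import numpy as np

def relu_inplace(z):
    # Writes zeros into the caller's array: the argument itself is modified
    z[z < 0] = 0
    return z

def relu_copy(z):
    # Builds a new array; the argument is left as it was
    return np.maximum(z, 0)

Z = np.array([2, 13, -4, -15])
relu_copy(Z)
print(Z.tolist())      # [2, 13, -4, -15]  (unchanged)
relu_inplace(Z)
print(Z.tolist())      # [2, 13, 0, 0]     (overwritten)
```

With the in-place version, running a cell such as `V@relu(Z)+V0` silently changes `Z` for every later cell. This is easy to miss in a notebook.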

In [2]:
X.shape

(1, 2)

In [3]:
X

matrix([[ 3, 14]])

In [4]:
X = X.reshape((2,1))

In [5]:
X

matrix([[ 3],
        [14]])

In [6]:
W.shape

(4, 2)

In [7]:
W0.shape

(1, 4)

In [8]:
W0 = W0.reshape((4,1))

In [9]:
Z = W@X+W0

In [10]:
Z

matrix([[  2],
        [ 13],
        [ -4],
        [-15]])

In [11]:
Z.shape

(4, 1)

In [12]:
V.shape

(2, 4)

In [13]:
V@Z

matrix([[-4],
        [ 4]])

In [14]:
V0.shape

(1, 2)

In [15]:
V0 = V0.reshape((2,1))

In [16]:
V@Z+V0

matrix([[-4],
        [ 6]])

In [17]:
V@relu(Z)+V0

matrix([[ 15],
        [-13]])

In [18]:
relu(V@relu(Z)+V0)

matrix([[15],
        [ 0]])
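To match the softmax output layer in the equations, the whole forward pass can be sketched as one function. This sketch assumes the same weights and ReLU hidden layer as the cells above. The function name `forward` and the batch example are illustrative. Stacking several inputs as columns of `X` lets the same matrix products process them all at once; the biases broadcast across the columns.

```python
import numpy as np

W = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]])   # hidden weights, (4, 2)
W0 = np.array([[-1], [-1], [-1], [-1]])            # hidden biases, (4, 1)
V = np.array([[1, 1, 1, 1], [-1, -1, -1, -1]])     # output weights, (2, 4)
V0 = np.array([[0], [2]])                          # output biases, (2, 1)

def relu(z):
    return np.maximum(z, 0)

def softmax(u):
    e = np.exp(u - u.max(axis=0))   # column-wise, shifted by the max for stability
    return e / e.sum(axis=0)

def forward(X):
    # X holds one input per column, shape (2, n); returns (2, n) class probabilities
    f_Z = relu(W @ X + W0)          # hidden layer
    return softmax(V @ f_Z + V0)    # output layer with softmax, as in the equations

X = np.array([[3], [14]])
print(forward(X).ravel())

# Several inputs at once: stack them as columns
X_batch = np.array([[3, 0],
                    [14, 0]])
print(forward(X_batch).shape)       # (2, 2): one column of probabilities per input
```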

### Introduction to OOP

We will write neural networks using Python classes. First we will learn how to write basic Python classes. Then we will reuse the forward-pass logic developed so far to create a class that performs a forward pass.

In [19]:
class MLMIT():
    def __init__(self,session_name,duration):
        self.session_name=session_name
        self.duration=duration
    def display(self):
        print("The session names are {}".format(self.session_name))
        print("The durations are {}".format(self.duration))
c = MLMIT(session_name=["Maths","Linear Classifiers","Recommendation Engines","Kernels"],
          duration=[90,90,90,90])
c.display()
print(dir(c))

The session names are ['Maths', 'Linear Classifiers', 'Recommendation Engines', 'Kernels']
The durations are [90, 90, 90, 90]
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'display', 'duration', 'session_name']


In [20]:
c.session_name

['Maths', 'Linear Classifiers', 'Recommendation Engines', 'Kernels']

In [21]:
c.duration

[90, 90, 90, 90]

#### Class Exercise:
Create a Python class that takes the session names and durations as input and implements a method called `avg_duration()` that computes the average duration of all the sessions.

In [22]:
class MLMIT():
    def __init__(self,session_name,duration):
        self.session_name = session_name
        self.duration = duration
    
    def avg_duration(self):
        n = len(self.duration)
        avg = sum(self.duration)/n
        return avg

In [23]:
c = MLMIT(session_name=["Maths","Linear Classifiers","Recommendation Engines","Kernels"],
          duration=[90,90,90,90])

In [24]:
c.avg_duration()

90.0
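Here is one possible refinement of the exercise solution, offered as a sketch rather than the expected answer. It rejects mismatched inputs and raises a clear error for an empty course instead of a `ZeroDivisionError`. It also adds a second method to show that methods share the instance's state. The class name `Course` and the method `longest_session` are hypothetical.

```python
class Course:
    """Hypothetical variant of the MLMIT exercise class with basic input checks."""

    def __init__(self, session_name, duration):
        if len(session_name) != len(duration):
            raise ValueError("need exactly one duration per session")
        self.session_name = list(session_name)
        self.duration = list(duration)

    def avg_duration(self):
        # Mean of all session durations; an empty course has no average
        if not self.duration:
            raise ValueError("no sessions to average")
        return sum(self.duration) / len(self.duration)

    def longest_session(self):
        # Name of the session with the largest duration (the first one on ties)
        return self.session_name[self.duration.index(max(self.duration))]


c = Course(["Maths", "Linear Classifiers", "Kernels"], [90, 60, 90])
print(c.avg_duration())      # 80.0
print(c.longest_session())   # Maths
```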

The same forward-pass logic can also be encapsulated in a Python class.

In [25]:
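class NN:
    def __init__(self,X):
        # Fixed example weights from the cells above (not learned)
        self.W = np.matrix([[1,0],
                            [0,1],
                            [-1,0],
                            [0,-1]])
        self.W0 = np.matrix([[-1,-1,-1,-1]]).reshape((4,1))
        self.V = np.matrix([[1,1,1,1],
                            [-1,-1,-1,-1]])
        self.V0 = np.matrix([[0,2]]).reshape((2,1))
        self.X = X
    def _relu(self,z):
        # Element-wise max(z, 0); returns a new matrix instead of modifying z
        return np.maximum(z, 0)
    def forward(self):
        # Use the instance's own parameters (self.*), not the global variables
        Z = self.W @ self.X + self.W0
        f_Z = self._relu(Z)
        U = self.V @ f_Z + self.V0
        f_U = self._relu(U)
        return f_U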

In [26]:
model = NN(X)

In [27]:
model.forward()

matrix([[15],
        [ 0]])
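The class above hard-codes the weights and stores a single input. The sketch below is slightly more reusable: the parameters go to the constructor and the input goes to `forward`. This lets one set of weights process many inputs, stacked as columns. The class name `TwoLayerNet` and the `output` option are hypothetical.

```python
import numpy as np

class TwoLayerNet:
    """Hypothetical one-hidden-layer network: ReLU hidden layer, ReLU or softmax output."""

    def __init__(self, W, W0, V, V0):
        self.W, self.W0 = np.asarray(W), np.asarray(W0)   # hidden layer: (h, d), (h, 1)
        self.V, self.V0 = np.asarray(V), np.asarray(V0)   # output layer: (k, h), (k, 1)

    @staticmethod
    def _relu(z):
        return np.maximum(z, 0)

    @staticmethod
    def _softmax(u):
        e = np.exp(u - u.max(axis=0))   # column-wise, shifted for stability
        return e / e.sum(axis=0)

    def forward(self, X, output="relu"):
        # X has one input per column, shape (d, n)
        f_Z = self._relu(self.W @ X + self.W0)
        U = self.V @ f_Z + self.V0
        if output == "softmax":
            return self._softmax(U)
        if output == "relu":
            return self._relu(U)
        raise ValueError("output must be 'relu' or 'softmax'")


model = TwoLayerNet(W=[[1, 0], [0, 1], [-1, 0], [0, -1]],
                    W0=[[-1], [-1], [-1], [-1]],
                    V=[[1, 1, 1, 1], [-1, -1, -1, -1]],
                    V0=[[0], [2]])
X = np.array([[3], [14]])
print(model.forward(X).ravel().tolist())   # [15, 0], the same values as the NN class above
print(model.forward(X, output="softmax").sum())
```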