### 10.4: Neural Networks: Multilayer Perceptron Part 1 - The Nature of Code

[Playlist link](https://www.youtube.com/watch?v=ntKn5TPHHAk&list=PLRqwX-V7Uu6Y7MdSCaIfsxc561QI0U0Tb&index=2)

[README link](https://github.com/shiffman/NOC-S17-2-Intelligence-Learning/tree/master/week4-neural-networks)

---

#### Some recap

We had a canvas full of points, with a line separating the 2 classes of points, and a simple perceptron trying to learn that line.

The scenario looked like this:

![](./data/img/diag20.png)

#### Why is this not enough?

**What if we want any number of inputs to generate any number of outputs?** This is very common in machine learning applications.
Take a classic classification problem.

Say we have a handwritten digit (say 8) and all of its 28x28 pixels. These 784 greyscale values will be the inputs to the Perceptron. The output will be a set of probabilities, one for each digit it could be.

![](./data/img/diag21.png)
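A minimal sketch of how such an image could become the input array. It assumes pixel values run from 0 to 255 and scales them into 0..1; that scaling is an assumption here, not something the lecture specifies, and `flattenImage` is a hypothetical helper. With 10 possible digits, the output layer would have 10 neurons, one probability per digit.

```javascript
// Hypothetical helper: flatten a 28x28 greyscale image into 784 inputs.
// Assumption: pixels are integers from 0 (black) to 255 (white).
function flattenImage(pixels) {
  const inputs = [];
  for (const row of pixels) {
    for (const value of row) {
      // scale each pixel into the range 0..1
      inputs.push(value / 255);
    }
  }
  return inputs;
}

// A blank 28x28 image (all zeros) stands in for a real digit here.
const image = Array.from({ length: 28 }, () => new Array(28).fill(0));
const inputs = flattenImage(image);
console.log(inputs.length); // 784
```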

But why can't we simply have a whole bunch of inputs and a whole bunch of outputs, with a single processing unit in between?



#### The need for a Multilayer Perceptron

A single Perceptron can only solve **linearly separable problems**.

When we used a Perceptron in our previous example to classify the data, we were dealing with a linearly separable problem: it was possible to draw a straight line that separated the 2 classes of data.

In 3D, a plane divides the space in half, so data that a plane can separate is also linearly separable.

But most interesting problems are not linearly separable.

![](./data/img/diag22.png)

Here one class of data clusters near the center of the canvas. No single straight line can separate it from the rest, so this data is not linearly separable.



#### Simple use case for a Multilayer Perceptron

Consider the Boolean expression A && B and its truth table below:

![](./data/img/diag23.png)


This is a linearly separable problem, as shown by the dotted line in the diagram.

This means we can create a Perceptron with 2 inputs, each of which can be true (T) or false (F), and one output.

The same goes for A || B: OR is also linearly separable.

So Perceptrons can learn both A && B and A || B.

![](./data/img/diag24.png)


There is another Boolean expression called XOR. It is true only when one input is T and the other is F.

However, no single line can divide the Ts from the Fs.

![](./data/img/diag25.png)

**So XOR is not a linearly separable prob**

However, we can draw **2 lines** to separate the Ts and Fs. Since a single Perceptron can only draw one line, it cannot solve even an operation as simple as XOR.

![](./data/img/diag26.png)
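To see the difference concretely, the sketch below tries every weight and bias combination on a small grid for a single step-activated perceptron. It is a brute-force illustration, not a proof: several combinations reproduce AND, but none reproduce XOR, and a finer grid would not help, because no single line separates XOR's Ts from its Fs. The grid range and the step activation are assumptions chosen for the example.

```javascript
// A single perceptron with a step activation: outputs 1 when
// w1*a + w2*b + bias > 0, otherwise 0.
function perceptron(w1, w2, bias, a, b) {
  return w1 * a + w2 * b + bias > 0 ? 1 : 0;
}

// Truth tables as [a, b, expected] rows (1 = T, 0 = F).
const AND = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]];
const XOR = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]];

// Count weight/bias combinations on a grid (-2 to 2 in steps of 0.5)
// that reproduce every row of a truth table.
function countSolutions(table) {
  const grid = [];
  for (let v = -2; v <= 2; v += 0.5) grid.push(v);
  let count = 0;
  for (const w1 of grid) {
    for (const w2 of grid) {
      for (const bias of grid) {
        if (table.every(([a, b, out]) => perceptron(w1, w2, bias, a, b) === out)) {
          count++;
        }
      }
    }
  }
  return count;
}

console.log(countSolutions(AND) > 0); // true (e.g. w1 = 1, w2 = 1, bias = -1.5)
console.log(countSolutions(XOR));     // 0
```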

---

Say we have a perceptron that knows how to solve AND. Since AND is linearly separable, !AND, i.e. **NAND**, is also linearly separable: the same line works, with the T and F sides swapped.
Say we also have another perceptron that knows how to solve OR.

**More complex problems that are not linearly separable can be solved by linking multiple perceptrons together.**

![](./data/img/diag27.png)

Suppose A and B are the inputs.

The NAND (!AND) perceptron computes $\overline{AB}$

The OR perceptron computes $A + B$

The final AND perceptron computes

$\overline{AB}\cdot (A + B)$


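Expanding with De Morgan's law ($\overline{AB} = \overline{A} + \overline{B}$) and using $A\overline{A} = B\overline{B} = 0$:

$\overline{AB}\cdot (A + B) = (\overline{A} + \overline{B})(A + B) = \overline{A}A + \overline{A}B + A\overline{B} + B\overline{B} = \overline{A}B + A\overline{B} = A \oplus B$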
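The same construction can be checked in code. The weights and biases below are hand-picked assumptions, not values from the lecture; any weights that draw the right lines would work, and a trained network would find its own.

```javascript
// Step-activated perceptron: 1 if bias + weighted sum of inputs is positive.
function perceptron(weights, bias, inputs) {
  let sum = bias;
  for (let i = 0; i < inputs.length; i++) {
    sum += weights[i] * inputs[i];
  }
  return sum > 0 ? 1 : 0;
}

// Hand-picked weights (an assumption; training could find others).
const nand = (a, b) => perceptron([-1, -1], 1.5, [a, b]);
const or = (a, b) => perceptron([1, 1], -0.5, [a, b]);
const and = (a, b) => perceptron([1, 1], -1.5, [a, b]);

// Hidden layer: NAND and OR. Output layer: AND of the two hidden outputs.
function xor(a, b) {
  const h1 = nand(a, b);
  const h2 = or(a, b);
  return and(h1, h2);
}

for (const [a, b] of [[0, 0], [0, 1], [1, 0], [1, 1]]) {
  console.log(a, b, xor(a, b));
}
// 0 0 0
// 0 1 1
// 1 0 1
// 1 1 0
```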

The network in the diagram is a 3-layer neural network (NN).

It has an input layer, a hidden layer and an output layer.

The hidden layer consists of the neurons that sit between the inputs and the outputs. They are called hidden because the user of the system does not see them: the user simply feeds in data and looks at the output. But the hidden layer is where the magic happens; it is what allows us to get around the linear separability constraint. More hidden layers and more neurons mean more complexity, and more weights and parameters that need to be tweaked.

---

### 10.5: Neural Networks: Multilayer Perceptron Part 2 - The Nature of Code

[GitHub link 1](https://github.com/CodingTrain/website/tree/master/Courses/natureofcode/10.18-toy_neural_network)
[GitHub link 2](https://github.com/CodingTrain/Toy-Neural-Network-JS/)

Here we want to design the basic architecture of a neural network library.

We want to create 3 things:

1. Input layer
2. Hidden layer
3. Output layer

The way we want to design this library is:

    new NeuralNetwork(num_ip_neurons, num_hidden_neurons, num_op_neurons)
    
Say there are going to be 3 input neurons, 4 hidden neurons and 2 output neurons.

Consider the house price prediction problem. The 3 inputs would then be the number of bedrooms, the number of bathrooms and the square footage.

We want a **Fully Connected Network**. This means that every input is connected to every hidden neuron, and every hidden neuron is connected to every output neuron.

So the 3 inputs come in, the data feeds forward, and the 2 outputs come out.

![](./data/img/diag28.png)

This is the overall structure of our **Multi Layered Feed Forward Fully Connected Perceptron**
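Counting connections makes "fully connected" concrete. This sketch uses the 3-4-2 example; `connections` is a hypothetical helper that just lists every link, and each link carries one weight.

```javascript
// Hypothetical helper: list every connection in a fully connected
// network with one hidden layer.
function connections(numI, numH, numO) {
  const links = [];
  // every input connects to every hidden neuron
  for (let i = 0; i < numI; i++) {
    for (let h = 0; h < numH; h++) links.push(`input ${i} -> hidden ${h}`);
  }
  // every hidden neuron connects to every output neuron
  for (let h = 0; h < numH; h++) {
    for (let o = 0; o < numO; o++) links.push(`hidden ${h} -> output ${o}`);
  }
  return links;
}

const links = connections(3, 4, 2);
console.log(links.length); // 20 (3 * 4 + 4 * 2), one weight per connection
```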

First we figure out the shape of our inputs: that determines how many neurons the input layer needs. The shape of the output we want determines how many output neurons we need. How many hidden neurons to use is more of an open question.

This is also an oversimplification of what neural network architectures can look like. This is by definition a 3-layer network, and this library will only allow 3-layer networks, but many neural network systems need multiple hidden layers.

Setting things up in code:

``` javascript
// constructor function

function NeuralNetwork(numI, numH, numO){
    this.input_nodes = numI;
    this.hidden_nodes = numH;
    this.output_nodes = numO;

}
```



#### Feed Forward Process

Let's pretend our neural network is designed to solve the house price prediction problem.

The 3 inputs are then the number of bedrooms (say 3), the number of bathrooms (say 2) and the square footage (say 1000).

![](./data/img/diag29.png)

The number 3 comes in through the first input neuron. Each connection between the inputs and the hidden neurons has a weight associated with it. Ultimately we want to tweak those weights to get good results (outputs).

Initially, these weights will have random values between -1 and +1.

Let's examine all the connections flowing into the first hidden neuron. This neuron has 3 connections flowing in; say their weights are 0.5, -0.5 and 1.

The hidden neuron computes a **weighted sum**. For the first hidden neuron:

$3 \times 0.5 + 2 \times (-0.5) + 1000 \times 1 = 1000.5$

Because the square footage is a much larger number than the other inputs, it dominates the sum, so we have to normalize the inputs.
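The arithmetic above, plus one way to normalize, as a short sketch. Min-max scaling is only one option, and the `[min, max]` ranges are made-up assumptions rather than real housing data.

```javascript
// Weighted sum of a neuron's inputs.
function weightedSum(inputs, weights) {
  let sum = 0;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sum;
}

const weights = [0.5, -0.5, 1];
const raw = [3, 2, 1000]; // bedrooms, bathrooms, square footage
console.log(weightedSum(raw, weights)); // 1000.5

// Min-max normalization maps each input into 0..1.
// Assumed ranges for illustration only: [min, max] per input.
const ranges = [[0, 10], [0, 5], [0, 5000]];
const normalized = raw.map((x, i) => (x - ranges[i][0]) / (ranges[i][1] - ranges[i][0]));
console.log(normalized); // [ 0.3, 0.4, 0.2 ]
// Now no single input dominates the weighted sum.
console.log(weightedSum(normalized, weights));
```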

![](./data/img/diag30.png)

Once the weighted sum is computed, it is passed through an **Activation Function**, and the result is passed on to the outputs.
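As an example, here is the sigmoid function, one common choice of activation function. Using it here is an assumption, since the text so far does not name a specific function. Note how the unnormalized sum from above saturates it completely.

```javascript
// Sigmoid squashes any number into the range (0, 1).
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

console.log(sigmoid(0));      // 0.5
console.log(sigmoid(1000.5)); // 1: a huge, unnormalized sum saturates
console.log(sigmoid(-2));     // a value between 0 and 0.5
```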

#### Data Structure for storing the weighted connections

We use a **Matrix**

The idea here is to **store the weighted connections in a matrix. The inputs form an array. We multiply the matrix of weights by the input array to generate the outputs of the hidden layer.**

Let's look at a simpler diagram with fewer neurons to understand this.

![](./data/img/diag31.png)

$x_1$ and $x_2$ are the 2 inputs.

1 and 2 are the hidden layer neurons.

$w_{ij}$: the weight between input $i$ and hidden neuron $j$.

The weight matrix (row $j$ holds the weights flowing into hidden neuron $j$):

$$\begin{bmatrix}w_{11} & w_{21} \\ w_{12} & w_{22} \end{bmatrix}$$

The input matrix:

$$\begin{bmatrix}x_{1} \\ x_{2} \end{bmatrix}$$

Since it has only one column, it is also called a **Vector**.

Now we need the weighted sum that comes out of neuron 1. This weighted sum is:

$x_{1}\times w_{11} + x_{2}\times w_{21} = h_{1}$

The weighted sum of neuron 2 is:

$x_{1}\times w_{12} + x_{2}\times w_{22} = h_{2}$

Together, these are exactly the matrix-vector product:

$\begin{bmatrix}w_{11} & w_{21} \\w_{12} & w_{22} \end{bmatrix}\times\begin{bmatrix}x_{1} \\x_{2} \end{bmatrix}=\begin{bmatrix}h_{1} \\h_{2} \end{bmatrix}$
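The same computation as a small sketch. The weight values are made up, and `matVec` is a hypothetical helper: each row of `W` holds the weights flowing into one hidden neuron, in the layout of the matrix above.

```javascript
// Multiply a weight matrix (one row per hidden neuron, one column per
// input) by an input vector, giving every hidden neuron's weighted sum.
function matVec(weights, inputs) {
  return weights.map(row => {
    let sum = 0;
    for (let i = 0; i < inputs.length; i++) {
      sum += row[i] * inputs[i];
    }
    return sum;
  });
}

// Made-up example values: row 1 = [w11, w21], row 2 = [w12, w22].
const W = [
  [0.5, -1],
  [2, 0.25],
];
const x = [4, 2]; // [x1, x2]
console.log(matVec(W, x)); // [ 0, 8.5 ]
```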



#### Understanding the Math - Linear Algebra + Coding the basic functions

There are 2 key concepts in Linear Algebra:

- Vector

- Matrix

Vector Dot Product:

$\begin{bmatrix}a_{x} & a_{y}  \end {bmatrix}\cdot\begin{bmatrix}b_{x} \\b_{y} \end{bmatrix}=a_{x}b_{x} + a_{y}b_{y}$
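In code, assuming vectors are plain arrays. Each entry of the matrix-vector product above is a dot product of one row of the weight matrix with the input vector.

```javascript
// Dot product of two equal-length vectors: multiply matching
// components and add up the results.
function dot(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

console.log(dot([3, 4], [2, -1])); // 3*2 + 4*(-1) = 2
```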

Creating a Simple Matrix library:

We want a constructor function that creates a matrix with the specified number of rows and columns and initializes each element of the matrix to 0:

``` javascript

function Matrix(rows, cols){
    this.rows = rows;
    this.cols = cols;
    this.matrix = [];

    // loop through each row
    for (var i = 0; i < this.rows; ++i){
        // every single row is also an array
        this.matrix[i] = [];
        for (var j = 0; j < this.cols; ++j){
            // initialize each value to 0
            this.matrix[i][j] = 0;
        }
    }
}

```

Coding the scalar functions:

``` javascript

// scalar addition function

Matrix.prototype.add = function(n){
    for(var i = 0; i < this.rows; ++i){
        for(var j = 0; j < this.cols; ++j){
            this.matrix[i][j] += n;
        }
    }
}

// scalar multiplication function

Matrix.prototype.multiply = function(n){
    for(var i = 0; i < this.rows; ++i){
        for(var j = 0; j < this.cols; ++j){
            this.matrix[i][j] *= n;
        }
    }
}

```
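A quick usage check. The constructor and the two scalar functions are copied in so the snippet runs on its own; the only change is that the constructor's inner loop is replaced by `fill(0)`, which does the same thing. Both functions change the matrix in place and return nothing.

```javascript
// Copy of the Matrix constructor (inner loop replaced by fill(0)).
function Matrix(rows, cols) {
  this.rows = rows;
  this.cols = cols;
  this.matrix = [];
  for (var i = 0; i < rows; ++i) {
    this.matrix[i] = new Array(cols).fill(0);
  }
}

// scalar addition, in place
Matrix.prototype.add = function(n) {
  for (var i = 0; i < this.rows; ++i) {
    for (var j = 0; j < this.cols; ++j) {
      this.matrix[i][j] += n;
    }
  }
};

// scalar multiplication, in place
Matrix.prototype.multiply = function(n) {
  for (var i = 0; i < this.rows; ++i) {
    for (var j = 0; j < this.cols; ++j) {
      this.matrix[i][j] *= n;
    }
  }
};

var m = new Matrix(2, 3);
m.add(2);      // every element becomes 0 + 2 = 2
m.multiply(5); // every element becomes 2 * 5 = 10
console.log(m.matrix); // [ [ 10, 10, 10 ], [ 10, 10, 10 ] ]
```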

Randomly populate values in the matrix:

``` javascript

Matrix.prototype.randomize = function(){
    for(var i = 0; i < this.rows; ++i){
        for(var j = 0; j < this.cols; ++j){
            // random starting weight between -1 (inclusive) and +1 (exclusive)
            this.matrix[i][j] = Math.random() * 2 - 1;
        }
    }
}

```

**A really cool way to view Matrices in the console:**
    
    var m = new Matrix(3,2);
    console.table(m.matrix);

---

Now we will look at **elementwise operations**.

When adding or subtracting 2 matrices elementwise, we need to check that they have the same dimensions.

We want to keep a single add() function that can receive either a scalar or a matrix as input and compute the result accordingly.

``` javascript

Matrix.prototype.add = function(n){
    if (n instanceof Matrix){
        // n is a matrix: both matrices must have the same dimensions
        if (n.rows !== this.rows || n.cols !== this.cols){
            throw new Error('Matrix dimensions must match for elementwise addition');
        }
        for(var i = 0; i < this.rows; ++i){
            for(var j = 0; j < this.cols; ++j){
                this.matrix[i][j] += n.matrix[i][j];
            }
        }
    } else {
        // n is just a scalar number
        for(var i = 0; i < this.rows; ++i){
            for(var j = 0; j < this.cols; ++j){
                this.matrix[i][j] += n;
            }
        }
    }
}

```
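The `add()` above modifies the matrix in place. For comparison, here is a hypothetical non-mutating helper on plain 2D arrays (not part of the library being built) that returns a new array and leaves its inputs unchanged. It assumes rectangular arrays, so it only compares the lengths of the first rows.

```javascript
// Hypothetical helper: elementwise sum of two plain 2D arrays,
// returning a new array instead of mutating either input.
function addMatrices(a, b) {
  if (a.length !== b.length || a[0].length !== b[0].length) {
    throw new Error('Matrices must have the same dimensions');
  }
  return a.map((row, i) => row.map((value, j) => value + b[i][j]));
}

const a = [[1, 2], [3, 4]];
const b = [[10, 20], [30, 40]];
console.log(addMatrices(a, b)); // [ [ 11, 22 ], [ 33, 44 ] ]
console.log(a);                 // unchanged: [ [ 1, 2 ], [ 3, 4 ] ]
```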

Hadamard Product: the elementwise product of 2 matrices with the same dimensions

![](https://wikimedia.org/api/rest_v1/media/math/render/svg/06e3f6abf1511656029ce58b89695b687789aa9c)

``` javascript

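Matrix.prototype.multiply = function(n){
    if (n instanceof Matrix){
        // Hadamard product: both matrices must have the same dimensions
        if (n.rows !== this.rows || n.cols !== this.cols){
            throw new Error('Matrix dimensions must match for the Hadamard product');
        }
        for(let i = 0; i < this.rows; ++i){
            for(let j = 0; j < this.cols; ++j){
                this.matrix[i][j] *= n.matrix[i][j];
            }
        }
    } else {
        // n is a scalar: multiply every element by it
        for(let i = 0; i < this.rows; ++i){
            for(let j = 0; j < this.cols; ++j){
                this.matrix[i][j] *= n;
            }
        }
    }
}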

```
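The Hadamard product is easy to confuse with ordinary matrix multiplication, the kind used for the weights earlier. This sketch computes both on the same 2x2 matrices with two hypothetical helpers on plain 2D arrays, so the difference is visible.

```javascript
// Hypothetical helper: elementwise (Hadamard) product.
function hadamard(a, b) {
  if (a.length !== b.length || a[0].length !== b[0].length) {
    throw new Error('Matrices must have the same dimensions');
  }
  return a.map((row, i) => row.map((value, j) => value * b[i][j]));
}

// Hypothetical helper: ordinary matrix product.
// a is n x m, b is m x p; the result is n x p.
function matMul(a, b) {
  if (a[0].length !== b.length) {
    throw new Error('Columns of a must equal rows of b');
  }
  return a.map(row =>
    b[0].map((_, j) => row.reduce((sum, value, k) => sum + value * b[k][j], 0))
  );
}

const A = [[1, 2], [3, 4]];
const B = [[5, 6], [7, 8]];
console.log(hadamard(A, B)); // [ [ 5, 12 ], [ 21, 32 ] ]
console.log(matMul(A, B));   // [ [ 19, 22 ], [ 43, 50 ] ]
```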

