## A friendly introduction to Recurrent Neural Networks

> Notes on the tutorial by [Luis Serrano](https://www.youtube.com/watch?v=UNmqTiOnRfg)

---

### Foundations and Simple NN architecture

Your roommate cooks apple pie, burger or chicken, and in this first example the choice depends on the weather:

If the weather is sunny -> apple pie

If rainy -> burger

We can imagine a very simple NN to model this 

The NN has an input (ip) and an output (op)

If the ip is a sunny day, the op is apple pie

Let's model this in terms of vectors

<img src='./img/diag1.png'>

So in our NN

When ip is `[1,0]` output should be `[1,0,0]` and when ip is `[0, 1]` op should be `[0,1,0]`

The ip is 2x1 and the op is 3x1

so the weight matrix should be 3x2, since `(3x2) . (2x1) = (3x1)`

This matrix is
```
[
    [1,0],
    [0,1],
    [0,0]
]
```
<img src='./img/diag2.png'>

<img src='./img/diag3.png'>

So this NN is just like a linear map






For the sunny part the ip vector is `[1, 0]`. The matrix product is:

```
[1*1 + 0*0]   [1]
[0*1 + 1*0] = [0]   -> apple pie
[0*1 + 0*0]   [0]
```
For the rainy part the ip vector is `[0, 1]`. The matrix product is:

```
[1*0 + 0*1]   [0]
[0*0 + 1*1] = [1]   -> burger
[0*0 + 0*1]   [0]
```
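The two products above can be checked with a few lines of plain Python. This is only a minimal sketch: the helper name `matvec` and the variable names are assumptions made for illustration, and the matrix is the 3x2 one given earlier in these notes.

```python
# Weight matrix from the notes: rows are the ops (pie, burger, chicken),
# columns are the ips (sunny, rainy).
W = [
    [1, 0],
    [0, 1],
    [0, 0],
]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (flat list)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


sunny = [1, 0]
rainy = [0, 1]

sunny_out = matvec(W, sunny)  # expected to be the apple pie vector
rainy_out = matvec(W, rainy)  # expected to be the burger vector
print(sunny_out, rainy_out)
```

Each op entry is the dot product of one row of the matrix with the ip vector, which is exactly what the hand computation does.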

Now consider the diag below:

<img src='./img/diag4.png'>


The ip nodes have 1 and 0 in them. Let us label the nodes in the 2nd layer as A, B and C from top to bottom

```
A = 1*1 + 0*0 = 1

B = 1*0 + 0*1 = 0

C = 1*0 + 0*0 = 0
```

Which comes to the same as the above matrix product 

So we can generalize this. Here we have 2 ips and 3 ops, so the weight matrix is 3x2 and multiplying it by the 2x1 ip gives a 3x1 op

**Also from the NN diag can we determine the matrix operation?**

Note the ip is `[1, 0]`, as shown in the ip nodes, so that is fixed

Now in the matrix product we can see A is `1(x1).1(w1) + 0(x2).0(w2)` where x1 and x2 are the ips and w1 and w2 are the wts connected to node A

So the first row of the wt matrix (the one responsible for generating the op A) is `[w1, w2]` i.e. `[1, 0]`, or simply the wts on the arrows coming into node A

Similarly 2nd row = `[0, 1]` (arrows coming into node B)

And the 3rd row = `[0, 0]` (arrows coming into node C)
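The rule "row i = the wts on the arrows coming into op node i" can be written down directly. The sketch below is illustrative only: the edge-list format and the function name `weights_from_edges` are assumptions, not something from the tutorial, and arrows that are not listed are treated as having wt 0.

```python
# Hypothetical edge list: (ip index, op index, wt).
# ip 0 = sunny, ip 1 = rainy; op 0 = A, op 1 = B, op 2 = C.
edges = [
    (0, 0, 1),  # sunny -> A
    (1, 1, 1),  # rainy -> B
]


def weights_from_edges(edges, n_inputs, n_outputs):
    """Build the wt matrix: row i holds the wts on the arrows into op node i."""
    matrix = [[0] * n_inputs for _ in range(n_outputs)]
    for src, dst, weight in edges:
        matrix[dst][src] = weight
    return matrix


W = weights_from_edges(edges, n_inputs=2, n_outputs=3)
print(W)
```

With two ip nodes and three op nodes this gives back the 3x2 matrix used above.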

That was a simple NN, let's change things a bit now

### Simple RNN architecture

Say your roommate cooks in the following order: pie->burger->chicken

So in the RNN the op (say pie) goes back in as ip. When pie is the ip, the op is burger; then burger goes back in as ip and the op is chicken, and this repeats.

So basically the ip is the op from the previous step

Remember our food vectors are `pie = [1,0,0] burger = [0,1,0] chicken = [0,0,1]`

Now our matrix will be:
```
[
    [0,0,1],
    [1,0,0],
    [0,1,0]
]
```

Let's verify this for the case where the ip is `pie = [1,0,0]`:

<img src='./img/diag5.png'>

The same check works for the remaining items.
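To check all three items at once, here is a minimal sketch in plain Python. The names `FOODS`, `matvec` and `next_food` are made up for this example; the matrix and the food vectors are the ones from these notes.

```python
# Transition matrix from the notes: it maps today's food to the next one.
M = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]

FOODS = {
    "pie": [1, 0, 0],
    "burger": [0, 1, 0],
    "chicken": [0, 0, 1],
}


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (flat list)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# For each food, multiply by M and look up which food the result encodes.
next_food = {}
for name, vec in FOODS.items():
    out = matvec(M, vec)
    next_food[name] = next(n for n, v in FOODS.items() if v == out)

print(next_food)
```

Multiplying by a one-hot vector simply picks out one column of `M`, so each column of the matrix is the vector of the food that follows.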

Now we can represent the NN architecture as before. In the following diag only the edges with wt=1 are shown:

<img src='./img/diag6.png'>





Now that we have defined the NN architecture for this problem, let's take a deeper look. The nodes on the right are the op, and the op comes back in as the ip.

So what the NN should look like is:

<img src='./img/diag7.png'>
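The feedback loop (the op of one step becomes the ip of the next) can be sketched as a simple loop. This is an illustration under assumptions: the function names `decode` and `cook` are invented here, and the network is just the single matrix from this section with no extra ip.

```python
# Same transition matrix as above (pie -> burger -> chicken -> pie).
M = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]

NAMES = ["pie", "burger", "chicken"]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (flat list)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def decode(vector):
    """Turn a one-hot food vector back into its name."""
    return NAMES[vector.index(1)]


def cook(start, days):
    """Feed each day's op back in as the next day's ip."""
    state = start
    menu = [decode(state)]
    for _ in range(days):
        state = matvec(M, state)  # the op becomes the new ip
        menu.append(decode(state))
    return menu


menu = cook([1, 0, 0], 5)
print(menu)
```

Starting from pie, the menu cycles through the three foods in order and then wraps around.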

### Slightly more complicated problem

We still have the sequence pie->burger->chicken

But our roommate also takes into account the weather

If sunny, he goes out to enjoy the day and we get the same food as yesterday

If rainy, we get the next food in the sequence

So the NN looks like:

<img src='./img/diag8.png'>

This basically says that yesterday's food was pie and we have another ip, i.e. the weather, which is rainy. Here the op will be a burger

Let's look at vectors again

<img src='./img/diag9.png'>

Now the NN can't be represented by a single matrix; we will require a bunch of matrices

<img src='./img/diag10.png'>

Let's break this down, starting with the food matrix

#### Food matrix

It is a 6x3 matrix; the dividing line is drawn only for explanation purposes, to separate its top and bottom halves

When we multiply this by the pie vector we get a 6x1 vector (`6x3 . 3x1 = 6x1`)

This 6x1 vector is a concatenation of two 3x1 vectors. The top one is the same food (pie) and the bottom one is the food for the next day (burger)

<img src='./img/diag11.png'>

Another example:

<img src='./img/diag12.png'>


> So what the food matrix does is it takes the vector for today's food and it returns the vector for today's food concatenated 
with the vector for next day's food
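The behaviour described above can be sketched in plain Python. One assumption to flag: the actual food matrix is only shown in the diagram, so the 6x3 matrix below is reconstructed from the description (a 3x3 identity block for "same food" stacked on the 3x3 "next food" matrix from the previous section). The names `SAME`, `NEXT`, `FOOD` and `matvec` are made up for this example.

```python
# Assumed top half: identity block, returns today's food unchanged.
SAME = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

# Assumed bottom half: the "next food" matrix from the simple RNN section.
NEXT = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]

FOOD = SAME + NEXT  # stack the two 3x3 blocks into one 6x3 matrix


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (flat list)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


pie = [1, 0, 0]
burger = [0, 1, 0]

pie_out = matvec(FOOD, pie)        # pie on top, burger on the bottom
burger_out = matvec(FOOD, burger)  # burger on top, chicken on the bottom
print(pie_out, burger_out)
```

The first three entries of each result are today's food and the last three are the next food in the sequence.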



#### Weather matrix

The weather matrix is a concatenation of two 3x2 matrices, stacked into a 6x2 matrix

When we multiply it by the sunny vector (a 2x1 matrix) we get a 6x1 vector (`6x2 . 2x1 = 6x1`)

This resulting vector tells us that it's the same day, i.e. the food to be cooked is the same as the previous day's

<img src='./img/diag13.png'>

Similarly, if we multiply it by the vector for a rainy day, we get 0s on top for the same day and 1s on the bottom for the next day, so the op is "next day", i.e. cook the next food in the sequence

<img src='./img/diag14.png'>
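As with the food matrix, the weather matrix itself is only shown in the diagram. The sketch below assumes a 6x2 matrix whose top three rows respond to sunny and whose bottom three rows respond to rainy, which matches the "0s on top, 1s on the bottom" result described for a rainy day. The names `WEATHER` and `matvec` are illustrative.

```python
# Assumed weather matrix: columns are (sunny, rainy).
# Top 3 rows mark "same day", bottom 3 rows mark "next day".
WEATHER = [
    [1, 0],
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [0, 1],
]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (flat list)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


sunny = [1, 0]
rainy = [0, 1]

sunny_out = matvec(WEATHER, sunny)  # 1s on top: keep the same food
rainy_out = matvec(WEATHER, rainy)  # 1s on the bottom: move to the next food
print(sunny_out, rainy_out)
```

Both results are 6x1, the same shape as the op of the food matrix, so the two vectors line up entry by entry.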

> Note: Here we only want to know "same" or "different", which can be encoded in 1 bit, so why use 6? This is because the vector from the previous step is 6x1 and we need to merge the two

> So this weather matrix is basically telling us whether to cook yesterday's food or the next food in the sequence, based on the ip

