# Introduction

In the last lesson, we saw how to represent our neural network with matrices. Now let's see how matrix multiplication comes into play.

### Beginning with linear regression

Now our general formula for linear regression is the following.

$f(x) = w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 + b$

Let's review how to represent the linear component of a single neuron, that is, a linear regression function, with matrix multiplication.

In [1]:
import torch

In [2]:
torch.manual_seed(0)
X = torch.randint(-10, 10, (1, 4))
X

tensor([[ -6,   9,   3, -10]])

So the above initializes a matrix that has one observation and four features.

Now, we need something to represent our linear regression model.

$f(x) = 3x_1 + 2x_2 + 10x_3 + 6x_4 + 10$

To evaluate this function on our matrix, X, we first store its weights in a tensor.

In [12]:
w_1 = torch.tensor([3, 2, 10, 6])
w_1

tensor([ 3,  2, 10,  6])

We'll worry about representing our value of $b$, here 10, later on.  Let's use `view` to allocate a separate row for each parameter.

In [13]:
w_1 = w_1.view(4, 1)
w_1

tensor([[ 3],
        [ 2],
        [10],
        [ 6]])

And $x$ is the following.

In [14]:
X

tensor([[ -6,   9,   3, -10]])

Ok, now for linear regression, our goal is to multiply each entry in the row of X by the matching entry in our column of weights, and then sum the products. This way we will have something like the following.

In [15]:
-6*3 + 9*2 + 3*10 + -10*6

-30

How do we accomplish this?

$x \cdot w_1 = \begin{bmatrix}
 -6& 9 &3  &-10
\end{bmatrix} \cdot
\begin{bmatrix} 
3 \\
 2
\\ 
10
\\
6
\end{bmatrix}
$

In [16]:
X@w_1

tensor([[-30]])
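
To make the `@` operator less of a black box, here is a minimal pure-Python sketch of the same row-times-column calculation. The helper name `matmul` is hypothetical and the nested lists stand in for tensors; this is only meant to illustrate the arithmetic, not how PyTorch implements it.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows (illustrative helper)."""
    n_inner = len(B)
    # Every row of A needs one entry per row of B.
    assert all(len(row) == n_inner for row in A), "inner dimensions must match"
    n_cols = len(B[0])
    return [
        [sum(row[k] * B[k][j] for k in range(n_inner)) for j in range(n_cols)]
        for row in A
    ]

x = [[-6, 9, 3, -10]]          # 1 x 4: one observation, four features
w_1 = [[3], [2], [10], [6]]    # 4 x 1: one column of weights

result = matmul(x, w_1)
print(result)  # [[-30]]
```

A (1 x 4) matrix times a (4 x 1) matrix gives a (1 x 1) result, which is why the tensor output above has two sets of brackets.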

Finally we can add `b`.

In [8]:
b = 10
X@w_1 + b

tensor([[-20]])

So that would be the hypothesis of our single linear function.

### Multiple neurons

Now once again, we'll work with only one observation, except that this time we'll see what it's like to work with two linear functions.

Again, let's start with our observation with four features.

In [9]:
X

tensor([[ -6,   9,   3, -10]])

Now to feed our observation through two neurons, we can add a second column to our weight matrix.

> Here was our first linear function.

$f(x) = 3x_1 + 2x_2 + 10x_3 + 6x_4 + 10$

In [17]:
w_1

tensor([[ 3],
        [ 2],
        [10],
        [ 6]])

> Here is our second.

$g(x) = 4x_1 + 5x_2 + 6x_3 + x_4 + 2$

In [21]:
w_2 = torch.tensor([4, 5, 6, 1]).view(4, 1)
w_2

tensor([[4],
        [5],
        [6],
        [1]])

> Let's combine them side by side, one column per linear function, to build a network.

In [84]:
W = torch.stack((w_1, w_2), 2).view(4, 2)
W

tensor([[ 3,  4],
        [ 2,  5],
        [10,  6],
        [ 6,  1]])

In [57]:
W = W.squeeze()
W

tensor([[ 3,  4],
        [ 2,  5],
        [10,  6],
        [ 6,  1]])
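
As an aside, a possibly more direct way to build the same matrix is to concatenate the two columns along `dim=1`. The sketch below assumes `w_1` and `w_2` are the (4, 1) tensors defined above; the names `W_cat` and `W_stack` are introduced here only for comparison.

```python
import torch

w_1 = torch.tensor([3, 2, 10, 6]).view(4, 1)
w_2 = torch.tensor([4, 5, 6, 1]).view(4, 1)

# Concatenating along dim=1 places the two columns side by side.
W_cat = torch.cat((w_1, w_2), dim=1)

# The stack-then-view construction used in the lesson, for comparison.
W_stack = torch.stack((w_1, w_2), 2).view(4, 2)

print(W_cat.shape)                  # torch.Size([4, 2])
print(torch.equal(W_cat, W_stack))  # True
```

Either way, the result already has shape (4, 2), so calling `squeeze` on it leaves it unchanged.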

Now how do we get each of these linear functions to make their own hypothesis?

We can do simple matrix multiplication.

$x \cdot W = \begin{bmatrix}
 -6& 9 &3  &-10
\end{bmatrix} \cdot \begin{bmatrix}
3 &4 \\ 
 2&5 \\ 
 10& 6\\ 
6 & 1
\end{bmatrix} = \begin{bmatrix} -30 & 29 \end{bmatrix}$

Implementing this in Python isn't so bad.

In [18]:
X

tensor([[ -6,   9,   3, -10]])

In [51]:
l1 = X@W 
l1

tensor([[-30,  29]])

Finally, we add in our bias terms.

$b = \begin{bmatrix} 10 & 2 \end{bmatrix}$

In [52]:
b = torch.tensor([10, 2])
b

tensor([10,  2])

Let's make sure the dimensions of our bias vector are the same as those of our prediction.

In [55]:
b = b.view(1, 2)
b

tensor([[10,  2]])

In [56]:
l1 + b

tensor([[-20,  31]])
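
The addition above also relies on broadcasting, which becomes more visible with more than one observation. The following is a small sketch under the assumption that we add a second, made-up observation (`[1, 0, 0, 0]`, not from the lesson); `X_batch` is a hypothetical name.

```python
import torch

W = torch.tensor([[3, 4], [2, 5], [10, 6], [6, 1]])
b = torch.tensor([10, 2])          # shape (2,), no view applied

# Two observations; the second row is a made-up example.
X_batch = torch.tensor([[-6, 9, 3, -10],
                        [1, 0, 0, 0]])

out = X_batch @ W + b              # b is broadcast across both rows
print(out.shape)                   # torch.Size([2, 2])
print(out.tolist())                # [[-20, 31], [13, 6]]
```

Each row of the result holds the two predictions for one observation, and the same bias pair is added to every row.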

Great.  We just made two predictions.

### Refactoring

Now that we saw how we can use a weight matrix to make our predictions with some matrix multiplication, let's move this into a function.

In [58]:
X

tensor([[ -6,   9,   3, -10]])

In [59]:
W

tensor([[ 3,  4],
        [ 2,  5],
        [10,  6],
        [ 6,  1]])

In [135]:
def execute_linear(X, W, b):
    return (X@W) + b

In [136]:
execute_linear(X, W, b)

tensor([[-20,  31]])
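
Because matrix multiplication only works when the inner dimensions agree, it can help to check shapes before multiplying. Below is one possible sketch of such a check; `checked_linear` is a hypothetical variant of `execute_linear`, and the specific error messages are assumptions rather than anything PyTorch requires.

```python
import torch

def checked_linear(X, W, b):
    """Hypothetical variant of execute_linear that validates shapes first."""
    if X.dim() != 2 or W.dim() != 2:
        raise ValueError("X and W must both be 2-D tensors")
    if X.shape[1] != W.shape[0]:
        raise ValueError(
            f"X has {X.shape[1]} features but W has {W.shape[0]} rows"
        )
    if b.shape[-1] != W.shape[1]:
        raise ValueError(
            f"b has {b.shape[-1]} entries but W has {W.shape[1]} columns"
        )
    return X @ W + b

X = torch.tensor([[-6, 9, 3, -10]])
W = torch.tensor([[3, 4], [2, 5], [10, 6], [6, 1]])
b = torch.tensor([[10, 2]])

print(checked_linear(X, W, b).tolist())  # [[-20, 31]]
```

Passing an observation with only three features, for example `X[:, :3]`, would raise a `ValueError` before any multiplication is attempted.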

Looks like we just built the first layer of a neural network.

### Activating our neuron

After passing our data through two linear functions, the next step is to pass the outputs of those linear functions through a non-linear component.

For now we can use the sigmoid function.

In [73]:
import numpy as np
def sigmoid(layer):
    # sigmoid(z) = 1 / (1 + e^(-z)); note the negative sign in the exponent
    return 1/(1 + np.exp(-layer))

In [76]:
l2 = sigmoid(l1)
l2

tensor([[9.3576e-14, 1.0000e+00]], dtype=torch.float64)
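
As a quick sanity check on the sigmoid, $\sigma(z) = \frac{1}{1 + e^{-z}}$, here is a scalar sketch using only the standard library. The name `sigmoid_scalar` is hypothetical; it is meant for checking individual values by hand, not for use on tensors.

```python
import math

def sigmoid_scalar(z):
    """Standard logistic function: 1 / (1 + e^(-z))."""
    return 1 / (1 + math.exp(-z))

print(sigmoid_scalar(0))             # 0.5
print(sigmoid_scalar(-30) < 1e-12)   # True: large negative inputs approach 0
print(sigmoid_scalar(29) > 0.999)    # True: large positive inputs approach 1
```

Large negative inputs are squashed toward 0 and large positive inputs toward 1, so the output always lies between 0 and 1.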

### Summary

In this lesson we saw that we can apply multiple linear functions to an observation simultaneously with a weight matrix.

$x \cdot W + b = \begin{bmatrix}
 -6& 9 &3  &-10
\end{bmatrix} \cdot \begin{bmatrix}
3 &4 \\ 
 2&5 \\ 
 10& 6\\ 
6 & 1
\end{bmatrix} + \begin{bmatrix}
10 & 2
\end{bmatrix} = \begin{bmatrix} -20 & 31 \end{bmatrix}$

# Stacking layers

### Introduction

In the previous lesson, we saw how we can create a layer of a neural network with a linear component followed by a non-linearity.  We created the linear component by multiplying our observation by a weight matrix and then adding a bias vector.

$x \cdot W + b = \begin{bmatrix}
 -6& 9 &3  &-10
\end{bmatrix} \cdot \begin{bmatrix}
3 &4 \\ 
 2&5 \\ 
 10& 6\\ 
6 & 1
\end{bmatrix} + \begin{bmatrix}
10 & 2
\end{bmatrix} = \begin{bmatrix} -20 & 31 \end{bmatrix}$

In [78]:
import torch
torch.manual_seed(0)
X = torch.randint(-10, 10, (1, 4))
X

tensor([[ -6,   9,   3, -10]])

In [86]:
w_1 = torch.tensor([3, 2, 10, 6]).view(4, 1)
w_2 = torch.tensor([4, 5, 6, 1]).view(4, 1)
W = torch.stack((w_1, w_2), 2).view(4, 2)
W

b = torch.tensor([10, 2])

In [88]:
def execute_linear(X, W, b):
    return (X@W) + b

In [87]:
execute_linear(X, W, b)

tensor([[-20,  31]])

$\sigma(\begin{bmatrix} -20 & 31 \end{bmatrix}) = \begin{bmatrix} \sigma(-20) & \sigma(31) \end{bmatrix}$

In [75]:
import numpy as np
def execute_sigmoid(layer):
    # sigmoid(z) = 1 / (1 + e^(-z)); note the negative sign in the exponent
    return 1/(1 + np.exp(-layer))

In [91]:
def execute_layer(X, W, b):
    l1 = execute_linear(X, W, b)   # linear component
    l2 = execute_sigmoid(l1)       # non-linear activation
    return l2

In [105]:
L1 = execute_layer(X, W, b)
L1

tensor([[2.0612e-09, 1.0000e+00]], dtype=torch.float64)

We can pass this output into another layer, which can have as many linear functions as we want. Each of those functions takes two weights (one for each output of the previous layer) and a bias term. We then apply a non-linearity to the output of each linear function.

In [110]:
w_3 = torch.tensor([6, 1]).view(2, 1)
w_4 = torch.tensor([8, 9]).view(2, 1)

b_2 = torch.tensor([5, 12])

W_2 = torch.stack([w_3, w_4], 0).squeeze()
W_2


tensor([[6, 1],
        [8, 9]])

In [132]:
execute_layer(L1, W_2.double(), b_2.double())

tensor([[1.0000, 1.0000]], dtype=torch.float64)

> We apply `.double` to convert our integer tensors into `float64` tensors, so that their type matches the output of the first layer.
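
Putting the pieces together, one way to sketch the whole forward pass is a loop over `(W, b)` pairs. This is an illustrative sketch rather than the lesson's own code: the `forward` helper is hypothetical, it uses PyTorch's built-in `torch.sigmoid` in place of a hand-written sigmoid, and it assumes the tensors are created as floats from the start so that no `.double()` conversion is needed.

```python
import torch

def forward(X, layers):
    """Pass X through each (W, b) pair, applying a sigmoid after each linear step."""
    out = X
    for W, b in layers:
        out = torch.sigmoid(out @ W + b)
    return out

# Same values as in the lesson, written as floats.
X = torch.tensor([[-6., 9., 3., -10.]])
W_1 = torch.tensor([[3., 4.], [2., 5.], [10., 6.], [6., 1.]])
b_1 = torch.tensor([10., 2.])
W_2 = torch.tensor([[6., 1.], [8., 9.]])
b_2 = torch.tensor([5., 12.])

out = forward(X, [(W_1, b_1), (W_2, b_2)])
print(out.shape)  # torch.Size([1, 2])
```

The number of columns in each weight matrix sets the number of outputs of that layer, which in turn must match the number of rows in the next layer's weight matrix.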

### Completing the network

At the end of the network, we would like to turn these numbers into a prediction.  For example, in a classification task such as classifying images of the digits 0-9, we want one score per class.  To get that, we can feed the output from our final layer into a softmax function.

A softmax function converts those outputs into positive numbers that sum to one, so they can be read as probabilities.  It also exaggerates the differences between them: the largest output moves closer to one, and the smaller outputs move closer to zero.
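
To make this concrete, here is a minimal pure-Python sketch of softmax, $\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$. The scores are made up for illustration, and subtracting the maximum before exponentiating is a common numerical-stability step that we add here as an assumption; it does not change the result.

```python
import math

def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to 1."""
    largest = max(scores)
    # Shifting by the largest score avoids overflow and leaves the result unchanged.
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # made-up scores for three classes
print(probs)
print(sum(probs))
```

The largest score receives the largest probability, and the class with the highest probability would be the network's prediction.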

### Summary

### Resources

[neural networks intro](https://www.jeremyjordan.me/intro-to-neural-networks/)
