A really good explainer on how neural networks approximate functions using only linear functions and activation functions is Michael Nielsen's [*A visual proof that neural nets can compute any function*](https://neuralnetworksanddeeplearning.com/chap4.html).

**MAKE SURE TO ADD NOTES ON ACTIVATION FUNCTIONS ONCE YOU HAVE A BETTER GRASP**

# Using PyTorch's NN modules
In our previous linear model (chap 05), we wrote a simple linear function to fit a line through our data points. Below is the function.

In [3]:
# where w is the 'weight' and b is the 'bias'
def model(unknown_unit, w, b):
    return w * unknown_unit + b

PyTorch provides *modules* (known as *layers* in other frameworks) in the sub-library `torch.nn`. A PyTorch module is a class deriving from the `torch.nn.Module` base class. A module can have one or more `Parameter` instances as attributes; these are the tensors to be optimized during training (in our case *w* and *b*). A module can also have one or more *submodules* as attributes. These are instances of `nn.Module` subclasses, and the parent module tracks their parameters as well. <br><br>
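To make the `Parameter` and submodule relationship concrete for myself, here is a minimal sketch. The class names `ScaleShift` and `Wrapper` are made up for illustration and are not from the book; only `nn.Module`, `nn.Parameter`, and `named_parameters()` are real PyTorch API.

```python
import torch
import torch.nn as nn

class ScaleShift(nn.Module):
    """Hypothetical module: w * x + b with explicit Parameters."""
    def __init__(self):
        super().__init__()
        # nn.Parameter attributes are registered automatically
        self.w = nn.Parameter(torch.ones(1))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return self.w * x + self.b

class Wrapper(nn.Module):
    """Hypothetical module that holds another module as an attribute."""
    def __init__(self):
        super().__init__()
        self.inner = ScaleShift()  # registered as a submodule

    def forward(self, x):
        return self.inner(x)

wrapper = Wrapper()
# the parent sees the parameters of its submodule, prefixed with its name
param_names = [name for name, _ in wrapper.named_parameters()]
print(param_names)
```

The point of the sketch is that `Wrapper` never declares a parameter itself, yet it can still hand the inner module's parameters to an optimizer.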
If we wanted to replace our model function above, we could use `nn.Linear`, a subclass of `nn.Module`. Its constructor takes the following arguments:
* `in_features` (int) - the size of each input sample.
* `out_features` (int) - the size of each output sample.
* `bias` (bool) - defaults to `True`. If set to `False`, the model will not learn a bias. No *b* parameter!
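A quick sketch of how these three arguments show up in the shapes of the layer's parameters. The sizes 4 and 2 are arbitrary values I picked for illustration.

```python
import torch.nn as nn

layer = nn.Linear(in_features=4, out_features=2)            # bias=True by default
no_bias = nn.Linear(in_features=4, out_features=2, bias=False)

weight_shape = tuple(layer.weight.shape)  # (out_features, in_features)
bias_shape = tuple(layer.bias.shape)      # (out_features,)

print(weight_shape, bias_shape, no_bias.bias)
```

Note that the weight is stored as `out_features` x `in_features`, and that `.bias` is `None` when `bias=False`.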

---
### A little tangent on matrix-vector products to help me fully understand the above parameters


In [13]:
import torch
import torch.nn as nn # import torch.nn with a convenient alias

In [5]:
linear_model = nn.Linear(2, 3, bias=False) # input 2D vector, output 3D vector, NO BIAS
linear_model.weight # weights are a 3 X 2 matrix

Parameter containing:
tensor([[ 0.0369, -0.6950],
        [-0.6591,  0.6533],
        [ 0.4035,  0.4890]], requires_grad=True)

In [6]:
dummy_input = torch.tensor([1.0, 2.0]) # the input 2D vector

# since our model is a linear function with no bias, we are essentially doing a matrix-vector multiplication
# between our input 2D vector and a 3X2 matrix. This means, we will get the prescribed 3D vector as output!
linear_model.weight.matmul(dummy_input)

tensor([-1.3530,  0.6475,  1.3814], grad_fn=<MvBackward0>)
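As a sanity check on the tangent above, this sketch compares calling the layer with doing the matrix product by hand, this time with the bias left on. The seed and the batch values are arbitrary choices of mine.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

layer = nn.Linear(2, 3)  # bias=True by default
x = torch.tensor([1.0, 2.0])

# single vector: W x + b
manual_single = layer.weight @ x + layer.bias
single_matches = torch.allclose(layer(x), manual_single, atol=1e-6)

# batch of row vectors: X W^T + b
X = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
manual_batch = X @ layer.weight.T + layer.bias
batch_matches = torch.allclose(layer(X), manual_batch, atol=1e-6)

print(single_matches, batch_matches)
```

For a batch, each row of `X` is one sample, which is why the weight matrix is transposed in the manual version.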

---
Back to the book... <br><br>
In chapter 5, we defined our model as the linear function above and stored *w* and *b* in a tensor created with `requires_grad=True`. That lets PyTorch compute the derivative of the loss function with respect to *w* and *b* by tracking every operation those two parameters pass through, including our model.<br><br>
Below is how we set this up previously.

In [7]:
# defining our parameters w and b with autograd enabled
params = torch.tensor([0.0, 1.0], requires_grad=True)

# defining our model
def model(unknown_unit, w, b):
    return w * unknown_unit + b

# placeholder unknown_unit
t_u = torch.tensor([1.0, 2.0, 3.0])

# we can then call this in our training loop like so:
model(t_u, *params)

tensor([1., 1., 1.], grad_fn=<AddBackward0>)
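To remind myself what autograd does with this setup, here is a small sketch that runs one backward pass through the hand-written model. The target values in `t_c` are made up for illustration and are not the book's data.

```python
import torch

params = torch.tensor([0.0, 1.0], requires_grad=True)  # w = 0, b = 1

def model(unknown_unit, w, b):
    return w * unknown_unit + b

t_u = torch.tensor([1.0, 2.0, 3.0])
t_c = torch.tensor([2.0, 4.0, 6.0])  # made-up targets

# mean squared error between predictions and targets
loss = ((model(t_u, *params) - t_c) ** 2).mean()
loss.backward()  # fills params.grad with d(loss)/dw and d(loss)/db

# by hand: predictions are all 1, so the differences are [-1, -3, -5]
# d(loss)/dw = mean(2 * diff * t_u) = -44/3, d(loss)/db = mean(2 * diff) = -6
print(params.grad)
```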

`nn.Linear` bundles the *autograd-enabled* parameters and the model function into a single module. To view the *weight* we can use the attribute `.weight`, and to get the *bias* we can use `.bias`. Furthermore, if we need to pass the parameters to the optimizer, we can call the method `.parameters()`.<br><br>
To replace the `params` tensor and `model` function above, we can simply write this:

In [8]:
linear_model = nn.Linear(1, 1) #bias defaults to true

linear_model.weight, linear_model.bias, linear_model.parameters()

(Parameter containing:
 tensor([[-0.8310]], requires_grad=True),
 Parameter containing:
 tensor([-0.5477], requires_grad=True),
 <generator object Module.parameters at 0x7fb01eb035a0>)
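Since `.parameters()` returns a generator, printing it directly is not very informative. A small sketch of two ways to look inside it, using `list(...)` and `named_parameters()`:

```python
import torch.nn as nn

linear_model = nn.Linear(1, 1)

# materialize the generator to inspect what the optimizer will receive
params = list(linear_model.parameters())
shapes = [tuple(p.shape) for p in params]

# named_parameters() yields (name, parameter) pairs instead
names = [name for name, _ in linear_model.named_parameters()]

print(names, shapes)
```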

To run a forward pass, we no longer pass both the parameters and the data into the model; we only pass the data by calling the model. For instance, to pass `t_u` as above, we would write `model_output = linear_model(t_u)`. We can try to pass our placeholder *unknown unit tensor* `t_u` into `linear_model` and see what happens.

In [9]:
model_output = linear_model(t_u)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x3 and 1x1)

### Batching
The error above tells us that our input `t_u` has the wrong shape. `nn.Linear(1, 1)` expects one feature per sample, but it interprets the 1-D tensor `t_u` as a single sample with three features (1x3), which can't be multiplied by the 1x1 *weight* matrix. <br><br>
We need to *batch* our input data to create a tensor of size $B \times N_{in}$, where $B$ is the batch size and $N_{in}$ is the number of input features. Since `linear_model` has an input size of 1, we need a tensor of size 3x1 for our placeholder data `t_u`. We can easily do this using `unsqueeze(1)`, which adds an extra dimension at axis 1 of our input.<br><br>
Batching also lets the GPU process the inputs in parallel!

In [10]:
# batching our input data
t_u = t_u.clone().unsqueeze(1)
t_u

tensor([[1.],
        [2.],
        [3.]])

In [11]:
# passing this data into our linear model will now work!
model_output = linear_model(t_u)
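A short sketch of the shapes involved in batching. `reshape(-1, 1)` is shown only as an alternative I find easy to read; the notes use `unsqueeze(1)`.

```python
import torch
import torch.nn as nn

linear_model = nn.Linear(1, 1)
t_u = torch.tensor([1.0, 2.0, 3.0])          # shape (3,): not batched

batched = t_u.unsqueeze(1)                   # shape (3, 1): 3 samples, 1 feature
also_batched = t_u.reshape(-1, 1)            # same result, inferred row count

out = linear_model(batched)                  # one output per sample

print(tuple(t_u.shape), tuple(batched.shape), tuple(out.shape))
```

The output keeps the batch dimension, so three inputs of size 1 give three outputs of size 1.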

### Replacing the Loss Function
PyTorch also provides a bunch of pre-coded loss functions. Previously, we coded our own *mean squared loss* function like so:

In [12]:
# defining our loss function - mean squared loss
def loss_fn (temps_predicted, temps_actual):
    squared_diff = (temps_predicted - temps_actual) ** 2
    return squared_diff.mean() # taking the mean of the squared dif vector

We can simply replace this with an instance of `nn.MSELoss`.
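To convince myself the two are interchangeable, this sketch compares the hand-written loss with `nn.MSELoss()` on a few made-up values (the numbers are mine, not the book's).

```python
import torch
import torch.nn as nn

def loss_fn(temps_predicted, temps_actual):
    squared_diff = (temps_predicted - temps_actual) ** 2
    return squared_diff.mean()

# made-up batched predictions and targets, both of shape (3, 1)
pred = torch.tensor([[1.0], [2.0], [4.0]])
actual = torch.tensor([[1.5], [2.0], [2.0]])

handwritten = loss_fn(pred, actual)
builtin = nn.MSELoss()(pred, actual)  # default reduction is the mean

# squared differences are 0.25, 0, 4, so both should equal 4.25 / 3
print(handwritten.item(), builtin.item())
```

Both arguments should have the same shape. If one is batched and the other is not, broadcasting can silently produce the wrong loss.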

### The revised training loop.
To keep things modular, we will now pass in our optimizer, model, and loss function as parameters.

In [15]:
def training_loop(num_epochs, optimizer, model, loss_fn, t_u_train, t_u_val, t_c_train, t_c_val):
    for epoch in range(1, num_epochs + 1):
        t_p_train = model(t_u_train) # forward pass on training data
        loss_train = loss_fn(t_p_train, t_c_train) # evaluating the loss

        with torch.no_grad(): # we don't need a graph for validation - switch it off!
            t_p_val = model(t_u_val) # evaluating the validation data
            loss_val = loss_fn(t_p_val, t_c_val) # calculating loss for validation set

        optimizer.zero_grad() # zeroing the gradient
        loss_train.backward() # backward pass on loss_train
        optimizer.step() # update the parameters

        if epoch == 1 or epoch % 1000 == 0:
            print(f"Epoch {epoch}, Training Loss {loss_train.item():.4f}, Validation loss {loss_val.item():.4f}")

Bring our training data back in

In [17]:
#just copying our split data from the last chapter - t_u already normalized
t_c_train = torch.tensor([6.0, 13.0, 3.0, 11.0, 15.0, -4.0, 0.5, 21.0, 28.0])
t_u_train = torch.tensor([4.84, 6.04, 3.39, 5.63, 5.82, 2.18, 3.57, 6.84, 8.19])

t_c_val = torch.tensor([8.0, 14.0])
t_u_val = torch.tensor([4.89, 5.59])

# batching the data
t_c_train = t_c_train.unsqueeze(1)
t_u_train = t_u_train.unsqueeze(1)
t_c_val = t_c_val.unsqueeze(1)
t_u_val = t_u_val.unsqueeze(1)

t_c_train, t_u_train, t_c_val, t_u_val

(tensor([[ 6.0000],
         [13.0000],
         [ 3.0000],
         [11.0000],
         [15.0000],
         [-4.0000],
         [ 0.5000],
         [21.0000],
         [28.0000]]),
 tensor([[4.8400],
         [6.0400],
         [3.3900],
         [5.6300],
         [5.8200],
         [2.1800],
         [3.5700],
         [6.8400],
         [8.1900]]),
 tensor([[ 8.],
         [14.]]),
 tensor([[4.8900],
         [5.5900]]))

Defining our model, optimizer, and loss function.

In [20]:
linear_model = nn.Linear(1, 1) # our linear model - one input feature, one output feature, bias=True

# passing our linear model's parameters into our optimizer and setting the learning rate
optimizer = torch.optim.SGD(linear_model.parameters(), lr=1e-2) 

# defining our loss function
loss_function = nn.MSELoss()

And now we can start training!

In [21]:
# running the training loop
training_loop(
    num_epochs=3000,
    optimizer=optimizer,
    model=linear_model,
    loss_fn=loss_function,
    t_u_train=t_u_train,
    t_u_val=t_u_val,
    t_c_train=t_c_train,
    t_c_val=t_c_val)

Epoch 1, Training Loss 132.1691, Validation loss 74.8418
Epoch 1000, Training Loss 3.8270, Validation loss 1.6122
Epoch 2000, Training Loss 3.2969, Validation loss 1.3587
Epoch 3000, Training Loss 3.2878, Validation loss 1.3322
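Once training has finished, the learned *w* and *b* live inside the module. Below is a self-contained sketch that repeats a stripped-down version of the training on the training split and then reads the parameters back and predicts for a new input. The seed and the input value `5.0` are arbitrary choices of mine, and the exact numbers will vary with the random initialization.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

# training split copied from above, already batched to shape (9, 1)
t_c = torch.tensor([6.0, 13.0, 3.0, 11.0, 15.0, -4.0, 0.5, 21.0, 28.0]).unsqueeze(1)
t_u = torch.tensor([4.84, 6.04, 3.39, 5.63, 5.82, 2.18, 3.57, 6.84, 8.19]).unsqueeze(1)

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

initial_loss = loss_fn(model(t_u), t_c).item()
for _ in range(3000):
    loss = loss_fn(model(t_u), t_c)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
final_loss = loss_fn(model(t_u), t_c).item()

# inference: no graph needed, and the input still has to be batched
with torch.no_grad():
    prediction = model(torch.tensor([[5.0]]))

print(f"w={model.weight.item():.4f}, b={model.bias.item():.4f}")
print(f"loss {initial_loss:.4f} -> {final_loss:.4f}")
```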
