A really good explainer on how neural networks approximate functions using only linear functions and activation functions is Michael Nielsen's *A visual proof that neural nets can compute any function*. See [this link](https://neuralnetworksanddeeplearning.com/chap4.html).

**MAKE SURE TO ADD NOTES ON ACTIVATION FUNCTIONS ONCE YOU HAVE A BETTER GRASP**

# Using PyTorch's NN modules
In our previous linear model (chap 05), we wrote a simple linear function to fit a line through our data points. Below is the function.

In [2]:
# where w is the 'weight' and b is the 'bias'
def model(unknown_unit, w, b):
    return w * unknown_unit + b

PyTorch provides *modules* (known as *layers* in other frameworks) in the sub-library `torch.nn`. A PyTorch module is a class deriving from the `torch.nn.Module` base class. A module can have one or more `Parameter` instances as attributes; these are the tensors to be optimized during training (in our case *w* and *b*). A module can also have one or more *submodules* (instances of `nn.Module` subclasses) as attributes, and it will track their parameters as well. <br><br>
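To make the parameter tracking concrete, here is a minimal sketch. The class name `TinyModel` and its attributes `scale` and `linear` are made up for illustration and are not from the book; the sketch only shows that assigning a `Parameter` or a submodule as an attribute is enough for the parent module to find it.

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        # a Parameter assigned as an attribute is registered automatically
        self.scale = nn.Parameter(torch.ones(1))
        # a submodule assigned as an attribute is registered too,
        # so its weight and bias are tracked by the parent
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.scale * self.linear(x)

tiny = TinyModel()
param_names = [name for name, _ in tiny.named_parameters()]
print(param_names)  # the module's own parameter plus the submodule's two
```

The names coming from the submodule are prefixed with the attribute name (`linear.weight`, `linear.bias`), which is how the parent keeps them apart from its own `scale`.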
To replace our model function above, we can use `nn.Linear`, a subclass of `nn.Module`. The `nn.Linear` constructor takes the following arguments:
* `in_features` (int) - the size of each input sample.
* `out_features` (int) - the size of each output sample.
* `bias` (bool) - defaults to `True`. If set to `False`, the layer will not learn a bias. No *b* parameter!
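A quick way to see what these three arguments do is to inspect the tensors the layer creates. This is just a sketch; the variable names `with_bias` and `no_bias` are my own.

```python
import torch
import torch.nn as nn

with_bias = nn.Linear(in_features=2, out_features=3)            # bias=True by default
no_bias = nn.Linear(in_features=2, out_features=3, bias=False)

print(with_bias.weight.shape)  # torch.Size([3, 2]) -> (out_features, in_features)
print(with_bias.bias.shape)    # torch.Size([3])    -> one bias per output
print(no_bias.bias)            # None, there is no b to learn
```

Note that the weight is stored as `(out_features, in_features)`, which is why a 2-in, 3-out layer holds a 3 x 2 matrix.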

---
### A little tangent on matrix-vector products to help me fully understand the above parameters


In [3]:
import torch
import torch.nn as nn # import torch.nn with a convenient alias

In [4]:
linear_model = nn.Linear(2, 3, bias=False) # input 2D vector, output 3D vector, NO BIAS
linear_model.weight # weights are a 3 x 2 matrix

Parameter containing:
tensor([[-0.6363,  0.2977],
        [-0.2531,  0.6166],
        [ 0.5990, -0.1929]], requires_grad=True)

In [5]:
dummy_input = torch.tensor([1.0, 2.0]) # the input 2D vector

# since our model is a linear function with no bias, we are essentially doing a matrix-vector multiplication
# between the 3 x 2 weight matrix and our input 2D vector. This means we get the prescribed 3D vector as output!
linear_model.weight.matmul(dummy_input)

tensor([-0.0408,  0.9801,  0.2133], grad_fn=<MvBackward0>)
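As a sanity check on the tangent, the sketch below compares the hand-written matrix-vector product against simply calling the layer. The names `layer`, `by_hand` and `by_call` are my own, and the seed is only there to make the run repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

layer = nn.Linear(2, 3, bias=False)
x = torch.tensor([1.0, 2.0])

by_hand = layer.weight.matmul(x)  # explicit 3 x 2 matrix times 2D vector
by_call = layer(x)                # what the module does for us

print(torch.allclose(by_hand, by_call))  # True
```

With `bias=True` the call would additionally add `layer.bias` to the product.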

---
Back to the book... <br><br>
In chapter 5, we defined our model as the linear function above and stored *w* and *b* in a tensor created with `requires_grad=True`. This lets PyTorch's autograd compute the derivative of the loss function with respect to *w* and *b* by tracking every operation those two parameters pass through, including our model.<br><br>
Below is how we set this up previously.

In [10]:
# defining our parameters w and b with autograd enabled
params = torch.tensor([0.0, 1.0], requires_grad=True)

# defining our model
def model(unknown_unit, w, b):
    return w * unknown_unit + b

# placeholder unknown_unit
t_u = torch.tensor([1.0, 2.0, 3.0])

# we can then call this in our training loop like so:
model(t_u, *params)

tensor([1., 1., 1.], grad_fn=<AddBackward0>)
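To remind myself what `requires_grad=True` buys us, here is a small sketch of one backward pass through this model. The target tensor `t_c` and the mean-squared-error loss are assumptions made up for this example, not values from the book.

```python
import torch

params = torch.tensor([0.0, 1.0], requires_grad=True)  # w = 0, b = 1

def model(unknown_unit, w, b):
    return w * unknown_unit + b

t_u = torch.tensor([1.0, 2.0, 3.0])
t_c = torch.tensor([2.0, 4.0, 6.0])  # hypothetical targets, for illustration only

# mean squared error between predictions and targets
loss = ((model(t_u, *params) - t_c) ** 2).mean()
loss.backward()  # autograd walks back through the model to params

print(params.grad)  # tensor([-14.6667,  -6.0000]): d(loss)/dw and d(loss)/db
```

Both gradients are negative, so a gradient descent step would increase *w* and *b*, moving the predictions toward the targets.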

`nn.Linear` combines the *autograd enabled* parameters and the model into a single module. To view the *weight* we can use the attribute `.weight`, and to get the *bias* we can use the attribute `.bias`. Furthermore, when we need to pass the parameters to the optimizer, we can call the method `.parameters()`.<br><br>
To replace the `params` tensor and `model` function above, we can simply write this:

In [11]:
linear_model = nn.Linear(1, 1) # bias defaults to True

linear_model.weight, linear_model.bias, linear_model.parameters()

(Parameter containing:
 tensor([[0.9519]], requires_grad=True),
 Parameter containing:
 tensor([0.2987], requires_grad=True),
 <generator object Module.parameters at 0x7f9de0a9c900>)
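Since `.parameters()` returns a generator, it can be handed straight to an optimizer. A minimal sketch, assuming plain SGD and an arbitrary learning rate of `1e-2`:

```python
import torch
import torch.nn as nn
import torch.optim as optim

linear_model = nn.Linear(1, 1)

# the optimizer consumes the generator and keeps references to the parameters
optimizer = optim.SGD(linear_model.parameters(), lr=1e-2)  # lr chosen arbitrarily

# calling .parameters() again gives a fresh generator we can inspect
param_shapes = [tuple(p.shape) for p in linear_model.parameters()]
print(param_shapes)  # [(1, 1), (1,)] -> weight, then bias
```

A generator can only be consumed once, which is why the sketch calls `.parameters()` a second time instead of reusing the first one.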

To run a forward pass, we no longer pass both the parameters and the data into the model; we pass only the data, by calling the module like a function: `model_output = linear_model(t_u)`. Let's try passing our placeholder *unknown unit tensor* `t_u` into `linear_model` and see what happens.

In [12]:
model_output = linear_model(t_u)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x3 and 1x1)
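The error comes from the shape of `t_u`. `nn.Linear(1, 1)` expects the last dimension of its input to equal `in_features` (here 1), but a 1D tensor of three values is read as a single sample with three features. One common fix, sketched below, is to add a trailing dimension with `unsqueeze(1)` so the three values become three samples of one feature each. The name `t_u_batched` is my own.

```python
import torch
import torch.nn as nn

linear_model = nn.Linear(1, 1)

t_u = torch.tensor([1.0, 2.0, 3.0])  # shape (3,): looks like 1 sample, 3 features
t_u_batched = t_u.unsqueeze(1)       # shape (3, 1): 3 samples, 1 feature each

model_output = linear_model(t_u_batched)
print(model_output.shape)  # torch.Size([3, 1])
```

The output keeps the batch dimension, so it has one row per input sample and `out_features` columns.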