In [1]:
drive_dir = 'drive/My Drive/dlwpt-code/data/'

In [2]:
import torch

The basic building blocks of the complicated functions that make up a neural network are `neurons`. A neuron consists of a linear transformation of the input followed by the application of a fixed non-linear function.

Mathematically: `o = f(w * x + b)`
- o for output
- f for activation function
- w for weights (can be a single scalar or a matrix)
- x for inputs (is a scalar or vector)
- b for bias / offset

**\*the dimensionality of the inputs and weights must match**
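As a minimal sketch of the formula above, here is a single neuron computed by hand. The values of `w`, `b`, `x`, and the matrix `W` are made up for illustration, and `tanh` is just one possible choice for `f`.

```python
import torch

# Scalar case: hypothetical values chosen only for illustration
w = torch.tensor(2.0)    # weight
b = torch.tensor(-1.0)   # bias
x = torch.tensor(0.5)    # input

# linear transformation followed by a fixed non-linearity (tanh here)
o = torch.tanh(w * x + b)
print(o)  # w * x + b is 0, and tanh(0) is 0

# Vector case: W has shape (3, 2), so x must have 2 entries
W = torch.tensor([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
x_vec = torch.tensor([1.0, 2.0])
b_vec = torch.zeros(3)

o_vec = torch.tanh(W @ x_vec + b_vec)
print(o_vec.shape)  # three outputs, one per row of W
```

The second half shows the dimensionality constraint: the number of columns of `W` has to equal the number of entries in the input.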

Generalities about activation functions
- are non-linear. The non-linearity allows the overall network to approximate more complex functions.
- are differentiable, so that gradients can be computed through them, with some exceptions at isolated points (ReLU at zero, for example)

Without these properties, the network either falls back to being a linear model or becomes difficult to train.
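To see why the non-linearity matters, the sketch below stacks two `nn.Linear` layers with no activation in between and checks that the result equals a single affine map. The layer sizes and the seed are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

lin1 = nn.Linear(1, 4)
lin2 = nn.Linear(4, 1)

x = torch.linspace(-1.0, 1.0, 5).unsqueeze(1)  # 5 samples, 1 feature

# Two linear layers applied back to back, no activation in between
stacked = lin2(lin1(x))

# The same computation as one affine map: W = W2 @ W1, b = W2 @ b1 + b2
W = lin2.weight @ lin1.weight             # shape (1, 1)
b = lin2.weight @ lin1.bias + lin2.bias   # shape (1,)
collapsed = x @ W.t() + b

print(torch.allclose(stacked, collapsed, atol=1e-5))
```

However many linear layers are stacked this way, the composition stays affine, so the extra layers add no expressive power until an activation function is placed between them.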

*Code from chapter 5

In [15]:
t_c = [0.5,  14.0, 15.0, 28.0, 11.0,  8.0,  3.0, -4.0,  6.0, 13.0, 21.0]
t_u = [35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4]
t_c = torch.tensor(t_c).unsqueeze(1) # temperatures in Celsius
t_u = torch.tensor(t_u).unsqueeze(1) # Unknown units

In [17]:
n_samples = t_u.shape[0]
n_val = int(0.2 * n_samples)

shuffled_indices = torch.randperm(n_samples)

In [18]:
train_indices = shuffled_indices[:-n_val]
val_indices = shuffled_indices[-n_val:]

train_indices, val_indices

(tensor([10,  6,  7,  8,  0,  3,  1,  2,  4]), tensor([5, 9]))

In [19]:
# Build training and validation set 
train_t_u = t_u[train_indices]
train_t_c = t_c[train_indices]

val_t_u = t_u[val_indices]
val_t_c = t_c[val_indices]

# scale our data
train_t_un = 0.1 * train_t_u
val_t_un = 0.1 * val_t_u

In [20]:
import torch.nn as nn

In [23]:
# nn.Linear is a subclass of nn.Module
# nn.Module defines `__call__`, so an instance can be called like a function
linear_model = nn.Linear(1, 1)
linear_model(val_t_un)

tensor([[-3.3975],
        [-4.1081]], grad_fn=<AddmmBackward>)

`y = model(x)`: do this.

`y = model.forward(x)`: don't do this, since it bypasses the extra work that `__call__` does around `forward`.
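One concrete difference between the two calls is hooks: `__call__` runs any registered forward hooks, while a direct call to `forward` does not. The following sketch registers a hook that records each time it fires. The `calls` list is a hypothetical helper used only for this demonstration.

```python
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
calls = []  # hypothetical log, filled in by the hook below

# A forward hook receives (module, inputs, output); returning None leaves the output unchanged
handle = model.register_forward_hook(
    lambda module, inputs, output: calls.append("hook"))

x = torch.ones(3, 1)
y1 = model(x)          # goes through __call__, so the hook fires
y2 = model.forward(x)  # skips __call__, so the hook does not fire

print(len(calls))           # the hook fired once, not twice
print(torch.equal(y1, y2))  # the numerical result is the same here

handle.remove()  # clean up the hook
```

The outputs match in this simple case, but any code that relies on hooks would silently be skipped when calling `forward` directly.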

The constructor to `nn.Linear` accepts three arguments:
1. the number of input features
2. the number of output features
3. whether the linear model includes a bias or not (defaults to `True`)

\* The number of features in our case refers to the size of the input and the output tensor for the module.

If we used both temperature and barometric pressure as input, for instance, we would have two input features and one output feature.

Above we have an instance of `nn.Linear` with one input and one output feature. That only requires one weight and one bias.

In [25]:
linear_model.weight

Parameter containing:
tensor([[-0.6180]], requires_grad=True)

In [26]:
linear_model.bias

Parameter containing:
tensor([-0.3756], requires_grad=True)
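For comparison, the two-feature case mentioned earlier (temperature plus barometric pressure) might look like the sketch below. The numeric readings are invented for illustration, and `two_feature_model` and `no_bias_model` are hypothetical names.

```python
import torch
import torch.nn as nn

# Two input features, one output feature
two_feature_model = nn.Linear(2, 1)
print(two_feature_model.weight.shape)  # one weight per input feature
print(two_feature_model.bias.shape)    # one bias per output feature

# Invented (temperature, pressure) readings: 3 samples, 2 features each
batch = torch.tensor([[20.0, 1013.0],
                      [25.0, 1009.0],
                      [15.0, 1020.0]])
out = two_feature_model(batch)
print(out.shape)  # one output per sample

# The third constructor argument switches the bias off
no_bias_model = nn.Linear(2, 1, bias=False)
print(no_bias_model.bias)  # None
```

The weight has shape `(out_features, in_features)`, so a model with two inputs and one output holds two weights and one bias.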

We can call the module with some input:

In [27]:
x = torch.ones(1)
linear_model(x)

tensor([-0.9936], grad_fn=<AddBackward0>)

Although PyTorch lets us get away with it, we didn't actually provide an input with the right dimensionality. We have a model that takes one input and produces one output, but PyTorch's `nn.Module` and its subclasses are designed to do so on multiple samples at the same time. To accommodate multiple samples, modules expect the **zeroth dimension of the input to be the number of samples in the batch**.

Any module in `nn` is written to produce outputs for a batch of multiple inputs at the same time. Thus, assuming we need to run `nn.Linear` on 10 samples, we can create an input tensor of size `B x N_inputs`, where `B` is the size of the batch and `N_inputs` the number of input features, and run it once through the model.

In [29]:
x = torch.ones(10, 1) # one batch of 10 samples, 1 input feature each
linear_model(x)

tensor([[-0.9936],
        [-0.9936],
        [-0.9936],
        [-0.9936],
        [-0.9936],
        [-0.9936],
        [-0.9936],
        [-0.9936],
        [-0.9936],
        [-0.9936]], grad_fn=<AddmmBackward>)
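This batch convention is presumably why the temperature lists were reshaped with `unsqueeze(1)` when they were loaded. A small sketch, using the first three `t_u` readings, shows the shape before and after:

```python
import torch

t = torch.tensor([35.7, 55.9, 58.2])  # first three t_u readings
print(t.shape)  # 1D: three values, no feature dimension

# Add a feature dimension at index 1, giving B x N_inputs = 3 x 1
t_batched = t.unsqueeze(1)
print(t_batched.shape)
```

After `unsqueeze(1)`, each row is one sample and each column is one feature, which matches the `B x N_inputs` layout that `nn.Linear(1, 1)` expects.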

In [30]:
import torch.optim as optim

Let's update our training code.


In [32]:
# 1. replace our handmade model with nn.Linear(1,1)
linear_model = nn.Linear(1, 1)
# 2. pass the linear model parameters to the optimizer
optimizer = optim.SGD(
    linear_model.parameters(), # all the `w` and `b`'s of the model
    lr = 1e-2)

Earlier it was our responsibility to create parameters and pass them as the first argument to `optim.SGD`. Now we can just ask any `nn.Module` for the parameters owned by it or by any of its submodules, using the `parameters` method.

In [33]:
linear_model.parameters()

<generator object Module.parameters at 0x7fa6597adba0>

In [34]:
list(linear_model.parameters())

[Parameter containing:
 tensor([[-0.1122]], requires_grad=True), Parameter containing:
 tensor([0.9529], requires_grad=True)]
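The same call also collects parameters from submodules. As a rough illustration, the sketch below wraps two linear layers in an `nn.Sequential` container (not used elsewhere in this notebook; the layer sizes are arbitrary) and lists what `named_parameters` returns.

```python
import torch.nn as nn

# Hypothetical container with two linear submodules, for illustration only
seq_model = nn.Sequential(
    nn.Linear(1, 3),
    nn.Tanh(),        # has no parameters of its own
    nn.Linear(3, 1))

# named_parameters walks the submodules and yields (name, parameter) pairs
names = []
for name, param in seq_model.named_parameters():
    names.append(name)
    print(name, tuple(param.shape))

# Total number of scalar parameters an optimizer would receive
n_params = sum(p.numel() for p in seq_model.parameters())
print(n_params)
```

Passing `seq_model.parameters()` to an optimizer would therefore hand it the weights and biases of both linear layers in one go.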

In [38]:
def training_loop(n_epochs, optimizer, model, loss_fn, t_u_train, t_u_val, t_c_train, t_c_val):
  for epoch in range(1, n_epochs + 1):
    t_p_train = model(t_u_train)
    loss_train = loss_fn(t_p_train , t_c_train)

    t_p_val = model(t_u_val)
    loss_val = loss_fn(t_p_val, t_c_val)

    optimizer.zero_grad()
    loss_train.backward()
    optimizer.step()

    if epoch == 1 or epoch % 1000 == 0:
      print('Epoch {}, Training loss {}, Validation loss {}'.format(
          epoch, float(loss_train), float(loss_val)))

The training loop does not change at all from our handmade example, **except that now we don't pass `params` explicitly to `model`**, since the model itself holds its `Parameters` internally.

We can also leverage loss functions from `torch.nn`. `nn` comes with several common loss functions, among them `nn.MSELoss`, which is exactly what we defined earlier as our `loss_fn`. Loss functions in `nn` are still subclasses of `nn.Module`, so we create an instance and call it as a function. In our case, we get rid of the hand-written `loss_fn` and replace it with `nn.MSELoss()`.
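To check that claim, the sketch below compares `nn.MSELoss` against a hand-written mean squared error. The `loss_fn` here is a reconstruction of the usual definition, assumed to match the one from the earlier chapter, and the tensors are made-up values.

```python
import torch
import torch.nn as nn

def loss_fn(t_p, t_c):
    # Hand-written mean squared error (assumed to match the earlier chapter)
    squared_diffs = (t_p - t_c) ** 2
    return squared_diffs.mean()

# Made-up predictions and targets, shaped B x 1 like the temperature data
t_p = torch.tensor([[1.0], [2.0], [4.0]])
t_c = torch.tensor([[1.5], [2.0], [3.0]])

mse = nn.MSELoss()  # create an instance, then call it like a function

hand_written = loss_fn(t_p, t_c)
built_in = mse(t_p, t_c)
print(hand_written, built_in)  # both print the same value
```

With its default settings, `nn.MSELoss` averages the squared differences over all elements, which is why the two results agree.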

In [39]:
linear_model = nn.Linear(1, 1) # applies an affine transformation to its input via model parameters
optimizer = optim.SGD(
    linear_model.parameters(),
    lr=1e-2)

training_loop(
    n_epochs = 3000,
    optimizer = optimizer,
    model = linear_model,
    loss_fn = nn.MSELoss(), # create an instance and call it as a function
    t_u_train = train_t_un,
    t_u_val = val_t_un,
    t_c_train = train_t_c,
    t_c_val = val_t_c)

print()
print(linear_model.weight)
print(linear_model.bias)

Epoch 1, Training loss 178.0482635498047, Validation loss 93.85664367675781
Epoch 1000, Training loss 3.456367015838623, Validation loss 4.09751033782959
Epoch 2000, Training loss 2.8643579483032227, Validation loss 3.976336717605591
Epoch 3000, Training loss 2.854109048843384, Validation loss 3.9680535793304443

Parameter containing:
tensor([[5.4238]], requires_grad=True)
Parameter containing:
tensor([-17.2474], requires_grad=True)
