# Building your first fully connected network and a CNN 

## Building a simple fully connected network (a Multi-Layer Perceptron)

Let's set up the paths and import the dataset class again:

In [1]:
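import os,sys
# Add the parent directory to the module search path so that
# the `utils` and `models` packages can be imported
currentdir = os.getcwd()
parentdir = os.path.dirname(currentdir)
sys.path.insert(0,parentdir)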

In [2]:
from utils.data_handling import WCH5Dataset

Now let's make our model. We'll talk about:
  - model parameters
  - inputs and the `forward` method
  - modules containing modules
  - the `Sequential` module

Let's open [simpleMLP](/edit/models/simpleMLP.py).
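The contents of `models/simpleMLP.py` are not reproduced here. As a rough orientation, the sketch below shows the same ingredients in a hypothetical two-layer `TinyMLP` (the layer sizes are arbitrary assumptions, and it is not the tutorial's `SimpleMLP`): layers created in `__init__` become sub-modules, their parameters are registered automatically, and `forward` defines how a batch flows through them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMLP(nn.Module):
    """Hypothetical illustration only; not the tutorial's SimpleMLP."""

    def __init__(self, num_inputs=8, num_hidden=16, num_classes=3):
        super().__init__()
        # Assigning nn.Linear modules as attributes makes them sub-modules,
        # so their weights and biases become parameters of this model.
        self.fc1 = nn.Linear(num_inputs, num_hidden)
        self.fc2 = nn.Linear(num_hidden, num_classes)

    def forward(self, x):
        # Flatten everything except the batch dimension.
        x = x.reshape(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        # Return raw class scores (logits); no softmax here.
        return self.fc2(x)


tiny_model = TinyMLP()
out = tiny_model(torch.randn(4, 8))  # calling the module runs forward()
param_names = [name for name, _ in tiny_model.named_parameters()]
print(out.shape)      # torch.Size([4, 3])
print(param_names)    # ['fc1.weight', 'fc1.bias', 'fc2.weight', 'fc2.bias']
```

Note that we call `tiny_model(x)` rather than `tiny_model.forward(x)`: `Module.__call__` runs `forward` together with any registered hooks.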

In [3]:
from models.simpleMLP import SimpleMLP

In [4]:
model_MLP=SimpleMLP(num_classes=3)

Let's look at the parameters:

In [5]:
for name, param in model_MLP.named_parameters():
    print("name of a parameter: {}, type: {}, parameter requires a gradient?: {}".
          format(name, type(param),param.requires_grad))

name of a parameter: fc1.weight, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: fc1.bias, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: fc2.weight, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: fc2.bias, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: fc3.weight, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: fc3.bias, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: fc4.weight, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: fc4.bias, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: fc5.weight, type: <class 'torch.nn.parameter.Parameter'>, p

As we can see, by default the parameters have `requires_grad=True`, i.e. we will be able to obtain the gradient of the loss function with respect to these parameters.

Let's quickly look at the [source](https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear) for the `Linear` module.
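To make the mechanism concrete, here is a stripped-down, hypothetical re-implementation of a linear layer (a simplified sketch, not PyTorch's actual code). It shows that tensors wrapped in `nn.Parameter` are registered automatically when assigned as module attributes, while a plain tensor attribute is not, and that the computation matches `torch.nn.functional.linear`:

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class MyLinear(nn.Module):
    """Simplified, hypothetical stand-in for nn.Linear, for illustration only."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # nn.Parameter wraps a tensor; assigning it here registers it with the module.
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.empty(out_features))
        # A plain tensor attribute is NOT registered as a parameter.
        self.not_a_param = torch.zeros(1)
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.weight, -bound, bound)
        nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, x):
        # y = x W^T + b
        return x @ self.weight.t() + self.bias


layer = MyLinear(5, 2)
x = torch.randn(3, 5)
print([name for name, _ in layer.named_parameters()])  # ['weight', 'bias']
print(list(layer.state_dict().keys()))                  # ['weight', 'bias']
print(torch.allclose(layer(x), F.linear(x, layer.weight, layer.bias)))  # True
```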

`Parameter` is a subclass of `Tensor`. When a `Parameter` is assigned as an attribute of a `Module`, it is automatically added to that module's list of parameters. The parameter names and values are captured in the module's 'state dictionary':

In [6]:
model_MLP.state_dict()

OrderedDict([('fc1.weight',
              tensor([[ 0.0057,  0.0003,  0.0003,  ...,  0.0026, -0.0030,  0.0006],
                      [-0.0004,  0.0002, -0.0049,  ..., -0.0060,  0.0018,  0.0029],
                      [-0.0051,  0.0022, -0.0015,  ..., -0.0038, -0.0008,  0.0051],
                      ...,
                      [ 0.0003, -0.0045,  0.0050,  ..., -0.0064,  0.0036,  0.0058],
                      [ 0.0050,  0.0001, -0.0041,  ...,  0.0007,  0.0044,  0.0041],
                      [ 0.0055, -0.0026, -0.0062,  ..., -0.0029,  0.0047,  0.0021]])),
             ('fc1.bias',
              tensor([ 6.1454e-03, -3.8093e-03,  3.7682e-03,  1.2780e-03, -4.2391e-03,
                      -1.6603e-03,  4.3777e-03,  3.7202e-03, -1.6040e-03,  2.9438e-03,
                      -2.7895e-04, -4.2055e-03, -1.3591e-03, -3.8535e-04,  1.7130e-03,
                      -2.5235e-03,  3.5872e-03, -5.5334e-03, -4.4186e-03, -6.3407e-03,
                       2.4332e-03, -3.0362e-04,  2.9208e-03,  5.

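Did you notice that the values are not 0? This is by design: by default an `nn.Linear` layer initializes its weights and biases with small random values drawn uniformly from $[-1/\sqrt{n_{\text{in}}}, 1/\sqrt{n_{\text{in}}}]$, where $n_{\text{in}}$ is the number of input features. This is one accepted scheme; many other initialization strategies are possible.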
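The default bound can be checked directly on a standalone layer (the sizes below are arbitrary). As one example of an alternative strategy, the second half re-initializes every `nn.Linear` in a hypothetical small network with Xavier/Glorot uniform weights and zero biases, using `Module.apply`, which calls a function on every sub-module recursively:

```python
import math

import torch
import torch.nn as nn

layer = nn.Linear(256, 50)
bound = 1.0 / math.sqrt(layer.in_features)
# Default initialization: all values lie within [-bound, bound] (small tolerance for float rounding).
w_ok = layer.weight.abs().max().item() <= bound + 1e-6
b_ok = layer.bias.abs().max().item() <= bound + 1e-6
print(w_ok, b_ok)  # True True


def init_xavier(module):
    # Re-initialize only linear layers; other module types are left untouched.
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)


net = nn.Sequential(nn.Linear(256, 50), nn.ReLU(), nn.Linear(50, 3))
net.apply(init_xavier)
print(torch.count_nonzero(net[0].bias).item())  # 0
```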

Now let's look at the `Sequential` version:

In [7]:
from models.simpleMLP import SimpleMLPSEQ
model_MLPSEQ=SimpleMLPSEQ(num_classes=3)

In [8]:
for name, param in model_MLPSEQ.named_parameters():
    print("name of a parameter: {}, type: {}, parameter requires a gradient?: {}".
          format(name, type(param),param.requires_grad))

name of a parameter: _sequence.0.weight, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: _sequence.0.bias, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: _sequence.2.weight, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: _sequence.2.bias, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: _sequence.4.weight, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: _sequence.4.bias, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: _sequence.6.weight, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: _sequence.6.bias, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parame

In [9]:
print(model_MLPSEQ.state_dict())

OrderedDict([('_sequence.0.weight', tensor([[ 1.1240e-03, -5.0516e-03, -4.5424e-03,  ...,  1.3199e-03,
         -5.0092e-03, -4.8747e-04],
        [ 2.4317e-03,  1.6055e-04,  3.8558e-03,  ..., -9.1580e-04,
         -2.0747e-03, -6.1886e-04],
        [ 3.0045e-03, -3.4215e-03,  3.0121e-03,  ...,  3.5183e-03,
          2.1762e-03,  1.1849e-03],
        ...,
        [-3.2619e-03,  1.5118e-04, -1.4491e-03,  ...,  4.7094e-03,
         -5.8456e-03,  4.2822e-03],
        [ 4.4278e-03, -8.9968e-04, -6.4112e-03,  ...,  5.8452e-03,
         -1.5233e-03,  2.3909e-03],
        [-8.7048e-05,  1.0830e-03, -2.7301e-03,  ...,  3.2229e-03,
         -1.5551e-04, -4.6648e-03]])), ('_sequence.0.bias', tensor([-5.4659e-03,  5.6283e-04, -3.1679e-03, -5.2000e-03, -3.8694e-03,
        -4.4652e-03, -6.5826e-04,  1.6300e-03,  4.4858e-03,  1.5823e-03,
        -6.3505e-03, -3.8530e-03, -4.0163e-04, -4.4037e-03,  2.7814e-03,
        -4.6127e-03,  1.0991e-03,  3.1771e-03,  3.3054e-03,  4.4030e-03,
         3.3014e-

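As we can see, the parameters look similar but have different names: inside the `Sequential` container the layers are indexed by position (`_sequence.0`, `_sequence.2`, ...). The odd indices are missing because the modules at those positions (presumably the activation functions) have no parameters.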
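The `_sequence` prefix is presumably the attribute name under which `SimpleMLPSEQ` stores its `nn.Sequential` (that file is not reproduced here). The positional naming itself can be reproduced with a small hypothetical stack; passing an `OrderedDict` to `nn.Sequential` is one way to get readable names instead:

```python
from collections import OrderedDict

import torch.nn as nn

# Hypothetical layer sizes; only the naming pattern matters here.
seq = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 3))
seq_names = [n for n, _ in seq.named_parameters()]
print(seq_names)  # ['0.weight', '0.bias', '2.weight', '2.bias'] (ReLU at index 1 has no parameters)

# An OrderedDict gives each sub-module an explicit name.
named_seq = nn.Sequential(OrderedDict([
    ("fc1", nn.Linear(10, 5)),
    ("relu1", nn.ReLU()),
    ("fc2", nn.Linear(5, 3)),
]))
named_names = [n for n, _ in named_seq.named_parameters()]
print(named_names)  # ['fc1.weight', 'fc1.bias', 'fc2.weight', 'fc2.bias']
```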

## Training a model

First, let's make a dataset object:

In [10]:
dset=WCH5Dataset("/scratch/fcormier/Public/NUPRISM.h5",reduced_dataset_size=100000,val_split=0.1,test_split=0.1)

Reduced size: 100000


Let's make a `DataLoader` and grab the first batch:

In [11]:
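from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

# SubsetRandomSampler draws only the training indices, in random order;
# shuffle must stay False because DataLoader does not allow shuffle together with a sampler
train_dldr=DataLoader(dset,
                      batch_size=32,
                      shuffle=False,
                      sampler=SubsetRandomSampler(dset.train_indices))
train_iter=iter(train_dldr)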
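To see what the sampler does without the HDF5 file, here is a toy stand-in (a random `TensorDataset` and a made-up split where the first 80 indices play the role of `train_indices`). Only the sampled indices are ever visited, so one pass over the loader yields exactly 80 samples:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.sampler import SubsetRandomSampler

# Toy stand-in for WCH5Dataset: 100 samples, 4 features, labels in {0, 1, 2}.
features = torch.randn(100, 4)
targets = torch.randint(0, 3, (100,))
toy_dset = TensorDataset(features, targets)

# Hypothetical split: the first 80 indices are used for training.
toy_train_indices = list(range(80))

toy_loader = DataLoader(toy_dset, batch_size=32, shuffle=False,
                        sampler=SubsetRandomSampler(toy_train_indices))

seen = 0
for xb, yb in toy_loader:
    print(xb.shape, yb.shape)  # batches of 32, 32 and 16 samples
    seen += xb.shape[0]
print(seen)  # 80
```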

In [12]:
batch0=next(train_iter)

In [13]:
data=batch0[0]
labels=batch0[1]

Now compute the model output on the data:

In [14]:
model_out=model_MLP(data)


In [15]:
print(labels)

tensor([2, 0, 0, 2, 2, 1, 2, 1, 1, 2, 1, 1, 2, 2, 2, 2, 2, 1, 0, 0, 1, 0, 2, 2,
        2, 2, 1, 0, 1, 1, 1, 0])


In [16]:
print(model_out)

tensor([[4.2698e-01, 7.9713e-02, 4.9330e-01],
        [5.1634e-04, 1.2869e-04, 9.9936e-01],
        [6.3779e-06, 4.5251e-05, 9.9995e-01],
        [3.4432e-04, 9.9307e-01, 6.5846e-03],
        [2.0709e-07, 5.1921e-04, 9.9948e-01],
        [3.3927e-04, 8.6314e-03, 9.9103e-01],
        [2.5257e-03, 2.1832e-03, 9.9529e-01],
        [3.0196e-09, 1.1320e-06, 1.0000e+00],
        [5.5010e-06, 1.0070e-06, 9.9999e-01],
        [7.8259e-05, 1.3893e-01, 8.6100e-01],
        [4.5265e-08, 2.1939e-05, 9.9998e-01],
        [9.8275e-01, 1.6069e-05, 1.7234e-02],
        [1.1381e-02, 1.0933e-01, 8.7929e-01],
        [2.9443e-03, 1.3460e-02, 9.8360e-01],
        [2.6970e-07, 3.2683e-06, 1.0000e+00],
        [3.6976e-06, 3.2520e-05, 9.9996e-01],
        [1.3955e-03, 8.9605e-02, 9.0900e-01],
        [1.8714e-03, 3.8200e-04, 9.9775e-01],
        [2.6542e-06, 1.6722e-04, 9.9983e-01],
        [2.8492e-04, 7.7103e-04, 9.9894e-01],
        [3.7364e-01, 4.3748e-02, 5.8261e-01],
        [3.5151e-06, 1.5342e-04, 9

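Now we have the model's predictions, and above we got the 'true' labels from the dataset, so we can compute the loss. Cross-entropy is the appropriate loss for this classification task, and we will use `CrossEntropyLoss` from `torch.nn` (which, by the way, is also a `Module`). Note that `CrossEntropyLoss` applies `log_softmax` internally and therefore expects raw, unnormalized scores (logits). Each row of the output above sums to 1, which suggests `SimpleMLP` already ends in a softmax; in that case the softmax is effectively applied twice, which still runs but distorts the loss (it can no longer approach 0). First, create the loss module: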

In [17]:
from torch.nn import CrossEntropyLoss
loss_module=CrossEntropyLoss()
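A quick sanity check of what `CrossEntropyLoss` computes, on random logits (a made-up example, not the notebook's data): it equals `log_softmax` followed by the negative log-likelihood loss. The last lines pass probabilities instead of logits; with 3 classes the doubly-softmaxed loss can never fall below $\log(1 + 2/e) \approx 0.55$, even for perfectly confident, correct predictions.

```python
import torch
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss

torch.manual_seed(0)
logits = torch.randn(4, 3)                 # raw scores: 4 events, 3 classes
toy_labels = torch.tensor([2, 0, 1, 1])

loss_fn = CrossEntropyLoss()
ce = loss_fn(logits, toy_labels)
# CrossEntropyLoss == log_softmax followed by the negative log-likelihood loss.
manual = F.nll_loss(F.log_softmax(logits, dim=1), toy_labels)
print(torch.allclose(ce, manual))  # True

# Passing probabilities instead of logits applies softmax twice.
probs = F.softmax(logits, dim=1)
double = loss_fn(probs, toy_labels).item()
print(double, ce.item())
```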

Now evaluate the loss. 

In [18]:
loss_tensor=loss_module(model_out,labels)

In [19]:
print(loss_tensor)

tensor(1.1145, grad_fn=<NllLossBackward0>)


This was a 'forward pass'. We should now have a computational graph available; let's plot it for kicks...

In [20]:
# torchviz is not available on Compute Canada, so this cell is left commented out
#from torchviz import make_dot
#make_dot(loss_tensor,params=dict(model_MLP.named_parameters()))
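Without `torchviz`, the graph can still be inspected by hand: every tensor produced by an operation has a `grad_fn` node, and each node lists its inputs in `next_functions`. A minimal sketch on a small hypothetical network (exact node names such as `AddmmBackward0` vary slightly between PyTorch versions):

```python
import torch
import torch.nn as nn

# Small stand-in model and loss; the same walk works on loss_tensor above.
toy_net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
toy_loss = nn.CrossEntropyLoss()(toy_net(torch.randn(5, 4)),
                                 torch.tensor([0, 1, 0, 1, 1]))


def walk(fn, depth=0, seen=None):
    """Print the autograd graph by following grad_fn.next_functions recursively."""
    if seen is None:
        seen = set()
    if fn is None or fn in seen:
        return seen
    seen.add(fn)
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, seen)
    return seen


nodes = walk(toy_loss.grad_fn)
node_names = sorted({type(f).__name__ for f in nodes})
```

The `AccumulateGrad` leaves are where gradients end up in each parameter's `.grad` during `backward()`.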

Before we calculate the gradients, let's check what they are now:

In [21]:
for name, param in model_MLP.named_parameters():
    print("name of a parameter: {}, gradient: {}".
          format(name, param.grad))

name of a parameter: fc1.weight, gradient: None
name of a parameter: fc1.bias, gradient: None
name of a parameter: fc2.weight, gradient: None
name of a parameter: fc2.bias, gradient: None
name of a parameter: fc3.weight, gradient: None
name of a parameter: fc3.bias, gradient: None
name of a parameter: fc4.weight, gradient: None
name of a parameter: fc4.bias, gradient: None
name of a parameter: fc5.weight, gradient: None
name of a parameter: fc5.bias, gradient: None


No wonder: no backward pass has been run yet. Let's calculate them:

In [22]:
loss_tensor.backward()

In [23]:
for name, param in model_MLP.named_parameters():
    print("name of a parameter: {}, gradient: {}".
          format(name, param.grad))

name of a parameter: fc1.weight, gradient: tensor([[ 0.0000e+00,  6.3469e-08, -2.4750e+01,  ...,  0.0000e+00,
          2.7509e-04,  2.0785e+04],
        [ 0.0000e+00,  0.0000e+00, -1.9043e+01,  ...,  0.0000e+00,
          0.0000e+00,  0.0000e+00],
        [ 0.0000e+00, -8.7695e-08, -2.1030e+01,  ...,  3.3834e+03,
          0.0000e+00,  6.9443e+03],
        ...,
        [ 0.0000e+00, -1.7934e+00,  0.0000e+00,  ..., -3.2453e-04,
          1.8825e+05,  0.0000e+00],
        [ 0.0000e+00,  5.1661e+01, -2.4404e+01,  ...,  2.2799e-04,
         -9.7513e+04, -2.5372e+04],
        [ 0.0000e+00,  1.0332e+02, -1.1783e-05,  ..., -2.2310e-04,
          4.2236e+04,  0.0000e+00]])
name of a parameter: fc1.bias, gradient: tensor([-8.5976e+00,  8.9086e+00, -4.2629e+00, -3.5771e+01,  2.1132e+02,
         2.2888e+01, -3.9025e+01,  6.4370e+01,  9.5124e+01, -5.0377e+01,
        -6.0272e+01, -1.5987e+02, -2.0071e+01, -1.1248e+02, -2.2747e-01,
        -1.2991e+02, -7.7232e+01, -8.8527e+00, -2.0095e+01,  8.48

All we have to do now is subtract the gradient of each parameter from the parameter tensor itself, for every parameter of the model; for a small enough step this should decrease the loss. The gradient is multiplied by a learning rate $\lambda$ so that we don't go too far in the loss landscape: $\theta \leftarrow \theta - \lambda \nabla_\theta L$.

In [24]:
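import torch
lr=0.0001
# Update the parameters in place; no_grad() keeps this step out of the computational graph
with torch.no_grad():
    for param in model_MLP.parameters():
        param -= lr*param.grad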
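The same update rule, repeated many times, is all that plain gradient descent is. A toy, one-parameter example (the data and the learning rate are made up for illustration) shows the loss shrinking as $w$ approaches its optimum:

```python
import torch

# Toy problem (hypothetical): find w such that w * x matches y = 3 * x.
x_pts = torch.linspace(-1, 1, 50)
y_pts = 3.0 * x_pts
w = torch.zeros(1, requires_grad=True)
toy_lr = 0.5  # the learning rate lambda

losses = []
for step in range(50):
    loss = ((w * x_pts - y_pts) ** 2).mean()
    losses.append(loss.item())
    loss.backward()
    with torch.no_grad():
        w -= toy_lr * w.grad  # theta <- theta - lambda * dL/dtheta
    w.grad.zero_()            # backward() accumulates, so reset before the next step

print(losses[0], losses[-1], w.item())  # loss falls towards 0, w approaches 3
```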

A call to `backward()` **accumulates** gradients, so we also need to zero the gradient tensors if we want to keep going:

In [25]:
for param in model_MLP.parameters():
    param.grad.zero_()
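Accumulation is easy to see on a single scalar (a standalone example, unrelated to the model): calling `backward()` twice without zeroing adds the two gradients together.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(w * 3).backward()
first = w.grad.clone()
print(first)    # tensor(3.)
(w * 3).backward()   # a second backward pass adds to the existing .grad
second = w.grad.clone()
print(second)   # tensor(6.)

w.grad.zero_()  # reset, as in the cell above
print(w.grad)   # tensor(0.)
```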

There is a much simpler way of doing this: we can use the PyTorch [optim](https://pytorch.org/docs/stable/optim.html) classes. These also make it easy to use more advanced optimization options (such as momentum, or adaptive optimizers like [Adam](https://arxiv.org/abs/1412.6980)):

In [26]:
from torch import optim
optimizer = optim.SGD(model_MLP.parameters(), lr=0.0001)
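With plain `optim.SGD` (no momentum or weight decay), `optimizer.step()` performs the same update as the manual loop above. A small check on a hypothetical linear layer with a made-up regression target:

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim

torch.manual_seed(0)
net_a = nn.Linear(4, 2)
net_b = copy.deepcopy(net_a)  # identical starting weights
inputs, target = torch.randn(8, 4), torch.randn(8, 2)
step = 0.01

# Version A: the manual update.
F.mse_loss(net_a(inputs), target).backward()
with torch.no_grad():
    for p in net_a.parameters():
        p -= step * p.grad

# Version B: the same step through an optimizer.
opt = optim.SGD(net_b.parameters(), lr=step)
opt.zero_grad()
F.mse_loss(net_b(inputs), target).backward()
opt.step()

same = all(torch.allclose(pa, pb) for pa, pb in zip(net_a.parameters(), net_b.parameters()))
print(same)  # True
```

Switching to an adaptive method is then a one-line change, e.g. `optim.Adam(model_MLP.parameters(), lr=1e-3)`; the learning rate there is only an illustrative value.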

Let's get a new batch of events:

In [27]:
batch1=next(train_iter)

In [28]:
data=batch1[0]
labels=batch1[1]

In [29]:
print(labels)

tensor([2, 0, 2, 1, 1, 1, 2, 0, 2, 0, 1, 2, 1, 0, 2, 0, 2, 2, 0, 2, 1, 1, 1, 1,
        0, 1, 1, 2, 2, 0, 0, 2])


In [30]:
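optimizer.zero_grad()  # clear any gradients left over from earlier backward passes
model_out=model_MLP(data)
loss_tensor=loss_module(model_out,labels)
loss_tensor.backward()
optimizer.step()  # apply the SGD update to all parameters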

In [31]:
print(model_out)

tensor([[0., 0., 1.],
        [0., 0., 1.],
        [1., 0., 0.],
        [0., 1., 0.],
        [1., 0., 0.],
        [0., 0., 1.],
        [0., 0., 1.],
        [0., 0., 1.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 0., 1.],
        [0., 0., 1.],
        [0., 0., 1.],
        [0., 0., 1.],
        [0., 0., 1.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 0., 1.],
        [0., 0., 1.],
        [0., 1., 0.],
        [1., 0., 0.],
        [0., 0., 1.],
        [0., 0., 1.],
        [0., 0., 1.],
        [0., 0., 1.],
        [0., 0., 1.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 0., 1.],
        [0., 0., 1.]], grad_fn=<SoftmaxBackward0>)


We could just put the code above in a loop and be done with it, but the usual practice is to wrap this functionality in a training object. Here we'll use the `Engine` class from [engine](/edit/utils/engine.py). Let's examine it. We'll talk about:
  1. Implementation of the training loop.
  2. Evaluation on the validation set, and training vs. test modes.
  3. Turning gradient computation on and off.
  4. Saving and retrieving the model and optimizer state.
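The `Engine` source is not reproduced here, so the following is only a generic sketch of those four ingredients on synthetic data and a hypothetical tiny model, not `Engine`'s actual implementation: a training loop, `eval()`/`train()` mode switching, `torch.no_grad()` during validation, and saving and restoring model and optimizer state with `state_dict()`.

```python
import os
import tempfile

import torch
import torch.nn as nn
from torch import optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Synthetic stand-in data (hypothetical): 300 events, 8 features, 3 classes.
x_all = torch.randn(300, 8)
y_all = torch.randint(0, 3, (300,))
train_loader = DataLoader(TensorDataset(x_all[:240], y_all[:240]), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(x_all[240:], y_all[240:]), batch_size=64)

toy_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
criterion = nn.CrossEntropyLoss()
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.01)


def evaluate(loader):
    """Average loss and accuracy over a loader, without tracking gradients."""
    toy_model.eval()  # evaluation mode (matters for layers such as dropout or batch norm)
    total_loss, correct, n = 0.0, 0, 0
    with torch.no_grad():  # no graph is recorded, so no gradients are computed
        for xb, yb in loader:
            out = toy_model(xb)
            total_loss += criterion(out, yb).item() * len(yb)
            correct += (out.argmax(dim=1) == yb).sum().item()
            n += len(yb)
    toy_model.train()  # back to training mode
    return total_loss / n, correct / n


for epoch in range(2):
    for xb, yb in train_loader:
        toy_optimizer.zero_grad()
        loss = criterion(toy_model(xb), yb)
        loss.backward()
        toy_optimizer.step()
    val_loss, val_acc = evaluate(val_loader)
    print(f"epoch {epoch}: val loss {val_loss:.3f}, val acc {val_acc:.3f}")

# Save both model and optimizer state, then restore them.
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save({"model": toy_model.state_dict(), "optimizer": toy_optimizer.state_dict()}, path)
checkpoint = torch.load(path)
toy_model.load_state_dict(checkpoint["model"])
toy_optimizer.load_state_dict(checkpoint["optimizer"])
```

Saving the optimizer state alongside the model matters for optimizers with internal state (momentum buffers, Adam's moment estimates), so that training can resume exactly where it stopped.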

In [32]:
from utils.engine import Engine

Let's first create a configuration object; we'll use it to set up our training engine:

In [33]:
class CONFIG:
    pass
config=CONFIG()
config.batch_size_test =512
config.batch_size_train = 256
config.batch_size_val = 512
config.lr=0.01
config.device = 'cpu'
config.num_workers_train=2
config.num_workers_val=2
config.num_workers_test=2
config.dump_path = '../model_state_dumps'
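The empty `CONFIG` class above is just an attribute container. The standard library's `types.SimpleNamespace` does the same job in one expression, and `vars()` works on it as well. Whether `Engine` would accept it depends on `Engine` only reading attributes from `config`, which we assume but have not checked:

```python
from types import SimpleNamespace

# Same settings as the CONFIG object above, built in one expression.
config_ns = SimpleNamespace(
    batch_size_test=512,
    batch_size_train=256,
    batch_size_val=512,
    lr=0.01,
    device="cpu",
    num_workers_train=2,
    num_workers_val=2,
    num_workers_test=2,
    dump_path="../model_state_dumps",
)
print(config_ns.lr)               # 0.01
print(vars(config_ns)["device"])  # cpu
```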


In [34]:
engine=Engine(model_MLP,dset,config)

Sticking to CPU
Creating a directory for run dump: ../model_state_dumps/20240425_150733/


In [35]:
print(vars(config))

{'batch_size_test': 512, 'batch_size_train': 256, 'batch_size_val': 512, 'lr': 0.01, 'device': 'cpu', 'num_workers_train': 2, 'num_workers_val': 2, 'num_workers_test': 2, 'dump_path': '../model_state_dumps'}


In [36]:
%%time
engine.train(epochs=1,report_interval=5,valid_interval=20)

Epoch 0 Starting @ 2024-04-25 15:07:34
Label: tensor([1, 2, 1, 1, 2, 2, 2, 2, 0, 0, 0, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 0, 2,
        0, 2, 0, 1, 0, 2, 0, 2, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0, 1, 2, 1, 2, 1,
        0, 0, 1, 2, 0, 1, 1, 1, 0, 1, 1, 0, 0, 2, 0, 0, 2, 0, 2, 2, 2, 0, 2, 0,
        0, 1, 1, 1, 2, 1, 1, 0, 2, 2, 1, 2, 1, 0, 0, 1, 1, 1, 2, 1, 1, 1, 2, 0,
        0, 0, 0, 1, 2, 0, 2, 2, 1, 2, 2, 0, 2, 2, 2, 0, 1, 1, 2, 0, 1, 0, 1, 1,
        1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 0, 1, 0, 0,
        0, 2, 1, 2, 0, 1, 0, 0, 0, 1, 0, 1, 2, 1, 1, 1, 0, 1, 2, 0, 1, 1, 0, 0,
        1, 1, 0, 2, 1, 0, 1, 1, 1, 0, 2, 1, 2, 0, 1, 0, 1, 0, 2, 2, 2, 1, 0, 0,
        2, 1, 1, 0, 1, 2, 1, 2, 0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 0, 2, 0, 1, 1, 1,
        1, 2, 2, 0, 0, 0, 0, 1, 2, 1, 0, 0, 2, 1, 1, 1, 0, 1, 0, 0, 2, 1, 2, 0,
        1, 2, 2, 0, 2, 0, 0, 0, 2, 1, 0, 2, 1, 1, 0, 0, 0, 1, 2, 1, 2, 2, 1, 0,
        2, 1, 1, 1, 1, 1, 2, 0, 0, 2, 2, 0, 2, 0, 1, 0, 2, 1, 2, 2, 2, 2, 

KeyboardInterrupt: 

## Defining a simple Convolutional Network

Let's open [simpleCNN](/edit/models/simpleCNN.py).

In [5]:
from models.simpleCNN import SimpleCNN
model_CNN=SimpleCNN(num_input_channels=38,num_classes=3)
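`models/simpleCNN.py` is not reproduced here, so as a rough orientation here is a hypothetical miniature CNN built with `nn.Sequential`. Only the 38 input channels and 3 classes come from the cell above; the 40 x 40 image size, the layer widths and the pooling choices are made up for illustration and are not claimed to match `SimpleCNN`.

```python
import torch
import torch.nn as nn

# Hypothetical input: a batch of 2 events, 38 channels, 40 x 40 pixels (N, C, H, W).
x = torch.randn(2, 38, 40, 40)

tiny_cnn = nn.Sequential(
    nn.Conv2d(38, 16, kernel_size=3, padding=1),  # 38 input channels -> 16 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                              # halve height and width: 40 -> 20
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                      # average each feature map down to 1 x 1
    nn.Flatten(),                                 # (N, 32, 1, 1) -> (N, 32)
    nn.Linear(32, 3),                             # 3 class scores (logits)
)
out = tiny_cnn(x)
print(out.shape)  # torch.Size([2, 3])
```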

In [6]:
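import numpy as np
def rotate_chan(x):
    # Move the last axis (channels) to the front: (H, W, C) -> (C, H, W),
    # the channels-first layout that PyTorch convolution layers expect
    return np.transpose(x,(2,0,1))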
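A small check on a random array (the 40 x 40 x 38 shape is a hypothetical example) shows what `rotate_chan` does, together with the equivalent `permute` on a batched torch tensor:

```python
import numpy as np
import torch

# Hypothetical event stored channels-last: height 40, width 40, 38 channels.
event = np.random.rand(40, 40, 38).astype(np.float32)

chw = np.transpose(event, (2, 0, 1))   # same operation as rotate_chan
print(event.shape, "->", chw.shape)    # (40, 40, 38) -> (38, 40, 40)

# The same reordering on a torch batch in (N, H, W, C) layout:
batch = torch.from_numpy(np.stack([event, event]))   # shape (2, 40, 40, 38)
print(batch.permute(0, 3, 1, 2).shape)               # torch.Size([2, 38, 40, 40])
```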

In [7]:
dset=WCH5Dataset("/scratch/fcormier/Public/NUPRISM.h5",val_split=0.1,test_split=0.1,transform=rotate_chan)

In [8]:
engine=Engine(model_CNN,dset,config)

Sticking to CPU
Creating a directory for run dump: ../model_state_dumps/20240424_223542/


In [9]:
for name, param in model_CNN.named_parameters():
    print("name of a parameter: {}, type: {}, parameter requires a gradient?: {}".
          format(name, type(param),param.requires_grad))

name of a parameter: f_embed.weight, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: f_embed.bias, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: f_conv1.weight, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: f_conv1.bias, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: f_conv2a.weight, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: f_conv2a.bias, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: f_conv2b.weight, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: f_conv2b.bias, type: <class 'torch.nn.parameter.Parameter'>, parameter requires a gradient?: True
name of a parameter: f_conv3a.weight, type: 

In [None]:
%%time
# Note: this run appeared to hang rather than train
engine.train(epochs=5,report_interval=1,valid_interval=10)

Epoch 0 Starting @ 2024-04-24 22:35:47
tensor([[-0.2139,  0.2188,  0.1590],
        [-0.2356,  0.1918,  0.1596]])
... Iteration 0 ... Epoch 0.00 ... Validation Loss 1.004 ... Validation Accuracy 0.000
Saved checkpoint as: ../model_state_dumps/20240424_223542/SimpleCNN.pth
best validation loss so far!: 1.0036206245422363
Saved checkpoint as: ../model_state_dumps/20240424_223542/SimpleCNNBEST.pth
tensor([[-0.2629,  0.2181,  0.2056]], grad_fn=<AddmmBackward0>)
