# Building Models with PyTorch

## `torch.nn.Module` and `torch.nn.Parameter`

One important behaviour of `torch.nn.Module` is registering parameters: the learnable weights of a `Module` subclass are instances of `torch.nn.Parameter`, and assigning one as an attribute adds it to the module's `parameters()`.

In [3]:
import torch

class TinyModel(torch.nn.Module):
    def __init__(self):
        super(TinyModel, self).__init__()

        self.linear1 = torch.nn.Linear(100, 200)
        self.activation = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(200, 10)
        self.softmax = torch.nn.Softmax()

    def forward(self, x):
        x = self.linear1(x)
        x = self.activation(x)
        x = self.linear2(x)
        x = self.softmax(x)
        return x

tinymodel = TinyModel()

print("The model:")
print(tinymodel)

print("\nLayer 2:")
print(tinymodel.linear2)

print("\nModel params:")
for param in tinymodel.parameters():
    print(param)

print("\nLayer 2 params:")
for param in tinymodel.linear2.parameters():
    print(param)

The model:
TinyModel(
  (linear1): Linear(in_features=100, out_features=200, bias=True)
  (activation): ReLU()
  (linear2): Linear(in_features=200, out_features=10, bias=True)
  (softmax): Softmax(dim=None)
)

Layer 2:
Linear(in_features=200, out_features=10, bias=True)

Model params:
Parameter containing:
tensor([[-0.0410,  0.0239, -0.0756,  ...,  0.0793,  0.0449,  0.0881],
        [-0.0026,  0.0443, -0.0817,  ..., -0.0576,  0.0830,  0.0984],
        [-0.0799,  0.0598,  0.0191,  ...,  0.0053, -0.0855,  0.0849],
        ...,
        [-0.0841, -0.0678,  0.0033,  ...,  0.0114, -0.0412,  0.0588],
        [ 0.0691, -0.0117, -0.0402,  ...,  0.0024, -0.0211,  0.0092],
        [ 0.0369,  0.0289, -0.0612,  ...,  0.0120,  0.0480, -0.0913]],
       requires_grad=True)
Parameter containing:
tensor([ 0.0513,  0.0253, -0.0273,  0.0580,  0.0653,  0.0655,  0.0396,  0.0080,
         0.0801, -0.0178, -0.0042,  0.0803,  0.0829, -0.0999, -0.0732, -0.0672,
         0.0544,  0.0312, -0.0453,  0.0924,  0.03
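
A minimal sketch of the registration behaviour, assuming a hypothetical `ScaleShift` module that is not part of the original notes: an attribute wrapped in `torch.nn.Parameter` shows up in `parameters()`, while a plain tensor attribute does not.

```python
import torch


class ScaleShift(torch.nn.Module):
    """Hypothetical module, used only to illustrate parameter registration."""

    def __init__(self, size):
        super().__init__()
        # Wrapped in Parameter: registered, so parameters() returns it
        self.scale = torch.nn.Parameter(torch.ones(size))
        # Plain tensor attribute: NOT registered as a parameter
        self.offset = torch.zeros(size)

    def forward(self, x):
        return x * self.scale + self.offset


module = ScaleShift(4)
param_names = [name for name, _ in module.named_parameters()]
print(param_names)  # only 'scale' is listed
```

This matters because an optimizer built from `module.parameters()` would only ever update `scale`.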

## Common Layer Types

### Linear Layers

Linear layers are fully connected: every input influences every output of the layer, to a degree specified by the layer's weights.

In [4]:
lin = torch.nn.Linear(3, 2)
x = torch.rand(1, 3)
print("Input:")
print(x)

print("\nWeight and Bias parameters:")
for param in lin.parameters():
    print(param)

y = lin(x)
print("\nOutput:")
print(y)

Input:
tensor([[0.1872, 0.3584, 0.8722]])

Weight and Bias parameters:
Parameter containing:
tensor([[-0.2549, -0.3049,  0.4524],
        [ 0.0330,  0.0098, -0.3964]], requires_grad=True)
Parameter containing:
tensor([-0.2718, -0.3738], requires_grad=True)

Output:
tensor([[-0.0343, -0.7098]], grad_fn=<AddmmBackward0>)
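
As a rough check of what the layer computes, the output can be reproduced by hand as `x @ W.T + b`. This is a sketch; the seed and the name `y_manual` are illustrative assumptions.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

lin = torch.nn.Linear(3, 2)
x = torch.rand(1, 3)

y = lin(x)

# weight has shape (out_features, in_features), hence the transpose
y_manual = x @ lin.weight.T + lin.bias

print(torch.allclose(y, y_manual, atol=1e-6))
```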


### Convolutional Layers

In [None]:
import torch
import torch.nn.functional as F  # relu and max_pool2d live in torch.nn.functional


class LeNet(torch.nn.Module):

    def __init__(self):
        super(LeNet, self).__init__()
        
        self.conv1 = torch.nn.Conv2d(1, 6, 5)
        self.conv2 = torch.nn.Conv2d(6, 16, 3)

        self.fc1 = torch.nn.Linear(16 * 6 * 6, 120)
        self.fc2 = torch.nn.Linear(120, 84)
        self.fc3 = torch.nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

LeNet takes a 1x32x32 black-and-white image.

`conv1` input
* `in_channels`: no. input channels, i.e. 1 (if RGB, would be 3) 
* `out_channels`: no. features we want layer to learn, i.e. 6
* `kernel_size`: kernel dimensions, i.e. 5x5 (if non-square, use tuple)

`conv1` output
* *Activation map* - spatial representation of presence of features in input tensor - 6x28x28 tensor, from no. features (6) and valid kernel positions (28x28)
* Passed through ReLU activation func, then max pooling layer with 2x2 pooling region
* Gives lower-res 6x14x14 activation map version

`conv2`
* Expects 6 input channels (no. features sought by `conv1`), has 16 out channels and 3x3 kernel
* Outputs 16x12x12 activation map, reduced by pooling layer to 16x6x6
* Reshaped to 16\*6\*6 = 576-ele vector for next layer
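
The shape arithmetic above can be traced with a dummy input. This is a sketch that rebuilds only the two conv layers with the same arguments as `LeNet`; the intermediate names (`a1`, `p1`, ...) are illustrative.

```python
import torch
import torch.nn.functional as F

conv1 = torch.nn.Conv2d(1, 6, 5)
conv2 = torch.nn.Conv2d(6, 16, 3)

x = torch.rand(1, 1, 32, 32)        # batch of one 1x32x32 image

a1 = F.relu(conv1(x))               # (1, 6, 28, 28): 32 - 5 + 1 = 28
p1 = F.max_pool2d(a1, 2)            # (1, 6, 14, 14)
a2 = F.relu(conv2(p1))              # (1, 16, 12, 12): 14 - 3 + 1 = 12
p2 = F.max_pool2d(a2, 2)            # (1, 16, 6, 6)
flat = p2.view(p2.size(0), -1)      # (1, 576)

for name, t in [("conv1", a1), ("pool1", p1), ("conv2", a2),
                ("pool2", p2), ("flat", flat)]:
    print(name, tuple(t.shape))
```

The 576 in the last line is why `fc1` is declared with `16 * 6 * 6` input features.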

### Recurrent Layers

RNNs are used for sequential data, e.g. time-series measurements, natural-language sentences, DNA nucleotides. They maintain a *hidden state* that acts as 'memory' of what they have seen in the sequence so far.

The internal structure of an RNN, or of its variants LSTM (long short-term memory) and GRU (gated recurrent unit), is moderately complex.
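
A small sketch of the hidden state acting as memory, using `torch.nn.LSTM` with arbitrary illustrative sizes (8 input features, 16 hidden units). Feeding a sequence in two chunks while carrying the state forward should give the same final hidden state as feeding it in one go.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
seq = torch.rand(1, 5, 8)           # (batch, sequence length, features)

# Whole sequence at once
output, (h_n, c_n) = lstm(seq)
print(output.shape)                 # one hidden vector per time step
print(h_n.shape)                    # final hidden state

# Same sequence in two chunks, passing the state from chunk 1 into chunk 2
_, state = lstm(seq[:, :3])
_, (h_chunked, _) = lstm(seq[:, 3:], state)

print(torch.allclose(h_n, h_chunked, atol=1e-6))
```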

In [None]:
class LSTMTagger(torch.nn.Module):
    """Classifier to tell if a word is a noun, verb, etc."""

    #TODO
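
One way such a tagger might look, as a hedged sketch rather than the intended implementation: the class name `LSTMTaggerSketch`, the layer names and all sizes are assumptions. Words are embedded, passed through an LSTM, and each time step's hidden vector is mapped to per-tag log-probabilities.

```python
import torch
import torch.nn.functional as F


class LSTMTaggerSketch(torch.nn.Module):
    """Hypothetical tagger: one vector of tag scores per word in a sentence."""

    def __init__(self, vocab_size, embedding_dim, hidden_dim, tagset_size):
        super().__init__()
        self.word_embeddings = torch.nn.Embedding(vocab_size, embedding_dim)
        self.lstm = torch.nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2tag = torch.nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        # sentence: 1-D tensor of word indices, shape (seq_len,)
        embeds = self.word_embeddings(sentence)            # (seq_len, embedding_dim)
        lstm_out, _ = self.lstm(embeds.unsqueeze(1))       # (seq_len, 1, hidden_dim)
        tag_space = self.hidden2tag(lstm_out.squeeze(1))   # (seq_len, tagset_size)
        return F.log_softmax(tag_space, dim=1)


# Illustrative sizes and word indices only
tagger = LSTMTaggerSketch(vocab_size=20, embedding_dim=6, hidden_dim=6, tagset_size=3)
sentence = torch.tensor([4, 7, 1, 12])
tag_scores = tagger(sentence)
print(tag_scores.shape)
```

The model here is untrained, so the scores are meaningless; only the shapes are of interest.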

### Transformers

In [None]:
#TODO

## Other Layers and Functions

### Data Manipulation Layers

In [5]:
# Max pooling layers
# Reduce tensor by combining cells and assigning max value of input cells to output cell

my_tensor = torch.rand(1, 6, 6)
print(my_tensor)

maxpool_layer = torch.nn.MaxPool2d(3)
print(maxpool_layer(my_tensor))

tensor([[[0.3926, 0.6974, 0.1467, 0.0350, 0.0382, 0.9606],
         [0.1009, 0.8235, 0.7125, 0.7284, 0.3182, 0.5734],
         [0.0473, 0.5504, 0.8472, 0.8232, 0.3161, 0.3695],
         [0.8336, 0.7650, 0.9640, 0.7852, 0.6229, 0.7383],
         [0.9466, 0.0603, 0.3900, 0.2318, 0.4900, 0.8594],
         [0.6215, 0.3917, 0.0095, 0.6899, 0.1740, 0.8187]]])
tensor([[[0.8472, 0.9606],
         [0.9640, 0.8594]]])
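
To make the "combining cells" step concrete, the same result can be obtained by hand: split the 6x6 grid into non-overlapping 3x3 blocks and take the max of each. A sketch, with `blocks` and `manual` as illustrative names:

```python
import torch

t = torch.rand(1, 6, 6)
pooled = torch.nn.MaxPool2d(3)(t)

# Split each 6-long axis into 2 blocks of 3: blocks[c, i, a, j, b] == t[c, 3*i + a, 3*j + b]
blocks = t.view(1, 2, 3, 2, 3)
# Max within each 3x3 block (over the a and b axes)
manual = blocks.amax(dim=(2, 4))

print(torch.equal(pooled, manual))
```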


In [6]:
# Normalisation layers
# Re-centre and normalise output of layer before feeding to another
# Benefits include allowing use of higher learning rates without exploding/vanishing gradients

my_tensor = torch.rand(1, 4, 4) * 20 + 5
print(my_tensor)

print(my_tensor.mean())

norm_layer = torch.nn.BatchNorm1d(4)
normed_tensor = norm_layer(my_tensor)
print(normed_tensor)

print(normed_tensor.mean())

tensor([[[14.4795, 14.4802, 12.8516, 20.0436],
         [19.9820, 11.0803,  7.1885, 22.6962],
         [21.6681, 19.1464, 24.1293,  8.0033],
         [14.8729, 14.7074, 19.7055, 24.5430]]])
tensor(16.8486)
tensor([[[-0.3610, -0.3607, -0.9581,  1.6798],
         [ 0.7498, -0.6568, -1.2717,  1.1787],
         [ 0.5566,  0.1475,  0.9557, -1.6598],
         [-0.8857, -0.9266,  0.3085,  1.5039]]],
       grad_fn=<NativeBatchNormBackward0>)
tensor(-2.9802e-08, grad_fn=<MeanBackward0>)
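
What the layer does in training mode can be sketched by hand, assuming the default initial affine parameters (weight 1, bias 0): for input of shape (batch, channels, length), each channel is normalised with its own mean and biased variance.

```python
import torch

x = torch.rand(1, 4, 4) * 20 + 5
norm_layer = torch.nn.BatchNorm1d(4)
normed = norm_layer(x)              # training mode: uses batch statistics

# Per-channel statistics over the batch and length dimensions
mean = x.mean(dim=(0, 2), keepdim=True)
var = x.var(dim=(0, 2), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + norm_layer.eps)

print(torch.allclose(normed, manual, atol=1e-5))
```

After `norm_layer.eval()` the layer would use its running statistics instead, so this equality is specific to training mode.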


In [7]:
# Dropout layers
# Tool for encouraging sparse representations in model; pushing it to infer with less data
# Randomly set parts of input tensor to 0 DURING TRAINING

my_tensor = torch.rand(1,4,4)
print(my_tensor)

dropout = torch.nn.Dropout(p=0.4)  # p defaults to 0.5
print(dropout(my_tensor))
print(dropout(my_tensor))

tensor([[[0.5742, 0.4417, 0.8778, 0.6903],
         [0.7872, 0.7314, 0.0307, 0.8266],
         [0.3286, 0.5668, 0.8089, 0.1789],
         [0.0035, 0.7300, 0.9208, 0.6146]]])
tensor([[[0.0000, 0.7362, 1.4630, 1.1506],
         [0.0000, 0.0000, 0.0512, 1.3777],
         [0.5476, 0.0000, 0.0000, 0.0000],
         [0.0000, 1.2167, 1.5347, 0.0000]]])
tensor([[[0.9570, 0.7362, 1.4630, 1.1506],
         [0.0000, 0.0000, 0.0512, 1.3777],
         [0.5476, 0.0000, 1.3481, 0.0000],
         [0.0058, 1.2167, 1.5347, 1.0243]]])
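
Two details are easier to see with an all-ones input (a sketch; the seed and variable names are assumptions): in training mode the surviving values are scaled by 1 / (1 - p), and in eval mode dropout passes the input through unchanged.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

p = 0.4
dropout = torch.nn.Dropout(p=p)
x = torch.ones(1, 4, 4)

dropout.train()                     # training mode: random zeroing is active
train_out = dropout(x)
print(train_out)                    # entries are either 0 or 1 / (1 - p)

dropout.eval()                      # eval mode: dropout is a no-op
eval_out = dropout(x)
print(torch.equal(eval_out, x))
```

The rescaling keeps the expected value of each element the same between training and inference.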
