# Building models with PyTorch

Except for Parameter, the classes we discuss in this video are all subclasses of torch.nn.Module. This is the PyTorch base class meant to encapsulate behaviors specific to PyTorch models and their components.

One important behavior of torch.nn.Module is registering parameters. If a particular Module subclass has learnable weights, these weights are expressed as instances of torch.nn.Parameter. The Parameter class is a subclass of torch.Tensor, with the special behavior that when an instance is assigned as an attribute of a Module, it is added to the list of that module's parameters. These parameters may be accessed through the parameters() method on the Module class.
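
As a minimal sketch of the registration behavior described above, the snippet below assigns one attribute wrapped in torch.nn.Parameter and one plain tensor. The ScaleShift class and its attribute names are hypothetical and exist only for illustration; only the wrapped attribute should show up among the module's parameters.

```python
import torch


class ScaleShift(torch.nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        # Wrapped in Parameter: registered with the module on assignment
        self.scale = torch.nn.Parameter(torch.ones(3))
        # Plain tensor attribute: not registered as a parameter
        self.offset = torch.zeros(3)

    def forward(self, x):
        return x * self.scale + self.offset


module = ScaleShift()
registered = [name for name, _ in module.named_parameters()]
print(registered)  # ['scale']
```

Because offset is a plain tensor, an optimizer built from module.parameters() would not update it.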

In [3]:
import torch

class TinyModel (torch.nn.Module):
    def __init__(self):
        super(TinyModel, self).__init__()
        self.linear1 = torch.nn.Linear(100, 200)
        self.activation1 = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(200, 10)
        self.activation2 = torch.nn.Softmax()
        
    def forward(self, x):
        x = self.linear1(x)
        x = self.activation1(x)
        x = self.linear2(x)
        x = self.activation2(x)
        
        return x
  


tinymodel = TinyModel()

print("\n\nthe model is: ", tinymodel)
print("\n\nthe second layer is: ", tinymodel.linear2)
print("\n\nModel parameters: ")

print(" === parameters for the model === ")
i = 0
for param in tinymodel.parameters():
    print(f"parameters for layer: {i}")
    print(f" param shape: {param.shape}")
    
    print(f"parameters are: \n {param}")
    i += 1


print("\n\nLayer parameters: ")
for param in tinymodel.linear2.parameters():
    print(param)




the model is:  TinyModel(
  (linear1): Linear(in_features=100, out_features=200, bias=True)
  (activation1): ReLU()
  (linear2): Linear(in_features=200, out_features=10, bias=True)
  (activation2): Softmax(dim=None)
)


the second layer is:  Linear(in_features=200, out_features=10, bias=True)


Model parameters: 
 === parameters for the model === 
parameters for layer: 0
 param shape: torch.Size([200, 100])
parameters are: 
 Parameter containing:
tensor([[-0.0055,  0.0250,  0.0438,  ...,  0.0833, -0.0740,  0.0096],
        [ 0.0424, -0.0706, -0.0407,  ..., -0.0273, -0.0934, -0.0322],
        [-0.0571,  0.0439, -0.0869,  ...,  0.0190, -0.0533,  0.0232],
        ...,
        [-0.0463,  0.0288,  0.0873,  ..., -0.0767, -0.0849,  0.0370],
        [ 0.0470,  0.0939,  0.0225,  ..., -0.0176, -0.0340,  0.0528],
        [-0.0579, -0.0891, -0.0328,  ...,  0.0437,  0.0957,  0.0766]],
       requires_grad=True)
parameters for layer: 1
 param shape: torch.Size([200])
parameters are: 
 Parameter co

## Layer types

### Linear layers
The most basic type of neural network layer is a linear or fully connected layer. This is a layer where every input influences every output of the layer to a degree specified by the layer’s weights. If a layer has m inputs and n outputs, there are m x n weights; PyTorch stores them as an n x m matrix (out_features x in_features), which is why Linear(3, 2) below reports a weight of shape [2, 3]. For example:

In [11]:

# a linear layer with bias
lin = torch.nn.Linear(3, 2)
x = torch.randn(1,3)

print(f" input shape: {x.shape}")
print(" input:", x)

print("\n weights and bias parameters:")

for param in lin.parameters():
    print(f" param shape: {param.shape}")
    print(param)


print("\n weights: ", lin.weight)
print("\n bias: ", lin.bias)


y = lin(x)

print("\n output:", y)



# a linear layer without bias
print("\n\n a linear layer without bias")
lin_woBias = torch.nn.Linear(3, 2, bias=False)

print(f"\n weights parameters:")

for param in lin_woBias.parameters():
    print(f" param shape: {param.shape}")
    print(param)

print("\n weights: ", lin_woBias.weight)
print("\n bias: ", lin_woBias.bias)



 input shape: torch.Size([1, 3])
 input: tensor([[-0.7298, -0.4829, -1.9811]])

 weights and bias parameters:
 param shape: torch.Size([2, 3])
Parameter containing:
tensor([[ 0.2192,  0.5458, -0.0265],
        [ 0.1908, -0.1495, -0.1196]], requires_grad=True)
 param shape: torch.Size([2])
Parameter containing:
tensor([-0.1025,  0.4180], requires_grad=True)

 weights:  Parameter containing:
tensor([[ 0.2192,  0.5458, -0.0265],
        [ 0.1908, -0.1495, -0.1196]], requires_grad=True)

 bias:  Parameter containing:
tensor([-0.1025,  0.4180], requires_grad=True)

 output: tensor([[-0.4735,  0.5879]], grad_fn=<AddmmBackward0>)


 a linear layer without bias

 weights parameters:
 param shape: torch.Size([2, 3])
Parameter containing:
tensor([[-0.2750,  0.0176,  0.1711],
        [ 0.0032, -0.3298, -0.3343]], requires_grad=True)

 weights:  Parameter containing:
tensor([[-0.2750,  0.0176,  0.1711],
        [ 0.0032, -0.3298, -0.3343]], requires_grad=True)

 bias:  None
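
To make the "every input influences every output" description concrete, here is a small sketch that recomputes a linear layer's output by hand as x @ W.T + b. The seed and the variable name manual are assumptions made for this example, not part of the original cells.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability

lin = torch.nn.Linear(3, 2)
x = torch.randn(1, 3)

# The weight is stored as (out_features, in_features), so transpose it
manual = x @ lin.weight.T + lin.bias

print(lin.weight.shape)  # torch.Size([2, 3])
print(torch.allclose(lin(x), manual, atol=1e-6))
```

With bias=False, the same check would simply drop the + lin.bias term.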


### Convolutional Layers
Convolutional layers are built to handle data with a high degree of spatial correlation. They are very commonly used in computer vision, where they detect close groupings of features which they compose into higher-level features. They pop up in other contexts too - for example, in NLP applications, where a word’s immediate context (that is, the other words nearby in the sequence) can affect the meaning of a sentence.

In [None]:
import torch
import torch.nn.functional as F  # relu and max_pool2d live in torch.nn.functional


class LeNet(torch.nn.Module):

    def __init__(self):
        super(LeNet, self).__init__()
        # 1 input image channel (black & white), 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = torch.nn.Conv2d(1, 6, 5)
        self.conv2 = torch.nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = torch.nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = torch.nn.Linear(120, 84)
        self.fc3 = torch.nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

A convolutional layer is like a window that scans over the image, looking for a pattern it recognizes. These patterns are called features, and one of the parameters of a convolutional layer is the number of features we would like it to learn. The first argument to the constructor is the number of input channels (here, 1 for a black & white image), and the second argument is the number of output features. Here, we’re asking our layer to learn 6 features.

Just above, I likened the convolutional layer to a window - but how big is the window? The third argument is the window or kernel size. Here, the “5” means we’ve chosen a 5x5 kernel. (If you want a kernel with height different from width, you can specify a tuple for this argument - e.g., (3, 5) to get a 3x5 convolution kernel.)

The output of a convolutional layer is an activation map - a spatial representation of the presence of features in the input tensor. Given a 1x32x32 input image, conv1 will give us an output tensor of 6x28x28; 6 is the number of features, and 28 is the height and width of our map. (The 28 comes from the fact that when scanning a 5-pixel window over a 32-pixel row, there are only 32 - 5 + 1 = 28 valid positions.)
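
The shape arithmetic above can be checked directly. This is a small sketch assuming a random 32x32 single-channel input with a leading batch dimension; the variable names are chosen for this example. It also tries the (3, 5) kernel mentioned earlier.

```python
import torch

# (batch, channels, height, width); the 32x32 size is an assumption
image = torch.rand(1, 1, 32, 32)

conv1 = torch.nn.Conv2d(1, 6, 5)
square_out = conv1(image)
print(square_out.shape)  # torch.Size([1, 6, 28, 28]), since 32 - 5 + 1 = 28

# A non-square kernel: 3 rows by 5 columns
rect_conv = torch.nn.Conv2d(1, 6, (3, 5))
rect_out = rect_conv(image)
print(rect_out.shape)  # torch.Size([1, 6, 30, 28]), since 32 - 3 + 1 = 30
```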

We then pass the output of the convolution through a ReLU activation function (more on activation functions later), then through a max pooling layer. The max pooling layer takes features near each other in the activation map and groups them together. It does this by reducing the tensor, merging every 2x2 group of cells in the output into a single cell, and assigning that cell the maximum value of the 4 cells that went into it. This gives us a lower-resolution version of the activation map, with dimensions 6x14x14.
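
Here is a tiny worked example of the 2x2 max pooling described above, using a hand-written 4x4 map so the merged cells are easy to follow. The values are made up for illustration.

```python
import torch
import torch.nn.functional as F

# One 4x4 activation map, shaped (batch, channels, height, width)
activation_map = torch.tensor([[1.0, 2.0, 5.0, 6.0],
                               [3.0, 4.0, 7.0, 8.0],
                               [9.0, 10.0, 13.0, 14.0],
                               [11.0, 12.0, 15.0, 16.0]]).reshape(1, 1, 4, 4)

# Each non-overlapping 2x2 block is replaced by its maximum
pooled = F.max_pool2d(activation_map, 2)
print(pooled.reshape(2, 2))
# tensor([[ 4.,  8.],
#         [12., 16.]])

# The same operation halves a 6x28x28 activation map to 6x14x14
halved = F.max_pool2d(torch.rand(1, 6, 28, 28), 2)
print(halved.shape)  # torch.Size([1, 6, 14, 14])
```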

Our next convolutional layer, conv2, expects 6 input channels (corresponding to the 6 features sought by the first layer), has 16 output channels, and a 3x3 kernel. It puts out a 16x12x12 activation map, which is again reduced by a max pooling layer to 16x6x6. Prior to passing this output to the linear layers, it is reshaped to a 16 * 6 * 6 = 576-element vector for consumption by the next layer.
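
The second half of the shape trace can be sketched the same way. This example starts from a random 6x14x14 tensor standing in for the pooled output of the first layer, and it uses torch.flatten as a stand-in for the view / num_flat_features reshaping in the model code above.

```python
import torch
import torch.nn.functional as F

conv2 = torch.nn.Conv2d(6, 16, 3)

# Stand-in for the pooled output of the first convolutional layer
pooled1 = torch.rand(1, 6, 14, 14)

act2 = F.relu(conv2(pooled1))
print(act2.shape)  # torch.Size([1, 16, 12, 12]), since 14 - 3 + 1 = 12

pooled2 = F.max_pool2d(act2, 2)
print(pooled2.shape)  # torch.Size([1, 16, 6, 6])

# Flatten everything except the batch dimension: 16 * 6 * 6 = 576
flat = torch.flatten(pooled2, start_dim=1)
print(flat.shape)  # torch.Size([1, 576])
```

The 576 here is what fixes the input size of fc1 in the model, so a different input image size would require a different first linear layer.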

There are convolutional layers for addressing 1D, 2D, and 3D tensors. There are also many more optional arguments for a conv layer constructor, including stride length in the input (e.g., only scanning every second or every third position), padding (so you can scan out to the edges of the input), and more. See the documentation for more information.
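
As a rough illustration of these options, the sketch below applies stride and padding to the same assumed 32x32 input and shows a 1D convolution over a length-32 signal. The specific values (stride=2, padding=2) are arbitrary choices for the example.

```python
import torch

image = torch.rand(1, 1, 32, 32)  # assumed input size

# stride=2: the window moves two pixels at a time
strided = torch.nn.Conv2d(1, 6, 5, stride=2)
strided_out = strided(image)
print(strided_out.shape)  # torch.Size([1, 6, 14, 14])

# padding=2: two zero pixels on every side, so a 5x5 kernel keeps 32x32
padded = torch.nn.Conv2d(1, 6, 5, padding=2)
padded_out = padded(image)
print(padded_out.shape)  # torch.Size([1, 6, 32, 32])

# 1D convolution: input is (batch, channels, length)
signal = torch.rand(1, 1, 32)
conv1d = torch.nn.Conv1d(1, 6, 5)
signal_out = conv1d(signal)
print(signal_out.shape)  # torch.Size([1, 6, 28])
```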

### Recurrent layers

Recurrent neural networks (or RNNs) are used for sequential data - anything from time-series measurements from a scientific instrument to natural language sentences to DNA nucleotides. An RNN does this by maintaining a hidden state that acts as a sort of memory for what it has seen in the sequence so far.
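
To give a feel for the hidden state, here is a minimal sketch using the plain torch.nn.RNN layer on a random 5-step sequence. The sizes and the seed are arbitrary assumptions. The layer returns one hidden state per time step plus the final hidden state, and for a single-layer, one-directional RNN the last per-step output is that final state.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability

# 4 features per time step, 8-dimensional hidden state
rnn = torch.nn.RNN(input_size=4, hidden_size=8, batch_first=True)

# (batch, sequence length, features)
sequence = torch.randn(1, 5, 4)

outputs, final_hidden = rnn(sequence)
print(outputs.shape)       # torch.Size([1, 5, 8]): hidden state at every step
print(final_hidden.shape)  # torch.Size([1, 1, 8]): (layers, batch, hidden)

# The hidden state after the last step is the "memory" of the whole sequence
print(torch.allclose(outputs[:, -1, :], final_hidden[0]))  # True
```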

The internal structure of an RNN layer - or its variants, the LSTM (long short-term memory) and GRU (gated recurrent unit) - is moderately complex and beyond the scope of this video, but we’ll show you what one looks like in action with an LSTM-based part-of-speech tagger (a type of classifier that tells you if a word is a noun, verb, etc.):
