## PyTorch basic model-building practice

In [20]:
import torch

### Fully connected layer

Every input influences every output of the layer to a degree specified by the layer’s weights.

If you multiply x by the transpose of the linear layer’s weight matrix and add the bias, you get the output vector y (that is, y = xW^T + b).

One other important feature to note: when we print the layer’s weights (for example, lin.weight), the result reports itself as a Parameter (which is a subclass of Tensor) and shows that it is tracking gradients with autograd. This is a default behavior of Parameter that differs from Tensor, whose requires_grad defaults to False.
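The difference in defaults between Parameter and Tensor can be checked directly. The following is a minimal sketch; the variable names `plain` and `param` are illustrative and not part of the notebook.

```python
import torch

plain = torch.rand(2, 3)                       # an ordinary tensor
param = torch.nn.Parameter(torch.rand(2, 3))   # the same kind of data wrapped as a Parameter

print(plain.requires_grad)               # ordinary tensors do not track gradients by default
print(param.requires_grad)               # parameters do
print(isinstance(param, torch.Tensor))   # Parameter is a subclass of Tensor
```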

In [21]:
lin = torch.nn.Linear(3, 2)

print('\n\nWeight and Bias parameters:')
for param in lin.parameters():
    print(param)

x = torch.rand(1, 3)
print('Input:')
print(x)

y = lin(x)
print('\n\nOutput:')
print(y)



Weight and Bias parameters:
Parameter containing:
tensor([[ 0.3160,  0.0632,  0.3253],
        [ 0.3683, -0.2402, -0.0110]], requires_grad=True)
Parameter containing:
tensor([ 0.4375, -0.1672], requires_grad=True)
Input:
tensor([[0.9866, 0.9736, 0.7939]])


Output:
tensor([[ 1.0689, -0.0464]], grad_fn=<AddmmBackward0>)
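The claim that the layer computes a matrix multiplication plus a bias can be verified by hand. This sketch assumes a fixed seed (an addition for reproducibility) and compares the layer’s output against `x @ W.T + b` computed manually.

```python
import torch

torch.manual_seed(0)  # assumption: seeded only so repeated runs give the same numbers

lin = torch.nn.Linear(3, 2)
x = torch.rand(1, 3)

y = lin(x)

# lin.weight has shape (2, 3), so the input is multiplied by its transpose
manual = x @ lin.weight.T + lin.bias

print(y)
print(manual)
print(torch.allclose(y, manual, atol=1e-6))
```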


### Fundamental structure of a PyTorch model

A PyTorch model has an __init__() method that defines the layers and other components of the model, and a forward() method where the computation gets done.

In [22]:
class TinyModel(torch.nn.Module):

    def __init__(self):
   
        # Start by calling nn.Module's __init__ so the model inherits its machinery,
        # then define the layers of the network in this method.
        super(TinyModel, self).__init__()

        self.linear1 = torch.nn.Linear(100, 200)
        self.activation = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(200, 10)
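        # No dim is given here, so PyTorch infers it at call time and emits a
        # deprecation warning; torch.nn.Softmax(dim=1) is the explicit form
        # for (batch, features) input.
        self.softmax = torch.nn.Softmax()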

    
    # When you use PyTorch to build a model, you just have to define the forward function, 
    # that will pass the data into the computation graph (i.e. our neural network). 
    # This will represent our feed-forward algorithm.
    def forward(self, x):
        x = self.linear1(x)
        x = self.activation(x)
        x = self.linear2(x)
        x = self.softmax(x)
        return x

tinymodel = TinyModel()
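To see forward() in action, the model can be called on a batch of inputs. This is a sketch under two assumptions that are not in the notebook: the batch of 4 random samples is hypothetical, and `dim=1` is passed to Softmax explicitly so that it normalizes over the feature axis.

```python
import torch


class TinyModel(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(100, 200)
        self.activation = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(200, 10)
        # Assumption: explicit dim=1, so each row of the output is normalized
        self.softmax = torch.nn.Softmax(dim=1)

    def forward(self, x):
        x = self.activation(self.linear1(x))
        return self.softmax(self.linear2(x))


model = TinyModel()
batch = torch.rand(4, 100)   # hypothetical batch: 4 samples with 100 features each
probs = model(batch)         # calling the module runs forward()

print(probs.shape)           # one row of 10 class scores per sample
print(probs.sum(dim=1))      # each row sums to (approximately) 1
```

Calling `model(batch)` rather than `model.forward(batch)` is the usual style, since the call also runs any hooks registered on the module.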



In [23]:
print('The model:')
print(tinymodel)

print('\n\nJust one layer i.e layer 2:')
print(tinymodel.linear2)

print('\n\nModel params:')
for param in tinymodel.parameters():
    print(param)

print('\n\nLayer 2 params:')
for param in tinymodel.linear2.parameters():
    print(param)

The model:
TinyModel(
  (linear1): Linear(in_features=100, out_features=200, bias=True)
  (activation): ReLU()
  (linear2): Linear(in_features=200, out_features=10, bias=True)
  (softmax): Softmax(dim=None)
)


Just one layer i.e layer 2:
Linear(in_features=200, out_features=10, bias=True)


Model params:
Parameter containing:
tensor([[ 0.0735, -0.0662, -0.0192,  ...,  0.0976,  0.0450,  0.0197],
        [-0.0040, -0.0960,  0.0430,  ...,  0.0074,  0.0377, -0.0826],
        [ 0.0905, -0.0177, -0.0497,  ..., -0.0517,  0.0331,  0.0070],
        ...,
        [-0.0394, -0.0480, -0.0800,  ..., -0.0285, -0.0301,  0.0750],
        [-0.0744, -0.0153, -0.0002,  ..., -0.0981, -0.0740, -0.0560],
        [-0.0310, -0.0331, -0.0480,  ..., -0.0365,  0.0626, -0.0377]],
       requires_grad=True)
Parameter containing:
tensor([-0.0092,  0.0966, -0.0725,  0.0587,  0.0363,  0.0189,  0.0630, -0.0194,
        -0.0760,  0.0195,  0.0080, -0.0151, -0.0947,  0.0791, -0.0139,  0.0985,
         0.0218,  0.0876, -0
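Printing every parameter tensor gets long quickly. A more compact view, sketched here with a re-declared copy of the model so the block runs on its own, lists each parameter’s name and shape through named_parameters() and counts the total number of learnable values.

```python
import torch


class TinyModel(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(100, 200)
        self.activation = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(200, 10)
        self.softmax = torch.nn.Softmax(dim=1)  # assumption: explicit dim

    def forward(self, x):
        x = self.activation(self.linear1(x))
        return self.softmax(self.linear2(x))


model = TinyModel()

# Names come from the attribute names used in __init__
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

# Total learnable values: 100*200 + 200 + 200*10 + 10
total = sum(p.numel() for p in model.parameters())
print(total)
```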

### Convolutional layers

Convolutional layers are built to handle data with a high degree of spatial correlation. They are very commonly used in computer vision, where they detect close groupings of features, which they then compose into higher-level features.

In [24]:
import torch.nn.functional as F  # relu, max_pool2d and log_softmax live in torch.nn.functional


class LeNet(torch.nn.Module):

    def __init__(self):

        #inherit torch.nn.Module properties
        super(LeNet, self).__init__()
        # 1 input image channel (black & white), 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = torch.nn.Conv2d(1, 6, 5)
        self.conv2 = torch.nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = torch.nn.Linear(16 * 6 * 6, 120)  # 16 channels of 6x6 feature maps (e.g. for a 32x32 input)
        self.fc2 = torch.nn.Linear(120, 84)
        self.fc3 = torch.nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

The first argument to a convolutional layer’s constructor is the number of input channels. Here, it is 1. If we were building this model to look at 3 color channels, it would be 3.

A convolutional layer is like a window that scans over the image, looking for a pattern it recognizes. These patterns are called features.

The second argument to the constructor is the number of output features. Here, we’re asking our layer to learn 6 features.

The third argument is the window or kernel size. Here, the “5” means we’ve chosen a 5x5 kernel. (If you want a kernel with height different from width, you can specify a tuple for this argument - e.g., (3, 5) to get a 3x5 convolution kernel (window).)
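The effect of these three arguments is easiest to see by tracing tensor shapes through the layers. The sketch below assumes a single 32x32 grayscale image as input (the notebook does not state the input size; 32x32 is one size that produces the 6x6 feature maps expected by fc1) and also tries the tuple form of the kernel size.

```python
import torch
import torch.nn.functional as F

conv1 = torch.nn.Conv2d(1, 6, 5)    # 1 input channel, 6 output features, 5x5 kernel
conv2 = torch.nn.Conv2d(6, 16, 3)   # 6 input channels, 16 output features, 3x3 kernel

x = torch.rand(1, 1, 32, 32)        # assumption: batch of one 32x32 single-channel image

a = conv1(x)                        # 5x5 kernel, no padding: 32 -> 28
b = F.max_pool2d(F.relu(a), 2)      # 2x2 pooling halves the size: 28 -> 14
c = conv2(b)                        # 3x3 kernel, no padding: 14 -> 12
d = F.max_pool2d(F.relu(c), 2)      # 12 -> 6
flat = torch.flatten(d, 1)          # keep the batch dimension, flatten the rest

for name, t in [("conv1", a), ("pool1", b), ("conv2", c), ("pool2", d), ("flat", flat)]:
    print(name, tuple(t.shape))

# A non-square kernel: height 3, width 5
rect = torch.nn.Conv2d(1, 6, (3, 5))
r = rect(x)
print("rect", tuple(r.shape))
```

The flattened size, 16 * 6 * 6 = 576, matches the input size of fc1 in the LeNet class.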

### LSTM network

Recurrent neural networks (or RNNs) are used for sequential data - anything from time-series measurements from a scientific instrument to natural language sentences to DNA nucleotides. An RNN does this by maintaining a hidden state that acts as a sort of memory for what it has seen in the sequence so far.

In [25]:
class LSTMTagger(torch.nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim

        self.word_embeddings = torch.nn.Embedding(vocab_size, embedding_dim)

        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = torch.nn.LSTM(embedding_dim, hidden_dim)

        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = torch.nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores

The constructor has four arguments:

vocab_size is the number of words in the input vocabulary. Each word is a one-hot vector (or unit vector) in a vocab_size-dimensional space.

tagset_size is the number of tags in the output set.

embedding_dim is the size of the embedding space for the vocabulary. An embedding maps a vocabulary onto a low-dimensional space, where words with similar meanings are close together in the space.

hidden_dim is the size of the LSTM’s memory.

The input will be a sentence with the words represented as indices of one-hot vectors. The embedding layer will then map these down to an embedding_dim-dimensional space. The LSTM takes this sequence of embeddings and iterates over it, producing an output vector of length hidden_dim at each step.

The final linear layer acts as a classifier; applying log_softmax() to the output of the final layer converts the output into a normalized set of estimated probabilities that a given word maps to a given tag.
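The flow described above can be exercised on a toy sentence. This sketch re-declares the tagger so it runs on its own; the sizes (a 10-word vocabulary, 3 tags) and the word indices are hypothetical values chosen only for illustration.

```python
import torch
import torch.nn.functional as F


class LSTMTagger(torch.nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super().__init__()
        self.word_embeddings = torch.nn.Embedding(vocab_size, embedding_dim)
        self.lstm = torch.nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2tag = torch.nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        # nn.LSTM expects (sequence length, batch, features); the batch size here is 1
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        return F.log_softmax(tag_space, dim=1)


torch.manual_seed(0)  # assumption: seeded only for repeatable output

# Hypothetical sizes for illustration
tagger = LSTMTagger(embedding_dim=6, hidden_dim=8, vocab_size=10, tagset_size=3)

sentence = torch.tensor([1, 4, 2, 7, 5])   # five word indices, each below vocab_size

with torch.no_grad():                      # no training here, so skip gradient tracking
    tag_scores = tagger(sentence)

print(tag_scores.shape)                    # one row of tag scores per word
print(tag_scores.exp().sum(dim=1))         # exponentiated log-probabilities sum to 1 per word
predicted_tags = tag_scores.argmax(dim=1)  # most likely tag index for each word
print(predicted_tags)
```

Because the model is untrained, the predicted tags are arbitrary; the point is only the shapes and the normalization.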

### Max pooling

In [26]:
# Max pooling (and its twin, min pooling) reduces a tensor by combining cells and
# assigning the maximum value of the input cells to the output cell, as in the LeNet model. For example:

my_tensor = torch.rand(1, 6, 6)
print(my_tensor)

# use a 3x3 window (the stride defaults to the kernel size, so windows do not overlap)
# and keep the max of each window; the (1, 6, 6) input is reduced to (1, 2, 2)
maxpool_layer = torch.nn.MaxPool2d(3)
print(maxpool_layer(my_tensor))

# use a 2x2 window (again with a matching stride) and keep the max of each window;
# the (1, 6, 6) input is reduced to (1, 3, 3)
maxpool_layer2 = torch.nn.MaxPool2d(2)
print(maxpool_layer2(my_tensor))

tensor([[[0.5793, 0.5834, 0.4753, 0.3510, 0.5525, 0.1278],
         [0.8813, 0.9122, 0.3859, 0.3544, 0.0258, 0.2757],
         [0.4768, 0.0720, 0.0415, 0.6416, 0.6504, 0.0046],
         [0.1300, 0.3771, 0.1513, 0.6127, 0.9971, 0.7925],
         [0.0266, 0.4219, 0.2332, 0.9785, 0.4893, 0.7705],
         [0.6212, 0.0087, 0.3937, 0.3318, 0.3556, 0.4316]]])
tensor([[[0.9122, 0.6504],
         [0.6212, 0.9971]]])
tensor([[[0.9122, 0.4753, 0.5525],
         [0.4768, 0.6416, 0.9971],
         [0.6212, 0.9785, 0.7705]]])
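With random input it is hard to check the result by eye, so here is a sketch on a tensor whose values simply count up from 0, which makes each window’s maximum easy to predict. The min-pooling line is an assumption about how to get that behavior: it negates the input, max-pools, and negates again, rather than relying on a dedicated layer.

```python
import torch

# Values 0..35 laid out row by row in a (1, 6, 6) tensor; pooling needs a float dtype
x = torch.arange(36, dtype=torch.float32).view(1, 6, 6)

pool3 = torch.nn.MaxPool2d(3)   # 3x3 windows, stride 3
pool2 = torch.nn.MaxPool2d(2)   # 2x2 windows, stride 2

out3 = pool3(x)
out2 = pool2(x)

# Since values grow left to right and top to bottom,
# each window's max is its bottom-right cell
print(out3)   # tensor([[[14., 17.], [32., 35.]]])
print(out2)   # tensor([[[ 7.,  9., 11.], [19., 21., 23.], [31., 33., 35.]]])

# Assumption: min pooling emulated by negating before and after max pooling
min3 = -pool3(-x)
print(min3)   # tensor([[[ 0.,  3.], [18., 21.]]])
```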


### Normalization layer

In [27]:
# Normalization layers re-center and normalize the output of one layer before feeding it to another.
# Centering and scaling the intermediate tensors has a number of beneficial effects, such as letting you use higher learning rates without exploding/vanishing gradients.

original_tensor = torch.rand(1, 4, 4) * 20 + 5
print(original_tensor)
print(original_tensor.mean())


norm_layer = torch.nn.BatchNorm1d(4)
normed_tensor = norm_layer(original_tensor)
print(normed_tensor)
print(normed_tensor.mean())


tensor([[[ 8.0735, 15.8688, 18.2773, 24.4952],
         [ 9.5099, 10.2522, 23.2088,  6.7036],
         [14.2245, 23.6516,  8.1415, 12.5767],
         [23.3453,  6.9129, 18.5149, 19.1098]]])
tensor(15.1792)
tensor([[[-1.4631, -0.1377,  0.2718,  1.3290],
         [-0.4567, -0.3402,  1.6942, -0.8974],
         [-0.0750,  1.5923, -1.1509, -0.3664],
         [ 1.0453, -1.6493,  0.2532,  0.3508]]],
       grad_fn=<NativeBatchNormBackward0>)
tensor(1.8626e-08, grad_fn=<MeanBackward0>)
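What BatchNorm1d does in training mode can be reproduced by hand: for each of the 4 channels, subtract that channel’s mean and divide by its standard deviation. The sketch below assumes a freshly constructed layer (scale 1, shift 0) and a fixed seed added for repeatability, and compares the layer’s output with the manual computation.

```python
import torch

torch.manual_seed(0)  # assumption: seeded only for repeatable output

x = torch.rand(1, 4, 4) * 20 + 5        # shape (batch, channels, length)
norm_layer = torch.nn.BatchNorm1d(4)    # one mean/variance pair per channel
normed = norm_layer(x)

# Statistics are taken per channel, i.e. over the batch and length dimensions
mean = x.mean(dim=(0, 2), keepdim=True)
var = x.var(dim=(0, 2), unbiased=False, keepdim=True)   # biased variance, as used for normalizing
manual = (x - mean) / torch.sqrt(var + norm_layer.eps)  # eps guards against division by zero

print(torch.allclose(normed, manual, atol=1e-5))
print(normed.mean(dim=(0, 2)))   # per-channel means, all close to 0
```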


### Dropout layers

In [28]:
# Dropout layers are a tool for encouraging sparse representations in your model - that is, pushing it to do inference with less data.

# Dropout layers work by randomly setting parts of the input tensor to zero during training - dropout layers are always turned off for inference.
# The surviving values are scaled by 1/(1-p), which is why the nonzero outputs are larger than the inputs.
# This forces the model to learn against this masked or reduced dataset. For example:

my_tensor = torch.rand(1, 4, 4)
print(my_tensor)

dropout = torch.nn.Dropout(p=0.4)
print(dropout(my_tensor))


tensor([[[0.1194, 0.7049, 0.2922, 0.4484],
         [0.9159, 0.4119, 0.2081, 0.0032],
         [0.5486, 0.8726, 0.4754, 0.4427],
         [0.1700, 0.8789, 0.0528, 0.9664]]])
tensor([[[0.0000, 0.0000, 0.0000, 0.0000],
         [1.5265, 0.6866, 0.0000, 0.0053],
         [0.9143, 0.0000, 0.7923, 0.0000],
         [0.0000, 1.4649, 0.0000, 1.6106]]])
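The difference between training and inference can be made visible by switching the layer’s mode. This sketch uses an all-ones input (an illustrative choice, so the scaling factor is easy to read off) and a fixed seed added for repeatability.

```python
import torch

torch.manual_seed(0)  # assumption: seeded only for repeatable output

x = torch.ones(1, 4, 4)
dropout = torch.nn.Dropout(p=0.4)

dropout.train()            # training mode: elements are zeroed with probability 0.4
train_out = dropout(x)

dropout.eval()             # inference mode: dropout is a no-op
eval_out = dropout(x)

print(train_out)           # every entry is either 0 or 1 / (1 - 0.4)
print(eval_out)            # identical to the input
```

In a full model, calling model.train() or model.eval() switches every dropout layer it contains in the same way.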
