# Building Models with PyTorch

This notebook accompanies the fourth video in the [PyTorch Beginner Series](https://www.youtube.com/playlist?list=PL_lsbAsL_o2CTlGHgMxNrKhzP97BaG9ZN) by Brad Heintz on YouTube. The video covers the building blocks PyTorch provides for defining models, including `torch.nn.Module`, model parameters, and common layer types, and shows how they fit together into complete networks. You can find the notebook associated with the video [here](https://pytorch.org/tutorials/beginner/introyt/modelsyt_tutorial.html).


In [1]:
# Import libraries here
import json

import numpy as np
import torch
import torch.nn.functional as F
import torch.optim as optim
from torch import Tensor

## Build a Simple Model

This model is similar to the one built in notebook-03.


In [2]:
class TinyModel(torch.nn.Module):
    """A simple model created to set a baseline."""

    def __init__(self, *args, **kwargs) -> None:
        super(TinyModel, self).__init__(*args, **kwargs)

        # Set up layers and activations
        self.linear1 = torch.nn.Linear(100, 200)
        self.activation = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(200, 10)
        self.softmax = torch.nn.Softmax(dim=1)      # converts each row of outputs to probabilities

    def forward(self, x: Tensor) -> Tensor:
        x = self.linear1(x)
        x = self.activation(x)
        x = self.linear2(x)
        x = self.softmax(x)

        return x

In [3]:
# Initialize the model
tiny_model = TinyModel()
print(f'The Model Architecture:\n{tiny_model}\n')
print(f'Layer `linear1`:\n{tiny_model.linear1}\n')
print(f'Layer `linear2`:\n{tiny_model.linear2}')

The Model Architecture:
TinyModel(
  (linear1): Linear(in_features=100, out_features=200, bias=True)
  (activation): ReLU()
  (linear2): Linear(in_features=200, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)

Layer `linear1`:
Linear(in_features=100, out_features=200, bias=True)

Layer `linear2`:
Linear(in_features=200, out_features=10, bias=True)
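
The printout lists the layers but not what a forward pass returns. A minimal sketch, assuming the same layer sizes rebuilt with `torch.nn.Sequential` (not the notebook's `TinyModel` class) and the softmax's `dim` set explicitly to 1 (the class dimension), that feeds a random batch through and checks that each output row is a probability distribution:

```python
import torch

# Same layer sizes as TinyModel, expressed with torch.nn.Sequential for brevity
model = torch.nn.Sequential(
    torch.nn.Linear(100, 200),
    torch.nn.ReLU(),
    torch.nn.Linear(200, 10),
    torch.nn.Softmax(dim=1),  # normalize across the 10 output classes
)

x = torch.rand(4, 100)  # a batch of 4 samples with 100 features each
probs = model(x)

print(probs.shape)       # torch.Size([4, 10])
print(probs.sum(dim=1))  # each row sums to 1 (up to floating-point error)
```

When training with `torch.nn.CrossEntropyLoss`, which applies log-softmax internally and expects raw scores, the final `Softmax` layer is normally left out.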


In [4]:
# Print model parameters
print('~~~ Model Parameters ~~~')
for param in tiny_model.parameters():
    print(param)

~~~ Model Parameters ~~~
Parameter containing:
tensor([[ 0.0258,  0.0937,  0.0045,  ..., -0.0874,  0.0336,  0.0357],
        [-0.0628, -0.0074, -0.0867,  ..., -0.0379,  0.0342,  0.0006],
        [-0.0621,  0.0726, -0.0609,  ..., -0.0836,  0.0264, -0.0588],
        ...,
        [ 0.0368, -0.0577, -0.0352,  ...,  0.0401,  0.0558,  0.0841],
        [-0.0010, -0.0872, -0.0848,  ...,  0.0274,  0.0201, -0.0701],
        [-0.0210, -0.0553, -0.0559,  ..., -0.0500,  0.0179,  0.0056]],
       requires_grad=True)
Parameter containing:
tensor([-0.0279,  0.0237,  0.0082,  0.0206,  0.0441, -0.0085, -0.0014, -0.0226,
         0.0264,  0.0917, -0.0581,  0.0159, -0.0744,  0.0282, -0.0757, -0.0383,
         0.0098,  0.0349, -0.0984, -0.0992,  0.0060,  0.0340,  0.0101, -0.0866,
         0.0721,  0.0925, -0.0286, -0.0372,  0.0245,  0.0150, -0.0073,  0.0936,
         0.0722, -0.0122, -0.0314, -0.0612, -0.0154, -0.0513,  0.0530, -0.0103,
        -0.0795,  0.0654, -0.0436, -0.0639,  0.0595,  0.0823,  0.0698,

In [5]:
# Print parameters for `linear1`
print('~~~ Parameters for `linear1` ~~~')
for param in tiny_model.linear1.parameters():
    print(param)

~~~ Parameters for `linear1` ~~~
Parameter containing:
tensor([[ 0.0258,  0.0937,  0.0045,  ..., -0.0874,  0.0336,  0.0357],
        [-0.0628, -0.0074, -0.0867,  ..., -0.0379,  0.0342,  0.0006],
        [-0.0621,  0.0726, -0.0609,  ..., -0.0836,  0.0264, -0.0588],
        ...,
        [ 0.0368, -0.0577, -0.0352,  ...,  0.0401,  0.0558,  0.0841],
        [-0.0010, -0.0872, -0.0848,  ...,  0.0274,  0.0201, -0.0701],
        [-0.0210, -0.0553, -0.0559,  ..., -0.0500,  0.0179,  0.0056]],
       requires_grad=True)
Parameter containing:
tensor([-0.0279,  0.0237,  0.0082,  0.0206,  0.0441, -0.0085, -0.0014, -0.0226,
         0.0264,  0.0917, -0.0581,  0.0159, -0.0744,  0.0282, -0.0757, -0.0383,
         0.0098,  0.0349, -0.0984, -0.0992,  0.0060,  0.0340,  0.0101, -0.0866,
         0.0721,  0.0925, -0.0286, -0.0372,  0.0245,  0.0150, -0.0073,  0.0936,
         0.0722, -0.0122, -0.0314, -0.0612, -0.0154, -0.0513,  0.0530, -0.0103,
        -0.0795,  0.0654, -0.0436, -0.0639,  0.0595,  0.0823, 

In [6]:
# Print parameters for `linear2`
print('~~~ Parameters for `linear2` ~~~')
for param in tiny_model.linear2.parameters():
    print(param)

~~~ Parameters for `linear2` ~~~
Parameter containing:
tensor([[-0.0392, -0.0690, -0.0435,  ...,  0.0626, -0.0227, -0.0029],
        [-0.0642,  0.0328,  0.0497,  ...,  0.0283, -0.0684, -0.0549],
        [-0.0078,  0.0105, -0.0374,  ...,  0.0345,  0.0061,  0.0190],
        ...,
        [-0.0434, -0.0383, -0.0110,  ...,  0.0667,  0.0439, -0.0597],
        [ 0.0070,  0.0041, -0.0133,  ...,  0.0058, -0.0679, -0.0530],
        [ 0.0274, -0.0384,  0.0578,  ..., -0.0381,  0.0245,  0.0189]],
       requires_grad=True)
Parameter containing:
tensor([-0.0268, -0.0579, -0.0289,  0.0318,  0.0236, -0.0059,  0.0217,  0.0586,
        -0.0181,  0.0499], requires_grad=True)
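
Each `torch.nn.Linear(in_features, out_features)` holds a weight of shape `(out_features, in_features)` and a bias of shape `(out_features,)`, which is why the weight of `linear1` prints as a 200-row matrix and the weight of `linear2` as a 10-row matrix. A sketch, again rebuilding TinyModel's layer sizes with `torch.nn.Sequential` as an assumption, that lists parameter shapes by name and counts the total:

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(100, 200),
    torch.nn.ReLU(),
    torch.nn.Linear(200, 10),
)

# named_parameters() yields (name, Parameter) pairs; names follow the Sequential indices
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# 0.weight (200, 100)
# 0.bias (200,)
# 2.weight (10, 200)
# 2.bias (10,)

total = sum(p.numel() for p in model.parameters())
print(total)  # 100*200 + 200 + 200*10 + 10 = 22210
```

On `tiny_model` itself, the same loop would print attribute-based names such as `linear1.weight` and `linear2.bias`.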


## Examining Layer Types

Some common layer types are listed below:

- Linear layers - also called fully-connected layers, in which every input influences every output.
- Convolutional layers - used to handle data with a high degree of spatial correlation, such as images.
- Recurrent layers - used for sequential data; they maintain a memory of earlier inputs in a hidden state.
- Transformers - multi-purpose networks built from multi-head attention, encoder and decoder stacks, and related components.
- Data manipulation layers
  - Max/Average pooling layers - reduce a tensor by combining groups of neighboring cells into one cell holding their maximum/average value.
  - Normalization layers - re-center and normalize the output of one layer before passing it to the next.
  - Dropout layers - randomly set inputs to 0 during training, encouraging sparse representations in the model.
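
A small sketch of the data manipulation layers on hand-picked inputs. `torch.nn.LayerNorm` stands in here for normalization layers in general (the list above does not name a specific one), and the input values are arbitrary:

```python
import torch

x = torch.arange(16.0).reshape(1, 1, 4, 4)  # one 4x4 single-channel "image"

# Pooling: each 2x2 window collapses to one cell
max_pooled = torch.nn.MaxPool2d(kernel_size=2)(x)  # [[5, 7], [13, 15]]
avg_pooled = torch.nn.AvgPool2d(kernel_size=2)(x)  # [[2.5, 4.5], [10.5, 12.5]]

# Normalization: LayerNorm re-centers and rescales each row to mean 0, variance ~1
rows = torch.tensor([[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]])
normed = torch.nn.LayerNorm(4)(rows)

# Dropout: active only in training mode; survivors are scaled by 1 / (1 - p)
dropout = torch.nn.Dropout(p=0.5)
ones = torch.ones(1000)
dropout.train()
train_out = dropout(ones)  # each entry is either 0.0 or 2.0
dropout.eval()
eval_out = dropout(ones)   # identity in evaluation mode

print(max_pooled)
print(avg_pooled)
print(normed.mean(dim=1), normed.var(dim=1, unbiased=False))
print(train_out.unique(), torch.equal(eval_out, ones))
```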

Some associated functions that are important in building a model:

- Activation functions - introduce non-linearity into the model and determine how strongly each neuron is activated.
- Loss functions - measure how far the model's predictions are from the targets; training optimizes the weights to reduce this value.
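
A brief sketch of one activation and one loss function on made-up values. It also checks that `torch.nn.CrossEntropyLoss` on raw scores matches `torch.nn.NLLLoss` on log-softmax outputs, the pairing the `LSTMTagger` model below relies on:

```python
import torch
import torch.nn.functional as F

# Activation: ReLU zeroes negative inputs and passes positive ones through
z = torch.tensor([-2.0, 0.0, 3.0])
print(torch.nn.ReLU()(z))  # tensor([0., 0., 3.])

# Loss: 2 samples, 3 classes; raw scores (logits) and integer class targets
logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

ce = torch.nn.CrossEntropyLoss()(logits, targets)
# NLLLoss expects log-probabilities, so it is paired with log_softmax
nll = torch.nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(ce.item(), nll.item())  # the two values agree
```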


### Linear Layers


In [7]:
# Define a linear layer
linear = torch.nn.Linear(3, 2)

# Define inputs
x = torch.rand(1, 3)
print(f'Inputs:\n{x}\n')

# Print the weights and bias
print('~~~ Weights and Bias for the Linear Layer ~~~')
for param in linear.parameters():
    print(param)

# Produce outputs
y = linear(x)
print(f'\nOutputs:\n{y}')

Inputs:
tensor([[0.6985, 0.1497, 0.1617]])

~~~ Weights and Bias for the Linear Layer ~~~
Parameter containing:
tensor([[ 0.2979,  0.3921,  0.3643],
        [-0.1973, -0.2998, -0.3258]], requires_grad=True)
Parameter containing:
tensor([ 0.5375, -0.3426], requires_grad=True)

Outputs:
tensor([[ 0.8632, -0.5779]], grad_fn=<AddmmBackward0>)
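
`torch.nn.Linear` computes y = x Wᵀ + b, where W is the weight matrix of shape `(out_features, in_features)` and b is the bias. With the printed values, the first output is 0.6985·0.2979 + 0.1497·0.3921 + 0.1617·0.3643 + 0.5375 ≈ 0.8632. A sketch that reproduces a layer's output by hand (fresh random weights, so the numbers differ from those above):

```python
import torch

linear = torch.nn.Linear(3, 2)
x = torch.rand(1, 3)

# Affine map by hand: matrix product with the transposed weight, plus the bias
manual = x @ linear.weight.T + linear.bias
print(torch.allclose(manual, linear(x)))  # True
```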


### Convolutional Layers


In [8]:
# Define a convolutional neural network
class ConvNet(torch.nn.Module):
    def __init__(self, *args, **kwargs) -> None:
        super(ConvNet, self).__init__(*args, **kwargs)

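        # Define model architecture
        # Conv2d(in_channels, out_channels, kernel_size)
        self.conv1 = torch.nn.Conv2d(1, 6, 5)
        self.conv2 = torch.nn.Conv2d(6, 16, 5)
        # After two conv + 2x2 max-pool stages, a 32x32 input becomes 16 maps of 5x5
        self.fc1 = torch.nn.Linear(16 * 5 * 5, 120)
        self.fc2 = torch.nn.Linear(120, 84)
        self.fc3 = torch.nn.Linear(84, 10)

    def forward(self, x: Tensor) -> Tensor:
        # (N, 1, 32, 32) -> conv1 -> (N, 6, 28, 28) -> pool -> (N, 6, 14, 14)
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        # (N, 6, 14, 14) -> conv2 -> (N, 16, 10, 10) -> pool -> (N, 16, 5, 5)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # Flatten each sample's feature maps into a 400-element vector
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)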

        return x

In [9]:
# Initialize the CNN
conv_net = ConvNet()
print(f'The Model Architecture:\n{conv_net}\n')

# Define inputs
x = torch.rand(1, 1, 32, 32)
print(f'Inputs:\n{x}\n')

# Produce outputs
y = conv_net(x)
print(f'Outputs:\n{y}')

The Model Architecture:
ConvNet(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

Inputs:
tensor([[[[0.6148, 0.2985, 0.0058,  ..., 0.3683, 0.0458, 0.1580],
          [0.3759, 0.2041, 0.3890,  ..., 0.5775, 0.5208, 0.6082],
          [0.9327, 0.2717, 0.7199,  ..., 0.3205, 0.5737, 0.9072],
          ...,
          [0.2765, 0.1261, 0.5468,  ..., 0.7558, 0.8959, 0.3407],
          [0.0896, 0.7106, 0.4569,  ..., 0.4056, 0.5699, 0.5253],
          [0.6691, 0.9141, 0.1395,  ..., 0.1006, 0.2695, 0.9650]]]])

Outputs:
tensor([[-0.0847,  0.0921, -0.0056, -0.0332,  0.0778,  0.0320,  0.0618, -0.1132,
          0.0004,  0.0945]], grad_fn=<AddmmBackward0>)
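
The `16 * 5 * 5` input size of `fc1` follows from the output-size rule for convolution and pooling layers, out = ⌊(in + 2·padding − kernel) / stride⌋ + 1 (dilation 1). A sketch with a hypothetical helper `conv_output_size` that applies the rule to a 32×32 input and cross-checks it against real layers with the same hyperparameters:

```python
import torch
import torch.nn.functional as F


def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a conv/pool layer (dilation assumed to be 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# Trace a 32x32 input through the ConvNet's spatial stages by hand
size = 32
size = conv_output_size(size, kernel=5)            # conv1: 32 -> 28
size = conv_output_size(size, kernel=2, stride=2)  # max_pool2d(2): 28 -> 14
size = conv_output_size(size, kernel=5)            # conv2: 14 -> 10
size = conv_output_size(size, kernel=2, stride=2)  # max_pool2d(2): 10 -> 5
flat_features = 16 * size * size                   # 16 channels of 5x5 -> 400

# Cross-check against real layers with the same hyperparameters
x = torch.rand(1, 1, 32, 32)
x = F.max_pool2d(torch.nn.Conv2d(1, 6, 5)(x), 2)
x = F.max_pool2d(torch.nn.Conv2d(6, 16, 5)(x), 2)
print(size, flat_features, tuple(x.shape))  # 5 400 (1, 16, 5, 5)
```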


### Recurrent Layers


In [10]:
# Define a recurrent neural network with LSTM cells
class LSTMTagger(torch.nn.Module):
    def __init__(
        self,
        embedding_dim: int,
        hidden_size: int,
        vocab_size: int,
        tagset_size: int,
    ) -> None:
        super(LSTMTagger, self).__init__()

        # Set hidden dimensions
        self.hidden_size = hidden_size

        # Define word embeddings
        self.word_embeddings = torch.nn.Embedding(
            num_embeddings=vocab_size,
            embedding_dim=embedding_dim,
        )

        # Define the LSTM layer (processes the whole sequence)
        self.lstm = torch.nn.LSTM(
            input_size=embedding_dim,
            hidden_size=hidden_size,
        )

        # Set up a linear layer that maps from hidden-state space to tag space
        self.hidden2tag = torch.nn.Linear(hidden_size, tagset_size)

    def forward(self, sentence: Tensor) -> Tensor:
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)

        return tag_scores
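
In `forward`, the `.view` calls reshape the data for `torch.nn.LSTM`, which by default (`batch_first=False`) expects input of shape `(sequence_length, batch_size, input_size)`. A sketch that traces the shapes through freshly created layers with the same sizes as the `lstm_tagger` instance created below (embedding size 6, hidden size 6, 9 words, 3 tags), using the indices of a 5-word sentence:

```python
import torch
import torch.nn.functional as F

embedding = torch.nn.Embedding(num_embeddings=9, embedding_dim=6)
lstm = torch.nn.LSTM(input_size=6, hidden_size=6)
hidden2tag = torch.nn.Linear(6, 3)

sentence = torch.tensor([0, 1, 2, 3, 4])      # word indices for a 5-word sentence
embeds = embedding(sentence)                   # (5, 6): one vector per word
lstm_in = embeds.view(len(sentence), 1, -1)    # (5, 1, 6): (seq_len, batch, features)
lstm_out, (h_n, c_n) = lstm(lstm_in)           # lstm_out: (5, 1, 6), one hidden state per step
tag_space = hidden2tag(lstm_out.view(len(sentence), -1))  # (5, 3): one score per tag
tag_scores = F.log_softmax(tag_space, dim=1)   # log-probabilities over the 3 tags

print(tag_scores.shape)             # torch.Size([5, 3])
print(tag_scores.exp().sum(dim=1))  # each row sums to 1
```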

In [11]:
# Set up training data
train_data = [
    ('The dog ate the apple'.split(), ['DET', 'NN', 'V', 'DET', 'NN']),
    ('Everybody read that book'.split(), ['NN', 'V', 'DET', 'NN']),
    ('The apple ate the book'.split(), ['DET', 'NN', 'V', 'DET', 'NN']),
    ('Everybody read the apple'.split(), ['NN', 'V', 'DET', 'NN']),
]

# Map each word to an index in order of first appearance
word_indices = {}
for sentence, _ in train_data:
    for word in sentence:
        if word not in word_indices:
            word_indices[word] = len(word_indices)
print(f'Word Indices = {json.dumps(word_indices, indent=4)}')

# Map each tag to an index
tag_indices = {'DET': 0, 'NN': 1, 'V': 2}
print(f'Tag Indices = {json.dumps(tag_indices, indent=4)}')

Word Indices = {
    "The": 0,
    "dog": 1,
    "ate": 2,
    "the": 3,
    "apple": 4,
    "Everybody": 5,
    "read": 6,
    "that": 7,
    "book": 8
}
Tag Indices = {
    "DET": 0,
    "NN": 1,
    "V": 2
}


In [12]:
def encode_sequence(seq: list[str], indices: dict[str, int]) -> Tensor:
    """
    Converts a sequence of words to a tensor of indices based on the given mapping.

    Args:
        seq (list[str]): A list of words to be encoded.
        indices (dict[str, int]):\
            A dictionary mapping words to their corresponding indices.

    Returns:
        Tensor: A tensor containing the indices of the words in the input sequence.
    """
    idxs = [indices[word] for word in seq]
    return torch.tensor(idxs, dtype=torch.long)
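
A usage sketch of `encode_sequence` with the `word_indices` and `tag_indices` mappings printed above (copied here so the block runs on its own). The vocabulary is case-sensitive, so "The" and "the" get different indices, and a word missing from the mapping raises a `KeyError`:

```python
import torch
from torch import Tensor


def encode_sequence(seq: list[str], indices: dict[str, int]) -> Tensor:
    """Map each word (or tag) in seq to its integer index."""
    return torch.tensor([indices[word] for word in seq], dtype=torch.long)


# Mappings copied from the printed output above
word_indices = {'The': 0, 'dog': 1, 'ate': 2, 'the': 3, 'apple': 4,
                'Everybody': 5, 'read': 6, 'that': 7, 'book': 8}
tag_indices = {'DET': 0, 'NN': 1, 'V': 2}

words = encode_sequence('The dog ate the apple'.split(), word_indices)
tags = encode_sequence(['DET', 'NN', 'V', 'DET', 'NN'], tag_indices)
print(words)  # tensor([0, 1, 2, 3, 4])
print(tags)   # tensor([0, 1, 2, 0, 1])

try:
    encode_sequence('The cat ate'.split(), word_indices)
except KeyError as err:
    print(f'Unknown word: {err}')  # Unknown word: 'cat'
```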

In [13]:
# Initialize the LSTM model
lstm_tagger = LSTMTagger(
    embedding_dim=6,
    hidden_size=6,
    vocab_size=len(word_indices),
    tagset_size=len(tag_indices),
)
print(f'The Model Architecture:\n{lstm_tagger}')

The Model Architecture:
LSTMTagger(
  (word_embeddings): Embedding(9, 6)
  (lstm): LSTM(6, 6)
  (hidden2tag): Linear(in_features=6, out_features=3, bias=True)
)


In [14]:
# Setup the loss function and optimizer
loss_fn = torch.nn.NLLLoss()
optimizer = optim.SGD(lstm_tagger.parameters(), lr=0.001)

# Setup prediction collection
evaluation_results = {}

# Train the model
N_EPOCHS = 100
for epoch in range(N_EPOCHS):
    for sentence, tags in train_data:
        # Reset gradients, then prepare the inputs and targets
        lstm_tagger.zero_grad()
        sentence_encoded = encode_sequence(sentence, word_indices)
        targets = encode_sequence(tags, tag_indices)

        # Perform forward pass
        tag_scores = lstm_tagger(sentence_encoded)
        predictions = tag_scores.argmax(dim=1)
        evaluation_results[' '.join(sentence)] = dict(
            targets=targets.numpy().tolist(),
            predictions=predictions.numpy().tolist(),
        )

        # Compute loss and perform backpropagation
        loss = loss_fn(tag_scores, targets)
        loss.backward()
        optimizer.step()

    # Every 10 epochs, report the loss of the epoch's last training sentence
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch + 1:3d}/{N_EPOCHS}], Loss: {loss.item():.4f}')

Epoch [ 10/100], Loss: 1.1207
Epoch [ 20/100], Loss: 1.1183
Epoch [ 30/100], Loss: 1.1159
Epoch [ 40/100], Loss: 1.1136
Epoch [ 50/100], Loss: 1.1113
Epoch [ 60/100], Loss: 1.1090
Epoch [ 70/100], Loss: 1.1068
Epoch [ 80/100], Loss: 1.1046
Epoch [ 90/100], Loss: 1.1024
Epoch [100/100], Loss: 1.1003
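
The printed value is the loss on the last sentence of each epoch rather than an epoch average, and with `lr=0.001` it only moves from 1.1207 to 1.1003 over 100 epochs. A generic sketch of the same zero_grad, forward, loss, backward, step cycle that instead records the mean loss over all examples per epoch. The toy data, the `torch.nn.Linear` stand-in for the tagger, and the learning rate of 0.2 are assumptions for illustration only:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy data: 4 "sentences" of 5 positions with 8 features each; each position is
# tagged by which of its first 3 features is largest, so there is a pattern to learn
inputs = [torch.rand(5, 8) for _ in range(4)]
targets = [x[:, :3].argmax(dim=1) for x in inputs]

model = torch.nn.Linear(8, 3)  # stand-in for the tagger: 8 features -> 3 tag scores
loss_fn = torch.nn.NLLLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.2)

epoch_losses = []
for epoch in range(50):
    total = 0.0
    for x, y in zip(inputs, targets):
        optimizer.zero_grad()                       # clear gradients from the previous step
        log_probs = F.log_softmax(model(x), dim=1)  # forward pass
        loss = loss_fn(log_probs, y)
        loss.backward()                             # backpropagate
        optimizer.step()                            # update the weights
        total += loss.item()
    epoch_losses.append(total / len(inputs))        # mean loss over the epoch

print(f'first epoch: {epoch_losses[0]:.4f}, last epoch: {epoch_losses[-1]:.4f}')
```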


In [15]:
# Display the targets and predictions recorded during the final epoch
for sentence, result in evaluation_results.items():
    print(f'Sentence: "{sentence}"')
    targets = result['targets']
    predictions = result['predictions']
    print(f'    Targets     : {targets}')
    print(f'    Predictions : {predictions}')

Sentence: "The dog ate the apple"
    Targets     : [0, 1, 2, 0, 1]
    Predictions : [1, 2, 2, 2, 1]
Sentence: "Everybody read that book"
    Targets     : [1, 2, 0, 1]
    Predictions : [1, 0, 0, 1]
Sentence: "The apple ate the book"
    Targets     : [0, 1, 2, 0, 1]
    Predictions : [1, 1, 0, 2, 2]
Sentence: "Everybody read the apple"
    Targets     : [1, 2, 0, 1]
    Predictions : [1, 0, 0, 0]


In [16]:
# Compute the accuracy of the model
correct_predictions = 0
total_predictions = 0
for sentence, result in evaluation_results.items():
    targets = result['targets']
    predictions = result['predictions']

    correct_predictions += (
        (np.array(targets) == np.array(predictions)).sum()
    )
    total_predictions += len(predictions)

accuracy_score = correct_predictions / total_predictions
print(f'Accuracy of the Model: {(accuracy_score * 100):.4f}%')

Accuracy of the Model: 44.4444%
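
The same accuracy can be computed with tensor operations instead of NumPy. A sketch using the targets and predictions printed above (8 of 18 tags correct). Note that `evaluation_results` holds predictions made during the final epoch, before each sentence's last weight update, so a separate pass under `torch.no_grad()` after training would reflect the final weights more faithfully:

```python
import torch

# Targets and predictions copied from the printed evaluation results
targets = [[0, 1, 2, 0, 1], [1, 2, 0, 1], [0, 1, 2, 0, 1], [1, 2, 0, 1]]
predictions = [[1, 2, 2, 2, 1], [1, 0, 0, 1], [1, 1, 0, 2, 2], [1, 0, 0, 0]]

# Concatenate the variable-length sentences into flat tensors before comparing
flat_targets = torch.tensor([t for seq in targets for t in seq])
flat_preds = torch.tensor([p for seq in predictions for p in seq])

accuracy = (flat_preds == flat_targets).float().mean().item()
print(f'Accuracy of the Model: {accuracy * 100:.4f}%')  # Accuracy of the Model: 44.4444%
```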


### Data Manipulation Layers
