### Neural Language Models
This week, we introduce neural language models built from recurrent neural networks, which process sequential (ordered) data such as sequences of tokens. We cover 1) RNNs, 2) LSTMs, 3) GRUs, and 4) Bi-RNNs. You will learn more complex model architectures next week.

As a quick start, let's import the necessary libraries. Later in this notebook we train on a small simulated dataset.

In [28]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

### 1. RNN

In [29]:
class RNNModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_size):
        super(RNNModel, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.rnn = nn.RNN(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

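    def forward(self, x):
        x = self.embedding(x)  # (batch, seq_len) -> (batch, seq_len, embedding_dim)
        out, _ = self.rnn(x)   # hidden state at every step: (batch, seq_len, hidden_dim)
        out = self.fc(out)     # per-step scores: (batch, seq_len, output_size)
        return out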
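Under the hood, `nn.RNN` (with its default `tanh` nonlinearity) applies the recurrence $h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})$ at every time step, starting from $h_0 = 0$ when no initial state is passed. The sketch below reimplements that loop by hand, reusing the weights of a single-layer `nn.RNN`, and checks that both produce the same outputs. The sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, seq_len, input_dim, hidden_dim = 3, 4, 8, 16
rnn = nn.RNN(input_dim, hidden_dim, batch_first=True)  # single layer, tanh
x = torch.randn(batch_size, seq_len, input_dim)

# Built-in implementation: out has shape (batch, seq_len, hidden_dim)
out_builtin, h_n = rnn(x)

# Manual recurrence using the same parameters
W_ih, W_hh = rnn.weight_ih_l0, rnn.weight_hh_l0
b_ih, b_hh = rnn.bias_ih_l0, rnn.bias_hh_l0
h = torch.zeros(batch_size, hidden_dim)  # h_0 = 0
outputs = []
for t in range(seq_len):
    x_t = x[:, t, :]  # input at step t: (batch, input_dim)
    h = torch.tanh(x_t @ W_ih.T + b_ih + h @ W_hh.T + b_hh)
    outputs.append(h)
out_manual = torch.stack(outputs, dim=1)  # (batch, seq_len, hidden_dim)

print(out_builtin.shape)  # torch.Size([3, 4, 16])
print(torch.allclose(out_builtin, out_manual, atol=1e-5))  # True
```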

### 2. LSTM

In [30]:
class LSTMModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_size):
        super(LSTMModel, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        x = self.embedding(x)
        out, _ = self.lstm(x)
        out = self.fc(out)
        return out
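Unlike `nn.RNN`, `nn.LSTM` returns its final state as a tuple `(h_n, c_n)`: the last hidden state and the last cell state. `LSTMModel` above discards both with `_`, but a many-to-one task often needs the last hidden state. For a single-layer, unidirectional LSTM, `h_n[0]` equals the output at the last time step, as this short check shows (sizes chosen arbitrarily).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(3, 4, 8)  # (batch, seq_len, input_size)

out, (h_n, c_n) = lstm(x)
print(out.shape)  # torch.Size([3, 4, 16]): hidden state at every step
print(h_n.shape)  # torch.Size([1, 3, 16]): (num_layers * num_directions, batch, hidden)
print(c_n.shape)  # torch.Size([1, 3, 16]): final cell state, same shape as h_n

# The final hidden state is the output at the last time step
print(torch.allclose(out[:, -1, :], h_n[0]))  # True
```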

### 3. GRU 

In [31]:
class GRUModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_size):
        super(GRUModel, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.gru = nn.GRU(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        x = self.embedding(x)
        out, _ = self.gru(x)
        out = self.fc(out)
        return out
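The three recurrent layers differ mainly in how many gated transformations they compute per step: `nn.RNN` has one, `nn.GRU` has three (reset gate, update gate, candidate state), and `nn.LSTM` has four (input, forget, and output gates plus the cell candidate). Each transformation has an input-to-hidden weight, a hidden-to-hidden weight, and two bias vectors, so a single layer has $g \cdot (H \cdot E + H \cdot H + 2H)$ parameters with $g \in \{1, 3, 4\}$. The sketch below checks this formula for the sizes used in the training cell of this notebook ($E = 8$, $H = 16$).

```python
import torch.nn as nn

embedding_dim, hidden_dim = 8, 16

def count_params(module):
    return sum(p.numel() for p in module.parameters())

def expected(num_gates, E, H):
    # g * (W_ih: H x E  +  W_hh: H x H  +  b_ih and b_hh: H each)
    return num_gates * (H * E + H * H + 2 * H)

for name, layer, g in [
    ("RNN", nn.RNN(embedding_dim, hidden_dim, batch_first=True), 1),
    ("GRU", nn.GRU(embedding_dim, hidden_dim, batch_first=True), 3),
    ("LSTM", nn.LSTM(embedding_dim, hidden_dim, batch_first=True), 4),
]:
    print(name, count_params(layer), expected(g, embedding_dim, hidden_dim))
# RNN 416 416
# GRU 1248 1248
# LSTM 1664 1664
```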

### 4. Bi-RNNs

In [32]:
class BiRNNModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_size):
        super(BiRNNModel, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.rnn = nn.RNN(embedding_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_dim * 2, output_size)

    def forward(self, x):
        x = self.embedding(x)
        out, _ = self.rnn(x)
        out = self.fc(out)
        return out
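A bidirectional RNN runs one RNN left-to-right and a second one right-to-left, then concatenates their hidden states at each step; this is why the linear layer in `BiRNNModel` takes `hidden_dim * 2` inputs. In the output, the first `hidden_dim` features come from the forward direction and the last `hidden_dim` from the backward direction. The check below (arbitrary sizes) shows that the forward direction finishes at the last time step while the backward direction finishes at the first one.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden_dim = 16
birnn = nn.RNN(8, hidden_dim, batch_first=True, bidirectional=True)
x = torch.randn(3, 4, 8)  # (batch, seq_len, input_size)

out, h_n = birnn(x)
print(out.shape)  # torch.Size([3, 4, 32]): forward and backward states concatenated
print(h_n.shape)  # torch.Size([2, 3, 16]): one final state per direction

forward_half = out[:, :, :hidden_dim]
backward_half = out[:, :, hidden_dim:]
# Forward direction ends at the last step, backward direction at the first step
print(torch.allclose(forward_half[:, -1], h_n[0]))  # True
print(torch.allclose(backward_half[:, 0], h_n[1]))  # True
```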

### Training a Neural Network

In [33]:
# Simulated data for a many-to-many task: predict the next token at every position
# Input shape: (batch_size, sequence_length); each row is one sequence of token IDs
sequence_data = torch.tensor([
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6]
], dtype=torch.long)

# Targets: each input sequence shifted left by one token
target_data = torch.tensor([
    [2, 3, 4, 5],
    [3, 4, 5, 6],
    [4, 5, 6, 7]
], dtype=torch.long)

# Model and training hyperparameters (token IDs 1-7 fit within vocab_size = 10)
vocab_size = 10
embedding_dim = 8
hidden_dim = 16
batch_size, seq_len = sequence_data.size()
num_epochs = 10

# RNN as our model
model = RNNModel(vocab_size, embedding_dim, hidden_dim, vocab_size)
 
# Define a loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

for epoch in range(num_epochs):
    model.train()  # Set model to training mode
    optimizer.zero_grad()  # Clear gradient accumulators
    output = model(sequence_data)  # Forward pass
    loss = criterion(output.view(-1, vocab_size), target_data.view(-1))  # Compute loss
    loss.backward()  # Backward pass
    optimizer.step()  # Update weights

    if (epoch + 1) % 2 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

Epoch [2/10], Loss: 2.1691
Epoch [4/10], Loss: 1.8826
Epoch [6/10], Loss: 1.6149
Epoch [8/10], Loss: 1.3706
Epoch [10/10], Loss: 1.1498
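In this toy data, each target row is its input row shifted one position to the left, so at every step the model learns to predict the next token. A minimal sketch of building such pairs from one longer token sequence with `Tensor.unfold`; the helper name `make_next_token_pairs` is hypothetical, and using overlapping windows with a stride of 1 is an assumption.

```python
import torch

def make_next_token_pairs(tokens, seq_len):
    """Hypothetical helper: slice a 1-D token tensor into (input, target) windows.

    Targets are the inputs shifted by one token.
    """
    inputs = tokens[:-1].unfold(0, seq_len, 1)  # windows of size seq_len, step 1
    targets = tokens[1:].unfold(0, seq_len, 1)
    return inputs, targets

tokens = torch.arange(1, 8)  # tensor([1, 2, 3, 4, 5, 6, 7])
inputs, targets = make_next_token_pairs(tokens, seq_len=4)
print(inputs)
# tensor([[1, 2, 3, 4],
#         [2, 3, 4, 5],
#         [3, 4, 5, 6]])
print(targets)
# tensor([[2, 3, 4, 5],
#         [3, 4, 5, 6],
#         [4, 5, 6, 7]])
```

These two tensors reproduce `sequence_data` and `target_data` from the training cell.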


You can accelerate training by using a GPU. Call `.cuda()` on the model and on the input and target tensors; this requires a CUDA-enabled GPU and a CUDA build of PyTorch. When you run inference with the trained model, call `.eval()` to switch the model to evaluation mode; `.train()` switches it back to training mode.
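Calling `.cuda()` fails on a machine without a CUDA GPU. A common device-agnostic pattern is to pick the device once and move the model and every tensor with `.to(device)`; inputs and targets must live on the same device as the model. A minimal sketch of one training step, where the small stand-in `nn.Linear` model and the random data are assumptions:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 3).to(device)        # stand-in model; moved before creating the optimizer
x = torch.randn(5, 4).to(device)          # inputs on the same device as the model
y = torch.randint(0, 3, (5,)).to(device)  # targets too, otherwise the loss raises an error

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

model.train()
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
print(loss.device.type)  # cuda or cpu, matching `device`
```

The `torch.optim` documentation recommends moving a model to the GPU before constructing its optimizer, which is the order used here.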

In [34]:
# Check whether a CUDA-enabled GPU is available
torch.cuda.is_available()

True

In [35]:
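# Example: move the model to the GPU (only if one is available), then switch modes
if torch.cuda.is_available():
    model.cuda()  # moves all parameters and buffers to the GPU
model.train()  # training mode
model.eval()   # evaluation mode; returns the model, so the notebook displays it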

RNNModel(
  (embedding): Embedding(10, 8)
  (rnn): RNN(8, 16, batch_first=True)
  (fc): Linear(in_features=16, out_features=10, bias=True)
)
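`model.train()` and `model.eval()` only flip a mode flag (`model.training`); they do not change any weights. The flag matters for layers such as dropout and batch normalization, which behave differently during training and inference. None of the models in this notebook contain such layers, so here the switch has no visible effect, but calling `eval()` before inference is still a good habit; wrapping inference in `torch.no_grad()` additionally skips building the autograd graph. The standalone `nn.Dropout` example below is an illustration, not part of the models above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(8)

dropout.train()  # training mode: zeroes entries at random, scales the rest by 1 / (1 - p) = 2
y_train = dropout(x)
print(y_train)   # every entry is either 0. or 2.

dropout.eval()   # evaluation mode: dropout is the identity
with torch.no_grad():  # inference without gradient tracking
    y_eval = dropout(x)
print(torch.equal(y_eval, x))  # True
```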

### Practice of the Week
Below is example code that defines a shallow neural network (a two-layer MLP) in PyTorch. You can use it as a starting point for building your own models. Note that the model is randomly initialized. Your task is to define your own task, randomly generate at least 100 samples (X, Y), and train the model on them. You can also use an existing dataset as training data. For example, sentiment analysis can be framed as a many-to-one regression task, where the input is a sentence and the output is a sentiment score; the IMDB dataset is a common choice for this. Add or replace layers to improve the model's performance.

Well-trained models are typically obtained by training on large datasets and tuning the hyperparameters. You will learn more about training such models in later lectures.
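The recurrent models earlier in this notebook are many-to-many: they produce one output per time step. For a many-to-one task such as predicting a sentiment score for a sentence, one option is to feed only the final hidden state into the output layer. A minimal sketch, assuming fixed-length token ID sequences and a single regression output; the class name `SentimentRNN` and all sizes are hypothetical, and random data stands in for a real dataset such as IMDB.

```python
import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    """Hypothetical many-to-one model: token IDs -> one score per sequence."""
    def __init__(self, vocab_size, embedding_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.gru = nn.GRU(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        x = self.embedding(x)                 # (batch, seq_len, embedding_dim)
        _, h_n = self.gru(x)                  # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1]).squeeze(-1)   # one score per sequence: (batch,)

torch.manual_seed(0)
model = SentimentRNN(vocab_size=100, embedding_dim=16, hidden_dim=32)
X = torch.randint(0, 100, (128, 20))  # 128 random "sentences" of 20 token IDs
Y = torch.rand(128)                   # random target scores in [0, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(5):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(X), Y)
    loss.backward()
    optimizer.step()
print(model(X).shape)  # torch.Size([128])
```

Real sentences have different lengths; with padded batches, the final hidden state would also see the padding tokens, so utilities such as `torch.nn.utils.rnn.pack_padded_sequence` are worth looking into.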

In [None]:
# Define a two-layer neural network (MLP)
class OwnModel(nn.Module):
    def __init__(self, input_size, output_size, hidden_size):
        super(OwnModel, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.relu(self.linear1(x))  # hidden layer with ReLU activation
        x = self.linear2(x)          # output layer without activation
        return x

# Define the necessary variables for training, such as the number of samples, input/output size, and training epochs
num_samples = 100
input_size = 50
hidden_size = 100
output_size = 1
num_epochs = 10

model = OwnModel(input_size=input_size, output_size=output_size, hidden_size=hidden_size)

# Randomly generate some data. Hint: use torch.randn()
# X: (num_samples, input_size), Y: (num_samples, output_size)
data = torch.randn(num_samples, input_size)
target = torch.randn(num_samples, output_size)

# A single continuous output is a regression target, so use MSELoss.
# nn.CrossEntropyLoss is for classification: it expects logits of shape (N, num_classes)
# and integer class labels of shape (N,).
criterion = nn.MSELoss()
# criterion = nn.CrossEntropyLoss()

optimizer = optim.Adam(model.parameters(), lr=0.01)
# optimizer = optim.SGD(model.parameters(), lr=0.01)

# Define a train function (full-batch: every step uses all samples)
def train(model, data, target, optimizer, criterion, num_epochs):
    for epoch in range(num_epochs):
        model.train()  # Set model to training mode
        optimizer.zero_grad()  # Clear gradient accumulators
        output = model(data)  # Forward pass: (num_samples, output_size)
        loss = criterion(output, target)  # Output and target shapes match, no reshaping needed
        loss.backward()  # Backward pass
        optimizer.step()  # Update weights
        if (epoch + 1) % 2 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

train(model, data, target, optimizer, criterion, num_epochs)
model.eval()
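A training loop that passes all samples to the model at once is full-batch training. With more data, it is common to iterate over shuffled mini-batches instead. A minimal sketch using `torch.utils.data.TensorDataset` and `DataLoader`, with a batch size of 32 and random regression data as assumptions; a small `nn.Sequential` MLP with the same two-layer shape stands in for `OwnModel`:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
num_samples, input_size, hidden_size, output_size = 100, 50, 100, 1
X = torch.randn(num_samples, input_size)
Y = torch.randn(num_samples, output_size)

# Two-layer MLP: Linear -> ReLU -> Linear
model = nn.Sequential(
    nn.Linear(input_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, output_size),
)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
loader = DataLoader(TensorDataset(X, Y), batch_size=32, shuffle=True)

for epoch in range(10):
    model.train()
    epoch_loss = 0.0
    for xb, yb in loader:  # 100 samples -> batches of 32, 32, 32, 4
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item() * xb.size(0)  # sum of per-sample losses
    epoch_loss /= num_samples  # mean loss over the epoch
print(f"Final epoch mean loss: {epoch_loss:.4f}")
```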