## Problem: Quantize Your Language Model

### Problem Statement
Implement a **language model** using an LSTM and apply **dynamic quantization** to optimize it for inference. Dynamic quantization reduces the model size and enhances inference speed by quantizing the weights of the model.

### Requirements

1. **Define the Language Model**:
   - **Purpose**: Build a simple language model that predicts the next token in a sequence.
   - **Components**:
     - **Embedding Layer**: Converts input tokens into dense vector representations.
     - **LSTM Layer**: Processes the embedded sequence to capture temporal dependencies.
     - **Fully Connected Layer**: Outputs predictions for the next token.
     - **Softmax Layer**: Applies a probability distribution over the vocabulary for predictions.
   - **Forward Pass**:
     - Pass the input sequence through the embedding layer.
     - Feed the embedded sequence into the LSTM.
     - Use the final hidden state from the LSTM to make predictions via the fully connected layer.
     - Apply the softmax function to obtain probabilities over the vocabulary.

2. **Apply Dynamic Quantization**:
   - Quantize the model dynamically
   - Evaluate the quantized model's performance compared to the original model.

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.quantization import quantize_dynamic

In [30]:
# TODO: Define a simple Language Model (an LSTM-based model)
class LSTMLayer(nn.Module):
    # one layer of LSTM which maps from input_size to input_size with
    # hidden state of hidden_size.
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.linear_o = nn.Linear(input_size + hidden_size, input_size)
        self.linear_h = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, inputs):
        B, T, _ = inputs.shape
        h_t = torch.zeros(B, self.hidden_size)
        outputs = []
        for t in range(T):
            merged = torch.cat([inputs[:, t, :], h_t], dim=-1) # (B, T, input_size + hidden_size)
            o_t = self.linear_o(merged)
            h_t = self.linear_h(merged)
            outputs.append(o_t)
        return torch.stack(outputs)  # (B, T, input_size=output_size)




class LanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, num_layers):
        super(LanguageModel, self).__init__()
        self.embedding = torch.nn.Embedding(vocab_size, embed_size)
        self.lstm = [LSTMLayer(embed_size, hidden_size) for _ in range(num_layers)]
        self.linear = nn.Linear(embed_size, vocab_size)


    def forward(self, x):
        # x: (B, T)
        x = self.embedding(x) # (B, T, embed_size)
        for layer in self.lstm:
            x = layer(x)
        # LSTM output: (B, T, vocab_size)
        x = x[:, -1, :] # extract the last output
        x = self.linear(x)  # (B, vocab_size) = logits
        return x


# Create synthetic training data
torch.manual_seed(42)
vocab_size = 50
seq_length = 10
batch_size = 32
X_train = torch.randint(0, vocab_size, (batch_size, seq_length))  # Random integer input
y_train = torch.randint(0, vocab_size, (batch_size,))  # Random target words

# Initialize the model, loss function, and optimizer
embed_size = 64
hidden_size = 128
num_layers = 2
model = LanguageModel(vocab_size, embed_size, hidden_size, num_layers)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)


# Training loop
epochs = 5
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()
    output = model(X_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()

    # Log progress every epoch
    print(f"Epoch [{epoch + 1}/{epochs}] - Loss: {loss.item():.4f}")

# Now, we will quantize the model dynamically to reduce its size and improve inference speed
# Quantization: Apply dynamic quantization to the language model
quantized_model = quantize_dynamic(model, {nn.Linear, nn.LSTM}, dtype=torch.qint8)

# Save the quantized model
torch.save(quantized_model.state_dict(), "quantized_language_model.pth")


Epoch [1/5] - Loss: 3.9426
Epoch [2/5] - Loss: 3.9332
Epoch [3/5] - Loss: 3.9239
Epoch [4/5] - Loss: 3.9146
Epoch [5/5] - Loss: 3.9053


In [31]:
# Load the quantized model and test it
quantized_model = LanguageModel(vocab_size, embed_size, hidden_size, num_layers)

# Apply dynamic quantization on the model after defining it
quantized_model = quantize_dynamic(quantized_model, {nn.Linear, nn.LSTM}, dtype=torch.qint8)

quantized_model.load_state_dict(torch.load("quantized_language_model.pth"))

  device=storage.device,


<All keys matched successfully>

In [32]:
# Testing the quantized model on a sample input
quantized_model.eval()
test_input = torch.randint(0, vocab_size, (1, seq_length))
with torch.no_grad():
    prediction = quantized_model(test_input)
    print(f"Prediction for input {test_input.tolist()}: {prediction.argmax(dim=1).item()}")

Prediction for input [[35, 17, 8, 32, 37, 45, 20, 29, 21, 20]]: 36
