<a href="https://colab.research.google.com/github/Ananya10-Coder/GPT2-like-model/blob/main/GPT2LikeModel.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Step 1: Create a GPT-2-like Model Using Transformers from Scratch**

#### **Purpose**:
- To build a simplified version of the GPT-2 model from scratch using PyTorch.
- This step defines the architecture of the model, including:
  - Token and positional embeddings.
  - Multi-head self-attention layers.
  - Feedforward layers.
  - Layer normalization.

#### **Key Components**:
1. **Token Embeddings**:
   - Converts input token IDs into dense vectors of fixed size (`embed_dim`).

2. **Positional Embeddings**:
   - Adds information about the position of each token in the sequence.

3. **Transformer Layers**:
   - Implements multi-head self-attention and feedforward layers.
   - The number of layers (`num_layers`) and attention heads (`num_heads`) can be adjusted.

4. **Final Classification Layer**:
   - Predicts the next token in the sequence.

#### **Output**:
- A custom GPT-2-like model ready for training.



In [None]:
# Step 1: Install Required Libraries
!pip install torch transformers datasets

# Step 2: Import Libraries
import torch
import torch.nn as nn
import torch.nn.functional as F

# Step 3: Define the GPT-2-like Model
class GPT2LikeModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_heads, num_layers, max_seq_len):
        super(GPT2LikeModel, self).__init__()
        self.embed_dim = embed_dim
        self.max_seq_len = max_seq_len

        # Token and positional embeddings
        self.token_embedding = nn.Embedding(vocab_size, embed_dim)
        self.position_embedding = nn.Embedding(max_seq_len, embed_dim)

        # Transformer layers
        self.transformer_layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads)
            for _ in range(num_layers)
        ])

        # Final layer to predict the next token
        self.fc_out = nn.Linear(embed_dim, vocab_size)

    def forward(self, x):
        batch_size, seq_len = x.size()

        # Generate positional indices
        positions = torch.arange(0, seq_len).unsqueeze(0).expand(batch_size, seq_len).to(x.device)

        # Token and positional embeddings
        token_embeds = self.token_embedding(x)
        position_embeds = self.position_embedding(positions)
        x = token_embeds + position_embeds

        # Pass through transformer layers
        for layer in self.transformer_layers:
            x = layer(x)

        # Predict the next token
        logits = self.fc_out(x)
        return logits

# Step 4: Initialize the Model
vocab_size = 50257  # GPT-2 vocab size
embed_dim = 768      # Embedding dimension
num_heads = 12       # Number of attention heads
num_layers = 6       # Number of transformer layers
max_seq_len = 512    # Maximum sequence length

model = GPT2LikeModel(vocab_size, embed_dim, num_heads, num_layers, max_seq_len)
print(model)

Collecting datasets
  Downloading datasets-3.3.2-py3-none-any.whl.metadata (19 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting 

### **Step 2: Train the Model on a Small Subset of WikiText-2**

#### **Purpose**:
- To train the custom GPT-2-like model on a small subset of the **WikiText-2 dataset**.
- This step teaches the model to predict the next token in a sequence, which is the core task of language modeling.

#### **Key Components**:
1. **Dataset**:
   - WikiText-2 is a collection of over 2 million tokens from Wikipedia articles.
   - A smaller subset (e.g., 10,000 samples) is used to reduce training time.

2. **Tokenization**:
   - The dataset is tokenized using the GPT-2 tokenizer.
   - Sequences are padded or truncated to a fixed length (`max_seq_len`).

3. **Training Loop**:
   - The model is trained using the **AdamW optimizer** and **CrossEntropyLoss**.
   - Mixed precision training (`autocast` and `GradScaler`) is used to speed up training and reduce memory usage.

4. **Evaluation**:
   - The model's performance is evaluated using **loss** and **accuracy**.

#### **Output**:
- A trained GPT-2-like model that can predict the next token in a sequence.


In [None]:
# Step 1: Install Required Libraries
!pip install torch torchvision transformers datasets

# Step 2: Import Libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from transformers import GPT2Tokenizer
from datasets import load_dataset
from torch.cuda.amp import autocast, GradScaler  # For mixed precision training

# Step 3: Define the GPT-2-like Model (Reduced Size)
class GPT2LikeModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_heads, num_layers, max_seq_len):
        super(GPT2LikeModel, self).__init__()
        self.embed_dim = embed_dim
        self.max_seq_len = max_seq_len

        # Token and positional embeddings
        self.token_embedding = nn.Embedding(vocab_size, embed_dim)
        self.position_embedding = nn.Embedding(max_seq_len, embed_dim)

        # Transformer layers (Reduced to 4 layers)
        self.transformer_layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads)
            for _ in range(num_layers)
        ])

        # Final layer to predict the next token
        self.fc_out = nn.Linear(embed_dim, vocab_size)

    def forward(self, x):
        batch_size, seq_len = x.size()

        # Generate positional indices
        positions = torch.arange(0, seq_len).unsqueeze(0).expand(batch_size, seq_len).to(x.device)

        # Token and positional embeddings
        token_embeds = self.token_embedding(x)
        position_embeds = self.position_embedding(positions)
        x = token_embeds + position_embeds

        # Pass through transformer layers
        for layer in self.transformer_layers:
            x = layer(x)

        # Predict the next token
        logits = self.fc_out(x)
        return logits

# Step 4: Initialize the Model (Reduced Size)
vocab_size = 50257  # GPT-2 vocab size
embed_dim = 512      # Reduced embedding dimension
num_heads = 8        # Number of attention heads
num_layers = 4       # Reduced number of transformer layers
max_seq_len = 512    # Maximum sequence length

model = GPT2LikeModel(vocab_size, embed_dim, num_heads, num_layers, max_seq_len)
print(model)

# Step 5: Load the Dataset (Small Subset)
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

# Use only 10,000 samples for training (smaller subset)
small_train_dataset = dataset["train"].shuffle(seed=42).select(range(10000))

# Step 6: Tokenize the Dataset
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Set the padding token to the end-of-sequence token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=max_seq_len,
    )

tokenized_dataset = small_train_dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# Step 7: Format the Dataset as PyTorch Tensors
tokenized_dataset.set_format(type="torch", columns=["input_ids"])

# Step 8: Prepare DataLoader
train_dataloader = DataLoader(tokenized_dataset, batch_size=8, shuffle=True)

# Step 9: Define Loss Function and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Step 10: Enable Mixed Precision Training
scaler = GradScaler()

# Step 11: Train the Model (Reduced to 2 Epochs)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

num_epochs = 1  # Reduced number of epochs
for epoch in range(num_epochs):
    model.train()
    total_loss = 0

    for batch in train_dataloader:
        input_ids = batch["input_ids"].to(device)
        labels = input_ids[:, 1:].contiguous()  # Shift input to get labels

        # Mixed precision training
        optimizer.zero_grad()
        with autocast():
            outputs = model(input_ids[:, :-1])  # Predict next token
            loss = criterion(outputs.view(-1, vocab_size), labels.view(-1))

        # Backward pass and optimization
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        total_loss += loss.item()

    # Print average loss for the epoch
    avg_loss = total_loss / len(train_dataloader)
    print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {avg_loss:.4f}")

GPT2LikeModel(
  (token_embedding): Embedding(50257, 512)
  (position_embedding): Embedding(512, 512)
  (transformer_layers): ModuleList(
    (0-3): 4 x TransformerEncoderLayer(
      (self_attn): MultiheadAttention(
        (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
      )
      (linear1): Linear(in_features=512, out_features=2048, bias=True)
      (dropout): Dropout(p=0.1, inplace=False)
      (linear2): Linear(in_features=2048, out_features=512, bias=True)
      (norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
      (norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
      (dropout1): Dropout(p=0.1, inplace=False)
      (dropout2): Dropout(p=0.1, inplace=False)
    )
  )
  (fc_out): Linear(in_features=512, out_features=50257, bias=True)
)


Map:   0%|          | 0/10000 [00:00<?, ? examples/s]

  scaler = GradScaler()
  with autocast():


Epoch 1/1, Loss: 1.0144


### **Step 3: Fine-Tune the Model on a Small Subset of IMDB**

#### **Purpose**:
- To fine-tune the pre-trained GPT-2-like model on the **IMDB dataset** for sentiment analysis.
- This step adapts the model to a specific downstream task (e.g., generating text with positive or negative sentiment).

#### **Key Components**:
1. **Dataset**:
   - IMDB is a dataset of movie reviews labeled as positive or negative.
   - A smaller subset (e.g., 5,000 samples) is used to reduce fine-tuning time.

2. **Fine-Tuning**:
   - The model is fine-tuned using the same training loop as in Step 2.
   - The model learns to generate text that aligns with the sentiment of the input.

3. **Evaluation**:
   - The model's performance is evaluated using **loss** and **accuracy**.

#### **Output**:
- A fine-tuned GPT-2-like model that can generate text with specific sentiment.


In [None]:
# Step 1: Load the IMDB Dataset
imdb_dataset = load_dataset("imdb")

# Step 2: Use a Smaller Subset of the Dataset (5,000 samples)
small_imdb_train = imdb_dataset["train"].shuffle(seed=42).select(range(5000))

# Step 3: Tokenize the Dataset
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=max_seq_len,
    )

tokenized_imdb = small_imdb_train.map(tokenize_function, batched=True, remove_columns=["text"])

# Step 4: Format the Dataset as PyTorch Tensors
tokenized_imdb.set_format(type="torch", columns=["input_ids"])

# Step 5: Prepare DataLoader
train_imdb_loader = DataLoader(tokenized_imdb, batch_size=8, shuffle=True)

# Step 6: Fine-Tune the Model (1 Epoch)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

num_epochs = 1  # Only 1 epoch for fine-tuning
for epoch in range(num_epochs):
    model.train()
    total_loss = 0

    for batch in train_imdb_loader:
        input_ids = batch["input_ids"].to(device)
        labels = input_ids[:, 1:].contiguous()  # Shift input to get labels

        # Mixed precision training
        optimizer.zero_grad()
        with autocast():
            outputs = model(input_ids[:, :-1])  # Predict next token
            loss = criterion(outputs.view(-1, vocab_size), labels.view(-1))

        # Backward pass and optimization
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        total_loss += loss.item()

    # Print average loss for the epoch
    avg_loss = total_loss / len(train_imdb_loader)
    print(f"Fine-Tuning Epoch {epoch + 1}/{num_epochs}, Loss: {avg_loss:.4f}")

README.md:   0%|          | 0.00/7.81k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/21.0M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/20.5M [00:00<?, ?B/s]

unsupervised-00000-of-00001.parquet:   0%|          | 0.00/42.0M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating unsupervised split:   0%|          | 0/50000 [00:00<?, ? examples/s]

Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

  with autocast():


Fine-Tuning Epoch 1/1, Loss: 3.4442


### **Step 4: Apply Reinforcement Learning with Human Feedback (RLHF)**

#### **Purpose**:
- To improve the model's text generation using **Reinforcement Learning with Human Feedback (RLHF)**.
- This step trains the model to generate text that aligns with human preferences (e.g., coherence, relevance, and sentiment).

#### **Key Components**:
1. **Reward Model**:
   - A smaller neural network that predicts a scalar reward based on the model's generated text.
   - The reward model is trained using human feedback or predefined rules.

2. **Reinforcement Learning Environment**:
   - A custom environment (`TextEnv`) is created to interact with the model and compute rewards.

3. **Reinforcement Learning Algorithm**:
   - A simpler algorithm like **A2C** (Advantage Actor-Critic) is used to train the model.
   - The model is trained to maximize the reward predicted by the reward model.

4. **Evaluation**:
   - The model's performance is evaluated based on the reward and generated text quality.

#### **Output**:
- A GPT-2-like model that generates text aligned with human preferences.

In [None]:
# Step 1: Install Required Libraries
!pip install stable-baselines3 gymnasium shimmy

# Step 2: Define a Smaller Reward Model
class RewardModel(nn.Module):
    def __init__(self, embed_dim):
        super(RewardModel, self).__init__()
        self.fc1 = nn.Linear(embed_dim, 256)  # Smaller hidden layer
        self.fc2 = nn.Linear(256, 1)         # Outputs a scalar reward

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)

# Initialize the reward model
reward_model = RewardModel(embed_dim).to(device)

# Step 3: Define a Gymnasium-Compatible RL Environment
import gymnasium as gym
from gymnasium import spaces
import numpy as np

class TextEnv(gym.Env):
    def __init__(self, model, tokenizer, dataset):
        super(TextEnv, self).__init__()
        self.model = model
        self.tokenizer = tokenizer
        self.dataset = dataset
        self.current_idx = 0

        # Define action and observation spaces
        self.action_space = spaces.Discrete(vocab_size)  # Actions are token IDs
        self.observation_space = spaces.Box(low=0, high=vocab_size, shape=(max_seq_len,), dtype=np.int32)

    def reset(self, seed=None, options=None):
        # Reset the environment and return the initial observation
        super().reset(seed=seed)
        self.current_idx = 0
        initial_input_ids = self.dataset[self.current_idx]["input_ids"]
        return initial_input_ids.numpy(), {}

    def step(self, action):
        # Generate text using the model
        input_ids = torch.tensor([[action]]).to(device)  # Add an extra dimension to create a batch of size 1

        # Get embeddings for the input tokens
        with torch.no_grad():
            token_embeds = self.model.token_embedding(input_ids)

        # Compute reward using the reward model, using the token embeddings as input
        with torch.no_grad():
            reward = reward_model(token_embeds.squeeze(0)).item()  # Squeeze to remove the batch dimension

        # Move to the next sample in the dataset
        self.current_idx += 1
        if self.current_idx >= len(self.dataset):
            done = True
            next_input_ids = self.dataset[0]["input_ids"]  # Reset to the first sample
        else:
            done = False
            next_input_ids = self.dataset[self.current_idx]["input_ids"]

        # Return observation, reward, done, and info
        return next_input_ids.numpy(), reward, done, False, {}

# Step 4: Use a Smaller Dataset for RLHF
small_rlhf_dataset = tokenized_imdb.select(range(1000))  # Use only 1,000 samples

# Step 5: Initialize the Environment
env = TextEnv(model, tokenizer, small_rlhf_dataset)

# Step 6: Use a Simpler RL Algorithm (Policy Gradient instead of PPO)
from stable_baselines3 import A2C  # Simpler than PPO

# Train using A2C (fewer steps)
rl_model = A2C("MlpPolicy", env, verbose=1)
rl_model.learn(total_timesteps=1000)  # Reduced number of steps

Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.




------------------------------------
| time/                 |          |
|    fps                | 243      |
|    iterations         | 100      |
|    time_elapsed       | 2        |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -10.8    |
|    explained_variance | -35.3    |
|    learning_rate      | 0.0007   |
|    n_updates          | 99       |
|    policy_loss        | -4.33    |
|    value_loss         | 0.851    |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 1e+03    |
|    ep_rew_mean        | -111     |
| time/                 |          |
|    fps                | 275      |
|    iterations         | 200      |
|    time_elapsed       | 3        |
|    total_timesteps    | 1000     |
| train/                |          |
|    entropy_loss       | -10.8    |
|    explained_variance | -5.36    |
|    learning_rate      | 0.0007   |
|

<stable_baselines3.a2c.a2c.A2C at 0x7c3af0ac8d50>

### **Step 5: Inference and Evaluation**

#### **Purpose**:
- To generate text using the fine-tuned model and evaluate its performance on a test dataset.
- This step demonstrates the model's ability to generate coherent and relevant text.

#### **Key Components**:
1. **Text Generation**:
   - A custom `generate_text` function is used to generate text token by token.
   - The function supports **temperature scaling**, **top-k filtering**, and **top-p filtering** for diverse text generation.

2. **Evaluation**:
   - The model is evaluated on the test dataset over multiple epochs.
   - Metrics like **loss** and **accuracy** are computed to measure performance.

3. **Sample Output**:
   - The model generates text based on a given prompt (e.g., "I really enjoyed this movie because").

#### **Output**:
- Generated text that demonstrates the model's capabilities.
- Evaluation metrics that quantify the model's performance.


In [None]:
# Step 1: Load and Split the IMDB Dataset
from datasets import load_dataset

# Load the IMDB dataset
imdb_dataset = load_dataset("imdb")

# Split the dataset into training and test sets
train_test_split = imdb_dataset["train"].train_test_split(test_size=0.2, seed=42)
train_imdb = train_test_split["train"]
test_imdb = train_test_split["test"]

# Step 2: Tokenize the Test Dataset
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=max_seq_len,
    )

# Tokenize the test dataset
tokenized_test_imdb = test_imdb.map(tokenize_function, batched=True, remove_columns=["text"])

# Step 3: Prepare DataLoader for the Test Dataset
from torch.utils.data import DataLoader

# Format the test dataset as PyTorch tensors
tokenized_test_imdb.set_format(type="torch", columns=["input_ids"])

# Create a DataLoader for the test dataset
test_imdb_loader = DataLoader(tokenized_test_imdb, batch_size=8)

# Step 4: Define a Custom Text Generation Function
def generate_text(model, tokenizer, prompt, max_length=50, temperature=1.0, top_k=50, top_p=0.95):
    model.eval()
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)

    # Generate text token by token
    for _ in range(max_length):
        with torch.no_grad():
            outputs = model(input_ids)
            logits = outputs[:, -1, :] / temperature  # Apply temperature scaling

            # Apply top-k and top-p filtering
            if top_k > 0:
                logits = top_k_filter(logits, top_k)
            if top_p > 0:
                logits = top_p_filter(logits, top_p)

            # Sample the next token
            probs = F.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)

        # Append the next token to the input sequence
        input_ids = torch.cat([input_ids, next_token], dim=-1)

    # Decode the generated text
    generated_text = tokenizer.decode(input_ids[0], skip_special_tokens=True)
    return generated_text

# Helper functions for top-k and top-p filtering
def top_k_filter(logits, top_k):
    values, indices = torch.topk(logits, top_k)
    min_values = values[:, -1].unsqueeze(-1)
    return torch.where(logits < min_values, torch.ones_like(logits) * -float("inf"), logits)

def top_p_filter(logits, top_p):
    sorted_logits, sorted_indices = torch.sort(logits, descending=True)
    cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)

    # Remove tokens with cumulative probability above the threshold
    sorted_indices_to_remove = cumulative_probs > top_p
    sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
    sorted_indices_to_remove[..., 0] = 0

    # Scatter sorted indices back to original positions
    indices_to_remove = sorted_indices_to_remove.scatter(1, sorted_indices, sorted_indices_to_remove)
    logits[indices_to_remove] = -float("inf")
    return logits

# Step 5: Generate Text Using the Fine-Tuned Model
prompt = "I really enjoyed this movie because"
generated_text = generate_text(model, tokenizer, prompt)
print("Generated Text:")
print(generated_text)

# Step 6: Evaluate the Model on the Test Dataset (Multiple Epochs)
def evaluate_model(model, dataloader, num_epochs=3):
    model.eval()
    total_loss = 0
    correct = 0
    total = 0

    for epoch in range(num_epochs):
        epoch_loss = 0
        epoch_correct = 0
        epoch_total = 0

        with torch.no_grad():
            for batch in dataloader:
                input_ids = batch["input_ids"].to(device)
                labels = input_ids[:, 1:].contiguous()  # Shift input to get labels
                outputs = model(input_ids[:, :-1])      # Predict next token

                # Compute loss
                loss = criterion(outputs.view(-1, vocab_size), labels.view(-1))
                epoch_loss += loss.item()

                # Compute accuracy
                preds = torch.argmax(outputs, dim=-1)
                epoch_correct += (preds == labels).sum().item()
                epoch_total += labels.numel()

        # Print results for the epoch
        avg_loss = epoch_loss / len(dataloader)
        accuracy = epoch_correct / epoch_total
        print(f"Evaluation Epoch {epoch + 1}/{num_epochs}, Loss: {avg_loss:.4f}, Accuracy: {accuracy * 100:.2f}%")

        # Accumulate results across epochs
        total_loss += epoch_loss
        correct += epoch_correct
        total += epoch_total

    # Calculate average loss and accuracy across all epochs
    avg_loss = total_loss / (len(dataloader) * num_epochs)
    accuracy = correct / total
    print(f"Final Evaluation Results: Loss: {avg_loss:.4f}, Accuracy: {accuracy * 100:.2f}%")

# Evaluate on the test set with multiple epochs
print("Evaluating on the test set...")
evaluate_model(model, test_imdb_loader, num_epochs=3)  # Evaluate for 3 epochs

Generated Text:
I really enjoyed this movie because he can't good, and there for that is a long to be, or so as this movie of a lot (and so. Even is about this movie was a lot and that to be a good, with the fact in "or and and
Evaluating on the test set...
Evaluation Epoch 1/3, Loss: 3.2441, Accuracy: 54.48%
Evaluation Epoch 2/3, Loss: 3.2441, Accuracy: 54.48%
Evaluation Epoch 3/3, Loss: 3.2441, Accuracy: 54.48%
Final Evaluation Results: Loss: 3.2441, Accuracy: 54.48%
