# Fine-Tuning GPT-2 for Positive Movie Reviews with Direct Preference Optimization

This project fine-tunes a GPT-2 language model to generate positive movie reviews using **Direct Preference Optimization (DPO)** — a reinforcement learning-free approach to preference modeling. The training relies on automatically labeled pairs of completions based on sentiment, derived from a pre-trained classifier.

Direct Preference Optimization (DPO) is a recent method that trains language models to prefer one output over another — without relying on reward models or reinforcement learning. Instead, DPO uses a simple logistic-style loss over a pair of responses:

- **Input**: `(prompt, chosen_response, rejected_response)`
- **Loss**: Encourages the model to raise the log-probability of the `chosen_response` relative to the `rejected_response`, with both measured against a frozen reference copy of the model
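
As a rough illustration of the loss described above, the sketch below computes the per-example DPO loss from sequence log-probabilities in plain Python. The function name `dpo_loss` and the toy log-probabilities are assumptions made for this example; in the notebook itself the loss is computed inside `trl`'s `DPOTrainer`.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities (illustrative)."""
    # Implicit rewards: how far the policy has moved each response
    # away from the frozen reference model, scaled by beta
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Policy identical to the reference: the margin is 0 and the loss is log(2)
loss_at_start = dpo_loss(-100.0, -120.0, -100.0, -120.0)

# Policy has shifted probability toward the chosen response (toy numbers)
loss_after = dpo_loss(-95.0, -140.0, -100.0, -120.0)

print(round(loss_at_start, 4), round(loss_after, 4))
```

The loss falls as the chosen response gains log-probability relative to the rejected one, and `beta` controls how strongly a given shift away from the reference is rewarded.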

---

## Dataset

We use the unsupervised split of the IMDb dataset from Hugging Face, which contains raw movie reviews without sentiment labels.
Since we aim to fine-tune GPT-2 to generate positive reviews, we follow this pipeline:

1. Construct prompts from raw reviews (e.g.,  
   `"Generate a positive movie review based on this input: …"`)

2. Generate multiple completions per prompt using GPT-2

3. Use a fine-tuned sentiment classifier
   (`distilbert-base-uncased-finetuned-sst-2-english`) to rank the completions

4. Store `(prompt, chosen, rejected)` triplets for training

This creates a preference dataset without requiring any manual labeling.

---

## Fine-Tuning Process

- Fine-tune GPT-2 using the `trl` library's `DPOTrainer`
- Run training for 5 epochs
- Log training and evaluation metrics using Weights & Biases
- Evaluate generations after each epoch using a consistent prompt

During generation, the model sometimes produced repetitive or looping outputs. To address this, we added the decoding parameter `repetition_penalty=1.2`, which reduced repetition and produced more coherent, natural-sounding completions.
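
To make the effect of the repetition penalty concrete, here is a minimal sketch of the usual logit rescaling rule: tokens that have already appeared get their positive logits divided by the penalty and their negative logits multiplied by it. The helper `apply_repetition_penalty` and the four-token vocabulary are assumptions for illustration; the notebook simply passes `repetition_penalty` to `model.generate`.

```python
def apply_repetition_penalty(logits, seen_ids, penalty=1.2):
    """Return a copy of `logits` with already-seen tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(seen_ids):
        score = adjusted[token_id]
        # Dividing a positive logit and multiplying a negative one both lower the score
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted

logits = [2.4, -1.0, 0.6, 3.0]   # toy next-token logits for a 4-token vocabulary
seen_ids = [0, 1, 0]             # tokens 0 and 1 already appear in the sequence
penalized = apply_repetition_penalty(logits, seen_ids)
print(penalized)
```

Tokens 2 and 3 are untouched, so after the softmax they gain probability mass relative to the repeated tokens.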


---

## Model Architecture & Workflow

                    +-----------------------------+
                    |       IMDb Raw Review       |
                    |        (Unsupervised)       |
                    +-------------+---------------+
                                  |
                                  v
                    +-----------------------------+
                    |        Build Prompt         |
                    |"Generate a positive review…"|
                    +-------------+---------------+
                                  |
                                  v
                    +-----------------------------+
                    |      GPT-2 (Base Model)     |
                    |    generates completions    |
                    +-------+-------------+-------+
                        |                     |
                        v                     v
                +---------------+     +---------------+
                | Completion A  |     | Completion B  |
                +---------------+     +---------------+
                        \                   /
                         \                 /
                          v               v
                    +-----------------------------+
                    |    Sentiment Classifier     |
                    |        (DistilBERT)         |
                    +-------------+---------------+
                                  |
                                  v
                    +-----------------------------+
                    |      Rank Completions       |
                    |  → Chosen (more positive)   |
                    |  → Rejected (less positive) |
                    +-------------+---------------+
                                  |
                                  v
                    +-----------------------------+
                    |  DPO fine-tunes GPT-2 using |
                    | (prompt, chosen, rejected)  |
                    |         triples             |
                    +-----------------------------+




In [None]:
!pip install -q transformers datasets trl accelerate peft
!pip install wandb

import wandb
wandb.login()

from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from tqdm import tqdm
from trl import DPOTrainer, DPOConfig
from pprint import pprint

import torch
import random
import json
import os

### Model and Dataset Initialization

In this section, we prepare all the necessary components for our sentiment-driven text generation project.

1. **Dataset: IMDb (Unsupervised Split)**  
   We load the unsupervised split of the [IMDb dataset](https://huggingface.co/datasets/imdb), which contains movie reviews without sentiment labels. The reviews serve only as prompt material; sentiment is scored later, on the completions GPT-2 generates, using a sentiment classifier.

2. **Language Model: GPT-2**  
   We load the `gpt2` model and its tokenizer from Hugging Face. GPT-2 is a widely used autoregressive language model, chosen here because it is lightweight and pretrained on a large corpus of web text.
   We configure the tokenizer with:
   - `pad_token = eos_token`: Since GPT-2 has no dedicated padding token, we use the end-of-sequence (EOS) token to represent padding.
   - `padding_side = "left"`: Left padding places the last real token of every prompt at the end of its row, so generation continues directly from the prompt. With right padding, a decoder-only model like GPT-2 would have to continue from padding tokens, which degrades batched generation.

3. **Sentiment Analysis: Fine-Tuned DistilBERT**  
   We use the pipeline API to load a **sentiment analysis model**, specifically `distilbert-base-uncased-finetuned-sst-2-english`. This is a lightweight, distilled version of BERT fine-tuned on the Stanford Sentiment Treebank (SST-2) dataset.  
   Sentiment analysis classifies text by emotional tone, typically **positive**, **negative**, or **neutral**. This particular model is binary: it returns either `POSITIVE` or `NEGATIVE` together with a confidence score.

This setup allows us to:
- Use **DistilBERT** to score GPT-2's completions for positivity and build `(prompt, chosen, rejected)` preference pairs.
- Fine-tune **GPT-2** with DPO on these pairs so that it favors positive-sentiment completions.
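
The left-padding choice can be illustrated without loading a model. The sketch below pads two toy token-id lists on the left and builds the matching attention masks; `left_pad` and the token ids are made up for this example, and the real tokenizer does the equivalent when `padding=True` and `padding_side = "left"`.

```python
def left_pad(batch, pad_id):
    """Left-pad token-id lists to a common length and build attention masks."""
    width = max(len(ids) for ids in batch)
    input_ids = [[pad_id] * (width - len(ids)) + ids for ids in batch]
    # 0 marks padding positions the model should ignore, 1 marks real tokens
    attention_mask = [[0] * (width - len(ids)) + [1] * len(ids) for ids in batch]
    return input_ids, attention_mask

# Toy token ids; 50256 is GPT-2's EOS id, reused here as the padding id
batch = [[11, 12, 13, 14], [21, 22]]
input_ids, attention_mask = left_pad(batch, pad_id=50256)

for ids, mask in zip(input_ids, attention_mask):
    print(ids, mask)
```

Every row ends with a real token, so the next generated token follows the prompt directly in all rows of the batch.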


In [None]:
# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load IMDb unsupervised dataset
dataset = load_dataset("imdb", split="unsupervised")

# Load GPT-2 model and tokenizer
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

# Load sentiment analysis pipeline using DistilBERT
sentiment_pipe = pipeline(
    task="sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=0 if device == "cuda" else -1,
    truncation=True,
    max_length=512,
)

### Prompt Construction, Text Generation, and Sentiment-Based Ranking

This section defines three core functions used for generating and ranking movie reviews based on sentiment.

---

#### 1. `build_prompt_from_review(review, max_tokens=100)`
This function creates a GPT-style prompt from a raw IMDb review. It tokenizes and truncates the review to a fixed number of tokens (`max_tokens`, default: 100) and formats it into the following structure:

`"Generate a positive movie review based on this input: <truncated review>"`


This setup helps guide GPT-2 to generate text that aligns with a specific sentiment.

---

#### 2. Text Generation — `generate_completions_batch(prompts, num_completions=5, max_new_tokens=100)`
This function generates multiple completions for each input prompt using GPT-2.

- **Batching for Speed**: Prompts are tokenized and processed in a batch, which speeds up computation compared to handling them one-by-one.
- **`num_completions`**: The number of different completions to generate for each prompt. For example, setting `num_completions=5` returns five completions per prompt.
- **`max_new_tokens`**: The maximum number of tokens that the model is allowed to generate beyond the original prompt. This helps constrain the length of generated responses.
- **Left Padding**: Since GPT-2 is a decoder-only model that generates text autoregressively from left to right, we use `padding_side = "left"`.
- After generation, the function trims the prompt portion from each output, returning only the new text generated by the model.
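
The function also has to map the flat batch of generations back to the prompts. As far as we understand the `generate` API, with `num_return_sequences=n` the `n` completions of each prompt are returned consecutively, so slicing in chunks of `n` recovers one group per prompt. A minimal sketch of that grouping, using a hypothetical helper `group_by_prompt` and placeholder strings instead of token tensors:

```python
def group_by_prompt(flat_outputs, num_completions):
    """Split a flat list of generations into one sub-list per prompt."""
    return [
        flat_outputs[i : i + num_completions]
        for i in range(0, len(flat_outputs), num_completions)
    ]

# Two prompts, three completions each, in the order the batch is returned
flat = ["p0-a", "p0-b", "p0-c", "p1-a", "p1-b", "p1-c"]
grouped = group_by_prompt(flat, num_completions=3)
print(grouped)
```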

---

#### 3. Sentiment-Based Ranking — `rank_by_sentiment_batch(completions)`
This function evaluates and ranks generated completions based on how **positive** they are, using a pre-trained **DistilBERT** sentiment classifier (`distilbert-base-uncased-finetuned-sst-2-english`).

- **Batching** is used here as well (`batch_size=5`) to accelerate sentiment prediction across multiple completions.
- Each output receives:
  - A `label`: either `"POSITIVE"` or `"NEGATIVE"`
  - A `score`: confidence level for the predicted label
- The ranking logic is defined as:

`score = entry["score"] if label == "POSITIVE" else 1 - entry["score"]`

This means:
- Higher scores correspond to more confidently **positive** reviews
- Negative completions are penalized by inverting their scores

The completions are then sorted from **most to least positive**, allowing us to pick the most sentiment-aligned generations.
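
A small worked example of the ranking rule, using made-up classifier outputs rather than real predictions (the texts, labels, and scores below are illustrative only):

```python
def positivity(entry):
    """Map a classifier result to a single 'how positive' score in [0, 1]."""
    return entry["score"] if entry["label"] == "POSITIVE" else 1 - entry["score"]

# Hypothetical (completion, classifier result) pairs
mock_results = [
    ("Dull and far too long.", {"label": "NEGATIVE", "score": 0.97}),
    ("A charming, heartfelt film.", {"label": "POSITIVE", "score": 0.99}),
    ("It was fine, I guess.", {"label": "POSITIVE", "score": 0.62}),
]

ranked = sorted(mock_results, key=lambda pair: positivity(pair[1]), reverse=True)
chosen, rejected = ranked[0][0], ranked[-1][0]
print("chosen:", chosen)
print("rejected:", rejected)
```

The confidently negative completion ends up with a positivity of about 0.03 and is ranked last, while a weakly positive one (0.62) lands in the middle.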

---

### Summary
By combining these three functions:
- We construct prompts from unlabeled IMDb reviews
- Generate diverse completions using GPT-2
- And rank them by positivity using DistilBERT




In [None]:
def build_prompt_from_review(review, max_tokens=100):
    """
    Construct a prompt by truncating a raw review and formatting it for GPT-2 generation.
    """
    tokens = tokenizer.tokenize(review)
    truncated_review = tokenizer.convert_tokens_to_string(tokens[:max_tokens])
    return f"Generate a positive movie review based on this input: {truncated_review}"


def generate_completions_batch(prompts, num_completions=5, max_new_tokens=100):
    """
    Generate multiple completions for each input prompt using GPT-2.

    Args:
        prompts (List[str]): Input prompts.
        num_completions (int): Number of completions per prompt.
        max_new_tokens (int): Maximum number of tokens to generate per completion.

    Returns:
        List[List[str]]: A list of completion lists (one per prompt).
    """

    # Tokenize input prompts with truncation
    inputs = tokenizer(
        prompts,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=1024 - max_new_tokens
    ).to(device)

    # Track the prompt length for trimming the prompt part from outputs.
    # With left padding, every row of the batch shares the same padded length,
    # so the generated tokens start at that offset in every output row.
    input_lengths = [inputs["input_ids"].shape[1]] * len(prompts)

    # Generate outputs
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_k=50,
            top_p=0.95,
            temperature=0.9,
            num_return_sequences=num_completions,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Group outputs by original prompt
    completions_per_prompt = [
        outputs[i * num_completions : (i + 1) * num_completions]
        for i in range(len(prompts))
    ]

    # Decode each group and strip the prompt portion
    decoded_outputs = [
        [
            tokenizer.decode(output[input_len:], skip_special_tokens=True).strip()
            for output in group
        ]
        for group, input_len in zip(completions_per_prompt, input_lengths)
    ]

    return decoded_outputs


def rank_by_sentiment_batch(completions):
    """
    Rank a list of completions by sentiment using the DistilBERT classifier.

    Args:
        completions (List[str]): The text completions to evaluate.

    Returns:
        List[Tuple[str, Dict]]: Completions and their sentiment results, sorted from most to least positive.
    """
    results = sentiment_pipe(completions, batch_size=5)  # Batched for performance

    def score(entry):
        label = entry["label"]
        confidence = entry["score"]
        return confidence if label == "POSITIVE" else 1 - confidence

    return sorted(zip(completions, results), key=lambda x: score(x[1]), reverse=True)


### Constructing Preference Data from Generated Completions

This section loops through the entire IMDb unsupervised dataset to build a dataset of preferences for fine-tuning a language model using DPO (Direct Preference Optimization).

---

#### What This Code Does

1. **Batch Processing**  
   The dataset is processed in batches (default size: 64) to efficiently handle generation and sentiment classification.

2. **Prompt Construction**  
   For each batch, raw IMDb reviews are converted into GPT-2-style prompts using `build_prompt_from_review()`.


3. **Text Generation**  
   For each prompt, the function `generate_completions_batch()` generates multiple completions (e.g., 5 variations) using GPT-2 with controlled sampling (top-k, top-p, temperature).

4. **Sentiment Ranking**  
   The completions are passed through a sentiment classifier (`rank_by_sentiment_batch()`), which ranks them from most to least positive based on the classifier's confidence scores.

5. **Output Format**  
   For each prompt, we record:
   - The original prompt
   - The most positive completion as `chosen`
   - The least positive completion as `rejected`

The output format is as follows:

```json
{
    "prompt": "<input prompt>",
    "chosen": "<most positive completion>",
    "rejected": "<least positive completion>"
}
```


In [None]:
# Set batch size
batch_size = 64

# Use the entire IMDb unsupervised dataset
num_samples = len(dataset)
print(f"Total number of prompts in dataset: {num_samples}")

# List to store prompt-completion preference data
preference_data = []

# Process the dataset in batches
for start in tqdm(range(0, num_samples, batch_size)):
    end = min(start + batch_size, num_samples)

    # Prepare raw reviews and construct prompts
    batch_reviews = [dataset[i]['text'] for i in range(start, end)]
    batch_prompts = [build_prompt_from_review(review) for review in batch_reviews]

    try:
        # Generate multiple completions per prompt
        all_completions = generate_completions_batch(
            batch_prompts,
            num_completions=5
        )

        # Rank completions by sentiment and collect best/worst samples
        for prompt, completions in zip(batch_prompts, all_completions):
            ranked = rank_by_sentiment_batch(completions)
            chosen = ranked[0][0]    # Most positive
            rejected = ranked[-1][0]  # Least positive

            preference_data.append({
                "prompt": prompt,
                "chosen": chosen,
                "rejected": rejected
            })

    except Exception as e:
        print(f"Error processing batch starting at index {start}: {e}")
        continue

Total number of prompts in dataset: 50000


100%|██████████| 782/782 [42:56<00:00,  3.29s/it]


In [None]:
# Convert the list of prompt-preference dictionaries into a Hugging Face Dataset
preference_dataset = Dataset.from_list(preference_data)

# Display dataset size
print(f"Dataset contains {len(preference_dataset)} samples.")

# Preview the first sample
pprint(preference_data[0])

Dataset contains 50000 samples.
{'chosen': 'depths of a cult, is very complex and well thought out. It is also '
           'a very moving tale and it does it in such an entertaining way that '
           'I love it and I could not be more pleased with the overall '
           'experience. I will keep listening to this movie to see if it is '
           'any better, but if not, give it a try... Read more',
 'prompt': 'Generate a positive movie review based on this input: This is just '
           'a precious little diamond. The play, the script are excellent. I '
           'cant compare this movie with anything else, maybe except the movie '
           '"Leon" wonderfully played by Jean Reno and Natalie Portman. But... '
           'What can I say about this one? This is the best movie Anne '
           'Parillaud has ever played in (See please "Frankie Starlight", '
           "she's speaking English there) to see what I mean. The story of "
           'young punk girl Nikita, taken 

### Preparing for DPO Fine-Tuning

This section splits the dataset and defines the training configuration for Direct Preference Optimization (DPO) using the `trl` library.
We split the preference dataset into training and evaluation subsets (90/10), define a `DPOConfig` object to control the training behavior, and create a `DPOTrainer` instance that handles:

- Batching and formatting `prompt` / `chosen` / `rejected` samples  
- Applying the DPO loss function  
- Running training and evaluation  
- Saving the best-performing model based on evaluation loss  




In [None]:
# Split the preference dataset into training and evaluation sets (90% / 10%)
train_valid = preference_dataset.train_test_split(test_size=0.1, seed=42)
train_dataset = train_valid["train"]
eval_dataset  = train_valid["test"]

# Define training configuration for DPO fine-tuning
config = DPOConfig(
    beta=0.1,                               # Inverse temperature for DPO loss
    learning_rate=5e-5,                     # Fine-tuning learning rate
    per_device_train_batch_size=64,         # Batch size per device
    num_train_epochs=5,                     # Number of training epochs
    logging_steps=100,                      # Log training metrics every 100 steps
    eval_strategy="epoch",                  # Run evaluation at the end of each epoch
    save_strategy="epoch",                  # Save a checkpoint at the end of each epoch
    # save_total_limit=2,                   # Optional: keep only the 2 most recent checkpoints
    load_best_model_at_end=True,            # Load best-performing model based on eval loss
    metric_for_best_model="eval_loss",      # Use evaluation loss for model selection
    greater_is_better=False,                # Lower loss is considered better
    report_to="wandb",                      # Report metrics to Weights & Biases
    output_dir="/workspace/gpt2-dpo-imdb"   # Directory to save model checkpoints and logs
)

# Initialize the DPO trainer
trainer = DPOTrainer(
    model=model,                            # Pretrained GPT-2 model to be fine-tuned
    args=config,                            # Training configuration (DPOConfig)
    train_dataset=train_dataset,            # Dataset of prompts with preferred completions
    eval_dataset=eval_dataset,              # Held-out evaluation set (10% of data)
    processing_class=tokenizer              # Tokenizer for preprocessing input/output
)

### DPO Training Results Summary

The table below shows the training and evaluation metrics logged over 5 epochs during fine-tuning of GPT-2 using Direct Preference Optimization (DPO). Training was performed on an NVIDIA A100 PCIe GPU and took about 46 minutes (2,756 seconds).

---

#### Key Metrics

- **Training Loss**: Decreased steadily from `0.0821` in epoch 1 to near-zero (`0.0001`) by epoch 5, indicating that the model quickly learned to distinguish between preferred (positive) and rejected completions.
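- **Validation Loss**: Stayed within a narrow band (`0.0668` to `0.0787`), reaching its minimum at epoch 3 and rising slightly afterwards while the training loss kept falling, which points to mild overfitting in the later epochs. With `load_best_model_at_end=True`, the epoch 3 checkpoint is the one restored when training finishes.
- **Rewards (chosen/rejected)**:
  - Chosen completions had less negative reward values (from `-6.6` down to `-9.7`)
  - Rejected completions received significantly more negative rewards (from `-17.4` down to `-27.5`)
  - The margin between them widened from `10.77` to `17.79`, indicating the model learned to favor completions aligned with positive sentiment. Both rewards are negative, so the model separates the pair mainly by pushing rejected completions down faster than chosen ones.
- **Rewards/Accuracies**: Rose from `0.973` to `0.982`, meaning that the model preferred the better (more positive) completion in roughly 97 to 98% of cases.
- **Logps and Logits**:
  - Log-probabilities of both chosen and rejected completions became more negative, but those of rejected completions fell much faster (`-428.7` to `-530.0`, versus `-315.8` to `-346.9` for chosen), and the mean logits follow the same pattern.
  - The growing gap shows that the model separates preferred from rejected completions more sharply as training progresses.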
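
The reward metrics can be read as follows. To our understanding, the implicit reward of a completion is `beta` times the difference between its log-probability under the fine-tuned model and under the reference model, the margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of pairs with a positive margin. The sketch below reproduces these definitions on toy numbers; `reward_metrics` and the log-probabilities are assumptions for illustration, not values from this run.

```python
def reward_metrics(pairs, beta=0.1):
    """pairs: (policy_chosen, policy_rejected, ref_chosen, ref_rejected) log-probs."""
    chosen = [beta * (pc - rc) for pc, _, rc, _ in pairs]
    rejected = [beta * (pr - rr) for _, pr, _, rr in pairs]
    margins = [c - r for c, r in zip(chosen, rejected)]
    n = len(pairs)
    return {
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/margins": sum(margins) / n,
        # Share of pairs where the chosen completion gets the higher reward
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
    }

toy_pairs = [
    (-310.0, -430.0, -250.0, -260.0),  # chosen ranked above rejected
    (-300.0, -280.0, -290.0, -300.0),  # rejected ranked above chosen
]
metrics = reward_metrics(toy_pairs)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In the first toy pair both rewards are negative yet the margin is positive, which is the same pattern the table shows.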

In [None]:
# Start DPO fine-tuning
trainer.train()



| Epoch | Training Loss | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0821 | 0.078679 | -6.619421 | -17.387569 | 0.9734 | 10.768147 | -315.803711 | -428.713837 | -30.952179 | -48.807888 |
| 2 | 0.0162 | 0.07019 | -8.06495 | -21.061266 | 0.9782 | 12.996313 | -330.259064 | -465.450806 | -45.05788 | -70.751205 |
| 3 | 0.0012 | 0.066828 | -8.559361 | -23.89463 | 0.981 | 15.335272 | -335.203125 | -493.784454 | -51.758068 | -76.813622 |
| 4 | 0.0002 | 0.068424 | -9.440296 | -26.522356 | 0.9816 | 17.082062 | -344.012512 | -520.061829 | -50.273396 | -78.096497 |
| 5 | 0.0001 | 0.071569 | -9.727662 | -27.520519 | 0.9818 | 17.79286 | -346.886139 | -530.043274 | -49.534012 | -78.051064 |


There were missing keys in the checkpoint model loaded: ['lm_head.weight'].


TrainOutput(global_step=3520, training_loss=0.0319982063901989, metrics={'train_runtime': 2756.3353, 'train_samples_per_second': 81.63, 'train_steps_per_second': 1.277, 'total_flos': 0.0, 'train_loss': 0.0319982063901989, 'epoch': 5.0})

### Inference and Output Comparison Across Checkpoints

After training with Direct Preference Optimization (DPO), we evaluate the model's progress by generating completions from the base GPT-2 model and from each checkpoint saved at the end of every epoch. We use the following prompt for all generations:

```text
Generate a positive movie review based on this input: The acting was mediocre, but the story...
```

---

#### Inference Setup

* We generate one completion per prompt for:

  * The base model (before fine-tuning)
  * Checkpoints at steps: `704`, `1408`, `2112`, `2816`, and `3520` (end of epochs 1 through 5)
* Sampling configuration: `top_k=50`, `top_p=0.95`, `temperature=0.9`, `max_new_tokens=100`
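
The checkpoint step numbers follow from the dataset size and batch size. The short calculation below assumes a single GPU, no gradient accumulation, and that the final partial batch of each epoch is kept; under those assumptions it reproduces the step numbers listed above.

```python
import math

train_size = 45_000   # 90% of the 50,000 preference pairs
batch_size = 64       # per_device_train_batch_size
num_epochs = 5

# The last batch of each epoch is smaller than 64, hence the ceiling
steps_per_epoch = math.ceil(train_size / batch_size)
checkpoint_steps = [steps_per_epoch * epoch for epoch in range(1, num_epochs + 1)]

print(steps_per_epoch)    # 704
print(checkpoint_steps)   # [704, 1408, 2112, 2816, 3520]
```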

---

#### Output Comparison

**Base Model Output**
The base GPT-2 model’s output is generic and lacks sentiment alignment. It is off-topic and incoherent toward the end.

> *"...The writing was not very good, but at least they said it had a good ending... There was no voice acting, but they did manage to make a strong voice, which..."*

---

**Epoch 1**

The generation shows improved positivity and thematic structure. It reflects action and atmosphere well but becomes somewhat verbose.

> *"...The first few scenes were the perfect light novels... The pace was fast. The action was very good... The tone was very good, and the action was great..."*

---

**Epoch 2**

This output suffers from severe repetition and lack of coherence.

> *"Okay, okay, okay..." (repeated over 30 times)*

---

**Epoch 3**

The sentiment is positive and aligned, but output is highly redundant.

> *"Overall, this movie was good..." (repeated almost identically over multiple lines)*

---

**Epoch 4**

Notable improvement. Shows structured phrasing and more variety.

> *"...a good movie with good chemistry... The story is great. And the action is great..."*

---

**Epoch 5**

Tone is positive, but repetition persists.

> *"...it's a great movie... But it's a great movie... And it's good..."*

---

### Summary

* The base model fails to generate sentiment-aligned completions.
* Epoch 1 begins to align with positive tone and structure.
* Epochs 2–3 struggle with redundancy despite improved sentiment.
* Epoch 4 shows strong coherence and structured review-style writing.
* Epoch 5 maintains tone but overuses similar phrasing.



In [2]:
import os

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# === Settings ===
base_model_name = "gpt2"
checkpoint_dir = "/workspace/gpt2-dpo-imdb"
checkpoint_names = [
    "checkpoint-704",
    "checkpoint-1408",
    "checkpoint-2112",
    "checkpoint-2816",
    "checkpoint-3520"
]
prompts = [
    "Generate a positive movie review based on this input: The acting was mediocre, but the story..."
]
output_file = "generations.txt"
max_new_tokens = 100

# === Load Tokenizer ===
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
tokenizer.pad_token = tokenizer.eos_token

# === Generation Function ===
def generate_text(model, prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_k=50,
            top_p=0.95,
            temperature=0.9,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# === Output Helper ===
def write_and_print(f, text):
    print(text)
    f.write(text + "\n")

# === Run Generation for Base Model and Checkpoints ===
with open(output_file, "w") as f:
    # Base model
    write_and_print(f, "==== BASE MODEL ====")
    base_model = AutoModelForCausalLM.from_pretrained(base_model_name).eval().to(
        "cuda" if torch.cuda.is_available() else "cpu"
    )
    for prompt in prompts:
        write_and_print(f, f"\nPrompt:\n{prompt}")
        output = generate_text(base_model, prompt)
        write_and_print(f, f"Base model output:\n{output}")

    # Fine-tuned checkpoints
    for ckpt_name in checkpoint_names:
        ckpt_path = os.path.join(checkpoint_dir, ckpt_name)
        write_and_print(f, f"\n==== {ckpt_name} ====")
        model = AutoModelForCausalLM.from_pretrained(ckpt_path).eval().to(
            "cuda" if torch.cuda.is_available() else "cpu"
        )
        for prompt in prompts:
            write_and_print(f, f"\nPrompt:\n{prompt}")
            output = generate_text(model, prompt)
            write_and_print(f, f"Fine-tuned output:\n{output}")

==== BASE MODEL ====



Prompt:
Generate a positive movie review based on this input: The acting was mediocre, but the story...
Base model output:
Generate a positive movie review based on this input: The acting was mediocre, but the story... I don't want to judge. The writing was not very good, but at least they said it had a good ending and it had some good action sequences. The characters were very similar to other people I have seen or seen. The script, though, was mostly good. The story was very good. It is good if you want to compare this to other popular anime because it really can't get any better. There was no voice acting, but they did manage to make a strong voice, which

==== checkpoint-704 ====

Prompt:
Generate a positive movie review based on this input: The acting was mediocre, but the story...
Fine-tuned output:
Generate a positive movie review based on this input: The acting was mediocre, but the story... I enjoyed the performance with great intensity. "The first few scenes were the perfect

### Updated Generations with Repetition Penalty

In this version, we updated the `generate_text` function to include the `repetition_penalty` parameter, which reduces redundancy by discouraging the model from reusing tokens that have already appeared during decoding. We regenerated completions from the base model and each fine-tuned checkpoint using the same prompt. The updated outputs are shown below:

---

#### Output Comparison (With Repetition Penalty)

**Base Model Output**

The base model remains incoherent and off-topic with no clear positive sentiment or structure.

> *"...there is no redeeming effect in having him as his 'guy'... maybe pick up A2DIII and give some thought to where they fit..."*

---

**Epoch 1**

More positive tone and structure appear, with some overgeneralization. The review introduces award references and viewing recommendations.

> *"...excellent film... great voice acting... highly recommend watching him... just as award-winning director for his first feature..."*

---

**Epoch 2**

Improved sentence coherence and natural flow. Characters are described in a more human and likable way.

> *"...an enjoyable read... each character individually enough... great stuff... amazing books by Christopher Lee..."*

---

**Epoch 3**

Review becomes concise, relevant, and sentiment-aligned. It includes specific narrative details and a clear recommendation.

> *"...very entertaining... highly recommend... a great action flick about a young boy named Jack..."*

---

**Epoch 4**

Wide genre coverage and strong recommendation tone. More diverse sentence structure and review realism.

> *"...good action movies that have great comedy themes... plenty different types within each genre... very special feature..."*

---

**Epoch 5**

Shows a polished and mature review style. References acting, character interaction, and notable cast members.

> *"...good and interesting... really enjoyed watching... terrific performance from Chris Rock..."*

---

### Summary

* Adding `repetition_penalty=1.2` reduced redundancy and improved diversity in outputs.
* Base model remained generic and sentiment-neutral.
* From Epoch 1, outputs showed increasing alignment with prompt intent.
* Epochs 2–3 delivered more human-like structure and focus.
* Epochs 4–5 produced well-rounded reviews with genre and cast references.





In [3]:
# === Settings ===
base_model_name = "gpt2"
checkpoint_dir = "/workspace/gpt2-dpo-imdb"
checkpoint_names = [
    "checkpoint-704",
    "checkpoint-1408",
    "checkpoint-2112",
    "checkpoint-2816",
    "checkpoint-3520"
]
prompts = [
    "Generate a positive movie review based on this input: The acting was mediocre, but the story..."
]
output_file = "generations_v2.txt"
max_new_tokens = 100

# === Load Tokenizer ===
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
tokenizer.pad_token = tokenizer.eos_token

# === Generation Function ===
def generate_text(model, prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_k=50,
            top_p=0.95,
            temperature=0.9,
            repetition_penalty=1.2,    # Add repetition_penalty to discourage repetitive output
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(output[0], skip_special_tokens=True)


# === Generate Outputs for Base Model and Fine-Tuned Checkpoints ===
with open(output_file, "w") as f:
    # Base model output
    write_and_print(f, "==== BASE MODEL ====")
    base_model = AutoModelForCausalLM.from_pretrained(base_model_name).eval().to(
        "cuda" if torch.cuda.is_available() else "cpu"
    )
    for prompt in prompts:
        write_and_print(f, f"\nPrompt:\n{prompt}")
        output = generate_text(base_model, prompt)
        write_and_print(f, f"Base model output:\n{output}")

    # Outputs from each fine-tuned checkpoint
    for ckpt_name in checkpoint_names:
        ckpt_path = os.path.join(checkpoint_dir, ckpt_name)
        write_and_print(f, f"\n==== {ckpt_name} ====")
        model = AutoModelForCausalLM.from_pretrained(ckpt_path).eval().to(
            "cuda" if torch.cuda.is_available() else "cpu"
        )
        for prompt in prompts:
            write_and_print(f, f"\nPrompt:\n{prompt}")
            output = generate_text(model, prompt)
            write_and_print(f, f"Fine-tuned model output:\n{output}")

==== BASE MODEL ====

Prompt:
Generate a positive movie review based on this input: The acting was mediocre, but the story...
Base model output:
Generate a positive movie review based on this input: The acting was mediocre, but the story... well enough that it's not considered to be good. Even without trying out all of these different ways, there is no redeeming effect in having him as his "guy" (as he'd usually call himself).
That being said if you are looking for something better than your previous two trailers , or get yourself into another one with other people working at Warner Bros., then maybe pick up A2DIII and give some thought to where they fit between them while making sure everyone knows exactly what their

==== checkpoint-704 ====

Prompt:
Generate a positive movie review based on this input: The acting was mediocre, but the story...
Fine-tuned model output:
Generate a positive movie review based on this input: The acting was mediocre, but the story... But overall it is an