# GPT-2 Movie Dialogue Metrics Evaluation: Summary

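This notebook evaluates our fine-tuned GPT-2 model with the BLEU metric, computed with `sacrebleu`. Each evaluation run draws a random sample of 20 conversations from the Cornell Movie Dialogs Corpus: the text up to the first sentence-ending punctuation mark serves as the prompt, and the full conversation serves as the reference.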

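- **Comparison**: The BLEU score of our fine-tuned model was compared with that of the vanilla GPT-2 (medium) model on the same corpus. The fine-tuned model scored lower than vanilla GPT-2. Note that each call to `bleu_metric` draws its own random sample, so the two scores are not computed on identical prompts.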
  
- **Possible Causes**:
  - The fine-tuning procedure or the quality of the training dataset may be at fault.
  - With only 20 sampled conversations per run, the difference could be sampling noise.
  - BLEU measures n-gram overlap with a reference, so it may not capture the quality of open-ended dialogue, where many different responses are acceptable.
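
The BLEU limitation in the last bullet can be illustrated with a simplified clipped n-gram precision. This is a minimal sketch, not `sacrebleu`'s implementation: it omits tokenization rules, the brevity penalty, and the geometric mean over n-gram orders. The function name `ngram_precision` and the example strings are assumptions made for illustration (the reference is adapted from one of the sampled dialogs).

```python
from collections import Counter


def ngram_precision(hypothesis: str, reference: str, n: int) -> float:
    """Clipped n-gram precision of a hypothesis against a single reference."""
    hyp_tokens = hypothesis.lower().split()
    ref_tokens = reference.lower().split()
    hyp_ngrams = Counter(
        tuple(hyp_tokens[i:i + n]) for i in range(len(hyp_tokens) - n + 1)
    )
    ref_ngrams = Counter(
        tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1)
    )
    total = sum(hyp_ngrams.values())
    if total == 0:
        return 0.0
    # Clip each hypothesis n-gram count by its count in the reference
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in hyp_ngrams.items())
    return matched / total


# Illustrative strings only (assumption, not model output)
reference = "i told you a hundred times it was purple day at school today"
plausible_reply = "sorry i completely forgot about the shirt"
near_copy = "i told you a hundred times it was purple day"

for name, hyp in [("plausible_reply", plausible_reply), ("near_copy", near_copy)]:
    print(name, [ngram_precision(hyp, reference, n) for n in (1, 2)])
```

A sensible reply that shares almost no wording with the reference gets close to zero overlap, while a partial copy of the reference scores perfectly, which is why overlap metrics can mislead for dialogue.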

Further analysis, along with improvements to training or evaluation, is needed before these results can be interpreted with confidence.
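
One way to check whether a score difference is a sampling fluke is a paired bootstrap over per-conversation scores. The sketch below is an assumption-laden illustration: the notebook itself only computes a single corpus-level BLEU score, so the per-sample score lists, the function name `paired_bootstrap`, and the numbers are all hypothetical placeholders rather than measured results. It also assumes both models were scored on the same prompts.

```python
import random


def paired_bootstrap(scores_a, scores_b, num_resamples=1000, seed=0):
    """Fraction of resamples in which system A's mean score exceeds system B's."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Both systems must be scored on the same samples")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(num_resamples):
        # Resample conversation indices with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / num_resamples


# Placeholder per-conversation scores (hypothetical, not measured)
baseline_scores = [4.0, 2.5, 6.1, 3.3, 5.0, 1.2, 4.4, 2.9]
finetuned_scores = [3.1, 2.7, 5.0, 3.5, 4.2, 1.0, 4.9, 2.0]

win_rate = paired_bootstrap(baseline_scores, finetuned_scores)
print(f"Baseline beats fine-tuned in {win_rate:.0%} of resamples")
```

A win rate near 50% would suggest the gap is within noise, while a rate near 0% or 100% would suggest a consistent difference.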

In [None]:
!pip install torch
!pip install sacrebleu
!pip install convokit transformers


In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import sacrebleu
from convokit import Corpus, download
import random
from sklearn.model_selection import train_test_split

# Device configuration
if torch.backends.mps.is_available() and torch.backends.mps.is_built():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")




# Download the Cornell Movie Dialogs Corpus
corpus = Corpus(download("movie-corpus"))


Downloading movie-corpus to /root/.convokit/downloads/movie-corpus
Downloading movie-corpus from http://zissou.infosci.cornell.edu/convokit/datasets/movie-corpus/movie-corpus.zip (40.9MB)... Done


In [3]:
# Skip colab calls if running locally
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
# 1. Data Understanding
# Check basic information about the corpus
print(corpus)

# Initialize a set to store unique movie names
movie_names = set()

# Iterate through each conversation and extract the movie name
for convo in corpus.iter_conversations():
    movie_name = convo.meta['movie_name']
    movie_names.add(movie_name)

# Display the number of unique movies
print(f"Total unique movies: {len(movie_names)}")

# Display a random sample of 10 movie names.
# random.sample needs a sequence: sampling directly from a set was
# deprecated in Python 3.9 and raises TypeError from Python 3.11.
for movie in random.sample(sorted(movie_names), 10):
    print(movie)

<convokit.model.corpus.Corpus object at 0x78c6e0497940>
Total unique movies: 617
thunderheart
twelve monkeys
braveheart
escape from the planet of the apes
blade runner
kalifornia
gods and monsters
dave
nurse betty
taking sides


In [5]:
import random

def get_random_movie_dialogs(corpus, num_samples=5):
    # Initialize a list to store conversation data
    convo_data = []

    # Iterate through conversations and collect movie name and dialogue data
    for convo in corpus.iter_conversations():
        movie_name = convo.meta['movie_name']

        # Collect the utterances in the conversation
        dialog = []
        for utt in convo.iter_utterances():
            dialog.append(utt.text)

        # Join all utterances into a single string to represent the whole conversation as one reference
        full_dialog = " ".join(dialog)

        # Add movie name and conversation dialog (now a single string) to the list
        convo_data.append((movie_name, full_dialog))

    # Randomly select the specified number of conversations
    random_sample = random.sample(convo_data, num_samples)

    return random_sample

# Example usage:
random_dialogs = get_random_movie_dialogs(corpus, 5)

# Print the random movie dialogs
for movie_name, dialog in random_dialogs:
    print(f"Movie: {movie_name}")
    print("Dialog:")
    print(f"  {dialog}")
    print("-" * 40)  # Separator for readability

Movie: u-turn
Dialog:
  I'm just flesh and blood, baby. That and a few memories of bad women; just like most guys.  But you already know that.  You read my fortune.  Thanks for the lemonade. Maybe I like to find out about a man first.  Maybe I like to know what he's made of. I think I can find my own way back to into town.
----------------------------------------
Movie: ghost world
Dialog:
  I've been looking all over for this. You lent it to me in like tenth grade. Hey - why do you have this?
----------------------------------------
Movie: stepmom
Dialog:
  Of course not.  Does he look lost to you?  BENNNNN!!! You lost Ben?! I didn't forget.  I was up all night thinking about it and I concluded you're too special to look like everyone else.  Orange Red.  That's your color.  Few can carry it off.  Now please.  Help me find your brother. You forgot to wash my purple shirt.  I told you a hundred times it was Purple Day at school today.
----------------------------------------
Movie: the 

In [6]:
import sacrebleu
import torch
import re

def bleu_metric(model, tokenizer, corpus, num_samples=20, device='cpu'):
    # Draw a random sample of (movie_name, dialog) pairs to evaluate on
    sample = get_random_movie_dialogs(corpus, num_samples)

    # Use the text up to the first sentence-ending punctuation mark as the prompt
    prompts = [re.split(r'[.!?]', dialog, maxsplit=1)[0] for _, dialog in sample]

    # Reference texts (expected ground truth), one per prompt
    references = [dialog for _, dialog in sample]

    # Generate a continuation of the prompt with the given model
    def generate_text(prompt, max_length=200):
        inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True, max_length=512).to(device)
        input_ids = inputs["input_ids"]
        attention_mask = inputs["attention_mask"]

        outputs = model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.eos_token_id,  # Set explicitly to avoid the per-call warning
            num_return_sequences=1,
            max_new_tokens=max_length,
            min_new_tokens=10,
            no_repeat_ngram_size=2,  # Avoid repetition
        )
        return tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Move model to the correct device
    model.to(device)

    # Generate text for each prompt
    generated_texts = [generate_text(prompt) for prompt in prompts]

    # Calculate BLEU score. sacrebleu expects a list of reference streams,
    # each holding one reference per hypothesis, so the single reference
    # set is wrapped in an outer list.
    bleu = sacrebleu.corpus_bleu(generated_texts, [references])

    # Output results
    print("Generated Texts:")
    for i, gen_text in enumerate(generated_texts):
        print(f"Prompt {i + 1}: {prompts[i]}")
        print(f"Generated: {gen_text}")
        print(f"Reference: {references[i]}\n")

    print(f"BLEU score: {bleu.score}")
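
Because `bleu_metric` calls `get_random_movie_dialogs` on every invocation, each model is scored on a different random sample. A minimal sketch of one way to make the sample reproducible is shown here, assuming a hypothetical helper `sample_dialogs` and a fixed seed. Neither is part of the notebook, and the placeholder list stands in for the `(movie_name, dialog)` pairs built from the corpus.

```python
import random


def sample_dialogs(convo_data, num_samples, seed):
    """Return a reproducible random sample of (movie_name, dialog) pairs."""
    # A dedicated Random instance keeps the global random state untouched
    rng = random.Random(seed)
    return rng.sample(convo_data, num_samples)


# Placeholder data standing in for the corpus conversations (assumption)
convo_data = [(f"movie {i}", f"dialog {i}") for i in range(100)]

# Using the same seed yields the same sample for both models
sample_for_finetuned = sample_dialogs(convo_data, 20, seed=42)
sample_for_baseline = sample_dialogs(convo_data, 20, seed=42)
print(sample_for_finetuned == sample_for_baseline)
```

Scoring both models on one shared sample makes their BLEU scores directly comparable, since any difference then comes from the models rather than from the prompts.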

In [8]:

# Load fine-tuned model.
tokenizer_ft = AutoTokenizer.from_pretrained("/content/drive/MyDrive/Movies_latest")
model_ft = AutoModelForCausalLM.from_pretrained("/content/drive/MyDrive/Movies_latest").to(device)
tokenizer_ft.pad_token = tokenizer_ft.eos_token

# Call BLEU metric function
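# Pass the device explicitly; otherwise bleu_metric defaults to 'cpu' and moves the model back to the CPU
bleu_metric(model=model_ft, tokenizer=tokenizer_ft, corpus=corpus, device=device)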

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Generated Texts:
Prompt 1: Crowd control
Generated: Crowd control this is not a shooting this was an accident waiting to happen were going to kill now we cant tell when it is coming our only chance is to try to figure out when its coming and if its not when were have been saying all week it wont happen when i amnt around the bullets have to pass through people to us so no moving parts or casings so bullets dont bounce off things like glass or flesh or metal or faeces but theres still a chance some things hit the wrong nerve in this chick and its going out as soon as its over its okay to go back to sleep its a chaos the only thing i can do is wait and see but if it was a accident or if something washes mistaken ill do something about it as fast as i possibly can then as often as possible maybe not im going on vacation for the weekend and as far as im concerned the next one is just a matter of minutes or hours maybe more its like a vacation in a nutshell but at least now im part
Referenc

In [9]:
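# Load the vanilla GPT-2 (medium) baseline.
tokenizer_og = AutoTokenizer.from_pretrained("gpt2-medium")
model_og = AutoModelForCausalLM.from_pretrained("gpt2-medium").to(device)
tokenizer_og.pad_token = tokenizer_og.eos_token

# Pass the device explicitly; otherwise bleu_metric defaults to 'cpu' and moves the model back to the CPU
bleu_metric(model=model_og, tokenizer=tokenizer_og, corpus=corpus, device=device)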

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Generated Texts:
Prompt 1: Jackie, get in the car
Generated: Jackie, get in the car. I'm going to go get you."

"I'm not going anywhere," I said. "I'll be fine." I was going home.
, I had to get out of there. It was a long drive. The car was so heavy that I couldn't even get into the passenger seat. My legs were shaking. But I didn't care. There was no way I could get back. So I got out and started walking. As I walked, my heart pounded. And then I heard a voice. A voice that said, "You're going nowhere." And I knew it was me. That voice was telling me to stay put. Then I saw a car coming. At that moment, the voice stopped. When I looked back, it had disappeared.
Reference: Jackie, get in the car. NOW! Mom, he just fixed our car. Get your fucking hands off my daughter!

Prompt 2: Au contraire, mon ami
Generated: Au contraire, mon ami, il n'y a pas de la vie.

"I'm not going to be a part of this," he said. "I don't want to."
,
...



I was sitting in the kitchen, eating a sandwich, when