# AI-Driven NLP: Language Model Exploration

## Research Questions & Objectives
- How well does GPT-2 understand contextual prompts?
- Can GPT-2 generate coherent and creative responses?
- What are the limitations of GPT-2 in handling ambiguous inputs?


## Implementation of GPT-2

In [1]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load GPT-2 model and tokenizer
model_name = 'gpt2'
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)


## Exploration & Analysis
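We probe GPT-2 with three prompts (a story opening, a physics comparison, and a factual statement) and inspect the continuations. Each `generate` call uses the default greedy decoding and caps the total length, prompt included, at 50 tokens.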

In [4]:
sample_prompts = [
    "Once upon a time in a futuristic city, there was a robot who",
    "The key difference between classical physics and quantum mechanics is",
    "A fascinating fact about black holes is that"
]

for prompt in sample_prompts:
    inputs = tokenizer(prompt, return_tensors='pt')
    # GPT-2 defines no pad token; reuse EOS so generate() does not warn on every call
    output = model.generate(**inputs, max_length=50, pad_token_id=tokenizer.eos_token_id)
    # The decoded text contains the prompt followed by the continuation
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    print(f'Prompt: {prompt}\nGenerated: {generated_text}\n')




Prompt: Once upon a time in a futuristic city, there was a robot who
Generated: Once upon a time in a futuristic city, there was a robot who was able to control a robot. The robot was a robot named "The Robot" who was able to control a robot. The robot was a robot named "The Robot" who





Prompt: The key difference between classical physics and quantum mechanics is
Generated: The key difference between classical physics and quantum mechanics is that classical physics is a theory of the world, whereas quantum mechanics is a theory of the world.

The key difference between classical physics and quantum mechanics is that classical physics is a theory of the

Prompt: A fascinating fact about black holes is that
Generated: A fascinating fact about black holes is that they are not the only ones that have been observed to have a black hole.

The most famous black hole is the Large Hadron Collider, which is located in Switzerland. It is the largest particle accelerator
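
The looping visible in the first two outputs can be quantified. The sketch below computes a distinct-n ratio (unique n-grams divided by total n-grams) on the robot continuation copied from the output above. The helper name `distinct_n` and the whitespace tokenization are assumptions made for illustration and are not part of the original notebook. Values near 1 indicate little repetition; lower values indicate loops.

```python
def distinct_n(text: str, n: int) -> float:
    """Return unique n-grams / total n-grams over whitespace tokens (hypothetical helper)."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


# Robot continuation copied verbatim from the notebook output
robot_text = (
    'Once upon a time in a futuristic city, there was a robot who was able to '
    'control a robot. The robot was a robot named "The Robot" who was able to '
    'control a robot. The robot was a robot named "The Robot" who'
)

for n in (1, 2, 3):
    print(f"distinct-{n}: {distinct_n(robot_text, n):.2f}")
```

If repetition is a concern, `generate` accepts arguments such as `no_repeat_ngram_size`, or `do_sample=True` together with `top_k` / `top_p`, which replace plain greedy decoding. Whether they help on these particular prompts is not tested here.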



## Evaluation of Results
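To evaluate GPT-2's output, we compute BLEU and ROUGE scores against a hand-written reference sentence for the robot prompt. Both metrics measure n-gram overlap with the reference, so a fluent continuation that simply differs from the reference still scores low.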
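
For reference, the standard definitions behind these scores can be sketched as follows. The notation is chosen here rather than taken from the notebook: $\hat{y}$ is the generated token sequence with length $c$, $y$ is the reference with length $r$, and $G_n(\hat{y})$ is the set of distinct n-grams in $\hat{y}$. By default, `sentence_bleu` uses uniform weights $w_n = 1/4$ over $n = 1, \dots, 4$.

```latex
p_n = \frac{\sum_{g \in G_n(\hat{y})} \min\big(\mathrm{count}_{\hat{y}}(g),\ \mathrm{count}_{y}(g)\big)}
           {\sum_{g \in G_n(\hat{y})} \mathrm{count}_{\hat{y}}(g)}

\mathrm{BLEU} = \mathrm{BP} \cdot \exp\Big( \sum_{n=1}^{4} w_n \log p_n \Big),
\qquad
\mathrm{BP} = \min\big(1,\ e^{\,1 - r/c}\big)

\mathrm{ROUGE\text{-}1}_{\mathrm{precision}} = \frac{\text{matched unigrams}}{c},
\qquad
\mathrm{ROUGE\text{-}1}_{\mathrm{recall}} = \frac{\text{matched unigrams}}{r}
```

Because BLEU is a geometric mean of the $p_n$, a single $p_n = 0$ drives the unsmoothed score to zero regardless of the other precisions.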

In [3]:
from nltk.translate.bleu_score import sentence_bleu
from rouge_score import rouge_scorer

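# Reference written by hand for the first (robot) prompt
reference = "Once upon a time in a futuristic city, there was a robot who helped humans."
# Caution: `output` still holds the last loop iteration (the black-hole prompt),
# so the scores below compare a black-hole text against a robot reference.
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

# Whitespace tokens; default BLEU weights are uniform over 1- to 4-grams, unsmoothed
bleu_score = sentence_bleu([reference.split()], generated_text.split())
scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)
rouge_scores = scorer.score(reference, generated_text)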

print(f'BLEU Score: {bleu_score}')
print(f'ROUGE Scores: {rouge_scores}')


BLEU Score: 8.412065649527267e-232
ROUGE Scores: {'rouge1': Score(precision=0.06818181818181818, recall=0.2, fmeasure=0.10169491525423728), 'rougeL': Score(precision=0.045454545454545456, recall=0.13333333333333333, fmeasure=0.06779661016949153)}


The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
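
The warnings and the ROUGE-1 numbers above can be reproduced by counting overlaps directly. The sketch below is a standard-library approximation, not the `nltk` or `rouge_score` implementation. `clipped_matches` is a hypothetical helper, the BLEU-style tokens come from `str.split()` as in the cell above, and the ROUGE-1-style tokens are lowercased alphanumeric runs without stemming (an assumption). The generated text is the black-hole continuation, because that is what `output` held when the scores were computed.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_matches(candidate, reference, n):
    """Count candidate n-grams found in the reference, clipped by reference counts."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    return sum(min(count, ref[gram]) for gram, count in cand.items())


reference = "Once upon a time in a futuristic city, there was a robot who helped humans."
generated = (
    "A fascinating fact about black holes is that they are not the only ones "
    "that have been observed to have a black hole.\n\n"
    "The most famous black hole is the Large Hadron Collider, which is located "
    "in Switzerland. It is the largest particle accelerator"
)

# BLEU-style counts on whitespace tokens (case-sensitive, punctuation attached)
cand_tokens, ref_tokens = generated.split(), reference.split()
for n in range(1, 5):
    total = len(ngrams(cand_tokens, n))
    print(f"{n}-gram matches: {clipped_matches(cand_tokens, ref_tokens, n)} of {total}")

# ROUGE-1-style counts on lowercased alphanumeric tokens (no stemming: an assumption)
cand_words = re.findall(r"[a-z0-9]+", generated.lower())
ref_words = re.findall(r"[a-z0-9]+", reference.lower())
overlap = clipped_matches(cand_words, ref_words, 1)
print(f"ROUGE-1 precision: {overlap}/{len(cand_words)}, recall: {overlap}/{len(ref_words)}")
```

The only shared whitespace tokens are `a` and `in`, and no bigram is shared, so the unsmoothed geometric mean over 1- to 4-gram precisions collapses to effectively zero, which is what the warnings report. The ROUGE-1-style counts come out to 3/44 and 3/15, which agrees with the precision of about 0.068 and recall of 0.2 printed above. Passing `smoothing_function=SmoothingFunction().method1` (from `nltk.translate.bleu_score`) to `sentence_bleu` is one way to obtain a non-degenerate BLEU value.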


## Conclusion & Insights
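- GPT-2 generates grammatical text, but under greedy decoding it struggles with long-range coherence: the robot story and the physics answer both fall into repeating the same sentence.
- Continuations stay on the prompt's topic, yet the content can be circular or factually wrong: the physics answer gives both theories the same description, and the black-hole answer calls the Large Hadron Collider a black hole.
- BLEU and ROUGE against a single reference are a weak signal for open-ended generation, and the scores reported here compare the black-hole output with the robot reference, so they should not be read as a measure of story quality.
- Further fine-tuning could improve performance in specific domains.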
