# HW4: Text Generation and Attention Mechanism Analysis

This homework has two sections:
1. **Text Generation:** using multiple models with different parameters like temperature and max tokens.
2. **Understanding Attention Mechanisms:** using BertViz for models of different sizes to analyze how their attention mechanisms differ.


## Part 1: Text Generation with HuggingFace Models
We'll experiment with different models and generation parameters, including temperature and max tokens, to see how they affect the model's responses.


In [None]:
!pip install transformers torch
!pip install huggingface_hub



### Load Models and Tokenizers
We'll use a few models for text generation and tweak generation parameters.

In [10]:
from huggingface_hub import login


In [11]:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load GPT-2
gpt2_tokenizer = AutoTokenizer.from_pretrained("gpt2")
gpt2_model = AutoModelForCausalLM.from_pretrained("gpt2")


### Generate Text with Different Parameters
We'll use the GPT-2 model and experiment with the following generation parameters:
- `temperature`
- `max_new_tokens`
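
To make the effect of `temperature` concrete, here is a minimal sketch of the usual temperature-scaled softmax: logits are divided by the temperature before being turned into probabilities. The helper name `softmax_with_temperature` and the logit values are made up for illustration; this is not the code path `generate` uses internally, only the same idea in plain Python.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures flatten the distribution so unlikely tokens are sampled more often. `max_new_tokens` is independent of this: it only caps how many tokens are sampled after the prompt.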

In [15]:
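# Define a function to generate text
def generate_text(model, tokenizer, prompt, temperature=1.0, max_new_tokens=50):
    """Sample a continuation of `prompt` and return it as a decoded string."""
    # Tokenize the prompt into PyTorch tensors (input_ids and attention_mask)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs,
        temperature=temperature,        # higher values give more random samples
        max_new_tokens=max_new_tokens,  # cap on tokens generated after the prompt
        do_sample=True                  # sample instead of greedy decoding
    )
    # output[0] holds the prompt tokens followed by the generated tokens
    return tokenizer.decode(output[0], skip_special_tokens=True)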

# Test the function with GPT-2 and different parameters
prompt = "Fall in Chicago is"
gpt2_output1 = generate_text(gpt2_model, gpt2_tokenizer, prompt, temperature=0.7)
gpt2_output2 = generate_text(gpt2_model, gpt2_tokenizer, prompt, temperature=1.5, max_new_tokens=100)

print("Output with temperature 0.7:")
print(gpt2_output1)
print("Output with temperature 1.5:")
print(gpt2_output2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output with temperature 0.7:
Fall in Chicago is the newest chapter in the history of the Chicago Cubs. The Cubs have had two World Series titles in their past two years, but the team recently lost its first playoff game. The Cubs are currently in the NL Wild Card hunt, and while it's
Output with temperature 1.5:
Fall in Chicago is the first installment in her current series I am not saying it because to be honest how I felt is entirely out the window due to getting my body over with (but we were just supposed to be doing business this weekend and did not feel like I was out doing business or anything with our brand and no, none of this has been coming around now at all with the way they handle some of these projects and our new video game franchise seems very excited about me trying to start making my own next one).


Now, we'll generate text using the LLaMA-2 model with different parameters.

## Part 2: Attention Mechanism Analysis with BertViz
We'll use BertViz to analyze how the attention mechanisms differ between a smaller model (GPT-2) and a larger model (LLaMA-2-7B).

In [17]:
!pip install bertviz


### Load the Models
We'll load GPT-2 and LLaMA-2 for the attention visualization task.

In [18]:
# Load the smaller model (GPT-2)
small_tokenizer = AutoTokenizer.from_pretrained("gpt2")
small_model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)



### Tokenize Input for Both Models

In [20]:
text = "Fall in Chicago immenses potentials."

# Tokenize for GPT-2
small_inputs = small_tokenizer(text, return_tensors="pt")


### Get the Outputs with Attention

In [24]:
# Get attention from GPT-2
small_outputs = small_model(**small_inputs)
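
For context on what the attention output contains: with `output_attentions=True`, `small_outputs.attentions` is a tuple with one tensor per layer, each of shape `(batch, heads, seq_len, seq_len)`. In a causal model like GPT-2, each row of a head's matrix is a softmax over the current and earlier positions only. The sketch below reproduces that structure for a single head in plain Python; the function name and the toy query/key vectors are assumptions for illustration, not values taken from GPT-2.

```python
import math

def causal_attention_weights(queries, keys):
    """Return a seq x seq matrix of causal attention weights for one head.

    Row i is the softmax over scaled dot products q_i . k_j for j <= i;
    later positions (j > i) are masked out and receive weight 0.
    """
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps] + [0.0] * (len(keys) - i - 1)
        weights.append(row)
    return weights

# Toy 3-token example with 2-dimensional query/key vectors (made-up values).
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in causal_attention_weights(q, k):
    print([round(w, 3) for w in row])
```

Each printed row sums to 1 and has zeros to the right of the diagonal, which is the lower-triangular pattern that shows up when a causal model's heads are visualized.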

### Visualize Attention using BertViz

In [31]:
from bertviz import head_view
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Assuming 'text' is your input text
text = "Fall in Chicago immenses potentials."

# Load pre-trained GPT-2 tokenizer and model
small_tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
small_model = GPT2LMHeadModel.from_pretrained('gpt2')

# Tokenize the input text
small_inputs = small_tokenizer(text, return_tensors="pt")

# Get attention from GPT-2, set output_attentions to True
small_outputs = small_model(**small_inputs, output_attentions=True)

# Extract attention weights
attention = small_outputs.attentions

# Visualize attention
head_view(attention,
          tokens=small_tokenizer.convert_ids_to_tokens(small_inputs.input_ids[0]),  # Pass tokens as a list
          sentence_b_start=None  # For single sentence input
         )




## Analysis of Attention Mechanisms
Now that we have visualized the attention heads of both models, let's analyze the differences:

- **GPT-2 (Small model)**: With fewer parameters, GPT-2 has fewer attention heads (12 layers with 12 heads each, compared with 32 layers with 32 heads each in LLaMA-2-7B). These heads tend to focus on a limited context, making the model more likely to attend to recent tokens or single-word relationships.
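
One way to check the "attends to recent tokens" observation numerically, rather than by eye, is to compute how far back a head looks on average. The sketch below is a hypothetical helper (`mean_attention_distance` is not part of BertViz or transformers) that takes one head's attention matrix as nested lists; the example matrix is made up.

```python
def mean_attention_distance(attn):
    """Average distance (in tokens) between each query and what it attends to.

    `attn` is a square list of lists: attn[i][j] is the weight that query
    position i places on key position j, with each row summing to 1.
    """
    n = len(attn)
    total = 0.0
    for i, row in enumerate(attn):
        total += sum(w * abs(i - j) for j, w in enumerate(row))
    return total / n

# Made-up 3-token example: every token attends only to its predecessor
# (the first token has no predecessor, so it attends to itself).
previous_token_head = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
print(mean_attention_distance(previous_token_head))
```

With a real model, a matrix in this format can be obtained from the attention output, for example `small_outputs.attentions[layer][0, head].tolist()` for chosen `layer` and `head` indices. A small average distance across heads would support the claim that the model mostly attends to nearby tokens.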