# HW4: Text Generation and Attention Mechanism Analysis

This homework has two sections:
1. **Text Generation:** using multiple models with different parameters like temperature and max tokens.
2. **Understanding Attention Mechanisms:** using BertViz for models of different sizes to analyze how their attention mechanisms differ.


## Part 1: Text Generation with HuggingFace Models
We'll experiment with different models and generation parameters, including temperature and max tokens, to see how they affect the model's responses.


In [52]:
%%capture
!pip install transformers torch
!pip install huggingface_hub


### Load Models and Tokenizers
We'll use a few models for text generation and tweak generation parameters.

In [53]:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load GPT-2
gpt2_tokenizer = AutoTokenizer.from_pretrained("gpt2")
gpt2_model = AutoModelForCausalLM.from_pretrained("gpt2")


### Generate Text with Different Parameters
We'll use the GPT-2 model and experiment with the following generation parameters:
- `temperature`
- `max_new_tokens`
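
To make the effect of `temperature` concrete, here is a minimal sketch of how it rescales next-token scores before sampling. It is not the `transformers` internals, only the standard temperature-scaled softmax, and the function name `softmax_with_temperature` and the toy scores are assumptions made up for illustration. `max_new_tokens`, by contrast, only caps how many tokens are generated beyond the prompt.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max for numerical stability; the result is unchanged.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens (not real GPT-2 logits).
toy_logits = [2.0, 1.0, 0.1]

for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

A temperature below 1 concentrates probability on the top-scoring token, while a temperature above 1 flattens the distribution so that unlikely tokens are sampled more often.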

In [54]:
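# Sample a continuation of `prompt`: temperature rescales the next-token
# distribution and max_new_tokens caps how many tokens are generated.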
def generate_text(model, tokenizer, prompt, temperature=1.0, max_new_tokens=50):
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs,
        temperature=temperature,
        max_new_tokens=max_new_tokens,
        do_sample=True
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Test the function with GPT-2 and different parameters
prompt = "Fall in Chicago is"
gpt2_output1 = generate_text(gpt2_model, gpt2_tokenizer, prompt, temperature=0.7)
gpt2_output2 = generate_text(gpt2_model, gpt2_tokenizer, prompt, temperature=1.5, max_new_tokens=100)

print("Output with temperature 0.7:")
print(gpt2_output1)
print("Output with temperature 1.5:")
print(gpt2_output2)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output with temperature 0.7:
Fall in Chicago is a great place to start at your local park. It's amazing how many people choose to stay to enjoy the view of the Lake Michigan from Chicago.

You can find out more about reservations and more information about our tours and activities at www.
Output with temperature 1.5:
Fall in Chicago is the new city in Chicago with the tallest concentration of black residents. Today's white residents will hold two decades lower-rated job spots in cities like New York, Denver & Los Altos but not just in suburbs like San Antonio & Boston. The other suburbs' cities are also increasingly populated; at first it seemed odd. This changed by 2008, when suburban white folks suddenly started moving to inner-city Chicago and its newer suburban slobs became suburban to suburbanite residents and made an easy adjustment to suburban
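
The two runs above change `temperature` and `max_new_tokens` at the same time, which makes it hard to attribute differences to either parameter. One way to separate them is to sweep a small grid, as in the sketch below. The helper `sweep_generation` and the stand-in `fake_generate` are hypothetical names introduced here so the example runs without downloading a model.

```python
from itertools import product

def sweep_generation(generate_fn, prompt, temperatures, max_new_tokens_values):
    """Call generate_fn once per (temperature, max_new_tokens) pair."""
    results = {}
    for temperature, max_new_tokens in product(temperatures, max_new_tokens_values):
        results[(temperature, max_new_tokens)] = generate_fn(
            prompt, temperature=temperature, max_new_tokens=max_new_tokens
        )
    return results

# Stand-in generator (an assumption for illustration, not a real model call).
def fake_generate(prompt, temperature, max_new_tokens):
    return f"{prompt} [T={temperature}, max_new_tokens={max_new_tokens}]"

outputs = sweep_generation(fake_generate, "Fall in Chicago is", [0.7, 1.5], [50, 100])
for (temperature, max_new_tokens), text in outputs.items():
    print(temperature, max_new_tokens, text)
```

In the notebook, `fake_generate` could be swapped for `lambda p, **kw: generate_text(gpt2_model, gpt2_tokenizer, p, **kw)`. Because sampling is random, calling `transformers.set_seed` before the sweep should make the runs repeatable.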


## Part 2: Attention Mechanism Analysis with BertViz
We'll use BertViz to analyze how the attention mechanisms differ between a smaller model (GPT-2) and a larger model (LLaMA-2-7B). The cells below run the GPT-2 side of the comparison.

In [55]:
%%capture
!pip install bertviz

### Load the Models
We'll load GPT-2 and LLaMA-2 for the attention visualization task.

In [56]:
# Load the smaller model (GPT-2)
small_tokenizer = AutoTokenizer.from_pretrained("gpt2")
small_model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)

### Tokenize Input for Both Models

In [57]:
text = "Fall in Chicago immenses potentials."

# Tokenize for GPT-2
small_inputs = small_tokenizer(text, return_tensors="pt")


### Get the Outputs with Attention

In [58]:
# Get attention from GPT-2
small_outputs = small_model(**small_inputs)

### Visualize Attention using BertViz

In [59]:
from bertviz import head_view
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Assuming 'text' is your input text
text = "Fall in Chicago immenses potentials."

# Load pre-trained GPT-2 tokenizer and model
small_tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
small_model = GPT2LMHeadModel.from_pretrained('gpt2')

# Tokenize the input text
small_inputs = small_tokenizer(text, return_tensors="pt")

# Get attention from GPT-2, set output_attentions to True
small_outputs = small_model(**small_inputs, output_attentions=True)

# Extract attention weights
attention = small_outputs.attentions

# Visualize attention
head_view(attention,
          tokens=small_tokenizer.convert_ids_to_tokens(small_inputs.input_ids[0]),  # Pass tokens as a list
          sentence_b_start=None  # For single sentence input
         )


## Analysis of Attention Mechanisms
Now that we have visualized the attention heads, let's analyze what the view shows. Only GPT-2 was run in the cells above, so the observations below cover the smaller model:

- **GPT-2 (Small model)**: The `gpt2` checkpoint has 12 layers with 12 attention heads each, fewer than LLaMA-2-7B's 32 layers with 32 heads each. Its heads tend to focus on a limited context, making the model more likely to attend to recent tokens or single-word relationships.
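
The claim that heads attend mostly to recent tokens can be checked numerically rather than by eye. The sketch below computes a mean attention distance for one head. The metric, the function name `mean_attention_distance`, and the toy matrices are assumptions chosen for illustration and are not part of BertViz or of the model output above.

```python
def mean_attention_distance(attn):
    """Average distance in tokens between each query and the keys it attends to.

    attn is a square list of lists: attn[i][j] is the weight that query
    position i places on key position j, and each row sums to 1.
    """
    total = 0.0
    for i, row in enumerate(attn):
        total += sum(weight * abs(i - j) for j, weight in enumerate(row))
    return total / len(attn)

# Toy causal attention matrices (hypothetical values, not GPT-2 output).
local_head = [
    [1.0, 0.0, 0.0],
    [0.2, 0.8, 0.0],
    [0.0, 0.3, 0.7],
]
first_token_head = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.9, 0.0, 0.1],
]

print(mean_attention_distance(local_head))
print(mean_attention_distance(first_token_head))
```

A small value means the head looks mostly at nearby tokens. To apply this to the notebook's output, `small_outputs.attentions[layer][0, head].tolist()` should give a matrix in this format, since `attentions` holds one tensor per layer with shape `(batch, heads, seq_len, seq_len)`.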