# Assignment 1: Large Language Models for Text Generation
### CS 410/510 Large Language Models Fall 2024
#### Greg Witt

### Q1. Describe three differences between the Llama 3.2 models and the Phi-3.5 model.


### Q2. Generate a story of 200 words that starts with the words *“Once upon a time”* using each of these models.  
**You should have 3 outputs in total.**

### Install Required Packages

In [None]:

# pip install transformers
# pip install torch

### Llama 3.2 - 1B:

[Hugging Face Model Card](https://huggingface.co/meta-llama/Llama-3.2-1B)

**Download Llama-3.2 1B**

In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

llama_32_1B = "meta-llama/Llama-3.2-1B"

# creates a Tokenizer specifically for the llama_model requested
tokenizer = AutoTokenizer.from_pretrained(llama_32_1B)

# Set the padding token ID to be the same as the EOS token ID
tokenizer.pad_token_id = tokenizer.eos_token_id

model = AutoModelForCausalLM.from_pretrained(llama_32_1B, torch_dtype=torch.float16)

model = model.to('cpu')


**Generate A Story with Llama 3.2 1B**

In [6]:

# Our Story Prompt
story_prompt = "Once upon a time"
    
# Encode the prompt into token IDs
prompt_ids = tokenizer.encode(story_prompt, return_tensors="pt")

# Create an attention mask
attention_mask = prompt_ids.ne(tokenizer.pad_token_id)

# Generate a response from Llama 3.2 1B
# Note: max_length counts tokens (prompt included), not words
outputs = model.generate(prompt_ids,
                         attention_mask=attention_mask,
                         max_length=200,
                         do_sample=True,
                         num_return_sequences=1,
                         pad_token_id=tokenizer.eos_token_id,
                         temperature=0.93,
                         top_k=30,
                         top_p=0.90,
                         repetition_penalty=1.2
                        )

# Decode the generated response
generated_tokens = outputs[0]

generated_story = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(generated_story)

Once upon a time, the world was ruled by an old man who lived on Mount Olympus. One day he discovered that there were other people in his land too! He invited them all to come and live at Mount Olympus with him.
He named these new guests: Hercules, Minos, Heracles, Atlas, Tyndareus... These are the ones we know today as the Olympian gods of Greece. They didn't like it much though!
They weren't happy living together under one roof. So they decided to split up again. Zeus went off alone into the woods to find himself some peace from this strange life in the middle of everybody else's houses!
Hercules stayed behind for awhile when he felt lonely. But then, after thinking about how peaceful being here would be compared to having so many people around always arguing amongst themselves...
Then finally came along Odysseus, son of Laertes, King Eurystheus' nephew
Odyssey - Wikipedia https://
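
Q2 asks for 200 words, while `max_length=200` in the cell above caps the output at 200 tokens, prompt included, so the story can stop short of the target. Below is a minimal sketch for checking length in words, assuming whitespace-separated words are an acceptable definition. `count_words` and `truncate_to_words` are hypothetical helper names, not part of the notebook.

```python
def count_words(text):
    """Count whitespace-separated words in a string."""
    return len(text.split())


def truncate_to_words(text, limit=200):
    """Keep at most `limit` words.

    Assumption: collapsing runs of whitespace (including newlines)
    into single spaces is acceptable for this check.
    """
    words = text.split()
    return " ".join(words[:limit])


sample = "Once upon a time, the world was ruled by an old man"
print(count_words(sample))
print(truncate_to_words(sample, limit=4))
```

Calling `count_words(generated_story)` after generation would show how far the output is from 200 words. One option is to raise the token budget and trim with `truncate_to_words`.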


**Measure of Perplexity**  

In [14]:
import torch

# Run the model on the prompt tokens; no gradients are needed for evaluation
# (this scores the prompt "Once upon a time", not the generated story)
with torch.no_grad():
    outputs = model(prompt_ids)
    logits = outputs.logits

# Align predictions with targets: the logits at position t predict token t+1,
# so drop the last logit and the first label
shift_logits = logits[:, :-1, :].contiguous()
shift_labels = prompt_ids[:, 1:].contiguous()

# CrossEntropyLoss returns the mean negative log-likelihood per predicted token
loss_fct = torch.nn.CrossEntropyLoss(ignore_index=tokenizer.pad_token_id)
# Flatten to (tokens, vocab) and (tokens,) as the loss function expects
loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

# exponentiate the loss
perplexity = torch.exp(loss)

# return the results
print(f"Perplexity for Llama 3.2 1B Model: {round(perplexity.item(),2)}")

Perplexity for Llama 3.2 1B Model: 33.89
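
The cell above exponentiates the cross-entropy loss, which is the mean negative log-likelihood of each observed next token. The same arithmetic can be sketched without a model, assuming we already have the probability the model assigned to each observed token. The numbers below are hypothetical, not taken from Llama.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability of the observed tokens)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))


# Hypothetical case: the model gives every observed token probability 0.25,
# i.e. it is as uncertain as a uniform choice among four options.
uniform_over_four = [0.25, 0.25, 0.25]
print(round(perplexity(uniform_over_four), 2))  # 4.0
```

Read this way, a perplexity of about 34 means the model was, on average, about as uncertain as a uniform choice among 34 tokens at each position of the scored text.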


**Measure Type-Token Ratio**

In [15]:
# Note: the ratio is computed on the prompt text, not on the generated story
prompt_text = "Once upon a time"

tokens = prompt_text.split()

types = set(tokens)
ttr = len(types) / len(tokens)

print("Type-Token Ratio (TTR) for Llama 3.2 1B Model:", round(ttr,2))

Type-Token Ratio (TTR) for Llama 3.2 1B Model: 1.0
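
A four-word prompt with no repeated words always gives a ratio of 1.0, so the metric says more when applied to the generated text. Below is a minimal sketch of such a variant, assuming that lowercasing and dropping punctuation is the desired normalization. `type_token_ratio` is a hypothetical helper name, not part of the notebook.

```python
import re


def type_token_ratio(text):
    """Distinct word forms divided by total words, case-insensitive."""
    # Assumption: a word is a run of letters or apostrophes; digits and
    # punctuation are ignored.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "The cat saw the dog. The dog saw the cat!"
print(round(type_token_ratio(sample), 2))  # 0.4
```

Passing `generated_story` instead of the sample string would measure the lexical variety of each model's output.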


### Llama 3.2 - 3B:

[Hugging Face Model Card](https://huggingface.co/meta-llama/Llama-3.2-3B)

**Download Llama 3.2 - 3B**

In [24]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

llama_32_3B = "meta-llama/Llama-3.2-3B"

# creates a Tokenizer specifically for the llama_model requested
tokenizer = AutoTokenizer.from_pretrained(llama_32_3B)

# Set the padding token ID to be the same as the EOS token ID
tokenizer.pad_token_id = tokenizer.eos_token_id

model = AutoModelForCausalLM.from_pretrained(llama_32_3B, torch_dtype=torch.float32)

model = model.to('cpu')



**Generate A Story with Llama 3.2 3B**

In [50]:
# Our story prompt
story_prompt = "Once upon a time "
    
# Encode the prompt into token IDs
prompt_ids = tokenizer.encode(story_prompt, return_tensors="pt")

# Create an attention mask
attention_mask = prompt_ids.ne(tokenizer.pad_token_id)

# Generate a response from llama_3.2-3B
outputs = model.generate(prompt_ids,
                        attention_mask=attention_mask,
                        max_length=200,
                        do_sample=True,
                        num_return_sequences=1,
                        pad_token_id=tokenizer.eos_token_id,
                        temperature=0.83,
                        top_k=30,
                        top_p=0.90,
                        repetition_penalty=1.2
                    )

# Decode the generated response
generated_tokens = outputs[0]

generated_story = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(generated_story)


Once upon a time 10 years ago, I bought some beautiful hand woven rugs from an antique store. They were so soft and warm to the touch. My kids loved them! But after living in our home for several months my husband noticed that they had started getting pretty dirty.
We tried washing the carpets with water but soon realized it was not going to work out as we wanted since there are too many of those knots on each rug.
I decided to make carpet cleaners myself because no one has ever done this before. So here is how you can do it!
Mix together all your ingredients except for the hot water (use cold). You may want to use about half or less than half depending on how much dirt/dust you have on these rugs.
Pour into spray bottle. Spray liberally onto stained areas. Make sure to cover everything well.
Wet cloth – This will help clean off excess stain & oil
Rag – Use large rag which covers entire area
Sponge/Soft
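
Both Llama cells pass `temperature`, `top_k` and `top_p` to `model.generate`. The sketch below shows, in plain Python, roughly what these settings do to a next-token distribution. It is a simplified illustration under stated assumptions, not the library's implementation. `filter_distribution` and the example logits are hypothetical.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]
    mass = sum(probs[i] for i in ranked)

    # top-p: keep the smallest prefix whose renormalized cumulative
    # probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    # Renormalize the survivors so they sum to 1 before sampling.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


# Hypothetical logits for a five-token vocabulary, using the 3B cell's settings.
example_logits = [2.0, 1.5, 0.3, -1.0, -2.5]
print(filter_distribution(example_logits, temperature=0.83, top_k=30, top_p=0.90))
```

With a vocabulary this small, `top_k=30` removes nothing and `top_p` does the pruning. On a real vocabulary of many thousands of tokens, both filters cut off the unlikely tail before a token is sampled.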


**Measure Perplexity**

In [29]:
import torch

# Run the model on the prompt tokens (this scores the prompt only)
with torch.no_grad():
    outputs = model(prompt_ids)
    logits = outputs.logits

# Align predictions with targets: the logits at position t predict token t+1,
# so drop the last logit and the first label
shift_logits = logits[:, :-1, :].contiguous()
shift_labels = prompt_ids[:, 1:].contiguous()

# CrossEntropyLoss returns the mean negative log-likelihood per predicted token;
# use the Llama tokenizer's pad ID here, not the Phi tokenizer's
loss_fct = torch.nn.CrossEntropyLoss(ignore_index=tokenizer.pad_token_id)
# Flatten to (tokens, vocab) and (tokens,) as the loss function expects
loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

# exponentiate the loss
perplexity = torch.exp(loss)

# return the results
print(f"Perplexity for Llama 3.2 3B Model: {round(perplexity.item(),2)}")

Perplexity for Llama 3.2 3B Model: 33.89


**Measure Type-Token Ratio**

In [48]:
# Note: the ratio is computed on the prompt text, not on the generated story
prompt_text = "Once upon a time"

tokens = prompt_text.split()

types = set(tokens)
ttr = len(types) / len(tokens)

print("Type-Token Ratio (TTR) for Llama 3.2 3B Model:", round(ttr,2))

Type-Token Ratio (TTR) for Llama 3.2 3B Model: 1.0


### Phi 3.5-Mini-Instruct:

[Hugging Face Model Card](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)

**Download Phi 3.5 - Mini - Instruct Model**

In [34]:
from transformers import AutoTokenizer, AutoModelForCausalLM

phi_35_mini_inst = "microsoft/Phi-3.5-mini-instruct"

phi_tokenizer = AutoTokenizer.from_pretrained(phi_35_mini_inst)
phi_model = AutoModelForCausalLM.from_pretrained(phi_35_mini_inst)



**Generate a Story with Phi 3.5 - Mini Instruct**

In [45]:
prompt_text = "Generate a 200 word story that begins with, 'Once upon a time' incorperate a nature theme and make it scary, or mysterious"

# Tokenize the input text
inputs = phi_tokenizer(prompt_text, return_tensors="pt")

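# Generate text
# Note: temperature only affects sampling; it is ignored unless do_sample=True is in effect
outputs = phi_model.generate(inputs.input_ids,
                             attention_mask=inputs.attention_mask,
                             max_length=200,
                             temperature=0.85,
                             )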

# Decode the generated text
generated_story = phi_tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generated_story)

Generate a 200 word story that begins with, 'Once upon a time' incorperate a nature theme and make it scary, or mysterious.

Once upon a time, in a dense, mist-shrouded forest, where the trees whispered secrets and the shadows danced with an eerie grace, there existed a haunting mystery. The forest, with its ancient roots and gnarled branches, was said to be home to a spectral entity. Legends spoke of a ghostly figure, cloaked in the remnants of autumn leaves, wandering the twisted paths under the moon's pale gaze.

The villagers living on the forest's edge lived in constant fear, for they knew that the entity's presence brought misfortune and despair. Children were forbidden to play near the woods, and the elders would recount tales of the ghost's sorrowful


**Measure of Perplexity**  

In [46]:

# Run the model on the prompt tokens (this scores the prompt only)
with torch.no_grad():
    outputs = phi_model(inputs.input_ids)
    logits = outputs.logits

# Align predictions with targets: the logits at position t predict token t+1,
# so drop the last logit and the first label
shift_logits = logits[:, :-1, :].contiguous()
shift_labels = inputs.input_ids[:, 1:].contiguous()

# CrossEntropyLoss returns the mean negative log-likelihood per predicted token
loss_fct = torch.nn.CrossEntropyLoss(ignore_index=phi_tokenizer.pad_token_id)
# Flatten to (tokens, vocab) and (tokens,) as the loss function expects
loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

# exponentiate the loss
perplexity = torch.exp(loss)

# return the results
print(f"Perplexity for Phi-3 Model: {round(perplexity.item(),2)}")




Perplexity for Phi-3 Model: 24.16


**Measure Type-Token Ratio**

In [47]:

# Note: the ratio is computed on a shortened copy of the prompt, not on the generated story
prompt_text = "Generate a 200 word story that begins with, 'Once upon a time'"

tokens = prompt_text.split()

types = set(tokens)
ttr = len(types) / len(tokens)

print("Type-Token Ratio (TTR) for Phi-3 Model:", round(ttr,2))

Type-Token Ratio (TTR) for Phi-3 Model: 0.92


## Q3
