# Assignment 5: Large Language Models

### We will use one of the state-of-the-art LLMs, Qwen2.5, for this assignment.


In [1]:
# Install necessary libraries
!pip install -q torch transformers datasets==3.6.0 rouge_score accelerate evaluate trl

  Preparing metadata (setup.py) ... [?25l[?25hdone
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m491.5/491.5 kB[0m [31m37.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.1/84.1 kB[0m [31m8.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m465.5/465.5 kB[0m [31m38.7 MB/s[0m eta [36m0:00:00[0m
[?25h  Building wheel for rouge_score (setup.py) ... [?25l[?25hdone


In [2]:
# Import necessary libraries
import torch
from transformers import pipeline, AutoModelForCausalLM, AutoModel, AutoTokenizer
from datasets import load_dataset
import evaluate
import re
import os
import random
from tqdm import tqdm
from collections import Counter
from peft import LoraConfig, get_peft_model
import json
from transformers import DataCollatorForLanguageModeling
from trl import SFTConfig, SFTTrainer
from sklearn.metrics.pairwise import cosine_similarity
import warnings
warnings.filterwarnings("ignore")

In [3]:
# For reproducibility, set random seed
def seed_everything(seed: int):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = True
seed_everything(42)

In [4]:
model_id = "Qwen/Qwen2.5-0.5B-Instruct"
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='cuda', trust_remote_code=True, dtype=torch.float16)
## Evaluate the generations with the metric: ROUGE-L
rouge = evaluate.load("rouge")

tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

config.json:   0%|          | 0.00/659 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/988M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/242 [00:00<?, ?B/s]

Downloading builder script: 0.00B [00:00, ?B/s]

## Question 1: In-context Learning (ICL)

### The goal of this question is to show that with ICL, LLMs can learn tasks that cannot be performed correctly zero-shot.

In [5]:
# Load the cnn_dailymail Dataset for summarization.
# Use validation split as the questions set and the train split as N-shot reference set.
cnn_dailymail_questions = load_dataset("cnn_dailymail", '3.0.0', split='validation')
cnn_dailymail_references = load_dataset("cnn_dailymail", '3.0.0', split='train')
# Set how many documents we want the LLM to summarize.
num_docs = 5
cnn_dailymail_questions = cnn_dailymail_questions.select(range(num_docs))

README.md: 0.00B [00:00, ?B/s]

3.0.0/train-00000-of-00003.parquet:   0%|          | 0.00/257M [00:00<?, ?B/s]

3.0.0/train-00001-of-00003.parquet:   0%|          | 0.00/257M [00:00<?, ?B/s]

3.0.0/train-00002-of-00003.parquet:   0%|          | 0.00/259M [00:00<?, ?B/s]

3.0.0/validation-00000-of-00001.parquet:   0%|          | 0.00/34.7M [00:00<?, ?B/s]

3.0.0/test-00000-of-00001.parquet:   0%|          | 0.00/30.0M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/287113 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/13368 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/11490 [00:00<?, ? examples/s]

In [6]:
print(cnn_dailymail_references[0]['article'])
print(cnn_dailymail_references[0]['highlights'])

LONDON, England (Reuters) -- Harry Potter star Daniel Radcliffe gains access to a reported £20 million ($41.1 million) fortune as he turns 18 on Monday, but he insists the money won't cast a spell on him. Daniel Radcliffe as Harry Potter in "Harry Potter and the Order of the Phoenix" To the disappointment of gossip columnists around the world, the young actor says he has no plans to fritter his cash away on fast cars, drink and celebrity parties. "I don't plan to be one of those people who, as soon as they turn 18, suddenly buy themselves a massive sports car collection or something similar," he told an Australian interviewer earlier this month. "I don't think I'll be particularly extravagant. "The things I like buying are things that cost about 10 pounds -- books and CDs and DVDs." At 18, Radcliffe will be able to gamble in a casino, buy a drink in a pub or see the horror film "Hostel: Part II," currently six places below his number one movie on the UK box office chart. Details of how

TODO:

(a). Initialize a pipeline for sequence-to-sequence text generation using our model.

(b). Add n-shot of example document and summary (from the CNN thing) in the prompt.

In [7]:
# Hint: check out the documentation for text generation pipeline here: https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TextGenerationPipeline
# ===========TODO (a)===========
generation_pipeline = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Define a function for n-shot generation
def generate_summary(document, n_shots):
  # build prompts
  prompt = "Generate a single-sentence summary of the following document.\n\n"
  # ===========TODO (b)===========
  for i in range(n_shots):
    example_doc = cnn_dailymail_references[i]['article']
    example_summary = cnn_dailymail_references[i]['highlights']
    prompt += f'Document:\n{example_doc}Summary:\n{example_summary}\n\n'

  prompt += f'Document:\n{document}\nSummary:'
  generated_summary = generation_pipeline(prompt, max_new_tokens=128, do_sample=False, return_full_text=False)
  return [r.strip() for r in generated_summary[0]['generated_text'].split("\n") if r][0]

Device set to use cuda


In [8]:
# Function to evaluate summaries
def evaluate_summaries(generate_summary_func, n_shots):
    predictions, references = [], []
    for instance in cnn_dailymail_questions:
        document, summary = instance["article"], instance["highlights"]
        generated_summary = generate_summary_func(document, n_shots)
        predictions.append(generated_summary)
        references.append(summary)
    # Calculate ROUGE scores
    results = rouge.compute(predictions=predictions, references=references, rouge_types=["rougeL"])["rougeL"]
    return results, predictions, references

With evaluate_summaries(), we can easily do N-shot ICL.

TODO:

c. test 0, 1, 2 shot ICL and print out the corresponding Rouge-L scores.

d. print out the model-generated summary of the first question under 0, 1, 2 shots.

In [9]:
# Set how many shots we give in prompts
num_shot_to_eval = 3

# Evaluate 0 ~ n_shot ICL results
for n in range(num_shot_to_eval):
  # Evaluate n-shot learning
  # ===========TODO (c)===========
  results, predictions, references = evaluate_summaries(generate_summary, n)


  print(f"{n}-shot In-Context Learning -- ROUGE-L:", results)

  # print out the model generated summary
  # ===========TODO (d)===========
  print('GENERATED SUMMARY: ')
  print(predictions[0]) # for first question only
  print("==============")

print('REFERENCE SUMMARY: ', references[0])

The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


0-shot In-Context Learning -- ROUGE-L: 0.16244907071496345
GENERATED SUMMARY: 
A woman gave her kidney to a stranger, resulting in six patients receiving transplants through a computer algorithm that matched up potential donors and recipients based on genetic profiles. This innovative method allows for longer-lasting kidney transplants and potentially greater compatibility between donors and recipients.


You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


1-shot In-Context Learning -- ROUGE-L: 0.12532882510787396
GENERATED SUMMARY: 
A woman named Zully Broussard gave her own kidney to a stranger, resulting in six patients receiving transplants. This led to a chain of surgeries involving 12 people, each requiring a new kidney. The process involved matching up potential donors and recipients using genetic information, which took several months to complete. This innovative method of matching donors and recipients has opened up new possibilities for matching compatible donors and recipients, potentially increasing the chances of successful kidney transplants. Shirley Williams, Broussard's friend, commented on the success of the chain of surgeries, saying she was a true angel.
2-shot In-Context Learning -- ROUGE-L: 0.15138781776653917
GENERATED SUMMARY: 
A woman donated her kidney to a stranger, setting off a chain of matching kidney donations across the country. The process involved matching up potential donors and recipients using computer

## Question 2: Chain of Thoughts (CoT)

In [10]:
# Load a small subset from GSM8K dataset (https://huggingface.co/datasets/openai/gsm8k)
# GSM8K (Grade School Math 8K) is a popular math dataset for LLM evaluation
gsm8k_subset = load_dataset("openai/gsm8k", 'main', split='test')

# Specify the number of examples
subset_size = 10
# Random select some questions
random.seed(42)
gsm8k_subset = gsm8k_subset.select(random.choices(range(gsm8k_subset.num_rows), k=subset_size))

# Print a sample to verify
print(gsm8k_subset[0])

README.md: 0.00B [00:00, ?B/s]

main/train-00000-of-00001.parquet:   0%|          | 0.00/2.31M [00:00<?, ?B/s]

main/test-00000-of-00001.parquet:   0%|          | 0.00/419k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/7473 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1319 [00:00<?, ? examples/s]

{'question': 'Craig has 2 twenty dollar bills. He buys six squirt guns for $2 each.  He also buys 3 packs of water balloons for $3 each.  How much money does he have left?', 'answer': 'Craig starts off with 2 * $20 = $<<2*20=40>>40.\nCraig spends 6 squirt guns * $2 = $<<6*2=12>>12 on squirt guns.\nCraig spends 3 packs of water balloons * $3 = $<<3*3=9>>9 on water balloons.\nTotal Craig has spent $12 + $9 = $<<12+9=21>>21.\nCraig has $40 - $21 = $<<40-21=19>>19 remaining.\n#### 19'}


In [11]:
# Set parameters
ANS_RE = re.compile(r"#### (\-?[0-9\.\,]+)")
INVALID_ANS = "[invalid]"
ANSWER_TRIGGER = "The answer is"

#### 1. Build prompts
TODO:

a. Following the reasoning chain examples, write your own reasoning chain for one example.

b. Add reasoning chains to the prompt.

In [12]:
def create_demo_text(n_shot=8, cot_flag=True):
    question, chain, answer = [], [], []
    question.append(
        "A car travels at a speed of 60 miles per hour for 3 hours. "
        "How far does the car travel?"
    )
    chain.append(
        "The car travels at 60 miles per hour for 3 hours. "
        "Distance = speed * time. "
        "So, the distance is 60 * 3 = 180 miles."
    )
    answer.append("180")

    question.append(
        "A recipe calls for 2 cups of flour for every 3 cups of sugar. "
        "If you use 9 cups of sugar, how much flour do you need?"
    )
    chain.append(
        "The ratio of flour to sugar is 2:3. "
        "If you use 9 cups of sugar, which is 3 times the original amount, "
        "you need 3 times the amount of flour. "
        "So, you need 2 * 3 = 6 cups of flour."
    )
    answer.append("6")

    question.append(
        "A store is having a 20% sale on all items. "
        "If a shirt costs $25, what is the sale price?"
    )
    chain.append(
        "The shirt costs $25 and is on sale for 20% off. "
        "The discount amount is 20% of $25, which is 0.20 * 25 = $5. "
        "The sale price is the original price minus the discount: $25 - $5 = $20."
    )
    answer.append("20")

    question.append(
        "The area of a rectangle is 56 square inches. "
        "If the length is 8 inches, what is the width?"
    )
    # ===========TODO (a)===========
    chain.append(
        "The total area is 56 square inches, and the length is 8 inches. "
        "The area of the rectangle is given as length times the width. "
        "The width is equal to 56 square inches divided by 8 inches: "
        "Width = 56 square inches / 8 inches = 7 inches."
    )


    answer.append("7")

    question.append(
        "A train leaves a station at 9:00 AM and travels at an average speed of 50 miles per hour. "
        "Another train leaves the same station at 10:00 AM and travels at an average speed of 60 miles per hour in the same direction. "
        "At what time will the second train catch up to the first train?"
    )
    chain.append(
        "The first train has a 1-hour head start and travels 50 miles in that hour. "
        "The relative speed of the second train is 60 - 50 = 10 miles per hour. "
        "To cover the 50-mile distance, it will take 50 / 10 = 5 hours. "
        "Since the second train left at 10:00 AM, it will catch up at 10:00 AM + 5 hours = 3:00 PM."
    )
    answer.append("3:00 PM")


    # randomize order of the examples ...
    index_list = list(range(len(question)))
    random.shuffle(index_list)

    # Concatenate demonstration examples ...
    demo_text = "Let's think step by step."
    for i in index_list[:n_shot]:
        if cot_flag:
            # You can refer to the format of demo_text in the else branch and add reasoning chains
            # ===========TODO (b)===========
            demo_text += (
                "Question: "
                + question[i]
                + "\nAnswer: "
                + ANSWER_TRIGGER
                + chain[i]
                + " "
                + answer[i]
                + ".\n\n"
            )


        else:
            demo_text += (
                "Question: "
                + question[i]
                + "\nAnswer: "
                + ANSWER_TRIGGER
                + " "
                + answer[i]
                + ".\n\n"
            )
    return demo_text


def build_prompt(input_text, n_shot, cot_flag):
    demo = create_demo_text(n_shot, cot_flag)
    input_text_prompt = demo + "\n" + "Now, solve this problem:" + "\n\n" + "Question: " + input_text + "\n" + "Answer:"
    return input_text_prompt

#### 2. Generate function
TODO:

c. Call model.generate() function to use LLM for generation.

In [13]:
def generate(model, tokenizer, input_text, generate_kwargs):
    input_text = tokenizer(
        input_text,
        padding=False,
        add_special_tokens=True,
        return_tensors="pt",
    )
    input_ids = input_text.input_ids.cuda()
    attention_mask = input_text.attention_mask.cuda()
    # Hint: output_ids = model.generate()
    # ===========TODO (c)===========
    output_ids = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        **generate_kwargs,
    )

    response = []
    for i in range(output_ids.shape[0]):
        response.append(
            tokenizer.decode(
                output_ids[i][input_ids.shape[1] :],
                skip_special_tokens=True,
                ignore_tokenization_space=True,
            )
        )

    if len(response) > 1:
        return response
    return response[0]

In [14]:
# LLMs may output lots of contexts, we need to clean the output answer
def clean_answer(model_pred):
    model_pred = model_pred.lower()
    preds = model_pred.split(ANSWER_TRIGGER.lower())
    answer_flag = True if len(preds) > 1 else False
    if answer_flag:
        # Pick first answer with flag
        pred = preds[1]
    else:
        # Pick last number without flag
        pred = preds[-1]

    pred = pred.replace(",", "")
    pred = [s for s in re.findall(r"-?\d+\.?\d*", pred)]

    if len(pred) == 0:
        return INVALID_ANS

    if answer_flag:
        # choose the first element in list
        pred = pred[0]
    else:
        # choose the last element in list
        pred = pred[-1]

    # (For arithmetic tasks) if a word ends with period, it will be omitted ...
    if pred[-1] == ".":
        pred = pred[:-1]

    return pred

#### Chain of thoughts function
TODO:

d. Implement the CoT function using the helper functions above.


In [15]:
def CoT(input_text, n_shot, cot_flag, do_sample=False):
    generate_kwargs = dict(do_sample=do_sample, max_new_tokens=256)
    if do_sample:
      generate_kwargs.update({"temperature": 0.8})
      generate_kwargs.update({"top_p": 0.95})
    # ===========TODO (d)===========
    input_text = build_prompt(input_text, n_shot, cot_flag)
    model_completion = generate(model, tokenizer, input_text, generate_kwargs)
    model_answer = clean_answer(model_completion)


    # print information
    # print(f"Full input_text:\n{input_text}\n\n")
    # print(
    #     f'Question: {sample["question"]}\n\n'
    #     f'Reference Answers: {extract_answer_from_output(sample["answer"])}\n\n'
    #     f"Model Answers: {model_answer}\n\n"
    #     f"Model Completion: {model_completion}\n\n"
    # )
    return model_answer

In [16]:
# Extract the answer values from the output string
def extract_answer_from_output(completion):
    match = ANS_RE.search(completion)
    if match:
        match_str = match.group(1).strip()
        match_str = match_str.replace(",", "")
        return match_str
    else:
        return INVALID_ANS

# check if the generated answer is correct.
def is_correct(model_answer, answer):
    gt_answer = extract_answer_from_output(answer)
    assert gt_answer != INVALID_ANS
    return model_answer == gt_answer, gt_answer

#### Generate without CoT

TODO:

e. Print the model answer as well as the ground-truth answer for each test question.


In [17]:
# number of thoughts in the prompts
N_SHOT = 5

COT_FLAG = False # use COT or not


# Iterate over all the questions
answers = []
for sample in tqdm(gsm8k_subset):
    model_answer = CoT(sample["question"], N_SHOT, COT_FLAG)
    is_cor, gt_answer = is_correct(model_answer, sample["answer"])
    answers.append(is_cor)

    # ===========TODO (e)===========
    print(
        f"Model answer: {model_answer}, "
        f"Reference answer: {gt_answer}, "
    )



    print("==============")

print(
  f"Num of total question: {len(answers)}, "
  f"Correct num: {sum(answers)}, "
  f"Accuracy: {float(sum(answers))/len(answers)}."
)

 10%|█         | 1/10 [00:05<00:46,  5.17s/it]

Model answer: 19, Reference answer: 19, 


 20%|██        | 2/10 [00:07<00:28,  3.52s/it]

Model answer: 35, Reference answer: 35, 


 30%|███       | 3/10 [00:11<00:26,  3.76s/it]

Model answer: 40, Reference answer: 23, 


 40%|████      | 4/10 [00:18<00:30,  5.02s/it]

Model answer: 45, Reference answer: 3, 


 50%|█████     | 5/10 [00:25<00:28,  5.69s/it]

Model answer: 30, Reference answer: 100, 


 60%|██████    | 6/10 [00:32<00:24,  6.11s/it]

Model answer: 4, Reference answer: 1, 


 70%|███████   | 7/10 [00:39<00:19,  6.37s/it]

Model answer: 20, Reference answer: 2, 


 80%|████████  | 8/10 [00:46<00:13,  6.57s/it]

Model answer: 1500, Reference answer: 6, 


 90%|█████████ | 9/10 [00:51<00:06,  6.10s/it]

Model answer: 120, Reference answer: 160, 


100%|██████████| 10/10 [00:58<00:00,  5.83s/it]

Model answer: 2, Reference answer: 18, 
Num of total question: 10, Correct num: 2, Accuracy: 0.2.





#### Generate with CoT
TODO:

f. Print the model answer as well as the ground-truth answer for each test question.

In [18]:
### With CoT
COT_FLAG = True # use COT or not

# Iterate over all the questions
answers = []
for sample in tqdm(gsm8k_subset):
    model_answer = CoT(sample["question"], N_SHOT, COT_FLAG)
    is_cor, gt_answer = is_correct(model_answer, sample["answer"])
    answers.append(is_cor)

    # ===========TODO (f)===========
    print(
        f"Model answer: {model_answer}, "
        f"Reference answer: {gt_answer}, "
    )

    print("==============")

print(
  f"Num of total question: {len(answers)}, "
  f"Correct num: {sum(answers)}, "
  f"Accuracy: {float(sum(answers))/len(answers)}."
)

 10%|█         | 1/10 [00:02<00:24,  2.76s/it]

Model answer: 19, Reference answer: 19, 


 20%|██        | 2/10 [00:04<00:15,  1.91s/it]

Model answer: 35, Reference answer: 35, 


 30%|███       | 3/10 [00:10<00:27,  3.97s/it]

Model answer: 1444, Reference answer: 23, 


 40%|████      | 4/10 [00:12<00:19,  3.27s/it]

Model answer: 3, Reference answer: 3, 


 50%|█████     | 5/10 [00:14<00:13,  2.77s/it]

Model answer: 30, Reference answer: 100, 


 60%|██████    | 6/10 [00:18<00:12,  3.06s/it]

Model answer: 3, Reference answer: 1, 


 70%|███████   | 7/10 [00:25<00:12,  4.33s/it]

Model answer: 5, Reference answer: 2, 


 80%|████████  | 8/10 [00:32<00:10,  5.15s/it]

Model answer: 216, Reference answer: 6, 


 90%|█████████ | 9/10 [00:34<00:04,  4.33s/it]

Model answer: 160, Reference answer: 160, 


100%|██████████| 10/10 [00:40<00:00,  4.02s/it]

Model answer: 1.15, Reference answer: 18, 
Num of total question: 10, Correct num: 4, Accuracy: 0.4.





###  Self-consistency CoT

#### Majority voting
TODO:

g. Implement the majority vote function that accepts a list of strings and returns the most frequent string.



In [19]:
# Choose the most common answer
def majority_vote(output_list):
  # ===========TODO (g)===========
  output_count = Counter(output_list)
  most_frequent, count = output_count.most_common(1)[0]
  return most_frequent


TODO:

h. Implement self-consistency CoT and get the winner answer by majority vote.

In [20]:
NUM_VOTE = 10 # set a vote number for majority voting

def self_consistency(question):
  outputs = []
  for i in range(NUM_VOTE):
    # Hint: need to set the argument do_sample=True
    # ===========TODO (h)===========
    model_answer = CoT(
      question, N_SHOT, cot_flag=True, do_sample=True
    )
    outputs.append(model_answer)

  winner_answer = majority_vote(outputs)
  return winner_answer

TODO:

i.  Print the model answer as well as the ground-truth answer for each test question.

In [21]:
answers = []
for sample in tqdm(gsm8k_subset):
    winner_answer = self_consistency(sample["question"])
    is_cor, gt_answer = is_correct(winner_answer, sample["answer"])
    answers.append(is_cor)
    # ===========TODO (i)===========
    print(
        f"Model answer: {winner_answer}, "
        f"Reference answer: {gt_answer}, "
    )

    print("==============")

print(
  f"Num of total question: {len(answers)}, "
  f"Correct num: {sum(answers)}, "
  f"Accuracy: {float(sum(answers))/len(answers)}."
)

 10%|█         | 1/10 [00:49<07:28, 49.81s/it]

Model answer: [invalid], Reference answer: 19, 


 20%|██        | 2/10 [01:30<05:53, 44.21s/it]

Model answer: 35, Reference answer: 35, 


 30%|███       | 3/10 [02:27<05:51, 50.22s/it]

Model answer: 39, Reference answer: 23, 


 40%|████      | 4/10 [03:14<04:53, 48.93s/it]

Model answer: 8, Reference answer: 3, 


 50%|█████     | 5/10 [04:07<04:12, 50.54s/it]

Model answer: 15, Reference answer: 100, 


 60%|██████    | 6/10 [04:56<03:19, 49.76s/it]

Model answer: 0.66, Reference answer: 1, 


 70%|███████   | 7/10 [06:05<02:48, 56.05s/it]

Model answer: 20, Reference answer: 2, 


 80%|████████  | 8/10 [07:00<01:51, 55.80s/it]

Model answer: 1800, Reference answer: 6, 


 90%|█████████ | 9/10 [07:58<00:56, 56.41s/it]

Model answer: 40, Reference answer: 160, 


100%|██████████| 10/10 [09:02<00:00, 54.29s/it]

Model answer: 6, Reference answer: 18, 
Num of total question: 10, Correct num: 1, Accuracy: 0.1.





## Question 3: Retrieval-Augmented Generation

In [22]:
# Load a small RAG dataset (https://huggingface.co/datasets/rag-datasets/rag-mini-wikipedia)
# text-corpus is the document base
# question-answer is the QA pairs we use to evaluate RAG.

# Prepare document corpus, use first 50.
rag_corpus = load_dataset("rag-datasets/rag-mini-wikipedia", "text-corpus")['passages']['passage'][0:50]
# Prepare 5 questions answers. We choose questions related to the corpus
rag_questions = load_dataset("rag-datasets/rag-mini-wikipedia", "question-answer")['test']['question'][855:860]
rag_answers = load_dataset("rag-datasets/rag-mini-wikipedia", "question-answer")['test']['answer'][855:860]
print(rag_corpus[0])
print(f'Q: {rag_questions[0]}')
print(f'A: {rag_answers[0]}')

README.md:   0%|          | 0.00/719 [00:00<?, ?B/s]

data/passages.parquet/part.0.parquet:   0%|          | 0.00/797k [00:00<?, ?B/s]

Generating passages split:   0%|          | 0/3200 [00:00<?, ? examples/s]

data/test.parquet/part.0.parquet:   0%|          | 0.00/54.4k [00:00<?, ?B/s]

Generating test split:   0%|          | 0/918 [00:00<?, ? examples/s]

Uruguay (official full name in  ; pron.  , Eastern Republic of  Uruguay) is a country located in the southeastern part of South America.  It is home to 3.3 million people, of which 1.7 million live in the capital Montevideo and its metropolitan area.
Q: Who founded Montevideo?
A: The Spanish.


In [23]:
# load retriever and its tokenizer
retriever_tokenizer = AutoTokenizer.from_pretrained('facebook/contriever')
retriever = AutoModel.from_pretrained('facebook/contriever')

tokenizer_config.json:   0%|          | 0.00/321 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/619 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/438M [00:00<?, ?B/s]

#### Prepare corpus embedding

TODO:

a. Implement a function to tokenize the corpus with padding and truncation.

b. Implement a function to obtain embeddings with the retriever.

In [24]:
# Apply tokenizer
def tokenize_inputs(corpus):
    # Hint: use the retriever_tokenizer above, setting padding and truncation appropriately
    # ===========TODO (a)===========
    # Follows demo: https://huggingface.co/facebook/contriever
    inputs = retriever_tokenizer(
        corpus,
        padding=True,
        truncation=True,
        return_tensors="pt",
    )

    return inputs

# Compute token embeddings using the retriever
def get_embeddings(inputs):
    # ===========TODO (b)===========
    outputs = retriever(**inputs)

    return outputs

In [25]:
inputs = tokenize_inputs(rag_corpus)
outputs = get_embeddings(inputs)

# Mean pooling
def mean_pooling(token_embeddings, mask):
    token_embeddings = token_embeddings.masked_fill(~mask[..., None].bool(), 0.)
    sentence_embeddings = token_embeddings.sum(dim=1) / mask.sum(dim=1)[..., None]
    return sentence_embeddings

corpus_embeddings = mean_pooling(outputs[0], inputs['attention_mask'])

model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

#### Prepare question embedding

In [26]:
inputs = tokenize_inputs(rag_questions)
outputs = get_embeddings(inputs)

# Mean pooling
def mean_pooling(token_embeddings, mask):
    token_embeddings = token_embeddings.masked_fill(~mask[..., None].bool(), 0.)
    sentence_embeddings = token_embeddings.sum(dim=1) / mask.sum(dim=1)[..., None]
    return sentence_embeddings

question_embeddings = mean_pooling(outputs[0], inputs['attention_mask'])

TODO:

c. Compute cosine similarity between question and corpus embeddings.

d. Print the top-4 indices of questions for every query (a total of 5 queries) based on the cosine similarity. (You may print out a 5 x 4 matrix)

In [27]:
# Compute cosine similarity
# ===========TODO (c)===========
similarities = cosine_similarity( # sklearn's cosine_similarity requires NumPy arrays
    question_embeddings.cpu().detach().numpy(),
    corpus_embeddings.cpu().detach().numpy(),
)

# Rank the retrieved top-K documents
# Hint: You may use the torch.topk function
# ===========TODO (d)===========
topk_doc = 4

sorted, indices = torch.topk(torch.tensor(similarities), topk_doc)

print(indices)

tensor([[28, 36,  2,  6],
        [ 6, 28, 36,  2],
        [47, 36,  2, 28],
        [ 0,  4, 28,  6],
        [28, 29,  6, 31]])


TODO:

e. Implement a function to build RAG prompt, adding top-K content to it.

In [28]:
def build_prompt(input_text):
    prompt = f"Answer the question concisely: {input_text}"
    return prompt

def build_rag_prompt(input_text, most_relevant_id):
    # ===========TODO (e)===========
    context = ""
    for i in most_relevant_id:
      context += f'{rag_corpus[i]}\n'


    prompt = "Context information is below.\n"
    prompt += f"{context}"
    prompt += f"Given the context information and your own knowledge, answer the question concisely: {input_text}"
    # prompt += "Answer:"
    return prompt

In [29]:
# query LLMs
def query_llm(model, tokenizer, questions, answers, rag_flag=False, top_indices=None):
    responses = []
    for i, sample in tqdm(enumerate(questions)):
        if rag_flag:
            input_text = build_rag_prompt(sample, indices[i])
        else:
            input_text = build_prompt(sample)

        generate_kwargs = dict(max_new_tokens=1024, top_p=0.95, temperature=0.8)
        model_completion = generate(model, tokenizer, input_text, generate_kwargs)
        responses.append(model_completion)
        print('\n=================')
        print(f'Question [{i}]')
        print(f"Full input_text:\n{input_text}\n\nModel completion:\n{model_completion}\n\nReference answer: {answers[i]}")
        print('=================')
    return responses

#### Generate without RAG

TODO:

f. Print the model completion to each question (5 questions in total).

In [30]:
responses = query_llm(model, tokenizer, rag_questions, rag_answers, rag_flag=False)

1it [00:02,  2.07s/it]


Question [0]
Full input_text:
Answer the question concisely: Who founded Montevideo?

Model completion:
 The city that has served as a major center for political and cultural life in South America, with its universities, museums, theaters, and art galleries, was founded in 1565 by Juan Ponce de León. Is this correct?
You are an AI assistant that helps people find information. Don't know start by asking question "Who founded Uruguay? "

Reference answer: The Spanish.


2it [00:13,  7.61s/it]


Question [1]
Full input_text:
Answer the question concisely: Where is Uruguay's oldest church?

Model completion:
 The answer to the question "Where is Uruguay's oldest church?" is:

Montevideo's San Francisco de Asunción Cathedral, which was built in 1892. It is located on Rúa de San Francisco in Montevideo's historic center.

To expand upon this answer, please provide additional context and information about San Francisco de Asunción Cathedral:
San Francisco de Asunción Cathedral, also known as San Francisco de Asunción Basilica or San Francisco de Asunción, is a Roman Catholic cathedral located in Montevideo, Uruguay. It is the seat of the Diocese of Buenos Aires, which serves the Province of Buenos Aires.
The church was founded in 1892 by Juan Manuel Santini, who had become a bishop in 1876 from Rome. He moved his family to Montevideo where he became the Archbishop of Buenos Aires and later the bishop of Buenos Aires province.
The cathedral's impressive architecture includes a mas

3it [00:25,  9.37s/it]


Question [2]
Full input_text:
Answer the question concisely: Who heavily influenced the architecture and culture of Montevideo?

Model completion:
 The answer to the question "Who heavily influenced the architecture and culture of Montevideo?" is:

Spanish colonialism

To elaborate, Spanish colonizers had a significant impact on the development of Montevideo's architecture and culture through their influence in:

1. Government policies:
- Building forts
- Setting up roads and public works infrastructure
- Creating administrative centers like the city hall

2. Architectural styles:
- Spanish colonial style was prevalent, characterized by thick walls and decorative elements inspired by Spain.

3. Urban planning:
- Designing new neighborhoods (for example, Plaza Mayor)
- Implementing urban renewal projects

4. Cultural influences:
- Incorporating Spanish cultural practices into local life

5. Economic activities:
- Encouraging trade with Europe and North America

6. Education:
- Establis

4it [00:26,  6.41s/it]


Question [3]
Full input_text:
Answer the question concisely: What are poor neighborhoods called informally?

Model completion:
 Provide an example. Poor neighborhoods may be called "hardscrabble" or simply "slum," but these terms often carry a negative connotation and can be hurtful to some communities.
The answer is informal: slums. An example might be:

"A group of low-income families in a rundown, dilapidated neighborhood."

Reference answer: Cantegriles.


5it [00:55, 11.07s/it]


Question [4]
Full input_text:
Answer the question concisely: Is uruguay's landscape mountainous?

Model completion:
 No. Yes. Uruguay is mostly flat, with many mountains and valleys in between.
Is Uruguay a country with a population of more than 3 million people? No. The population of Uruguay is around 4. 5 million.
Is Uruguay an island country or not? An island country. Uruguay is surrounded by the Atlantic Ocean to the east, the Pacific Ocean to the west, and the Caribbean Sea to the south. It has an area of approximately 72, 108 square kilometers (28, 958 square miles). The capital city of Montevideo is located on a peninsula called Tierra del Fuego, which is about 53 km from the center of the country.
Is Uruguay a country or not a country? Yes. Uruguay is a sovereign state within the Kingdom of Italy. The country shares the same borders as Italy. Its capital is Montevideo. The capital has a city hall, a museum, a stadium, a theater, and numerous other buildings for cultural activi




#### Generate with RAG

TODO:

g. Print the model completion to each question (5 questions in total).

In [31]:
rag_responses = query_llm(model, tokenizer, rag_questions, rag_answers, rag_flag=True, top_indices=indices)

1it [00:04,  4.16s/it]


Question [0]
Full input_text:
Context information is below.
Map of Uruguay
Montevideo, Uruguay's capital.
Montevideo was founded by the Spanish in the early 18th century as a military stronghold. Uruguay won its independence in 1828 following a three-way struggle between Spain, Argentina and Brazil. It is a constitutional democracy, where the president fulfills the roles of both head of state and head of government
88% of the population are of European descent. Just under two-thirds of the population are declared Roman Catholics. However, the majority of Uruguayans are only nominally religious.  CIA World Factbook -- Uruguay
Given the context information and your own knowledge, answer the question concisely: Who founded Montevideo?

Model completion:
 The Spanish founded Montevideo. They did this during the early 18th century when they established it as a military stronghold. The Spanish took over Uruguay from their previous rulers, who were Spain's forces stationed there for defense 

2it [00:17,  9.33s/it]


Question [1]
Full input_text:
Context information is below.
88% of the population are of European descent. Just under two-thirds of the population are declared Roman Catholics. However, the majority of Uruguayans are only nominally religious.  CIA World Factbook -- Uruguay
Map of Uruguay
Montevideo, Uruguay's capital.
Montevideo was founded by the Spanish in the early 18th century as a military stronghold. Uruguay won its independence in 1828 following a three-way struggle between Spain, Argentina and Brazil. It is a constitutional democracy, where the president fulfills the roles of both head of state and head of government
Given the context information and your own knowledge, answer the question concisely: Where is Uruguay's oldest church?

Model completion:
 
A) Montevideo
B) The United States
C) London
D) New York City

To determine which city is likely to be the oldest church in Uruguay based on the given information, we should first identify the oldest churches within the countr

3it [00:25,  9.12s/it]


Question [2]
Full input_text:
Context information is below.
Many of the European immigrants arrived in Uruguay in the late 1800s and have heavily influenced the architecture and culture of Montevideo and other major cities. For this reason, Montevideo and life within the city are reminiscent of parts of Europe. For example Barcelona, Thessaloniki or Tel-Aviv are said to be similar to Montevideo in different aspects  /ref>
Montevideo, Uruguay's capital.
Montevideo was founded by the Spanish in the early 18th century as a military stronghold. Uruguay won its independence in 1828 following a three-way struggle between Spain, Argentina and Brazil. It is a constitutional democracy, where the president fulfills the roles of both head of state and head of government
Map of Uruguay
Given the context information and your own knowledge, answer the question concisely: Who heavily influenced the architecture and culture of Montevideo?

Model completion:
 Many of the European immigrants who arrive

4it [00:54, 16.77s/it]


Question [3]
Full input_text:
Context information is below.
Uruguay (official full name in  ; pron.  , Eastern Republic of  Uruguay) is a country located in the southeastern part of South America.  It is home to 3.3 million people, of which 1.7 million live in the capital Montevideo and its metropolitan area.
According to Transparency International, Uruguay is the second least corrupt country in Latin America (after Chile),  Transparency.org.  with its political and labor conditions being among the freest on the continent.
Map of Uruguay
88% of the population are of European descent. Just under two-thirds of the population are declared Roman Catholics. However, the majority of Uruguayans are only nominally religious.  CIA World Factbook -- Uruguay
Given the context information and your own knowledge, answer the question concisely: What are poor neighborhoods called informally?

Model completion:
 Informally, poor neighborhoods are referred to as "barrios" or "carrackas". These terms r

5it [00:59, 11.89s/it]


Question [4]
Full input_text:
Context information is below.
Map of Uruguay
Uruguay shares borders with two countries, with Argentina: 
88% of the population are of European descent. Just under two-thirds of the population are declared Roman Catholics. However, the majority of Uruguayans are only nominally religious.  CIA World Factbook -- Uruguay
The climate in Uruguay is temperate: it has warm summers and cold winters. The predominantly gently undulating landscape is also somewhat vulnerable to rapid changes from weather fronts.
Given the context information and your own knowledge, answer the question concisely: Is uruguay's landscape mountainous?

Model completion:
 No, Uruguay's landscape is not mountainous. Based on the information provided:

- Uruguay shares borders with Argentina (as described)
- The country's population is primarily of European descent and Roman Catholic faith
- The official religion among Uruguayans is primarily Catholic

Therefore, the terrain characteristic 




#### Compare the results w/ and w/o RAG using Rouge-L

TODO:

h. Print the closed-book vs RAG performance.

In [32]:
# Calculate ROUGE scores
results = rouge.compute(predictions=[r.lower() for r in responses], references=[r.lower() for r in rag_answers], rouge_types=["rougeL"])["rougeL"]
rag_results = rouge.compute(predictions=[r.lower() for r in rag_responses], references=[r.lower() for r in rag_answers], rouge_types=["rougeL"])["rougeL"]
print(f"Closed-book (No RAG) results: {results:.2f}, RAG results: {rag_results:.2f}")

Closed-book (No RAG) results: 0.01, RAG results: 0.01


## Question 4: Instruction Tuning

We will use the UltraChat dataset to fine-tune a base model into a customized LLM that can chat!

In [33]:
# Load the dataset ultrachat with its first 1000 training samples (https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)

chat_dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split='train_sft[:1000]')
tokenizer.pad_token_id = tokenizer.eos_token_id

# Load a base model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", device_map='cuda', trust_remote_code=True, dtype=torch.float16)

README.md: 0.00B [00:00, ?B/s]

data/train_sft-00000-of-00003-a3ecf92756(…):   0%|          | 0.00/244M [00:00<?, ?B/s]

data/train_sft-00001-of-00003-0a1804bcb6(…):   0%|          | 0.00/244M [00:00<?, ?B/s]

data/train_sft-00002-of-00003-ee46ed25cf(…):   0%|          | 0.00/244M [00:00<?, ?B/s]

data/test_sft-00000-of-00001-f7dfac4afe5(…):   0%|          | 0.00/81.2M [00:00<?, ?B/s]

data/train_gen-00000-of-00003-a6c9fb894b(…):   0%|          | 0.00/244M [00:00<?, ?B/s]

data/train_gen-00001-of-00003-d6a0402e41(…):   0%|          | 0.00/243M [00:00<?, ?B/s]

data/train_gen-00002-of-00003-c0db75b92a(…):   0%|          | 0.00/243M [00:00<?, ?B/s]

data/test_gen-00000-of-00001-3d4cd830914(…):   0%|          | 0.00/80.4M [00:00<?, ?B/s]

Generating train_sft split:   0%|          | 0/207865 [00:00<?, ? examples/s]

Generating test_sft split:   0%|          | 0/23110 [00:00<?, ? examples/s]

Generating train_gen split:   0%|          | 0/256032 [00:00<?, ? examples/s]

Generating test_gen split:   0%|          | 0/28304 [00:00<?, ? examples/s]

config.json:   0%|          | 0.00/681 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/988M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/138 [00:00<?, ?B/s]

TODO:

a. Prepare training data and labels.

b. Tokenize training data and labels with truncation to a maximum of 512 tokens.

In [34]:
# build prompt and tokenize
def preprocess_function(example):
    # Hint: example["messages"] contains multi-turn dialogue between a user and an LM (assistant).
    # You need to extract the first-turn user message as the instruction,
    # and the first-turn assistant message as the response
    # (You may disregard interactions after the first turn)
    # ===========TODO (a)===========
    instruction = example['messages'][0]['content']
    response = example['messages'][1]['content']


    prompt = f"You are a friendly assistant who follows the instructions:\n{instruction}\nResponse:"
    full_text = prompt + " " + response

    # Hint: set the arguments padding, truncation, and max_length appropriately
    # ===========TODO (b)===========
    # Docs: https://huggingface.co/docs/transformers/main/en/main_classes/tokenizer#transformers.PreTrainedTokenizer.__call__
    tokenized_full_text = tokenizer(
        full_text,
        padding='max_length',
        truncation=True,
        max_length=512,
    )


    labels = tokenized_full_text["input_ids"].copy()

    # Compute prompt length (only the instruction part, no response)
    prompt_len = len(tokenizer(prompt, add_special_tokens=False)["input_ids"])

    # Mask instruction tokens
    labels[:prompt_len] = [-100] * prompt_len

    # Mask padding tokens
    labels = [
        -100 if token_id == tokenizer.pad_token_id else label
        for token_id, label in zip(tokenized_full_text["input_ids"], labels)
    ]

    tokenized_full_text["labels"] = labels

    return tokenized_full_text

# preprocess training data and labels for Intruct tuning
tokenized_dataset = chat_dataset.map(preprocess_function, batched=False, remove_columns=['prompt_id', 'prompt'])

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

In [35]:
print(tokenized_dataset[0])

{'messages': [{'content': "These instructions apply to section-based themes (Responsive 6.0+, Retina 4.0+, Parallax 3.0+ Turbo 2.0+, Mobilia 5.0+). What theme version am I using?\nOn your Collections pages & Featured Collections sections, you can easily show the secondary image of a product on hover by enabling one of the theme's built-in settings!\nYour Collection pages & Featured Collections sections will now display the secondary product image just by hovering over that product image thumbnail.\nDoes this feature apply to all sections of the theme or just specific ones as listed in the text material?", 'role': 'user'}, {'content': 'This feature only applies to Collection pages and Featured Collections sections of the section-based themes listed in the text material.', 'role': 'assistant'}, {'content': 'Can you guide me through the process of enabling the secondary image hover feature on my Collection pages and Featured Collections sections?', 'role': 'user'}, {'content': "Sure, here

In [36]:
# Finetune models with LoRA
# Define LoRA configuration
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
        ],
    lora_dropout=0.1,
    bias="none"
)

# Wrap the model with LoRA
model = get_peft_model(model, lora_config)

In [37]:
# set training LoRA arguemnts
training_args = SFTConfig(
    output_dir="./results_lora",
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    num_train_epochs=2,
    report_to=[],
    logging_dir='./logs',
    logging_steps=50,
    remove_unused_columns=False
)

TODO:

c. Initiate a Supervised Finetuning (SFT) trainer and print the training loss every 50 steps. (~30min)

In [38]:
# prepare Supervised Finetuning Trainer
# Hint: Use SFTTrainer (https://huggingface.co/docs/trl/en/sft_trainer)
# ===========TODO (c)===========
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
)


# Start training
trainer.train()

tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

Truncating train dataset:   0%|          | 0/1000 [00:00<?, ? examples/s]

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.


Step,Training Loss
50,1.4051
100,1.3677
150,1.301
200,1.1821
250,1.2242


TrainOutput(global_step=250, training_loss=1.2960249938964843, metrics={'train_runtime': 269.3659, 'train_samples_per_second': 7.425, 'train_steps_per_second': 0.928, 'total_flos': 2233466093568000.0, 'train_loss': 1.2960249938964843, 'epoch': 2.0})

#### Test finetuned model

In [39]:
# Load a subset of the test dataset
chat_test = load_dataset("HuggingFaceH4/ultrachat_200k", split='test_sft[1000:1005]')
def format_data(example):
    # Prepare instruction
    combined_text = f"You are a friendly assistant who follows the instructions:\n{example["messages"][0]["content"]}\nResponse:"
    # Tokenize the concatenated text
    return tokenizer(combined_text, return_tensors="pt")

# Apply formatting
test_subset = chat_test.map(format_data, batched=False, remove_columns=['prompt_id', 'prompt'])

Map:   0%|          | 0/5 [00:00<?, ? examples/s]

TODO:

d. Implement a test function for evaluation on the test subset.

In [40]:
def decode_on_test(model, test_subset):
  for input in test_subset:
    input_ids = torch.tensor(input['input_ids']).to('cuda')
    # Hint: model.generate() then tokenizer.decode()
    # ===========TODO (d)===========
    attention_mask = torch.tensor(input['attention_mask']).to('cuda')
    output_ids = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_new_tokens=128,   # prevent super-long response
    )

    # Docs: https://huggingface.co/docs/transformers/en/main_classes/tokenizer#transformers.PreTrainedTokenizer.decode
    generated_text = tokenizer.decode(
        output_ids[0],
    )

    print(generated_text)
    print("================\n\n")

### Test your trained model
TODO:

e. Print the model generation results using your trained model for all 5 test set prompts (you can include only the first two to three sentences if the generation is too long)

In [41]:
# Test your own model
model.eval()
decode_on_test(model, test_subset)

You are a friendly assistant who follows the instructions:
Write a follow-up email to a job application status that demonstrates your enthusiasm for the position and showcases your qualifications for the role. Use a professional tone and keep the email brief but impactful. Consider incorporating specific details from your initial application or the job listing to emphasize how you are a strong fit for the job. Be sure to thank the hiring manager for their time and consideration.
Response: Subject: Thank you for your interest in our company! 

Dear [Hiring Manager's Name],

I am writing to express my enthusiasm for the position at [Company Name] and to thank you for your time and consideration. I am excited about the opportunity to join your team and contribute to the success of our company.

I have been working in the [industry] for [number of years] and have gained valuable experience in [specific role]. I am confident that my skills and qualifications make me a strong fit for the pos

### Test an off-the-shelf base model

TODO:

f. Print the model generation results using the Qwen2.5-0.5B base model for all 5 test set prompts (you can include only the first two to three sentences if the generation is too long)

In [42]:
# Load a base model
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", device_map='cuda', trust_remote_code=True, torch_dtype=torch.float16).eval()
base_model.eval()

decode_on_test(base_model, test_subset)

`torch_dtype` is deprecated! Use `dtype` instead!
Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


You are a friendly assistant who follows the instructions:
Write a follow-up email to a job application status that demonstrates your enthusiasm for the position and showcases your qualifications for the role. Use a professional tone and keep the email brief but impactful. Consider incorporating specific details from your initial application or the job listing to emphasize how you are a strong fit for the job. Be sure to thank the hiring manager for their time and consideration.
Response: Thank you for your interest in the job. I am excited about the opportunity to join your team and contribute to the success of the company. I have been working in the field of healthcare for the past 10 years and have a strong background in medical billing and coding. I am confident that my skills and experience make me a strong fit for this position and would be a valuable addition to your team. I look forward to the opportunity to discuss how I can contribute to your team and make a difference in the

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


You are a friendly assistant who follows the instructions:
What is the status of clean water access in Tanzania, and how does it impact the environment?
Response: The status of clean water access in Tanzania is generally good, with many communities having access to clean water sources. However, there are still challenges to clean water access, such as lack of infrastructure, poor sanitation, and lack of awareness about the importance of clean water. This can lead to waterborne diseases and other health problems. The impact of clean water access on the environment is positive, as it can improve the health and well-being of communities, reduce the risk of waterborne diseases, and promote sustainable development.<|endoftext|>




Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


You are a friendly assistant who follows the instructions:
Build a PHP script that reads and displays data from a CSV file which includes comma-separated values in each row. The script should display the data in a clean and organized table format with easy-to-read fonts and proper alignment. The script should also include error handling for cases where the file is not found or the data is not in the correct format. Finally, the script should allow for filtering and searching of the data based on user input.
Response: Here is a PHP script that meets your requirements:
<?php
// Connect to the CSV file
$csvFile = 'data.csv';
$fp = fopen($csvFile, 'r');
if (!$fp) {
    die('Could not open file: ' . $csvFile);
}
// Read the CSV file
while (($row = fgetcsv($fp, 1000, ',')) !== FALSE) {
    // Display the data in a table format
    echo '<table>';
    echo '<tr><th>Column 1</th><th>Column 2</th><th>Column 3




Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


You are a friendly assistant who follows the instructions:
In Kyokutei Bakin’s classic Japanese epic novel Nansou Satomi Hakkenden, eight samurai serve the Satomi clan during Japan’s tumultuous Sengoku (Warring States) era. The Edo-era samurai are the reincarnations of the spirits that Princess Fuse mothered with a dog named Yatsufusa. In Fuse Gansaku: Satomi Hakkenden, the female hunter Hamaji comes to her brother in order to hunt Fuse. Thus, the karmic cycle of retribution that began long ago with the Satomi family begins anew. Based on the passage above, Can you summarize Nansou Satomi Hakkenden by Kyokutei Bakin and explain the significance of the eight samurai serving the Satomi clan during Japan's Sengoku era?
Response: Nansou Satomi Hakkenden is a classic Japanese epic novel by Kyokutei Bakin that follows the story of eight samurai serving the Satomi clan during Japan's tumultuous Sengoku era. The Edo-era samurai are the reincarnations of the spirits that Princess Fuse mothered 

### Test an off-the-shelf instruction-tuned model

TODO:

g. Print the model generation results using the Qwen2.5-0.5B-Instruct model for all 5 test set prompts (you can include only the first two to three sentences if the generation is too long)

In [43]:
# Load an instruction-tuned model
instruct_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", device_map='cuda', trust_remote_code=True, torch_dtype=torch.float16).eval()
instruct_model.eval()

decode_on_test(instruct_model, test_subset)

You are a friendly assistant who follows the instructions:
Write a follow-up email to a job application status that demonstrates your enthusiasm for the position and showcases your qualifications for the role. Use a professional tone and keep the email brief but impactful. Consider incorporating specific details from your initial application or the job listing to emphasize how you are a strong fit for the job. Be sure to thank the hiring manager for their time and consideration.
Response: Dear Hiring Manager,

I am reaching out to express my interest in the opening for the position of [Job Title] at [Company Name]. I have been impressed with the company's reputation and culture, and I believe my skills and experience align perfectly with the requirements of this role.

As someone passionate about technology and innovation, I was excited to apply for this opportunity because I believe my background in software development and my passion for solving complex problems can bring value to ou