# Aligning Llama 3 8B LLM to human preferences with Direct Preference Optimization (DPO)

The challenge with RLHF is that it is a complex and often unstable procedure: 
it first fits a reward model that reflects the human preferences, 
and then fine-tunes the LLM using reinforcement learning 
to maximize this estimated reward, with a KL-divergence penalty that keeps the model from drifting too far from the original model.
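
As a point of reference, the RL fine-tuning stage described above is commonly written as a KL-constrained reward maximization. The symbols below follow the DPO paper and are not defined in the text above: $r_\phi$ is the learned reward model, $\pi_\theta$ the policy being trained, $\pi_{\text{ref}}$ the original (reference) model, and $\beta$ a coefficient that sets the strength of the KL penalty.

```latex
\max_{\pi_\theta} \;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(y \mid x)}
\big[ r_\phi(x, y) \big]
\;-\; \beta \, \mathbb{D}_{\mathrm{KL}}
\big[ \pi_\theta(y \mid x) \,\|\, \pi_{\text{ref}}(y \mid x) \big]
```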

![](https://i.imgur.com/Xhh3Y88.png)

DPO optimizes for human preferences while avoiding reinforcement learning. 
Existing methods for fine-tuning language models with human feedback first fit a reward model to a dataset of prompts and human preferences over pairs of responses (or preferences from supervised models like LLM classifiers or rankers), and then use RL to find a policy that maximizes the learned reward.

In contrast, DPO directly optimizes for the policy best satisfying the preferences with a simple classification objective, fitting an implicit reward model whose corresponding optimal policy can be extracted in closed form.

## What data is necessary for DPO

For aligning an LLM to human preferences, you need triplets of preference data: given a context/prompt, there is a preferred (good) response which should be chosen over a dis-preferred (bad) response. Each row of data should therefore have a prompt, a chosen response, and a rejected response (typically created by humans).
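
To make the row structure concrete, here is a minimal sketch of a single preference triplet and a small validity check. The example text and the helper name `is_valid_triplet` are invented for illustration and do not come from any dataset used in this notebook.

```python
# A hypothetical preference triplet; the text is invented for illustration.
preference_example = {
    "prompt": "Summarize: The meeting was moved from Monday to Wednesday.",
    "chosen": "The meeting is now on Wednesday instead of Monday.",
    "rejected": "Meetings are events where people talk.",
}


def is_valid_triplet(row):
    """Return True if the row has non-empty prompt, chosen, and rejected strings."""
    required = ("prompt", "chosen", "rejected")
    return all(
        isinstance(row.get(key), str) and bool(row[key].strip())
        for key in required
    )


print(is_valid_triplet(preference_example))
```

A check like this can be run over every row before training to catch rows with a missing or empty response.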

## What happens in the DPO policy

Consider a dataset of human preferences {(x, y_w, y_l)}, where x is a prompt and y_w, y_l are the preferred and dis-preferred responses. The DPO training objective for the policy can be framed as:

![](https://i.imgur.com/uMEIfuy.png)


Here the key aspects in the above equation include:

![](https://i.imgur.com/iRVsHbf.png)
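
For readers who cannot view the images, the DPO objective as given in the DPO paper can be written out as follows. Here $\pi_\theta$ is the policy being trained, $\pi_{\text{ref}}$ the frozen reference model, $\sigma$ the logistic sigmoid, and $\beta$ a temperature-like coefficient; this is transcribed from the paper and is assumed to match the equation image above.

```latex
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) =
- \mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
\left[
  \log \sigma \left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}
  \right)
\right]
```

The two terms inside the sigmoid are the implicit rewards of the preferred and dis-preferred responses, so the loss shrinks as the policy assigns relatively more probability to $y_w$ than the reference model does.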

The following excerpt from the paper also shows the key step: the loss and its gradient update are computed directly from the policy being trained and the reference (SFT) model, which together define an implicit reward function.

![](https://i.imgur.com/FbM1OiG.png)
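
The loss computation can be sketched for a single preference pair in plain Python. This is a minimal illustration under stated assumptions, not the implementation used by any library: the function name `dpo_loss` is hypothetical, and the inputs are assumed to be summed token log-probabilities of each response under the policy and the reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (prompt, chosen, rejected) triplet.

    Each *_logp argument is the total log-probability of a response
    given the prompt, under the policy or the reference model.
    """
    # Implicit rewards: beta times the log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # -log(sigmoid(margin)) computed in a numerically stable form.
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return loss, chosen_reward, rejected_reward


# When the policy equals the reference, the margin is 0 and the loss is log(2).
loss_at_start, _, _ = dpo_loss(-250.0, -250.0, -250.0, -250.0)
print(loss_at_start)

# When the policy favors the chosen response more than the reference does,
# the margin is positive and the loss drops below log(2).
loss_after, chosen_r, rejected_r = dpo_loss(-240.0, -260.0, -250.0, -250.0)
print(loss_after, chosen_r, rejected_r)
```

In practice the loss is averaged over a batch of triplets, and only the policy's log-probabilities carry gradients.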

We will use a human-labeled preference dataset here to align Llama 3 using DPO.

## Load Data and Models

In [1]:
import unsloth
from unsloth.chat_templates import get_chat_template
from unsloth import FastLanguageModel, is_bfloat16_supported
import torch
from trl import SFTTrainer
from datasets import load_dataset
from transformers import TrainingArguments, TextStreamer

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [2]:
max_seq_length = 1024

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-Instruct-bnb-4bit",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
    dtype=None,
)

==((====))==  Unsloth 2025.2.15: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    GPU: NVIDIA A40. Max memory: 44.448 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.4.1+cu124. CUDA: 8.6. CUDA Toolkit: 12.4. Triton: 3.0.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post1. FA2 = True]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [3]:
model

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096, padding_idx=128255)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSN

In [4]:
type(model)

transformers.models.llama.modeling_llama.LlamaForCausalLM

## Basic prompting with Llama 3

In [5]:
FastLanguageModel.for_inference(model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096, padding_idx=128255)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSN

In [6]:
# model = FastLanguageModel.for_inference(model)

messages = [
    {"role": "user", "content": "Tell me about the capital of India?"},
]

prompt = tokenizer.apply_chat_template(messages,
                                       tokenize=False,
                                       add_generation_prompt=True)
print(prompt)

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Tell me about the capital of India?<|eot_id|><|start_header_id|>assistant<|end_header_id|>




In [7]:
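# Encode the prompt. The chat template already emitted <|begin_of_text|>, and the
# tokenizer prepends it again by default, so it appears twice in the decoded output.
# Passing add_special_tokens=False here would avoid the duplicate.
inputs = tokenizer(prompt, return_tensors="pt").to('cuda')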

# Generate the output.
output = model.generate(**inputs, max_new_tokens=200,
                        eos_token_id=tokenizer.eos_token_id,
                        tokenizer=tokenizer, stop_strings=["<|eot_id|>"])

# Decode the output.
text = tokenizer.decode(output[0], skip_special_tokens=False)

In [8]:
print(text)

<|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|>

Tell me about the capital of India?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The capital of India is New Delhi (officially known as the National Capital Territory of Delhi). It is a federal capital city located in the National Capital Region (NCR) of India. New Delhi is the second-largest city in India by area and is the seat of the Government of India.

New Delhi is a planned city, designed by British architects Edwin Lutyens and Herbert Baker in the early 20th century. The city was officially inaugurated on February 13, 1927. It is known for its rich history, cultural heritage, and architectural landmarks, including:

1. The India Gate: A war memorial built to commemorate the soldiers who died in World War I.
2. The Parliament House: The seat of the Indian Parliament, also known as the Sansad Bhavan.
3. The Rashtrapati Bhavan: The official residence of the President of India.
4. The Qutub Minar

## Load the human preference dataset

In [9]:
import pandas as pd

dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.shuffle(seed=42)
dataset = dataset.rename_column('question','prompt')
pd.set_option('display.max_colwidth', None)
df = dataset.to_pandas()
df.head()

Unnamed: 0,system,prompt,chosen,rejected
0,"You are a helpful assistant, who always provide explanation. Think like you are answering to a five year old.",This is some data: CBS PLAY-BY-PLAY Chris Schenkel (first half) and Ray Scott (second half); 1962 NETWORK CBS.\n\nGenerate a detailed description of this data,"Okay, imagine you are watching a fun game on TV with your family. In this case, the game happened in 1962. Now, on TV, there are people who talk to us and tell us what is happening in the game. They help us understand the game better, just like how I'm helping you understand things right now.\n\nIn this data, there are two people who talked about the game in 1962. The first person, Chris Schenkel, talked about the game in the first half. The second person, Ray Scott, talked about the game in the second half. Both of them worked for a big TV company called CBS. So, this sentence is just telling us who talked about the game on TV and when they did it.","OH MY GOSH, YOU WANT TO KNOW ABOUT THIS SUPER COOL DATA?! 😍\n\nOkay, so let me tell you all about it! 🤓\n\nThis data is like a special treasure that tells us about something called ""CBS PLAY-BY-PLAY"" and two super cool people who helped make it happen! 🎉\n\nThe first person's name is Chris Schenkel, and he was the play-by-play guy for the first half of the thingy. 🏈👍\n\nThe second person's name is Ray Scott, and he was the play-by-play guy for the second half of the thingy. 🏈👍\n\nBut wait, there's more! 😱 This whole thing happened in 1962, which is like, a really long time ago! 🤯\n\nAnd the best part? 🎉 It all happened on a special network called CBS! 📺👍\n\nSo, in other words, this data is like a special memory book that tells us about two cool dudes who helped make some amazing sports stuff happen on TV a long time ago! 😄\n\nIsn't that just the most amazing thing you've ever heard?! 😍"
1,You are an AI assistant. User will you give you a task. Your goal is to complete the task as faithfully as you can. While performing the task think step-by-step and justify your steps.,"Extract the answer to the following question from the movie plot. If the question isn't answerable, please output ""Can't answer"". Question: What does Johanna cut from Katniss's arm? Title: The Hunger Games: Catching Fire Movie plot: After winning the 74th Hunger Games, Katniss Everdeen (Jennifer Lawrence) and Peeta Mellark (Josh Hutcherson) return home to District 12. President Snow visits Katniss at her home. The two make an agreement to not lie to one another, and Snow explains that her actions in the Games have inspired rebellions across the districts. He orders her to use the upcoming victory tour to convince him that her actions were out of genuine love for Peeta, not defiance against the Capitol, otherwise Katniss's loved ones will be killed. He shows her the clip where Gale kisses her as a warning that they are watching her. As the tour begins, Haymitch Abernathy, Katniss and Peeta's mentor, warns them that the ""show"" of their relationship must continue for the rest of their lives. Katniss suggests a public engagement between herself and Peeta, which is carried out and approved by Snow at his mansion in the Capitol. Returning home, Katniss warns her friend Gale Hawthorne of Snow's threat to kill both their families. Peacekeepers crack down on District 12, and Gale is publicly whipped after attacking new Head Peacekeeper Romulus Thread. Snow announces that the upcoming 75th Hunger Games, the Third Quarter Quell, will feature tributes selected from previous victors. Katniss immediately devotes herself to ensuring Peeta's survival, and recruits Haymitch's help to do so. At the reaping, Effie Trinket draws Katniss and Haymitch's names, but Peeta immediately volunteers to take Haymitch's place. 
As they prepare, Haymitch reveals that all the tributes are angry about being returned to the Games and advises a reluctant Katniss to make allies. In the pre-Games interviews, Katniss wears a wedding dress, as ordered by President Snow, but her stylist Cinna rigs it to transform into a representation of a mockingjay. Peeta announces that he and Katniss have secretly married and are expecting a baby, which causes outrage, and the audience of Capitol citizens begins shouting for the Games to be stopped, to no avail. Just before Katniss enters the...\nAnswer:",Can't answer,"Sure, I'd be happy to help! Based on the movie plot you provided, the answer to the question ""What does Johanna cut from Katniss's arm?"" is:\n\nCan't answer.\n\nThere is no mention in the plot of Johanna cutting anything from Katniss's arm. The plot focuses on Katniss and Peeta's experiences during the Victory Tour, their preparation for the 75th Hunger Games, and the events leading up to the Games themselves. There is no mention of Johanna or any other character cutting anything from Katniss's arm."
2,,"Q: Based on this review, would the user recommend this product? === Review: Recommended by a friend that I will never again trust, I quit on this book half-way through. It is utterly pointless garbage. What is most offensive about it is that it isn't even well-written. This guy sounds like a frat-boy braggart, and his stories are no more creative than those you might overhear in a bar. Answer:\nA:","No, based on this review, the user would not recommend this product.","Based on this review, the user would not recommend this product. The review is overwhelmingly negative, describing the book as ""utterly pointless garbage"" and stating that it is not well-written. The user also mentions that they quit reading the book halfway through, which suggests that they did not find it engaging or worthwhile. Overall, this review would likely deter potential buyers from purchasing the product."
3,You are an AI assistant. You will be given a task. You must generate a detailed and long answer.,"six migrants were found dead on a small boat off the coast of southern italy on sunday by the crew of a russian merchant ship , italian maritime authorities said .\nCan you generate a short summary of the above paragraph?","Six migrants were discovered dead on a boat near southern Italy by a Russian merchant ship's crew on Sunday, as reported by Italian maritime authorities.","Sure! Here is a short summary of the paragraph:\n\nSix migrants were found dead on a small boat off the coast of southern Italy on Sunday, according to Italian maritime authorities. The crew of a Russian merchant ship discovered the bodies while passing by the area. The exact circumstances of the deaths are not yet clear, but the tragedy highlights the ongoing crisis of migrant smuggling and the risks faced by those attempting to reach Europe by sea."
4,"You are a helpful assistant, who always provide explanation. Think like you are answering to a five year old.","Here is some data about a restaurant: name = The Golden Palace, eatType = coffee shop, food = French, priceRange = £20-25, customer rating = high, area = city centre. Write a sentence that includes the following data about a restaurant","The Golden Palace is a coffee shop in the city centre where they serve yummy French food. People really like it, and it costs around £20-25 to eat there.","Ooh, you want to know about The Golden Palace? It's a yummy coffee shop in the city centre that serves French food, and it's so popular that people give it high ratings and it costs £20-25! 🍵🍽️👍"


In [10]:
train_df = df.sample(frac=0.8, random_state=42)
val_df = df.drop(train_df.index)

train_df = train_df.sample(500, random_state=42)
val_df = val_df.sample(50, random_state=42)

In [11]:
train_df.head(2)

Unnamed: 0,system,prompt,chosen,rejected
4458,You are an AI assistant. User will you give you a task. Your goal is to complete the task as faithfully as you can. While performing the task think step-by-step and justify your steps.,"Read the following context and choose the correct option to answer the question. Context: ""That's Glenn Miller,"" my father said. ""But it can't be."" He had the back of the hospital bed cranked upright, the lower lid of his left eye creeping up in a warning signal I'd learned to recognize as a child. My older sister Ann had settled deep in the recliner, and she glared at me too, blaming me for winding him up. The jam box sat on the rolling tray table and my father was working the remote as he talked, backing up my newly burned CD and letting it spin forward to play a few seconds of low fidelity trombone solo. ""You know the tune, of course,"" he said. ""'King Porter Stomp.'"" Those childhood years of listening to him play Glenn Miller on the console phonograph were finally paying off. ""He muffed the notes the same way on the Victor version."" ""So why can't it be Miller?"" I asked. ""He wouldn't have played with a rabble like that."" The backup musicians teetered on the edge of chaos, playing with an abandon somewhere between Dixieland and bebop. ""They sound drunk."" My father had a major emotional investment in Miller. He and my mother had danced to the Miller band at Glen Island Casino on Long Island Sound in the summer of 1942, when they were both sixteen. That signature sound of clarinet and four saxes was forever tied up for him with first love and the early, idealistic months of the war. But there was a better reason why it couldn't have been Miller playing that solo. If the date on the original recording was correct, he was supposed to have died three days earlier. The date was in India ink on a piece of surgical tape, stuck to the top of a spool of recording wire. The handwritten numerals had the hooks and day-first order of Europe: 18/12/44. 
I'd won it on eBay the week before as part of a lot that included a wire recorder and a stack of 78s by French pop stars like Charles Trenent and Edith Piaf. Question: What does Ann probably think? Options: A. That her brother needs to learn to play a clarinet. B. not enough information C. That her brother should not get the father excited. D. That her brother needs to stop buying things off eBay.\nThe answer to this question is:","C. That her brother should not get the father excited.\n\nTo reach this conclusion, we evaluate the details presented in the context. When the father starts to get excited about the music, Ann ""glared"" at her sibling, which suggests disapproval or frustration. The sibling is described as ""winding him up,"" which might indicate that he's instigating or exciting the father, potentially against Ann's wishes. Thus, statement C seems to best represent Ann's thoughts. The other options aren't directly suggested or supported by the context.","Based on the context, the best answer is:\n\nC. That her brother should not get the father excited.\n\nAnn's glare at her brother suggests that she is annoyed with him for bringing up the topic of Glenn Miller and the recording, as she believes it will only excite their father, who has a strong emotional investment in Miller's music. Additionally, the fact that their father is already showing signs of excitement, such as cranking the bed up and playing the recording, suggests that Ann is trying to prevent her brother from getting their father overly enthusiastic about the topic."
1609,"You are a helpful assistant, who always provide explanation. Think like you are answering to a five year old.","What is the most logical completion of this news story?.\n\nWashington (CNN) -- Political fireworks over a little-noticed Supreme Court appeal and a separate whistleblower lawsuit had U.S. Attorney General Eric Holder and a longtime congressional adversary exchanging harsh words at a House committee hearing Wednesday. Holder, testifying before the Republican-led House Judiciary Committee, sparred with GOP Rep. Darrell Issa of California over congressional requests for private e-mails of top Holder aide Thomas Perez, who is President Barack Obama's nominee to be Labor Secretary. Issa suggested Holder had political motives for delaying release of the electronic messages. The two repeatedly talked over each other, and Holder concluded the exchange with a personal attack.\n\nBecause of the issue, Perez's confirmation hearings for","Labor Secretary might be delayed or become more contentious as they review his email records to see if there's anything that might affect his nomination. \n\nYou see, a confirmation hearing is a bit like the boss interviewing someone for a very important job. The boss, in this case being the group of people who make laws (the committee), wants to make sure that the person, who is Mr. Perez, is right for the job and didn't do anything wrong in his old job. So they want to look at his emails to check this. The disagreement is like two people in the playground arguing over who should have the last slice of cake.","OH MY GOSH, YOU GUYS! There's a big fight in Washington right now! So, there's this man named Eric Holder who is the Attorney General, and he's been in charge of some big cases. 
But now, there's another man named Darrell Issa who is a Congressman and he's been asking for some special emails that belong to Eric Holder's friend, Thomas Perez.\n\nEric Holder is like, ""Hey, I can't give you those emails, they're private!"" But Darrell Issa is like, ""I don't care, I want to see them!"" And they're both talking really loudly and not listening to each other. It's like a big argument!\n\nSo, the reason this is a big deal is because Thomas Perez is supposed to be the new head of the Labor Department, but now Darrell Issa is saying that Eric Holder is delaying the release of these emails on purpose because he has political motives. OH MY GOSH, that's like so serious!\n\nBut here's the thing, the emails are private, so it's not fair to share them without Thomas Perez's permission. It's like if you had a secret diary and someone wanted to read it without your permission. You wouldn't want that, right? So, Eric Holder is trying to protect Thomas Perez's privacy.\n\nSo, that's the big fight in Washington right now! It's like a big game of ""he said, she said"" and everyone is getting really upset. But, I think we should just let Eric Holder and Darrell Issa talk it out and figure out what's going on. Maybe they can even be friends again!"


## Format training data for DPO

In [12]:
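def format_dpo_data_training(example):
    """Convert one dataset row into the prompt/chosen/rejected strings used for DPO."""
    # Format the system message, if present. Note: the template is applied
    # separately to the system and user messages, so each formatted part
    # starts with its own <|begin_of_text|> token.
    if len(example['system']) > 0: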
        message = {"role": "system", "content": example['system']}
        system = tokenizer.apply_chat_template([message],
                                               tokenize=False)
    else:
        system = ""

    # Format instruction
    message = {"role": "user", "content": example['prompt']}
    prompt = tokenizer.apply_chat_template([message],
                                           tokenize=False,
                                           add_generation_prompt=True)

    # Format chosen answer
    chosen = example['chosen'] + "<|eot_id|>\n"

    # Format rejected answer
    rejected = example['rejected'] + "<|eot_id|>\n"

    return {
        "prompt": system + prompt,
        "chosen": chosen,
        "rejected": rejected,
    }
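
For comparison, the Llama 3 prompt layout seen in the printed prompts in this notebook can be assembled by hand so that `<|begin_of_text|>` appears only once, even when a system message is present. The helper `build_llama3_prompt` below is a hypothetical stand-in written for illustration; it is not the tokenizer's chat template, and the layout is inferred from the outputs shown in this notebook.

```python
def build_llama3_prompt(user_text, system_text=""):
    """Assemble a Llama 3 style prompt with a single begin-of-text token."""
    parts = ["<|begin_of_text|>"]
    if system_text:
        parts.append(
            "<|start_header_id|>system<|end_header_id|>\n\n"
            f"{system_text}<|eot_id|>"
        )
    parts.append(
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_text}<|eot_id|>"
    )
    # Trailing assistant header, so the model continues with its answer.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example_prompt = build_llama3_prompt(
    "Continue writing the next sentence.",
    system_text="You are an AI assistant.",
)
print(example_prompt)
print(example_prompt.count("<|begin_of_text|>"))  # 1
```

With the real tokenizer, a similar result would usually be obtained by passing the system and user messages together in one list to a single `apply_chat_template` call.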

In [13]:
from datasets import Dataset

train_dataset = Dataset.from_pandas(train_df)
val_dataset = Dataset.from_pandas(val_df)

original_columns = train_dataset.column_names

train_dataset = train_dataset.map(format_dpo_data_training,
                                  remove_columns=original_columns)
val_dataset = val_dataset.map(format_dpo_data_training,
                              remove_columns=original_columns)

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

Map:   0%|          | 0/50 [00:00<?, ? examples/s]

In [14]:
train_dataset

Dataset({
    features: ['prompt', 'chosen', 'rejected'],
    num_rows: 500
})

In [15]:
val_dataset

Dataset({
    features: ['prompt', 'chosen', 'rejected'],
    num_rows: 50
})

In [16]:
train_dataset[4]

{'prompt': '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are an AI assistant. User will you give you a task. Your goal is to complete the task as faithfully as you can. While performing the task think step-by-step and justify your steps.<|eot_id|><|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nContinue writing the next sentence.\n\nWe see a couple of opening scenes. We see paint on a pallet. We see a person mixing oil paints. we<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n',
 'chosen': 'continue to see the artist carefully selecting and blending colors, as they meticulously work on their canvas, pouring their heart and emotions into each and every brushstroke.<|eot_id|>\n',
 'rejected': " Sure! I'd be happy to help you with your task. Based on the opening scenes you've described, it seems like we are looking at a scene of an artist mixing oil paints on a palette.\n\nStep 1: Identify the objects in the scene\n\n* Paint on a pallet\n* Pers

## Setup LLM training and LoRA Config Settings

In [17]:
FastLanguageModel.for_training(model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096, padding_idx=128255)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSN

In [18]:
peft_model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "up_proj", "down_proj", "gate_proj"],
    use_rslora=False,
    use_gradient_checkpointing="unsloth"
)

Unsloth 2025.2.15 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
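
The number of trainable LoRA parameters follows directly from the rank and the layer shapes printed for the model earlier: each adapted `Linear` layer with `in_features` inputs and `out_features` outputs gains an `A` matrix of size `r x in_features` and a `B` matrix of size `out_features x r`. The sketch below redoes that arithmetic; the dictionary and function names are made up for illustration, while the shapes, rank, and layer count are taken from this notebook.

```python
# (in_features, out_features) of each targeted projection, as printed for the model.
PROJECTION_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}


def lora_param_count(shapes, rank, num_layers):
    """Count LoRA parameters: rank * (in + out) per adapted layer, per decoder layer."""
    per_decoder_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_decoder_layer * num_layers


print(lora_param_count(PROJECTION_SHAPES, rank=16, num_layers=32))  # 41943040
```

This matches the `Number of trainable parameters = 41,943,040` line reported in the training log.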


In [19]:
peft_model

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(128256, 4096, padding_idx=128255)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lor

In [20]:
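# Optimizer steps per epoch: 500 training examples / effective batch size of 8
# (per_device_train_batch_size=4 x gradient_accumulation_steps=2).
500 / 8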

62.5

In [21]:
# Approximate optimizer steps for two epochs (62.5 rounded down to 62).
62 * 2

124
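
The arithmetic in the two cells above can be wrapped in a small helper. This is a sketch: the function name `training_steps` is hypothetical, and rounding up the final partial batch is an assumption, since the exact rounding may differ between trainer versions.

```python
import math


def training_steps(num_examples, per_device_batch, grad_accum, num_devices=1):
    """Return (effective batch size, optimizer steps per epoch)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    # Assumption: a final partial batch still counts as one optimizer step.
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch


effective_batch, steps_per_epoch = training_steps(500, per_device_batch=4, grad_accum=2)
print(effective_batch, steps_per_epoch)  # 8 63
```

With 500 examples and an effective batch size of 8, about 63 optimizer steps cover one pass over the training data.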

In [22]:
args = TrainingArguments(
        output_dir='llama3-dpo-runs',
        learning_rate=3e-5,
        lr_scheduler_type="linear",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=2,
        save_strategy="steps",
        eval_strategy="steps",
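        # Log and evaluate every 10 steps; save a checkpoint every 30 steps.
        logging_steps=10,
        eval_steps=10,
        save_steps=30,
        # Cap training at 63 optimizer steps, roughly one pass over the 500
        # training examples at an effective batch size of 8 (500 / 8 = 62.5).
        max_steps=63,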
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        optim="adamw_8bit",
        weight_decay=0.01,
        warmup_steps=5,
        seed=0,
    )

In [23]:
max_seq_length

1024

In [24]:
from unsloth import PatchDPOTrainer
PatchDPOTrainer()

In [25]:
from trl import DPOTrainer

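trainer = DPOTrainer(
    model=peft_model,
    # With a PEFT model and ref_model=None, the base model with the LoRA
    # adapters disabled serves as the reference policy.
    ref_model=None,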
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    max_prompt_length=512,
    max_length=max_seq_length,
    beta=0.1,
    args=args
)

Extracting prompt in train dataset (num_proc=96):   0%|          | 0/500 [00:00<?, ? examples/s]

Applying chat template to train dataset (num_proc=96):   0%|          | 0/500 [00:00<?, ? examples/s]

Tokenizing train dataset (num_proc=96):   0%|          | 0/500 [00:00<?, ? examples/s]

num_proc must be <= 50. Reducing num_proc to 50 for dataset of size 50.


Extracting prompt in eval dataset (num_proc=50):   0%|          | 0/50 [00:00<?, ? examples/s]

num_proc must be <= 50. Reducing num_proc to 50 for dataset of size 50.


Applying chat template to eval dataset (num_proc=50):   0%|          | 0/50 [00:00<?, ? examples/s]

num_proc must be <= 50. Reducing num_proc to 50 for dataset of size 50.


Tokenizing eval dataset (num_proc=50):   0%|          | 0/50 [00:00<?, ? examples/s]

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


In [26]:
trainer.train_dataset[0]

{'prompt_input_ids': [271,
  4518,
  279,
  2768,
  2317,
  323,
  5268,
  279,
  4495,
  3072,
  311,
  4320,
  279,
  3488,
  13,
  9805,
  25,
  330,
  4897,
  596,
  40208,
  17472,
  1359,
  856,
  7126,
  1071,
  13,
  330,
  4071,
  433,
  649,
  956,
  387,
  1210,
  1283,
  1047,
  279,
  1203,
  315,
  279,
  8952,
  4950,
  1589,
  41872,
  49685,
  11,
  279,
  4827,
  27431,
  315,
  813,
  2163,
  8071,
  88692,
  709,
  304,
  264,
  10163,
  8450,
  358,
  4265,
  9687,
  311,
  15641,
  439,
  264,
  1716,
  13,
  3092,
  9191,
  13219,
  9489,
  1047,
  23183,
  5655,
  304,
  279,
  48520,
  10670,
  11,
  323,
  1364,
  2840,
  1636,
  520,
  757,
  2288,
  11,
  59771,
  757,
  369,
  54826,
  1461,
  709,
  13,
  578,
  20673,
  3830,
  7731,
  389,
  279,
  20700,
  35788,
  2007,
  323,
  856,
  7126,
  574,
  3318,
  279,
  8870,
  439,
  568,
  15243,
  11,
  25695,
  709,
  856,
  13945,
  27724,
  11325,
  323,
  20806,
  433,
  12903,
  4741,
  311,
  1514,

In [27]:
trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 500 | Num Epochs = 2
O^O/ \_/ \    Batch size per device = 4 | Gradient Accumulation steps = 2
\        /    Total batch size = 8 | Total steps = 63
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / chosen,logps / rejected,logits / chosen,logits / rejected,eval_logits / chosen,eval_logits / rejected,nll_loss,aux_loss
10,0.6498,0.48704,0.19791,-0.286897,1.0,0.484807,-252.941254,-250.272949,-0.299072,-0.281363,0,0,0,0
20,0.3248,0.107874,0.821042,-1.958407,1.0,2.779449,-246.70993,-266.988037,-0.250448,-0.228861,No Log,No Log,No Log,No Log
30,0.069,0.030805,1.00224,-4.396162,1.0,5.398403,-244.897934,-291.365601,-0.205272,-0.178424,No Log,No Log,No Log,No Log
40,0.0295,0.016317,0.729328,-6.581662,1.0,7.31099,-247.62706,-313.220581,-0.192975,-0.159104,No Log,No Log,No Log,No Log
50,0.0457,0.013136,0.723787,-7.352274,1.0,8.076061,-247.682449,-320.926727,-0.187974,-0.150665,No Log,No Log,No Log,No Log
60,0.0144,0.012448,0.714501,-7.600766,1.0,8.315268,-247.775314,-323.411621,-0.183039,-0.146021,No Log,No Log,No Log,No Log


TrainOutput(global_step=63, training_loss=0.18050328795872037, metrics={'train_runtime': 669.9683, 'train_samples_per_second': 0.752, 'train_steps_per_second': 0.094, 'total_flos': 0.0, 'train_loss': 0.18050328795872037, 'epoch': 1.0})
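
One way to read the logged columns: `rewards / chosen` and `rewards / rejected` are the average implicit rewards of the chosen and rejected responses, `rewards / margins` is their difference, and `rewards / accuracies` is the fraction of pairs where the chosen reward is higher. The sketch below checks the margin relationship against the values in the table above; the variable names are made up for illustration, and the numbers are copied from the log.

```python
# (step, rewards/chosen, rewards/rejected, rewards/margins) copied from the log above.
logged_rows = [
    (10, 0.19791, -0.286897, 0.484807),
    (20, 0.821042, -1.958407, 2.779449),
    (30, 1.00224, -4.396162, 5.398403),
    (40, 0.729328, -6.581662, 7.31099),
    (50, 0.723787, -7.352274, 8.076061),
    (60, 0.714501, -7.600766, 8.315268),
]

# Recompute each margin as chosen minus rejected and compare with the logged value.
max_gap = max(
    abs((chosen - rejected) - margin)
    for _, chosen, rejected, margin in logged_rows
)
print(max_gap < 1e-5)  # True

# The margin grows at every logged step in this run.
margins = [margin for *_, margin in logged_rows]
print(all(a < b for a, b in zip(margins, margins[1:])))  # True
```

In this run most of the growing margin comes from the rejected reward falling, while the chosen reward stays between roughly 0.7 and 1.0 after step 20.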

In [31]:
# from getpass import getpass

# HF_TOKEN = getpass('Enter Huggingface Auth Token:')

Enter Huggingface Auth Token: ········


In [29]:
# peft_model.push_to_hub_merged("dipanjanS/Llama3-8B-it-dpo",
#                               tokenizer,
#                               save_method="merged_16bit",
#                               token=HF_TOKEN)

Unsloth: You are pushing to hub, but you passed your HF username = dipanjanS.
We shall truncate dipanjanS/Llama3-8B-it-dpo to Llama3-8B-it-dpo


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 325.67 out of 503.53 RAM for saving.


100%|██████████| 32/32 [00:00<00:00, 36.94it/s]


Unsloth: Saving tokenizer...

No files have been modified since last commit. Skipping to prevent empty commit.


 Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...


Done.
Saved merged model to https://huggingface.co/dipanjanS/Llama3-8B-it-dpo


In [28]:
# Merge the LoRA adapter into the base weights and save a 16-bit copy locally.
peft_model.save_pretrained_merged("Llama3-8B-it-dpo",
                                  tokenizer,
                                  save_method="merged_16bit")

Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 308.88 out of 503.53 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 32/32 [00:00<00:00, 34.37it/s]


Unsloth: Saving tokenizer... Done.
Done.


In [30]:
!ls -l --block-size=MB ./Llama3-8B-it-dpo

total 16078MB
-rw-rw-rw- 1 root root    1MB Feb 28 01:07 config.json
-rw-rw-rw- 1 root root    1MB Feb 28 01:07 generation_config.json
-rw-rw-rw- 1 root root 4977MB Feb 28 01:07 model-00001-of-00004.safetensors
-rw-rw-rw- 1 root root 5000MB Feb 28 01:08 model-00002-of-00004.safetensors
-rw-rw-rw- 1 root root 4916MB Feb 28 01:08 model-00003-of-00004.safetensors
-rw-rw-rw- 1 root root 1169MB Feb 28 01:08 model-00004-of-00004.safetensors
-rw-rw-rw- 1 root root    1MB Feb 28 01:08 model.safetensors.index.json
-rw-rw-rw- 1 root root    1MB Feb 28 01:07 special_tokens_map.json
-rw-rw-rw- 1 root root   18MB Feb 28 01:07 tokenizer.json
-rw-rw-rw- 1 root root    1MB Feb 28 01:07 tokenizer_config.json
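
The shard sizes in the listing are consistent with an 8B-parameter model stored in 16 bits. A minimal back-of-the-envelope sketch, assuming every weight takes 2 bytes and that `--block-size=MB` reports units of 10^6 bytes; the variable names are illustrative only.

```python
# Shard sizes in MB, copied from the `ls` listing above.
shard_sizes_mb = [4977, 5000, 4916, 1169]

total_mb = sum(shard_sizes_mb)

# Assumption: merged_16bit stores each weight as a 2-byte float.
bytes_per_param = 2
approx_params_billion = total_mb * 1_000_000 / bytes_per_param / 1e9

print(f"total weight files: {total_mb} MB")
print(f"approximate parameter count: {approx_params_billion:.2f}B")
```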


## Test the DPO-aligned Llama 3 LLM

In [31]:
from unsloth import FastLanguageModel

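model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Llama3-8B-it-dpo",  # local directory with the merged DPO model saved above
    max_seq_length = max_seq_length,
    dtype = None,                     # None lets Unsloth pick the dtype automatically
    load_in_4bit = False,             # the merged weights were saved in 16-bit
)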

==((====))==  Unsloth 2025.2.15: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    GPU: NVIDIA A40. Max memory: 44.448 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.4.1+cu124. CUDA: 8.6. CUDA Toolkit: 12.4. Triton: 3.0.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post1. FA2 = True]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [32]:
FastLanguageModel.for_inference(model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096, padding_idx=128255)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
    

In [36]:
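# Note: the system instruction and the Llama 3 special tokens are written by hand
# inside a single "user" turn, so the rendered prompt carries the instruction
# under a user header rather than a system header.
messages = [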
    {"role": "user", "content": "\n\nYou are an AI assistant that follows instruction extremely well. Help as much as you can.<|eot_id|><|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nFacts:   - sport: fencing    - death date: 6 june 1969      - birth date: 11 november 1886    - name: nicolaas nederpeld    - death place: the hague , netherlands    - birth place: the hague , netherlands   Based on these bullet points, write a short biography describing the life of nicolaas nederpeld .\nAnswer:"},
]

prompt = tokenizer.apply_chat_template(messages,
                                       tokenize=False,
                                       add_generation_prompt=True)
print(prompt)

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

You are an AI assistant that follows instruction extremely well. Help as much as you can.<|eot_id|><|begin_of_text|><|start_header_id|>user<|end_header_id|>

Facts:   - sport: fencing    - death date: 6 june 1969      - birth date: 11 november 1886    - name: nicolaas nederpeld    - death place: the hague , netherlands    - birth place: the hague , netherlands   Based on these bullet points, write a short biography describing the life of nicolaas nederpeld .
Answer:<|eot_id|><|start_header_id|>assistant<|end_header_id|>




In [37]:
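# Encode the prompt. The chat template already starts with <|begin_of_text|> and the
# tokenizer prepends another BOS token by default, so the decoded text begins with two.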
inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
# Generate the output.
output = model.generate(**inputs, max_new_tokens=256,
                        eos_token_id=tokenizer.eos_token_id,
                        tokenizer=tokenizer, stop_strings=["<|eot_id|>"])
# Decode the output.
text = tokenizer.decode(output[0], skip_special_tokens=False)
print(text)

<|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|>

You are an AI assistant that follows instruction extremely well. Help as much as you can.<|eot_id|><|begin_of_text|><|start_header_id|>user<|end_header_id|>

Facts:   - sport: fencing    - death date: 6 june 1969      - birth date: 11 november 1886    - name: nicolaas nederpeld    - death place: the hague, netherlands    - birth place: the hague, netherlands   Based on these bullet points, write a short biography describing the life of nicolaas nederpeld.
Answer:<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Nicolaas Nederpeld was a Dutch fencer who lived a life marked by dedication to his craft. Born on November 11, 1886, in The Hague, Netherlands, Nederpeld's passion for fencing was evident from a young age. Throughout his life, he honed his skills, eventually becoming a skilled practitioner of the sport.

Tragically, Nederpeld's life was cut short on June 6, 1969, when he passed away in his hometow

In [38]:
val_dataset[5]

{'prompt': '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are an AI assistant that follows instruction extremely well. Help as much as you can.<|eot_id|><|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nFacts:   - sport: fencing    - death date: 6 june 1969      - birth date: 11 november 1886    - name: nicolaas nederpeld    - death place: the hague , netherlands    - birth place: the hague , netherlands   Based on these bullet points, write a short biography describing the life of nicolaas nederpeld .\nAnswer:<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n',
 'chosen': 'Nicolaas Nederpeld was born on November 11, 1886, in The Hague, Netherlands. Growing up in his hometown, Nicolaas developed a strong passion for the sport of fencing. Over the years, his dedication and hard work paid off as he became known for his exceptional skills and prowess in the sport.\n\nAs he continued to practice and compete throughout his life, Nederpeld gained reco
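
The stored validation prompt places the instruction under a `system` header, whereas the hand-written test prompt earlier put it inside the `user` turn. A minimal sketch of rebuilding a prompt in the stored layout follows, assuming the layout is exactly what `val_dataset[5]['prompt']` shows. `build_training_style_prompt` is a hypothetical helper, not the notebook's `format_dpo_data_training`, and the user text is a shortened placeholder.

```python
def build_training_style_prompt(system: str, user: str) -> str:
    """Hypothetical helper mirroring the layout of the stored validation prompt."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        # The stored prompt repeats <|begin_of_text|> before the user turn.
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

example_prompt = build_training_style_prompt(
    system="You are an AI assistant that follows instruction extremely well. "
           "Help as much as you can.",
    user="Facts: (placeholder facts) Based on these bullet points, "
         "write a short biography.\nAnswer:",
)
print(example_prompt)
```

Prompting the model in the same layout it saw during training keeps the comparison with the `chosen` and `rejected` responses like for like.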

In [39]:
subset_df = val_df.tail(20).copy()

In [40]:
from tqdm import tqdm

responses = []
for idx, row in tqdm(subset_df.iterrows()):
    prompt = format_dpo_data_training(row)['prompt']
    # Encode the prompt.
    inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
    # Generate the output.
    output = model.generate(**inputs, max_new_tokens=256,
                            eos_token_id=tokenizer.eos_token_id,
                            tokenizer=tokenizer, stop_strings=["<|eot_id|>"])
    # Decode the output.
    text = tokenizer.decode(output[0], skip_special_tokens=False)
    response = (text.split('<|start_header_id|>assistant<|end_header_id|>')[1]
                    .split('<|eot_id|>')[0]
                    .strip('\n'))
    responses.append(response)

20it [01:22,  4.15s/it]
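
The response extraction in the loop above can be isolated into a small function that is easy to test without a GPU. This is a sketch under stated assumptions: `extract_assistant_response` is a hypothetical helper, it takes the last assistant turn rather than index `[1]` of the split, and it returns an empty string when no assistant header is present.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"
END_OF_TURN = "<|eot_id|>"

def extract_assistant_response(decoded: str) -> str:
    """Hypothetical helper: return the text of the last assistant turn."""
    # Keep everything after the final assistant header.
    _, found, tail = decoded.rpartition(ASSISTANT_HEADER)
    if not found:
        return ""  # no assistant turn in the decoded text
    # Cut at the end-of-turn token if generation reached it.
    return tail.split(END_OF_TURN)[0].strip("\n")

sample = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Say hi.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    "Hi there!<|eot_id|>"
)
print(extract_assistant_response(sample))
```

For the single-turn prompts used here both approaches select the same text; taking the last turn only matters if a prompt already contains an earlier assistant message.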


In [41]:
subset_df.insert(2, 'ai_response', responses)

In [43]:
subset_df.head(20)

Unnamed: 0,system,prompt,ai_response,chosen,rejected
11966,You are an AI assistant. You will be given a task. You must generate a detailed and long answer.,Write the answer: The songs People Will Say We're in Love and The Surrey with the Fringe on Top are from which stage show?,"The songs ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top"" are both from the classic Rodgers and Hammerstein musical ""Oklahoma!"" The show premiered on March 31, 1943, at the St. James Theatre in New York City and was a groundbreaking production that revolutionized the art of musical theater.\n\n""Oklahoma!"" was the first musical to integrate music, dance, and drama seamlessly, blending elements of opera, ballet, and musical comedy. The show's innovative use of music, choreography, and storytelling helped to redefine the genre and paved the way for future musicals.\n\nThe musical is set in the early 20th century American frontier, specifically in the Indian Territory, which is now part of Oklahoma. The story revolves around the romance between two young lovers, Curly McLain and Laurey Williams, and their struggles with social class, family expectations, and personal aspirations.\n\nThe song ""People Will Say We're in Love"" is a duet sung by Curly and Laurey, in which they express their budding feelings for each other and the societal pressures that threaten to tear them apart. The song has become a classic standard, with its catchy melody and poignant lyrics capturing the hearts of audiences worldwide","The songs ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top"" are from the esteemed stage show ""Oklahoma!"". Oklahoma! is a groundbreaking American musical, with music by Richard Rodgers and lyrics by Oscar Hammerstein II. The stage show is an adaptation of the 1931 play ""Green Grow the Lilacs"" by Lynn Riggs. 
This modern musical premiered on Broadway on March 31, 1943, and it is considered one of the earliest examples of the now-familiar book musical format, where the songs and dances are woven seamlessly into a well-crafted, cohesive narrative.\n\nOklahoma! is a classic story set in the Oklahoma territory in the early 20th century. It follows the romantic pursuits of the spirited and independent Laurey Williams and the charming cowboy Curly McLain. Throughout their journey, the couple must navigate the ups and downs of love, as well as the challenges posed by Jud Fry, a brooding and dangerous farmhand who covets Laurey. The musical captures the essence of rural America at that time and paints a vivid portrait of life in the newly settled land.\n\n""People Will Say We're in Love"" is a beautiful duet between Laurey and Curly. In this song, the two characters coyly express their feelings for each other while attempting to maintain a playful sense of denial. They sing about wanting to avoid public speculation about their budding romance, but in reality, they are subtly revealing their true emotions to one another. It is a tender and heartwarming moment in the show that highlights the budding relationship between the lead characters.\n\n""The Surrey with the Fringe on Top"" is another captivating song from Oklahoma! Curly sings it to Laurey in an attempt to convince her to go to a social event with him. He engages in a charming and imaginative description of a lavish surrey, or horse-drawn carriage, that he claims to have. The song showcases Curly's charisma and creativity, as well as his ultimately endearing effort to win Laurey's affection. Despite the fact that the surrey exists only in his imagination, Laurey is won over by Curly's poetic descriptions and flirtatiousness.\n\nThe stage show Oklahoma! was a landmark production in the history of musical theater. 
It introduced innovative storytelling techniques, incorporating songs and dances as integral components of the narrative, and tackled complex themes and emotions. The songs ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top"" are just two examples of the delightful and memorable music found in this iconic production. The show's impact on the world of theater has been profound, and its popularity endures to this day, continuing to delight audiences in productions all over the world.","Ah, greetings! I'm here to assist you with your question. The songs ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top"" are from the beloved stage show ""Oliver!"" Based on the novel ""Oliver Twist"" by Charles Dickens, ""Oliver!"" is a classic musical that premiered in the West End in 1960 and on Broadway in 1963. The music and lyrics are by Lionel Bart, and the show was directed and choreographed by Carol Reed and Robb Roberge, respectively.\n\n""People Will Say We're in Love"" is a duet sung by the characters Oliver and Nancy, while ""The Surrey with the Fringe on Top"" is a solo number performed by Nancy. Both songs are memorable and catchy, showcasing the charming and witty lyrics that are a hallmark of Lionel Bart's work.\n\nThe story of ""Oliver!"" follows the titular character, a young orphan who is taken in by a group of pickpockets and thieves in Victorian London. As Oliver learns the ways of the streets and becomes embroiled in the criminal underworld, he must also navigate his own emotions and loyalties. The show features a talented cast of characters, including Fagin, Bill Sikes, and the Artful Dodger, all of whom are brought to life by the memorable songs and clever dialogue.\n\n""People Will Say We're in Love"" is a sweet and tender duet that captures the blossoming romance between Oliver and Nancy. 
The song features lyrics like ""People will say we're in love / Just because our hands touch / And our hearts beat as one"" and ""We'll be in love / Until the end of time."" The song is a beautiful expression of the innocence and optimism of young love, and it is a highlight of the show.\n\n""The Surrey with the Fringe on Top"" is a more upbeat and playful number, with Nancy singing about her dreams of a better life and her longing for Oliver. The song features lyrics like ""I'd like to ride in a surrey with the fringe on top / And never have a care in the world"" and ""I'd like to be a lady in a fine ball gown / And dance with the prince in the palace."" The song showcases Nancy's spunky and independent personality, and it is a standout moment in the show.\n\nOverall, ""Oliver!"" is a timeless and beloved stage show that features a wealth of memorable songs, including ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top."" The show has been revived numerous times on both sides of the Atlantic, and it continues to delight audiences with its charming characters, catchy songs, and heartwarming story."
6907,You are an AI assistant. You will be given a task. You must generate a detailed and long answer.,"Q:Read the article and select the best answer. Article: The next generation of smart phone could combine the date from its gyroscopes with a built-in compass to allow you to track your indoor movements even without GPS. Research described it in the international Journal of Innovative Computing and Applications by Shahid Ayub of Lancaster University, and his colleagues there and at HW Communications, suggests that the embedded inertial sensors in many smart phones have added the facility to be used for localization and tracking applications. The primary benefit of using smart phones is that no additional infrastructure would need to be installed for monitoring personnel movements in a wide variety of situations not least staff, equipment and inventory movements in warehouse facilities, shopping malls and factories. In contrast, other technology being proposed would require new equipment,such as RFID technologies. However, smart phones do not yet have the ability to pinpoint their location without recourse to the GPS system, which is not accessible indoors and has the added disadvantage of using up battery power very quickly even if it could be used. Inertial navigation only provides the necessary information to a specified starting point the team says. This is useful for anyone with a non-GPS smart phone who wishes to track the route they take when walking or jogging if they specify their starting point. The team suggests that pedestrians indoors could be positioned using a combination of the smart phone accelerometer and an built-in digital compass, something that will become available in future smart phones. 
They have now investigated the potential of three different smart phone placement modes: idle, hand held and listening, which could be used with pedestrian dead reckoning (PDR) techniques to allow users to record the path they follow or for various kinds of workplace or other monitoring requirements. ""The technique could be used in underground tube stations, airports , train stations where there is no infrastructure installed for tracking or... Question: We can learn from the passage that the recent studies of smart phones _ . Options: A: were conducted by Shahid Ayub alone B: were supported by GPS C: have no connection with everyday needs D: can be applied to underground tube stations\nA:","The correct answer is D: can be applied to underground tube stations.\n\nThe passage explains that researchers have described a technique in which the next generation of smart phones could combine their built-in gyroscope and compass to track indoor movements without GPS. The study, conducted by Shahid Ayub and his colleagues, suggests that smart phones can be used for localization and tracking applications, including monitoring personnel movements in various situations, such as warehouses, shopping malls, and factories. The researchers have also investigated the potential of three different smart phone placement modes (idle, handheld, and listening) and pedestrian dead reckoning (PDR) techniques to enable users to record their path or monitor various workplace requirements.",We can learn from the passage that the recent studies of smart phones can be applied to underground tube stations.,"Based on the passage, the correct answer is A: were conducted by Shahid Ayub alone.\n\nThe passage states that the research was conducted by Shahid Ayub of Lancaster University, and his colleagues at HW Communications. 
Therefore, option A is the correct answer.\n\nOption B is incorrect because the passage does not mention any support from GPS.\n\nOption C is incorrect because the passage does not mention any connection with everyday needs.\n\nOption D is incorrect because the passage does not mention any application to underground tube stations."
224,You are an AI assistant that helps people find information.,"Please answer the following question by reasoning step-by-step. Given the sentence ""A gathering of what appear to be tourists mill about on an urban plaza that includes a large fountain showcasing a tall sculpture with a nautical theme."" can we conclude that ""There is a group of people at the plaza.""?\nOptions:\n- yes\n- it is not possible to tell\n- no. Step-by-step reasoning:","Let's break it down step-by-step:\n\n1. The sentence describes a scene: ""A gathering of what appear to be tourists mill about...""\n\t* ""A gathering"" is a group of people.\n\t* ""what appear to be tourists"" indicates that the people are likely tourists, but we can't be certain.\n\t* ""mill about"" means they are moving around in a relaxed manner, often in a social setting.\n\nSo, at this point, we have established that there is a group of people.\n\n2. The sentence continues: ""...on an urban plaza...""\n\t* An urban plaza is a public area in an urban setting.\n\n3. The sentence then describes the plaza: ""...that includes a large fountain showcasing a tall sculpture with a nautical theme.""\n\t* This part of the sentence doesn't directly relate to the presence of people.\n\n4. Since we've already established that there is a group of people (from step 1), and the additional information provided doesn't contradict this, we can conclude that:\n\t* Yes, we can conclude that ""There is a group of people at the plaza.""\n\nSo, the correct answer is: yes.","- The sentence states, ""A gathering of what appear to be tourists mill about on an urban plaza.""\n- ""A gathering of what appear to be tourists"" implies that there is a group of people at the location.\n- The location mentioned is ""an urban plaza.""\n\nConclusion: Yes, we can conclude that ""There is a group of people at the plaza.""","Sure, I'd be happy to help! Here's my step-by-step reasoning:\n\n1. 
The sentence mentions ""a gathering of what appear to be tourists."" This suggests that there is a group of people present, rather than a single individual or a small number of people.\n2. The phrase ""mill about"" implies that the group of people is moving around and interacting with each other, rather than standing still or being stationary.\n3. The sentence mentions an ""urban plaza,"" which suggests a public space in a city or town. This increases the likelihood that the group of people is a group of tourists, as tourists are more likely to congregate in public spaces.\n4. The sentence mentions a ""large fountain showcasing a tall sculpture with a nautical theme."" This suggests that the plaza is a well-maintained and visually appealing space, which is consistent with a tourist destination.\n5. Based on the previous points, it is likely that the group of people is a group of tourists who are visiting the plaza as part of their travels.\n\nTherefore, based on the information provided, it is possible to conclude that there is a group of people at the plaza. However, it is not possible to determine the exact number of people or their specific activities without additional information.\n\nSo, the answer is:\n\nOption 1: yes."
8004,You are an AI assistant. User will you give you a task. Your goal is to complete the task as faithfully as you can. While performing the task think step-by-step and justify your steps.,Translate to Russian:\n\nThe database enterprises Chekhov in the sub-heading column Clinics for adults are offering information Companies Clinics for adults Chekhov This information is here checked before be available to you on the pages Business Portal in Clinics for adults Chekhov.,"Here's the translation to Russian:\n\nБаза данных предприятий Чехова в столбце ""Клиники для взрослых"" предлагает информацию Компании Клиники для взрослых Чехов. Эта информация была проверена до ее доступности на страницах Бизнес-портала Клиники для взрослых Чехова.\n\nJustification:\n\n1. I translated ""The database"" to ""База данных"" as it's a direct translation of the phrase.\n2. ""enterprises"" was translated to ""предприятий"" as it's a plural noun.\n3. ""Chekhov"" remained the same as it's a proper noun.\n4. ""in the sub-heading column"" was translated to ""в столбце"" as ""столбец"" is a common Russian word for a column.\n5. ""Clinics for adults"" was translated to ""Клиники для взрослых"" as it's a literal translation.\n6. ""are offering"" was translated to ""предлагает"" as it's a verb phrase.\n7. ""information"" was translated to ""информацию"" as it's a noun.\n8. ""Companies"" was translated to ""Компании"" as","База данных предприятий Чехова в подрубрике ""Клиники для взрослых"" предоставляет информацию о компаниях Клиник для взрослых Чехова. 
Эта информация проверяется здесь, прежде чем становиться доступной вам на страницах Бизнес портала в разделе ""Клиники для взрослых Чехов"".\n\nStep 1: Identify the main phrases in the text.\n- The database enterprises Chekhov\n- sub-heading column Clinics for adults\n- offering information companies Clinics for adults Chekhov\n- This information is here checked before be available to you\n- on the pages Business Portal in Clinics for adults Chekhov.\n\nStep 2: Translate individual phrases to Russian.\n- База данных предприятий Чехова\n- подрубрике ""Клиники для взрослых""\n- предоставляет информацию о компаниях Клиник для взрослых Чехова\n- Эта информация проверяется здесь, прежде чем становиться доступной вам\n- на страницах Бизнес портала в разделе ""Клиники для взрослых Чехов"".\n\nStep 3: Arrange the translated phrases in the correct order.\n- База данных предприятий Чехова в подрубрике ""Клиники для взрослых"" предоставляет информацию о компаниях Клиник для взрослых Чехова. Эта информация проверяется здесь, прежде чем становиться доступной вам на страницах Бизнес портала в разделе ""Клиники для взрослых Чехов"".","Sure, I'd be happy to help you with the translation! 
Here's my step-by-step process and justification for each step:\n\nStep 1: Understand the task\n\nThe task is to translate the given text from English to Russian.\n\nStep 2: Break down the text into individual components\n\nThe text can be broken down into the following components:\n\n* ""The database enterprises Chekhov""\n* ""in the sub-heading column Clinics for adults""\n* ""are offering information""\n* ""Companies Clinics for adults Chekhov""\n* ""This information is here checked before be available to you on the pages""\n* ""Business Portal in Clinics for adults Chekhov""\n\nStep 3: Translate each component\n\nNow, let's translate each component into Russian:\n\n* ""The database enterprises Chekhov"" translates to ""База данных предприятий Чехов"" (Baza datov predpriyatiya Chekhov)\n* ""in the sub-heading column Clinics for adults"" translates to ""в подзаголовке столбце Клиники для взрослых"" (v podzagotovke stolbtsu Kliniki dlya vzroslykh)\n* ""are offering information"" translates to ""предоставляют информацию"" (predostavlyayut informatsyu)\n* ""Companies Clinics for adults Chekhov"" translates to ""Компании Клиники для взрослых Чехов"" (Kompaniyi Kliniki dlya vzroslykh Chekhov)\n* ""This information is here checked before be available to you on the pages"" translates to ""Эта информация здесь проверяется перед тем, как стать доступной вам на страницах"" (Eta informatsya zdes zdesyvaetsya predtem, kak stalitsya dostupnaya na strakhah)\n* ""Business Portal in Clinics for adults Chekhov"" translates to ""Бизнес-порттал в Клиниках для взрослых Чехов"" (Biznes-porttol v Klinikah dlya vzroslykh Chekhov)\n\nStep 4: Assemble the translation\n\nNow that we have translated each component, let's assemble the complete translation:\n\n""База данных предприятий Чехов в подзаголовке столбце Клиники для взрослых предоставляют информацию о Компаниях Клиники для взрослых Чехов, которая здесь проверяется перед тем, как стать доступной вам на страницах Бизнес-порттала в 
Клиниках для взрослых Чехов.""\n\nThis translates to: ""The database enterprises Chekhov in the sub-heading column Clinics for adults are offering information about Companies Clinics for adults Chekhov, which is here checked before being available to you on the pages Business Portal in Clinics for adults Chekhov.""\n\nI hope this helps! Let me know if you have any further questions or if there's anything else I can assist you with."
12059,Explain how you used the definition to come up with the answer.,"In this task, you're given four sentences of a story written in natural language in which one part is missing. Your job is to predict the position and missing part of the story and return in the following format: position, missing part. The missing part is a sentence that completes the story, and the position is the number of the missing sentence in the new story.\nQ: Sentence1: There is a cemetery with 199 grave stones. Sentence2: 12 men brought a casket. Sentence3: 12 men put the casket in the ground. Sentence4: They mourned for losses.\nA:","Position: 1, Missing part: A old man had been buried there for 199 years.","2, They dug a grave before placing the casket.","Sure, I'd be happy to help! To solve this task, I used the definition of a complete story, which states that a story must have a beginning, a middle, and an end.\n\nSentence 1: There is a cemetery with 199 grave stones. (beginning)\n\nThis sentence introduces the setting of the story, which is a cemetery with a large number of grave stones.\n\nSentence 2: 12 men brought a casket. (middle)\n\nThis sentence introduces a new element to the story, the casket, and the number of men who brought it.\n\nSentence 3: 12 men put the casket in the ground. (middle)\n\nThis sentence continues the action of the story, describing the men putting the casket in the ground.\n\nSentence 4: They mourned for losses. (end)\n\nThis sentence provides the conclusion of the story, describing the men mourning for their losses.\n\nTherefore, the missing part of the story is Sentence 4, and its position in the complete story is 4.\n\nThe complete story would be:\n\n1. There is a cemetery with 199 grave stones.\n2. 12 men brought a casket.\n3. 12 men put the casket in the ground.\n4. They mourned for losses."
10578,You are an AI assistant. You will be given a task. You must generate a detailed and long answer.,What is the missing first step of the following process: - Fill the tray with cool water - Place the tray in the freezer - Leave the tray in the freezer for a few hours - Take they tray out of the freezer - Pop the ice cubes out of tray.\n,"The missing first step of the process is ""Fill the tray with water"".","The missing first step of the given process is:\n\n1. Select and prepare a clean ice cube tray.\n\nBefore you can fill the tray with cool water, you first need to choose an appropriate ice cube tray, which is designed to hold and give shape to the water as it freezes to form ice cubes. There are several types of ice cube trays available in the market, such as plastic, silicone, or metal trays. Each type has its own advantages and disadvantages, but the most commonly used ones are plastic and silicone, as they offer easy ice cube release and flexibility.\n\nTo prepare the ice cube tray for use, make sure it is clean and free from any dirt, dust, or residue. Rinse the tray with clean water and, if necessary, use a gentle dish soap to wash it before use. This will ensure that your ice cubes are free from any contaminants or unwanted tastes. After washing, thoroughly dry the tray to prevent dilution of the water you will add in the next step.\n\nOnce you have selected and prepared your ice cube tray, you can proceed with the remaining steps that you already provided:\n\n2. Fill the tray with cool water.\n3. Place the tray in the freezer.\n4. Leave the tray in the freezer for a few hours until the water has frozen solid.\n5. Take the tray out of the freezer.\n6. Pop the ice cubes out of the tray and use them as desired.","The missing first step of the process is ""Freeze the tray."" Before placing the tray in the freezer, it needs to be frozen first. Here's the complete process with the missing step included:\n\n1. 
Freeze the tray: This step is missing from the original list. To freeze the tray, you can place it in the freezer for a few hours or overnight.\n2. Fill the tray with cool water: Once the tray is frozen, you can fill it with cool water.\n3. Place the tray in the freezer: After filling the tray with cool water, place it in the freezer.\n4. Leave the tray in the freezer for a few hours: Allow the tray to freeze for a few hours or overnight.\n5. Take the tray out of the freezer: Once the ice cubes have frozen, take the tray out of the freezer.\n6. Pop the ice cubes out of the tray: Use a spoon or other utensil to pop the ice cubes out of the tray.\n\nWithout the first step of freezing the tray, the ice cubes will not form properly and may not pop out of the tray easily. By including this step, you will have perfectly formed ice cubes that are easy to remove from the tray."
11367,You are an AI assistant. You will be given a task. You must generate a detailed and long answer.,"Please answer the following question: Combine facts and answer this: What number Annual TV Week Logie Award show was the one with a guest who is best known for his role as Felix Unger in the television adaptation of Neil Simon's play ""The Odd Couple""?\nAnswer:","The Annual TV Week Logie Awards, also known as the Australian TV Week Logie Awards, are a prestigious honor in the Australian television industry. The Logie Awards have been presented annually since 1958, recognizing excellence in Australian television programming and talent.\n\nThe Logie Awards have had numerous notable guests and winners over the years. One such notable guest who appeared at the Annual TV Week Logie Awards was Tony Randall, an American actor best known for his iconic role as Felix Unger in the popular television adaptation of Neil Simon's play ""The Odd Couple.""","The guest best known for his role as Felix Unger in the television adaptation of Neil Simon's play ""The Odd Couple"" is Tony Randall. Tony Randall appeared as a guest at the 20th Annual TV Week Logie Awards. The 20th Annual TV Week Logie Awards show was held in Melbourne, Australia, on March 10, 1978.\n\nThis award show honors Australian television productions and talent, and it was first established in 1959. It was named after John Logie Baird, the Scottish inventor who was one of the inventors of the mechanical television and played a key role in the early development of television. Over the years, the Annual TV Week Logie Awards show has featured numerous prestigious guests, including international stars such as Tony Randall.\n\nTony Randall was born in 1920 and passed away in 2004. He was an American actor, producer, and director renowned for a range of roles in film, television, and theater. 
His portrayal of the neat-freak Felix Unger in the television show ""The Odd Couple"" from 1970 to 1975 earned him a Primetime Emmy Award for Outstanding Lead Actor in a Comedy Series in 1975. His career spanned over six decades, and he is remembered as a versatile and talented artist.\n\nIn conclusion, the 20th Annual TV Week Logie Awards show featured Tony Randall as a guest, who is best known for his role as Felix Unger in the television adaptation of Neil Simon's play ""The Odd Couple."" This event celebrated and recognized the achievements of the Australian television industry in 1978, and Randall's presence contributed to the prestige and international appeal of the awards show.","The Annual TV Week Logie Award show that featured a guest who is best known for his role as Felix Unger in the television adaptation of Neil Simon's play ""The Odd Couple"" is the 1971 Logie Awards.\n\nThe 1971 Logie Awards were held on April 11th, 1971, at the Melbourne Convention and Exhibition Centre in Melbourne, Australia. The event was hosted by comedian and actor Graham Kennedy, who was a popular television personality in Australia at the time.\n\nOne of the highlights of the evening was the appearance of Tony Randall, who played the role of Felix Unger in the popular television series ""The Odd Couple."" Randall was a special guest at the awards ceremony, and he was honored with the Logie Award for Best Overseas Actor for his portrayal of Felix Unger.\n\nRandall's appearance on the Logie Awards was a significant moment in Australian television history, as it marked one of the first times that an American actor had been recognized with a Logie Award. The show was also notable for its other winners, including the popular Australian soap opera ""The Sullivans,"" which won the Logie Award for Most Popular Television Program.\n\nOverall, the 1971 Logie Awards were a memorable event that brought together some of the biggest names in Australian and international television. 
Tony Randall's appearance as a special guest and his receipt of the Logie Award for Best Overseas Actor made the show a standout moment in the history of the Logie Awards."
11149,You are an AI assistant. Provide a detailed answer so user don’t need to search outside to understand the answer.,"Given the question: On a scale of 1-5 (with 1 being least favorable and 5 being most favorable), how would you rate this review? ""Lovely It's really great""\nThe answer is:","I would rate this review a 4 out of 5 in terms of favorability. The reviewer uses the positive adjective ""lovely"" and the phrase ""it's really great"", which convey a strong positive sentiment. The brevity of the review may be the only reason it's not a perfect 5 out of 5, as it lacks more specific details or elaboration on what specifically they liked about the subject being reviewed.","On a scale of 1-5, I would rate this review as a 5, which is the most favorable. The reviewer used positive words such as ""Lovely"" and ""really great"" to express their satisfaction with the product or service.","Sure, I'd be happy to help! Based on the review ""Lovely, it's really great,"" I would rate it as a 4 out of 5 in terms of favorability. Here's why:\n\n* ""Lovely"" is a positive adjective that suggests the reviewer has a positive opinion of the product or service.\n* ""It's really great"" is an enthusiastic statement that further reinforces the positive sentiment.\n\nOverall, the review expresses a very positive sentiment towards the product or service, with no negative comments or criticisms. The only thing that keeps it from being a 5 out of 5 is that it's a relatively short and simple review, without any specific details or explanations about why the reviewer thinks it's great. However, the overall tone and language used in the review suggest a very favorable opinion."
6082,You are an AI assistant that follows instruction extremely well. Help as much as you can.,"Question: Read the article and select the best answer. Article: Sheep always follow each other and never really think about where they are going. Perhaps fashion makes us into "" _ "". Maybe we should find our own way and not follow the crowd. Everyone wants to wear top brand clothing. However, we should remember that clothes designers study what ordinary people wear on the street. Then they make clothes like ordinary people's for everyone. Therefore, fashion really starts with individual finding a new look. Then a company uses that look and then it becomes fashionable! That doesn't mean you will. That is to say, a model might look perfect in a shirt but that doesn't mean you will. The same goes for mobile phones. Young people are always asking me when to change my mobile phones. Mobile phone companies are always putting new functions on their phones but do we really need all these extra functions? I use my phone to call people and to send messages to my friends. I don't need a camera, Internet surfing or a calendar on my phone. Again, if you buy the latest phone, it will only be the latest phone for a month or so. Then something new will come out and you'll be unfashionable again. You can't keep up with fashion all the time, so you don't need to follow the fashion. Question: What can we infer from the article? Options: A: Sheep which are out of date have been killed. B: The writer is a person who always follows the fashion. C: Fashion only exists among those fashion models. 
D: Fashion changes so quickly that we can hardly keep up with it.\nAnswer:",The correct answer is D: Fashion changes so quickly that we can hardly keep up with it.,D: Fashion changes so quickly that we can hardly keep up with it.,"Based on the article, the best answer is:\n\nD: Fashion changes so quickly that we can hardly keep up with it.\n\nThe article suggests that fashion is constantly changing and that it is difficult to keep up with the latest trends. The writer argues that instead of following the crowd and buying the latest fashion items, individuals should find their own unique style and not worry about being fashionable all the time. This inference can be drawn from the article's emphasis on the speed at which fashion changes and the idea that it is impossible to keep up with it."
4569,You are an AI assistant that helps people find information. Provide a detailed answer so user don’t need to search outside to understand the answer.,"Leo: If ""A small redheaded girl blows bubbles on a playground."" does that mean that ""A redheaded girl blows 10 bubbles.""?\nOptions:\n- yes\n- it is not possible to tell\n- no\nMei: OK, So, let's think first...\nMe:","Me: That's a great approach, Mei! Let's break it down. The sentence ""A small redheaded girl blows bubbles on a playground"" is describing a specific scenario, and we're trying to determine if it implies the number of bubbles blown.","It is not possible to tell. \n\nThe statement ""A small redheaded girl blows bubbles on a playground"" doesn't mention the specific number of bubbles being blown, thus we cannot assume that it means ""A redheaded girl blows 10 bubbles.""","Hello Leo! I'd be happy to help you with that question. When we read the sentence ""A small redheaded girl blows bubbles on a playground,"" we can't infer that she blows 10 bubbles. In fact, the sentence doesn't provide any information about the number of bubbles she blows.\n\nHere's why:\n\n* The word ""small"" doesn't give us any information about the number of bubbles. It only tells us that the girl is small in size.\n* The word ""redheaded"" doesn't give us any information about the number of bubbles either. It only tells us that the girl has red hair.\n* The phrase ""blows bubbles"" is a singular verb form, which suggests that the girl is blowing one bubble, not 10.\n\nSo, based on the information provided in the sentence, we can't conclude that the girl blows 10 bubbles. The correct answer is ""it is not possible to tell.""\n\nI hope that helps, Leo! Do you have any other questions?"


In [55]:
!pip install langchain-google-genai==2.0.9


In [82]:
from getpass import getpass

OPENAI_KEY = getpass('Enter OpenAI API Key: ')

Enter OpenAI API Key:  ········


In [83]:
import os

os.environ['OPENAI_API_KEY'] = OPENAI_KEY

In [84]:
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model_name="gpt-4o", temperature=0)

In [53]:
from langchain_openai import OpenAIEmbeddings

embedder = OpenAIEmbeddings(model="text-embedding-3-large")

In [44]:
from getpass import getpass

# OPENAI_KEY = getpass('Enter your OpenAI Key: ')
GROQ_API_KEY = getpass('Enter your Groq API Key: ')

Enter your Groq API Key:  ········


In [46]:
import os

# os.environ['OPENAI_API_KEY'] = OPENAI_KEY
os.environ['GROQ_API_KEY'] = GROQ_API_KEY

In [71]:
from langchain_groq import ChatGroq

# Alternate model if the quota is exhausted: llama3-70b-8192
llm = ChatGroq(model_name="llama-3.2-90b-vision-preview",
               temperature=0)

In [63]:
from langchain_huggingface import HuggingFaceEmbeddings

# check out model details here: https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1
model_name = "mixedbread-ai/mxbai-embed-large-v1"

embedder = HuggingFaceEmbeddings(
    model_name=model_name,
)

In [65]:
embedder

HuggingFaceEmbeddings(model_name='mixedbread-ai/mxbai-embed-large-v1', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

In [49]:
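# Copy the slice so the column edits below do not act on a view of subset_df
eval_df = subset_df[['ai_response', 'chosen']].copy()
eval_df.columns = ['answer', 'ground_truth']
# Prepend the system prompt to the user prompt to form the full question
eval_df.insert(0, 'question', subset_df['system'].astype(str) + '\n\n' + subset_df['prompt'])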

In [50]:
eval_df.head(2)

Unnamed: 0,question,answer,ground_truth
11966,You are an AI assistant. You will be given a task. You must generate a detailed and long answer.\n\nWrite the answer: The songs People Will Say We're in Love and The Surrey with the Fringe on Top are from which stage show?,"The songs ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top"" are both from the classic Rodgers and Hammerstein musical ""Oklahoma!"" The show premiered on March 31, 1943, at the St. James Theatre in New York City and was a groundbreaking production that revolutionized the art of musical theater.\n\n""Oklahoma!"" was the first musical to integrate music, dance, and drama seamlessly, blending elements of opera, ballet, and musical comedy. The show's innovative use of music, choreography, and storytelling helped to redefine the genre and paved the way for future musicals.\n\nThe musical is set in the early 20th century American frontier, specifically in the Indian Territory, which is now part of Oklahoma. The story revolves around the romance between two young lovers, Curly McLain and Laurey Williams, and their struggles with social class, family expectations, and personal aspirations.\n\nThe song ""People Will Say We're in Love"" is a duet sung by Curly and Laurey, in which they express their budding feelings for each other and the societal pressures that threaten to tear them apart. The song has become a classic standard, with its catchy melody and poignant lyrics capturing the hearts of audiences worldwide","The songs ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top"" are from the esteemed stage show ""Oklahoma!"". Oklahoma! is a groundbreaking American musical, with music by Richard Rodgers and lyrics by Oscar Hammerstein II. The stage show is an adaptation of the 1931 play ""Green Grow the Lilacs"" by Lynn Riggs. 
This modern musical premiered on Broadway on March 31, 1943, and it is considered one of the earliest examples of the now-familiar book musical format, where the songs and dances are woven seamlessly into a well-crafted, cohesive narrative.\n\nOklahoma! is a classic story set in the Oklahoma territory in the early 20th century. It follows the romantic pursuits of the spirited and independent Laurey Williams and the charming cowboy Curly McLain. Throughout their journey, the couple must navigate the ups and downs of love, as well as the challenges posed by Jud Fry, a brooding and dangerous farmhand who covets Laurey. The musical captures the essence of rural America at that time and paints a vivid portrait of life in the newly settled land.\n\n""People Will Say We're in Love"" is a beautiful duet between Laurey and Curly. In this song, the two characters coyly express their feelings for each other while attempting to maintain a playful sense of denial. They sing about wanting to avoid public speculation about their budding romance, but in reality, they are subtly revealing their true emotions to one another. It is a tender and heartwarming moment in the show that highlights the budding relationship between the lead characters.\n\n""The Surrey with the Fringe on Top"" is another captivating song from Oklahoma! Curly sings it to Laurey in an attempt to convince her to go to a social event with him. He engages in a charming and imaginative description of a lavish surrey, or horse-drawn carriage, that he claims to have. The song showcases Curly's charisma and creativity, as well as his ultimately endearing effort to win Laurey's affection. Despite the fact that the surrey exists only in his imagination, Laurey is won over by Curly's poetic descriptions and flirtatiousness.\n\nThe stage show Oklahoma! was a landmark production in the history of musical theater. 
It introduced innovative storytelling techniques, incorporating songs and dances as integral components of the narrative, and tackled complex themes and emotions. The songs ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top"" are just two examples of the delightful and memorable music found in this iconic production. The show's impact on the world of theater has been profound, and its popularity endures to this day, continuing to delight audiences in productions all over the world."
6907,"You are an AI assistant. You will be given a task. You must generate a detailed and long answer.\n\nQ:Read the article and select the best answer. Article: The next generation of smart phone could combine the date from its gyroscopes with a built-in compass to allow you to track your indoor movements even without GPS. Research described it in the international Journal of Innovative Computing and Applications by Shahid Ayub of Lancaster University, and his colleagues there and at HW Communications, suggests that the embedded inertial sensors in many smart phones have added the facility to be used for localization and tracking applications. The primary benefit of using smart phones is that no additional infrastructure would need to be installed for monitoring personnel movements in a wide variety of situations not least staff, equipment and inventory movements in warehouse facilities, shopping malls and factories. In contrast, other technology being proposed would require new equipment,such as RFID technologies. However, smart phones do not yet have the ability to pinpoint their location without recourse to the GPS system, which is not accessible indoors and has the added disadvantage of using up battery power very quickly even if it could be used. Inertial navigation only provides the necessary information to a specified starting point the team says. This is useful for anyone with a non-GPS smart phone who wishes to track the route they take when walking or jogging if they specify their starting point. The team suggests that pedestrians indoors could be positioned using a combination of the smart phone accelerometer and an built-in digital compass, something that will become available in future smart phones. 
They have now investigated the potential of three different smart phone placement modes: idle, hand held and listening, which could be used with pedestrian dead reckoning (PDR) techniques to allow users to record the path they follow or for various kinds of workplace or other monitoring requirements. ""The technique could be used in underground tube stations, airports , train stations where there is no infrastructure installed for tracking or... Question: We can learn from the passage that the recent studies of smart phones _ . Options: A: were conducted by Shahid Ayub alone B: were supported by GPS C: have no connection with everyday needs D: can be applied to underground tube stations\nA:","The correct answer is D: can be applied to underground tube stations.\n\nThe passage explains that researchers have described a technique in which the next generation of smart phones could combine their built-in gyroscope and compass to track indoor movements without GPS. The study, conducted by Shahid Ayub and his colleagues, suggests that smart phones can be used for localization and tracking applications, including monitoring personnel movements in various situations, such as warehouses, shopping malls, and factories. The researchers have also investigated the potential of three different smart phone placement modes (idle, handheld, and listening) and pedestrian dead reckoning (PDR) techniques to enable users to record their path or monitor various workplace requirements.",We can learn from the passage that the recent studies of smart phones can be applied to underground tube stations.


In [59]:
eval_dataset

Dataset({
    features: ['question', 'answer', 'ground_truth', '__index_level_0__'],
    num_rows: 20
})

## Evaluating LLM Performance

In [86]:
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness, answer_similarity

# Evaluate only the first 20 rows to keep the run short
eval_dataset = Dataset.from_pandas(eval_df.head(20))

# Evaluation can be slow with Groq models; it may need to be run a few rows at a time with a timeout
results = evaluate(dataset=eval_dataset,
                   metrics=[answer_correctness, answer_similarity],
                   llm=chatgpt, embeddings=embedder)

Evaluating:   0%|          | 0/40 [00:00<?, ?it/s]
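
The comment in the cell above suggests running the evaluation a few rows at a time when the judge model is slow or rate-limited. A minimal sketch of that idea in plain Python follows. The names `chunked`, `evaluate_in_batches`, `mean_scores`, and `score_batch` are hypothetical helpers, not part of ragas; `score_batch` stands in for whatever call scores one batch of rows (for example, a wrapper around `evaluate`), and the scorer used here is only a dummy.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, str]
Scores = Dict[str, float]


def chunked(rows: Sequence[Row], size: int) -> List[List[Row]]:
    """Split rows into consecutive batches of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(rows[i:i + size]) for i in range(0, len(rows), size)]


def evaluate_in_batches(
    rows: Sequence[Row],
    size: int,
    score_batch: Callable[[List[Row]], List[Scores]],
) -> List[Scores]:
    """Score each batch separately and return per-row scores in input order."""
    per_row: List[Scores] = []
    for batch in chunked(rows, size):
        per_row.extend(score_batch(batch))
    return per_row


def mean_scores(per_row: Sequence[Scores]) -> Scores:
    """Average every metric over all rows."""
    keys = per_row[0].keys()
    return {k: sum(r[k] for r in per_row) / len(per_row) for k in keys}


def dummy_scorer(batch: List[Row]) -> List[Scores]:
    """Hypothetical stand-in scorer: 1.0 on exact match, else 0.0."""
    return [{"exact_match": float(r["answer"] == r["ground_truth"])} for r in batch]


demo_rows = [
    {"answer": "yes", "ground_truth": "yes"},
    {"answer": "no", "ground_truth": "yes"},
    {"answer": "D", "ground_truth": "D"},
    {"answer": "4", "ground_truth": "5"},
    {"answer": "Oklahoma!", "ground_truth": "Oklahoma!"},
]
demo_scores = evaluate_in_batches(demo_rows, size=2, score_batch=dummy_scorer)
demo_mean = mean_scores(demo_scores)
```

Averaging the per-row scores after all batches finish, rather than averaging batch means, keeps the result correct when the last batch is smaller than the others.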

In [87]:
results

{'answer_correctness': 0.5801, 'semantic_similarity': 0.8534}
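
The `semantic_similarity` score is, to the best of our understanding of ragas, based on the cosine similarity between the embedding of the generated answer and the embedding of the ground truth. The sketch below shows only that similarity computation, assuming the embeddings are already available; the vectors are made-up toy values, not output from the embedder used in this notebook.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (illustrative values only)
answer_vec = [0.2, 0.7, 0.1]
truth_vec = [0.25, 0.6, 0.2]
toy_score = cosine_similarity(answer_vec, truth_vec)
```

Because the score depends only on direction, a long answer and a short reference can still score highly when they cover the same topic, which is one reason similarity tends to sit well above correctness in the results here.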

In [89]:
results.to_pandas().head(5)

Unnamed: 0,user_input,response,reference,answer_correctness,semantic_similarity
0,You are an AI assistant. You will be given a task. You must generate a detailed and long answer.\n\nWrite the answer: The songs People Will Say We're in Love and The Surrey with the Fringe on Top are from which stage show?,"The songs ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top"" are both from the classic Rodgers and Hammerstein musical ""Oklahoma!"" The show premiered on March 31, 1943, at the St. James Theatre in New York City and was a groundbreaking production that revolutionized the art of musical theater.\n\n""Oklahoma!"" was the first musical to integrate music, dance, and drama seamlessly, blending elements of opera, ballet, and musical comedy. The show's innovative use of music, choreography, and storytelling helped to redefine the genre and paved the way for future musicals.\n\nThe musical is set in the early 20th century American frontier, specifically in the Indian Territory, which is now part of Oklahoma. The story revolves around the romance between two young lovers, Curly McLain and Laurey Williams, and their struggles with social class, family expectations, and personal aspirations.\n\nThe song ""People Will Say We're in Love"" is a duet sung by Curly and Laurey, in which they express their budding feelings for each other and the societal pressures that threaten to tear them apart. The song has become a classic standard, with its catchy melody and poignant lyrics capturing the hearts of audiences worldwide","The songs ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top"" are from the esteemed stage show ""Oklahoma!"". Oklahoma! is a groundbreaking American musical, with music by Richard Rodgers and lyrics by Oscar Hammerstein II. The stage show is an adaptation of the 1931 play ""Green Grow the Lilacs"" by Lynn Riggs. 
This modern musical premiered on Broadway on March 31, 1943, and it is considered one of the earliest examples of the now-familiar book musical format, where the songs and dances are woven seamlessly into a well-crafted, cohesive narrative.\n\nOklahoma! is a classic story set in the Oklahoma territory in the early 20th century. It follows the romantic pursuits of the spirited and independent Laurey Williams and the charming cowboy Curly McLain. Throughout their journey, the couple must navigate the ups and downs of love, as well as the challenges posed by Jud Fry, a brooding and dangerous farmhand who covets Laurey. The musical captures the essence of rural America at that time and paints a vivid portrait of life in the newly settled land.\n\n""People Will Say We're in Love"" is a beautiful duet between Laurey and Curly. In this song, the two characters coyly express their feelings for each other while attempting to maintain a playful sense of denial. They sing about wanting to avoid public speculation about their budding romance, but in reality, they are subtly revealing their true emotions to one another. It is a tender and heartwarming moment in the show that highlights the budding relationship between the lead characters.\n\n""The Surrey with the Fringe on Top"" is another captivating song from Oklahoma! Curly sings it to Laurey in an attempt to convince her to go to a social event with him. He engages in a charming and imaginative description of a lavish surrey, or horse-drawn carriage, that he claims to have. The song showcases Curly's charisma and creativity, as well as his ultimately endearing effort to win Laurey's affection. Despite the fact that the surrey exists only in his imagination, Laurey is won over by Curly's poetic descriptions and flirtatiousness.\n\nThe stage show Oklahoma! was a landmark production in the history of musical theater. 
It introduced innovative storytelling techniques, incorporating songs and dances as integral components of the narrative, and tackled complex themes and emotions. The songs ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top"" are just two examples of the delightful and memorable music found in this iconic production. The show's impact on the world of theater has been profound, and its popularity endures to this day, continuing to delight audiences in productions all over the world.",0.601724,0.958621
1,"You are an AI assistant. You will be given a task. You must generate a detailed and long answer.\n\nQ:Read the article and select the best answer. Article: The next generation of smart phone could combine the date from its gyroscopes with a built-in compass to allow you to track your indoor movements even without GPS. Research described it in the international Journal of Innovative Computing and Applications by Shahid Ayub of Lancaster University, and his colleagues there and at HW Communications, suggests that the embedded inertial sensors in many smart phones have added the facility to be used for localization and tracking applications. The primary benefit of using smart phones is that no additional infrastructure would need to be installed for monitoring personnel movements in a wide variety of situations not least staff, equipment and inventory movements in warehouse facilities, shopping malls and factories. In contrast, other technology being proposed would require new equipment,such as RFID technologies. However, smart phones do not yet have the ability to pinpoint their location without recourse to the GPS system, which is not accessible indoors and has the added disadvantage of using up battery power very quickly even if it could be used. Inertial navigation only provides the necessary information to a specified starting point the team says. This is useful for anyone with a non-GPS smart phone who wishes to track the route they take when walking or jogging if they specify their starting point. The team suggests that pedestrians indoors could be positioned using a combination of the smart phone accelerometer and an built-in digital compass, something that will become available in future smart phones. 
They have now investigated the potential of three different smart phone placement modes: idle, hand held and listening, which could be used with pedestrian dead reckoning (PDR) techniques to allow users to record the path they follow or for various kinds of workplace or other monitoring requirements. ""The technique could be used in underground tube stations, airports , train stations where there is no infrastructure installed for tracking or... Question: We can learn from the passage that the recent studies of smart phones _ . Options: A: were conducted by Shahid Ayub alone B: were supported by GPS C: have no connection with everyday needs D: can be applied to underground tube stations\nA:","The correct answer is D: can be applied to underground tube stations.\n\nThe passage explains that researchers have described a technique in which the next generation of smart phones could combine their built-in gyroscope and compass to track indoor movements without GPS. The study, conducted by Shahid Ayub and his colleagues, suggests that smart phones can be used for localization and tracking applications, including monitoring personnel movements in various situations, such as warehouses, shopping malls, and factories. The researchers have also investigated the potential of three different smart phone placement modes (idle, handheld, and listening) and pedestrian dead reckoning (PDR) techniques to enable users to record their path or monitor various workplace requirements.",We can learn from the passage that the recent studies of smart phones can be applied to underground tube stations.,0.36731,0.802575
2,"You are an AI assistant that helps people find information.\n\nPlease answer the following question by reasoning step-by-step. Given the sentence ""A gathering of what appear to be tourists mill about on an urban plaza that includes a large fountain showcasing a tall sculpture with a nautical theme."" can we conclude that ""There is a group of people at the plaza.""?\nOptions:\n- yes\n- it is not possible to tell\n- no. Step-by-step reasoning:","Let's break it down step-by-step:\n\n1. The sentence describes a scene: ""A gathering of what appear to be tourists mill about...""\n\t* ""A gathering"" is a group of people.\n\t* ""what appear to be tourists"" indicates that the people are likely tourists, but we can't be certain.\n\t* ""mill about"" means they are moving around in a relaxed manner, often in a social setting.\n\nSo, at this point, we have established that there is a group of people.\n\n2. The sentence continues: ""...on an urban plaza...""\n\t* An urban plaza is a public area in an urban setting.\n\n3. The sentence then describes the plaza: ""...that includes a large fountain showcasing a tall sculpture with a nautical theme.""\n\t* This part of the sentence doesn't directly relate to the presence of people.\n\n4. Since we've already established that there is a group of people (from step 1), and the additional information provided doesn't contradict this, we can conclude that:\n\t* Yes, we can conclude that ""There is a group of people at the plaza.""\n\nSo, the correct answer is: yes.","- The sentence states, ""A gathering of what appear to be tourists mill about on an urban plaza.""\n- ""A gathering of what appear to be tourists"" implies that there is a group of people at the location.\n- The location mentioned is ""an urban plaza.""\n\nConclusion: Yes, we can conclude that ""There is a group of people at the plaza.""",0.59103,0.952353
3,You are an AI assistant. User will you give you a task. Your goal is to complete the task as faithfully as you can. While performing the task think step-by-step and justify your steps.\n\nTranslate to Russian:\n\nThe database enterprises Chekhov in the sub-heading column Clinics for adults are offering information Companies Clinics for adults Chekhov This information is here checked before be available to you on the pages Business Portal in Clinics for adults Chekhov.,"Here's the translation to Russian:\n\nБаза данных предприятий Чехова в столбце ""Клиники для взрослых"" предлагает информацию Компании Клиники для взрослых Чехов. Эта информация была проверена до ее доступности на страницах Бизнес-портала Клиники для взрослых Чехова.\n\nJustification:\n\n1. I translated ""The database"" to ""База данных"" as it's a direct translation of the phrase.\n2. ""enterprises"" was translated to ""предприятий"" as it's a plural noun.\n3. ""Chekhov"" remained the same as it's a proper noun.\n4. ""in the sub-heading column"" was translated to ""в столбце"" as ""столбец"" is a common Russian word for a column.\n5. ""Clinics for adults"" was translated to ""Клиники для взрослых"" as it's a literal translation.\n6. ""are offering"" was translated to ""предлагает"" as it's a verb phrase.\n7. ""information"" was translated to ""информацию"" as it's a noun.\n8. ""Companies"" was translated to ""Компании"" as","База данных предприятий Чехова в подрубрике ""Клиники для взрослых"" предоставляет информацию о компаниях Клиник для взрослых Чехова. 
Эта информация проверяется здесь, прежде чем становиться доступной вам на страницах Бизнес портала в разделе ""Клиники для взрослых Чехов"".\n\nStep 1: Identify the main phrases in the text.\n- The database enterprises Chekhov\n- sub-heading column Clinics for adults\n- offering information companies Clinics for adults Chekhov\n- This information is here checked before be available to you\n- on the pages Business Portal in Clinics for adults Chekhov.\n\nStep 2: Translate individual phrases to Russian.\n- База данных предприятий Чехова\n- подрубрике ""Клиники для взрослых""\n- предоставляет информацию о компаниях Клиник для взрослых Чехова\n- Эта информация проверяется здесь, прежде чем становиться доступной вам\n- на страницах Бизнес портала в разделе ""Клиники для взрослых Чехов"".\n\nStep 3: Arrange the translated phrases in the correct order.\n- База данных предприятий Чехова в подрубрике ""Клиники для взрослых"" предоставляет информацию о компаниях Клиник для взрослых Чехова. Эта информация проверяется здесь, прежде чем становиться доступной вам на страницах Бизнес портала в разделе ""Клиники для взрослых Чехов"".",0.726384,0.828615
4,"Explain how you used the definition to come up with the answer.\n\nIn this task, you're given four sentences of a story written in natural language in which one part is missing. Your job is to predict the position and missing part of the story and return in the following format: position, missing part. The missing part is a sentence that completes the story, and the position is the number of the missing sentence in the new story.\nQ: Sentence1: There is a cemetery with 199 grave stones. Sentence2: 12 men brought a casket. Sentence3: 12 men put the casket in the ground. Sentence4: They mourned for losses.\nA:","Position: 1, Missing part: A old man had been buried there for 199 years.","2, They dug a grave before placing the casket.",0.778018,0.657526


In [91]:
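# Combine the system message and the user prompt into a single 'question' column.
# Note: DataFrame.insert raises a ValueError if 'question' already exists, so run this cell only once.
subset_df.insert(2, 'question', subset_df['system'].astype(str) + '\n\n' + subset_df['prompt'])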
subset_df.head(1)

Unnamed: 0,system,prompt,question,ai_response,chosen,rejected
11966,You are an AI assistant. You will be given a task. You must generate a detailed and long answer.,Write the answer: The songs People Will Say We're in Love and The Surrey with the Fringe on Top are from which stage show?,You are an AI assistant. You will be given a task. You must generate a detailed and long answer.\n\nWrite the answer: The songs People Will Say We're in Love and The Surrey with the Fringe on Top are from which stage show?,"The songs ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top"" are both from the classic Rodgers and Hammerstein musical ""Oklahoma!"" The show premiered on March 31, 1943, at the St. James Theatre in New York City and was a groundbreaking production that revolutionized the art of musical theater.\n\n""Oklahoma!"" was the first musical to integrate music, dance, and drama seamlessly, blending elements of opera, ballet, and musical comedy. The show's innovative use of music, choreography, and storytelling helped to redefine the genre and paved the way for future musicals.\n\nThe musical is set in the early 20th century American frontier, specifically in the Indian Territory, which is now part of Oklahoma. The story revolves around the romance between two young lovers, Curly McLain and Laurey Williams, and their struggles with social class, family expectations, and personal aspirations.\n\nThe song ""People Will Say We're in Love"" is a duet sung by Curly and Laurey, in which they express their budding feelings for each other and the societal pressures that threaten to tear them apart. The song has become a classic standard, with its catchy melody and poignant lyrics capturing the hearts of audiences worldwide","The songs ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top"" are from the esteemed stage show ""Oklahoma!"". Oklahoma! is a groundbreaking American musical, with music by Richard Rodgers and lyrics by Oscar Hammerstein II. 
The stage show is an adaptation of the 1931 play ""Green Grow the Lilacs"" by Lynn Riggs. This modern musical premiered on Broadway on March 31, 1943, and it is considered one of the earliest examples of the now-familiar book musical format, where the songs and dances are woven seamlessly into a well-crafted, cohesive narrative.\n\nOklahoma! is a classic story set in the Oklahoma territory in the early 20th century. It follows the romantic pursuits of the spirited and independent Laurey Williams and the charming cowboy Curly McLain. Throughout their journey, the couple must navigate the ups and downs of love, as well as the challenges posed by Jud Fry, a brooding and dangerous farmhand who covets Laurey. The musical captures the essence of rural America at that time and paints a vivid portrait of life in the newly settled land.\n\n""People Will Say We're in Love"" is a beautiful duet between Laurey and Curly. In this song, the two characters coyly express their feelings for each other while attempting to maintain a playful sense of denial. They sing about wanting to avoid public speculation about their budding romance, but in reality, they are subtly revealing their true emotions to one another. It is a tender and heartwarming moment in the show that highlights the budding relationship between the lead characters.\n\n""The Surrey with the Fringe on Top"" is another captivating song from Oklahoma! Curly sings it to Laurey in an attempt to convince her to go to a social event with him. He engages in a charming and imaginative description of a lavish surrey, or horse-drawn carriage, that he claims to have. The song showcases Curly's charisma and creativity, as well as his ultimately endearing effort to win Laurey's affection. Despite the fact that the surrey exists only in his imagination, Laurey is won over by Curly's poetic descriptions and flirtatiousness.\n\nThe stage show Oklahoma! was a landmark production in the history of musical theater. 
It introduced innovative storytelling techniques, incorporating songs and dances as integral components of the narrative, and tackled complex themes and emotions. The songs ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top"" are just two examples of the delightful and memorable music found in this iconic production. The show's impact on the world of theater has been profound, and its popularity endures to this day, continuing to delight audiences in productions all over the world.","Ah, greetings! I'm here to assist you with your question. The songs ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top"" are from the beloved stage show ""Oliver!"" Based on the novel ""Oliver Twist"" by Charles Dickens, ""Oliver!"" is a classic musical that premiered in the West End in 1960 and on Broadway in 1963. The music and lyrics are by Lionel Bart, and the show was directed and choreographed by Carol Reed and Robb Roberge, respectively.\n\n""People Will Say We're in Love"" is a duet sung by the characters Oliver and Nancy, while ""The Surrey with the Fringe on Top"" is a solo number performed by Nancy. Both songs are memorable and catchy, showcasing the charming and witty lyrics that are a hallmark of Lionel Bart's work.\n\nThe story of ""Oliver!"" follows the titular character, a young orphan who is taken in by a group of pickpockets and thieves in Victorian London. As Oliver learns the ways of the streets and becomes embroiled in the criminal underworld, he must also navigate his own emotions and loyalties. The show features a talented cast of characters, including Fagin, Bill Sikes, and the Artful Dodger, all of whom are brought to life by the memorable songs and clever dialogue.\n\n""People Will Say We're in Love"" is a sweet and tender duet that captures the blossoming romance between Oliver and Nancy. 
The song features lyrics like ""People will say we're in love / Just because our hands touch / And our hearts beat as one"" and ""We'll be in love / Until the end of time."" The song is a beautiful expression of the innocence and optimism of young love, and it is a highlight of the show.\n\n""The Surrey with the Fringe on Top"" is a more upbeat and playful number, with Nancy singing about her dreams of a better life and her longing for Oliver. The song features lyrics like ""I'd like to ride in a surrey with the fringe on top / And never have a care in the world"" and ""I'd like to be a lady in a fine ball gown / And dance with the prince in the palace."" The song showcases Nancy's spunky and independent personality, and it is a standout moment in the show.\n\nOverall, ""Oliver!"" is a timeless and beloved stage show that features a wealth of memorable songs, including ""People Will Say We're in Love"" and ""The Surrey with the Fringe on Top."" The show has been revived numerous times on both sides of the Atlantic, and it continues to delight audiences with its charming characters, catchy songs, and heartwarming story."


In [92]:
import json

# Smoke test: grade only the first two rows before running the full loop.
responses = []
for idx, row in tqdm(subset_df.head(2).iterrows()):
    prompt = f"""
                Act as an expert grader of responses from an AI model. 
                Follow these rules.
                  - Given the following question and AI answer, compare it to reference answers A1 and A2
                  - Return the response as a JSON with keys A1 and A2 and values of how close is the AI answer to each of them
                  - Represent closeness in a scale between 0 - 100 which is comparative
                  - Comparative closeness means sum of AI answer closeness to A1 and A2 should sum up to 100
                  - Example A1: 90, A2: 10 means comparing AI answer to A1 and A2, the AI answer is 90% closer to A1 and only 10% closer to A2
                  - Closeness should be measured by comparing how close is AI answer to A1 and A2 in terms of their meaning and context.
                  
                Return the response as a valid JSON dict only and not markdown JSON. Do NOT return markdown.
                
                Question:
                {row['question']}
                
                AI Answer:
                {row['ai_response']}
                
                A1:
                {row['chosen']}
                
                A2:
                {row['rejected']}
            """
    response = chatgpt.invoke(prompt)
    # The grader is asked for raw JSON; json.loads raises JSONDecodeError on anything else.
    response = json.loads(response.content)
    responses.append(response)

print(responses)

2it [00:01,  1.01it/s]

[{'A1': 95, 'A2': 5}, {'A1': 85, 'A2': 15}]
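
The loop above passes the model reply straight to `json.loads`, so a reply wrapped in a markdown code fence would raise an error even though the prompt asks for raw JSON. Below is a minimal sketch of a more tolerant parser. `parse_grader_json` is a hypothetical helper that is not part of the original notebook, and it assumes the reply contains a single JSON object with at most one surrounding fence.

```python
import json
import re


def parse_grader_json(text: str) -> dict:
    """Parse a grader reply as JSON, tolerating an optional markdown code fence."""
    cleaned = text.strip()
    # Remove an opening fence (three backticks plus an optional language tag).
    cleaned = re.sub(r"^`{3}[a-zA-Z]*\s*", "", cleaned)
    # Remove a closing fence (three backticks at the very end).
    cleaned = re.sub(r"\s*`{3}$", "", cleaned)
    return json.loads(cleaned)
```

In the grading loop this could stand in for the direct `json.loads(response.content)` call. Replies that are still not valid JSON raise `json.JSONDecodeError` as before.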





In [95]:
# Grade every row in subset_df with the `chatgpt` client, now also asking for a Winner key.
responses = []
for idx, row in tqdm(subset_df.iterrows()):
    prompt = f"""
                Act as an expert grader of responses from an AI model. 
                Follow these rules.
                  - Given the following question and AI answer, compare it to reference answers A1 and A2
                  - Return the response as a JSON 
                  - Response should have keys A1, A2, Winner and values of how close is the AI answer to each of them and which answer is the winner
                  - Represent closeness in a scale between 0 - 100 which is comparative
                  - Comparative closeness means sum of AI answer closeness to A1 and A2 should sum up to 100
                  - Example A1: 90, A2: 10 means comparing AI answer to A1 and A2, the AI answer is 90% closer to A1 and only 10% closer to A2
                  - Closeness should be measured by comparing how close is AI answer to A1 and A2 in terms of their meaning and context.
                  
                Return the response as a valid JSON dict only and not markdown JSON. Do NOT return markdown.
                
                Question:
                {row['question']}
                
                AI Answer:
                {row['ai_response']}
                
                A1:
                {row['chosen']}
                
                A2:
                {row['rejected']}
            """
 
    response = chatgpt.invoke(prompt)
    response = response.content
    response = json.loads(response)
    responses.append(response)

20it [00:21,  1.08s/it]


In [96]:
response_df = pd.DataFrame(responses)
response_df

Unnamed: 0,A1,A2,Winner
0,95,5,A1
1,85,15,A1
2,60,40,A1
3,70,30,A1
4,20,80,A2
5,80,20,A1
6,80,20,A1
7,30,70,A2
8,50,50,Tie
9,40,60,A2


In [97]:
response_df['Winner'].value_counts()

Winner
A1     14
A2      3
Tie     3
Name: count, dtype: int64
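
In the grading prompt, A1 is the chosen response and A2 is the rejected one, so the share of A1 winners is a rough measure of how often the AI answer sits closer to the preferred response. The sketch below turns the counts above into proportions. `win_rates` is a hypothetical helper, and the label list is rebuilt from the printed counts rather than taken from `response_df`.

```python
from collections import Counter


def win_rates(winners):
    """Return the share of each label (e.g. 'A1', 'A2', 'Tie') in a list of winners."""
    counts = Counter(winners)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Labels rebuilt from the value counts shown above: 14 x A1, 3 x A2, 3 x Tie.
winners = ["A1"] * 14 + ["A2"] * 3 + ["Tie"] * 3
rates = win_rates(winners)
print(rates)  # {'A1': 0.7, 'A2': 0.15, 'Tie': 0.15}
```

On the real data the same function could be called with `response_df['Winner'].tolist()`.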

In [98]:
# Repeat the same grading, this time calling `llm` instead of `chatgpt`.
responses = []
for idx, row in tqdm(subset_df.iterrows()):
    prompt = f"""
                Act as an expert grader of responses from an AI model. 
                Follow these rules.
                  - Given the following question and AI answer, compare it to reference answers A1 and A2
                  - Return the response as a JSON 
                  - Response should have keys A1, A2, Winner and values of how close is the AI answer to each of them and which answer is the winner
                  - Represent closeness in a scale between 0 - 100 which is comparative
                  - Comparative closeness means sum of AI answer closeness to A1 and A2 should sum up to 100
                  - Example A1: 90, A2: 10 means comparing AI answer to A1 and A2, the AI answer is 90% closer to A1 and only 10% closer to A2
                  - Closeness should be measured by comparing how close is AI answer to A1 and A2 in terms of their meaning and context.
                  
                Return the response as a valid JSON dict only and not markdown JSON. Do NOT return markdown.
                
                Question:
                {row['question']}
                
                AI Answer:
                {row['ai_response']}
                
                A1:
                {row['chosen']}
                
                A2:
                {row['rejected']}
            """
 
    response = llm.invoke(prompt)
    response = response.content
    response = json.loads(response)
    responses.append(response)

20it [01:35,  4.76s/it]


In [99]:
response_df = pd.DataFrame(responses)
response_df

Unnamed: 0,A1,A2,Winner
0,95,5,A1
1,95,5,A1
2,95,80,A1
3,80,20,A1
4,80,20,A1
5,80,20,A1
6,80,20,A1
7,20,80,A2
8,100,0,A1
9,80,20,A1
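
The prompt asks for A1 and A2 to sum to 100, but row 2 above reports 95 and 80, so the grader does not always follow that rule. A small consistency check can flag such rows before the scores are used. This is a sketch only: `find_inconsistent_scores` is a hypothetical helper, and the sample holds just the first three rows of the table above.

```python
def find_inconsistent_scores(responses, expected_total=100):
    """Return (index, A1, A2) for every graded row whose scores do not add up to expected_total."""
    return [
        (i, r["A1"], r["A2"])
        for i, r in enumerate(responses)
        if r["A1"] + r["A2"] != expected_total
    ]


# First three rows of the table above, as dicts in the shape json.loads returns.
sample = [
    {"A1": 95, "A2": 5, "Winner": "A1"},
    {"A1": 95, "A2": 5, "Winner": "A1"},
    {"A1": 95, "A2": 80, "Winner": "A1"},
]
bad_rows = find_inconsistent_scores(sample)
print(bad_rows)  # [(2, 95, 80)]
```

Flagged rows could be re-graded or excluded. Which option is appropriate depends on how the scores are used downstream.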


In [100]:
response_df['Winner'].value_counts()

Winner
A1    18
A2     2
Name: count, dtype: int64
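
The two grading runs score the same `ai_response` column with different clients (`chatgpt` and `llm`), so it is worth checking how often they pick the same winner. The sketch below computes a simple agreement rate. `agreement_rate` is a hypothetical helper, and the two label lists cover only the ten rows displayed in the tables above, not all 20 graded rows.

```python
def agreement_rate(labels_a, labels_b):
    """Return the fraction of positions where two equally long label lists agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Winner column of the ten displayed rows from each run (first run, then second run).
first_run = ["A1", "A1", "A1", "A1", "A2", "A1", "A1", "A2", "Tie", "A2"]
second_run = ["A1", "A1", "A1", "A1", "A1", "A1", "A1", "A2", "A1", "A1"]
rate = agreement_rate(first_run, second_run)
print(rate)  # 0.7
```

The two graders agree on 7 of these 10 rows. That suggests the headline winner counts depend to some degree on which model does the grading.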

In [101]:
!rm -rf Llama3-8B-it-dpo

In [102]:
!rm -rf llama3-dpo-runs