### Imports

In [1]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Nov__3_17:22:02_PDT_2023
Cuda compilation tools, release 12.3, V12.3.103
Build cuda_12.3.r12.3/compiler.33492891_0


In [2]:
import torch
import pandas as pd
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline



In [3]:
print("CUDA available:", torch.cuda.is_available())
print("GPU name:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "None")

CUDA available: False
GPU name: None


### Load the model

### Load the data and check the head

In [4]:
# --- Load Dataset ---
dataset_id = "Magpie-Align/Magpie-Air-DPO-100K-v0.1"
dataset = load_dataset(dataset_id, split="train")

# Convert to Pandas DataFrame for easy viewing
df = pd.DataFrame(dataset)
print("🧾 Dataset Features:")
print(df.head())
print("\n📦 Available columns:", df.columns.tolist())

🧾 Dataset Features:
                                   uuid  \
0  6d978758-3406-5b33-883a-2ddcd25e3d7c   
1  cb47e985-3d7c-57f7-9c6f-00231e6784b3   
2  7b4cecf5-9b18-5fbb-be21-e7b7dc29d7ae   
3  6313871b-19ae-5199-8f6e-bf65cbe4d2e5   
4  767cd4cb-0e7a-57da-a46d-939b8e8c34a1   

                                         instruction  \
0  I am trying to determine if a given computer p...   
1  I have a question about generative models, spe...   
2  I have a cooled magnet and a topological insul...   
3  I'm looking for a way to create a RESTful API ...   
4  What are the significance and organizational s...   

                      gen_input_configs  \
0   {'temperature': 1.0, 'top_p': 0.99}   
1  {'temperature': 1.0, 'top_p': 0.995}   
2  {'temperature': 1.1, 'top_p': 0.995}   
3   {'temperature': 1.0, 'top_p': 0.99}   
4  {'temperature': 1.25, 'top_p': 0.99}   

                                              intent  \
0  The user wants to evaluate the performance of ...   
1  The user w

### Check one sample to see what we are talking about

In [9]:
df.loc[0, "instruction"]

'I am trying to determine if a given computer program is a good solution for a specific problem. How do I evaluate the performance of a computer program?'

In [10]:
df.loc[0, "gen_input_configs"]

{'temperature': 1.0, 'top_p': 0.99}

In [11]:
df.loc[0, "intent"]

'The user wants to evaluate the performance of a computer program'

In [12]:
df.loc[0, "knowledge"]

'To solve this problem, the models need to know about program evaluation, performance metrics, and possibly programming concepts'

In [13]:
df.loc[0, "difficulty"]

'medium'

In [14]:
df.loc[0, "input_quality"]

'good'

In [15]:
df.loc[0, "quality_explanation"]

"The query is clear and specific about the user's goal, which is to evaluate the performance of a computer program. The user provides a specific context, which is helpful. However, the query could be improved by providing more details about the specific problem they are trying to solve and what they mean by 'good solution'. Additionally, the query could benefit from more specificity about what aspects of the program's performance they want to evaluate."

In [16]:
df.loc[0, "task_category"]

'Information seeking'

In [17]:
df.loc[0, "input_length"]

np.int64(152)
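For this row, `input_length` (152) matches the number of characters in `instruction`. That suggests the column is a character count rather than a token count. This is our inference and is not documented here. A quick check on the first row:

```python
# First instruction, copied from the output of df.loc[0, "instruction"] above
instruction = (
    "I am trying to determine if a given computer program is a good solution "
    "for a specific problem. How do I evaluate the performance of a computer program?"
)

# Assumption: input_length is the character length of the instruction
print(len(instruction))  # 152
```

To check every row, you could run `(df["instruction"].str.len() == df["input_length"]).mean()`. A result of 1.0 would confirm this reading.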

In [18]:
df.loc[0, "responses"][0]  # each instruction has 5 responses; show the first

"Evaluating the performance of a computer program involves assessing its ability to meet the requirements and needs of the problem it is intended to solve. Here are some key aspects to consider when evaluating the performance of a computer program:\n\n1. **Accuracy**: Does the program produce the correct output for a given input? Does it maintain a high level of precision and accuracy?\n2. **Speed**: How fast does the program execute? Does it respond quickly enough to meet the user's expectations?\n3. **Efficiency**: Does the program use resources (e.g., memory, CPU, network bandwidth) efficiently? Does it minimize waste and optimize system resources?\n4. **Scalability**: Can the program handle large or complex inputs? Does it scale well with increasing data sizes or user demand?\n5. **Stability**: Does the program crash or exhibit unusual behavior under normal use? Is it robust and reliable?\n6. **Usability**: Is the program easy to use and understand? Are the user interface and user 

In [19]:
df.loc[0, "gen_response_configs"]

{'engine': 'vllm',
 'max_tokens': 4096,
 'output_generator': 'meta-llama/Meta-Llama-3-8B-Instruct',
 'prompt': '<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nI am trying to determine if a given computer program is a good solution for a specific problem. How do I evaluate the performance of a computer program?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n',
 'repetition_penalty': 1.0,
 'stop_tokens': ['<|eot_id|>',
  '<|end_of_text|>',
  '<|starter_header_id|>',
  '<|end_header_id|>'],
 'temperature': 0.8,
 'top_p': 1.0}
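The `prompt` field is the Llama 3 Instruct chat template applied to the single-turn instruction. The sketch below rebuilds it by hand. It assumes one user turn and no system message, and the helper name `llama3_prompt` is our own. Note that the `stop_tokens` list above contains `<|starter_header_id|>`, which is not a Llama 3 special token; the real header token is `<|start_header_id|>`.

```python
def llama3_prompt(user_message: str) -> str:
    """Hypothetical helper: format one user turn in the Llama 3 Instruct
    template, ending with the assistant header so generation starts the reply."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


instruction = (
    "I am trying to determine if a given computer program is a good solution "
    "for a specific problem. How do I evaluate the performance of a computer program?"
)
prompt = llama3_prompt(instruction)

# Copied from df.loc[0, "gen_response_configs"]["prompt"] above
expected = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "I am trying to determine if a given computer program is a good solution "
    "for a specific problem. How do I evaluate the performance of a computer program?"
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)
print(prompt == expected)  # True
```

With the Llama 3 tokenizer, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` should produce the same string without hand-writing the special tokens.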

In [20]:
df.loc[0, "rewards_armorm"]

[{'score': 0.1845703125},
 {'score': 0.185546875},
 {'score': 0.18359375},
 {'score': 0.189453125},
 {'score': 0.1826171875}]
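Each of the 5 responses has one ArmoRM score. We assume, without verifying it in this notebook, that the `chosen` / `rejected` pair was built from the highest-scoring and lowest-scoring responses. Here is a sketch using the scores printed above:

```python
# ArmoRM scores for the 5 responses of row 0, copied from df.loc[0, "rewards_armorm"]
rewards_armorm = [
    {"score": 0.1845703125},
    {"score": 0.185546875},
    {"score": 0.18359375},
    {"score": 0.189453125},
    {"score": 0.1826171875},
]

scores = [r["score"] for r in rewards_armorm]
# Assumption: chosen = best-scoring response, rejected = worst-scoring response
chosen_idx = max(range(len(scores)), key=scores.__getitem__)
rejected_idx = min(range(len(scores)), key=scores.__getitem__)
margin = scores[chosen_idx] - scores[rejected_idx]

print(chosen_idx, rejected_idx, round(margin, 4))  # 3 4 0.0068
```

If the assumption holds, `df.loc[0, "responses"][chosen_idx]` should equal the assistant message in `df.loc[0, "chosen"]`.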

In [21]:
df.loc[0, "chosen"]

[{'content': 'I am trying to determine if a given computer program is a good solution for a specific problem. How do I evaluate the performance of a computer program?',
  'role': 'user'},
 {'content': "Evaluating the performance of a computer program is a crucial step in determining whether it effectively solves a specific problem. Here are some key metrics and methods to help you assess the performance of a computer program:\n\n1. **Accuracy**: How well does the program achieve the intended result? Measure the accuracy of the program by comparing its output with the expected output or a known reference solution.\n2. **Efficiency**: How quickly does the program execute? Measure the program's execution time, memory usage, and CPU usage to evaluate its efficiency.\n3. **Scalability**: How well does the program handle large inputs or increasing amounts of data? Test the program with larger inputs to ensure it can scale horizontally or vertically.\n4. **Reliability**: How frequently does t

In [22]:
df.loc[0, "rejected"][1]

 'role': 'assistant'}
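`chosen` and `rejected` are full conversations: a user turn followed by an assistant turn. Many DPO pipelines expect flat `prompt` / `chosen` / `rejected` strings instead. The sketch below does that conversion. It assumes each conversation has exactly one user turn and one assistant turn. The helper `to_dpo_triple` is hypothetical, and the example row uses placeholder answers rather than real dataset content.

```python
def to_dpo_triple(chosen: list, rejected: list) -> dict:
    """Hypothetical helper: flatten a (user, assistant) chosen/rejected
    conversation pair into prompt/chosen/rejected strings."""
    for conversation in (chosen, rejected):
        if [m["role"] for m in conversation] != ["user", "assistant"]:
            raise ValueError("expected exactly one user turn and one assistant turn")
    if chosen[0]["content"] != rejected[0]["content"]:
        raise ValueError("chosen and rejected must share the same prompt")
    return {
        "prompt": chosen[0]["content"],
        "chosen": chosen[1]["content"],
        "rejected": rejected[1]["content"],
    }


# Placeholder row with the same shape as df.loc[0, "chosen"] / df.loc[0, "rejected"]
user_turn = {"role": "user", "content": "How do I evaluate the performance of a computer program?"}
row_chosen = [user_turn, {"role": "assistant", "content": "<chosen answer>"}]
row_rejected = [user_turn, {"role": "assistant", "content": "<rejected answer>"}]

triple = to_dpo_triple(row_chosen, row_rejected)
print(triple["chosen"], "|", triple["rejected"])  # <chosen answer> | <rejected answer>
```

To apply it to the whole dataset: `pd.DataFrame([to_dpo_triple(c, r) for c, r in zip(df["chosen"], df["rejected"])])`.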

In [11]:
multi_obj_rewards

tensor([[0.6562, 0.6602, 0.8008, 0.3887, 0.3730, 0.5859, 0.5742, 0.7031, 0.6445,
         0.4551, 0.9180, 0.5391, 0.5352, 0.5469, 0.4082, 0.3867, 0.3828, 0.4258,
         0.4434]])

In [21]:
gating_output

tensor([[6.8054e-03, 4.8021e-09, 5.1688e-08, 1.2016e-04, 6.9141e-01, 9.2387e-06,
         7.7637e-02, 2.0082e-09, 3.0640e-02, 2.2701e-09, 8.6054e-07, 9.1642e-07,
         9.0599e-05, 5.4836e-05, 1.9434e-01, 7.0781e-08, 7.2122e-06, 3.9291e-04,
         6.9849e-09]])

In [22]:
multi_obj_coeffs

tensor([[ 6.8054e-03,  4.8021e-09,  5.1688e-08,  1.2016e-04, -1.0699e-01,
          9.2387e-06,  7.7637e-02,  2.0082e-09,  3.0640e-02,  2.2701e-09,
          8.6054e-07,  9.1642e-07,  9.0599e-05,  5.4836e-05,  1.9434e-01,
          7.0781e-08,  7.2122e-06,  3.9291e-04,  6.9849e-09]])

In [23]:
torch.dot(multi_obj_coeffs.view(-1), multi_obj_rewards.view(-1))

tensor(0.1085)

In [25]:
preference_score

tensor([0.1084])
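These cells show the quantities behind the ArmoRM score: 19 per-objective rewards, the gating output, and the final coefficients after the reward transform. The preference score is the dot product of the coefficients and the rewards. The most influential objectives are the ones with the largest |coefficient|. The plain-Python sketch below reproduces both results from the printed values. Those values are rounded, so the results only match up to rounding.

```python
# Values copied from the printed tensors above (rounded, so expect small drift)
rewards = [0.6562, 0.6602, 0.8008, 0.3887, 0.3730, 0.5859, 0.5742, 0.7031, 0.6445,
           0.4551, 0.9180, 0.5391, 0.5352, 0.5469, 0.4082, 0.3867, 0.3828, 0.4258,
           0.4434]
coeffs = [6.8054e-03, 4.8021e-09, 5.1688e-08, 1.2016e-04, -1.0699e-01,
          9.2387e-06, 7.7637e-02, 2.0082e-09, 3.0640e-02, 2.2701e-09,
          8.6054e-07, 9.1642e-07, 9.0599e-05, 5.4836e-05, 1.9434e-01,
          7.0781e-08, 7.2122e-06, 3.9291e-04, 6.9849e-09]
attributes = ['helpsteer-helpfulness', 'helpsteer-correctness', 'helpsteer-coherence',
              'helpsteer-complexity', 'helpsteer-verbosity', 'ultrafeedback-overall_score',
              'ultrafeedback-instruction_following', 'ultrafeedback-truthfulness',
              'ultrafeedback-honesty', 'ultrafeedback-helpfulness', 'beavertails-is_safe',
              'prometheus-score', 'argilla-overall_quality', 'argilla-judge_lm',
              'code-complexity', 'code-style', 'code-explanation',
              'code-instruction-following', 'code-readability']

# Preference score = sum_i coeff_i * reward_i
score = sum(c * r for c, r in zip(coeffs, rewards))
print(round(score, 4))  # 0.1085

# Top-K objectives ranked by |coefficient|
K = 3
top = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:K]
for i in top:
    print(f"{attributes[i]}: {coeffs[i]:.5f}")
# code-complexity: 0.19434
# helpsteer-verbosity: -0.10699
# ultrafeedback-instruction_following: 0.07764
```

Note that `helpsteer-verbosity` has a negative coefficient, so a higher verbosity reward lowers the final score for this prompt.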

In [5]:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
device = "cpu"
path = "RLHFlow/ArmoRM-Llama3-8B-v0.1"
model = AutoModelForSequenceClassification.from_pretrained(
    path, device_map=device, trust_remote_code=True, torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(path, use_fast=True)
# We load a random sample from the validation set of the HelpSteer dataset
prompt = 'What are some synonyms for the word "beautiful"?'
response = "Nicely, Beautifully, Handsome, Stunning, Wonderful, Gorgeous, Pretty, Stunning, Elegant"
messages = [{"role": "user", "content": prompt},
            {"role": "assistant", "content": response}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)
with torch.no_grad():
    output = model(input_ids)
    # Multi-objective rewards for the response
    multi_obj_rewards = output.rewards.cpu().float()
    # The gating layer's output is conditioned on the prompt
    gating_output = output.gating_output.cpu().float()
    # The preference score for the response, aggregated from the
    # multi-objective rewards with the gating layer
    preference_score = output.score.cpu().float()
# We apply a transformation matrix to the multi-objective rewards
# before multiplying with the gating layer's output. This mainly aims
# at reducing the verbosity bias of the original reward objectives
obj_transform = model.reward_transform_matrix.data.cpu().float()
# The final coefficients assigned to each reward objective
# NOTE: this appears to differ from the formulation described in the ArmoRM paper.
multi_obj_coeffs = gating_output @ obj_transform.T
# The preference score is the linear combination of the multi-objective rewards with
# the multi-objective coefficients, which can be verified by the following assertion
assert torch.isclose(torch.sum(multi_obj_rewards * multi_obj_coeffs, dim=1), preference_score, atol=1e-3) 
# Find the top-K reward objectives with coefficients of the highest magnitude
K = 3
top_obj_dims = torch.argsort(torch.abs(multi_obj_coeffs), dim=1, descending=True)[:, :K]
top_obj_coeffs = torch.gather(multi_obj_coeffs, dim=1, index=top_obj_dims)

# The attributes of the 19 reward objectives
attributes = ['helpsteer-helpfulness','helpsteer-correctness','helpsteer-coherence',
   'helpsteer-complexity','helpsteer-verbosity','ultrafeedback-overall_score',
   'ultrafeedback-instruction_following', 'ultrafeedback-truthfulness',
   'ultrafeedback-honesty','ultrafeedback-helpfulness','beavertails-is_safe',
   'prometheus-score','argilla-overall_quality','argilla-judge_lm','code-complexity',
   'code-style','code-explanation','code-instruction-following','code-readability']

example_index = 0
for i in range(K):
    attribute = attributes[top_obj_dims[example_index, i].item()]
    coeff = top_obj_coeffs[example_index, i].item()
    print(f"{attribute}: {round(coeff, 5)}")
# Reference values from the ArmoRM model card (this CPU run prints slightly different ones):
# code-complexity: 0.19922
# helpsteer-verbosity: -0.10864
# ultrafeedback-instruction_following: 0.07861

# The actual rewards of this example from the HelpSteer dataset
# are [3,3,4,2,2] for the five helpsteer objectives: 
# helpfulness, correctness, coherence, complexity, verbosity
# We can linearly transform our predicted rewards to the 
# original reward space to compare with the ground truth
helpsteer_rewards_pred = multi_obj_rewards[0, :5] * 5 - 0.5
print(helpsteer_rewards_pred)
# Model card reference: [2.78125   2.859375  3.484375  1.3847656 1.296875 ]




code-complexity: 0.19434
helpsteer-verbosity: -0.10699
ultrafeedback-instruction_following: 0.07764
tensor([2.7812, 2.8008, 3.5039, 1.4434, 1.3652])
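The last step maps the first five rewards (the HelpSteer objectives) back to HelpSteer's 0 to 4 rating scale using `r * 5 - 0.5`, as in the cell above. The sketch below applies the same mapping to the printed rewards. It then measures how far the predictions are from the ground-truth ratings `[3, 3, 4, 2, 2]` quoted in the cell. Mean absolute error is our choice of metric; the model card does not report it.

```python
# First five multi-objective rewards (HelpSteer objectives), copied from the printed tensor
helpsteer_names = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]
rewards = [0.6562, 0.6602, 0.8008, 0.3887, 0.3730]
ground_truth = [3, 3, 4, 2, 2]  # ratings quoted in the cell above

# Same linear map as the cell: reward * 5 - 0.5
preds = [r * 5 - 0.5 for r in rewards]
errors = [abs(p - t) for p, t in zip(preds, ground_truth)]
mae = sum(errors) / len(errors)

for name, p, t in zip(helpsteer_names, preds, ground_truth):
    print(f"{name:12s} pred={p:.2f} true={t}")
print(f"MAE: {mae:.3f}")
```

All five predictions fall below the ground-truth ratings. The largest gaps are on verbosity and complexity.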


### Let's try the model on the Magpie dataset