# Llama 3.2 1B Evaluation
Ryan Roi Cayas\
2022-22085

In this notebook, we replicate the published scores of Llama 3.2 1B Instruct on the MGSM (Multilingual Grade School Math) dataset.

## 1. Prerequisites


### 1.1 Load libraries and set up CUDA

In [28]:
# Import the necessary libraries
import os
import json
from tqdm import tqdm

import torch
from transformers import LlamaForCausalLM, PreTrainedTokenizerFast
from datasets import load_dataset, concatenate_datasets # Huggingface datasets (https://huggingface.co/docs/datasets/)
from huggingface_hub import login

In [5]:
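# Set up the CUDA device (must run before the first CUDA call for the settings to take effect)
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# Expose only physical GPU 4 to this process
os.environ["CUDA_VISIBLE_DEVICES"] = "4"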

# Use GPU for inference
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Print the device being used
print(f"Using device: {device}")

# Check the GPU name
if device.type == 'cuda':
    gpu_name = torch.cuda.get_device_name(0)  # 0 because CUDA_VISIBLE_DEVICES=4 means GPU 4 is now 0
    print("Using GPU:", gpu_name)

Using device: cuda
Using GPU: NVIDIA A100-SXM4-40GB
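
The index remapping mentioned in the code comment above can be illustrated without a GPU. The following is a minimal sketch: `logical_to_physical` is a hypothetical helper (not part of PyTorch or CUDA) that parses a `CUDA_VISIBLE_DEVICES`-style string and returns the mapping from the device index the process sees to the physical GPU index. It assumes plain integer indices rather than GPU UUIDs.

```python
def logical_to_physical(visible_devices: str) -> dict:
    """Map logical device index -> physical GPU index.

    Hypothetical helper for illustration only: assumes a comma-separated
    list of integer indices such as "4" or "4,6" (no UUIDs).
    """
    physical_ids = [int(tok) for tok in visible_devices.split(",") if tok.strip()]
    return {logical: physical for logical, physical in enumerate(physical_ids)}


# With the setting used in this notebook, physical GPU 4 becomes device 0.
print(logical_to_physical("4"))    # {0: 4}
print(logical_to_physical("4,6"))  # {0: 4, 1: 6}
```

Setting `CUDA_DEVICE_ORDER` to `PCI_BUS_ID` makes the physical indices follow PCI bus order, so that index 4 refers to the same card that `nvidia-smi` lists as GPU 4.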


### 1.2 Load the pre-trained tokenizer and model

In [7]:
# Paths to model and tokenizer
model_dir = "../../../../../llm/llama/Llama-3.2-1B-Instruct"

# Load tokenizer and model
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_dir)
model = LlamaForCausalLM.from_pretrained(model_dir)

# Set the eos_token as the padding token
tokenizer.pad_token = tokenizer.eos_token

# Move the model to the GPU
model.to(device)

# Set the model to evaluation mode
model.eval()

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 2048)
    (layers): ModuleList(
      (0-15): 16 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=512, bias=False)
          (v_proj): Linear(in_features=2048, out_features=512, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=2048, out_features=8192, bias=False)
          (up_proj): Linear(in_features=2048, out_features=8192, bias=False)
          (down_proj): Linear(in_features=8192, out_features=2048, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((2048,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((2048,), eps=1e-05)
      )
    )
    (norm):
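
The layer shapes printed above are enough to estimate the model size by hand. The sketch below counts only the parameters visible in the truncated printout, namely the token embedding and the 16 decoder layers. Anything cut off after `(norm):` is left out, so the total should be read as a lower bound and not as an official figure.

```python
# Shapes copied from the printed module tree above.
vocab_size, hidden = 128256, 2048
kv_dim, mlp_dim, n_layers = 512, 8192, 16

embed = vocab_size * hidden                        # embed_tokens
attn = 2 * hidden * hidden + 2 * hidden * kv_dim   # q_proj and o_proj, plus k_proj and v_proj (no bias)
mlp = 3 * hidden * mlp_dim                         # gate_proj, up_proj, down_proj (no bias)
norms = 2 * hidden                                 # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

visible_total = embed + n_layers * per_layer
print(f"per layer: {per_layer:,}")                        # per layer: 60,821,504
print(f"embedding + {n_layers} layers: {visible_total:,}")  # embedding + 16 layers: 1,235,812,352
```

The `k_proj` and `v_proj` outputs are 512 wide while `q_proj` is 2048 wide, which is consistent with grouped-query attention, where several query heads share each key/value head.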

### 1.3 Load the MGSM Dataset

In [21]:
# Languages: English, Spanish, French, German, Russian, Chinese, Japanese, Thai, Swahili, Bengali, Telugu
languages = ["en","es","fr","de","ru","zh","ja","th","sw","bn","te"]

# Empty list to store datasets with added 'language' feature
train_datasets = []
test_datasets = []

# Load the datasets and add the 'language' feature
for lang in tqdm(languages, desc="Loading datasets"):
    # Load train and test datasets for the language
    train = load_dataset("juletxara/mgsm", lang, split="train")
    test = load_dataset("juletxara/mgsm", lang, split="test")
    
    # Add the 'language' feature to both train and test sets
    train = train.add_column("language", [lang] * len(train))
    test = test.add_column("language", [lang] * len(test))
    
    # Append the datasets with the 'language' feature to the lists
    train_datasets.append(train)
    test_datasets.append(test)

# Concatenate datasets from all languages into a single train and test dataset
mgsm_train = concatenate_datasets(train_datasets)
mgsm_test = concatenate_datasets(test_datasets)

# Verify the structure
print(mgsm_train)
print(mgsm_test)


Loading datasets: 100%|██████████| 11/11 [00:32<00:00,  2.96s/it]

Dataset({
    features: ['question', 'answer', 'answer_number', 'equation_solution', 'language'],
    num_rows: 88
})
Dataset({
    features: ['question', 'answer', 'answer_number', 'equation_solution', 'language'],
    num_rows: 2750
})
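
As a quick arithmetic check on the printed sizes, the sketch below derives per-language counts from the `num_rows` values above. It assumes every language contributes the same number of rows, which the printed totals alone do not prove.

```python
languages = ["en", "es", "fr", "de", "ru", "zh", "ja", "th", "sw", "bn", "te"]
train_rows, test_rows = 88, 2750  # num_rows values printed above

# divmod returns (quotient, remainder); a zero remainder means an even split is possible
train_per_lang, train_rem = divmod(train_rows, len(languages))
test_per_lang, test_rem = divmod(test_rows, len(languages))

print(train_per_lang, train_rem)  # 8 0
print(test_per_lang, test_rem)    # 250 0
```

To confirm the split on the loaded data itself, one option is to tally the added column, for example with `collections.Counter(mgsm_test["language"])`.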





In [26]:
mgsm_train.to_pandas().head()

| | question | answer | answer_number | equation_solution | language |
|---|---|---|---|---|---|
| 0 | Question: Roger has 5 tennis balls. He buys 2 ... | Step-by-Step Answer: Roger started with 5 ball... | 11 | 5 + 6 = 11. | en |
| 1 | Question: There were nine computers in the ser... | Step-by-Step Answer: There are 4 days from mon... | 29 | 4 * 5 = 20. 9 + 20 = 29. | en |
| 2 | Question: Leah had 32 chocolates and her siste... | Step-by-Step Answer: Leah had 32 chocolates an... | 39 | 32 + 42 = 74. 74 - 35 = 39. | en |
| 3 | Question: Shawn has five toys. For Christmas, ... | Step-by-Step Answer: He has 5 toys. He got 2 f... | 9 | 5 + 2 = 7. 7 + 2 = 9. | en |
| 4 | Question: Michael had 58 golf balls. On tuesda... | Step-by-Step Answer: Michael started with 58 g... | 33 | 58 - 23 = 35. 35 - 2 = 33. | en |


In [23]:
mgsm_test.to_pandas().head()

| | question | answer | answer_number | equation_solution | language |
|---|---|---|---|---|---|
| 0 | Janet’s ducks lay 16 eggs per day. She eats th... | | 18 | | en |
| 1 | A robe takes 2 bolts of blue fiber and half th... | | 3 | | en |
| 2 | Josh decides to try flipping a house. He buys... | | 70000 | | en |
| 3 | James decides to run 3 sprints 3 times a week.... | | 540 | | en |
| 4 | Every day, Wendi feeds each of her chickens th... | | 20 | | en |


In [27]:
sample_idx = 0

print("Sample Problem:")
print("Question:", mgsm_train[sample_idx]['question'])
print("Answer:", mgsm_train[sample_idx]['answer'])
print("Answer Number:", mgsm_train[sample_idx]['answer_number'])
print("Equation_Solution:", mgsm_train[sample_idx]['equation_solution'])

Sample Problem:
Question: Question: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
Answer: Step-by-Step Answer: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
Answer Number: 11
Equation_Solution: 5 + 6 = 11.
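
The exemplar answer ends with the sentence "The answer is 11.", which suggests one way to pull a final number out of a step-by-step response. The sketch below is an illustration only: `extract_final_number` is a hypothetical helper, and its regular expression is an assumption that covers English phrasing and integer answers. It is not the parser Meta used, and the other MGSM languages phrase the final sentence differently.

```python
import re
from typing import Optional


def extract_final_number(text: str) -> Optional[int]:
    """Return the integer after the last 'The answer is', or None if absent.

    Hypothetical helper: assumes English phrasing and integer answers,
    optionally written with thousands separators such as 70,000.
    """
    matches = re.findall(r"The answer is\s*(-?\d[\d,]*)", text)
    if not matches:
        return None
    return int(matches[-1].replace(",", ""))


answer = (
    "Step-by-Step Answer: Roger started with 5 balls. 2 cans of 3 tennis "
    "balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11."
)
print(extract_final_number(answer))  # 11
```

Comparing the extracted value with the `answer_number` field (11 for this exemplar) gives a simple exact-match check.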


### 1.4 Load evaluation data from Meta

Meta has publicly released its evaluation data here: https://huggingface.co/datasets/meta-llama/Llama-3.2-1B-Instruct-evals. The cell below authenticates with a Hugging Face access token before loading it.

We mainly use this to identify the additional instructions and prompts used by Meta to evaluate the model.

In [None]:
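# Load the Hugging Face access token from config.json (kept out of the notebook)
with open("config.json") as f:
    config = json.load(f)

hf_token = config["hf_token"]

# Authenticate with the Hugging Face Hub, then load the MGSM details subset of Meta's evals
login(hf_token)
eval_dataset_FROM_META = load_dataset("meta-llama/Llama-3.2-1B-Instruct-evals", "Llama-3.2-1B-Instruct-evals__mgsm__details")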

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /data/students/ryan/.cache/huggingface/token
Login successful
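
Reading the token from `config.json` keeps it out of the notebook, but the cell above fails with a bare `FileNotFoundError` or `KeyError` if the file is missing or incomplete. A minimal sketch of a more defensive loader follows. `load_hf_token` is a hypothetical helper, and the fallback to an `HF_TOKEN` environment variable is an assumption about how the environment might be configured, not something the original cell does.

```python
import json
import os
from typing import Optional


def load_hf_token(config_path: str = "config.json") -> Optional[str]:
    """Return the token from config_path, else from $HF_TOKEN, else None.

    Hypothetical helper: assumes the config file holds a JSON object
    with an "hf_token" key, as in the cell above.
    """
    try:
        with open(config_path) as f:
            token = json.load(f).get("hf_token")
        if token:
            return token
    except (FileNotFoundError, json.JSONDecodeError):
        pass  # fall through to the environment variable
    return os.environ.get("HF_TOKEN")
```

A caller would then check the result before logging in, for example calling `login(token)` only when `load_hf_token()` returns a non-empty value.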
