# Benchmark lm-evaluation-harness over the model

Authors: Giacomo Zuccarino, Jhon Sebastián Moreno Triana.

Programmed as part of the assignment of the course P2.11_advanced_DL_24_25.

Professors: Alberto Cazzaniga, Cristiano de Nobili

Program: Master in High Performance Computing.

Institution: SISSA/ICTP, Trieste.

---

In this notebook we use lm-evaluation-harness to run some benchmarks on the model after Continued Pre-Training (CPT) and fine-tuning with LoRA adapters.

> 📝 <font color="DodgerBlue"><b>NOTE</b></font>
>
> <font color="DodgerBlue">To check the CPT and fine-tuning notebook, open the [GitHub notebook](./Gemma-3-4B-CPT-and-Fine-tuning.ipynb) from the GitHub repository, or the [Colab notebook](https://colab.research.google.com/drive/1Fn80nVlwy1vNqx8sMZ6ev4Um6KMNEkGf?usp=sharing) on the Google Colab platform.</font>

## 0. Initial setup and must run cells

The following cells install all the packages needed to run the model's benchmark and set up the variables for running the notebook with any unsloth Gemma 3 model with LoRA adapters.

> ⚠️ <font color="GoldenRod"><b>CAUTION</b> </font>
>
> <font color="GoldenRod">Please read the instructions for the unsloth model setup. This is very important if you want to run the setup with a non-default configuration. The instructions can be found in the [Readme file](github.com) of the GitHub repository and/or in the [following subsection](#important-model-setup).</font>

In [None]:
# Installing unsloth without any dependencies (those are overwritten by lm-eval)
!pip install --no-deps unsloth

# Installing unsloth_zoo and some dependencies for using, saving, pushing and loading the models
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft "trl==0.15.2" triton cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf datasets huggingface_hub hf_transfer hf_xet

# Installing lm-eval version 0.4.8 and some prerequisites
# This way lm-eval installs the unsloth dependencies that were skipped above
!pip install --no-deps sacrebleu portalocker colorama evaluate sqlitedict
!pip install lm-eval==0.4.8

Collecting unsloth
  Downloading unsloth-2025.4.7-py3-none-any.whl.metadata (46 kB)
Downloading unsloth-2025.4.7-py3-none-any.whl (218 kB)
Installing collected packages: unsloth
Successfully installed unsloth-2025.4.7
Collecting bitsandbytes
  Downloading bitsandbytes-0.45.5-py3-none-manylinux_2_24_x86_64.whl.metadata (5.0 kB)
Collecting xformers==0.0.29.post3
  Downloading xformers-0.0.29.post3-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
Collecting trl==0.15.2
  Downloading trl-0.15.2-py3-none-any.whl.metadata (11 kB)
Collecting cut_cross_entropy
  Downloading cut_cross_entropy-25.1.1-py3-none-any.whl.metadata (9.3 kB)
Collecting unsloth_zoo
  Downloading unsloth_zoo-2025.4.4-py3-none-any.whl.metadata (8.0 kB)
Downloading xfo
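
Because several packages are installed with `--no-deps`, it can help to confirm that the pinned versions are the ones actually present in the environment. The following is a minimal sketch, not part of the original notebook: the helper name `check_pins` is hypothetical, and the pins are copied from the install cell above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install cell; extend the dict as needed.
EXPECTED_PINS = {
    "lm_eval": "0.4.8",
    "trl": "0.15.2",
    "xformers": "0.0.29.post3",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling `check_pins(EXPECTED_PINS)` after the install cell returns an empty dict when every pin matches. Note that some wheels report a local version suffix, in which case the exact string comparison would need to be relaxed.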

<a name="important-model-setup"></a>
### IMPORTANT model setup

> 📝 <font color="DodgerBlue"><b>NOTE</b> </font>
>
> <font color="DodgerBlue">This part is needed only if you are using a different LoRA adapter and/or a different base model. In that case, the variable `run_the_full_setup` in the following cells has to be set to `True`.</font>

Running this code can be very challenging because of the incompatibility between unsloth+LoRA+gemma3 and lm-evaluation-harness. We need to merge the base model with the LoRA adapter after fine-tuning. This may sound like an easy task, but it gets very difficult because the unsloth and gemma3 class wrapping can diverge considerably from a usual Hugging Face model.

To make the setup easier to run, we define some variables that control which steps are executed.

---

<h4><font color="red">HOW TO RUN THE CODE (with enough GPU):</font></h4>

If you have enough GPU memory, just change the _Hugging Face path_ variables to the ones you are using, set the `enough_gpu` variable to `True` and run the code. That's all.

 ---

<h4><font color="red">HOW TO RUN THE CODE (without enough GPU):</font></h4>

Without enough GPU memory a few more steps are needed, but don't worry, it is not that hard. The `enough_gpu` variable should be set to `False`. Because of the GPU VRAM limitation, you need to restart the Colab session after the base model has been saved locally (this is why the models are saved locally).

In this case you need to run the [setup cell](#setup-cell) twice. The first time, run it with the variable `first_run` set to `True`. Then restart the session, re-run the variable cells, set `first_run` to `False` and run the setup cell again. That's it.
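
The interaction between `enough_gpu` and `first_run` can be summarised as a small decision function. This is an illustrative sketch that mirrors the flag logic of the setup cell. The function name `plan_setup` is hypothetical, while the two returned keys reuse the names from the setup cell.

```python
def plan_setup(enough_gpu: bool, first_run: bool) -> dict:
    """Decide which phases of the setup cell run in the current session."""
    load_and_save_base_model = True
    load_and_save_model_for_merge = True

    if not enough_gpu:
        if first_run:
            # First session: only untie and save the base model.
            load_and_save_model_for_merge = False
        else:
            # Second session (after restart): only load, merge and save.
            load_and_save_base_model = False

    return {
        "load_and_save_base_model": load_and_save_base_model,
        "load_and_save_model_for_merge": load_and_save_model_for_merge,
    }
```

With enough GPU memory both phases run in one session and `first_run` has no effect. Otherwise each session runs exactly one phase.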

---

**And finally...**

You can run the `lm-eval` command with the local directory of your model or the _Hugging Face path_ of the final merged model (the same as the `merge_model_hf_path` variable) and the tasks that you want to run.

> 📝 <font color="DodgerBlue"><b>NOTE</b></font>
>
> <font color="DodgerBlue">We suggest pushing the merged model to Hugging Face if you want to run it again in the future, so you don't have to restart everything from scratch and you can go directly to the `lm-eval` command.</font>

> ⚡ <font color="Tomato"><b>IMPORTANT</b> </font>
>
> <font color="Tomato" >Before you change the values of these variables, please read the previous markdown cell.</font>


In [None]:
base_model_hf_path = "unsloth/gemma-3-4b-it-unsloth-bnb-4bit"
LoRA_adapter_hf_path = "Jh0mpis/gemma-3b-physics-instruct-alpaca-v2"
merge_model_hf_path = "Jh0mpis/gemma-3-4b-physics-instruct-alpaca-model"

In [None]:
enough_gpu = False
pushing_to_hf = False
run_the_full_setup = False

## Running the benchmark (+ unsloth model setup)

Executing the `lm-eval` command alongside an unsloth Gemma 3 model is not a trivial task. To run the benchmark we did the following:

1. Install dependencies that can work together.
2. Merge the base model with the LoRA adapter. Unsloth saves the model after fine-tuning as a LoRA adapter (this kind of model does not include files like `config.json`). However, `lm-evaluation-harness` needs a usual Hugging Face model (that is, a model with the proper configuration file). That is why we follow these steps:
  1. Load the base model, in this case the `unsloth/gemma-3-4b-it-unsloth-bnb-4bit` model.
  2. Copy the tied weights manually so that the model saved in _Hugging Face format_ stays consistent.
  3. Save the base model locally.
  4. Load the model as an `AutoModelForCausalLM` instance from the locally saved model.
  5. Load the adapter using a `PeftModel`.
  6. Merge the model and the adapter.
3. Save the final merged model locally, in a form compatible with `lm-evaluation-harness`, and push it to Hugging Face (you can see the merged model at the [hugging face repository](https://huggingface.co/Jh0mpis/gemma-3-4b-physics-instruct-alpaca-model)).
4. Finally, run the `lm-eval` command using the merged model from Hugging Face.
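
Before launching the benchmark, it may be worth checking that the merged model directory really looks like a regular Hugging Face model. The sketch below is an assumption-laden illustration: the helper `missing_model_files` is hypothetical, `config.json` is the file named in the steps above, and the weight and tokenizer file names are assumed from the default behaviour of `save_pretrained` with `safe_serialization=True`.

```python
from pathlib import Path


def missing_model_files(model_dir):
    """List the expected files that are absent from a saved model directory."""
    root = Path(model_dir)
    missing = []

    # The configuration file that a bare LoRA adapter does not provide.
    if not (root / "config.json").is_file():
        missing.append("config.json")

    # At least one safetensors shard (assumed output of safe_serialization=True).
    if not any(root.glob("*.safetensors")):
        missing.append("*.safetensors")

    # Tokenizer metadata (assumed to be written by tokenizer.save_pretrained).
    if not (root / "tokenizer_config.json").is_file():
        missing.append("tokenizer_config.json")

    return missing
```

For example, `missing_model_files("final_model")` returning an empty list suggests the directory is ready to be passed to `lm-eval`.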

<a name="setup-cell"></a>
### Setup cell

In [None]:
first_run = True  # If enough_gpu is False, DO NOT FORGET to set this to False after the first run (and session restart)
if run_the_full_setup:
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    load_and_save_base_model = True
    load_and_save_model_for_merge = True

    if not enough_gpu:
        if first_run:
            load_and_save_model_for_merge = False
        else:
            load_and_save_base_model = False


    # Step 1: Load original model, but untie embeddings
    if load_and_save_base_model:
        model = AutoModelForCausalLM.from_pretrained(
            base_model_hf_path,
            trust_remote_code=True,
            device_map="auto",
            tie_word_embeddings=False
        )

        # Step 2: Copy the tied weights manually to ensure consistency
        model.language_model.lm_head.weight.data = model.language_model.model.embed_tokens.weight.data.clone()

        # Step 3: Save this new base model locally (the merge phase loads it from here)
        model.save_pretrained("untied_base_model")
        model.config.save_pretrained("untied_base_model")

    if load_and_save_model_for_merge:
        # Load untied model
        base_model = AutoModelForCausalLM.from_pretrained(
            "untied_base_model",
            trust_remote_code=True,
            device_map="auto"
        )

        # Load adapter
        peft_model = PeftModel.from_pretrained(base_model, LoRA_adapter_hf_path)

        # Merge LoRA adapter
        merged_model = peft_model.merge_and_unload()
        # save the merged model
        merged_model.save_pretrained("final_model", safe_serialization=True)
        # Load the tokenizer from the LoRA adapter repository
        tokenizer = AutoTokenizer.from_pretrained(LoRA_adapter_hf_path)
        # Save tokenizer files to the same directory as the merged model
        tokenizer.save_pretrained("final_model")
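
To make the merge step less opaque, here is the arithmetic behind folding a LoRA adapter into a weight matrix, following the standard LoRA formulation `W' = W + (alpha / r) * B @ A`. This is a toy pure-Python sketch with hypothetical helper names. It ignores quantization, dtypes and everything else the real `merge_and_unload()` call has to handle.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold a LoRA update into a base weight matrix.

    weight: (out x in), lora_a: (rank x in), lora_b: (out x rank).
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update with shape (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Tiny example with rank 1: the merged matrix has the same shape as the base one.
base = [[1, 0], [0, 1]]
merged = merge_lora(base, lora_a=[[1, 2]], lora_b=[[1], [3]], alpha=2, rank=1)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged model can be saved and loaded like an ordinary Hugging Face model.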

### Pushing to hugging face

In [None]:
if pushing_to_hf:
    from huggingface_hub import HfApi

    HfApi().upload_folder(
        folder_path="final_model",
        repo_id=merge_model_hf_path,
        commit_message="Merged base and LoRA adapter"
    )
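
If the target repository does not exist yet, the upload may fail. One possible variant, sketched here under the assumption that you are already authenticated with a write token, creates the repository first with `create_repo(..., exist_ok=True)`. The wrapper `push_merged_model` and its `api` parameter are hypothetical and exist only so the logic can be exercised without network access.

```python
def push_merged_model(folder_path, repo_id, api=None):
    """Create the target repo if needed, then upload the merged model folder."""
    if api is None:
        # Imported lazily so the helper can be tested with a stand-in object.
        from huggingface_hub import HfApi

        api = HfApi()

    # exist_ok=True makes this a no-op when the repository already exists.
    api.create_repo(repo_id=repo_id, exist_ok=True)

    return api.upload_folder(
        folder_path=folder_path,
        repo_id=repo_id,
        commit_message="Merged base and LoRA adapter",
    )
```

In the notebook this would be called as `push_merged_model("final_model", merge_model_hf_path)`.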

### Creating a bash script file and running the `lm-eval` command

> 📝 <font color="DodgerBlue"><b>NOTE</b> </font>
>
> <font color="DodgerBlue">If you have your own bash file, you can upload it, skip the next cell and run your file instead.</font>

In [None]:
tasks_list = "mmlu_stem,piqa"

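# Each backslash-newline below is consumed by Python (the string is not raw),
# so the script ends up as a single long command line, which bash runs fine.
config_file_content = f'''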
lm_eval \
  --model hf \
  --model_args "pretrained={merge_model_hf_path},dtype=float16" \
  --tasks {tasks_list} \
  --num_fewshot 5 \
  --limit 50 \
  --output_path "./benchmark_results.json" \
  --trust_remote_code
'''

with open("run_benchmark.sh", "w") as file:
    file.write(config_file_content)
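
As an alternative to hand-writing the script text, the command can be assembled from a list of arguments and quoted with `shlex.join`, which also makes it easy to point `pretrained=` at a local directory such as `final_model`. This is only a sketch: the function `build_lm_eval_command` is hypothetical, and it uses just the flags that already appear in the cell above.

```python
import shlex


def build_lm_eval_command(
    model_path,
    tasks,
    num_fewshot=5,
    limit=50,
    output_path="./benchmark_results.json",
    dtype="float16",
):
    """Return a shell-safe lm_eval command line as a single string."""
    args = [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_path},dtype={dtype}",
        "--tasks", ",".join(tasks),
        "--num_fewshot", str(num_fewshot),
        "--limit", str(limit),
        "--output_path", output_path,
        "--trust_remote_code",
    ]
    # shlex.join quotes any argument that contains shell-special characters.
    return shlex.join(args)


command = build_lm_eval_command("final_model", ["mmlu_stem", "piqa"])
```

The resulting string can be written to `run_benchmark.sh` exactly like `config_file_content`.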

In [None]:
!bash run_benchmark.sh

2025-05-06 22:49:18.712945: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1746571758.889904    1139 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1746571758.937286    1139 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-06 22:49:19.301597: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-06:22:49:42,516 INFO     [lm_eval.__main__:368] Passed `--trust_remote_code`, setting environment variable `HF_DATASE
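
Once the run finishes, the scores can be pulled out of the results file. The sketch below assumes the report is a JSON object with a top-level `results` mapping from task name to metrics, with keys such as `acc,none`. The exact file name written under `--output_path` may differ between versions, so the function works on an already parsed dictionary. The helper name and the sample numbers are made up for illustration and are not real benchmark results.

```python
import json


def summarize_results(report, metric="acc,none"):
    """Map each task to the chosen metric, skipping tasks that lack it."""
    summary = {}
    for task, metrics in report.get("results", {}).items():
        if metric in metrics:
            summary[task] = metrics[metric]
    return summary


# Hypothetical report used only to show the expected shape of the data.
sample_report = json.loads(
    '{"results": {"task_a": {"acc,none": 0.5, "acc_stderr,none": 0.1},'
    ' "task_b": {"alias": "task_b"}}}'
)
summary = summarize_results(sample_report)
```

With a real run, `report` would come from `json.load` on the file that `lm-eval` wrote.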