<a href="https://colab.research.google.com/github/dhrits/LLM-Engineering-Foundations-to-SLMs/blob/main/09_Alignment_II/Model_Merging_AI_Makerspace_Assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Let's Merge our Model!

We have a "Riddle" model, and a "LegalEasy" model - both of which are good at their respective tasks:

1. Speaking in Riddles
2. Translating from legalese to English

What if we want a model that's good at both of those things at once?

We can use the handy `mergekit-gui` Hugging Face Space (found [here](https://huggingface.co/spaces/arcee-ai/mergekit-gui?ref=blog.arcee.ai)) to merge our models - this was the `config.yaml`:

```yaml
models:
  - model: NousResearch/Meta-Llama-3-8B-Instruct
  - model: ai-maker-space/leagaleasy-llama-3-instruct-v2
    parameters:
      density: 0.5
      weight: 0.5
  - model: ai-maker-space/riddle-bot-v1
    parameters:
      density: 0.5
      weight: 0.5

merge_method: ties
base_model: NousResearch/Meta-Llama-3-8B-Instruct
parameters:
  normalize: false
  int8_mask: true
dtype: float16
```

You can see that this model was merged using the `ties` method.

- [TIES-Merging Paper](https://arxiv.org/abs/2306.01708)
- [TIES README.md Reference](https://github.com/arcee-ai/mergekit?tab=readme-ov-file#ties)


##### ❓ Question #1:

Describe what is happening in the TIES method of merging in your own words.

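TIES (TrIm, Elect Sign & Merge) is a three-step technique for merging model parameters. It operates on task vectors: the differences between each fine-tuned model's weights and the base model's weights. First, each task vector is trimmed: only its largest-magnitude values are kept (the `density` fraction in the config above) and the rest are reset to zero. Next, a sign is elected for each parameter by picking the direction with the greatest total magnitude across the task vectors, which resolves sign conflicts. Finally, only the values whose sign matches the elected sign are averaged, and the result is added back to the base model. This [visual](https://developer-blogs.nvidia.com/wp-content/uploads/2024/10/ties-process-diagram.png) describes this process best.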
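
The three steps can be sketched on toy data. This is a minimal illustration that follows the description in the TIES paper, assuming flat Python lists in place of real weight tensors. The helper names `trim` and `ties_merge` and the `scale` parameter are made up for this example; this is not mergekit's implementation, which also applies the per-model `weight` and the `normalize` option.

```python
def trim(task_vector, density):
    """Keep the `density` fraction of entries with the largest magnitude; zero the rest."""
    k = max(1, round(len(task_vector) * density))
    by_magnitude = sorted(
        range(len(task_vector)), key=lambda i: abs(task_vector[i]), reverse=True
    )
    keep = set(by_magnitude[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(task_vector)]


def ties_merge(base, finetuned_models, density=0.5, scale=1.0):
    """Toy TIES merge: trim task vectors, elect a sign per parameter, average agreeing values."""
    # Task vector = fine-tuned weights minus base weights.
    task_vectors = [[f - b for f, b in zip(model, base)] for model in finetuned_models]
    trimmed = [trim(tv, density) for tv in task_vectors]

    merged = []
    for i, b in enumerate(base):
        values = [tv[i] for tv in trimmed]
        # Elect: the sign of the sum is the direction with the larger total magnitude.
        sign = 1.0 if sum(values) >= 0 else -1.0
        # Merge: average only the non-zero values that agree with the elected sign.
        agreeing = [v for v in values if v * sign > 0]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + scale * delta)
    return merged


# Hypothetical weights: the two "fine-tuned" models disagree on the first parameter.
base = [0.0, 0.0, 0.0, 0.0]
model_a = [0.9, -0.1, 0.4, 0.0]
model_b = [-0.7, 0.8, 0.6, 0.05]

print(ties_merge(base, [model_a, model_b], density=0.5))
# → [0.9, 0.8, 0.4, 0.0]
```

For the first parameter, `0.9` and `-0.7` both survive trimming but conflict in sign. The positive direction has the larger total magnitude, so only `0.9` contributes to the merged value.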

##### ❓ Question #2:

Discuss when Model Merging would be most useful to an organization!

TIES is a good default that can bring together many models using a "sane" weight-merging strategy, although SLERP might be simpler to implement.

# Testing the Merged Model

Now that we've merged our model, let's test it!

## Gather Dependencies and HF Token

In [1]:
!pip install -qU transformers peft trl accelerate bitsandbytes datasets


In [2]:
from huggingface_hub import notebook_login

notebook_login()


### Data Collection for Testing

In [3]:
!git clone https://github.com/lauramanor/legal_summarization.git

Cloning into 'legal_summarization'...
remote: Enumerating objects: 31, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 31 (delta 2), reused 0 (delta 0), pack-reused 25 (from 1)
Receiving objects: 100% (31/31), 136.60 KiB | 2.63 MiB/s, done.
Resolving deltas: 100% (10/10), done.


In [57]:
import json

jsonl_array = []

with open('legal_summarization/tldrlegal_v1.json') as f:
  data = json.load(f)
  for key, value in data.items():
    jsonl_array.append(value)

In [58]:
from datasets import Dataset, load_dataset

legal_dataset = Dataset.from_list(jsonl_array)

In [59]:
legal_dataset

Dataset({
    features: ['doc', 'id', 'original_text', 'reference_summary', 'title', 'uid'],
    num_rows: 85
})

In [60]:
prepared_legal_dataset = legal_dataset.train_test_split(test_size=0.1)

In [61]:
from datasets import load_dataset

riddle_dataset = load_dataset("riddle_sense")

In [9]:
riddle_dataset

DatasetDict({
    train: Dataset({
        features: ['answerKey', 'question', 'choices'],
        num_rows: 3510
    })
    validation: Dataset({
        features: ['answerKey', 'question', 'choices'],
        num_rows: 1021
    })
    test: Dataset({
        features: ['answerKey', 'question', 'choices'],
        num_rows: 1184
    })
})

#### Instruction Templates for Each Task

We'll want to provide the training template for our tasks so we can see the desired behaviour.

In [62]:
RIDDLE_PROMPT_TEMPLATE = """\
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Please answer the following multiple-choice riddle by selecting the correct choice.<|eot_id|><|start_header_id|>user<|end_header_id|>

{question}\n\nA: {choice_a}\nB: {choice_b}\nC: {choice_c}\nD: {choice_d}\nE: {choice_e}\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

RIDDLE_RESPONSE_TEMPLATE = """\
{answer}<|eot_id|><|end_of_text|>"""

In [63]:
SUMMARIZE_PROMPT_TEMPLATE = """\
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Please convert the following legal content into a human-readable summary<|eot_id|><|start_header_id|>user<|end_header_id|>

[LEGAL_DOC]{LEGAL_TEXT}[END_LEGAL_DOC]<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

SUMMARIZE_RESPONSE_TEMPLATE = """\
{NATURAL_LANGUAGE_SUMMARY}<|eot_id|><|end_of_text|>"""

Now we can create a helper function that will convert our dataset row into the above prompt!

In [64]:
def create_instruction(sample, return_response=True):
  prompt = SUMMARIZE_PROMPT_TEMPLATE.format(LEGAL_TEXT=sample["original_text"])

  if return_response:
    prompt += SUMMARIZE_RESPONSE_TEMPLATE.format(NATURAL_LANGUAGE_SUMMARY=sample["reference_summary"])

  return prompt

In [65]:
def create_riddle_instruction(sample, return_response=True):
  prompt = RIDDLE_PROMPT_TEMPLATE.format(
      question=sample["question"],
      choice_a=sample["choices"]["text"][0],
      choice_b=sample["choices"]["text"][1],
      choice_c=sample["choices"]["text"][2],
      choice_d=sample["choices"]["text"][3],
      choice_e=sample["choices"]["text"][4],
  )

  if return_response:
    prompt += RIDDLE_RESPONSE_TEMPLATE.format(answer=sample["answerKey"])

  return prompt

## Baselining Meta-Llama-3-8B-Instruct on Tasks

Now that we have some test data, and the original instruction templates for each task - we can establish a baseline with the base model.

In [67]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "NousResearch/Meta-Llama-3-8B-Instruct"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)


In [68]:
base_tokenizer = AutoTokenizer.from_pretrained(model_id)

In [69]:
from transformers import pipeline

base_model_pipe = pipeline("text-generation", base_model, tokenizer=base_tokenizer, max_new_tokens=256, return_full_text=False)

Device set to use cuda:0


## Riddle Baseline

Let's see how the base model performs at the riddle task!

In [70]:
riddle_dataset["test"][0]["question"]

'My life can be measured in hours. I serve by being devoured. Thin, I am quick. Fat, I am slow. Wind is my foe. What am I?'

In [71]:
for text, label in zip(riddle_dataset["test"][0]["choices"]["text"], riddle_dataset["test"][0]["choices"]["label"]):
  print(f"{label} : {text}")

A : paper
B : candle
C : lamp
D : clock
E : worm


In [72]:
outputs = base_model_pipe(create_riddle_instruction(riddle_dataset["test"][0], return_response=False), do_sample=True, max_new_tokens=256, temperature=0.1, top_k=50)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


In [73]:
outputs[0]["generated_text"]

'\n\nThe correct answer is B: candle.\n\nHere\'s how the description fits:\n\n* "My life can be measured in hours": A candle\'s life can be measured in hours, as it burns for a certain amount of time.\n* "I serve by being devoured": A candle serves by providing light, and it is "devoured" (burned) to do so.\n* "Thin, I am quick": A thin candle burns quickly.\n* "Fat, I am slow": A fat or large candle burns slowly.\n* "Wind is my foe": Wind can extinguish a candle\'s flame, making it a foe to its continued burning.'

Unfortunately, as we can see, the model still gives a very verbose (albeit correct) answer.

## Summary Baseline:

Let's see how our base model performs on our summarization task!

In [74]:
prepared_legal_dataset["test"][1]["original_text"]

'we welcome feedback comments and suggestions for improvements to the services feedback. you can submit feedback by reaching out to us on facebook twitter or google. you grant to us a nonexclusive worldwide perpetual irrevocable fully paid royalty free sublicensable and transferable license under any and all intellectual property rights that you own or control to use copy modify create derivative works based upon and otherwise exploit the feedback for any purpose.'

In [75]:
outputs = base_model_pipe(create_instruction(prepared_legal_dataset["test"][1], return_response=False), do_sample=True, max_new_tokens=256, temperature=0.1, top_k=50)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


In [76]:
outputs[0]["generated_text"].split("[LEGAL_DOC]")[-1]

'\n\nHere\'s a human-readable summary of the legal content:\n\n"We value your feedback and suggestions for improving our services. If you have any comments or ideas, you can share them with us on Facebook, Twitter, or Google. By sharing your feedback, you\'re giving us permission to use, modify, and share it in any way we see fit, without needing to ask for your permission or pay you any royalties. This means we can use your feedback to improve our services and make them better for everyone.'

In [77]:
prepared_legal_dataset["test"][1]["reference_summary"]

'give us feedback and we can do whatever we want with it.'

While this is an accurate description of the original text - the summary itself is rather verbose and doesn't match the simple language of the reference summaries.

Let's free up some resources so we can load the merged model!

In [78]:
del base_model_pipe
del base_model
torch.cuda.empty_cache()

## Loading Merged Model

Loading our merged model is simple, as we've uploaded it to the hub already!

In [79]:
model_id = "c-s-ale/RiddleLegalEasy"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)


In [80]:

tokenizer = AutoTokenizer.from_pretrained(model_id)

In [81]:
merged_model_pipe = pipeline("text-generation", model, tokenizer=tokenizer, max_new_tokens=256, return_full_text=False)

Device set to use cuda:0


## Riddle Test

Let's see if our model "remembers" what we trained it on when it comes to answering riddles!

In [82]:
riddle_dataset["test"][0]["question"]

'My life can be measured in hours. I serve by being devoured. Thin, I am quick. Fat, I am slow. Wind is my foe. What am I?'

In [83]:
for text, label in zip(riddle_dataset["test"][0]["choices"]["text"], riddle_dataset["test"][0]["choices"]["label"]):
  print(f"{label} : {text}")

A : paper
B : candle
C : lamp
D : clock
E : worm


In [84]:
outputs = merged_model_pipe(create_riddle_instruction(riddle_dataset["test"][0], return_response=False), do_sample=True, max_new_tokens=256, temperature=0.1, top_k=50)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


In [85]:
outputs[0]["generated_text"]

'B'

Absolutely amazing! It remembers to only output the letter - exactly as we fine-tuned it to do!
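
A single riddle is only a spot check. To compare the two models over more of the test split, one option is to pull the chosen letter out of each generation and compare it with `answerKey`. The sketch below is one hedged way to do that: `extract_choice` and `accuracy` are hypothetical helpers, the regular expression is a simple heuristic (it would misread a verbose answer that starts with the article "A"), and the third generation is made up for illustration.

```python
import re


def extract_choice(generated_text):
    """Return the first standalone letter A-E in the text, or None if there is none."""
    match = re.search(r"\b([A-E])\b", generated_text)
    return match.group(1) if match else None


def accuracy(generations, answer_keys):
    """Fraction of generations whose extracted letter equals the answer key."""
    correct = sum(extract_choice(g) == k for g, k in zip(generations, answer_keys))
    return correct / len(answer_keys)


# The first two strings mirror the merged and base model outputs shown above;
# the third is a made-up failure case.
generations = ["B", "\n\nThe correct answer is B: candle.", "I am not sure."]
answer_keys = ["B", "B", "C"]

print(accuracy(generations, answer_keys))
```

In practice, the `generations` list would be filled by running a pipeline over many rows formatted with `create_riddle_instruction(..., return_response=False)`.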

In [86]:
prepared_legal_dataset["test"][1]["original_text"]

'we welcome feedback comments and suggestions for improvements to the services feedback. you can submit feedback by reaching out to us on facebook twitter or google. you grant to us a nonexclusive worldwide perpetual irrevocable fully paid royalty free sublicensable and transferable license under any and all intellectual property rights that you own or control to use copy modify create derivative works based upon and otherwise exploit the feedback for any purpose.'

In [87]:
outputs = merged_model_pipe(create_instruction(prepared_legal_dataset["test"][1], return_response=False), do_sample=True, max_new_tokens=256, temperature=0.1, top_k=50)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


In [88]:
outputs[0]["generated_text"].split("[LEGAL_DOC]")[-1]

'Here is a human-readable summary of the legal content:\n\nWe appreciate your feedback and suggestions for improving our services. You can share your thoughts with us on Facebook, Twitter, or Google. By submitting your feedback, you give us permission to use it in any way we see fit, without paying you or asking for your permission.'

In [89]:
prepared_legal_dataset["test"][1]["reference_summary"]

'give us feedback and we can do whatever we want with it.'

Not only is this response more of a summary, but it's also more in line with the casual language of the training set!
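
To put a rough number on "more in line with the reference", one simple option is a unigram-overlap F1 score, similar in spirit to ROUGE-1, together with a word count as a measure of verbosity. This is a sketch under stated assumptions, not an official ROUGE implementation: `tokenize` and `unigram_f1` are hypothetical helpers, and the candidate string is one sentence taken from the merged model's output above.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_f1(candidate, reference):
    """F1 over overlapping unigrams between a candidate and a reference summary."""
    cand_tokens, ref_tokens = tokenize(candidate), tokenize(reference)
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "give us feedback and we can do whatever we want with it."
candidate = (
    "By submitting your feedback, you give us permission to use it in any way "
    "we see fit, without paying you or asking for your permission."
)

print("words in candidate:", len(tokenize(candidate)))
print("words in reference:", len(tokenize(reference)))
print("unigram F1:", unigram_f1(candidate, reference))
```

Scoring the base and merged outputs against the same reference summary across the whole test split would give a fairer comparison than a single example.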