


# Merging Two Models
This notebook uses [mergekit](https://github.com/cg123/mergekit) to merge the models.
The code that adds an Apache licence and a README before uploading to Hugging Face comes from the implementation by [Maxime Labonne](https://mlabonne.github.io/blog/posts/2024-01-08_Merge_LLMs_with_mergekit.html).


Update the YAML config with the models you want to merge. Currently this only works with models that share the same base.
The config uses the SLERP method, which can merge only two models at a time. Some of the other techniques allow merging more than two models.
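To make the two-model restriction concrete, here is a minimal sketch of spherical linear interpolation (SLERP) on plain Python lists. It is an illustration only: the function name `slerp` and the `eps` threshold are assumptions made for this example, and mergekit's own implementation works on tensors and may differ in its details. The formula interpolates along the arc between exactly two vectors, which is why the method takes two models.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Interpolate between vectors v0 and v1 along the arc joining them.

    t = 0 returns v0, t = 1 returns v1.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))

    # A zero-length vector has no direction, so fall back to linear interpolation.
    if norm0 < eps or norm1 < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Cosine of the angle between the two vectors, clamped for numerical safety.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    sin_theta = math.sin(theta)

    # Nearly parallel vectors: the arc is degenerate, so interpolate linearly.
    if abs(sin_theta) < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# Halfway between two orthogonal unit vectors stays on the unit circle.
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
print(midpoint)
```

In the config, the parameter `t` plays the role of the interpolation fraction used here.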

In [1]:
MODEL_NAME = "MistralCode-7B-Instruct-v0.2-slerp"  # Repo name only; the username is prepended at upload time
yaml_config = """
slices:
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.2
        layer_range: [0, 32]
      - model: dhyay/MistralCode-7B-Instruct-v0.2-slerp
        layer_range: [0, 32]
merge_method: slerp
base_model: mistralai/Mistral-7B-Instruct-v0.2
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
"""
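The `t` lists in the config above have five values, while each model has 32 layers. The sketch below shows one way such a list could be spread across the layers, assuming the values act as evenly spaced anchor points with linear interpolation in between. The helper `expand_gradient` is hypothetical and written for this example; mergekit's exact mapping from gradient to layer may differ.

```python
def expand_gradient(anchors, num_layers):
    """Spread a short list of anchor values over num_layers by linear interpolation."""
    if len(anchors) == 1 or num_layers == 1:
        return [float(anchors[0])] * num_layers

    values = []
    for layer in range(num_layers):
        # Position of this layer on the anchor axis, from 0 to len(anchors) - 1.
        pos = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        values.append((1 - frac) * anchors[lo] + frac * anchors[lo + 1])
    return values

# The self_attn gradient from the config, expanded to 32 layers.
self_attn_t = expand_gradient([0, 0.5, 0.3, 0.7, 1], 32)
for layer, t in enumerate(self_attn_t):
    print(f"layer {layer:2d}: t = {t:.3f}")
```

Under this assumption, the `self_attn` weights start at the first source model (`t = 0`) and end at the second (`t = 1`), while the `mlp` list runs in the opposite direction.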

In [2]:
trust_remote_code = False  # Set this to True when merging phi models (this may be fixed later)

!git clone https://github.com/arcee-ai/mergekit.git
!cd mergekit && pip install -qqq -e . --progress-bar off

# Save config as yaml file
with open('config.yaml', 'w', encoding="utf-8") as f:
    f.write(yaml_config)


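# Build the mergekit command
cli = "mergekit-yaml config.yaml merge --copy-tokenizer"

# Extra runtime options: allow mixed architectures, cap shard size, load checkpoints lazily
cli += " --allow-crimes --out-shard-size 1B --lazy-unpickle"

# Pass the flag through when the models ship custom code
if trust_remote_code:
    cli += " --trust-remote-code"
print(cli)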

# Merge models
!{cli}
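Because the merge takes a while, it can be worth sanity-checking the config before launching it. The sketch below works on an already parsed config dictionary (in this notebook that would be the result of `yaml.safe_load(yaml_config)`). The function `check_slerp_config` and the rules it applies are assumptions made for this example, not part of mergekit, and the model names in the sample are placeholders.

```python
def check_slerp_config(config):
    """Return a list of human-readable problems found in a SLERP merge config."""
    problems = []

    # Collect every model named under slices -> sources.
    sources = [
        source["model"]
        for slice_ in config.get("slices", [])
        for source in slice_.get("sources", [])
    ]
    unique_models = sorted(set(sources))

    if config.get("merge_method") != "slerp":
        problems.append("merge_method is not 'slerp'")
    if len(unique_models) != 2:
        problems.append(f"expected exactly 2 distinct models, found {len(unique_models)}")

    # Assumed rule: the base model should be one of the two models being merged.
    base = config.get("base_model")
    if base not in unique_models:
        problems.append(f"base_model {base!r} is not one of the source models")
    return problems

# Placeholder config with the same shape as the one used in this notebook.
example = {
    "merge_method": "slerp",
    "base_model": "org/model-a",
    "slices": [{"sources": [
        {"model": "org/model-a", "layer_range": [0, 32]},
        {"model": "org/model-b", "layer_range": [0, 32]},
    ]}],
}
print(check_slerp_config(example))  # prints []
```

A mistyped `base_model` would show up here as a single entry in the returned list, before any weights are downloaded.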

## Uploading to Hugging Face
#### The code below is Maxime Labonne's; it gives the model an Apache 2.0 licence so that anyone can use it
It creates a README and adds the licence.

In [3]:
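username = 'dhyay'
token = 'HF_TOKEN'  # Replace with the name of the Colab secret that stores your Hugging Face token
license = "apache-2.0"
branch = "main"  # "main" for a regular merge, "mixtral" for a Mixture of Experts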

!pip install -qU huggingface_hub

import yaml
from huggingface_hub import ModelCard, ModelCardData, HfApi
from google.colab import userdata
from jinja2 import Template

if branch == "main":
    template_text = """
---
license: {{ license }}
base_model:
{%- for model in models %}
  - {{ model }}
{%- endfor %}
tags:
- merge
- mergekit
- lazymergekit
{%- for model in models %}
- {{ model }}
{%- endfor %}
---

# {{ model_name }}

{{ model_name }} is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):

{%- for model in models %}
* [{{ model }}](https://huggingface.co/{{ model }})
{%- endfor %}

## 🧩 Configuration

```yaml
{{- yaml_config -}}
```

## 💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "{{ username }}/{{ model_name }}"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
"""

    # Create a Jinja template object
    jinja_template = Template(template_text.strip())

    # Get list of models from config
    data = yaml.safe_load(yaml_config)
    if "models" in data:
        models = [data["models"][i]["model"] for i in range(len(data["models"])) if "parameters" in data["models"][i]]
    elif "parameters" in data:
        models = [data["slices"][0]["sources"][i]["model"] for i in range(len(data["slices"][0]["sources"]))]
    elif "slices" in data:
        models = [data["slices"][i]["sources"][0]["model"] for i in range(len(data["slices"]))]
    else:
        raise Exception("No models or slices found in yaml config")

    # Fill the template
    content = jinja_template.render(
        model_name=MODEL_NAME,
        models=models,
        yaml_config=yaml_config,
        username=username,
        license=license,
    )

elif branch == "mixtral":
    template_text = """
---
license: {{ license }}
base_model:
{%- for model in models %}
  - {{ model }}
{%- endfor %}
tags:
- moe
- frankenmoe
- merge
- mergekit
- lazymergekit
{%- for model in models %}
- {{ model }}
{%- endfor %}
---

# {{ model_name }}

{{ model_name }} is a Mixture of Experts (MoE) made with the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):

{%- for model in models %}
* [{{ model }}](https://huggingface.co/{{ model }})
{%- endfor %}

## 🧩 Configuration

```yaml
{{- yaml_config -}}
```

## 💻 Usage

```python
!pip install -qU transformers bitsandbytes accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "{{ username }}/{{ model_name }}"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
"""

    # Create a Jinja template object
    jinja_template = Template(template_text.strip())

    # Fill the template
    data = yaml.safe_load(yaml_config)
    models = [model['source_model'] for model in data['experts']]

    content = jinja_template.render(
        model_name=MODEL_NAME,
        models=models,
        yaml_config=yaml_config,
        username=username,
        license=license
    )

# Save the model card
card = ModelCard(content)
card.save('merge/README.md')

api = HfApi(token=userdata.get(token))

# Upload merge folder
api.create_repo(
    repo_id=f"{username}/{MODEL_NAME}",
    repo_type="model",
    exist_ok=True,
)
api.upload_folder(
    repo_id=f"{username}/{MODEL_NAME}",
    folder_path="merge",
)
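The upload cell reads the token from Colab secrets, so it only runs inside Colab. As a hedged sketch, the helper below falls back to an environment variable when `google.colab` is not installed. The function name `resolve_hf_token` and the default secret name `HF_TOKEN` are assumptions made for this example.

```python
import os

def resolve_hf_token(secret_name="HF_TOKEN"):
    """Return the Hugging Face token from Colab secrets, or from the environment elsewhere."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        # Outside Colab: look for an environment variable with the same name.
        return os.environ.get(secret_name)
    return userdata.get(secret_name)

hf_token = resolve_hf_token()
print("token found" if hf_token else "no token found")
```

The result could then be passed as `HfApi(token=hf_token)` in place of the direct `userdata.get(token)` call.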

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


CommitInfo(commit_url='https://huggingface.co/dhyay/Dolphin-MistralCode-v0.2.1-slerp/commit/fe838743c933daadeb8d76e7023887ec02c7afc8', commit_message='Upload folder using huggingface_hub', commit_description='', oid='fe838743c933daadeb8d76e7023887ec02c7afc8', pr_url=None, pr_revision=None, pr_num=None)