This notebook shows how to merge LLMs with mergekit using the TIES method. I used it to create [kaitchup/Mayonnaise-4in1-022](https://huggingface.co/kaitchup/Mayonnaise-4in1-022), which ranked first for some time on the Open LLM Leaderboard (7B models).

I used an A100 GPU to speed up the merge, but it also works fine on a CPU.

First, we need to install mergekit from source:

In [None]:
!git clone https://github.com/cg123/mergekit.git
!cd mergekit && pip install -e .

Cloning into 'mergekit'...
remote: Enumerating objects: 1033, done.
remote: Counting objects: 100% (573/573), done.
remote: Compressing objects: 100% (217/217), done.
remote: Total 1033 (delta 458), reused 397 (delta 351), pack-reused 460
Receiving objects: 100% (1033/1033), 270.47 KiB | 1.03 MiB/s, done.
Resolving deltas: 100% (698/698), done.
Obtaining file:///content/mergekit
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting accelerate==0.25.0 (from mergekit==0.0.4)
  Downloading accelerate-0.25.0-py3-none-any.whl (265 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 265.7/265.7 kB 4.8 MB/s eta 0:00:00
Collecting pydantic==2.5.3 (from mergekit==0.0.4)
  Down
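
Before going further, it can help to confirm that the install actually put the `mergekit-yaml` command on the PATH. The following is a minimal sketch using only the standard library; the helper name `cli_available` is hypothetical and not part of mergekit.

```python
import shutil


def cli_available(name="mergekit-yaml"):
    """Return True if an executable called `name` is found on the PATH.

    `cli_available` is a hypothetical helper for this notebook; it only
    checks that the command exists, not that it runs correctly.
    """
    return shutil.which(name) is not None


if cli_available():
    print("mergekit-yaml found")
else:
    print("mergekit-yaml not found; re-run the install cell")
```

In Colab, a freshly installed editable package sometimes needs a runtime restart before its command is visible, so a negative result here does not necessarily mean the install failed.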

Simply write a configuration file as follows:

In [None]:
merge_config = """
models:
  - model: mncai/mistral-7b-dpo-v5
    # no parameters necessary for base model
  - model: flemmingmiguel/MBX-7B
    parameters:
      density: 0.5
      weight: 0.5
  - model: BarryFutureman/NeuralTurdusVariant1-7B
    parameters:
      density: 0.5
      weight: 0.3
merge_method: ties
base_model: mncai/mistral-7b-dpo-v5
parameters:
  normalize: true
dtype: float16
"""

with open('config.yaml', 'w') as f:
    f.write(merge_config)
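
To give an intuition for what `density`, `weight`, and `normalize` do in this configuration, here is a toy sketch of TIES-style merging on plain Python lists. It is a simplified illustration of the procedure as I understand it (trim each task vector, elect a sign per parameter, merge the agreeing values), not mergekit's actual implementation. The function names `trim` and `ties_merge` and the toy numbers are made up for the example.

```python
def trim(delta, density):
    """Keep the `density` fraction of entries with the largest magnitude."""
    k = int(density * len(delta))
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(ranked[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, finetuned, densities, weights, normalize=True):
    """Toy TIES merge of several fine-tuned weight lists onto `base`."""
    # Task vectors: what each fine-tuned model changed relative to the base.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    # Sparsify each task vector, then scale it by its weight.
    weighted = [
        [w * d for d in trim(delta, density)]
        for delta, density, w in zip(deltas, densities, weights)
    ]
    merged = []
    for i, b in enumerate(base):
        column = [wd[i] for wd in weighted]
        # Elect the sign of the summed weighted deltas (assumption: zero counts as +).
        sign = 1.0 if sum(column) >= 0 else -1.0
        # Keep only the contributions that agree with the elected sign.
        agree = [(v, w) for v, w in zip(column, weights) if v * sign > 0]
        mixed = sum(v for v, _ in agree)
        if normalize and agree:
            # Divide by the total weight of the models that contributed.
            mixed /= sum(w for _, w in agree)
        merged.append(b + mixed)
    return merged


base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -0.2, 0.1, 0.4]   # toy values, weight 0.5
model_b = [-0.8, 0.6, -0.3, 0.0]  # toy values, weight 0.3
print(ties_merge(base, [model_a, model_b], densities=[0.5, 0.5], weights=[0.5, 0.3]))
```

In the first position the two models disagree on the sign. The weighted sum is positive, so only `model_a` contributes there and the conflicting change from `model_b` is dropped rather than averaged in.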

Then run mergekit with the following arguments.
Remove "--cuda" to run the merge on a CPU.
Change "merge" to rename the directory that will contain the result.

In [None]:
!mergekit-yaml config.yaml merge --cuda --copy-tokenizer --allow-crimes --out-shard-size 9B --lazy-unpickle

Streaming output truncated to the last 5000 lines.
model-00007-of-00008.safetensors:  21% 409M/1.98G [00:04<00:34, 45.8MB/s]
model-00003-of-00008.safetensors:  20% 398M/1.95G [00:04<00:34, 45.1MB/s]
model-00004-of-00008.safetensors:  21% 409M/1.95G [00:04<00:30, 49.8MB/s]
model-00008-of-00008.safetensors:  47% 336M/713M [00:03<00:07, 53.0MB/s]
model-00006-of-00008.safetensors:  26% 503M/1.92G [00:04<00:29, 47.5MB/s]
model-00002-of-00008.safetensors:   3% 62.9M/1.99G [00:04<02:18, 13.9MB/s]
model-00001-of-00008.safetensors:   3% 62.9M/2.00G [00:04<02:08, 15.0MB/s]
model-00003-of-00008.safetensors:  22% 419M/1.95G [00:04<00:24, 63.2MB/s]
model-00004-of-00008.safetensors:  22% 430M/1.95G [00:04<00:30, 49.9MB/s]
model-00005-of-00008.safetensors:  22% 440M/1.99G [00:04
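
If you switch often between GPU and CPU runs or between output directories, the command can be assembled in Python instead of edited by hand. This is a minimal sketch that only reuses the flags from the command above; the helper name `build_merge_command` and its parameters are hypothetical.

```python
import shlex


def build_merge_command(config_path="config.yaml", out_dir="merge", use_cuda=True):
    """Build the mergekit-yaml argument list used in this notebook.

    Hypothetical helper: it only reproduces the flags shown in the cell above.
    """
    cmd = ["mergekit-yaml", config_path, out_dir]
    if use_cuda:
        cmd.append("--cuda")  # drop this flag to run the merge on a CPU
    cmd += ["--copy-tokenizer", "--allow-crimes",
            "--out-shard-size", "9B", "--lazy-unpickle"]
    return cmd


# Print the CPU variant of the command so it can be inspected before running.
print(shlex.join(build_merge_command(use_cuda=False)))
# To execute it from Python: subprocess.run(build_merge_command(), check=True)
```

Returning a list rather than a single string avoids shell-quoting problems when the output directory contains spaces.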

The following configuration produced my best merge according to the Open LLM Leaderboard:

In [None]:
merge_config = """
models:
  - model: mncai/mistral-7b-dpo-v5
    # no parameters necessary for base model
  - model: FelixChao/WestSeverus-7B-DPO-v2
    parameters:
      density: 0.5
      weight: 0.3
  - model: BarryFutureman/NeuralTurdusVariant1-7B
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: mncai/mistral-7b-dpo-v5
parameters:
  normalize: true
dtype: float16
"""

with open('config.yaml', 'w') as f:
    f.write(merge_config)
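
Since the two configurations in this notebook differ only in one model name and in the weights, the YAML text can also be generated from a small function, which makes it easier to try other combinations. This is a sketch under that assumption; `make_ties_config` is a hypothetical helper, and it only emits the fields that appear in the configurations above.

```python
def make_ties_config(base_model, others, dtype="float16", normalize=True):
    """Return a TIES merge config as YAML text.

    `others` is a list of (model_name, density, weight) tuples.
    Hypothetical helper mirroring the layout of the configs in this notebook.
    """
    lines = [
        "models:",
        f"  - model: {base_model}",
        "    # no parameters necessary for base model",
    ]
    for name, density, weight in others:
        lines += [
            f"  - model: {name}",
            "    parameters:",
            f"      density: {density}",
            f"      weight: {weight}",
        ]
    lines += [
        "merge_method: ties",
        f"base_model: {base_model}",
        "parameters:",
        f"  normalize: {'true' if normalize else 'false'}",
        f"dtype: {dtype}",
    ]
    return "\n".join(lines) + "\n"


config_text = make_ties_config(
    "mncai/mistral-7b-dpo-v5",
    [
        ("FelixChao/WestSeverus-7B-DPO-v2", 0.5, 0.3),
        ("BarryFutureman/NeuralTurdusVariant1-7B", 0.5, 0.5),
    ],
)
print(config_text)
```

The returned string can be written to `config.yaml` exactly as in the cell above.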

In [None]:
!mergekit-yaml config.yaml merge5 --cuda --copy-tokenizer --allow-crimes --out-shard-size 9B --lazy-unpickle

config.json: 100% 695/695 [00:00<00:00, 3.99MB/s]
Warmup loader cache:   0% 0/3 [00:00<?, ?it/s]
Fetching 10 files:   0% 0/10 [00:00<?, ?it/s]
generation_config.json: 100% 132/132 [00:00<00:00, 628kB/s]
Fetching 10 files:  20% 2/10 [00:00<00:03,  2.58it/s]
tokenizer.json:   0% 0.00/1.80M [00:00<?, ?B/s]
special_tokens_map.json: 100% 625/625 [00:00<00:00, 4.54MB/s]
model-00001-of-00003.safetensors:   0% 0.00/4.94G [00:00<?, ?B/s]
model.safetensors.index.json: 100% 23.9k/23.9k [00:00<00:00, 75.6MB/s]
model-00001-of-00003.safetensors:   1% 31.5M/4.94G [00:00<00:16, 297MB/s]
tokenizer.model: 100% 493k/493k [00:00<00:00, 88.7MB/s]
model-00003-of-00003.safetensors:   0% 0.00/4.54G [00:00<?, ?B/s]
model-00002-of-00003.safetensors:   0% 0.00/5.00G [00:00<?, ?B/s]
model-00001-of-00003.safetensors:   2% 83.9M/4.94G [00:00<00:13, 351MB/s]
model-00003-of-00003.safetensors:   1% 41.9M/4.54G [00:00<00:13,