<a href="https://colab.research.google.com/github/0xVolt/whats-up-doc/blob/main/test/notebooks/model-merging.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Merging CodeLLMs to Create an Efficient Low-Memory Quantized Model for `whats-up-doc`

## Download and Install `mergekit`

In [1]:
!git clone https://github.com/cg123/mergekit.git
!cd mergekit && pip install -q -e .

fatal: destination path 'mergekit' already exists and is not an empty directory.
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done
  Building editable for mergekit (pyproject.toml) ... done


## Create the YAML Config File to Merge Models with SLERP

In [2]:
import yaml

### Write Config Script

In [3]:
# Set model name
MODEL_NAME = 'whats-up-llamas'

#### SLERP

In [4]:
# Write YAML config string
yamlConfigSLERP = """
slices:
  - sources:
      - model: stabilityai/stable-code-3b
        layer_range: [0, 32]
      - model: codellama/CodeLlama-7b-Instruct-hf
        layer_range: [0, 32]
merge_method: slerp
base_model: codellama/CodeLlama-7b-Instruct-hf
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16

"""
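
To make the `slerp` method and the `t` lists above more concrete, here is a minimal pure-Python sketch. It is not mergekit's implementation: `slerp` and `layer_gradient` are hypothetical helper names, and the sketch assumes that a list under `value` is treated as a gradient of anchor points interpolated linearly across the layer range, and that `t = 0` returns the first vector while `t = 1` returns the second.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors."""
    if len(v0) != len(v1):
        raise ValueError("slerp needs two vectors of the same length")
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction: fall back to linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Angle between the two directions, clamped against rounding error
    cosTheta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    theta = math.acos(max(-1.0, min(1.0, cosTheta)))
    sinTheta = math.sin(theta)
    if abs(sinTheta) < eps:
        # Nearly parallel vectors: linear interpolation is numerically safer
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sinTheta
    w1 = math.sin(t * theta) / sinTheta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def layer_gradient(anchors, numLayers):
    """Spread anchor values over numLayers layers by linear interpolation."""
    if len(anchors) == 1 or numLayers == 1:
        return [float(anchors[0])] * numLayers
    values = []
    for layer in range(numLayers):
        # Position of this layer on the anchor axis, from 0 to len(anchors) - 1
        pos = layer / (numLayers - 1) * (len(anchors) - 1)
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        values.append((1 - frac) * anchors[lo] + frac * anchors[lo + 1])
    return values


# Anchor values copied from the self_attn filter in the config above
tSelfAttn = layer_gradient([0, 0.5, 0.3, 0.7, 1], 32)
print(tSelfAttn[0], tSelfAttn[-1])  # 0.0 1.0

# Halfway between two orthogonal unit vectors
print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

Under these assumptions, the first layer's attention weights come entirely from one model and the last layer's entirely from the other, while the `mlp` list runs the opposite way. Note that the sketch rejects vectors of different lengths, since the interpolation is only defined for tensors of matching shape.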

#### Passthrough

In [8]:
yamlConfigPassthrough = """
slices:
  - sources:
    - model: OpenPipe/mistral-ft-optimized-1218
      layer_range: [0, 32]
  - sources:
    - model: mlabonne/NeuralHermes-2.5-Mistral-7B
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16

"""
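
As a rough illustration of what the passthrough config above produces, the sketch below stacks the two slices in order and counts the resulting layers. It assumes `layer_range` is half-open (start inclusive, end exclusive); `stack_layers` is a hypothetical helper, not part of mergekit.

```python
# Slices copied from the passthrough config above: (model, (start, end)) pairs
slices = [
    ("OpenPipe/mistral-ft-optimized-1218", (0, 32)),
    ("mlabonne/NeuralHermes-2.5-Mistral-7B", (24, 32)),
]


def stack_layers(slices):
    """Concatenate the selected layers of each slice, in config order."""
    stacked = []
    for model, (start, end) in slices:
        # Assumption: the end index is exclusive, so [24, 32] selects 8 layers
        stacked.extend((model, index) for index in range(start, end))
    return stacked


stacked = stack_layers(slices)
print(len(stacked))  # 40
```

Under that assumption, the merged model keeps all 32 layers of the first model and appends the last 8 layers of the second, so it is deeper than either source model.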

*Note: To run this locally, replace each Hugging Face model ID under `model` with the path to the directory where you downloaded that model.*
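
A local variant of the passthrough config might look like the sketch below. The `./models/...` directories and the `yamlConfigPassthroughLocal` name are hypothetical placeholders; substitute the paths where the checkpoints actually live.

```python
# Hypothetical local directories holding the downloaded checkpoints
yamlConfigPassthroughLocal = """
slices:
  - sources:
    - model: ./models/mistral-ft-optimized-1218
      layer_range: [0, 32]
  - sources:
    - model: ./models/NeuralHermes-2.5-Mistral-7B
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16

"""
```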

### Save Config Script

In [11]:
# Save config string as a YAML file
with open('config.yaml', 'w', encoding="utf-8") as fout:
    fout.write(yamlConfigPassthrough)

## Merge Models

In [12]:
# Merge models
!mergekit-yaml config.yaml merge --copy-tokenizer --allow-crimes --out-shard-size 1B --lazy-unpickle

Streaming output truncated to the last 5000 lines.
model-00001-of-00002.safetensors:  29% 2.93G/9.94G [00:52<00:57, 122MB/s]
model-00002-of-00002.safetensors:  63% 2.86G/4.54G [00:52<00:14, 115MB/s]
model-00001-of-00002.safetensors:  30% 2.95G/9.94G [00:52<00:58, 119MB/s]
model-00002-of-00002.safetensors:  64% 2.88G/4.54G [00:52<00:14, 117MB/s]
model-00002-of-00002.safetensors:  64% 2.90G/4.54G [00:52<00:13, 123MB/s]
model-00001-of-00002.safetensors:  30% 2.97G/9.94G [00:52<01:03, 110MB/s]
model-00002-of-00002.safetensors:  64% 2.93G/4.54G [00:52<00:13, 116MB/s]
model-00001-of-00002.safetensors:  30% 2.99G/9.94G [00:52<01:01, 113MB/s]
model-00001-of-00002.safetensors:  30% 3.01G/9.94G [00:52<00:56, 124MB/s]
model-00002-of-00002.safetensors:  65% 2.95G/4.54G [00:53<00:13, 118MB/s]
model-00002-of-00002.safetensors:  65% 2.97G/4.54G [00:53<00:12,