<a href="https://colab.research.google.com/github/0xVolt/whats-up-doc/blob/main/test/notebooks/model-blending/Llama-3-8B-CodeLlama-7b-Instruct-hf-passthrough.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Merging CodeLLMs to Create an Efficient Low-Memory Quantized Model for `whats-up-doc`

## Download and Install `mergekit`

In [1]:
import os

dirName = "mergekit"
cwd = os.getcwd()

concatDirPath = os.path.join(cwd, dirName)

if not os.path.exists(concatDirPath):
    !git clone https://github.com/cg123/mergekit.git
    !cd mergekit && pip install -q -e .

Cloning into 'mergekit'...
remote: Enumerating objects: 2265, done.
remote: Counting objects: 100% (1354/1354), done.
remote: Compressing objects: 100% (520/520), done.
remote: Total 2265 (delta 1081), reused 947 (delta 833), pack-reused 911
Receiving objects: 100% (2265/2265), 640.50 KiB | 18.84 MiB/s, done.
Resolving deltas: 100% (1584/1584), done.
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done

## Create the YAML Config File to Merge Models

In [2]:
import os
import yaml
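# Note: `AutoModelWithLMHead` is deprecated in recent `transformers` releases;
# `AutoModelForCausalLM` is the usual replacement for decoder-only models.
from transformers import AutoModelWithLMHead, AutoTokenizer, pipeline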

In [3]:
# The `huggingface-cli` command ships with the `huggingface_hub` package;
# the PyPI package named `huggingface-cli` is a separate 1 kB stub.
%pip install -q -U huggingface_hub


In [6]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: Traceback (most recent call last):
  File "/usr/local/bin/huggingface-cli", line 8, in <module>
    sys.exit(ma
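
The traceback above is cut off, but a likely cause is that the interactive `Token:` prompt cannot read input from a notebook `!` cell. A minimal sketch of a non-interactive alternative follows, assuming the token has already been stored in an environment variable. The variable name `HF_TOKEN` and the helper `login_from_env` are illustrative assumptions, not part of the original notebook; `huggingface_hub.login(token=...)` is the library call being wrapped.

```python
import os


def login_from_env(var_name: str = "HF_TOKEN") -> bool:
    """Log in to the Hugging Face Hub with a token read from the environment.

    Returns True if a token was found and handed to `login`, else False.
    `var_name` is an assumed convention, not something the notebook defines.
    """
    token = os.environ.get(var_name)
    if not token:
        print(f"{var_name} is not set; skipping login.")
        return False

    # Imported lazily so the check above works even if the package is missing.
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_from_env()` in a cell replaces `!huggingface-cli login`. In Colab, `huggingface_hub.notebook_login()` is another option that shows a token widget instead of reading from standard input.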

### Write Config Script

In [8]:
# Set the resultant model's name
MODEL_NAME = 'whats-up-llamas'

MODEL_1 = "codellama/CodeLlama-7b-Instruct-hf"
MODEL_2 = "meta-llama/Meta-Llama-3-8B-Instruct"

OUTPUT_DIR = "merged_model"

#### SLERP

In [None]:
yamlConfigSLERPLlamas = f"""
slices:
  - sources:
      - model: {MODEL_1}
        layer_range: [0, 32]
      - model: {MODEL_2}
        layer_range: [0, 32]
merge_method: slerp
base_model: {MODEL_1}
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float32
"""
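
To make the `t` parameter in the SLERP config concrete, here is a minimal pure-Python sketch of spherical linear interpolation between two weight vectors. It illustrates the general formula, not mergekit's own implementation; the function name `slerp` and the `eps` threshold are assumptions. As I understand mergekit's convention, `t = 0` returns the `base_model` weights and `t = 1` returns the other model's, and the `value` lists under `filter` are spread across the layer range as a gradient.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Angle between the two directions, clamped for numerical safety.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        # Nearly parallel vectors: linear interpolation is a good approximation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Halfway between two orthogonal unit vectors stays on the unit circle.
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

Unlike a plain average, the midpoint here keeps unit length, which is the usual argument for using SLERP over linear interpolation when merging weights.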

#### Passthrough

In [9]:
yamlConfigPassthrough = f"""
slices:
  - sources:
    - model: {MODEL_1}
      layer_range: [0, 32]
  - sources:
    - model: {MODEL_2}
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16

"""

*Note: To run this locally, replace each Hub model ID under `model` with the path to the model you downloaded from Hugging Face.*
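
A passthrough merge stacks the listed slices in order, so the depth of the result can be checked by hand before running anything. The sketch below does that arithmetic for the passthrough config above, assuming `layer_range` is half-open (`[start, end)`); the `slices` list and `count_layers` helper are illustrative and are not read from the YAML.

```python
# Layer ranges copied from the passthrough config: (model, start, end).
# Assumption: ranges are half-open, so [0, 32] covers layers 0..31.
slices = [
    ("codellama/CodeLlama-7b-Instruct-hf", 0, 32),
    ("meta-llama/Meta-Llama-3-8B-Instruct", 24, 32),
]


def count_layers(slices):
    """Total number of transformer layers after stacking all slices."""
    return sum(end - start for _, start, end in slices)


total_layers = count_layers(slices)
print(total_layers)  # 40
```

Under that assumption the merged model has 40 layers: all 32 from CodeLlama followed by the last 8 from Llama 3.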

#### DARE-TIES

In [None]:
yamlConfigDARETIESLlamas = f"""
models:
  # No parameters necessary for base model
  - model: {MODEL_1}
  - model: {MODEL_2}
    parameters:
      density: 0.53
      weight: 0.4
merge_method: dare_ties
base_model: {MODEL_1}
parameters:
  int8_mask: true
dtype: bfloat16

"""
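
The `density` value in the DARE-TIES config controls how much of each fine-tuned model's delta (its difference from the base model) survives the merge. A minimal sketch of the DARE drop-and-rescale step is shown below, assuming each delta entry is kept with probability `density` and the survivors are divided by `density` so the expected value is unchanged. It is an illustration only: it omits the TIES sign-election step and is not mergekit's implementation, and the function name is hypothetical.

```python
import random


def dare_drop_and_rescale(delta, density, rng=None):
    """Randomly drop delta entries, then rescale the kept ones by 1/density."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    # rng.random() is in [0, 1), so density=1.0 keeps every entry.
    return [d / density if rng.random() < density else 0.0 for d in delta]


# Toy delta between a fine-tuned model and its base (values are made up).
delta = [0.2, -0.4, 1.0, 0.5]
sparse_delta = dare_drop_and_rescale(delta, density=0.53)
```

With `density: 0.53`, roughly 53% of the delta entries are kept on average; the merged weights are then the base weights plus `weight` times the sparsified delta.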

### Save Config Script

In [10]:
# Save the YAML configuration to a file
yamlFileName = "config.yaml"
with open(yamlFileName, "w") as f:
    f.write(yamlConfigPassthrough)

## Merge Models

In [11]:
cmd = f"mergekit-yaml {yamlFileName} {OUTPUT_DIR} --allow-crimes --copy-tokenizer --out-shard-size 1B --low-cpu-memory --write-model-card --lazy-unpickle"
!{cmd}

Streaming output truncated to the last 5000 lines.

model-00001-of-00002.safetensors:  33% 3.33G/9.98G [00:46<08:42, 12.7MB/s]
model-00002-of-00002.safetensors:  87% 3.04G/3.50G [00:46<00:42, 10.9MB/s]
model-00001-of-00002.safetensors:  34% 3.37G/9.98G [00:46<05:14, 21.0MB/s]
model-00002-of-00002.safetensors:  87% 3.06G/3.50G [00:46<00:29, 15.0MB/s]
model-00002-of-00002.safetensors:  88% 3.08G/3.50G [00:47<00:20, 20.6MB/s]
model-00001-of-00002.safetensors:  34% 3.40G/9.98G [00:47<03:27, 31.7MB/s]
model-00001-of-00002.safetensors:  34% 3.42G/9.98G [00:47<02:43, 40.2MB/s]
model-00002-of-00002.safetensors:  89% 3.10G/3.50G [00:47<00:14, 27.6MB/s]
model-00002-of-00002.safetensors:  89% 3.12G/3.50G [00:47<00:11, 33.3MB/s]
model-00001-of-00002.safetensors:  34% 3.44G/9.98G [00:47<02:26, 44.6MB/s]
model-00002-of-00002.safetensors:  90% 3.15G/3.50G [0