<a href="https://colab.research.google.com/github/0xVolt/whats-up-doc/blob/main/test/notebooks/model-blending/blend.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Merging Code LLMs to Create an Efficient, Low-Memory Quantized Model for `whats-up-doc`

## Download and Install `mergekit`

In [None]:
!git clone https://github.com/cg123/mergekit.git
!cd mergekit && pip install -q -e .

fatal: destination path 'mergekit' already exists and is not an empty directory.
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done
  Building editable for mergekit (pyproject.toml) ... done


## Create the YAML Config File to Merge Models with SLERP

In [None]:
import os
import yaml
from transformers import AutoModelWithLMHead, AutoTokenizer, pipeline

### Write Config Script

In [None]:
# Set model name
MODEL_NAME = 'whats-up-llamas'

MODEL_1 = "codellama/CodeLlama-7b-Instruct-hf"
MODEL_2 = "meta-llama/Meta-Llama-3-8B-Instruct"

OUTPUT_DIR = "merged_models"

LAYERS_MODEL_1 = list(range(0, 32))  # Layer range for MODEL_1
LAYERS_MODEL_2 = list(range(0, 24))  # Layer range for MODEL_2

#### SLERP

In [None]:
# Write YAML config string
yamlConfigSLERP = """
slices:
  - sources:
      - model: stabilityai/stable-code-3b
        layer_range: [0, 32]
      - model: codellama/CodeLlama-7b-Instruct-hf
        layer_range: [0, 32]
merge_method: slerp
base_model: codellama/CodeLlama-7b-Instruct-hf
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16

"""
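
To make the `slerp` settings above more concrete, here is a minimal pure-Python sketch of spherical linear interpolation between two weight vectors, plus a helper showing how a list such as `[0, 0.5, 0.3, 0.7, 1]` could be spread across layers. The function names `slerp` and `gradient_value` are illustrative and not part of mergekit, and the even spacing of anchor values over the layer index is an assumption about how such gradients are applied, not a copy of mergekit's implementation.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors.

    t = 0 returns v0, t = 1 returns v1.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Angle between the two directions, with the cosine clamped for acos.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        # (Anti-)parallel vectors: the arc is degenerate, so interpolate linearly.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def gradient_value(anchors, layer_idx, num_layers):
    """Linearly interpolate a list of anchor values over the layer index.

    Assumption: anchors are spaced evenly from the first to the last layer.
    """
    if num_layers == 1 or len(anchors) == 1:
        return anchors[0]
    pos = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]


# Halfway between two orthogonal unit vectors stays on the unit circle.
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
print([round(x, 4) for x in midpoint])  # [0.7071, 0.7071]

# Interpolation factor for self-attention tensors in the first and last layer.
t_self_attn = [0, 0.5, 0.3, 0.7, 1]
print(gradient_value(t_self_attn, 0, 32), gradient_value(t_self_attn, 31, 32))
```

Under this reading, the `self_attn` and `mlp` lists mirror each other, so early layers take attention weights mostly from the base model and MLP weights mostly from the other model, with the balance reversing toward the last layer.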

#### Passthrough

In [None]:
yamlConfigPassthrough = """
slices:
  - sources:
    - model: OpenPipe/mistral-ft-optimized-1218
      layer_range: [0, 32]
  - sources:
    - model: mlabonne/NeuralHermes-2.5-Mistral-7B
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16

"""
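
A passthrough merge stacks the listed slices one after another instead of averaging weights, so the merged model ends up deeper than either source. The sketch below counts the resulting layers for the config above. It assumes `layer_range` is a `[start, end)` pair with an exclusive end, and `stacked_layer_count` is a hypothetical helper written for this illustration only.

```python
def stacked_layer_count(slices):
    """Count the layers produced by concatenating [start, end) slices in order."""
    total = 0
    for start, end in slices:
        if end <= start:
            raise ValueError(f"empty or reversed layer range: [{start}, {end}]")
        total += end - start
    return total


# Layer ranges from the passthrough config above.
passthrough_slices = [(0, 32), (24, 32)]
print(stacked_layer_count(passthrough_slices))  # 40
```

With these ranges, all 32 layers of the first model are followed by the last 8 layers of the second, giving 40 layers in total.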

*Note: To run this locally, replace each Hugging Face model ID under `model` with the path to the directory where you downloaded that model from Hugging Face.*
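
One way to switch between Hub IDs and local copies without editing the config by hand is sketched below. The `models/` directory layout and the `resolve_model_ref` helper are assumptions made for this example; adjust them to wherever your downloads actually live.

```python
from pathlib import Path


def resolve_model_ref(hub_id, local_root="models"):
    """Return a local model directory if one exists, otherwise the Hub ID.

    Assumes (hypothetically) that models are stored as
    <local_root>/<repository name>, e.g. models/CodeLlama-7b-Instruct-hf.
    """
    candidate = Path(local_root) / hub_id.split("/")[-1]
    return str(candidate) if candidate.is_dir() else hub_id


# Use the returned value as the `model` entry when building the YAML string.
print(resolve_model_ref("codellama/CodeLlama-7b-Instruct-hf"))
```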

### Save Config Script

In [None]:
# Save config string as a YAML file
with open('config.yaml', 'w', encoding="utf-8") as fout:
    fout.write(yamlConfigPassthrough)

## Merge Models

In [None]:
# Merge models
!mergekit-yaml config.yaml merge --copy-tokenizer --allow-crimes --out-shard-size 1B --lazy-unpickle

Streaming output truncated to the last 5000 lines.
model-00001-of-00002.safetensors:  29% 2.93G/9.94G [00:52<00:57, 122MB/s]
model-00002-of-00002.safetensors:  63% 2.86G/4.54G [00:52<00:14, 115MB/s]
model-00001-of-00002.safetensors:  30% 3.01G/9.94G [00:52<00:56, 124MB/s]
model-00002-of-00002.safetensors:  65% 2.95G/4.54G [00:53<00:13, 118MB/s]
model-00002-of-00002.safetensors:  65% 2.97G/4.54G [00:53<00:12,
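
Outside a notebook, the same merge can be launched from plain Python. The sketch below builds the command as an argument list and only prints it; the flags are the ones used in the cell above, while `build_merge_command` and `run_merge` are hypothetical helper names introduced for this example.

```python
import shlex
import subprocess


def build_merge_command(config_path, out_dir, flags=()):
    """Assemble the mergekit-yaml invocation as an argument list."""
    return ["mergekit-yaml", config_path, out_dir, *flags]


def run_merge(cmd):
    """Run the merge; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


cmd = build_merge_command(
    "config.yaml",
    "merge",
    ["--copy-tokenizer", "--allow-crimes", "--out-shard-size", "1B", "--lazy-unpickle"],
)
print(shlex.join(cmd))
# Call run_merge(cmd) to start the merge once mergekit is installed.
```

Passing a list instead of a shell string avoids quoting problems with paths that contain spaces, and `check=True` makes a failed merge stop the script rather than continue to the loading step.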

In [1]:
import os

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Step 1: Define the models to be merged (constants from the config cell above)
model1 = MODEL_1
model2 = MODEL_2
output_folder = OUTPUT_DIR

# Step 2: Define the layers to be merged from each model.
# mergekit expects `layer_range` as a [start, end] pair, so collapse the index lists.
# SLERP pairs layers one-to-one, so both ranges must span the same number of layers.
layers_model1 = [LAYERS_MODEL_1[0], LAYERS_MODEL_1[-1] + 1]
layers_model2 = [LAYERS_MODEL_2[0], LAYERS_MODEL_2[-1] + 1]

# Step 3: Create the YAML configuration
yaml_config = f"""
slices:
  - sources:
      - model: {model1}
        layer_range: {layers_model1}
      - model: {model2}
        layer_range: {layers_model2}
merge_method: slerp
base_model: {model1}
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    # Interpolation factor for all other tensors (0 for CodeLlama, 1 for Meta-Llama)
    - value: 0.5
dtype: float16
"""

# Step 4: Save the YAML configuration to a file
yaml_filename = "merge_config.yaml"
with open(yaml_filename, "w", encoding="utf-8") as f:
    f.write(yaml_config)

# Step 5: Merge the models using the YAML configuration
exit_code = os.system(f"mergekit-yaml {yaml_filename} {output_folder} --allow-crimes --copy-tokenizer --out-shard-size 1B --low-cpu-memory --write-model-card --lazy-unpickle")
if exit_code != 0:
    raise RuntimeError("mergekit-yaml failed; check the output above")

# Step 6: Load the merged model and tokenizer.
# mergekit writes the merged weights directly into the output folder.
merged_model_name = output_folder
tokenizer = AutoTokenizer.from_pretrained(merged_model_name)
model = AutoModelForCausalLM.from_pretrained(merged_model_name)

# Step 7: Run inference on the merged model
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
input_text = "def calculate_sum(num1, num2):\n    # Code to be generated"
generated_text = generator(input_text, max_length=50, num_return_sequences=1)[0]["generated_text"]
print("Generated Code:", generated_text)
