<a href="https://colab.research.google.com/github/0xVolt/whats-up-doc/blob/main/test/notebooks/model-blending/blend.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Merging Code LLMs to Create an Efficient, Low-Memory Quantized Model for `whats-up-doc`

## Download and Install `mergekit`

In [None]:
!git clone https://github.com/cg123/mergekit.git
!cd mergekit && pip install -q -e .

Cloning into 'mergekit'...
remote: Enumerating objects: 2265, done.
remote: Counting objects: 100% (1354/1354), done.
remote: Compressing objects: 100% (520/520), done.
remote: Total 2265 (delta 1081), reused 947 (delta 833), pack-reused 911
Receiving objects: 100% (2265/2265), 640.50 KiB | 2.72 MiB/s, done.
Resolving deltas: 100% (1584/1584), done.
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done


## Create the YAML Config File to Merge Models with SLERP

In [None]:
import os
import yaml
from transformers import AutoModelWithLMHead, AutoTokenizer, pipeline

### Write Config Script

In [None]:
# Set the resultant model's name
MODEL_NAME = 'whats-up-llamas'

MODEL_1 = "codellama/CodeLlama-7b-Instruct-hf"
MODEL_2 = "meta-llama/Meta-Llama-3-8B-Instruct"

OUTPUT_DIR = "merged_models"

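# mergekit expects layer_range as a [start, end] pair, not a list of every index.
# SLERP pairs layers one-to-one, so both sources must span the same number of layers.
LAYERS_MODEL_1 = [0, 32]  # Layer range for MODEL_1
LAYERS_MODEL_2 = [0, 32]  # Layer range for MODEL_2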

#### SLERP
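
SLERP (spherical linear interpolation) blends two weight tensors along the arc between them instead of along a straight line, and the `t` lists in this section's config give the blend coefficient per layer. The sketch below is a minimal pure-Python illustration, not mergekit's implementation: `slerp` and `expand_gradient` are hypothetical helper names, and it assumes a list of `t` values is treated as a gradient that is linearly interpolated across the layer range. mergekit works on tensors and may differ in normalization and edge-case handling.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two equal-length vectors (t=0 -> v0, t=1 -> v1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two directions, clamped to a valid range.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def expand_gradient(anchors, num_layers):
    """Assumed behavior: stretch a short list of anchor values across num_layers layers."""
    if num_layers == 1 or len(anchors) == 1:
        return [float(anchors[0])] * num_layers
    values = []
    for layer in range(num_layers):
        # Position of this layer on the anchor axis, from 0 to len(anchors) - 1.
        pos = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        values.append((1 - frac) * anchors[lo] + frac * anchors[lo + 1])
    return values


# Per-layer coefficients for the two filters used in this section's config.
self_attn_t = expand_gradient([0, 0.5, 0.3, 0.7, 1], 32)
mlp_t = expand_gradient([1, 0.5, 0.7, 0.3, 0], 32)

# Toy 2-D "weights": halfway between two orthogonal unit vectors.
mid = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
print(mid)  # roughly [0.707, 0.707], which still has unit length
```

Under this reading, the first layer's attention weights come entirely from the base model (`t = 0`) and the last layer's entirely from the second model (`t = 1`), while the MLP weights follow the opposite schedule.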

In [None]:
yamlConfigSLERPLlamas = f"""
slices:
  - sources:
      - model: {MODEL_1}
        layer_range: {LAYERS_MODEL_1}
      - model: {MODEL_2}
        layer_range: {LAYERS_MODEL_2}
merge_method: slerp
base_model: {MODEL_1}
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    # Fallback coefficient for all other tensors (0 for CodeLlama, the base model; 1 for Meta-Llama-3)
    - value: 0.5
dtype: float16
"""

#### Passthrough

In [None]:
yamlConfigPassthrough = """
slices:
  - sources:
    - model: OpenPipe/mistral-ft-optimized-1218
      layer_range: [0, 32]
  - sources:
    - model: mlabonne/NeuralHermes-2.5-Mistral-7B
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16

"""

*Note: When running this locally, replace each Hugging Face model ID under `model` with the path to the copy of that model you downloaded.*
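
A passthrough merge stacks the listed slices on top of each other rather than blending them, so the merged model ends up deeper than either source. The sketch below counts the resulting layers for the passthrough config above. `stacked_layers` is a hypothetical helper, and it assumes `layer_range` is half-open (start included, end excluded).

```python
# Slices copied from the passthrough config: (model, (start, end)).
slices = [
    ("OpenPipe/mistral-ft-optimized-1218", (0, 32)),
    ("mlabonne/NeuralHermes-2.5-Mistral-7B", (24, 32)),
]


def stacked_layers(slices):
    """Return the ordered (model, source_layer_index) pairs of the merged model."""
    stack = []
    for model, (start, end) in slices:
        # Assumption: start is included, end is excluded.
        stack.extend((model, i) for i in range(start, end))
    return stack


stack = stacked_layers(slices)
print(len(stack))  # 40
```

The 32 layers of the first model are followed by a second copy of layers 24 to 31 taken from the other model, for 40 layers in total.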

### Save Config Script

In [None]:
# Save the YAML configuration to a file
yamlFileName = "config.yaml"
with open(yamlFileName, "w") as f:
    f.write(yamlConfigSLERPLlamas)

## Merge Models

In [None]:
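# mergekit writes the merged weights directly into the output path, so use the path the loading cell expects
os.system(f"mergekit-yaml {yamlFileName} {OUTPUT_DIR}/{MODEL_NAME} --allow-crimes --copy-tokenizer --out-shard-size 1B --low-cpu-memory --write-model-card --lazy-unpickle")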
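
`os.system` only returns an exit status, so a failed merge is easy to miss until the model fails to load. One possible alternative is to run the command through `subprocess` and raise on a non-zero exit code. The sketch below assumes a hypothetical helper named `run_checked` and uses a harmless stand-in command so that it runs without mergekit installed.

```python
import shlex
import subprocess
import sys


def run_checked(command):
    """Run a command given as a list of arguments and raise if it exits non-zero."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{shlex.join(command)} failed with exit code {result.returncode}:\n{result.stderr}"
        )
    return result.stdout


# Stand-in command; for the real merge, pass the mergekit-yaml arguments as a list.
output = run_checked([sys.executable, "-c", "print('merge ok')"])
print(output)
```

For the merge itself, the argument list would start with `"mergekit-yaml"`, followed by the config file, the output directory, and the same flags as in the cell above.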

## Load the Blended Model

In [None]:
from transformers import AutoModelForCausalLM

# Load the merged model and tokenizer
loadModel = f"{OUTPUT_DIR}/{MODEL_NAME}"

tokenizer = AutoTokenizer.from_pretrained(loadModel)
# AutoModelWithLMHead is deprecated; AutoModelForCausalLM is the class for decoder-only models
model = AutoModelForCausalLM.from_pretrained(loadModel)

# Run inference on the merged model
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
input_text = "def calculate_sum(num1, num2):\n    # Code to be generated"
# max_new_tokens counts only generated tokens, so the prompt does not use up the budget
generated_text = generator(input_text, max_new_tokens=50, num_return_sequences=1)[0]["generated_text"]
print("Generated Code:", generated_text)