### *Welcome to Spectrum Guided Model Merging!*

If you are running the tests, please use a GPU (CUDA) runtime, because on CPU they are really slow :)

In [None]:
from google.colab import drive
drive.mount('/content/drive')


In [None]:
%cd /content/drive/MyDrive
!git clone https://github.com/Anjaas85/Spectrum-Guided-Model-Merging.git
%cd Spectrum-Guided-Model-Merging/

In [None]:
!pip install -r requirements.txt
!pip install -e .

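Since we intend to use MergeKit, the following change is needed. According to LLMs, there is an error in the library's `plan.py`: one expression has to be replaced so that `w_in.aliases` is converted to a list before it is concatenated with `[w_in.name]`.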

In [None]:
# we can use grep afterwards to check that the line was changed
!sudo sed -i 's/\[w_in.name\] + (w_in.aliases or \[\])/[w_in.name] + list(w_in.aliases or [])/g' /usr/local/lib/python3.12/dist-packages/mergekit/plan.py
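
The `sed` call above hard-codes the Python 3.12 install path. As a rough sketch, the same replacement could be done from Python, which also reports whether anything was changed. This assumes the failure comes from `aliases` being stored as a tuple (a list cannot be concatenated with a tuple); `patch_source` and `patch_installed_mergekit` are hypothetical helper names, not part of MergeKit.

```python
import importlib.util
from pathlib import Path

OLD = "[w_in.name] + (w_in.aliases or [])"
NEW = "[w_in.name] + list(w_in.aliases or [])"


def patch_source(text):
    """Return (patched_text, number_of_replacements). Safe to run twice."""
    return text.replace(OLD, NEW), text.count(OLD)


def patch_installed_mergekit():
    """Patch mergekit/plan.py wherever it is installed (needs write access)."""
    spec = importlib.util.find_spec("mergekit.plan")
    if spec is None or spec.origin is None:
        raise RuntimeError("mergekit.plan not found")
    path = Path(spec.origin)
    patched, count = patch_source(path.read_text())
    if count:
        path.write_text(patched)
    return count


# Why the list() wrapper matters, assuming aliases is a tuple:
name, aliases = "weight", ("alias_a",)
try:
    [name] + (aliases or [])
    concat_failed = False
except TypeError:
    concat_failed = True
fixed = [name] + list(aliases or [])
print(concat_failed, fixed)
```

Calling `patch_installed_mergekit()` returns the number of replaced occurrences, so `0` means the file was already patched or the expression was not found.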

We need to prepare the models to be merged. Since we are experimenting on classification tasks with different numbers of classes, it is not possible to merge the heads. Therefore, we keep the beheaded models (backbones only) and preserve the heads separately.
The scripts are in the ./models directory, together with a new class for the multi-head model, which we will use extensively.
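
To make the idea of "beheading" concrete, here is a minimal sketch on toy data. It relies on the Hugging Face convention that `BertForSequenceClassification` stores its head under the `classifier.` prefix; `split_backbone_and_head` is a hypothetical helper and not necessarily what `behead.py` does. Shapes stand in for real tensors.

```python
def split_backbone_and_head(state_dict, head_prefix="classifier."):
    """Split a checkpoint's parameters into (backbone, head) by key prefix."""
    backbone, head = {}, {}
    for key, value in state_dict.items():
        target = head if key.startswith(head_prefix) else backbone
        target[key] = value
    return backbone, head


# Toy stand-ins: parameter shapes instead of real tensors.
sst2_ckpt = {
    "bert.encoder.layer.0.attention.self.query.weight": (768, 768),
    "bert.pooler.dense.weight": (768, 768),
    "classifier.weight": (2, 768),  # SST-2 has 2 classes
    "classifier.bias": (2,),
}
mnli_ckpt = {
    "bert.encoder.layer.0.attention.self.query.weight": (768, 768),
    "bert.pooler.dense.weight": (768, 768),
    "classifier.weight": (3, 768),  # MNLI has 3 classes
    "classifier.bias": (3,),
}

sst2_backbone, sst2_head = split_backbone_and_head(sst2_ckpt)
mnli_backbone, mnli_head = split_backbone_and_head(mnli_ckpt)

# Backbones have identical shapes and can be merged; the heads cannot.
print(sst2_backbone == mnli_backbone)
print(sst2_head["classifier.weight"], mnli_head["classifier.weight"])
```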

In [None]:
%cd models/
!python3 behead.py
!python3 tokenizer.py
%cd ..

All of the following tests and examples are shown on TIES merging restricted to the 50% most significant layers by SNR (signal-to-noise ratio).

Let's start with a simple merge and test one merged model.
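
For readers new to TIES, its three steps (trim each task vector, elect a sign per parameter, average only the agreeing values) can be sketched on plain Python lists. This is a toy illustration under simplifying assumptions: it ignores per-model weights and the SNR-based layer selection, and MergeKit's real implementation works on tensors. The names `trim`, `ties_merge`, `density` and `lam` are chosen here for illustration.

```python
def trim(task_vector, density):
    """Keep the `density` fraction of entries with the largest magnitude."""
    k = max(1, round(len(task_vector) * density))
    order = sorted(range(len(task_vector)),
                   key=lambda i: abs(task_vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(task_vector)]


def sign(x):
    return (x > 0) - (x < 0)


def ties_merge(base, finetuned, density=0.5, lam=1.0):
    """Merge fine-tuned parameter lists into `base` with a TIES-style rule."""
    # Task vector = fine-tuned weights minus base weights.
    task_vectors = [[f - b for f, b in zip(model, base)] for model in finetuned]
    trimmed = [trim(tv, density) for tv in task_vectors]
    merged = []
    for i, b in enumerate(base):
        column = [tv[i] for tv in trimmed]
        elected = sign(sum(column))  # sign carrying the larger total mass
        agreeing = [v for v in column if v != 0.0 and sign(v) == elected]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + lam * delta)
    return merged


base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -0.2, 0.5, 0.0]
model_b = [-0.6, 0.1, 0.7, 0.0]
merged = ties_merge(base, [model_a, model_b], density=0.5)
print(merged)
```

In this toy example the two models disagree in sign on the first parameter; the positive update has more mass, so the negative one is dropped instead of being averaged in.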

In [None]:
!mergekit-yaml configs/merges/bert_ties_template_snr50.yaml ./experiments/merges/bert_ties_template_snr50_exp --cuda

Now our model is saved, so we can load it into our UnifiedMultiTaskModel, add the classification heads, and test it.

In [None]:
import torch
import torch.nn as nn
from transformers import AutoModel, AutoConfig, AutoModelForSequenceClassification
from src.test_model import evaluate_model
from models.unifiedMultiTaskModel import UnifiedMultiTaskModel

merged_backbone_path = "./experiments/merges/bert_ties_template_snr50_exp"
multi_model = UnifiedMultiTaskModel(merged_backbone_path)

# attach the separately saved task heads
multi_model.add_task_head("sst2", "textattack/bert-base-uncased-SST-2")
multi_model.add_task_head("ag-news", "textattack/bert-base-uncased-ag-news")
multi_model.add_task_head("mnli","textattack/bert-base-uncased-MNLI")
print("All heads added to umtm! Everything ready!!!!")

In [None]:
from src.test_model import evaluate_model
model_name = "bert_ties_template_snr50_exp"
merged_path = f"experiments/merges/{model_name}"


# evaluation for all tasks
e1 = evaluate_model(multi_model, model_name, "sst2")
e2 = evaluate_model(multi_model, model_name, "ag-news")
e3 = evaluate_model(multi_model, model_name, "mnli")
print(e1)
print(e2)
print(e3)
print("All checked! :)")

That's how we tested the toy models. Next, it was necessary to find the best merge parameters.
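
The search cell in this notebook reuses trials that are already stored in a database. The sketch below illustrates only that resume logic: stored trials count toward the trial budget, so asking for exactly the stored number runs nothing new. It uses plain random search as a stand-in, not the sampler used by `optimize_ties_params`; the objective, the parameter ranges and all names here are hypothetical.

```python
import random

# Hypothetical search space: TIES density and weight, each in [0, 1].
SPACE = {"density": (0.0, 1.0), "weight": (0.0, 1.0)}


def toy_objective(params):
    """Stand-in for 'merge with these parameters, then evaluate'."""
    return -(params["density"] - 0.6) ** 2 - (params["weight"] - 0.8) ** 2


def run_search(objective, space, n_trials, finished=None, seed=0):
    """Random search in which already finished trials count toward n_trials."""
    rng = random.Random(seed)
    trials = list(finished or [])
    for _ in range(max(0, n_trials - len(trials))):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}
        trials.append((params, objective(params)))
    best_params, best_score = max(trials, key=lambda t: t[1])
    return best_params, best_score, trials


best_params, best_score, trials = run_search(toy_objective, SPACE, n_trials=20)
print(best_params, best_score)
```

Passing the returned `trials` back in as `finished` with `n_trials=21` would run exactly one additional trial.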

In [None]:
import torch
from src.optimize_ties_merge_snr50 import optimize_ties_params

best_params = optimize_ties_params(
    template_config_path="./configs/merges/bert_ties_template_snr50.yaml",
    output_dir="./experiments/optuna_ties_snr50",
    n_trials=148, # the database already holds 148 trials; set 149 to run one new trial
    device="cuda" if torch.cuda.is_available() else "cpu"
)

print(f'\n I suppose we are done :)')


Once we have the parameters we need, we can run the optimal merge and fine-tune it!

In [None]:
!mergekit-yaml configs/merges/merges_highlights/bert_ties_snr50_opt.yaml ./experiments/merges/bert_ties_snr50_opt --cuda

In [None]:
# FINE-TUNE THE SPECTRUM MERGED MODELS
import torch
from models.unifiedMultiTaskModel import UnifiedMultiTaskModel
from src.fine_tune import fine_tune_and_evaluate

# only the spectrum-guided ones: "bert_ties_snr50_opt", etc.
model_name = "bert_ties_snr50_opt"
merged_path = f"./experiments/merges/{model_name}"

print(f"Processing Model: {model_name}")
print(f"Path: {merged_path}")

multi_model = UnifiedMultiTaskModel(merged_path)

multi_model.add_task_head("sst2", "textattack/bert-base-uncased-SST-2")
multi_model.add_task_head("ag-news", "textattack/bert-base-uncased-ag-news")
multi_model.add_task_head("mnli", "textattack/bert-base-uncased-MNLI")

device = "cuda" if torch.cuda.is_available() else "cpu"

fine_tune_and_evaluate(
    multitask_model_instance=multi_model,
    model_name=model_name,
    output_run_name=model_name+"ft", # ft for fine-tuned
    total_steps=3000, # use 2 steps for a quick trial run
    base_lr=2e-5,
    device=device,
    val_check_interval=300,  # if you are running on 2 steps, set this to 1
    arr_threshold=0.98,
    patience=4,
    train_subset_ratio=0.1
)

print("\nFine Tuning and Evaluation Complete :)")
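
To show how `total_steps`, `val_check_interval` and `patience` typically interact, here is a minimal sketch of patience-based early stopping. How `fine_tune_and_evaluate` actually applies them is an assumption, `arr_threshold` is left out because its meaning is defined in `src/fine_tune.py`, and the validation scores are made-up toy values.

```python
class EarlyStopper:
    """Stop after `patience` validation checks in a row without improvement."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("-inf")
        self.bad_checks = 0

    def should_stop(self, score):
        if score > self.best:
            self.best = score
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


def validation_steps(total_steps, val_check_interval):
    """Training steps at which validation runs."""
    return list(range(val_check_interval, total_steps + 1, val_check_interval))


# Toy validation scores, one per check (hypothetical values).
toy_scores = [0.80, 0.84, 0.85, 0.85, 0.84, 0.85, 0.83, 0.90, 0.91, 0.92]

stopper = EarlyStopper(patience=4)
stopped_at = None
for step, score in zip(validation_steps(3000, 300), toy_scores):
    if stopper.should_stop(score):
        stopped_at = step
        break
print(stopped_at, stopper.best)
```

With 3000 steps and an interval of 300 there are 10 validation checks; with 2 steps and an interval of 1 there are 2, which is why the interval has to shrink for a quick trial run.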

### *That's it! Our short example notebook is over!*
