# Model Merging with SparseGPT on Kaggle

This notebook demonstrates how to merge fine-tuned LLaMA models using three methods:
1. TIES (magnitude-based)
2. DARE (random dropout)
3. SparseGPT (Hessian-based importance)
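
To make the difference between the first two methods concrete, here is a minimal pure-Python sketch of how each one sparsifies a task vector (fine-tuned weights minus base weights). The helper names `ties_trim` and `dare_drop` are hypothetical and are not part of `llama_merge`. Real implementations work on tensors, and TIES also resolves sign conflicts between models before averaging, which this sketch leaves out.

```python
import random

def ties_trim(task_vector, density):
    """Keep the `density` fraction of entries with the largest magnitude; zero the rest."""
    k = max(1, round(len(task_vector) * density))
    # Indices of the k largest-magnitude entries.
    order = sorted(range(len(task_vector)), key=lambda i: abs(task_vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(task_vector)]

def dare_drop(task_vector, density, seed=0):
    """Keep each entry with probability `density`, rescaling survivors by 1/density."""
    rng = random.Random(seed)
    return [v / density if rng.random() < density else 0.0 for v in task_vector]

# Toy task vector (illustrative values only).
delta = [0.50, -0.02, 0.01, -0.80, 0.03, 0.10, -0.04, 0.20, 0.00, -0.30]
trimmed = ties_trim(delta, density=0.2)
dropped = dare_drop(delta, density=0.2)
print("TIES:", trimmed)
print("DARE:", dropped)
```

TIES is deterministic and always keeps the largest changes. DARE keeps a random subset and rescales it so the expected value of each entry is unchanged.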

**Prerequisites:**
- Enable GPU (Settings → Accelerator → GPU T4 x2)
- Enable Internet (Settings → Internet → On)
- Add your fine-tuned models as Kaggle datasets
- ⚠️ **No HuggingFace account needed** (unless using private models)

## Step 1: Setup Environment

In [4]:
# Run setup script
!git clone https://github.com/Shahriar-Ferdoush/test-2.git
%cd test-2
!python kaggle_setup.py

Cloning into 'test-2'...
remote: Enumerating objects: 24, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 24 (delta 4), reused 23 (delta 3), pack-reused 0 (from 0)
Receiving objects: 100% (24/24), 62.26 KiB | 4.45 MiB/s, done.
Resolving deltas: 100% (4/4), done.
/kaggle/working/test-2
KAGGLE ENVIRONMENT SETUP

[1/6] Checking GPU availability...
✓ GPU available: Tesla P100-PCIE-16GB
  Memory: 17.06 GB

[2/6] Installing required packages...
  Installing transformers>=4.36.0...
  Installing datasets>=2.14.0...
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bi

In [None]:
# Configure LoRA adapter paths - NO COPYING NEEDED!
# Kaggle /kaggle/input/ is READ-ONLY but we can LOAD from it directly
import os

# Your Kaggle dataset paths - these point directly to the LoRA adapters
# Adjust the subdirectory name if different
LORA_ADAPTERS = [
    "/kaggle/input/llama3-2-1b-instruct-ft-doctor-consulting/llama-3-1b-medical-chatbot-v1",
    "/kaggle/input/llama3-2-1b-instruct-fine-tuning-therapist-ai/llama-3-1b-therapist-ai-v1",
]

print("="*80)
print("LORA ADAPTER CONFIGURATION")
print("="*80)

valid_adapters = []

for i, adapter_path in enumerate(LORA_ADAPTERS, 1):
    print(f"\n[{i}] Checking: {adapter_path}")
    
    if os.path.exists(adapter_path):
        files = os.listdir(adapter_path)
        
        # Check for required LoRA files
        has_adapter_config = "adapter_config.json" in files
        has_adapter_model = any("adapter_model" in f for f in files)
        
        print(f"    Files found: {len(files)}")
        print(f"    adapter_config.json: {'✅' if has_adapter_config else '❌'}")
        print(f"    adapter_model.*: {'✅' if has_adapter_model else '❌'}")
        
        if has_adapter_config and has_adapter_model:
            print(f"    ✅ VALID LoRA adapter")
            valid_adapters.append(adapter_path)
        else:
            print(f"    ❌ INVALID - missing required files")
            print(f"       Available files: {files[:10]}")
    else:
        print(f"    ❌ PATH NOT FOUND")
        print(f"       Make sure dataset is added to notebook inputs")

print(f"\n{'='*80}")
print(f"RESULT: Found {len(valid_adapters)}/{len(LORA_ADAPTERS)} valid adapter(s)")
print(f"{'='*80}\n")

if len(valid_adapters) == 0:
    raise ValueError("No valid LoRA adapters found! Check paths and dataset names.")

Copying /kaggle/input/llama3-2-1b-instruct-ft-doctor-consulting → ./finetuned_model_1
  ✓ Copied 6 files
Copying /kaggle/input/llama3-2-1b-instruct-fine-tuning-therapist-ai → ./finetuned_model_2
  ✓ Copied 6 files

✓ Ready to use 2 model(s)
Local paths: ['./finetuned_model_1', './finetuned_model_2']


## Step 2: Configure LoRA Adapter Paths

**IMPORTANT:** 
- `/kaggle/input/` is READ-ONLY but we can load models directly from it
- NO NEED to copy! Just use the paths directly
- Make sure you've added your fine-tuned model datasets to notebook inputs
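
Kaggle datasets often nest the adapter files one level below the dataset root, and pointing at the root instead of that subdirectory is a common cause of the `Can't find 'adapter_config.json'` error. The sketch below searches a dataset folder for directories that look like LoRA adapters. The helper names `is_lora_adapter_dir` and `find_adapter_dirs` are hypothetical, and the check only covers the two file names this notebook looks for.

```python
from pathlib import Path

def is_lora_adapter_dir(path):
    """Return True if `path` holds adapter_config.json and an adapter_model.* weights file."""
    p = Path(path)
    if not p.is_dir():
        return False
    has_config = (p / "adapter_config.json").is_file()
    has_weights = any(p.glob("adapter_model.*"))
    return has_config and has_weights

def find_adapter_dirs(root):
    """List every directory under `root` (root included) that looks like a LoRA adapter."""
    root = Path(root)
    candidates = [root] + [d for d in root.rglob("*") if d.is_dir()]
    return sorted(str(d) for d in candidates if is_lora_adapter_dir(d))

# Prints an empty list when run outside Kaggle, where /kaggle/input does not exist.
print(find_adapter_dirs("/kaggle/input"))
```

The returned paths can be pasted into `LORA_ADAPTERS` as they are, since nothing needs to be copied out of `/kaggle/input/`.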

## Step 2A: Configure HuggingFace Token (Optional, only if using HF models)

**Skip this step if you load your models from Kaggle datasets as shown above.**

Only needed if:
- Using models from HuggingFace Hub
- Using private HuggingFace models
- Base model requires authentication

In [6]:
# import os
# from huggingface_hub import login

# # Option 1: Use Kaggle Secrets (recommended)
# # Add HF_TOKEN in Kaggle Secrets
# from kaggle_secrets import UserSecretsClient
# user_secrets = UserSecretsClient()
# hf_token = user_secrets.get_secret("HF_TOKEN")

# # Option 2: Direct token (not recommended for public notebooks)
# # hf_token = "hf_..."

# login(token=hf_token)
# print("✓ Logged in to HuggingFace")

## Step 3: Configure Merging Parameters

In [None]:
import torch
from llama_merge import LLaMAMerger

# ============================================================================
# CONFIGURATION - Using direct paths from Kaggle input
# ============================================================================

# Base model (Kaggle dataset)
BASE_MODEL = "/kaggle/input/llama-3.2/transformers/1b-instruct/1"

# LoRA adapters (from previous cell) - NO COPYING, direct paths!
FINETUNED_MODELS = valid_adapters

# Calibration datasets (HuggingFace dataset IDs - will be downloaded)
DATASETS = [
    "ruslanmv/ai-medical-chatbot",  # Matches medical chatbot adapter
    "Amod/mental_health_counseling_conversations",  # Matches therapist adapter
]

# Merging parameters
OUTPUT_DIR = "./merged_models"
CACHE_DIR = "./merge_cache"
DENSITY = 0.2  # Keep top 20% of weights
NUM_CALIBRATION_SAMPLES = 64
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Display configuration
print("="*80)
print("MERGER CONFIGURATION")
print("="*80)
print(f"Device: {DEVICE}")
print(f"\nBase Model:")
print(f"  {BASE_MODEL}")
print(f"\nLoRA Adapters ({len(FINETUNED_MODELS)}):")
for i, model in enumerate(FINETUNED_MODELS, 1):
    print(f"  {i}. {model}")
print(f"\nCalibration Datasets ({len(DATASETS)}):")
for i, dataset in enumerate(DATASETS, 1):
    print(f"  {i}. {dataset}")
print(f"\nParameters:")
print(f"  Density: {DENSITY} (keep top {int(DENSITY*100)}% of weights)")
print(f"  Calibration samples: {NUM_CALIBRATION_SAMPLES}")
print(f"  Output directory: {OUTPUT_DIR}")
print(f"  Cache directory: {CACHE_DIR}")
print("="*80)

# Initialize merger
print("\nInitializing LLaMAMerger...")
merger = LLaMAMerger(
    base_model_path=BASE_MODEL,
    finetuned_model_paths=FINETUNED_MODELS,
    dataset_names=DATASETS,
    output_dir=OUTPUT_DIR,
    cache_dir=CACHE_DIR,
    density=DENSITY,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    device=DEVICE
)
print("✅ Merger initialized successfully!")

2025-11-17 07:00:05.832291: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1763362806.034506      48 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1763362806.091118      48 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'


Device: cuda
Base model: /kaggle/input/llama-3.2/transformers/1b-instruct/1
Fine-tuned models: ['./finetuned_model_1', './finetuned_model_2']
Datasets: ['Amod/mental_health_counseling_conversations', 'ruslanmv/ai-medical-chatbot']
Density: 0.2


## Step 4: Initialize Merger

In [None]:
# Verification: Test loading one LoRA adapter
print("="*80)
print("VERIFICATION: Testing LoRA adapter loading")
print("="*80)

try:
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel
    
    # Test with first model
    test_lora_path = FINETUNED_MODELS[0]
    print(f"\nTesting: {test_lora_path}")
    
    # Check files
    import os
    files = os.listdir(test_lora_path)
    print(f"\nFiles in adapter directory:")
    for f in files:
        print(f"  - {f}")
    
    # Try loading
    print(f"\nLoading base model: {BASE_MODEL}")
    base_model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL,
        torch_dtype=torch.float16,
        device_map="cpu"
    )
    print("  ✓ Base model loaded")
    
    print(f"\nLoading LoRA adapter: {test_lora_path}")
    model = PeftModel.from_pretrained(base_model, test_lora_path)
    print("  ✓ LoRA adapter loaded")
    
    print("\nMerging LoRA weights...")
    model = model.merge_and_unload()
    print("  ✓ Merge successful!")
    
    # Clean up
    del model
    del base_model
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    
    print("\n" + "="*80)
    print("✓ VERIFICATION PASSED - Ready to proceed!")
    print("="*80)
    
except Exception as e:
    print("\n" + "="*80)
    print("✗ VERIFICATION FAILED")
    print("="*80)
    print(f"\nError: {str(e)}")
    print("\nPlease check:")
    print("1. BASE_MODEL path is correct")
    print("2. LoRA adapter paths point to the adapter directories")
    print("3. adapter_config.json and adapter_model.* files exist")
    print("="*80)
    raise

## Step 3A: Verify Setup (Optional but Recommended)

Let's verify everything is configured correctly before starting the merge.
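
A cheap sanity check on the configuration values can catch mistakes before any model is loaded. The sketch below assumes one calibration dataset per adapter, which the comments in the configuration cell suggest but `llama_merge` may not strictly require. The function name `check_merge_config` is hypothetical.

```python
def check_merge_config(adapters, datasets, density):
    """Return a list of human-readable problems; an empty list means the config looks sane."""
    problems = []
    if not adapters:
        problems.append("no adapters configured")
    if len(adapters) != len(datasets):
        problems.append(
            f"{len(adapters)} adapter(s) but {len(datasets)} calibration dataset(s)"
        )
    if not 0.0 < density <= 1.0:
        problems.append(f"density {density} is outside (0, 1]")
    return problems

# Example with placeholder names; in the notebook, pass FINETUNED_MODELS, DATASETS, DENSITY.
issues = check_merge_config(["adapter_a", "adapter_b"], ["dataset_a", "dataset_b"], 0.2)
print(issues if issues else "configuration looks consistent")
```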

In [8]:
# Create merger instance
merger = LLaMAMerger(
    base_model_path=BASE_MODEL,
    finetuned_model_paths=FINETUNED_MODELS,
    dataset_names=DATASETS,
    output_dir=OUTPUT_DIR,
    cache_dir=CACHE_DIR,
    density=DENSITY,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    device=DEVICE
)

print("✓ Merger initialized")

✓ Merger initialized


## Step 5: Run All Methods and Compare
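
The comparison in this step ranks methods by perplexity, where lower is better. How `merge_all_methods` computes it is not shown in this notebook. The standard definition is the exponential of the mean per-token negative log-likelihood, sketched here in plain Python with a hypothetical `perplexity` helper.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token), with NLLs in nats."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns probability 0.25 to every token is as uncertain as a
# uniform choice among 4 options, so its perplexity is 4 (up to float rounding).
uniform = [-math.log(0.25)] * 8
print(perplexity(uniform))
```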

In [9]:
# This will:
# 1. Compute task vectors (if not cached)
# 2. Compute Hessians (if not cached)
# 3. Merge with all three methods
# 4. Evaluate and compare

results = merger.merge_all_methods()

print("\n" + "="*80)
print("RESULTS SUMMARY")
print("="*80)
for method, metrics in results.items():
    print(f"\n{method}:")
    print(f"  Perplexity: {metrics['perplexity']:.4f}")
    print(f"  Time: {metrics['time']:.2f}s")

# Find best method
best_method = min(results, key=lambda k: results[k]['perplexity'])
print(f"\n🏆 Best method: {best_method}")
print(f"   Perplexity: {results[best_method]['perplexity']:.4f}")

ERROR:llama_merge:  Failed to load model: Can't find 'adapter_config.json' at './finetuned_model_1'


ValueError: Can't find 'adapter_config.json' at './finetuned_model_1'

## Step 6: Save Best Model

In [None]:
# The models are already saved in OUTPUT_DIR
# You can load and test them:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the best model (usually TIES-SparseGPT)
best_model_path = f"{OUTPUT_DIR}/ties_sparsegpt_merged"

model = AutoModelForCausalLM.from_pretrained(best_model_path).to(DEVICE)  # same device as the inputs
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

print(f"✓ Loaded best model from {best_model_path}")

## Step 7: Test the Merged Model

In [None]:
# Test with a sample prompt
prompt = "I've been feeling anxious lately. What should I do?"

# Move inputs to wherever the model lives to avoid a CPU/GPU device mismatch
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    do_sample=True,
    top_p=0.9
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Prompt:", prompt)
print("\nResponse:", response)

## Optional: Run Individual Methods

If you want to run just one method:

In [None]:
# Run only SparseGPT method
sparsegpt_model = merger.merge_with_ties(use_sparsegpt=True)
sparsegpt_model.save_pretrained(f"{OUTPUT_DIR}/sparsegpt_only")

print("✓ SparseGPT model saved")
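
For intuition about what a Hessian-based score changes, the sketch below compares plain magnitude ranking with an OBS-style saliency `w_j^2 / [H^-1]_jj`, the criterion SparseGPT builds on, where `H = X^T X` comes from calibration inputs. It rests on a strong simplifying assumption: `H` is treated as diagonal, so the score reduces to `w_j^2 * H_jj`. The real SparseGPT uses the full inverse Hessian and also updates the remaining weights to compensate, and whether `llama_merge` does either is not shown in this notebook. All function names are hypothetical.

```python
def hessian_diag(activations):
    """Diagonal of H = X^T X for calibration inputs X (rows = samples, cols = input features)."""
    n_features = len(activations[0])
    return [sum(row[j] ** 2 for row in activations) for j in range(n_features)]

def importance_scores(weights, h_diag, damp=1e-8):
    """Saliency w_j^2 / [H^-1]_jj under a diagonal H, i.e. w_j^2 * (H_jj + damp)."""
    return [w * w * (h + damp) for w, h in zip(weights, h_diag)]

def keep_top(weights, scores, density):
    """Zero every weight except the `density` fraction with the highest score."""
    k = max(1, round(len(weights) * density))
    order = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

# One weight row with 4 inputs; input feature 2 is large in the calibration data.
X = [[0.1, 0.2, 3.0, 0.1],
     [0.2, 0.1, 2.5, 0.0],
     [0.1, 0.3, 3.5, 0.2]]
w = [0.9, -0.8, 0.3, 0.05]

by_magnitude = keep_top(w, [abs(v) for v in w], density=0.25)
by_hessian = keep_top(w, importance_scores(w, hessian_diag(X)), density=0.25)
print("magnitude:", by_magnitude)
print("hessian:  ", by_hessian)
```

Magnitude ranking keeps the largest weight, while the activation-aware score keeps the smaller weight attached to the input that carries most of the signal in the calibration data.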

## Optional: Upload to HuggingFace Hub

In [None]:
# Upload the best model to HuggingFace
# Requires an authenticated session (see the optional token step above)

repo_name = "your-username/merged-mental-health-counselor"

model.push_to_hub(repo_name)
tokenizer.push_to_hub(repo_name)

print(f"✓ Model uploaded to {repo_name}")

## Memory Monitoring (Optional)

In [None]:
# Check GPU memory usage
if torch.cuda.is_available():
    print(f"GPU Memory Allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
    print(f"GPU Memory Reserved: {torch.cuda.memory_reserved() / 1e9:.2f} GB")
    
    # Clear cache if needed
    # torch.cuda.empty_cache()