# 🚀 PacerKit Quickstart

**PACER: Permutation-Aligned Consensus Expert Routing**

This notebook demonstrates how to merge multiple models with PacerKit using the PACER framework.

## Overview

PACER is a base-free, interference-aware model merging framework that:
1. **Aligns models geometrically** using Git Re-Basin
2. **Computes a Consensus Barycenter** as a synthetic base
3. **Analyzes interference** per layer
4. **Merges low-interference layers** using DARE-TIES
5. **Upcycles high-interference layers** to Mixture-of-Experts
6. **Uploads to HuggingFace Hub** (optional)

## Installation

First, install PacerKit and its dependencies:

In [None]:
# Install PacerKit (run from repo root)
# !pip install -e .

# Or install dependencies directly
# !pip install torch transformers safetensors accelerate huggingface_hub scipy scikit-learn pyyaml tqdm click

# Add project root to path for local development
import sys
import os
if os.path.abspath('..') not in sys.path:
    sys.path.append(os.path.abspath('..'))

## Quick Start: Merge Two Qwen Coder Models

Let's merge two 4B Qwen-based coding models by passing their Hugging Face model IDs directly to `PACERMerger`.

In [None]:
from pacerkit import PACERMerger

# Initialize with model paths
merger = PACERMerger([
    "fluently/FluentlyQwen3-Coder-4B-0909",
    "SamuelBang/AesCoder-4B"
])

# Run the full merge pipeline
# Note: This requires GPU and will download ~8GB of models
# merged_model = merger.merge(
#     interference_threshold=0.35,
#     output_path="./merged_qwen_coder"
# )

## Using Configuration Files

For reproducible merges, use YAML config files:

In [None]:
from pacerkit import PACERMerger, load_config

# Load from config file
config = load_config("../configs/qwen_coder_merge.yaml")

print(f"Project: {config.project_name}")
print(f"Models: {config.models}")
print(f"Interference threshold: {config.pacer.interference_threshold}")
print(f"Push to Hub: {config.output.push_to_hub}")

In [None]:
# Initialize merger with config
merger = PACERMerger(config=config)

# Load models (this will download them if not cached)
# models = merger.load_models()

## Step-by-Step Pipeline

Let's walk through each phase of the PACER pipeline.

### Phase 1: Load and Validate Models

In [None]:
# Load models
# models = merger.load_models()

# Check architecture compatibility
# from pacerkit.utils import get_model_architecture_info
# info = get_model_architecture_info(models[0])
# print(f"Architecture: {info['architecture']}")
# print(f"Parameters: {info['total_parameters']:,}")
# print(f"Hidden size: {info['hidden_size']}")

### Phase 2: Geometric Alignment (Git Re-Basin)

This step resolves permutation symmetry: each model's hidden units are reordered to match a common anchor model, so that corresponding weights line up across models.

In [None]:
# Align models (requires loaded models)
# aligned_models = merger.align()

# Check alignment quality
# for i in range(len(aligned_models) - 1):
#     quality = merger.aligner.compute_alignment_quality(i)
#     print(f"Model {i+1} alignment quality: {quality:.4f}")
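
The idea behind permutation alignment can be illustrated on a toy case. The sketch below is not PacerKit's aligner: it assumes a single ReLU hidden layer and brute-forces the permutation of the second model's hidden units that best matches the anchor's first-layer weights. Practical Git Re-Basin implementations solve this matching with a linear assignment solver (for example SciPy's `linear_sum_assignment`) rather than enumeration. The helper names (`best_permutation`, `apply_permutation`, `forward`) and all weights are hypothetical.

```python
from itertools import permutations


def row_distance(a, b):
    """Squared Euclidean distance between two weight rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_permutation(anchor_w1, other_w1):
    """Return perm such that other_w1[perm[i]] best matches anchor_w1[i]."""
    n = len(anchor_w1)
    return min(
        permutations(range(n)),
        key=lambda p: sum(row_distance(anchor_w1[i], other_w1[p[i]]) for i in range(n)),
    )


def apply_permutation(w1, w2, perm):
    """Reorder hidden units: rows of w1 (hidden x in) and columns of w2 (out x hidden)."""
    new_w1 = [w1[j] for j in perm]
    new_w2 = [[row[j] for j in perm] for row in w2]
    return new_w1, new_w2


def forward(w1, w2, x):
    """Tiny MLP: ReLU(w1 @ x) followed by w2."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


# Hypothetical weights: the second model is roughly the anchor with shuffled hidden units.
anchor_w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
other_w1 = [[0.1, 1.0], [1.0, 0.9], [0.9, 0.0]]
other_w2 = [[2.0, 3.0, 1.0]]

perm = best_permutation(anchor_w1, other_w1)
aligned_w1, aligned_w2 = apply_permutation(other_w1, other_w2, perm)
print(perm)  # (2, 0, 1)
```

Because the same permutation is applied to the rows of the first layer and the columns of the second, the reordered model computes the same function as before; only the ordering of its hidden units changes.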

### Phase 3: Consensus Barycenter

Compute the Fréchet Mean of aligned models as our synthetic base.

In [None]:
# Compute consensus
# consensus_model = merger.compute_consensus()

# Get deviation statistics
# stats = merger.consensus_engine.compute_deviation_statistics()
# print(f"Number of parameters: {len(stats)}")

# Show top deviating layers
# sorted_stats = sorted(stats.items(), key=lambda x: x[1]['mean_deviation_norm'], reverse=True)
# print("\nTop 5 deviating layers:")
# for name, stat in sorted_stats[:5]:
#     print(f"  {name}: {stat['mean_deviation_norm']:.4f}")
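
Under the ordinary Euclidean distance, the Fréchet mean reduces to the element-wise arithmetic mean. The sketch below assumes that metric and treats each model as a flat list of weights; PacerKit's consensus engine may use a different metric or weighting. The function names and the toy weights are hypothetical.

```python
import math


def frechet_mean_euclidean(vectors):
    """Element-wise mean, i.e. the Frechet mean under Euclidean distance."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def deviations(vectors, center):
    """Per-model deviation from the chosen center."""
    return [[v - c for v, c in zip(vec, center)] for vec in vectors]


def squared_cost(center, vectors):
    """Sum of squared distances from center to every vector."""
    return sum(sum((v - c) ** 2 for v, c in zip(vec, center)) for vec in vectors)


# Hypothetical flattened weights of three aligned models.
models = [[1.0, 2.0, 4.0], [3.0, 2.0, 0.0], [2.0, 5.0, 2.0]]

consensus = frechet_mean_euclidean(models)
devs = deviations(models, consensus)
norms = [math.sqrt(sum(d * d for d in dev)) for dev in devs]

print(consensus)  # [2.0, 3.0, 2.0]
```

The mean minimizes the summed squared distance to the models, which is why `squared_cost(consensus, models)` is no larger than the cost of using any single model as the base.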

### Phase 4: Interference Analysis

Analyze which layers have high interference (should become MoE) vs low interference (should merge).

In [None]:
# Analyze interference
# report = merger.analyze_interference()

# Print high-interference layers
# high_interference = merger.interference_analyzer.get_high_interference_layers(10)
# print("Top 10 high-interference layers:")
# for layer, score in high_interference:
#     print(f"  {layer}: {score:.4f}")
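
The merge-or-MoE decision can be pictured as a simple threshold test on each layer's score. The sketch below is illustrative only: the layer names and scores are made up, and treating a score exactly equal to the threshold as "merge" is an assumption, not documented PacerKit behavior.

```python
def partition_layers(scores, threshold=0.35):
    """Split layer names into (merge, moe) lists by interference score."""
    merge, moe = [], []
    for name, score in scores.items():
        # Assumption: only scores strictly above the threshold become MoE layers.
        (moe if score > threshold else merge).append(name)
    return merge, moe


# Hypothetical per-layer interference scores.
scores = {
    "layers.0.mlp": 0.12,
    "layers.1.mlp": 0.48,
    "layers.2.mlp": 0.35,
    "layers.3.mlp": 0.91,
    "layers.4.mlp": 0.30,
}

merge_layers, moe_layers = partition_layers(scores)

# Lowering the threshold can only move layers from "merge" to "moe".
moe_counts = {t: len(partition_layers(scores, t)[1]) for t in (0.25, 0.35, 0.50)}
print(moe_counts)  # {0.25: 4, 0.35: 2, 0.5: 1}
```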

### Phase 5: Build Merged Model

Now run the full merge to create the final model.

In [None]:
# Run full merge (combines all phases)
# merged_model = merger.merge(
#     interference_threshold=0.35,
#     top_k_experts=2,
#     output_path="./merged_model"
# )

# Get summary
# summary = merger.interference_analyzer.get_summary()
# print(f"Merged layers: {summary['merge_layers']}")
# print(f"MoE layers: {summary['moe_layers']}")
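
For low-interference layers, the overview names DARE-TIES as the merge method. The sketch below shows the general technique as it is usually described, not PacerKit's implementation: DARE randomly drops entries of each model's delta from the base and rescales the survivors by `1 / (1 - drop_rate)`, and TIES then elects a sign per parameter from the summed deltas and averages only the entries that agree with it. The function names and the `drop_rate` default are illustrative assumptions.

```python
import random


def dare_drop(delta, drop_rate, rng):
    """Zero each entry with probability drop_rate; rescale the rest (drop_rate < 1)."""
    keep = 1.0 - drop_rate
    return [d / keep if rng.random() < keep else 0.0 for d in delta]


def ties_merge(deltas):
    """Per parameter: elect the sign of the summed deltas, average agreeing entries."""
    merged = []
    for column in zip(*deltas):
        total = sum(column)
        sign = (total > 0) - (total < 0)
        agreeing = [v for v in column if v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


def dare_ties(base, models, drop_rate=0.15, seed=0):
    """Merge flat weight lists into base using DARE-sparsified, TIES-combined deltas."""
    rng = random.Random(seed)
    deltas = [
        dare_drop([m - b for m, b in zip(model, base)], drop_rate, rng)
        for model in models
    ]
    return [b + d for b, d in zip(base, ties_merge(deltas))]


# With drop_rate=0.0 nothing is dropped, so only the TIES step is visible.
base = [0.0, 0.0, 0.0]
models = [[1.0, -2.0, 0.5], [3.0, 1.0, 0.5]]
print(dare_ties(base, models, drop_rate=0.0))  # [2.0, -2.0, 0.5]
```

In the second parameter the two deltas disagree in sign; the summed delta is negative, so only the `-2.0` entry survives rather than being averaged toward zero.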

## 🌐 Upload to HuggingFace Hub

PacerKit can automatically upload your merged model to HuggingFace Hub with an auto-generated model card.

In [None]:
# First, login to HuggingFace (if not already logged in)
# from huggingface_hub import login
# login()  # This will prompt for your token

In [None]:
# Option 1: Upload during merge
# merged_model = merger.merge(
#     output_path="./merged_qwen_coder",
#     push_to_hub=True,
#     hub_repo="your-username/merged-qwen-coder",
#     private=False  # Set to True for private repo
# )

In [None]:
# Option 2: Upload after merging
# from pacerkit.utils import push_to_huggingface_hub

# hub_url = push_to_huggingface_hub(
#     model=merged_model,
#     repo_id="your-username/merged-qwen-coder",
#     model_path="./merged_qwen_coder",  # Local path with files
#     private=False,
#     commit_message="Upload PACER merged Qwen coder models"
# )
# print(f"Model uploaded to: {hub_url}")

## 📝 Auto-Generated Model Card

PacerKit automatically generates a model card with merge details:

In [None]:
# View the generated model card
# with open("./merged_qwen_coder/README.md", "r") as f:
#     print(f.read())

## 📊 Understanding the Interference Metric

The interference metric measures how much the models' deviation vectors $\Delta_k$ conflict within a layer:

$$\mathcal{I} = 1 - \frac{\left\lVert \sum_k \Delta_k \right\rVert_2}{\sum_k \lVert \Delta_k \rVert_2}$$

- **I ≈ 0**: Deviations are aligned → Safe to merge
- **I ≈ 1**: Deviations conflict → Need MoE to preserve both

In [None]:
import torch
from pacerkit.core.interference import InterferenceAnalyzer

# Example: compute interference for synthetic data
analyzer = InterferenceAnalyzer(threshold=0.35)

# Case 1: Aligned deviations (low interference)
aligned_devs = torch.tensor([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.1],  # Similar direction
])
print(f"Aligned interference: {analyzer.compute_interference(aligned_devs):.4f}")

# Case 2: Conflicting deviations (high interference)
conflicting_devs = torch.tensor([
    [1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0],  # Opposite direction
])
print(f"Conflicting interference: {analyzer.compute_interference(conflicting_devs):.4f}")
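
The formula above can also be evaluated directly, without PacerKit or PyTorch. The sketch below is a plain-Python reading of that formula on flat vectors; returning 0 when every deviation is zero is an assumed convention, and `InterferenceAnalyzer` may handle shapes and edge cases differently.

```python
import math


def l2_norm(vector):
    return math.sqrt(sum(x * x for x in vector))


def interference(deviations):
    """I = 1 - ||sum_k d_k|| / sum_k ||d_k|| for a list of equal-length vectors."""
    summed = [sum(column) for column in zip(*deviations)]
    total = sum(l2_norm(d) for d in deviations)
    if total == 0.0:
        return 0.0  # Assumed convention: no deviation means nothing conflicts.
    return 1.0 - l2_norm(summed) / total


aligned = interference([[1.0, 2.0, 3.0], [1.1, 2.1, 3.1]])
conflicting = interference([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
orthogonal = interference([[1.0, 0.0], [0.0, 1.0]])

print(f"aligned:     {aligned:.4f}")      # 0.0000
print(f"conflicting: {conflicting:.4f}")  # 1.0000
print(f"orthogonal:  {orthogonal:.4f}")   # 0.2929
```

Two orthogonal deviations of equal norm score $1 - \sqrt{2}/2 \approx 0.29$, which falls below the 0.35 threshold used in this notebook, so under this formula such a layer would be merged rather than upcycled.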

## 🧠 Zero-Shot Routing

PACER uses a data-free router based on Subspace Projection Affinity:

In [None]:
import torch
from pacerkit.core.moe import ZeroShotRouter

# Create synthetic expert deviations
# Shape: (num_experts, d_out, d_in)
expert_deviations = torch.randn(3, 64, 128)  # 3 experts

# Initialize router
router = ZeroShotRouter(expert_deviations, top_k=2)

# Route some inputs
x = torch.randn(2, 10, 128)  # batch=2, seq=10, dim=128
weights, indices = router(x)

print(f"Routing weights shape: {weights.shape}")
print(f"Expert indices shape: {indices.shape}")
print(f"\nFirst token routes to experts: {indices[0, 0].tolist()}")
print(f"With weights: {weights[0, 0].tolist()}")
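
This notebook does not spell out how Subspace Projection Affinity is computed. One plausible reading, shown here purely as an assumption, is to score each expert by the norm of the input after multiplying by that expert's deviation matrix, keep the `top_k` highest scores, and normalize them with a softmax. The sketch works on a single token in plain Python and is not the `ZeroShotRouter` implementation; `route` and the toy matrices are hypothetical.

```python
import math


def matvec(matrix, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def route(x, expert_deviations, top_k=2):
    """Return (weights, indices) of the top_k experts for one input vector."""
    # Assumed affinity: how strongly the expert's deviation matrix responds to x.
    scores = [
        math.sqrt(sum(v * v for v in matvec(dev, x))) for dev in expert_deviations
    ]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the selected scores, shifted by the max for numerical stability.
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    norm = sum(exps)
    return [e / norm for e in exps], top


# Hypothetical deviation matrices, each of shape (d_out=2, d_in=2).
experts = [
    [[1.0, 0.0], [0.0, 0.0]],  # responds to the first input dimension
    [[0.0, 0.0], [0.0, 1.0]],  # responds to the second input dimension
    [[0.5, 0.5], [0.0, 0.0]],  # responds weakly to both
]

weights, indices = route([2.0, 0.0], experts, top_k=2)
print(indices)  # [0, 2]
```

No training data is involved: the routing scores depend only on the input and on the experts' fixed deviation matrices, which is what makes a router of this kind data-free.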

## 📁 Output Folder Structure

PacerKit writes the model weights, model card, and merge metadata to a single output folder:

In [None]:
# Example output structure:
# merged_qwen_coder_20251209_205733/
# ├── config.json              # Model config
# ├── model.safetensors        # Model weights
# ├── README.md                # Auto-generated model card
# ├── merge_config.json        # PACER merge configuration
# ├── merge_report.json        # Detailed merge decisions
# └── logs/                    # Log files (if any)

import json

# View merge report
# with open("./merged_qwen_coder/merge_report.json", "r") as f:
#     report = json.load(f)
#     print(json.dumps(report['summary'], indent=2))

## 🎯 Advanced: Custom Merge Settings

In [None]:
from pacerkit import PACERMerger, PACERConfig
from pacerkit.config import PACERSettings, OutputConfig

# Create custom config
config = PACERConfig(
    project_name="custom-merge",
    models=[
        "fluently/FluentlyQwen3-Coder-4B-0909",
        "SamuelBang/AesCoder-4B"
    ],
    pacer=PACERSettings(
        interference_threshold=0.25,  # Lower = more MoE layers
        top_k_experts=3,              # Activate 3 experts per token
        dropout_rate=0.15,
        expert_cluster_threshold=0.85,
    ),
    output=OutputConfig(
        path="./custom_merged",
        add_timestamp=True,
        push_to_hub=False,
    )
)

# Run merge with custom config
# merger = PACERMerger(config=config)
# merged = merger.merge()

## 🔍 Analyzing Merge Results

In [None]:
# Get detailed interference report
# report = merger.get_interference_report()

# Visualize interference distribution
# import matplotlib.pyplot as plt

# scores = [score for score, _ in report['layers'].values()]
# plt.figure(figsize=(10, 6))
# plt.hist(scores, bins=50, edgecolor='black')
# plt.axvline(x=0.35, color='r', linestyle='--', label='Threshold')
# plt.xlabel('Interference Score')
# plt.ylabel('Number of Layers')
# plt.title('Distribution of Interference Scores')
# plt.legend()
# plt.show()
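
Where matplotlib is not available, a rough text histogram gives the same overview of the score distribution. The sketch below assumes scores lie in the range 0 to 1 and uses made-up values; `text_histogram` is a hypothetical helper, not part of PacerKit.

```python
def text_histogram(scores, bins=5, width=20):
    """Count scores per equal-width bin on [0, 1] and render one text row per bin."""
    counts = [0] * bins
    for score in scores:
        index = min(int(score * bins), bins - 1)  # a score of 1.0 goes in the last bin
        counts[index] += 1
    peak = max(counts) or 1
    lines = []
    for i, count in enumerate(counts):
        low, high = i / bins, (i + 1) / bins
        bar = "#" * round(width * count / peak)
        lines.append(f"{low:.1f}-{high:.1f} | {bar} {count}")
    return counts, lines


# Hypothetical interference scores.
scores = [0.05, 0.12, 0.18, 0.33, 0.36, 0.41, 0.77, 1.0]
counts, lines = text_histogram(scores)
print("\n".join(lines))
```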

## 🧪 Testing the Merged Model

In [None]:
# Load and test the merged model
# from transformers import AutoModelForCausalLM, AutoTokenizer

# model = AutoModelForCausalLM.from_pretrained("./merged_qwen_coder")
# tokenizer = AutoTokenizer.from_pretrained("fluently/FluentlyQwen3-Coder-4B-0909")

# Test generation
# prompt = "def fibonacci(n):"
# inputs = tokenizer(prompt, return_tensors="pt")
# outputs = model.generate(**inputs, max_new_tokens=100)  # limit counts generated tokens only, not the prompt
# print(tokenizer.decode(outputs[0], skip_special_tokens=True))

## 📚 Next Steps

- Read the [methodology documentation](../docs/methodology.md) for full technical details
- Explore different interference thresholds for your use case
- Try merging Vision Transformers with Token Merging enabled
- Share your merged models on HuggingFace Hub!

## 🔗 Resources

- [GitHub Repository](https://github.com/yourusername/pacerkit)
- [Configuration Reference](../README.md#configuration)
- [PACER Methodology](../docs/methodology.md)

For questions and contributions, visit the GitHub repository!