# SR-012: LUT Candidate Analysis & Evaluation

**Version:** 1.1.0 | **Build:** 2026-01-17

Analyze Q/LUT distributions, generate LUT candidates, and evaluate via fast perplexity screening.

---

## Pipeline Overview

| Step | Script | Device | Time |
|------|--------|--------|------|
| 1 | `analyze_q_lut_stats.py` | CPU | ~30s |
| 2 | `apply_lut_candidates.py` | CPU | ~1-2min |
| 3 | `eval_lut_candidates.py` | GPU/CPU | ~30s-15min per candidate |

## Prerequisites

- **Trained V2 QAT checkpoint** (from SR-008, SR-011, etc.)
- A GPU is recommended for PPL evaluation; the evaluation step itself is optional and can be skipped with `SKIP_PPL_EVAL`

## Quick Start

1. Run **Section 1** (Setup)
2. Run **Section 2** to find your checkpoints
3. Fill in `V2_CHECKPOINT` in **Section 3**
4. Run remaining sections

---
## 1. Setup

In [None]:
#@title 1.1 Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
print("Google Drive mounted at /content/drive")

In [None]:
#@title 1.2 Clone Repository
import os

REPO_URL = "https://github.com/Anemll/qwen3_apple_style_2bit_qat_lora.git"  #@param {type:"string"}
REPO_DIR = "/content/repo"

if not os.path.exists(REPO_DIR):
    !git clone {REPO_URL} {REPO_DIR}
    print(f"Cloned to {REPO_DIR}")
else:
    print(f"Repository already exists at {REPO_DIR}")

os.chdir(REPO_DIR)
!git pull
print(f"Working directory: {os.getcwd()}")

In [None]:
#@title 1.3 Install Dependencies
!pip install -q transformers accelerate datasets sentencepiece protobuf torch
print("Dependencies installed")

In [None]:
#@title 1.4 Check Device
import torch

print("Device Detection:")
print(f"  CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"  GPU: {torch.cuda.get_device_name(0)}")
    print(f"  Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    HAS_GPU = True
else:
    print("  No GPU detected")
    print("  Steps 1-2 (stats, candidates) will work fine")
    print("  Step 3 (PPL eval) will be slow - consider skipping")
    HAS_GPU = False

---
## 2. Discover Checkpoints

In [None]:
#@title 2.1 Find Checkpoints on Google Drive
#@markdown Run this cell to discover available V2 checkpoints.
#@markdown Copy the path you want to use into Section 3.

from pathlib import Path

SEARCH_PATHS = [
    "/content/drive/MyDrive/qat_runs",
    "/content/drive/MyDrive/qwen3_runs",
    "/content/drive/MyDrive",
]

print("=" * 60)
print("SEARCHING FOR V2 CHECKPOINTS")
print("=" * 60)

found_any = False
seen = set()  # files already listed via an earlier search path
for search_path in SEARCH_PATHS:
    p = Path(search_path)
    if not p.exists():
        continue

    # Look for .pt files whose names contain "v2" or "fp16"
    matches = set(p.glob("**/*v2*.pt")) | set(p.glob("**/*fp16*.pt"))
    checkpoints = sorted(matches - seen)[:15]  # Dedupe and limit to 15 per path
    seen.update(matches)
    
    if checkpoints:
        found_any = True
        print(f"\nFound in {search_path}:")
        for ckpt in checkpoints:
            size_mb = ckpt.stat().st_size / 1e6
            print(f"  {ckpt} ({size_mb:.1f} MB)")

if not found_any:
    print("\nNo V2 checkpoints found.")
    print("Upload a trained checkpoint to Google Drive first.")

print("\n" + "=" * 60)
print("Copy one of the paths above into V2_CHECKPOINT in Section 3")
print("=" * 60)

---
## 3. Configuration

**IMPORTANT:** Update `V2_CHECKPOINT` below with your checkpoint path from Section 2.

In [None]:
#@title 3.1 Main Configuration
#@markdown ### Checkpoint Path (REQUIRED)
V2_CHECKPOINT = ""  #@param {type:"string"}

#@markdown ### Model Settings
MODEL_ID = "Qwen/Qwen3-0.6B"  #@param {type:"string"}

#@markdown ### LUT Candidate Options
LUT_FAMILIES = "A,B,C,D"  #@param ["A", "B", "C", "D", "A,B", "A,B,C,D"]
MAX_ABS = 1.0  #@param {type:"number"}
SCOPE = "all"  #@param ["all", "mlp", "attn"]

#@markdown ### PPL Evaluation Options
SKIP_PPL_EVAL = False  #@param {type:"boolean"}
MAX_CHUNKS_FAST = 20  #@param {type:"integer"}
TOP_K = 3  #@param {type:"integer"}
RUN_FULL_PPL = True  #@param {type:"boolean"}

#@markdown ### Output Paths
OUTPUT_DIR = "/content/lut_candidates"  #@param {type:"string"}
GDRIVE_OUTPUT = "/content/drive/MyDrive/qat_runs/SR-012_lut_analysis"  #@param {type:"string"}

# Validation
print("=" * 60)
print("CONFIGURATION")
print("=" * 60)
if not V2_CHECKPOINT:
    print("ERROR: V2_CHECKPOINT is empty!")
    print("       Run Section 2 to find checkpoints, then paste the path here.")
else:
    print(f"Checkpoint:    {V2_CHECKPOINT}")
    print(f"Model:         {MODEL_ID}")
    print(f"LUT Families:  {LUT_FAMILIES}")
    print(f"Max Abs:       {MAX_ABS}")
    print(f"Scope:         {SCOPE}")
    print(f"Skip PPL:      {SKIP_PPL_EVAL}")
    print(f"Output:        {OUTPUT_DIR}")

In [None]:
#@title 3.2 Verify Checkpoint & Sync to Local
#@markdown Copies checkpoint to local storage for faster processing.

import os
import shutil
from pathlib import Path

os.chdir("/content/repo")

if not V2_CHECKPOINT:
    raise ValueError("V2_CHECKPOINT is not set! Go back to Section 3.1")

if not os.path.exists(V2_CHECKPOINT):
    raise FileNotFoundError(f"Checkpoint not found: {V2_CHECKPOINT}")

size_mb = os.path.getsize(V2_CHECKPOINT) / 1e6
print(f"Checkpoint found: {V2_CHECKPOINT}")
print(f"Size: {size_mb:.1f} MB")

# Sync to local storage
LOCAL_DIR = "/content/checkpoints"
os.makedirs(LOCAL_DIR, exist_ok=True)
local_ckpt = Path(LOCAL_DIR) / Path(V2_CHECKPOINT).name

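# Re-copy when the local file is missing or differs in size (for example,
# a different checkpoint with the same filename was synced earlier)
src_size = os.path.getsize(V2_CHECKPOINT)
if not local_ckpt.exists() or local_ckpt.stat().st_size != src_size:
    print("\nCopying to local storage...")
    shutil.copy(V2_CHECKPOINT, local_ckpt)
    print(f"  -> {local_ckpt}")
else:
    print(f"\nLocal copy exists: {local_ckpt}")

# Copy config.json if it sits next to the checkpoint
config_src = Path(V2_CHECKPOINT).parent / "config.json"
if config_src.exists():
    shutil.copy(config_src, Path(LOCAL_DIR) / "config.json")
    print("  -> config.json")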

CHECKPOINT_PATH = str(local_ckpt)
print(f"\nUsing: {CHECKPOINT_PATH}")

---
## 4. Analyze Q/LUT Statistics (CPU)

Computes per-layer distribution metrics to guide LUT selection.
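
As a rough illustration of what such per-layer metrics can look like, the sketch below computes the ratios named in the Decision Guide (Section 9) with NumPy. The function name `layer_stats` and the exact definitions are assumptions made for illustration; `analyze_q_lut_stats.py` may define these metrics differently.

```python
import numpy as np

def layer_stats(q_eff, max_abs=1.0):
    """Summarize the magnitude distribution of one layer (hypothetical helper)."""
    mags = np.abs(np.asarray(q_eff, dtype=np.float64)).ravel()
    p50, p90, p99, p999 = np.percentile(mags, [50, 90, 99, 99.9])
    eps = 1e-12  # avoids division by zero for an all-zero layer
    return {
        # > 1.0 means the 99.9th percentile already exceeds the LUT range
        "p999_over_maxabs": float(p999 / max_abs),
        # percentage of values that fall outside [-max_abs, max_abs]
        "pct_Qeff_outside": float(100.0 * np.mean(mags > max_abs)),
        # assumed definitions: tail_ratio = p99/p90, center_ratio = p50/p90
        "tail_ratio": float(p99 / (p90 + eps)),
        "center_ratio": float(p50 / (p90 + eps)),
    }

# Synthetic Gaussian values stand in for a real layer here.
rng = np.random.default_rng(0)
stats_example = layer_stats(rng.normal(0.0, 0.3, size=10_000), max_abs=1.0)
print(stats_example)
```

Percentiles are taken over absolute values, so the ratios describe the shape of the magnitude distribution and ignore sign.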

In [None]:
#@title 4.1 Run Q/LUT Statistics Analysis
#@markdown Takes ~30 seconds on CPU.

import os
os.chdir("/content/repo")
os.makedirs(OUTPUT_DIR, exist_ok=True)

STATS_OUTPUT = f"{OUTPUT_DIR}/q_lut_stats.json"

!python scripts/analyze_q_lut_stats.py "{CHECKPOINT_PATH}" \
    --scope {SCOPE} \
    --output "{STATS_OUTPUT}" \
    --verbose

In [None]:
#@title 4.2 Display Statistics Summary
import json
from pathlib import Path

stats_file = Path(STATS_OUTPUT)
if stats_file.exists():
    with open(stats_file) as f:
        stats = json.load(f)
    
    summary = stats.get('summary', {})
    
    print("=" * 60)
    print("Q/LUT STATISTICS SUMMARY")
    print("=" * 60)
    print(f"Total layers:              {summary.get('total_layers', 'N/A')}")
    print(f"  MLP:                     {summary.get('mlp_layers', 'N/A')}")
    print(f"  Attention:               {summary.get('attn_layers', 'N/A')}")
    print()
    print(f"p999/max_abs (max):        {summary.get('p999_over_maxabs_max', 'N/A')}")
    print(f"p999/max_abs (avg):        {summary.get('p999_over_maxabs_avg', 'N/A')}")
    print(f"Layers need wider max_abs: {summary.get('layers_need_widen_maxabs', 'N/A')}")
    print()
    
    # Recommendation
    if summary.get('p999_over_maxabs_max', 0) > 1.0:
        suggested = round(MAX_ABS * 1.5, 1)
        print(f"RECOMMENDATION: Increase MAX_ABS to {suggested} or higher")
    else:
        print(f"max_abs={MAX_ABS} looks adequate")
else:
    print(f"Stats file not found: {stats_file}")

---
## 5. Generate LUT Candidates (CPU)

Creates checkpoint variants with different LUT families:

| Family | Description |
|--------|-------------|
| **A** | Uniform (linspace) |
| **B** | Dense-center (more values near 0) |
| **C** | Heavy-tail (more values at extremes) |
| **D** | Quantile (data-driven) |
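
The four families can be pictured with a small NumPy sketch. This is not the code in `apply_lut_candidates.py`: the function `make_lut`, the `power` warp used for families B and C, and the LUT `size` are all assumptions chosen to illustrate the descriptions in the table.

```python
import numpy as np

def make_lut(family, size, max_abs=1.0, weights=None, power=2.0):
    """Return `size` sorted LUT values for one family (hypothetical helper)."""
    u = np.linspace(-1.0, 1.0, size)  # uniform grid on [-1, 1]
    if family == "A":    # uniform
        lut = u
    elif family == "B":  # dense-center: exponent > 1 pulls points toward 0
        lut = np.sign(u) * np.abs(u) ** power
    elif family == "C":  # heavy-tail: exponent < 1 pushes points toward the ends
        lut = np.sign(u) * np.abs(u) ** (1.0 / power)
    elif family == "D":  # quantile: follow the empirical distribution
        if weights is None:
            raise ValueError("family D needs sample weights")
        probs = (np.arange(size) + 0.5) / size  # evenly spaced quantile levels
        clipped = np.clip(np.asarray(weights, dtype=np.float64), -max_abs, max_abs)
        return np.quantile(clipped, probs)
    else:
        raise ValueError(f"unknown family: {family}")
    return lut * max_abs

# Synthetic weights stand in for a real layer.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.3, size=10_000)
for fam in "ABCD":
    print(fam, np.round(make_lut(fam, size=8, weights=w), 3))
```

Families A to C depend only on `max_abs` and the grid size, while family D needs the layer's values, which is why the table calls it data-driven.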

In [None]:
#@title 5.1 Generate LUT Candidates
#@markdown Takes ~1-2 minutes on CPU.

import os
os.chdir("/content/repo")

!python scripts/apply_lut_candidates.py "{CHECKPOINT_PATH}" \
    --output-dir "{OUTPUT_DIR}" \
    --families {LUT_FAMILIES} \
    --max-abs {MAX_ABS} \
    --scope {SCOPE} \
    --save

In [None]:
#@title 5.2 List Generated Candidates
import json
from pathlib import Path

summary_file = Path(OUTPUT_DIR) / "candidates_summary.json"

if summary_file.exists():
    with open(summary_file) as f:
        summary = json.load(f)
    
    candidates = summary.get('candidates', [])
    candidates.sort(key=lambda x: x.get('avg_mae', float('inf')))
    
    print("=" * 60)
    print(f"GENERATED {len(candidates)} CANDIDATES")
    print("=" * 60)
    print(f"{'Rank':<5} {'Name':<35} {'Avg MAE':<12} {'Max MAE':<12}")
    print("-" * 65)
    
    for i, c in enumerate(candidates, 1):
        print(f"{i:<5} {c['name']:<35} {c['avg_mae']:.6f}     {c['max_mae']:.6f}")
    
    print()
    if candidates:
        print(f"Best by MAE: {candidates[0]['name']}")
    else:
        print("No candidates listed in the summary.")
else:
    print(f"Summary not found: {summary_file}")
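
The MAE columns above measure how far values move when they are snapped to a LUT. One plausible definition is sketched below with NumPy; the helper `lut_mae` is hypothetical, and the script's own `avg_mae` and `max_mae` may be aggregated differently (for example, per layer).

```python
import numpy as np

def lut_mae(weights, lut):
    """Mean absolute error after snapping each value to its nearest LUT entry."""
    w = np.asarray(weights, dtype=np.float64).ravel()
    lut = np.sort(np.asarray(lut, dtype=np.float64))
    # Midpoints between neighbouring entries are the decision boundaries,
    # so searchsorted on them yields the index of the nearest entry.
    idx = np.searchsorted((lut[:-1] + lut[1:]) / 2.0, w)
    return float(np.mean(np.abs(w - lut[idx])))

# 0.1 -> 0.0, 0.9 -> 1.0, -0.6 -> -1.0
print(lut_mae([0.1, 0.9, -0.6], [-1.0, 0.0, 1.0]))
```

A lower MAE means the LUT reconstructs the values more closely, but it does not guarantee lower perplexity, which is why Section 6 evaluates the candidates directly.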

---
## 6. Evaluate Candidates (GPU/CPU)

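**Optional:** On CPU this step takes roughly 10-15 minutes per candidate, so skip it if no GPU is available and the MAE ranking from Section 5 is sufficient.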

Uses perplexity to rank candidates. Set `SKIP_PPL_EVAL=True` in Section 3 to skip.
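
For reference, perplexity is the exponential of the average negative log-likelihood per token. The sketch below shows how a chunk limit such as `MAX_CHUNKS_FAST` could turn a full evaluation into a fast one. The function `perplexity` and its inputs are assumptions for illustration and do not mirror the internals of `eval_lut_candidates.py`.

```python
import math

def perplexity(chunk_nll_sums, chunk_token_counts, max_chunks=None):
    """exp(total NLL / total tokens) over the first `max_chunks` chunks.

    chunk_nll_sums:     summed negative log-likelihood (in nats) per chunk
    chunk_token_counts: number of scored tokens per chunk
    max_chunks:         None scores every chunk (the "full" setting)
    """
    sums = list(chunk_nll_sums)[:max_chunks]
    counts = list(chunk_token_counts)[:max_chunks]
    total_tokens = sum(counts)
    if total_tokens == 0:
        raise ValueError("no tokens to score")
    return math.exp(sum(sums) / total_tokens)

# A model that spreads probability uniformly over 1000 tokens has PPL near 1000.
nll = [512 * math.log(1000)] * 3
fast_ppl = perplexity(nll, [512] * 3, max_chunks=1)
full_ppl = perplexity(nll, [512] * 3)
print(fast_ppl, full_ppl)
```

Scoring only the first few chunks gives a noisier estimate, so a fast screen is best treated as a filter before a full pass over the top candidates.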

In [None]:
#@title 6.1 Run PPL Evaluation
#@markdown - GPU: ~30s per candidate (fast), ~3min (full)
#@markdown - CPU: ~10-15min per candidate

import os
os.chdir("/content/repo")

EVAL_OUTPUT = f"{OUTPUT_DIR}/eval_results.json"

if SKIP_PPL_EVAL:
    print("PPL evaluation SKIPPED (SKIP_PPL_EVAL=True)")
    print("Use MAE ranking from Section 5 instead.")
else:
    cmd = f'''python scripts/eval_lut_candidates.py "{OUTPUT_DIR}" \
        --max-chunks {MAX_CHUNKS_FAST} \
        --top-k {TOP_K} \
        --model-id {MODEL_ID} \
        --device auto \
        --dtype auto \
        --output "{EVAL_OUTPUT}"'''
    
    if RUN_FULL_PPL:
        cmd += " --full-ppl"
    
    print("Running PPL evaluation...")
    !{cmd}

In [None]:
#@title 6.2 Display Evaluation Results
import json
from pathlib import Path

eval_file = Path(OUTPUT_DIR) / "eval_results.json"  # same path that Section 6.1 writes to

if eval_file.exists():
    with open(eval_file) as f:
        results = json.load(f)
    
    entries = results.get('results', [])
    entries.sort(key=lambda x: x.get('ppl_fast', float('inf')))
    
    print("=" * 70)
    print("EVALUATION RESULTS")
    print("=" * 70)
    print(f"{'Rank':<5} {'Candidate':<30} {'Fast PPL':<10} {'Full PPL':<10} {'MAE':<10}")
    print("-" * 70)
    
    for i, r in enumerate(entries, 1):
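        # Compare against None so that a legitimate 0.0 is not shown as N/A
        fast = f"{r['ppl_fast']:.2f}" if r.get('ppl_fast') is not None else "N/A"
        full = f"{r['ppl_full']:.2f}" if r.get('ppl_full') is not None else "N/A"
        mae = f"{r['avg_mae']:.6f}" if r.get('avg_mae') is not None else "N/A"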
        print(f"{i:<5} {r['name']:<30} {fast:<10} {full:<10} {mae:<10}")
    
    if entries:
        print()
        print(f"BEST: {entries[0]['name']}")
elif SKIP_PPL_EVAL:
    print("PPL evaluation was skipped.")
else:
    print(f"Results not found: {eval_file}")

---
## 7. Get Best Candidate

In [None]:
#@title 7.1 Determine Best Candidate
#@markdown Uses PPL if available, otherwise MAE.

import json
from pathlib import Path

best_checkpoint = None
best_name = None

# Try PPL results
eval_file = Path(f"{OUTPUT_DIR}/eval_results.json")
if eval_file.exists():
    with open(eval_file) as f:
        results = json.load(f)
    entries = results.get('results', [])
    if entries:
        entries.sort(key=lambda x: x.get('ppl_fast', float('inf')))
        best_checkpoint = entries[0]['checkpoint_path']
        best_name = entries[0]['name']
        print(f"Best by PPL: {best_name}")

# Fallback to MAE
if not best_checkpoint:
    summary_file = Path(f"{OUTPUT_DIR}/candidates_summary.json")
    if summary_file.exists():
        with open(summary_file) as f:
            summary = json.load(f)
        candidates = summary.get('candidates', [])
        if candidates:
            candidates.sort(key=lambda x: x.get('avg_mae', float('inf')))
            best_checkpoint = candidates[0]['checkpoint_path']
            best_name = candidates[0]['name']
            print(f"Best by MAE: {best_name}")

if best_checkpoint:
    print(f"Checkpoint: {best_checkpoint}")
else:
    print("No candidates found!")

In [None]:
#@title 7.2 Test Inference (Optional)
RUN_INFERENCE = True  #@param {type:"boolean"}
TEST_PROMPT = "What is the capital of France?"  #@param {type:"string"}

if RUN_INFERENCE and best_checkpoint:
    !python scripts/test_inference.py "{best_checkpoint}" \
        --prompt "{TEST_PROMPT}" \
        --max-tokens 256
elif not best_checkpoint:
    print("No checkpoint available")
else:
    print("Inference skipped")

---
## 8. Save Results to Google Drive

In [None]:
#@title 8.1 Copy Results to Google Drive
COPY_BEST_CHECKPOINT = True  #@param {type:"boolean"}

import os
import shutil
from pathlib import Path

os.makedirs(GDRIVE_OUTPUT, exist_ok=True)

# Copy JSON files
output_path = Path(OUTPUT_DIR)
for f in output_path.glob("*.json"):
    dest = Path(GDRIVE_OUTPUT) / f.name
    shutil.copy(f, dest)
    print(f"Copied: {f.name}")

# Copy best checkpoint
if COPY_BEST_CHECKPOINT and best_checkpoint:
    ckpt = Path(best_checkpoint)
    if ckpt.exists():
        dest = Path(GDRIVE_OUTPUT) / ckpt.name
        shutil.copy(ckpt, dest)
        print(f"Copied: {ckpt.name}")

print(f"\nSaved to: {GDRIVE_OUTPUT}")

---
## 9. Decision Guide

| Observation | Recommended Action |
|-------------|--------------------|
| `p999_over_maxabs > 1.0` | Increase `MAX_ABS` (1.5x-2x) |
| `pct_Qeff_outside > 0.5%` | Increase `MAX_ABS` |
| High `tail_ratio` (p99/p90 > 2) | Try Family C (Heavy-tail) |
| Low `center_ratio` (p50/p90 < 0.3) | Try Family B (Dense-center) |
| All metrics benign | Family A (Uniform) is fine |
| Want optimal fit | Family D (Quantile) |
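
The table can also be read as a small rule set. The sketch below encodes the thresholds listed above; the function `recommend` and the dictionary keys are assumptions, and the keys in `q_lut_stats.json` may be named differently.

```python
def recommend(stats):
    """Map one layer's metrics to the actions in the decision guide."""
    actions = []
    if stats.get("p999_over_maxabs", 0.0) > 1.0:
        actions.append("Increase MAX_ABS (1.5x-2x)")
    elif stats.get("pct_Qeff_outside", 0.0) > 0.5:  # value is a percentage
        actions.append("Increase MAX_ABS")
    if stats.get("tail_ratio", 0.0) > 2.0:
        actions.append("Try Family C (heavy-tail)")
    if stats.get("center_ratio", 1.0) < 0.3:
        actions.append("Try Family B (dense-center)")
    if not actions:  # all metrics benign
        actions.append("Family A (uniform) is fine")
    return actions

print(recommend({"p999_over_maxabs": 1.2, "tail_ratio": 2.5, "center_ratio": 0.5}))
```

Family D does not appear in the rules because choosing it is a preference for the closest fit to the data, not something a threshold triggers.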

---
## 10. Alternative: Single Family with Wider Range

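If the Section 4 statistics suggest widening `max_abs` (for example, `p999_over_maxabs > 1.0`), use this section to regenerate candidates for a single family with a larger range.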

In [None]:
#@title 10.1 Generate Single Family Candidates
SINGLE_FAMILY = "C"  #@param ["A", "B", "C", "D"]
WIDER_MAX_ABS = 1.5  #@param {type:"number"}

import os
os.chdir("/content/repo")

SINGLE_OUTPUT = f"{OUTPUT_DIR}_family_{SINGLE_FAMILY}"

!python scripts/apply_lut_candidates.py "{CHECKPOINT_PATH}" \
    --output-dir "{SINGLE_OUTPUT}" \
    --families {SINGLE_FAMILY} \
    --max-abs {WIDER_MAX_ABS} \
    --scope {SCOPE} \
    --save

In [None]:
#@title 10.2 Evaluate Single Family (Optional)
if not SKIP_PPL_EVAL:
    !python scripts/eval_lut_candidates.py "{SINGLE_OUTPUT}" \
        --max-chunks {MAX_CHUNKS_FAST} \
        --top-k {TOP_K} \
        --model-id {MODEL_ID} \
        --device auto \
        --full-ppl \
        --output "{SINGLE_OUTPUT}/eval_results.json"
else:
    print("PPL skipped. Check candidates_summary.json for MAE ranking.")

---
## Notes

**Scripts:**
- `analyze_q_lut_stats.py` - Per-layer Q/LUT statistics
- `apply_lut_candidates.py` - Generate LUT candidate checkpoints
- `eval_lut_candidates.py` - Fast PPL screening

**Memory:**
- Stats + candidates: ~4-6 GB RAM
- PPL evaluation: ~6-8 GB VRAM

**CPU-only mode:**
- Set `SKIP_PPL_EVAL = True`
- Use MAE ranking from Section 5