# Knowledge Fidelity Demo

**Compress an LLM and audit whether it still knows truth vs popular myths.**

This notebook demonstrates the full pipeline:
1. Audit a model's confidence on true vs false statements (baseline)
2. Compress with CF90 (SVD + layer freezing)
3. Re-audit and compare: how much signal was preserved?

Approximate runtime: ~2 min for Qwen2.5-0.5B on CPU, ~15 min for a 7B model.

In [None]:
import sys
sys.path.insert(0, '../src')

from knowledge_fidelity import compress_and_audit, get_default_probes, get_mandela_probes

## 1. One-Call Compress + Audit

The simplest way to use the toolkit is a single function call that:
- Loads the model
- Measures confidence BEFORE compression
- Applies CF90 (SVD compress Q/K/O at 70% rank, freeze 75% of layers)
- Measures confidence AFTER compression
- Returns a full report
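
To make the SVD step concrete, here is a minimal NumPy sketch of truncated-SVD compression on a single matrix. It is not the toolkit's implementation: the helper name `low_rank_approx` is hypothetical, the random matrix is a stand-in for a Q/K/O projection weight, and it assumes that "70% rank" means keeping the top 70% of singular values.

```python
import numpy as np

def low_rank_approx(W, ratio):
    """Truncated-SVD approximation of W (hypothetical helper, not part of knowledge_fidelity)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Assumption: "ratio" is the fraction of singular values to keep
    k = max(1, int(ratio * len(S)))
    # Rebuild the matrix from the top-k singular triplets only
    W_hat = (U[:, :k] * S[:k]) @ Vt[:k, :]
    return W_hat, k

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))  # stand-in for a projection weight matrix
W_hat, k = low_rank_approx(W, ratio=0.7)

# Relative Frobenius error introduced by dropping the smallest singular values
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"kept {k}/{min(W.shape)} singular values, relative error {rel_err:.3f}")
```

A random Gaussian matrix has a fairly flat spectrum, so the error here is likely larger than it would be for a trained weight matrix whose energy is concentrated in a few directions.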

In [None]:
report = compress_and_audit(
    "Qwen/Qwen2.5-0.5B",
    ratio=0.7,
    freeze_ratio=0.75,
    device="cpu",
)

In [None]:
print(f"Retention:   {report['retention']:.0%}")
print(f"rho before:  {report['rho_before']:.3f}")
print(f"rho after:   {report['rho_after']:.3f}")
print(f"rho drop:    {report['rho_before'] - report['rho_after']:.3f}")
print(f"Compressed:  {report['compression']['n_compressed']} matrices")
print(f"Frozen:      {report['freeze']['n_frozen']}/{report['freeze']['n_layers']} layers")

## 2. Per-Probe Breakdown

Let's look at which probes retained their signal and which didn't.

In [None]:
import numpy as np

probes = get_default_probes()
audit = report['audit_after']

print(f"{'Probe ID':<25} {'Delta':>8} {'True Conf':>10} {'False Conf':>11} {'Status':>8}")
print("-" * 65)
for i, p in enumerate(probes):
    delta = audit['deltas'][i]
    tc = audit['true_confs'][i]
    fc = audit['false_confs'][i]
    status = 'OK' if delta > 0 else 'FLIP'
    print(f"{p['id']:<25} {delta:>+8.4f} {tc:>10.3f} {fc:>11.3f} {status:>8}")

## 3. Visualize Before vs After

Compare confidence distributions before and after compression.

In [None]:
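import os

import matplotlib
import matplotlib.pyplot as plt

matplotlib.rcParams['figure.figsize'] = (12, 5)
matplotlib.rcParams['figure.dpi'] = 100

# The plots below are saved to ../figures; create it so savefig does not fail
os.makedirs('../figures', exist_ok=True)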

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Before compression
ax = axes[0]
before = report['audit_before']
x = np.arange(len(probes))
ax.bar(x - 0.2, before['true_confs'], 0.4, label='True', color='#2ecc71', alpha=0.8)
ax.bar(x + 0.2, before['false_confs'], 0.4, label='False', color='#e74c3c', alpha=0.8)
ax.set_title(f'BEFORE Compression (rho={report["rho_before"]:.3f})', fontsize=13)
ax.set_ylabel('Mean Confidence')
ax.set_xticks(x)
ax.set_xticklabels([p['id'][:12] for p in probes], rotation=45, ha='right', fontsize=7)
ax.legend()
ax.set_ylim(0, max(max(before['true_confs']), max(before['false_confs'])) * 1.15)

# After compression
ax = axes[1]
after = report['audit_after']
ax.bar(x - 0.2, after['true_confs'], 0.4, label='True', color='#2ecc71', alpha=0.8)
ax.bar(x + 0.2, after['false_confs'], 0.4, label='False', color='#e74c3c', alpha=0.8)
ax.set_title(f'AFTER Compression (rho={report["rho_after"]:.3f})', fontsize=13)
ax.set_ylabel('Mean Confidence')
ax.set_xticks(x)
ax.set_xticklabels([p['id'][:12] for p in probes], rotation=45, ha='right', fontsize=7)
ax.legend()
ax.set_ylim(0, max(max(after['true_confs']), max(after['false_confs'])) * 1.15)

plt.suptitle('Knowledge Fidelity: Confidence Before vs After CF90 Compression', fontsize=14, y=1.02)
plt.tight_layout()
plt.savefig('../figures/demo_before_after.png', bbox_inches='tight', dpi=150)
plt.show()

## 4. Delta Distribution

The confidence delta (true - false) should stay positive after compression.
A positive delta means the model is still more confident about truth than myth.
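
As a small worked example of this check, the sketch below computes deltas for four probes and counts how many remain positive. The confidence values are made up for illustration, and it assumes each delta is simply the true-statement confidence minus the false-statement confidence for the same probe.

```python
# Hypothetical confidences for four probes (illustrative numbers, not real results)
true_confs = [0.62, 0.55, 0.48, 0.70]
false_confs = [0.40, 0.58, 0.31, 0.52]

# Assumption: delta = confidence on the true statement minus the false one
deltas = [t - f for t, f in zip(true_confs, false_confs)]
n_preserved = sum(d > 0 for d in deltas)

for i, d in enumerate(deltas):
    status = 'OK' if d > 0 else 'FLIP'
    print(f"probe {i}: delta={d:+.2f} [{status}]")
print(f"{n_preserved}/{len(deltas)} probes with positive delta")
```

Here the second probe has a negative delta, meaning the false statement scored higher than the true one, so three of four probes count as preserved.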

In [None]:
fig, ax = plt.subplots(figsize=(10, 5))

deltas_before = np.array(report['audit_before']['deltas'])
deltas_after = np.array(report['audit_after']['deltas'])

x = np.arange(len(probes))
ax.bar(x - 0.2, deltas_before, 0.4, label='Before', color='#3498db', alpha=0.8)
ax.bar(x + 0.2, deltas_after, 0.4, label='After CF90', color='#e67e22', alpha=0.8)
ax.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
ax.set_ylabel('Confidence Delta (true - false)')
ax.set_title('Confidence Delta Preservation Under CF90 Compression')
ax.set_xticks(x)
ax.set_xticklabels([p['id'][:12] for p in probes], rotation=45, ha='right', fontsize=7)
ax.legend()

n_preserved = sum(1 for d in deltas_after if d > 0)
ax.annotate(f'{n_preserved}/{len(probes)} probes preserved',
           xy=(0.98, 0.95), xycoords='axes fraction', ha='right', va='top',
           fontsize=11, fontweight='bold',
           bbox=dict(boxstyle='round,pad=0.3', facecolor='#2ecc71', alpha=0.3))

plt.tight_layout()
plt.savefig('../figures/demo_delta_preservation.png', bbox_inches='tight', dpi=150)
plt.show()

## 5. Mandela Effect Probes

Test on popular false memories. These are claims that many people (and LLMs) get wrong.

Note: The Mandela effect signal strengthens with model scale. At 0.5B you may not see
strong separation; at 6.9B+ the effect is statistically significant (p=0.016).

In [None]:
from knowledge_fidelity import audit_model, get_mandela_probes

mandela = get_mandela_probes()
mandela_audit = audit_model(report['model'], report['tokenizer'], probes=mandela)

print("Mandela probes (post-compression):")
print(f"  rho={mandela_audit['rho']:.3f} (p={mandela_audit['rho_p']:.4f})")
print(f"  {mandela_audit['n_positive_delta']}/{mandela_audit['n_probes']} correct")
print()
for i, p in enumerate(mandela):
    d = mandela_audit['deltas'][i]
    marker = 'correct' if d > 0 else 'WRONG'
    print(f"  {p['id']:<30s} delta={d:+.4f}  [{marker}]")
    if 'note' in p:
        print(f"    Note: {p['note']}")

## 6. Using the Compressed Model

The compressed model is still a standard HuggingFace model: use it normally.

In [None]:
import torch

model = report['model']
tokenizer = report['tokenizer']

prompts = [
    "The capital of France is",
    "Water boils at",
    "Einstein developed the theory of",
]

model.eval()
for prompt in prompts:
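    # Put the inputs on the model's device instead of hard-coding 'cpu'
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=10, do_sample=False,
                            pad_token_id=tokenizer.pad_token_id)
    # Decode only the newly generated tokens; slicing the decoded string by
    # len(prompt) can misalign if decoding does not reproduce the prompt exactly
    new_tokens = out[0][inputs['input_ids'].shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    print(f"{prompt} -> {response.strip()[:50]}")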

## Next Steps

- **Larger models**: Try `meta-llama/Llama-3.1-8B-Instruct` for stronger Mandela effect signal
- **Custom probes**: `compress_and_audit(model, probes=your_probes)` with domain-specific facts
- **Aggressive compression**: Set `use_importance=True` for importance-guided SVD at ratios below 70%
- **Export**: Save with `output_dir='./compressed'` then use `deployment/export_gguf.py`