# 13.5a: Direct Quantization Test — The Three Questions

**Does f32→bf16 quantization alone create Qwen-like structure?**

## The Test

Initialize 2,221 tokens in float32: `randvec + Gaussian(0, σ)`, convert to bfloat16.

Sweep σ over ten log-spaced values from 1e-6 to 1e-4 (centered on 1e-5, which worked in 13.4a).
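
To make the f32→bf16 step concrete, here is a minimal pure-Python sketch of rounding a float32 to bfloat16 with ties to even. The helper `to_bf16` is a hypothetical stand-in for `tensor.to(torch.bfloat16).float()`, not code from this notebook, and it does not handle NaN or infinity. It only illustrates the mechanism under test: at a component magnitude near 0.02, a perturbation of 1e-5 rounds away, while 1e-4 does not.

```python
import struct

def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a float.

    Hypothetical helper for illustration; NaN/inf inputs are not handled.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits; the bias makes the cut round to nearest even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    rounded = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", rounded))[0]

base = 0.02
print(to_bf16(base))                          # nearest bf16 value to 0.02
print(to_bf16(base + 1e-5) == to_bf16(base))  # small noise is absorbed
print(to_bf16(base + 1e-4) == to_bf16(base))  # larger noise changes the value
```

Whether noise survives depends on the magnitude of the component it is added to, since bf16 spacing shrinks for values closer to zero.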

## The Three Questions

For each σ:
1. **Fully connected?** Measure adjacency density
2. **Quantized at bf16 limit?** Measure pairwise L∞ distances
3. **Plausible demographics?** Show black hole populations

That's it. No overthinking.
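
The three measurements can be sketched on a toy example before running them at full scale. The function name `three_questions` and the four two-dimensional vectors below are made up for illustration; the notebook's real implementation is `run_trial`, which works on tensors rather than tuples.

```python
from collections import Counter
from itertools import combinations

def three_questions(vectors, threshold):
    """Return (density, sorted L∞ distances, sorted populations) for a list of tuples."""
    # L∞ distance for every unordered pair of vectors.
    linf = [max(abs(a - b) for a, b in zip(u, v)) for u, v in combinations(vectors, 2)]
    # Q1: fraction of pairs closer than the threshold.
    density = sum(d < threshold for d in linf) / len(linf)
    # Q3: population of each distinct vector, largest first.
    populations = sorted(Counter(vectors).values(), reverse=True)
    return density, sorted(linf), populations

toy = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.00005), (0.5, 1.0)]
density, linf, populations = three_questions(toy, threshold=1e-4)
print(density)      # 3 of the 6 pairs fall within the threshold
print(populations)  # one duplicated vector, two singletons
```

The sorted distance list answers Question 2: its mean and median are what get compared against the bf16 spacing.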

## Parameters

In [31]:
N_TOKENS = 2221
HIDDEN_DIM = 2560

# Sigma sweep (centered on 1e-5 from 13.4a)
SIGMA_MIN = 1e-6
SIGMA_MAX = 1e-4
NUM_SIGMA = 10

# Topology threshold (from 13.4a)
LINF_THRESHOLD = 1e-4

RANDOM_SEED = 42

## Imports

In [32]:
import torch
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
from collections import Counter

torch.manual_seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
print(f"Using device: {device}")
print("✓ Imports complete")

Using device: mps
✓ Imports complete


## Generate Sigma Values

In [33]:
sigma_values = np.logspace(np.log10(SIGMA_MIN), np.log10(SIGMA_MAX), NUM_SIGMA)

print(f"Sigma sweep: {SIGMA_MIN:.2e} to {SIGMA_MAX:.2e}")
print(f"Samples: {NUM_SIGMA}\n")
for i, s in enumerate(sigma_values):
    print(f"  [{i}] {s:.6e}")

Sigma sweep: 1.00e-06 to 1.00e-04
Samples: 10

  [0] 1.000000e-06
  [1] 1.668101e-06
  [2] 2.782559e-06
  [3] 4.641589e-06
  [4] 7.742637e-06
  [5] 1.291550e-05
  [6] 2.154435e-05
  [7] 3.593814e-05
  [8] 5.994843e-05
  [9] 1.000000e-04
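
As a sanity check on the sweep, the same values can be reproduced without NumPy: a log-spaced grid is a geometric sequence with a constant ratio between neighbours. This is a sketch for illustration only; the constants are copied from the Parameters cell.

```python
import math

SIGMA_MIN, SIGMA_MAX, NUM_SIGMA = 1e-6, 1e-4, 10

# Constant ratio between consecutive values of a log-spaced sweep.
ratio = (SIGMA_MAX / SIGMA_MIN) ** (1 / (NUM_SIGMA - 1))
sigmas = [SIGMA_MIN * ratio**i for i in range(NUM_SIGMA)]

print(f"ratio = {ratio:.6f}")
print(f"geometric midpoint = {math.sqrt(SIGMA_MIN * SIGMA_MAX):.1e}")
```

With an even number of samples, the geometric midpoint 1e-5 is not itself sampled; the two nearest values in the sweep are 7.74e-06 and 1.29e-05.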


## Trial Function

In [34]:
def run_trial(sigma):
    """
    Answer the three questions:
    1. Fully connected?
    2. Quantized at bf16 limit?
    3. Plausible demographics?
    """
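    # Initialize in f32: one shared unit-norm base vector plus per-token Gaussian noise.
    # The seed is set once at import time, so every call draws a fresh base vector
    # and each σ in the sweep therefore sees a different one.
    random_vector = torch.randn(HIDDEN_DIM, dtype=torch.float32, device=device)
    random_vector = random_vector / random_vector.norm()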
    
    noise = torch.randn(N_TOKENS, HIDDEN_DIM, dtype=torch.float32, device=device) * sigma
    init_f32 = random_vector + noise
    
    # Convert to bf16
    init_bf16 = init_f32.to(torch.bfloat16).float()
    
    # Q1 & Q2: Compute pairwise L∞ distances (chunked to avoid OOM)
    CHUNK_SIZE = 300
    n = N_TOKENS
    linf_distances = []
    
    for i in range(0, n, CHUNK_SIZE):
        chunk = init_bf16[i:i+CHUNK_SIZE]
        diff = torch.abs(chunk.unsqueeze(1) - init_bf16.unsqueeze(0))
        linf = torch.max(diff, dim=2)[0]
        linf_distances.append(linf)
    
    linf_dist = torch.cat(linf_distances, dim=0)  # [n, n]
    
    # Q1: Adjacency density
    adjacency = (linf_dist < LINF_THRESHOLD).float()
    adjacency.fill_diagonal_(0)
    
    n_possible_edges = N_TOKENS * (N_TOKENS - 1)
    n_actual_edges = adjacency.sum().item()
    density = n_actual_edges / n_possible_edges
    
    # Q2: L∞ statistics
    upper_tri_indices = torch.triu_indices(n, n, offset=1)
    pairwise_linf = linf_dist[upper_tri_indices[0], upper_tri_indices[1]].cpu().numpy()
    
    mean_linf = pairwise_linf.mean()
    median_linf = np.median(pairwise_linf)
    
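    # Q3: Demographics. Vectors with identical values share a hash; a "black hole"
    # is any group with population >= 2.
    vectors_cpu = init_bf16.cpu().numpy()  # single device-to-host copy, not one per row
    hashes = [hash(tuple(vec)) for vec in vectors_cpu]
    populations = Counter(hashes).values()
    sorted_pops = sorted(populations, reverse=True)
    n_black_holes = sum(1 for p in sorted_pops if p >= 2)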
    
    return {
        'density': density,
        'mean_linf': mean_linf,
        'median_linf': median_linf,
        'n_black_holes': n_black_holes,
        'demographics': sorted_pops[:13],
    }

print("✓ Trial function defined")

✓ Trial function defined
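
The chunking in `run_trial` can be motivated with a quick size estimate. The sketch below counts only the broadcast difference tensor of shape `[chunk, N_TOKENS, HIDDEN_DIM]` and assumes 4-byte float32 elements; actual peak memory is higher because of the `abs` result and other temporaries. The function name `diff_tensor_bytes` is made up for this example.

```python
N_TOKENS = 2221
HIDDEN_DIM = 2560
BYTES_PER_F32 = 4

def diff_tensor_bytes(chunk_size: int) -> int:
    """Bytes for the [chunk_size, N_TOKENS, HIDDEN_DIM] float32 difference tensor."""
    return chunk_size * N_TOKENS * HIDDEN_DIM * BYTES_PER_F32

for chunk_size in (N_TOKENS, 300):
    gib = diff_tensor_bytes(chunk_size) / 2**30
    print(f"chunk_size={chunk_size:5d}: {gib:6.2f} GiB")
```

Unchunked, the difference tensor alone would need roughly 47 GiB; with `CHUNK_SIZE = 300` it drops to roughly 6.4 GiB per chunk, which is still large and could be reduced further on machines with less memory.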


## Run Sweep

In [35]:
print(f"\nRunning {NUM_SIGMA} trials...\n")

results = {}

for sigma in tqdm(sigma_values, desc="Sigma sweep"):
    results[sigma] = run_trial(sigma)

print("\n✓ Sweep complete")


Running 10 trials...



Sigma sweep: 100%|██████████| 10/10 [00:26<00:00,  2.63s/it]


✓ Sweep complete





## Results Table

In [36]:
print(f"\n{'='*100}")
print(f"{'σ':>12} | {'Density':>8} | {'Mean L∞':>10} | {'Median L∞':>10} | {'Black Holes':>12} | Demographics")
print(f"{'='*100}")

for sigma in sigma_values:
    r = results[sigma]
    # Show the five largest populations, with an ellipsis if more were recorded.
    demo_str = (str(r['demographics'][:5]) + '...') if len(r['demographics']) > 5 else str(r['demographics'])
    print(f"{sigma:12.2e} | {r['density']:8.4f} | {r['mean_linf']:10.2e} | {r['median_linf']:10.2e} | {r['n_black_holes']:12d} | {demo_str}")

print(f"{'='*100}")
print(f"\nQwen target: 13 black holes, demographics [814, 704, 306, 228, 11, ...]")
print(f"13.4a result (σ=1e-5): density=1.0, 13 unique vectors, demographics [19, 14, 3, 3, 3, ...]")


           σ |  Density |    Mean L∞ |  Median L∞ |  Black Holes | Demographics
    1.00e-06 |   0.0000 |   2.41e-04 |   2.44e-04 |            0 | [1, 1, 1, 1, 1]...
    1.67e-06 |   0.0000 |   2.26e-04 |   2.44e-04 |            0 | [1, 1, 1, 1, 1]...
    2.78e-06 |   0.0000 |   2.44e-04 |   2.44e-04 |            0 | [1, 1, 1, 1, 1]...
    4.64e-06 |   0.0000 |   3.00e-04 |   2.44e-04 |            0 | [1, 1, 1, 1, 1]...
    7.74e-06 |   0.0000 |   2.61e-04 |   2.44e-04 |            0 | [1, 1, 1, 1, 1]...
    1.29e-05 |   0.0000 |   3.63e-04 |   2.44e-04 |            0 | [1, 1, 1, 1, 1]...
    2.15e-05 |   0.0000 |   2.44e-04 |   2.44e-04 |            0 | [1, 1, 1, 1, 1]...
    3.59e-05 |   0.0000 |   3.51e-04 |   2.44e-04 |            0 | [1, 1, 1, 1, 1]...
    5.99e-05 |   0.0000 |   4.40e-04 |   4.88e-04 |            0 | [1, 1, 1, 1, 1]...
    1.00e-04 |   0.0000 |   5.31e-04 |   4.92e-04 |            0 | [1, 1, 1, 1, 1]...

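Qwen target: 13 black holes, demographics [814, 704, 306, 228, 11, ...]
13.4a result (σ=1e-5): density=1.0, 13 unique vectors, demographics [19, 14, 3, 3, 3, ...]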

## Interpretation

**Question 1: Fully connected?**
- Density = 1.0 → every vector is within L∞ < 1e-4 of every other
- Density < 1.0 → partial connectivity

**Question 2: Quantized at bf16 limit?**
- Mean/median L∞ ~ 3e-5 to 6e-5 → quantization structure evident
- Much larger → vectors are distinguishable, no clustering

**Question 3: Plausible demographics?**
- Many black holes with varied populations → structure exists
- All singletons [1, 1, 1, ...] → no clustering
- One giant cluster [2221] → complete collapse
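
One way to put the table's L∞ values in context is to compare them with the bf16 grid spacing at the magnitudes these vectors actually have. The sketch below is an illustration, not part of the original analysis: `bf16_spacing` is a hypothetical helper, and it covers only nonzero values in the normal (non-subnormal) range. The base vector is unit-norm in 2560 dimensions, so its RMS component is 1/√2560 ≈ 0.0198.

```python
import math

def bf16_spacing(x: float) -> float:
    """Gap between adjacent bfloat16 values around a nonzero, normal-range |x|.

    bfloat16 has 8 significant bits (1 implicit + 7 stored), so values in
    [2**k, 2**(k+1)) are spaced 2**(k-7) apart.
    """
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent with 0.5 <= m < 1
    return 2.0 ** (exponent - 1 - 7)

typical = 1 / math.sqrt(2560)  # RMS component of a unit vector in 2560 dims
for magnitude in (typical, 0.05, 0.1):
    print(f"|x| = {magnitude:.4f} -> spacing {bf16_spacing(magnitude):.2e}")
```

The spacings are 2⁻¹³ ≈ 1.22e-4, 2⁻¹² ≈ 2.44e-4, and 2⁻¹¹ ≈ 4.88e-4. The median L∞ in the table sits at 2.44e-04 for most σ and near 4.9e-04 for the two largest, which matches single-step differences on this grid. A possible reading, not verified here, is that `LINF_THRESHOLD = 1e-4` lies below the spacing at typical component magnitudes, so two vectors that differ in even one such component are already farther apart than the threshold, which would be consistent with the density of 0.0000 in every row.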