# 12.2e: Qwen 3 4B Comprehensive Statistics

**Goal:** Compute the EXACT SAME statistics as 12.2d, but for Qwen 3 4B Instruct 2507's actual dead token structure.

## What We Compute (Single Model)

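The same 20 metrics as 12.2d:
- Basic counts (n_tokens, n_unique, n_black_holes, n_singletons, total and black-hole populations)
- Per-black-hole statistics (largest, smallest, mean, median, top-2 population, Gini coefficient)
- Spatial extent (max/mean/median pairwise L∞ distance between unique vectors, in units of ε)
- Topology of the graph linking unique vectors within the touching threshold (components, isolated nodes, largest component size/density, global density)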
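To make the spatial-extent and topology metrics concrete, here is a minimal NumPy sketch on three made-up vectors (toy values, not Qwen embeddings). It follows the same definitions used by `compute_l_inf_stats` and `compute_topology` later in this notebook: broadcast pairwise differences, take the max absolute coordinate, drop the diagonal, divide by ε, and call two vectors adjacent when their distance is at most 2ε.

```python
import numpy as np

EPS = 6e-5  # same value as EPSILON in the Parameters cell

# Three hypothetical unique vectors (toy values, not Qwen data)
vecs = np.array([
    [0.0100, 0.0200],
    [0.0100 + EPS, 0.0200],        # 1 ε from vecs[0] in the first coordinate
    [0.0100, 0.0200 + 3 * EPS],    # 3 ε from vecs[0] in the second coordinate
])
n = len(vecs)
off_diagonal = ~np.eye(n, dtype=bool)

# Pairwise L∞ (Chebyshev) distance via broadcasting, in units of ε: shape (n, n)
l_inf = np.abs(vecs[:, None, :] - vecs[None, :, :]).max(axis=2) / EPS

# Spatial extent: statistics over off-diagonal entries only
off_diag = l_inf[off_diagonal]
print("max L∞ (ε):", round(float(off_diag.max()), 3))

# Topology: adjacency at the 2ε touching threshold, self-loops excluded
adjacent = (l_inf <= 2.0) & off_diagonal
print(adjacent.astype(int))
```

Here `vecs[2]` is 3ε from both other vectors, so it ends up as an isolated node, while `vecs[0]` and `vecs[1]` form a two-node component.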

## Approach

- Load Qwen's gamma matrix and black hole mask (pre-computed)
- Extract dead token embeddings (2,100 tokens)
- Run `torch.unique(dim=0)` to get the unique vectors and their counts
- Compute all statistics using the same functions as 12.2d
- Save results to CSV: **one row** with all metrics
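The unique-then-count step is what defines a black hole: a unique vector shared by two or more tokens. A toy NumPy sketch, with `np.unique(..., axis=0, return_counts=True)` standing in for `torch.unique(..., dim=0, return_counts=True)` and six made-up token rows, shows how the basic counts fall out:

```python
import numpy as np

# Hypothetical toy "dead token" rows: 6 tokens, 2 dimensions (not Qwen data)
tokens = np.array([
    [1.0, 2.0],
    [1.0, 2.0],
    [1.0, 2.0],
    [3.0, 4.0],
    [3.0, 4.0],
    [5.0, 6.0],
])

# Row-wise uniqueness: each unique row plus how many tokens share it
unique_rows, counts = np.unique(tokens, axis=0, return_counts=True)

is_black_hole = counts >= 2                      # shared by >= 2 tokens
n_black_holes = int(is_black_hole.sum())
n_singletons = int((~is_black_hole).sum())
black_hole_population = int(counts[is_black_hole].sum())

print(len(unique_rows), n_black_holes, n_singletons, black_hole_population)
```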

## Output

`../data/analysis/qwen_comprehensive.csv`

**Runtime:** <1 second (single trial)

## Why This Matters

Having Qwen's stats in the EXACT same format as the synthetic trials lets us do direct comparisons in 12.3e:
- Qwen value vs synthetic (mean ± std)
- How many σ away is Qwen from the synthetic distribution?
- Which metrics match perfectly (topology) vs which have variance (population)?
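One way to express "how many σ away" is a signed z-score against the synthetic trials. This is a sketch under assumptions, not 12.3e's actual code: `sigma_distance` is a hypothetical helper, and it uses the sample standard deviation (`statistics.stdev`). Zero-variance metrics, which is what "match perfectly" would look like for topology, need explicit handling because the ratio is undefined.

```python
import math
import statistics


def sigma_distance(qwen_value, synthetic_values):
    """Signed number of standard deviations from the synthetic mean to Qwen.

    Assumes the sample standard deviation; 12.3e may use another convention.
    """
    mean = statistics.mean(synthetic_values)
    std = statistics.stdev(synthetic_values)
    if std == 0:
        # Zero-variance metric: exact match, or an infinitely large gap
        return 0.0 if qwen_value == mean else math.copysign(math.inf, qwen_value - mean)
    return (qwen_value - mean) / std


print(sigma_distance(4, [1, 2, 3]))   # mean 2, sample std 1
print(sigma_distance(1, [1, 1, 1]))   # constant metric, exact match
```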

## Parameters

In [1]:
# Pre-computed tensors
GAMMA_PATH = "../data/tensors/gamma_qwen3_4b_instruct_2507.safetensors"
MASK_PATH = "../data/tensors/black_hole_mask.safetensors"

# Reference scale: bfloat16 ULP (≈ 2**-14) at Qwen's embedding magnitude
EPSILON = 6e-5

# Topology: two unique vectors are adjacent if their L∞ distance is ≤ 2ε
TOUCHING_THRESHOLD = 2 * EPSILON

# Output
OUTPUT_CSV = "../data/analysis/qwen_comprehensive.csv"

RANDOM_SEED = 42
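The 6e-5 value is consistent with 2^-14 ≈ 6.1e-5, which is the bfloat16 ULP for magnitudes in [2^-7, 2^-6), roughly [0.0078, 0.0156). A quick check, assuming bfloat16's 8-bit significand (7 stored bits, so `torch.finfo(torch.bfloat16).eps` is 2^-7) and normal, non-subnormal values; `bf16_ulp` is a hypothetical helper, and whether the dead-token coordinates actually sit in that range is not verified here:

```python
import math

BF16_EPS = 2.0 ** -7  # bfloat16 machine epsilon (7 explicitly stored mantissa bits)


def bf16_ulp(x):
    """Spacing between adjacent bfloat16 values near |x| (normal range only)."""
    _, e = math.frexp(abs(x))  # |x| = m * 2**e with 0.5 <= m < 1
    return BF16_EPS * 2.0 ** (e - 1)


print(bf16_ulp(0.01))  # every magnitude in [2**-7, 2**-6) shares this ULP
```

Using `math.frexp` instead of `math.floor(math.log2(...))` keeps the exponent exact at powers of two.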

## Imports

In [2]:
import torch
import numpy as np
import pandas as pd
import networkx as nx
from safetensors.torch import load_file
from pathlib import Path

torch.manual_seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

print("✓ Imports complete")

✓ Imports complete


## Helper Functions (Same as 12.2d)

In [3]:
def compute_gini_coefficient(populations):
    """
    Gini coefficient of a 1-D tensor of populations
    (0 = all equal; the maximum for n entries is (n - 1) / n).
    """
    if len(populations) <= 1:
        return 0.0
    
    sorted_pops = torch.sort(populations.float())[0]
    n = len(sorted_pops)
    
    indices = torch.arange(1, n + 1, dtype=torch.float32)
    numerator = 2 * torch.sum(indices * sorted_pops)
    denominator = n * torch.sum(sorted_pops)
    
    if denominator == 0:
        return 0.0
    
    gini = (numerator / denominator) - (n + 1) / n
    return gini.item()


def compute_l_inf_stats(unique_vectors, epsilon):
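    """
    Max/mean/median pairwise L∞ (Chebyshev) distance between distinct
    unique vectors, in units of epsilon. Each pair is counted in both
    orders, which leaves all three statistics unchanged.
    """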
    n = len(unique_vectors)
    
    if n <= 1:
        return {'max_l_inf': 0.0, 'mean_l_inf': 0.0, 'median_l_inf': 0.0}
    
    # Compute pairwise L∞ distances (vectorized)
    v1 = unique_vectors.unsqueeze(1)
    v2 = unique_vectors.unsqueeze(0)
    diffs = v1 - v2
    l_inf_matrix = torch.abs(diffs).max(dim=2)[0]
    
    # Exclude diagonal
    mask = ~torch.eye(n, dtype=torch.bool)
    l_inf_values = l_inf_matrix[mask]
    
    # Normalize by epsilon
    l_inf_values = l_inf_values / epsilon
    
    return {
        'max_l_inf': l_inf_values.max().item(),
        'mean_l_inf': l_inf_values.mean().item(),
        'median_l_inf': l_inf_values.median().item(),
    }


def compute_topology(unique_vectors, threshold):
    """
    Graph topology over unique vectors: nodes are vectors, and an edge
    joins every pair whose L∞ distance is <= threshold.
    """
    n = len(unique_vectors)
    
    if n == 0:
        return {
            'n_components': 0,
            'n_isolated': 0,
            'largest_component_size': 0,
            'largest_component_density': 0.0,
            'global_density': 0.0,
        }
    
    if n == 1:
        return {
            'n_components': 1,
            'n_isolated': 1,
            'largest_component_size': 1,
            'largest_component_density': 1.0,
            'global_density': 1.0,
        }
    
    # Compute pairwise L∞ distances
    v1 = unique_vectors.unsqueeze(1)
    v2 = unique_vectors.unsqueeze(0)
    diffs = v1 - v2
    l_inf_matrix = torch.abs(diffs).max(dim=2)[0]
    
    # Build adjacency matrix
    adjacency = (l_inf_matrix <= threshold) & (~torch.eye(n, dtype=torch.bool))
    
    # Convert to NetworkX graph
    G = nx.Graph()
    G.add_nodes_from(range(n))
    edges = torch.nonzero(adjacency, as_tuple=False).tolist()
    G.add_edges_from(edges)
    
    # Connected components
    components = list(nx.connected_components(G))
    component_sizes = sorted([len(c) for c in components], reverse=True)
    
    n_components = len(components)
    largest_size = component_sizes[0] if component_sizes else 0
    
    # Isolated nodes
    n_isolated = sum(1 for node in G.nodes() if G.degree(node) == 0)
    
    # Density of largest component
    if largest_size > 1:
        largest_component = max(components, key=len)
        subgraph = G.subgraph(largest_component)
        n_edges = subgraph.number_of_edges()
        max_edges = largest_size * (largest_size - 1) // 2
        largest_density = n_edges / max_edges if max_edges > 0 else 0.0
    else:
        largest_density = 1.0 if largest_size == 1 else 0.0
    
    # Global density
    n_edges = G.number_of_edges()
    max_edges = n * (n - 1) // 2
    global_density = n_edges / max_edges if max_edges > 0 else 0.0
    
    return {
        'n_components': n_components,
        'n_isolated': n_isolated,
        'largest_component_size': largest_size,
        'largest_component_density': largest_density,
        'global_density': global_density,
    }


def compute_trial_statistics(embeddings, epsilon, threshold):
    """
    Compute all statistics for a single trial.
    """
    # Get unique vectors and counts
    unique_vectors, _, counts = torch.unique(
        embeddings,
        dim=0,
        return_inverse=True,
        return_counts=True
    )
    
    # Basic counts
    n_tokens = len(embeddings)
    n_unique = len(unique_vectors)
    black_hole_mask = counts >= 2
    n_black_holes = black_hole_mask.sum().item()
    n_singletons = (~black_hole_mask).sum().item()
    total_population = counts.sum().item()
    black_hole_population = counts[black_hole_mask].sum().item() if n_black_holes > 0 else 0
    
    # Per-black-hole statistics
    if n_black_holes > 0:
        bh_populations = counts[black_hole_mask]
        largest_bh = bh_populations.max().item()
        smallest_bh = bh_populations.min().item()
        mean_bh_size = bh_populations.float().mean().item()
        median_bh_size = bh_populations.float().median().item()
        
        # Top-2 concentration
        top_k = min(2, len(bh_populations))
        top2_population = bh_populations.topk(top_k)[0].sum().item()
        
        # Gini coefficient
        gini = compute_gini_coefficient(bh_populations)
    else:
        largest_bh = 0
        smallest_bh = 0
        mean_bh_size = 0.0
        median_bh_size = 0.0
        top2_population = 0
        gini = 0.0
    
    # Spatial extent (L∞ distances)
    l_inf_stats = compute_l_inf_stats(unique_vectors, epsilon)
    
    # Topology
    topology_stats = compute_topology(unique_vectors, threshold)
    
    # Combine all statistics
    return {
        # Basic counts
        'n_tokens': n_tokens,
        'n_unique': n_unique,
        'n_black_holes': n_black_holes,
        'n_singletons': n_singletons,
        'total_population': total_population,
        'black_hole_population': black_hole_population,
        
        # Per-BH statistics
        'largest_bh': largest_bh,
        'smallest_bh': smallest_bh,
        'mean_bh_size': mean_bh_size,
        'median_bh_size': median_bh_size,
        'top2_population': top2_population,
        'gini_coefficient': gini,
        
        # Spatial extent
        'max_l_inf': l_inf_stats['max_l_inf'],
        'mean_l_inf': l_inf_stats['mean_l_inf'],
        'median_l_inf': l_inf_stats['median_l_inf'],
        
        # Topology
        'n_components': topology_stats['n_components'],
        'n_isolated': topology_stats['n_isolated'],
        'largest_component_size': topology_stats['largest_component_size'],
        'largest_component_density': topology_stats['largest_component_density'],
        'global_density': topology_stats['global_density'],
    }

print("✓ Helper functions defined")

✓ Helper functions defined
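`compute_gini_coefficient` uses the sorted-rank closed form G = 2·Σ i·x₍ᵢ₎ / (n·Σx) − (n+1)/n. A dependency-free cross-check against the mean-absolute-difference definition, G = Σᵢ Σⱼ |xᵢ − xⱼ| / (2·n²·x̄), on toy populations; both function names are hypothetical and the inputs are illustrative, not Qwen data:

```python
def gini_closed_form(pops):
    """Sorted-rank formula, mirroring compute_gini_coefficient."""
    xs = sorted(float(p) for p in pops)
    n = len(xs)
    total = sum(xs)
    if n <= 1 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def gini_pairwise(pops):
    """Mean-absolute-difference definition (O(n^2), fine for a check)."""
    n = len(pops)
    if n <= 1:
        return 0.0
    mean = sum(pops) / n
    if mean == 0:
        return 0.0
    return sum(abs(a - b) for a in pops for b in pops) / (2 * n * n * mean)


for pops in ([5, 5, 5, 5], [0, 0, 0, 1], [1, 2, 3, 4]):
    print(pops, round(gini_closed_form(pops), 6), round(gini_pairwise(pops), 6))
```

`[0, 0, 0, 1]` hits the maximum (n − 1)/n = 0.75 for four entries, which is why a Gini of 0.753 over 13 black holes still leaves room below the 12/13 ≈ 0.923 ceiling.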


## Load Qwen Data

In [4]:
print("Loading Qwen 3 4B Instruct 2507 data...\n")

# Load gamma matrix (unembedding weights)
gamma_data = load_file(GAMMA_PATH)
gamma = gamma_data['gamma'].to(torch.float32)

print(f"✓ Gamma loaded")
print(f"  Shape: {gamma.shape}")
print(f"  Dtype: {gamma.dtype}")

# Load black hole mask
mask_data = load_file(MASK_PATH)
mask = mask_data['mask']

print(f"\n✓ Black hole mask loaded")
print(f"  Dead tokens: {mask.sum().item():,}")

# Extract dead token embeddings
dead_token_embeddings = gamma[mask]

print(f"\n✓ Extracted dead token embeddings")
print(f"  Shape: {dead_token_embeddings.shape}")

Loading Qwen 3 4B Instruct 2507 data...

✓ Gamma loaded
  Shape: torch.Size([151936, 2560])
  Dtype: torch.float32

✓ Black hole mask loaded
  Dead tokens: 2,100

✓ Extracted dead token embeddings
  Shape: torch.Size([2100, 2560])


## Compute Statistics

In [5]:
print("\nComputing comprehensive statistics for Qwen...\n")

# Compute all statistics using the same function as 12.2d
qwen_stats = compute_trial_statistics(dead_token_embeddings, EPSILON, TOUCHING_THRESHOLD)

# Add a trial_id (just 0 for Qwen)
qwen_stats['trial_id'] = 0

print("✓ Statistics computed")


Computing comprehensive statistics for Qwen...

✓ Statistics computed


## Display Results

In [6]:
print(f"\n{'='*70}")
print(f"QWEN 3 4B INSTRUCT 2507 COMPREHENSIVE STATISTICS")
print(f"{'='*70}\n")

print("BASIC COUNTS:")
print(f"  Total tokens: {qwen_stats['n_tokens']}")
print(f"  Unique vectors: {qwen_stats['n_unique']}")
print(f"  Black holes (C ≥ 2): {qwen_stats['n_black_holes']}")
print(f"  Singletons (C = 1): {qwen_stats['n_singletons']}")
print(f"  Total population: {qwen_stats['total_population']}")
print(f"  Black hole population: {qwen_stats['black_hole_population']}")

print(f"\nPER-BLACK-HOLE STATISTICS:")
print(f"  Largest BH: {qwen_stats['largest_bh']}")
print(f"  Smallest BH: {qwen_stats['smallest_bh']}")
print(f"  Mean BH size: {qwen_stats['mean_bh_size']:.1f}")
print(f"  Median BH size: {qwen_stats['median_bh_size']:.1f}")
print(f"  Top-2 population: {qwen_stats['top2_population']} ({qwen_stats['top2_population']/qwen_stats['total_population']*100:.1f}%)")
print(f"  Gini coefficient: {qwen_stats['gini_coefficient']:.3f}")

print(f"\nSPATIAL EXTENT (units of ε):")
print(f"  Max L∞: {qwen_stats['max_l_inf']:.3f}")
print(f"  Mean L∞: {qwen_stats['mean_l_inf']:.3f}")
print(f"  Median L∞: {qwen_stats['median_l_inf']:.3f}")

print(f"\nTOPOLOGY:")
print(f"  Connected components: {qwen_stats['n_components']}")
print(f"  Isolated nodes: {qwen_stats['n_isolated']}")
print(f"  Largest component size: {qwen_stats['largest_component_size']}")
print(f"  Largest component density: {qwen_stats['largest_component_density']:.3f}")
print(f"  Global density: {qwen_stats['global_density']:.3f}")

print(f"\n{'='*70}")


QWEN 3 4B INSTRUCT 2507 COMPREHENSIVE STATISTICS

BASIC COUNTS:
  Total tokens: 2100
  Unique vectors: 13
  Black holes (C ≥ 2): 13
  Singletons (C = 1): 0
  Total population: 2100
  Black hole population: 2100

PER-BLACK-HOLE STATISTICS:
  Largest BH: 814
  Smallest BH: 2
  Mean BH size: 161.5
  Median BH size: 6.0
  Top-2 population: 1518 (72.3%)
  Gini coefficient: 0.753

SPATIAL EXTENT (units of ε):
  Max L∞: 1.017
  Mean L∞: 0.483
  Median L∞: 0.509

TOPOLOGY:
  Connected components: 1
  Isolated nodes: 0
  Largest component size: 13
  Largest component density: 1.000
  Global density: 1.000
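The topology result follows from the spatial extent: the largest pairwise distance (1.017 ε) is below the 2ε touching threshold, so every pair of the 13 unique vectors is adjacent. The graph is therefore complete, with 13·12/2 = 78 edges, one component, and density 1. A small arithmetic check that re-derives several printed values from the others (inputs copied from the output above):

```python
# Values copied from the printed output above
n_unique = 13
total_population = 2100
top2_population = 1518
max_l_inf_in_eps = 1.017
touching_threshold_in_eps = 2.0  # TOUCHING_THRESHOLD / EPSILON

# No singletons, so every unique vector is a black hole
mean_bh_size = total_population / n_unique
top2_share = 100 * top2_population / total_population

# If even the largest pairwise distance is within the threshold, every pair is an edge
all_pairs_adjacent = max_l_inf_in_eps <= touching_threshold_in_eps
n_edges = n_unique * (n_unique - 1) // 2 if all_pairs_adjacent else None

print(f"mean BH size {mean_bh_size:.1f}, top-2 share {top2_share:.1f}%, edges {n_edges}")
```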



## Save to CSV

In [7]:
# Convert to DataFrame (single row)
df = pd.DataFrame([qwen_stats])

# Reorder columns to match 12.2d
column_order = [
    'trial_id',
    # Basic counts
    'n_tokens', 'n_unique', 'n_black_holes', 'n_singletons',
    'total_population', 'black_hole_population',
    # Per-BH stats
    'largest_bh', 'smallest_bh', 'mean_bh_size', 'median_bh_size',
    'top2_population', 'gini_coefficient',
    # Spatial
    'max_l_inf', 'mean_l_inf', 'median_l_inf',
    # Topology
    'n_components', 'n_isolated', 'largest_component_size',
    'largest_component_density', 'global_density',
]

df = df[column_order]

# Save
output_path = Path(OUTPUT_CSV)
output_path.parent.mkdir(parents=True, exist_ok=True)
df.to_csv(output_path, index=False)

print(f"\n✓ Saved to {output_path}")
print(f"  Columns: {len(df.columns)}")
print(f"  File size: {output_path.stat().st_size / 1024:.2f} KB")

print(f"\n{'='*70}")
print(f"QWEN STATISTICS SAVED")
print(f"{'='*70}")
print(f"\nReady for comparison in 12.3e!")


✓ Saved to ../data/analysis/qwen_comprehensive.csv
  Columns: 21
  File size: 0.44 KB

QWEN STATISTICS SAVED

Ready for comparison in 12.3e!
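Because 12.3e compares this row column-by-column against 12.2d's trials, a schema guard can run before `df = df[column_order]`. Selecting `df[column_order]` raises `KeyError` for a missing column but silently drops any extra key, so the extra-key case is worth checking explicitly. `check_columns` is a hypothetical helper, not part of 12.2d:

```python
def check_columns(stats, column_order):
    """Raise if the stats dict and the expected CSV schema disagree."""
    if len(set(column_order)) != len(column_order):
        raise ValueError("duplicate column names in column_order")
    missing = [c for c in column_order if c not in stats]
    extra = sorted(set(stats) - set(column_order))
    if missing or extra:
        raise ValueError(f"missing={missing}, extra={extra}")
```

Usage in the Save cell would be `check_columns(qwen_stats, column_order)` just before the reorder.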
