# 08.4c: Pairwise Distance Matrices (Parallel Runs)

**Compute pairwise distance matrices for all 16 parallel training runs**

This notebook processes all 16 training runs and computes a Chebyshev distance matrix for each of the 10,001 saved steps. The adjacency graph analysis and black hole fission detection take these matrices as input.

## Distance Metric

**Chebyshev (L∞)**: Maximum absolute difference across dimensions
$$d_\infty(u, v) = \max_i |u_i - v_i|$$
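A small worked instance of the formula, using NumPy on two illustrative 4-dimensional vectors (the values are made up, not taken from the runs):

```python
import numpy as np

def chebyshev(u: np.ndarray, v: np.ndarray) -> float:
    """L-infinity distance: the largest absolute coordinate difference."""
    return float(np.max(np.abs(u - v)))

u = np.array([0.50, -1.25, 2.00, 0.75])
v = np.array([0.25, -1.00, 2.50, 0.75])

# |u - v| = [0.25, 0.25, 0.50, 0.00], so the maximum is 0.5
d = chebyshev(u, v)
print(d)  # 0.5
```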

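Chebyshev distance is well suited to detecting quantization neighbors: tokens whose embeddings differ by less than the bfloat16 quantization threshold in every dimension. Because $d_\infty(u, v) < \varepsilon$ holds exactly when $|u_i - v_i| < \varepsilon$ for every $i$, a single threshold on one scalar per pair replaces a separate test in each dimension.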
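The equivalence between a Chebyshev threshold and a per-dimension test can be checked directly. The sketch below flags pairs of rows in a (vocab, dim) embedding matrix whose Chebyshev distance falls below `eps` and confirms the result matches testing each dimension separately. The helper `quantization_neighbors` is hypothetical, and `eps = 2**-7` is an assumed threshold chosen for illustration (it is the bfloat16 spacing for magnitudes in [1, 2)), not a value taken from this notebook.

```python
import numpy as np

def quantization_neighbors(emb: np.ndarray, eps: float) -> np.ndarray:
    """Boolean (V, V) mask: True where rows i and j differ by < eps in every dimension.

    Hypothetical helper; eps is an assumed threshold.
    """
    diff = np.abs(emb[None, :, :] - emb[:, None, :])  # (V, V, D)
    d_inf = diff.max(axis=-1)                        # (V, V) Chebyshev distances
    mask = d_inf < eps
    np.fill_diagonal(mask, False)                    # a token is not its own neighbor
    return mask

eps = 2.0 ** -7  # assumption: bfloat16 spacing for values in [1, 2)
emb = np.array([
    [1.000, 1.500],
    [1.004, 1.497],  # within eps of row 0 in both dimensions
    [1.000, 1.600],  # far from row 0 in the second dimension
])
mask = quantization_neighbors(emb, eps)

# Same answer as checking every dimension separately
per_dim = np.all(np.abs(emb[None, :, :] - emb[:, None, :]) < eps, axis=-1)
np.fill_diagonal(per_dim, False)
assert np.array_equal(mask, per_dim)
print(np.argwhere(mask))  # pairs (0, 1) and (1, 0)
```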

## Output

For each run, the notebook writes `pairwise_distances.safetensors` to the run directory, containing:
- `chebyshev_distances`: (10001, 128, 128) float32

Storage is about 655.4 MB per run and about 10.5 GB for all 16 runs.
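The storage figures follow from the tensor shape and the 4-byte float32 element size. The arithmetic below uses decimal units (1 MB = 10^6 bytes), the same convention as the `st_size / 1e6` conversion in the code; the file on disk is marginally larger because safetensors prepends a small JSON header.

```python
# Storage estimate for one (steps, vocab, vocab) float32 tensor
steps, vocab, bytes_per_float32 = 10001, 128, 4
n_runs = 16

bytes_per_run = steps * vocab * vocab * bytes_per_float32
print(bytes_per_run)                                   # 655425536
print(f"{bytes_per_run / 1e6:.1f} MB per run")         # 655.4 MB per run
print(f"{n_runs * bytes_per_run / 1e9:.2f} GB total")  # 10.49 GB total
```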

## Parameters

In [1]:
# Data directories
DATA_DIR = "../data"
RUN_PATTERN = "embeddings_128vocab_qweninit_run_*"
EMBEDDING_FILE = "embedding_evolution.safetensors"
EMBEDDING_KEY = "embedding_history"
OUTPUT_FILE = "pairwise_distances.safetensors"

# Expected dimensions
EXPECTED_RUNS = 16
EXPECTED_STEPS = 10001
VOCAB_SIZE = 128
HIDDEN_DIM = 64

RANDOM_SEED = 42

## Imports

In [2]:
import torch
import numpy as np
from safetensors.torch import load_file, save_file
from pathlib import Path
from tqdm.auto import tqdm

torch.manual_seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

## Hardware Setup

In [3]:
# Auto-detect best available device
if torch.cuda.is_available():
    device = torch.device("cuda")
    device_name = torch.cuda.get_device_name(0)
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    device_name = "Apple Silicon (MPS)"
else:
    device = torch.device("cpu")
    device_name = "CPU"

print(f"Using device: {device} ({device_name})")

Using device: mps (Apple Silicon (MPS))


## Find All Runs

In [4]:
data_dir = Path(DATA_DIR)
run_dirs = sorted(data_dir.glob(RUN_PATTERN))

print(f"Found {len(run_dirs)} runs:")
for run_dir in run_dirs:
    print(f"  {run_dir.name}")

if len(run_dirs) != EXPECTED_RUNS:
    print(f"\n⚠ WARNING: Expected {EXPECTED_RUNS} runs, found {len(run_dirs)}")
else:
    print(f"\n✓ Found all {EXPECTED_RUNS} runs")

Found 16 runs:
  embeddings_128vocab_qweninit_run_001
  embeddings_128vocab_qweninit_run_002
  embeddings_128vocab_qweninit_run_003
  embeddings_128vocab_qweninit_run_004
  embeddings_128vocab_qweninit_run_005
  embeddings_128vocab_qweninit_run_006
  embeddings_128vocab_qweninit_run_007
  embeddings_128vocab_qweninit_run_008
  embeddings_128vocab_qweninit_run_009
  embeddings_128vocab_qweninit_run_010
  embeddings_128vocab_qweninit_run_011
  embeddings_128vocab_qweninit_run_012
  embeddings_128vocab_qweninit_run_013
  embeddings_128vocab_qweninit_run_014
  embeddings_128vocab_qweninit_run_015
  embeddings_128vocab_qweninit_run_016

✓ Found all 16 runs
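Counting directories only confirms the total. A stricter check, sketched here, also verifies that the numeric suffixes cover 1 through `EXPECTED_RUNS` with no gaps. The helper `missing_run_ids` is hypothetical and assumes every directory name ends in an underscore-separated integer, as in the listing above.

```python
from pathlib import Path

def missing_run_ids(run_dirs, expected_runs: int) -> list[int]:
    """Return run numbers in 1..expected_runs that have no matching directory.

    Assumes names end in an underscore-separated integer suffix,
    e.g. 'embeddings_128vocab_qweninit_run_007'.
    """
    found = {int(Path(d).name.rsplit("_", 1)[-1]) for d in run_dirs}
    return [i for i in range(1, expected_runs + 1) if i not in found]

# Example: run 005 is absent
names = [f"embeddings_128vocab_qweninit_run_{i:03d}" for i in range(1, 17) if i != 5]
print(missing_run_ids(names, 16))  # [5]
```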


## Process Each Run

For each run:
1. Load embedding history
2. Compute Chebyshev distance matrices (GPU-accelerated)
3. Save to run directory
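The loop below handles one snapshot at a time and copies each 128 × 128 result back to the host individually. Processing several snapshots per iteration cuts the number of kernel launches and device-to-host copies on a GPU. A minimal NumPy sketch of the batched broadcast follows; the function name `chebyshev_history` and the `batch_size` default of 64 are assumptions, and the same indexing carries over to torch tensors (with `amax(dim=-1)`). The intermediate difference tensor has shape (B, V, V, D), so with B = 64, V = 128, D = 64 it occupies about 268 MB in float32.

```python
import numpy as np

def chebyshev_history(history: np.ndarray, batch_size: int = 64) -> np.ndarray:
    """Pairwise Chebyshev distances for every snapshot.

    history: (T, V, D) embeddings.
    Returns a (T, V, V) float32 array with
    out[t, i, j] = max_k |history[t, j, k] - history[t, i, k]|.
    """
    T, V, _ = history.shape
    out = np.empty((T, V, V), dtype=np.float32)
    for start in range(0, T, batch_size):
        chunk = history[start:start + batch_size].astype(np.float32)  # (B, V, D)
        # (B, 1, V, D) - (B, V, 1, D) -> (B, V, V, D)
        diff = chunk[:, None, :, :] - chunk[:, :, None, :]
        out[start:start + batch_size] = np.abs(diff).max(axis=-1)
    return out

# Usage on small random data; batch_size=2 exercises a partial final batch
rng = np.random.default_rng(0)
hist = rng.standard_normal((5, 8, 3)).astype(np.float32)
dist = chebyshev_history(hist, batch_size=2)

# Matches a per-snapshot loop like the one in the notebook
ref = np.stack([np.abs(h[None] - h[:, None]).max(axis=-1) for h in hist])
assert np.allclose(dist, ref)
```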

In [5]:
print(f"\nProcessing runs...\n")

for run_dir in tqdm(run_dirs, desc="Runs"):
    run_name = run_dir.name.split('_')[-1]
    embedding_path = run_dir / EMBEDDING_FILE
    output_path = run_dir / OUTPUT_FILE
    
    # Skip if already computed
    if output_path.exists():
        print(f"  {run_name}: already exists, skipping")
        continue
    
    # Load embedding history
    data = load_file(embedding_path)
    embedding_history = data[EMBEDDING_KEY]
    n_snapshots, vocab_size, hidden_dim = embedding_history.shape
    
    # Validate dimensions
    if (n_snapshots, vocab_size, hidden_dim) != (EXPECTED_STEPS, VOCAB_SIZE, HIDDEN_DIM):
        print(f"  {run_name}: unexpected shape {embedding_history.shape}, skipping")
        continue
    
    # Allocate distance matrix
    chebyshev_distances = torch.zeros((n_snapshots, vocab_size, vocab_size), dtype=torch.float32)
    
    # Move to GPU and compute distances
    embedding_history_gpu = embedding_history.float().to(device)
    
    for i in tqdm(range(n_snapshots), desc=f"  {run_name}", leave=False):
        gamma = embedding_history_gpu[i]  # (128, 64) on GPU
        
        # Compute pairwise differences using broadcasting
        # gamma.unsqueeze(0): (1, 128, 64)
        # gamma.unsqueeze(1): (128, 1, 64)
        # diff: (128, 128, 64)
        diff = gamma.unsqueeze(0) - gamma.unsqueeze(1)
        
        # Chebyshev distance: max absolute difference across the hidden dimension
        # chebyshev: (128, 128)
        chebyshev = torch.abs(diff).amax(dim=2)
        
        # Move to CPU and store
        chebyshev_distances[i] = chebyshev.cpu()
    
    # Free GPU memory
    del embedding_history_gpu
    if device.type == "cuda":
        torch.cuda.empty_cache()
    elif device.type == "mps":
        torch.mps.empty_cache()
    
    # Save
    save_file({'chebyshev_distances': chebyshev_distances}, output_path)
    
    file_size_mb = output_path.stat().st_size / 1e6
    print(f"  {run_name}: saved {file_size_mb:.1f} MB")

print(f"\n✓ All runs processed")


Processing runs...

  001: saved 655.4 MB
  002: saved 655.4 MB
  003: saved 655.4 MB
  004: saved 655.4 MB
  005: saved 655.4 MB
  006: saved 655.4 MB
  007: saved 655.4 MB
  008: saved 655.4 MB
  009: saved 655.4 MB
  010: saved 655.4 MB
  011: saved 655.4 MB
  012: saved 655.4 MB
  013: saved 655.4 MB
  014: saved 655.4 MB
  015: saved 655.4 MB
  016: saved 655.4 MB

✓ All runs processed
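Before downstream analysis it can be worth spot-checking a saved file. Any Chebyshev distance tensor should be non-negative, have a zero diagonal, and be symmetric in each step. The hypothetical `check_distance_tensor` below asserts those properties on a (T, V, V) array, for example one obtained with `load_file(output_path)["chebyshev_distances"].numpy()`. Since IEEE round-to-nearest subtraction satisfies fl(a − b) = −fl(b − a), exact equality (`atol=0.0`) is a reasonable default.

```python
import numpy as np

def check_distance_tensor(dist: np.ndarray, atol: float = 0.0) -> None:
    """Raise AssertionError if a (T, V, V) distance tensor violates basic metric properties."""
    assert dist.ndim == 3 and dist.shape[1] == dist.shape[2], "expected shape (T, V, V)"
    assert np.all(dist >= 0), "distances must be non-negative"
    diag = np.diagonal(dist, axis1=1, axis2=2)  # (T, V)
    assert np.all(diag <= atol), "self-distances must be zero"
    assert np.allclose(dist, dist.transpose(0, 2, 1), atol=atol, rtol=0.0), \
        "each step's matrix must be symmetric"

# Usage on synthetic data shaped like the notebook's output
rng = np.random.default_rng(1)
emb = rng.standard_normal((3, 6, 4)).astype(np.float32)      # (T, V, D)
dist = np.abs(emb[:, None] - emb[:, :, None]).max(axis=-1)   # (T, V, V)
check_distance_tensor(dist)
```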


## Summary

In [6]:
# Check what we created
print(f"\n{'='*80}")
print("SUMMARY")
print(f"{'='*80}\n")

total_size = 0
for run_dir in sorted(run_dirs):
    output_path = run_dir / OUTPUT_FILE
    if output_path.exists():
        size_mb = output_path.stat().st_size / 1e6
        total_size += size_mb
        run_name = run_dir.name.split('_')[-1]
        print(f"  {run_name}: {size_mb:.1f} MB")

print(f"\nTotal storage: {total_size / 1e3:.2f} GB")
print("\nEach file contains:")
print(f"  chebyshev_distances: ({EXPECTED_STEPS}, {VOCAB_SIZE}, {VOCAB_SIZE}) float32")
print(f"\n{'='*80}")


SUMMARY

  001: 655.4 MB
  002: 655.4 MB
  003: 655.4 MB
  004: 655.4 MB
  005: 655.4 MB
  006: 655.4 MB
  007: 655.4 MB
  008: 655.4 MB
  009: 655.4 MB
  010: 655.4 MB
  011: 655.4 MB
  012: 655.4 MB
  013: 655.4 MB
  014: 655.4 MB
  015: 655.4 MB
  016: 655.4 MB

Total storage: 10.49 GB

Each file contains:
  chebyshev_distances: (10001, 128, 128) float32
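These matrices feed the adjacency graph analysis. One plausible construction, assumed here rather than taken from the downstream notebooks, links two tokens at a given step when their Chebyshev distance is below a threshold `eps`. The helpers `adjacency_from_distances` and `edge_counts` are hypothetical, and the threshold in the usage example is illustrative only.

```python
import numpy as np

def adjacency_from_distances(dist: np.ndarray, eps: float) -> np.ndarray:
    """Boolean (T, V, V) adjacency: edge i-j at step t if dist[t, i, j] < eps, no self-loops."""
    adj = dist < eps
    idx = np.arange(dist.shape[1])
    adj[:, idx, idx] = False  # drop self-loops on every step's diagonal
    return adj

def edge_counts(adj: np.ndarray) -> np.ndarray:
    """Undirected edges per step; each symmetric pair is counted once."""
    return adj.sum(axis=(1, 2)) // 2

# Two steps, three tokens: tokens 0 and 1 are close only at step 0
dist = np.array([
    [[0.0, 0.001, 0.5], [0.001, 0.0, 0.5], [0.5, 0.5, 0.0]],
    [[0.0, 0.5,   0.5], [0.5,   0.0, 0.5], [0.5, 0.5, 0.0]],
], dtype=np.float32)
adj = adjacency_from_distances(dist, eps=2.0 ** -7)
print(edge_counts(adj))  # [1 0]
```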

