# 09.1b: Black Hole Cluster Centroid

**Compute and save the centroid of the Qwen 3 4B black hole cluster**

Simple, methodical, reusable. We have 2,100 black hole tokens. Let's find their center of mass in γ-space and save it for future analysis.

## Why This Matters

The centroid serves as:
- Reference point for measuring distances to the cluster
- Center for radius-based searches
- Geometric anchor for understanding cluster structure

We'll save it as a reusable tensor so downstream notebooks don't need to recompute it.
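
As a minimal sketch of the first two uses, assuming a centroid of shape `(d,)` and a matrix of vectors of shape `(N, d)`: the helper names `distances_to_centroid` and `tokens_within_radius` are hypothetical, and the toy data stands in for the real γ matrix.

```python
import torch


def distances_to_centroid(vectors: torch.Tensor, centroid: torch.Tensor) -> torch.Tensor:
    """L2 distance from each row of `vectors` to `centroid`."""
    # Broadcasting subtracts the (d,) centroid from every (d,) row
    return torch.linalg.vector_norm(vectors - centroid, ord=2, dim=1)


def tokens_within_radius(vectors: torch.Tensor, centroid: torch.Tensor, radius: float) -> torch.Tensor:
    """Indices of rows whose L2 distance to `centroid` is at most `radius`."""
    dists = distances_to_centroid(vectors, centroid)
    return torch.nonzero(dists <= radius).squeeze(1)


# Toy data (not the real gamma matrix): three points packed near (1, 1), two far away
toy = torch.tensor([[1.0, 1.0], [1.0, 1.1], [0.9, 1.0], [5.0, 5.0], [-3.0, 2.0]])
toy_centroid = toy[:3].mean(dim=0)

near = tokens_within_radius(toy, toy_centroid, radius=0.5)
print(near.tolist())  # [0, 1, 2]
```

With the real data, `vectors` would be the full γ matrix and `radius` would be chosen from the distance statistics computed later in this notebook.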

## Parameters

In [1]:
TENSOR_DIR = "../data/tensors"

# Input files
GAMMA_FILE = "gamma_centered_qwen3_4b_instruct_2507.safetensors"
GAMMA_KEY = "gamma_centered"

BLACK_HOLE_MASK_FILE = "black_hole_mask.safetensors"
BLACK_HOLE_MASK_KEY = "mask"

# Output file
OUTPUT_FILE = "black_hole_centroid_qwen3_4b.safetensors"
OUTPUT_KEY = "centroid"

RANDOM_SEED = 42

## Imports

In [2]:
import torch
import numpy as np
from safetensors.torch import load_file, save_file
from pathlib import Path

torch.manual_seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

print("✓ Imports loaded")

✓ Imports loaded


## Load Data

In [3]:
data_dir = Path(TENSOR_DIR)

print("Loading gamma matrix...")
gamma_data = load_file(data_dir / GAMMA_FILE)
gamma = gamma_data[GAMMA_KEY]
N, d = gamma.shape
print(f"  Shape: ({N:,}, {d:,})")
print(f"  Dtype: {gamma.dtype}")
print()

print("Loading black hole mask...")
bh_data = load_file(data_dir / BLACK_HOLE_MASK_FILE)
black_hole_mask = bh_data[BLACK_HOLE_MASK_KEY]
n_bh = black_hole_mask.sum().item()
print(f"  Black hole tokens: {n_bh:,}")
print()

print("✓ Data loaded")

Loading gamma matrix...
  Shape: (151,936, 2,560)
  Dtype: torch.float32

Loading black hole mask...
  Black hole tokens: 2,100

✓ Data loaded


## Extract Black Hole Vectors

In [4]:
print("Extracting black hole vectors...")
bh_vectors = gamma[black_hole_mask]
print(f"  Shape: {bh_vectors.shape}")
print(f"  Memory: {bh_vectors.numel() * bh_vectors.element_size() / 1e6:.2f} MB")
print()
print("✓ Black hole vectors extracted")

Extracting black hole vectors...
  Shape: torch.Size([2100, 2560])
  Memory: 21.50 MB

✓ Black hole vectors extracted


## Compute Centroid

The centroid is simply the mean across all black hole vectors:

$$\mathbf{c} = \frac{1}{N_{\text{bh}}} \sum_{i \in \text{black holes}} \gamma_i$$

where $N_{\text{bh}} = 2{,}100$.
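
To make the formula concrete, here is a small worked example on toy data (the names `toy_gamma` and `toy_mask` are illustrative, not part of the notebook). It shows that summing the masked rows and dividing by their count gives the same result as `mean(dim=0)`.

```python
import torch

# Four toy vectors in 3 dimensions; rows 0 and 2 play the role of black holes
toy_gamma = torch.tensor([
    [1.0, 2.0, 3.0],
    [9.0, 9.0, 9.0],
    [3.0, 4.0, 5.0],
    [0.0, 0.0, 0.0],
])
toy_mask = torch.tensor([True, False, True, False])

selected = toy_gamma[toy_mask]   # shape (N_bh, d) = (2, 3)
n_selected = selected.shape[0]   # N_bh = 2

# Explicit form of the formula: sum over selected rows, divided by N_bh
centroid_explicit = selected.sum(dim=0) / n_selected

# Equivalent shorthand, as used in the cell below
centroid_mean = selected.mean(dim=0)

print(centroid_explicit)  # tensor([2., 3., 4.])
```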

In [5]:
print("Computing centroid...\n")

centroid = bh_vectors.mean(dim=0)

print(f"Centroid shape: {centroid.shape}")
print(f"Centroid dtype: {centroid.dtype}")
print()

# Basic statistics
print("Centroid component statistics:")
print(f"  Min:    {centroid.min().item():.6e}")
print(f"  Max:    {centroid.max().item():.6e}")
print(f"  Mean:   {centroid.mean().item():.6e}")
print(f"  Median: {centroid.median().item():.6e}")
print(f"  Std:    {centroid.std().item():.6e}")
print()

# L2 norm (distance from origin)
centroid_norm = torch.linalg.vector_norm(centroid, ord=2).item()
print(f"Centroid L2 norm (distance from origin): {centroid_norm:.6f}")
print()

print("✓ Centroid computed")

Computing centroid...

Centroid shape: torch.Size([2560])
Centroid dtype: torch.float32

Centroid component statistics:
  Min:    -2.559250e-02
  Max:    3.293261e-02
  Mean:   -3.200741e-05
  Median: -9.289977e-05
  Std:    3.282552e-03

Centroid L2 norm (distance from origin): 0.166061

✓ Centroid computed


## Verify: Mean Distance to Centroid

Sanity check: compute the distance from each black hole token to the centroid, in both the L2 and L∞ norms. Because the tokens form a tight cluster, these distances should be small compared with the centroid's L2 norm of about 0.166.

In [6]:
print("Computing distances from black holes to centroid...\n")

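# L2 distances (Euclidean norm of each row of diffs)
diffs = bh_vectors - centroid.unsqueeze(0)
distances_l2 = torch.linalg.vector_norm(diffs, ord=2, dim=1)

# L∞ distances (largest absolute component of each row of diffs)
distances_linf = diffs.abs().max(dim=1).values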

print("L2 distances to centroid:")
print(f"  Min:    {distances_l2.min().item():.6e}")
print(f"  Max:    {distances_l2.max().item():.6e}")
print(f"  Mean:   {distances_l2.mean().item():.6e}")
print(f"  Median: {distances_l2.median().item():.6e}")
print()

print("L∞ distances to centroid:")
print(f"  Min:    {distances_linf.min().item():.6e}")
print(f"  Max:    {distances_linf.max().item():.6e}")
print(f"  Mean:   {distances_linf.mean().item():.6e}")
print(f"  Median: {distances_linf.median().item():.6e}")
print()

print("✓ Verification complete")

Computing distances from black holes to centroid...

L2 distances to centroid:
  Min:    1.671312e-05
  Max:    6.527175e-05
  Mean:   1.816918e-05
  Median: 1.784609e-05

L∞ distances to centroid:
  Min:    1.496798e-05
  Max:    6.091897e-05
  Mean:   1.537513e-05
  Median: 1.496798e-05

✓ Verification complete
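
One way to read these numbers is relative to the centroid's own norm: the maximum L2 distance of about 6.5e-05 against a norm of about 0.166 gives a ratio of roughly 4e-04. The sketch below wraps that comparison in a hypothetical helper, `relative_spread`, and runs it on toy data. It assumes the centroid is not at the origin, since the ratio is undefined there.

```python
import torch


def relative_spread(vectors: torch.Tensor) -> float:
    """Max L2 distance to the centroid, divided by the centroid's L2 norm.

    Assumes the centroid has nonzero norm; otherwise the ratio is undefined.
    """
    centroid = vectors.mean(dim=0)
    max_dist = torch.linalg.vector_norm(vectors - centroid, dim=1).max()
    return (max_dist / torch.linalg.vector_norm(centroid)).item()


# Toy cluster: two points 0.1 away from (10, 0) on either side
toy_cluster = torch.tensor([[10.1, 0.0], [9.9, 0.0]])
ratio = relative_spread(toy_cluster)
print(f"{ratio:.4f}")  # 0.0100
```

A ratio far below 1 means the cluster is much smaller than its distance from the origin.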


## Save Centroid

In [7]:
output_path = data_dir / OUTPUT_FILE

print(f"Saving centroid to {output_path}...")

save_file({OUTPUT_KEY: centroid}, output_path)

print(f"  File size: {output_path.stat().st_size / 1024:.2f} KB")
print()
print(f"✓ Centroid saved to {OUTPUT_FILE}")
print()
print("To load in future notebooks:")
print(f"  data = load_file('{output_path}')")
print(f"  centroid = data['{OUTPUT_KEY}']")

Saving centroid to ../data/tensors/black_hole_centroid_qwen3_4b.safetensors...
  File size: 10.08 KB

✓ Centroid saved to black_hole_centroid_qwen3_4b.safetensors

To load in future notebooks:
  data = load_file('../data/tensors/black_hole_centroid_qwen3_4b.safetensors')
  centroid = data['centroid']


## Conclusion

**What we computed:**
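- Centroid of 2,100 black hole tokens in γ-space (2,560 dimensions, L2 norm ≈ 0.166)
- L2 distances from black holes to the centroid: mean ≈ 1.8e-05, max ≈ 6.5e-05, consistent with a tight cluster
- Saved to `black_hole_centroid_qwen3_4b.safetensors` under the key `centroid` for reuse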

**Next steps:**
- 09.1c: Find all tokens within radius r of centroid
- 09.1d: Compute pairwise distances for cluster neighborhood
- 09.1e: Build adjacency graph and analyze connectivity