# 04.6a2 Quasar Angles: Are They Clustered or Scattered?

In 04.6a, we found the top 3 quasars:
1. `<|endoftext|>` - 85.3 logometers
2. `\n` (newline) - 82.1 logometers
3. `\u200b\u200b` (double zero-width space) - 81.2 logometers

**Question:** Are these three quasars pointing in similar directions (angularly clustered), or are they scattered across different regions of the 2560D hypersphere?

**Method:**
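- Compute the three pairwise angles between the quasars' unembedding vectors (rows of `lm_head.weight`)
- Measure each angle under both the Euclidean inner product and the causal metric `M` (inner product `v1ᵀ M v2`)
- Compare to typical token-pair angles from 04.5c (~85° Euclidean, ~81° causal)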
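
Written out, both angles come from the same formula with a different inner product. The expression below mirrors what the code cells in this notebook compute; treating $M$ as symmetric positive semi-definite is an assumption here, since the notebook only loads it from disk.

$$
\cos\theta_{\text{Euc}} = \frac{v_1^\top v_2}{\lVert v_1\rVert\,\lVert v_2\rVert},
\qquad
\cos\theta_{\text{causal}} = \frac{v_1^\top M\, v_2}{\sqrt{v_1^\top M\, v_1}\,\sqrt{v_2^\top M\, v_2}}
$$

Setting $M = I$ recovers the Euclidean case.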

**Interpretation:**
- Small angles (< 30°) → quasars are clustered in the same region ("structural token constellation")
- Large angles (~90°) → quasars are scattered, nearly orthogonal
- Very large angles (> 120°) → quasars are in opposite hemihyperspheres
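
For a sense of scale, independent random directions in 2560 dimensions are almost exactly orthogonal: the cosine between them has standard deviation 1/√2560 ≈ 0.02, which is an angular spread of roughly 1.1°. The sketch below checks this empirically. The Gaussian sampling, the helper names, and the pair count are illustrative assumptions and are not part of this notebook's pipeline.

```python
import math
import random

def angle_deg(u, v):
    """Euclidean angle between two vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against floating-point overshoot before acos
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))

def random_pair_angles(dim, n_pairs, seed=0):
    """Angles between pairs of independent Gaussian vectors (isotropic random directions)."""
    rng = random.Random(seed)
    angles = []
    for _ in range(n_pairs):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        angles.append(angle_deg(u, v))
    return angles

angles = random_pair_angles(dim=2560, n_pairs=50)
mean_angle = sum(angles) / len(angles)
std_angle = math.sqrt(sum((a - mean_angle) ** 2 for a in angles) / len(angles))
print(f"Random-direction baseline: {mean_angle:.2f}° ± {std_angle:.2f}°")
```

Against this baseline, a pair that sits ten or more degrees away from 90° is far outside what random directions would produce, even though it still counts as "large" on the scale above.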

## Configuration

In [1]:
import torch
import numpy as np
from pathlib import Path
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model and paths
MODEL_NAME = "Qwen/Qwen3-4B-Instruct-2507"
METRIC_PATH = Path("../data/vectors/causal_metric_tensor_qwen3_4b.pt")

# Device
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(f"Using device: {device}")

Using device: mps


## Load Model, Tokenizer, and Metric Tensor

In [2]:
# Load metric tensor
print("Loading causal metric tensor...")
metric_data = torch.load(METRIC_PATH, map_location=device, weights_only=True)
M = metric_data['M']
print(f"M shape: {M.shape}")
print()

# Load model for gamma matrix
print(f"Loading {MODEL_NAME}...")
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    dtype=torch.bfloat16,  # `torch_dtype` is deprecated in recent transformers releases
    device_map=device,
)
model.eval()

# Extract gamma (unembedding matrix)
gamma = model.lm_head.weight.data.to(dtype=torch.float32, device=device)
vocab_size, hidden_dim = gamma.shape
print(f"Gamma shape: {gamma.shape}")
print()

# Load tokenizer for decoding
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
print("Tokenizer loaded.")

Loading causal metric tensor...
M shape: torch.Size([2560, 2560])

Loading Qwen/Qwen3-4B-Instruct-2507...
Gamma shape: torch.Size([151936, 2560])

Tokenizer loaded.


## Extract Quasar Vectors

In [3]:
# Token IDs from 04.6a results
quasar_ids = [
    151643,  # <|endoftext|>
    198,     # \n
    72363,   # \u200b\u200b (double zero-width space)
]

quasar_names = [
    '<|endoftext|>',
    '\\n (newline)',
    '\\u200b\\u200b (zero-width space)',
]

# Extract vectors
quasar_vectors = gamma[quasar_ids]  # [3, hidden_dim]

print("Quasar vectors extracted:")
for i, (qid, name) in enumerate(zip(quasar_ids, quasar_names)):
    decoded = tokenizer.decode([qid])
    print(f"  {i+1}. Token ID {qid:6d} | {name:30s} | Decoded: {repr(decoded)}")
print()

Quasar vectors extracted:
  1. Token ID 151643 | <|endoftext|>                  | Decoded: '<|endoftext|>'
  2. Token ID    198 | \n (newline)                   | Decoded: '\n'
  3. Token ID  72363 | \u200b\u200b (zero-width space) | Decoded: ' \u200b\u200b'



## Compute Euclidean Angles

In [4]:
print("Computing Euclidean angles between quasars...")
print("="*70)
print()

# Pairwise angles
pairs = [(0, 1), (0, 2), (1, 2)]
pair_labels = [
    ('endoftext', 'newline'),
    ('endoftext', 'zero-width'),
    ('newline', 'zero-width'),
]

euclidean_angles = []

for (i, j), (name1, name2) in zip(pairs, pair_labels):
    v1 = quasar_vectors[i]
    v2 = quasar_vectors[j]
    
    # Euclidean angle
    dot_euc = (v1 * v2).sum()
    norm1_euc = torch.norm(v1)
    norm2_euc = torch.norm(v2)
    cos_theta_euc = dot_euc / (norm1_euc * norm2_euc)
    cos_theta_euc = torch.clamp(cos_theta_euc, -1.0, 1.0)
    theta_euc = torch.acos(cos_theta_euc)
    theta_euc_deg = torch.rad2deg(theta_euc).item()
    
    euclidean_angles.append(theta_euc_deg)
    
    print(f"{name1:15s} ↔ {name2:15s} : {theta_euc_deg:6.2f}°")

print()
print(f"Mean Euclidean angle: {np.mean(euclidean_angles):.2f}°")
print(f"For reference: typical token pairs are ~85° apart (from 04.5c)")
print()

Computing Euclidean angles between quasars...
======================================================================

endoftext       ↔ newline         :  69.72°
endoftext       ↔ zero-width      :  93.15°
newline         ↔ zero-width      :  96.31°

Mean Euclidean angle: 86.39°
For reference: typical token pairs are ~85° apart (from 04.5c)



## Compute Causal Angles

In [5]:
print("Computing causal angles between quasars...")
print("="*70)
print()

causal_angles = []

for (i, j), (name1, name2) in zip(pairs, pair_labels):
    v1 = quasar_vectors[i]
    v2 = quasar_vectors[j]
    
    # Causal inner product: v1^T M v2
    v1_M = v1 @ M
    dot_caus = (v1_M * v2).sum()
    
    # Causal norms
    norm1_caus_sq = (v1 @ M * v1).sum()
    norm1_caus = torch.sqrt(torch.clamp(norm1_caus_sq, min=0))
    
    norm2_caus_sq = (v2 @ M * v2).sum()
    norm2_caus = torch.sqrt(torch.clamp(norm2_caus_sq, min=0))
    
    # Causal angle
    cos_theta_caus = dot_caus / (norm1_caus * norm2_caus)
    cos_theta_caus = torch.clamp(cos_theta_caus, -1.0, 1.0)
    theta_caus = torch.acos(cos_theta_caus)
    theta_caus_deg = torch.rad2deg(theta_caus).item()
    
    causal_angles.append(theta_caus_deg)
    
    print(f"{name1:15s} ↔ {name2:15s} : {theta_caus_deg:6.2f}°")

print()
print(f"Mean causal angle: {np.mean(causal_angles):.2f}°")
print(f"For reference: typical token pairs are ~81° apart (from 04.5c)")
print()

Computing causal angles between quasars...
======================================================================

endoftext       ↔ newline         :  79.62°
endoftext       ↔ zero-width      :  89.28°
newline         ↔ zero-width      :  88.41°

Mean causal angle: 85.77°
For reference: typical token pairs are ~81° apart (from 04.5c)
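
The causal angle replaces the ordinary dot product with `v1ᵀ M v2`, so the same pair of vectors can subtend a different angle once the metric stretches some directions. A toy 2D sketch makes this concrete. The diagonal metric and the two vectors below are made up for illustration and have nothing to do with the real 2560×2560 `M`; the helper names are hypothetical.

```python
import math

def inner(u, v, M=None):
    """Return u^T M v; with M=None this is the ordinary dot product."""
    if M is None:
        return sum(a * b for a, b in zip(u, v))
    Mv = [sum(m * b for m, b in zip(row, v)) for row in M]
    return sum(a * b for a, b in zip(u, Mv))

def angle_deg(u, v, M=None):
    """Angle between u and v under the inner product defined by M, in degrees."""
    cos_theta = inner(u, v, M) / math.sqrt(inner(u, u, M) * inner(v, v, M))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding overshoot
    return math.degrees(math.acos(cos_theta))

u = [1.0, 0.0]
v = [1.0, 1.0]
M_toy = [[1.0, 0.0],
         [0.0, 4.0]]  # toy metric: weights the second axis 4x in squared length

theta_euc = angle_deg(u, v)           # 45 degrees
theta_toy = angle_deg(u, v, M_toy)    # arccos(1/sqrt(5)), about 63.43 degrees
print(f"Euclidean: {theta_euc:.2f}°, toy metric: {theta_toy:.2f}°")
```

Here the metric widens the angle because it inflates the component that `v` has and `u` lacks. The same mechanism can widen or narrow angles between real token vectors, depending on how each pair's shared and unshared components are weighted by `M`.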



## Compare Euclidean vs Causal Angles

In [6]:
print("Comparison: Euclidean vs Causal Angles")
print("="*70)
print()
print(f"{'Pair':<35s} {'Euclidean':>10s} {'Causal':>10s} {'Δθ':>10s}")
print("-"*70)

for (name1, name2), θ_euc, θ_caus in zip(pair_labels, euclidean_angles, causal_angles):
    delta_theta = θ_caus - θ_euc
    pair_label = f"{name1} ↔ {name2}"
    print(f"{pair_label:<35s} {θ_euc:>9.2f}° {θ_caus:>9.2f}° {delta_theta:>+9.2f}°")

print()
print("Interpretation:")
mean_euc = np.mean(euclidean_angles)
mean_caus = np.mean(causal_angles)

if mean_euc < 30:
    print("  → Quasars are TIGHTLY CLUSTERED (same region of hypersphere)")
elif mean_euc < 60:
    print("  → Quasars are MODERATELY CLUSTERED (nearby but distinct)")
elif mean_euc < 100:
    print("  → Quasars are WIDELY SEPARATED (nearly orthogonal)")
else:
    print("  → Quasars are in OPPOSITE HEMISPHERES (obtuse angles)")

print()
print(f"Average angular distortion: {np.mean(np.array(causal_angles) - np.array(euclidean_angles)):.2f}°")
print(f"(Typical distortion from 04.5c: -4.32°)")

Comparison: Euclidean vs Causal Angles
======================================================================

Pair                                 Euclidean     Causal         Δθ
----------------------------------------------------------------------
endoftext ↔ newline                     69.72°     79.62°     +9.90°
endoftext ↔ zero-width                  93.15°     89.28°     -3.87°
newline ↔ zero-width                    96.31°     88.41°     -7.90°

Interpretation:
  → Quasars are WIDELY SEPARATED (nearly orthogonal)

Average angular distortion: -0.62°
(Typical distortion from 04.5c: -4.32°)
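
The average distortion of -0.62° is a signed mean, so the +9.90° widening of one pair largely cancels the narrowing of the other two. A quick check on the values reported in the table above shows how much the metric moves each pair when signs are ignored. This is plain arithmetic on the printed numbers; the variable names are illustrative.

```python
# Angles reported in the comparison table above (degrees)
euclidean = [69.72, 93.15, 96.31]
causal = [79.62, 89.28, 88.41]

deltas = [c - e for c, e in zip(causal, euclidean)]
mean_signed = sum(deltas) / len(deltas)
mean_abs = sum(abs(d) for d in deltas) / len(deltas)

print(f"Mean signed distortion:   {mean_signed:+.2f}°")  # -0.62°
print(f"Mean absolute distortion: {mean_abs:.2f}°")      # 7.22°
```

With only three pairs, the absolute figure is the more informative one: each angle shifts by about 7° on average even though the signed mean is close to zero.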


## Summary

This notebook measured the angular relationships between the three brightest quasars in token space.

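**Key questions answered:**
- Are structural tokens (endoftext, newline, zero-width space) angularly clustered? No. The mean pairwise angle is 86.39° (Euclidean) and 85.77° (causal), close to the typical token-pair angles from 04.5c (~85° and ~81°).
- Or are they scattered across different regions of the 2560D hypersphere? Yes. Both pairs involving the double zero-width space sit within 7° of orthogonal in either metric, and even the closest pair, endoftext ↔ newline, is 69.72° apart (Euclidean).
- How does the causal metric warp their angular relationships? It moves all three pairs toward 90°: endoftext ↔ newline widens by 9.90°, while the two zero-width pairs narrow by 3.87° and 7.90°. The signed changes nearly cancel, so the average distortion is only -0.62°, compared with the typical -4.32° from 04.5c.

**Implications:**
- The quasars do not form a tight structural-token "constellation" that could serve as a single reference frame.
- Each quasar points to a different region, so we have multiple nearly independent reference directions.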