# Core Density Validation

**Goal:** Validate that the density threshold ρ > 0.08 correctly identifies the dense core.

**Method:**
- Load 3D UMAP embedding
- Compute KDE density field (or load from 07.45)
- Create interactive 3D visualization: red if ρ > 0.08, blue otherwise
- Visually confirm core structure

**Inputs:**
- `data/vectors/umap_embedding_32k_3d_causal.npy` - 3D coordinates
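- Density computed via KDE (`bw_method=0.3`; SciPy treats a scalar here as a factor that scales the data covariance, not as an absolute bandwidth in UMAP units)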

**Expected runtime:** ~2 minutes (KDE computation)

## Configuration

In [23]:
# Input files
INPUT_EMBEDDING = '../data/vectors/umap_embedding_32k_3d_causal.npy'

# Density parameters
DENSITY_THRESHOLD = 0.08
KDE_BANDWIDTH = 0.3

print("Configuration:")
print(f"  Input embedding: {INPUT_EMBEDDING}")
print(f"  Density threshold: {DENSITY_THRESHOLD}")
print(f"  KDE bandwidth: {KDE_BANDWIDTH}")

Configuration:
  Input embedding: ../data/vectors/umap_embedding_32k_3d_causal.npy
  Density threshold: 0.08
  KDE bandwidth: 0.3


## Setup

In [24]:
import numpy as np
import plotly.graph_objects as go
from scipy.stats import gaussian_kde

print("✓ Imports complete")

✓ Imports complete


## Load Data

In [25]:
print(f"Loading embedding from {INPUT_EMBEDDING}...")
embedding_3d = np.load(INPUT_EMBEDDING)

print("✓ Loaded embedding")
print(f"  Shape: {embedding_3d.shape}")
print(f"  X range: [{embedding_3d[:, 0].min():.2f}, {embedding_3d[:, 0].max():.2f}]")
print(f"  Y range: [{embedding_3d[:, 1].min():.2f}, {embedding_3d[:, 1].max():.2f}]")
print(f"  Z range: [{embedding_3d[:, 2].min():.2f}, {embedding_3d[:, 2].max():.2f}]")

N = len(embedding_3d)
print(f"\n✓ Ready to analyze {N:,} tokens")

Loading embedding from ../data/vectors/umap_embedding_32k_3d_causal.npy...
✓ Loaded embedding
  Shape: (32000, 3)
  X range: [4.15, 9.74]
  Y range: [7.56, 12.25]
  Z range: [1.05, 6.24]

✓ Ready to analyze 32,000 tokens


## Compute Kernel Density Estimation

In [28]:
print("\nComputing kernel density estimation...")
print(f"  This may take 1-2 minutes for {N:,} points...\n")

# Transpose for KDE (expects [n_features, n_samples])
xyz = embedding_3d.T  # [3, 32000]

# Compute KDE
# A scalar bw_method is used directly as kde.factor (it scales the data covariance)
kde = gaussian_kde(xyz, bw_method=KDE_BANDWIDTH)
# Evaluate the density at every sample point (all-pairs, hence the runtime)
density = kde(xyz)

print("✓ Computed density field")
print(f"  Min density: {density.min():.6f}")
print(f"  Max density: {density.max():.6f}")
print(f"  Mean density: {density.mean():.6f}")
print(f"  Median density: {np.median(density):.6f}")

# Count core points
n_core = (density > DENSITY_THRESHOLD).sum()
print(f"\n  Density > {DENSITY_THRESHOLD}: {n_core:,} points ({100*n_core/N:.2f}%)")


Computing kernel density estimation...
  This may take 1-2 minutes for 32,000 points...

✓ Computed density field
  Min density: 0.000614
  Max density: 0.107859
  Mean density: 0.032178
  Median density: 0.027052

  Density > 0.08: 1,453 points (4.54%)
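
A note on `bw_method=KDE_BANDWIDTH` in the cell above: when `bw_method` is a scalar, `gaussian_kde` stores it as `kde.factor` and builds the kernel covariance as the data covariance times `factor**2`. The kernel width therefore depends on the spread of the data along each axis. The sketch below illustrates this on synthetic points (an assumption for illustration only, not the real embedding); `points` and `per_axis_sigma` are hypothetical names.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic stand-in for the embedding (assumption: NOT the real UMAP data)
rng = np.random.default_rng(0)
points = rng.normal(size=(500, 3)) * np.array([1.0, 2.0, 0.5])

# Same call pattern as the notebook: data is passed as [n_features, n_samples]
kde = gaussian_kde(points.T, bw_method=0.3)

# The scalar is stored as the scaling factor...
factor_matches = np.isclose(kde.factor, 0.3)

# ...and the kernel covariance is the sample covariance scaled by factor**2
data_cov = np.cov(points.T)
cov_matches = np.allclose(kde.covariance, data_cov * 0.3**2)

# Effective kernel standard deviation along each axis, in data units
per_axis_sigma = np.sqrt(np.diag(kde.covariance))

print(f"factor == 0.3: {factor_matches}")
print(f"covariance == data_cov * factor^2: {cov_matches}")
print(f"per-axis kernel sigma: {per_axis_sigma}")
```

Running the same `np.sqrt(np.diag(kde.covariance))` on the notebook's `kde` object would give the effective smoothing length along each UMAP axis, which is useful context when interpreting the threshold.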


## Core Validation: Red/Blue Visualization

Red = density > 0.08 (hypothesized dense core)  
Blue = density ≤ 0.08 (everything else)

In [31]:
# Separate core from non-core based on density threshold
is_core = density > DENSITY_THRESHOLD

core_points = embedding_3d[is_core]
other_points = embedding_3d[~is_core]

print("Population split:")
print(f"  Core (ρ > {DENSITY_THRESHOLD}): {len(core_points):,} points")
print(f"  Other (ρ ≤ {DENSITY_THRESHOLD}): {len(other_points):,} points")

# Create interactive 3D scatter plot
# IMPORTANT: Add core (red) FIRST, then other (blue) on top
# This fixes z-ordering so blue points render in front when they're spatially in front
fig = go.Figure()

# Add core points (red) FIRST
fig.add_trace(go.Scatter3d(
    x=core_points[:, 0],
    y=core_points[:, 1],
    z=core_points[:, 2],
    mode='markers',
    marker=dict(
        size=1,
        color='red',
        opacity=0.8
    ),
    name=f'Core ({len(core_points):,})'
))

# Add non-core points (blue) SECOND (renders on top)
fig.add_trace(go.Scatter3d(
    x=other_points[:, 0],
    y=other_points[:, 1],
    z=other_points[:, 2],
    mode='markers',
    marker=dict(
        size=1,
        color='blue',
        opacity=0.3
    ),
    name=f'Other ({len(other_points):,})'
))

fig.update_layout(
    title=f'Core Validation: Density Threshold ρ > {DENSITY_THRESHOLD}',
    scene=dict(
        xaxis_title='UMAP 1',
        yaxis_title='UMAP 2',
        zaxis_title='UMAP 3',
        camera=dict(eye=dict(x=1.5, y=1.5, z=1.5)),
        aspectmode='data'
    ),
    width=1000,
    height=800,
    showlegend=True
)

fig.show()

print("\n💡 Rotate the plot to see if red points form a coherent dense core!")
print("💡 If they do, we've validated our density threshold.")

Population split:
  Core (ρ > 0.08): 1,453 points
  Other (ρ ≤ 0.08): 30,547 points



💡 Rotate the plot to see if red points form a coherent dense core!
💡 If they do, we've validated our density threshold.
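
The visual check can be complemented with a rough numeric one. The sketch below is not part of the original analysis: it links core points that lie within a chosen `radius` of each other and reports the sizes of the resulting connected components. The function name `core_component_sizes` and the `radius` parameter are hypothetical, and a sensible `radius` would have to be tuned to the embedding's scale. The demo uses synthetic blobs, not the real data.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree


def core_component_sizes(core_points, radius):
    """Return component sizes (largest first) after linking points closer than `radius`."""
    n = len(core_points)
    tree = cKDTree(core_points)
    # All index pairs (i, j), i < j, whose Euclidean distance is at most `radius`
    pairs = tree.query_pairs(r=radius, output_type="ndarray")
    # Unweighted adjacency matrix with one entry per linked pair
    adjacency = coo_matrix(
        (np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])), shape=(n, n)
    )
    # directed=False treats each stored pair as an undirected edge
    _, labels = connected_components(adjacency, directed=False)
    return np.sort(np.bincount(labels))[::-1]


# Synthetic demo (assumption: two tight, well-separated blobs)
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.05, size=(200, 3))
blob_b = rng.normal(loc=5.0, scale=0.05, size=(50, 3))
sizes = core_component_sizes(np.vstack([blob_a, blob_b]), radius=1.0)
print(sizes)
```

In the notebook this would be called as `core_component_sizes(core_points, radius=...)`. One dominant component containing nearly all red points would be consistent with a single coherent core; several components of similar size would suggest the threshold is picking up separate dense pockets.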
