# Delta Observer: Quick-Start Demo

**Fast overview of the Delta Observer project with pre-computed results**

This notebook provides a quick demonstration of the key findings from our paper:
> *"Delta Observer: Learning Continuous Semantic Manifolds Between Neural Network Representations"*

📄 **Paper:** [OSF MetaArXiv](https://doi.org/10.17605/OSF.IO/CNJTP)  
🔗 **Code:** [github.com/EntroMorphic/delta-observer](https://github.com/EntroMorphic/delta-observer)

---

## Key Finding

**Semantic information can be linearly accessible (R²=0.9505) without geometric clustering (Silhouette=0.0320)**

This challenges the assumption that interpretability requires discrete, spatially separated feature clusters.

---

## Setup

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score, silhouette_score
from sklearn.cluster import KMeans
import sys
sys.path.append('..')

# Set plotting style
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 11

## Load Pre-computed Data

We load the Delta Observer's latent space representations that were computed by training on 4-bit binary addition.
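
To make the `carry_counts` label concrete, here is a minimal sketch of how carries can be counted in 4-bit ripple-carry addition. The helper `count_carries` is hypothetical (it is not taken from the repository), and the sketch assumes no carry-in and simply enumerates all 16 × 16 operand pairs; the actual dataset may be constructed differently.

```python
from collections import Counter


def count_carries(a: int, b: int, n_bits: int = 4) -> int:
    """Count the carry-outs produced when adding two n-bit integers bit by bit."""
    carry = 0
    total = 0
    for i in range(n_bits):
        # Sum the i-th bit of each operand plus the incoming carry.
        bit_sum = ((a >> i) & 1) + ((b >> i) & 1) + carry
        carry = 1 if bit_sum >= 2 else 0
        total += carry
    return total


# Enumerate every pair of 4-bit operands (assumption: no carry-in).
carry_histogram = Counter(
    count_carries(a, b) for a in range(16) for b in range(16)
)

for n_carries in sorted(carry_histogram):
    print(f"{n_carries} carries: {carry_histogram[n_carries]} input pairs")
```

Under these assumptions every input pair produces between 0 and 4 carries, which matches the label range used in this notebook.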

In [None]:
# Load Delta Observer latent space with smart path detection
import os

# Try multiple paths (Colab vs local)
paths = ['../data', 'data', 'delta-observer/data']
data_path = None
for p in paths:
    full_path = os.path.join(p, 'delta_latent_umap.npz')
    if os.path.exists(full_path):
        data_path = full_path
        break

if not data_path:
    raise FileNotFoundError('delta_latent_umap.npz not found. Clone the repo or run previous notebooks.')

print(f'✓ Loading data from: {data_path}')
data = np.load(data_path)

latent_space = data['latents']  # 512 × 16D
carry_counts = data['carry_counts']  # 0-4 carries
bit_positions = data['bit_positions']  # 0-3 bit positions

print(f'Latent space shape: {latent_space.shape}')
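# np.bincount requires integer input, so cast in case the labels were saved as floats
print(f'Carry count distribution: {np.bincount(carry_counts.astype(int))}')
print(f'Bit position distribution: {np.bincount(bit_positions.astype(int))}')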

## Visualization: 2D Projection

Project the 16D latent space to 2D using PCA for visualization.

In [None]:
# PCA projection
pca = PCA(n_components=2)
latent_2d = pca.fit_transform(latent_space)

print(f"Variance explained: {pca.explained_variance_ratio_.sum():.1%}")

# Create figure with two subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Plot 1: Colored by carry count
scatter1 = ax1.scatter(latent_2d[:, 0], latent_2d[:, 1], 
                       c=carry_counts, cmap='viridis', 
                       alpha=0.6, s=30)
ax1.set_title('Delta Latent Space: Carry Count', fontsize=14, fontweight='bold')
ax1.set_xlabel('PC1')
ax1.set_ylabel('PC2')
cbar1 = plt.colorbar(scatter1, ax=ax1)
cbar1.set_label('Carry Count')

# Plot 2: Colored by bit position
scatter2 = ax2.scatter(latent_2d[:, 0], latent_2d[:, 1], 
                       c=bit_positions, cmap='plasma', 
                       alpha=0.6, s=30)
ax2.set_title('Delta Latent Space: Bit Position', fontsize=14, fontweight='bold')
ax2.set_xlabel('PC1')
ax2.set_ylabel('PC2')
cbar2 = plt.colorbar(scatter2, ax=ax2)
cbar2.set_label('Bit Position')

plt.tight_layout()
plt.show()

## Key Metric 1: Linear Accessibility

**Question:** Can we predict semantic properties (carry count) from the latent space using a simple linear model?

**Method:** Train a Ridge regression probe on 80% of the latent representations to predict carry count, then report R² on the held-out 20%.

In [None]:
# Split data
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    latent_space, carry_counts, test_size=0.2, random_state=42
)

# Train linear probe
probe = Ridge(alpha=1.0)
probe.fit(X_train, y_train)

# Evaluate
y_pred = probe.predict(X_test)
r2 = r2_score(y_test, y_pred)

print(f"\n{'='*50}")
print(f"LINEAR ACCESSIBILITY (R² Score)")
print(f"{'='*50}")
print(f"R² = {r2:.4f}")
print(f"\nInterpretation: {r2:.1%} of held-out carry count variance is explained")
print(f"by a LINEAR model of the latent space.")
print(f"{'='*50}\n")

# Visualization
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, alpha=0.5)
plt.plot([0, 4], [0, 4], 'r--', label='Perfect prediction')
plt.xlabel('True Carry Count')
plt.ylabel('Predicted Carry Count')
plt.title(f'Linear Probe Performance (R² = {r2:.4f})', fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
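
A single 80/20 split leaves only about a hundred test points, so the R² estimate can vary with the random seed. The sketch below shows one way to check stability with 5-fold cross-validation. It is not part of the paper's protocol, and `X_demo` / `y_demo` are synthetic stand-ins so the snippet runs on its own; in this notebook you would pass `latent_space` and `carry_counts` instead.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption, NOT the Delta Observer latents).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(512, 16))
y_demo = X_demo @ rng.normal(size=16) + rng.normal(scale=0.1, size=512)

# Shuffle before splitting so each fold sees a mix of targets.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(Ridge(alpha=1.0), X_demo, y_demo, cv=cv, scoring="r2")

print(f"R² per fold: {np.round(cv_scores, 4)}")
print(f"Mean ± std: {cv_scores.mean():.4f} ± {cv_scores.std():.4f}")
```

A small spread across folds would suggest the single-split R² is not an artifact of one lucky partition.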

## Key Metric 2: Geometric Clustering

**Question:** Are points with similar carry counts clustered together in space?

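**Method:** Compute the Silhouette score, using carry count as the cluster label. The score ranges from -1 to 1 and measures how much closer each point is to its own group than to the nearest other group.

- **Score near 1:** Strong, well-separated clusters
- **Score near 0:** Overlapping groups, consistent with a continuous distribution
- **Score below 0:** Points are, on average, closer to another group than to their own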
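
To calibrate intuition for these ranges, the following sketch compares two synthetic cases with scikit-learn's `silhouette_score`: two tight, distant blobs, and two label groups drawn from the same distribution. The data are made up for illustration and are unrelated to the Delta Observer latents.

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)  # 100 points per group

# Case 1: two tight blobs that are far apart.
separated = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(100, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(100, 2)),
])

# Case 2: both groups drawn from the same distribution, so labels carry no geometry.
overlapping = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

score_separated = silhouette_score(separated, labels)
score_overlapping = silhouette_score(overlapping, labels)

print(f"Well-separated blobs: {score_separated:.3f}")
print(f"Overlapping groups:   {score_overlapping:.3f}")
```

The first score should land close to 1 and the second close to 0, which is the regime the carry count labels fall into below.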

In [None]:
# Compute silhouette score
silhouette = silhouette_score(latent_space, carry_counts)

print(f"\n{'='*50}")
print(f"GEOMETRIC CLUSTERING (Silhouette Score)")
print(f"{'='*50}")
print(f"Silhouette = {silhouette:.4f}")
print(f"\nInterpretation: Score near 0 indicates MINIMAL clustering.")
print(f"Carry count groups are NOT spatially separated.")
print(f"{'='*50}\n")

## The Paradox

**High Linear Accessibility + Low Geometric Clustering = Continuous Semantic Gradients**

This combination reveals that:
1. Semantic information (carry count) is **linearly accessible** (R² ≈ 0.95)
2. But **NOT organized into discrete clusters** (Silhouette ≈ 0.03)

**Implication:** In this latent space, carry count varies along a **continuous gradient** rather than forming discrete, spatially separated clusters.

This challenges the prevailing assumption in mechanistic interpretability that features must be geometrically clustered to be interpretable.
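
The combination of a high R² and a near-zero Silhouette score can be reproduced on purely synthetic data, which may help show that it is not contradictory. In the sketch below (a toy construction, not the Delta Observer model), an ordinal label is obtained by binning a single linear direction of 16-dimensional Gaussian points. All names (`X_toy`, `direction`, `y_toy`) are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score, silhouette_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic 16-D points (NOT the Delta Observer latents).
X_toy = rng.normal(size=(512, 16))

# One hidden unit-length direction carries the "semantic" signal.
direction = rng.normal(size=16)
direction /= np.linalg.norm(direction)
signal = X_toy @ direction

# Bin the continuous signal into five ordinal labels (0-4), loosely mimicking carry counts.
bin_edges = np.quantile(signal, [0.2, 0.4, 0.6, 0.8])
y_toy = np.digitize(signal, bin_edges)

# Linear probe on a held-out split, mirroring the procedure used above.
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)
toy_probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
toy_r2 = r2_score(y_te, toy_probe.predict(X_te))

# Silhouette over the full 16-D space, using the binned labels as clusters.
toy_silhouette = silhouette_score(X_toy, y_toy)

print(f"Toy R²:         {toy_r2:.4f}")
print(f"Toy Silhouette: {toy_silhouette:.4f}")
```

Because the label depends on only one of sixteen directions, pairwise distances are dominated by the other fifteen, so the groups overlap heavily even though a linear readout recovers the label well.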

In [None]:
# Summary visualization
fig, ax = plt.subplots(figsize=(10, 6))

metrics = ['Linear\nAccessibility\n(R²)', 'Geometric\nClustering\n(Silhouette)']
values = [r2, silhouette]
colors = ['#2ecc71' if v > 0.5 else '#e74c3c' for v in values]

bars = ax.barh(metrics, values, color=colors, alpha=0.7)
x_min = min(0.0, min(values))  # the silhouette score can be negative
ax.set_xlim(x_min, 1.1)  # extra room on the right for value labels
ax.set_xlabel('Score', fontsize=12)
ax.set_title('The Accessibility-Clustering Paradox', fontsize=14, fontweight='bold')
ax.axvline(0.5, color='gray', linestyle='--', alpha=0.5, label='0.5 reference line')

# Add value labels just past the end of each bar
for i, val in enumerate(values):
    ax.text(max(val, 0.0) + 0.02, i, f'{val:.4f}', va='center', fontweight='bold')

ax.legend()
plt.tight_layout()
plt.show()

print("\n" + "="*60)
print("CONCLUSION: Semantic information is LINEARLY ACCESSIBLE")
print("without requiring GEOMETRIC CLUSTERING.")
print("="*60)

## Next Steps

Explore the detailed notebooks:

1. **`01_training_models.ipynb`** - Train monolithic and compositional models from scratch
2. **`02_delta_observer_training.ipynb`** - Train the Delta Observer architecture
3. **`03_analysis_visualization.ipynb`** - Deep dive into geometric analysis
4. **`99_full_reproduction.ipynb`** - Complete end-to-end reproduction

Or read the paper: [OSF MetaArXiv](https://doi.org/10.17605/OSF.IO/CNJTP)