# Superposition Explorer Tutorial

Investigate polysemanticity and feature superposition in transformer representations.

## What is Superposition?

Neural networks can represent more features than they have dimensions. This is called **superposition**: features are encoded along overlapping, non-orthogonal directions in activation space.

### Key Concepts

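- **Polysemanticity**: A single neuron responds to multiple unrelated concepts
- **Superposition**: A layer represents more features than it has dimensions by assigning them non-orthogonal directions
- **Feature interference**: Because those directions are not orthogonal, activating one feature also produces a spurious projection onto other features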
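A minimal sketch of all three ideas in plain NumPy (a toy construction of ours, not part of `superposition_explorer`): three features are stored in two dimensions along directions 120° apart. Each neuron (coordinate) carries weight from several features, which is polysemanticity. Reading a feature back by projection recovers it exactly when it is active alone, but when two features are active together their overlap halves both readouts, which is interference.

```python
import numpy as np

# Three unit-norm feature directions in 2 dimensions, 120 degrees apart (rows of W)
angles = np.deg2rad([90.0, 210.0, 330.0])
W = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (3 features, 2 neurons)

def encode(a: np.ndarray) -> np.ndarray:
    """Superpose 3 feature activations into one 2-dimensional activation vector."""
    return a @ W

def decode(x: np.ndarray) -> np.ndarray:
    """Project onto each feature direction; ReLU discards negative interference."""
    return np.maximum(W @ x, 0.0)

print(np.round(W @ W.T, 3))  # off-diagonal -0.5: no pair of features is orthogonal
print(np.round(W, 3))        # each neuron (column) carries weight from several features
print(np.round(decode(encode(np.array([1.0, 0.0, 0.0]))), 3))  # one active: recovered as [1, 0, 0]
print(np.round(decode(encode(np.array([1.0, 1.0, 0.0]))), 3))  # two active: each read out at 0.5
```

The ReLU in `decode` is what makes sparsity matter: negative interference is clipped away, so superposition is cheap as long as few features are active at once.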

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from superposition_explorer import SuperpositionAnalyzer, FeatureProbe

print("Libraries loaded!")

## 1. Create Synthetic Data

We'll generate activations whose ground truth is known: 512 sparse features packed into a 256-dimensional space.
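For reference, the next cell samples from the following generative model (notation ours): each activation vector $\mathbf{x} \in \mathbb{R}^{256}$ is a sparse, non-negative combination of $N = 512$ unit-norm feature directions $\mathbf{f}_i$ drawn uniformly at random from the unit sphere.

$$
\mathbf{x} = \sum_{i=1}^{N} a_i \, \mathbf{f}_i, \qquad a_i = m_i s_i, \qquad m_i \sim \mathrm{Bernoulli}(0.1), \qquad s_i \sim \mathrm{Exp}(1)
$$

Stacking samples as rows gives $X = AF$, which is the `feature_activations @ feature_directions` product in the code. Since $N > 256$, the $\mathbf{f}_i$ cannot be mutually orthogonal.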

In [None]:
# Create synthetic activations with superposition
rng = np.random.default_rng(0)  # fixed seed so results are reproducible

n_samples = 1000
n_dims = 256
n_true_features = 512  # More features than dimensions!

# Random unit-norm feature directions. 512 directions cannot all be orthogonal
# in 256 dimensions, so they necessarily overlap (superposition).
feature_directions = rng.standard_normal((n_true_features, n_dims))
feature_directions /= np.linalg.norm(feature_directions, axis=1, keepdims=True)

# Sparse, non-negative feature activations: each feature is active in ~10% of samples
feature_activations = rng.exponential(1.0, (n_samples, n_true_features))
feature_activations *= rng.random((n_samples, n_true_features)) > 0.9  # ~90% zeros

# Each activation vector is the activation-weighted sum of its active features' directions
activations = feature_activations @ feature_directions

print(f"Created {n_samples} samples with {n_dims} dimensions")
print(f"Encoding {n_true_features} features (superposition ratio: {n_true_features/n_dims:.1f}x)")

## 2. Analyze Superposition

In [None]:
# Initialize analyzer
analyzer = SuperpositionAnalyzer(n_features=n_dims)

# Compute metrics
metrics = analyzer.compute_metrics(activations)

print("Superposition Metrics:")
print(f"  Effective dimensionality: {metrics.effective_dimensionality:.2f}")
print(f"  Sparsity: {metrics.sparsity:.4f}")
print(f"  Feature interference: {metrics.interference:.4f}")
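The exact definition behind `metrics.effective_dimensionality` isn't shown here. One widely used measure is the participation ratio of the covariance eigenvalues $\lambda_k$, $(\sum_k \lambda_k)^2 / \sum_k \lambda_k^2$, which equals $d$ when variance is spread evenly over $d$ directions and 1 when it lies along a single direction. A minimal NumPy sketch, assuming that definition (it may not match what `SuperpositionAnalyzer` computes); the helper name `participation_ratio` is ours:

```python
import numpy as np

def participation_ratio(activations: np.ndarray) -> float:
    """Participation ratio of the covariance spectrum of (n_samples, n_dims) activations."""
    cov = np.cov(activations, rowvar=False)                  # np.cov centers the data itself
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)    # drop tiny negative round-off
    return float(eigvals.sum() ** 2 / (eigvals ** 2).sum())

rng = np.random.default_rng(0)
isotropic = rng.standard_normal((10_000, 10))   # variance spread evenly over 10 dims
one_dim = np.zeros((1_000, 10))
one_dim[:, 0] = rng.standard_normal(1_000)      # all variance along a single axis
print(f"isotropic: {participation_ratio(isotropic):.2f}")       # close to 10
print(f"one-dimensional: {participation_ratio(one_dim):.2f}")   # exactly 1
```

Calling `participation_ratio(activations)` in this notebook gives a number that can be compared against the analyzer's metric.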

## 3. Visualize Feature Space

In [None]:
# PCA visualization
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
activations_2d = pca.fit_transform(activations[:500])

plt.figure(figsize=(10, 8))
plt.scatter(activations_2d[:, 0], activations_2d[:, 1], alpha=0.5, s=10)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('Activation Space (PCA)')
plt.show()

print(f"Variance explained by PC1 and PC2: {pca.explained_variance_ratio_.sum():.2%}")
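With the features spread across all 256 dimensions, two components are unlikely to capture much of the variance. A complementary check is how many components it takes to reach a target fraction of the variance. The sketch below computes PCA's explained-variance ratios directly from the singular values of the centered data (the same ratios `PCA.explained_variance_ratio_` reports when all components are kept); the helper names are ours.

```python
import numpy as np

def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Per-component PCA explained-variance ratios, from singular values of centered X."""
    centered = X - X.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return s ** 2 / np.sum(s ** 2)

def components_for_variance(X: np.ndarray, target: float = 0.9) -> int:
    """Smallest number of principal components whose cumulative ratio reaches target."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(cumulative))  # guard against round-off when target is 1.0

rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 3)) * np.array([3.0, 2.0, 1.0])  # 3 latent directions
basis = np.linalg.qr(rng.standard_normal((20, 3)))[0].T             # (3, 20), orthonormal rows
low_rank = latent @ basis                                           # rank-3 data in 20 dims
print(components_for_variance(low_rank, 0.9))                       # few components needed
print(components_for_variance(rng.standard_normal((500, 20)), 0.9)) # many: variance is spread
```

Running `components_for_variance(activations)` on the synthetic data shows how far the representation is from low-rank.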

## 4. Find Polysemantic Neurons

In [None]:
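# Placeholder concept labels (10 concepts). These are random and unrelated to the
# activations; replace them with real concept annotations for your data.
concept_labels = np.random.randint(0, 10, n_samples)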

# Find polysemantic neurons
polysemantic = analyzer.find_polysemantic_neurons(
    activations,
    concept_labels,
    threshold=0.3,
)

print(f"Found {len(polysemantic)} polysemantic neurons out of {n_dims}")
print(f"Polysemanticity rate: {len(polysemantic)/n_dims:.1%}")
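How `find_polysemantic_neurons` uses `threshold` isn't documented here. As one illustrative criterion (our assumption, not necessarily the library's), call a neuron polysemantic if at least two concepts drive its mean absolute activation to at least `threshold` times its response to its preferred concept:

```python
import numpy as np

def polysemantic_neurons(activations, labels, threshold=0.3):
    """Indices of neurons responding strongly to two or more concepts (illustrative criterion)."""
    concepts = np.unique(labels)
    # Mean |activation| of every neuron for each concept: shape (n_concepts, n_neurons)
    response = np.stack([np.abs(activations[labels == c]).mean(axis=0) for c in concepts])
    preferred = response.max(axis=0, keepdims=True)
    # Response relative to the neuron's preferred concept; silent neurons stay at 0
    relative = np.divide(response, preferred, out=np.zeros_like(response), where=preferred > 0)
    n_strong = (relative >= threshold).sum(axis=0)
    return np.flatnonzero(n_strong >= 2)

labels = np.repeat(np.arange(4), 5)                 # 4 concepts, 5 samples each
acts = np.stack([
    np.isin(labels, [0, 1]).astype(float),          # neuron 0: concepts 0 and 1
    (labels == 0).astype(float),                    # neuron 1: concept 0 only
    np.ones(len(labels)),                           # neuron 2: all concepts equally
], axis=1)
print(polysemantic_neurons(acts, labels))           # neurons 0 and 2
```

Under this criterion, random concept labels like the ones above give each neuron a nearly uniform response across concepts, so almost every neuron would be flagged; meaningful results need labels that actually relate to the activations.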

## 5. Interference Matrix

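Real models don't expose their feature directions, so a practical proxy is the correlation between neurons (activation dimensions) across samples: neurons that share superposed features can become correlated.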

In [None]:
# Correlation between neurons (activation dimensions) across samples, first 50 dims
neuron_corr = np.corrcoef(activations[:, :50], rowvar=False)

plt.figure(figsize=(10, 8))
plt.imshow(neuron_corr, cmap='RdBu', vmin=-1, vmax=1)
plt.colorbar(label='Correlation')
plt.title('Neuron Correlation Matrix (first 50 dimensions)')
plt.xlabel('Neuron Index')
plt.ylabel('Neuron Index')
plt.show()
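Because the data is synthetic, the true feature interference is also available: it is the cosine similarity between feature directions. The sketch below regenerates directions the same way as section 1 (random unit vectors; the seed and the helper name `feature_interference` are ours) and compares the mean off-diagonal |cosine| with the approximate baseline $\sqrt{2/(\pi d)}$ for random unit vectors in $d$ dimensions.

```python
import numpy as np

def feature_interference(directions: np.ndarray) -> np.ndarray:
    """Cosine similarity between unit-norm feature directions, with the diagonal zeroed."""
    cos = directions @ directions.T
    np.fill_diagonal(cos, 0.0)
    return cos

rng = np.random.default_rng(0)
n_features, d = 512, 256
directions = rng.standard_normal((n_features, d))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

cos = feature_interference(directions)
off_diag = np.abs(cos[~np.eye(n_features, dtype=bool)])  # all distinct feature pairs
print(f"mean |cos|: {off_diag.mean():.3f}  (baseline ~ {np.sqrt(2 / (np.pi * d)):.3f})")
print(f"max  |cos|: {off_diag.max():.3f}")
```

Unlike the correlation matrix in the cell above, which is computed over activation dimensions (neurons), this matrix is indexed by features, and its off-diagonal entries are exactly the overlaps that cause interference.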

## Conclusion

This tutorial demonstrated:
1. How superposition allows encoding more features than dimensions
2. Metrics to quantify superposition (effective dimensionality, sparsity, interference)
3. Visualization of feature spaces and interference
4. Detection of polysemantic neurons

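For more, see Anthropic's work on superposition and polysemanticity, such as *Toy Models of Superposition* (Elhage et al., 2022) and *Towards Monosemanticity* (Bricken et al., 2023).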