# Universal Subspace Hypothesis - Quick Example

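This notebook shows how to use the framework to test the universal subspace hypothesis: that networks trained on different tasks end up with weights concentrated in a shared low-dimensional subspace of parameter space.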

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from src.datasets import DatasetManager, ALL_DATASETS
from src.trainer import Trainer
from src.geometry_analysis import GeometricAnalyzer
from src.visualization import Visualizer

print(f"Available datasets: {ALL_DATASETS}")

## Step 1: Train models on diverse datasets

We'll train small neural networks (2 hidden layers with 16 neurons each) on multiple datasets.
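
For a sense of scale, the parameter count of such a network can be worked out by hand. The sketch below assumes a plain fully connected network with a bias on every layer; `count_mlp_params` and the input/output sizes are illustrative and not part of the framework.

```python
def count_mlp_params(input_dim, hidden_dims, output_dim):
    """Count weights and biases in a fully connected network (assumed layout)."""
    sizes = [input_dim, *hidden_dims, output_dim]
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus bias vector
    return total

# Hypothetical example: 2 input features, 1 output unit
n_params = count_mlp_params(2, [16, 16], 1)
print(n_params)  # 337
```

Datasets with more input features or output classes give larger counts, so it is worth checking how the framework aligns models of different sizes into a single `weight_matrix`.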

In [None]:
# Select a subset of datasets for quick demo
datasets_to_use = [
    'binary_moons',
    'binary_circles',
    'breast_cancer',
    'wine',
    'digits',
    'regression_synthetic',
    'time_series_sine'
]

# Create trainer with small architecture
trainer = Trainer(
    hidden_dims=[16, 16],
    learning_rate=0.001,
    epochs=50,  # Fewer epochs for quick demo
    patience=10
)

# Train on all datasets and collect weights
weight_matrix, metadata_list = trainer.train_on_all_datasets(
    dataset_names=datasets_to_use,
    save_dir='results_demo'
)

print(f"\nWeight matrix shape: {weight_matrix.shape}")
print(f"Each model has {weight_matrix.shape[1]} parameters")

## Step 2: Geometric Analysis

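Analyze the geometry of the weight space with several complementary methods. The results used later in this notebook cover PCA, MLE intrinsic dimension, box-counting and correlation dimension, and clustering.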
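
One of the quantities reported later is an MLE intrinsic dimension. A common estimator of that kind is the Levina-Bickel nearest-neighbour MLE; the sketch below assumes that is roughly what the analyzer computes, which this notebook does not confirm. `mle_intrinsic_dimension` and the synthetic data are illustrative only.

```python
import numpy as np

def mle_intrinsic_dimension(points, k=10):
    """Levina-Bickel style estimate, averaged over all points (illustrative)."""
    # Pairwise Euclidean distances, shape (n, n)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Sort each row; column 0 is the zero distance from a point to itself
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    # Per-point estimate: inverse mean of log(T_k / T_j) for j = 1..k-1
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])
    per_point = (k - 1) / log_ratios.sum(axis=1)
    return float(per_point.mean())

rng = np.random.default_rng(0)
# Synthetic stand-in: 500 points on a 2-D plane embedded in 10 dimensions
plane = rng.uniform(size=(500, 2)) @ rng.normal(size=(2, 10))
estimate = mle_intrinsic_dimension(plane, k=10)
print(f"Estimated intrinsic dimension: {estimate:.2f}")
```

With only seven models, `k` can be at most six, and estimates from so few points are very noisy.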

In [None]:
# Create analyzer
analyzer = GeometricAnalyzer(weight_matrix, metadata_list)

# Run full analysis
results = analyzer.full_analysis()

## Step 3: Visualize Results

In [None]:
# Create visualizations
visualizer = Visualizer(results, metadata_list, save_dir='results_demo/figures')
visualizer.create_all_plots()

print("\nAll visualizations saved to results_demo/figures/")

## Step 4: Examine Key Results

In [None]:
# Display key metrics
print("Key Findings:")
print("=" * 60)
print(f"Total parameters per model: {weight_matrix.shape[1]}")
print("\nDimensionality Analysis:")
print(f"  PCA effective dimension (95% var): {results['pca']['effective_dim_95']}")
print(f"  Intrinsic dimension (MLE): {results['intrinsic_dim']['intrinsic_dimension_mean']:.2f}")
print(f"  Fractal dimension (box-count): {results['fractal_boxcount']['fractal_dimension']:.2f}")
print(f"  Correlation dimension: {results['fractal_correlation']['correlation_dimension']:.2f}")
print("\nClustering:")
print(f"  Number of clusters found: {results['clustering']['n_clusters']}")

# Interpretation
pca_dim = results['pca']['effective_dim_95']
intrinsic = results['intrinsic_dim']['intrinsic_dimension_mean']
fractal = results['fractal_correlation']['correlation_dimension']

print("\nInterpretation:")
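# Caveat: PCA on N models yields at most N components, so if each of the
# 7 demo datasets contributes one model, this threshold is always met.
if pca_dim < 10:
    print("  ✓ Very low dimensionality - STRONG support for universal subspace")
else:
    print("  ~ Moderate dimensionality - PARTIAL support")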

if abs(intrinsic - fractal) < 1.0:
    print("  ✓ Smooth manifold (dimensions agree)")
else:
    print("  ⚠ Possible fractal structure (dimensions differ)")

## Visualize UMAP Embeddings

See how models cluster in low-dimensional space.

In [None]:
# Display the saved UMAP 2D and 3D embeddings inline
from IPython.display import Image, display

print("UMAP 2D Embedding:")
display(Image('results_demo/figures/umap_2d.png'))

print("\nUMAP 3D Embedding:")
display(Image('results_demo/figures/umap_3d.png'))

## Investigate PCA

How much variance is captured by the first few principal components?
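
The explained-variance ratios can be reproduced with a plain SVD. This is a minimal sketch on synthetic data, assuming `effective_dim_95` means the smallest number of components whose cumulative variance reaches 95%; `explained_variance_ratio` is an illustrative helper, not the framework's code.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance along each principal component."""
    Xc = X - X.mean(axis=0)                      # centre each parameter column
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

rng = np.random.default_rng(0)
# Synthetic stand-in: 7 "models" with 300 parameters, built from 2 latent directions
latent = rng.normal(size=(7, 2))
basis = rng.normal(size=(2, 300))
X = latent @ basis + 0.01 * rng.normal(size=(7, 300))

ratio = explained_variance_ratio(X)
cumulative = np.cumsum(ratio)
# First index where the cumulative ratio reaches 0.95, converted to a count
effective_dim_95 = int(np.searchsorted(cumulative, 0.95) + 1)
print(f"Components needed for 95% variance: {effective_dim_95}")
```

Because centering removes one degree of freedom, seven models can never produce more than six components with non-zero variance, which bounds any effective dimension computed from this demo.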

In [None]:
display(Image('results_demo/figures/pca_variance.png'))

# Print first 10 components
variance = results['pca']['explained_variance_ratio']
cumsum = results['pca']['cumulative_variance']

print("\nVariance explained by each component:")
for i in range(min(10, len(variance))):
    print(f"  PC{i+1}: {variance[i]:.1%} (cumulative: {cumsum[i]:.1%})")

## Fractal Analysis

Check for fractal-like structure in the weight manifold.
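
To make the correlation dimension concrete, here is a minimal Grassberger-Procaccia style sketch: count the fraction of point pairs closer than a radius `r`, then fit the slope of log C(r) against log r. The radius range and the helper name `correlation_dimension` are assumptions for illustration; the framework's estimator may differ.

```python
import numpy as np

def correlation_dimension(points, n_radii=20):
    """Slope of log C(r) vs log r, where C(r) is the fraction of pairs within r."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    pair_d = dists[np.triu_indices(len(points), k=1)]  # each pair once
    # Assumed scaling range: 1st to 25th percentile of pairwise distances
    r_min, r_max = np.percentile(pair_d, [1, 25])
    radii = np.logspace(np.log10(r_min), np.log10(r_max), n_radii)
    corr_sum = np.array([(pair_d < r).mean() for r in radii])
    slope, _intercept = np.polyfit(np.log(radii), np.log(corr_sum), 1)
    return float(slope)

rng = np.random.default_rng(0)
# Synthetic stand-in: 400 points on a straight line embedded in 5 dimensions
t = rng.uniform(size=(400, 1))
line = t @ rng.normal(size=(1, 5))
dim_estimate = correlation_dimension(line)
print(f"Correlation dimension of a line: {dim_estimate:.2f}")
```

Estimates like this need many points to be stable, so values computed from a handful of models should be read as rough indications only.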

In [None]:
display(Image('results_demo/figures/fractal_dimension.png'))

print(f"\nBox-counting fractal dimension: {results['fractal_boxcount']['fractal_dimension']:.2f}")
print(f"R² of fit: {results['fractal_boxcount']['r_squared']:.3f}")

print(f"\nCorrelation dimension: {results['fractal_correlation']['correlation_dimension']:.2f}")
print(f"R² of fit: {results['fractal_correlation']['r_squared']:.3f}")

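# Measure distance to the nearest integer; `% 1` alone would flag e.g. 2.95 as non-integer
boxcount_dim = results['fractal_boxcount']['fractal_dimension']
if abs(boxcount_dim - round(boxcount_dim)) > 0.1:
    print("\n⚠ Non-integer dimension suggests fractal-like structure!")
else:
    print("\n✓ Close to integer dimension - smooth manifold likely")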

## Next Steps

- Try different architectures: `trainer = Trainer(hidden_dims=[20, 20])`
- Add more datasets to increase statistical power
- Investigate clustering by task type
- Compare with larger networks