# TDA Analysis

**Goal:** Compute persistent homology of the clean CIFAR-10 features and visualize their topological structure.

### Overview
In this notebook, we will:
1. Load the clean features (50-dim PCA-reduced)
2. Compute persistence diagrams for H0 (connected components) and H1 (loops)
3. Visualize persistence diagrams
4. Analyze Betti curves
5. Extract topological statistics
6. Save diagrams for later comparison with adversarial features
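
Step 4 relies on Betti curves. For a given homology dimension, the Betti curve counts how many features are alive at each filtration scale t, i.e. how many diagram points satisfy birth ≤ t < death. The minimal sketch below assumes each diagram is an (n, 2) NumPy array of (birth, death) pairs with infinite deaths stored as `np.inf` (the convention ripser uses). Whether this project's `compute_persistence` returns the same layout is an assumption, and `betti_curve` is a hypothetical helper, not the implementation behind `plot_betti_curve`.

```python
import numpy as np


def betti_curve(diagram, thresholds):
    """Number of intervals [birth, death) alive at each threshold.

    `diagram` is an (n, 2) array of (birth, death) pairs; np.inf deaths
    are treated as never dying.
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    births = diagram[:, 0][None, :]                    # shape (1, n)
    deaths = diagram[:, 1][None, :]                    # shape (1, n)
    t = np.asarray(thresholds, dtype=float)[:, None]   # shape (m, 1)
    alive = (births <= t) & (t < deaths)               # shape (m, n)
    return alive.sum(axis=1)


# Toy H0 diagram (made-up numbers): three components born at 0,
# two of them merge away at 0.5 and 1.0, one never dies.
h0 = np.array([[0.0, 0.5], [0.0, 1.0], [0.0, np.inf]])
ts = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 2.0])
print(betti_curve(h0, ts))  # [3 3 2 2 1 1]
```

Evaluating every curve on one shared grid of thresholds keeps clean and adversarial Betti curves directly comparable point by point.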

---

### Why TDA?
Traditional ML methods work with distances and gradients in feature space. TDA instead summarizes the shape of the data across scales:
- **H0 (homology dimension 0):** connected components, which indicate how clustered the data is.
- **H1 (homology dimension 1):** loops or cycles, which indicate whether the data contains circular structures.
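
To make H0 concrete: in a Vietoris-Rips filtration (assumed here; the notebook does not name its filtration), every point starts as its own component at scale 0, and components merge as the distance threshold grows. H0 persistence can therefore be computed with a union-find pass over edges sorted by length, which is Kruskal's minimum spanning tree algorithm. The sketch below is a hypothetical from-scratch illustration on toy data, not the code behind `compute_persistence`; it builds a dense distance matrix, so it is only suitable for small point sets.

```python
import numpy as np


def h0_persistence(points):
    """H0 diagram of the Vietoris-Rips filtration (Euclidean distances).

    Every point is a component born at scale 0. Edges are processed in order
    of length; an edge joining two different components kills one of them
    at that length (Kruskal's MST with union-find).
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    if n == 0:
        return np.empty((0, 2))
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Dense pairwise distances: O(n^2) memory, so only for small samples.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    rows, cols = np.triu_indices(n, k=1)
    edge_lengths = dists[rows, cols]
    order = np.argsort(edge_lengths, kind="stable")

    deaths = []
    for e in order:
        a, b = find(rows[e]), find(cols[e])
        if a != b:
            parent[a] = b
            deaths.append(edge_lengths[e])
            if len(deaths) == n - 1:  # everything is now one component
                break
    deaths.append(np.inf)  # the last surviving component never dies
    return np.column_stack([np.zeros(len(deaths)), deaths])


# Two tight pairs of points, 10 units apart.
dgm = h0_persistence([[0, 0], [0, 1], [10, 0], [10, 1]])
print(dgm)  # finite deaths 1.0, 1.0, 10.0, then one infinite bar
```

The long bar dying at 10 records the gap between the two clusters, while the short bars are merges inside each cluster; this is the kind of signal the "how clustered is the data?" question is probing.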

**Our hypothesis:** Adversarial perturbations will fragment clusters (more H0 features) and **destroy** loops (fewer H1 features).
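
One way to make this hypothesis testable is to compare simple counts per homology dimension: the number of features whose lifetime (death minus birth) exceeds a noise threshold, together with total persistence. The sketch below is hypothetical: the `summarize` helper, the threshold and the toy diagrams are illustrative assumptions, and the project's `get_persistence_stats` may compute different quantities.

```python
import numpy as np


def summarize(diagram, min_lifetime=0.0):
    """Summary statistics for one persistence diagram of (birth, death) pairs.

    Bars with infinite death are dropped so totals stay finite, and bars
    with lifetime <= min_lifetime are treated as noise.
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    finite = diagram[np.isfinite(diagram[:, 1])]
    lifetimes = finite[:, 1] - finite[:, 0]
    kept = lifetimes[lifetimes > min_lifetime]
    return {
        "n_features": int(kept.size),
        "total_persistence": float(kept.sum()),
        "max_lifetime": float(kept.max()) if kept.size else 0.0,
    }


# Toy H1 diagrams (made-up numbers): one prominent loop in the clean data,
# only short-lived noise after perturbation.
clean_h1 = np.array([[0.2, 0.9], [0.3, 0.35]])
adv_h1 = np.array([[0.3, 0.32]])
for name, dgm in [("clean", clean_h1), ("adversarial", adv_h1)]:
    print(name, summarize(dgm, min_lifetime=0.1))
```

Under the hypothesis, `n_features` should drop for H1 and rise for H0 when the adversarial diagrams are summarized with the same threshold.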

### Setup and Imports

```python
import sys
sys.path.append('../')  # add the parent (project root) directory so `src` is importable

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Project modules: feature extraction, persistence computation, plotting helpers
from src.models.feature_extractor import FeatureExtractor
from src.tda.persistence import compute_persistence, save_diagrams, get_persistence_stats
from src.utils.plotting import plot_persistence_diagram, plot_betti_curve

np.random.seed(111)  # fix NumPy's global RNG for reproducibility

print("Successfully imported everything")
```

```
Successfully imported everything
```


### Load Clean Features

We'll use the 50-dimensional PCA-reduced features from `02_feature_extraction.ipynb`. These features were extracted from 45,000 clean CIFAR-10 training images using ResNet50.

```python
# Load the PCA-reduced clean features saved by 02_feature_extraction.ipynb
data = np.load('../results/features/clean_features_reduced.npz')
clean_features = data['features']  # (n_samples, 50) feature matrix
clean_labels = data['labels']      # one CIFAR-10 class label per sample

print(f"Clean features shape: {clean_features.shape}")
print(f"Feature range: [{clean_features.min():.4f}, {clean_features.max():.4f}]")
print(f"Labels: {clean_labels.shape}")
```

```
Clean features shape: (45000, 50)
Feature range: [-2.4786, 1.6122]
Labels: (45000,)
```
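
`clean_features_reduced.npz` is a NumPy archive whose arrays are retrieved by key (`'features'`, `'labels'`). The self-contained round trip below uses small synthetic stand-in arrays (random values, not real features, and a hypothetical file name) to show how the keys come from the keyword arguments at save time, and adds a basic check that every feature row has exactly one label.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(111)
# Synthetic stand-ins: 100 samples of 50-dim features and 10 class labels.
features = rng.standard_normal((100, 50)).astype(np.float32)
labels = rng.integers(0, 10, size=100)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_features_reduced.npz")
    # The keyword names become the keys used when loading.
    np.savez_compressed(path, features=features, labels=labels)
    # Using np.load as a context manager closes the archive afterwards;
    # indexing by key reads each array fully into memory first.
    with np.load(path) as data:
        loaded_features = data["features"]
        loaded_labels = data["labels"]
        keys = sorted(data.files)

# Each feature row must have exactly one label.
assert loaded_features.shape[0] == loaded_labels.shape[0]
print(keys, loaded_features.shape, loaded_labels.shape)
```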
