# Takens' Embedding Framework - Demo with Imputed Datasets

This notebook demonstrates how to use the reusable Takens' Embedding Framework to analyze:
- Original datasets
- Imputed datasets (with 10-30% missing values)
- Multiple imputation methods (KNN, Interpolation, LOCF, GAN)
- Multiple missing patterns (MCAR, MAR, MNAR)
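The three missingness mechanisms differ only in what drives a value to go missing. The masking actually used by `imputation.py` isn't shown in this notebook. As a rough illustration, here is a sketch with a hypothetical `make_missing` helper that hides a fraction of a 1-D series under each mechanism:

```python
import numpy as np

def make_missing(x, rate, pattern, driver=None, seed=0):
    """Return a copy of the 1-D series x with about `rate` of its entries set to NaN.

    Hypothetical helper for illustration only:
    - 'MCAR': missingness is independent of all data.
    - 'MAR':  missingness depends on an observed covariate `driver`.
    - 'MNAR': missingness depends on the (hidden) value itself.
    """
    x = np.asarray(x, dtype=float).copy()
    if x.ndim != 1:
        raise ValueError("expects a 1-D series")
    rng = np.random.default_rng(seed)
    n_missing = int(round(rate * x.size))
    # Tiny random jitter only breaks exact ties between equal scores
    jitter = 1e-6 * rng.random(x.size)
    if pattern == "MCAR":
        scores = rng.random(x.size)  # independent of any data
    elif pattern == "MAR":
        if driver is None:
            raise ValueError("MAR needs an observed driver variable")
        scores = np.asarray(driver, dtype=float) + jitter  # depends on observed data
    elif pattern == "MNAR":
        scores = x + jitter  # depends on the values that will be hidden
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    # Hide the n_missing highest-scoring entries
    hidden = np.argsort(scores)[x.size - n_missing:]
    x[hidden] = np.nan
    return x
```

Real masking schemes usually draw missingness with probabilities that depend on these scores; taking the top scores keeps the sketch deterministic and easy to inspect.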

## Framework Features:
- **Dataset-agnostic**: Works with any time series dataset
- **Easy integration**: Simple API for loading and comparing datasets
- **Imputation support**: Built-in support for imputed datasets
- **Comparison tools**: Compare original vs imputed point clouds
- **Visualization**: Automatic plotting and analysis


In [None]:
# Import the framework
import sys
import os
# Make the parent directory importable (assumes takens_framework.py lives one level up)
sys.path.append(os.path.dirname(os.path.abspath('.')))

from takens_framework import TakensEmbeddingFramework
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pickle
import glob

print("✓ Framework imported successfully")


## Step 1: Initialize the Framework

Configure the embedding parameters:
- **dimension**: Embedding dimension (default: 3 for 3D visualization)
- **time_delay**: Time delay parameter τ (default: 1)
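- **normalize**: Whether to normalize data (recommended: True)
- **stride**: Step between successive embedded points (1 = use every point)
- **random_state**: Seed for reproducible sampling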
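To make the parameters concrete, the sketch below builds a delay embedding with plain NumPy. Row `i` is `[x[t], x[t + τ], ..., x[t + (m - 1)τ]]` with `t = i * stride`. It is not the framework's code: the function name `delay_embed` and the z-score normalization are assumptions for illustration.

```python
import numpy as np

def delay_embed(x, dimension=3, time_delay=1, stride=1, normalize=True):
    """Takens delay embedding of a 1-D series (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    if normalize:
        std = x.std()
        # z-score normalization (assumed; the framework may normalize differently)
        x = (x - x.mean()) / (std if std > 0 else 1.0)
    span = (dimension - 1) * time_delay  # distance from first to last coordinate
    n_vectors = len(x) - span
    if n_vectors <= 0:
        raise ValueError("series too short for this dimension/time_delay")
    starts = np.arange(0, n_vectors, stride)
    # Column j holds the series shifted by j * time_delay
    return np.stack([x[starts + j * time_delay] for j in range(dimension)], axis=1)
```

For example, `delay_embed(np.sin(np.linspace(0, 20, 500)))` traces a closed loop in 3D, which is the kind of structure the later steps compare between original and imputed data.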


In [None]:
# Initialize framework with custom parameters
framework = TakensEmbeddingFramework(
    dimension=3,        # 3D embedding for visualization
    time_delay=1,       # Time delay parameter
    stride=1,           # Sample every point
    normalize=True,     # Normalize data
    random_state=42     # For reproducibility
)

print("✓ Framework initialized")
print(f"  - Embedding dimension: {framework.dimension}")
print(f"  - Time delay: {framework.time_delay}")
print(f"  - Normalization: {framework.normalize}")


## Step 2: Load Original and Imputed Datasets

The framework can automatically load:
- The original complete dataset
- All imputed datasets stored in a single pickle file
- Datasets for multiple imputation methods and missing patterns from that file
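The layout of the pickle written by `imputation.py` isn't described in this notebook. Before relying on `load_imputed_datasets`, a small helper can list what the file actually contains. The sketch below, with the hypothetical name `describe_pickle`, assumes only that the top level may be a (possibly nested) dict.

```python
import pickle

def describe_pickle(path, max_depth=2):
    """Return {key_path: type_name} for the (possibly nested) dict stored in a pickle.

    Only unpickle files you trust: loading a pickle can execute arbitrary code.
    """
    with open(path, "rb") as f:
        obj = pickle.load(f)

    summary = {}

    def walk(node, prefix, depth):
        # Descend into dicts up to max_depth; record the type of anything else
        if isinstance(node, dict) and depth < max_depth:
            for key, value in node.items():
                walk(value, f"{prefix}/{key}" if prefix else str(key), depth + 1)
        else:
            summary[prefix or "<root>"] = type(node).__name__

    walk(obj, "", 0)
    return summary
```

Calling `describe_pickle(imputed_data_path)` after the next cell shows, for instance, whether datasets are keyed by method, by pattern, or by a combined name.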


In [None]:
# Find imputed datasets
imp_data_dir = 'imp_data'
if os.path.exists(imp_data_dir):
    pickle_files = glob.glob(f'{imp_data_dir}/imputed_datasets_*.pkl')
    if pickle_files:
        # Use the most recently modified file
        imputed_data_path = max(pickle_files, key=os.path.getmtime)
        print(f"Found imputed datasets: {imputed_data_path}")
    else:
        print("No imputed datasets found. Please run imputation.py first.")
        imputed_data_path = None
else:
    print("imp_data directory not found. Please run imputation.py first.")
    imputed_data_path = None

# Load original dataset
original_data_path = 'data/eeg_eye_state_full.csv'
if os.path.exists(original_data_path):
    print(f"Found original dataset: {original_data_path}")
else:
    print(f"Original dataset not found at {original_data_path}. "
          "Download the EEG Eye State data from the UCI repository and save it there.")
    original_data_path = None


In [None]:
# Load datasets using the framework
if imputed_data_path and original_data_path:
    # Load imputed datasets with original
    framework.load_imputed_datasets(
        imputed_data_path=imputed_data_path,
        original_data=original_data_path,
        original_name='original',
        target_column='target'  # For grouping by eye state
    )
    
    print(f"\n✓ Loaded {len(framework.datasets)} datasets")
    print(f"  Available datasets: {list(framework.datasets.keys())[:5]}...")  # Show first 5
elif original_data_path:
    # Load only original if imputed not available
    framework.load_dataset(
        original_data_path,
        name='original',
        target_column='target'
    )
    print("\n✓ Loaded original dataset")
else:
    print("⚠ No datasets found. Please ensure data files exist.")


## Step 3: Create Point Clouds for Comparison

Compare point clouds across:
- Original vs Imputed datasets
- Different imputation methods (KNN, Interpolation, LOCF, GAN)
- Different missing patterns (MCAR, MAR, MNAR)
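The comparison cell below picks datasets by substring matching on their names. To see every available combination at once, the names can be arranged into a pattern-by-method grid. This sketch assumes the names contain the method and pattern as underscore-separated tokens (for example a hypothetical `knn_mcar_10`). The real naming is set by `imputation.py`, so adjust `METHODS` and `PATTERNS` to match.

```python
from collections import defaultdict

# Assumed tokens; match them to the names actually produced by imputation.py
METHODS = ("knn", "interpolation", "locf", "gan")
PATTERNS = ("mcar", "mar", "mnar")

def index_by_pattern_and_method(names):
    """Map pattern -> method -> [dataset names], matching whole tokens only."""
    grid = defaultdict(lambda: defaultdict(list))
    for name in names:
        # Whole-token matching avoids accidental substring hits inside other words
        tokens = set(name.lower().split("_"))
        for pattern in PATTERNS:
            for method in METHODS:
                if pattern in tokens and method in tokens:
                    grid[pattern][method].append(name)
    return {pattern: dict(methods) for pattern, methods in grid.items()}
```

With the loaded framework, `index_by_pattern_and_method(framework.datasets.keys())` would give, for each pattern, the datasets available for each method.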


In [None]:
# Select datasets to compare
# You can customize this list based on available datasets
datasets_to_compare = ['original']

# Add imputed datasets if available
if len(framework.datasets) > 1:
    # Example: compare original with the first KNN- and first GAN-imputed MCAR datasets
    for method in ('knn', 'gan'):
        for name in framework.datasets.keys():
            if method in name.lower() and 'mcar' in name.lower():
                datasets_to_compare.append(name)
                break

print(f"Comparing datasets: {datasets_to_compare}")

# Create point clouds for comparison
channel_idx = 0  # First EEG channel
comparison_results = framework.compare_datasets(
    dataset_names=datasets_to_compare,
    channel_idx=channel_idx,
    group_by_target=True,  # Separate by eye state (closed/open)
    max_samples_per_group=500  # Limit for efficiency
)

print(f"\n✓ Created point clouds for {len(comparison_results)} datasets")
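`group_by_target=True` and `max_samples_per_group=500` are handled inside the framework, and how it does so isn't shown here. One reasonable approach is to embed each contiguous run of a single label separately, so no delay vector straddles an eye-state change, and then draw a seeded random subsample of at most `max_samples_per_group` points per label. The sketch below follows that approach. `embed_by_target` is a hypothetical helper, not the framework's method.

```python
import numpy as np

def embed_by_target(x, labels, dimension=3, time_delay=1,
                    max_samples_per_group=500, random_state=42):
    """Return {label: point cloud} built from contiguous same-label runs (sketch)."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(random_state)
    span = (dimension - 1) * time_delay
    # Positions where the label changes mark the boundaries between runs
    boundaries = np.flatnonzero(labels[1:] != labels[:-1]) + 1
    pieces = {}
    for run in np.split(np.arange(len(x)), boundaries):
        if len(run) <= span:
            continue  # run too short to form a single delay vector
        seg = x[run]
        n = len(seg) - span
        cloud = np.stack([seg[j * time_delay: j * time_delay + n]
                          for j in range(dimension)], axis=1)
        pieces.setdefault(labels[run[0]].item(), []).append(cloud)
    clouds = {}
    for label, parts in pieces.items():
        cloud = np.vstack(parts)
        if len(cloud) > max_samples_per_group:
            # Seeded subsample without replacement, kept in time order
            keep = rng.choice(len(cloud), size=max_samples_per_group, replace=False)
            cloud = cloud[np.sort(keep)]
        clouds[label] = cloud
    return clouds
```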


## Step 4: Visualize Point Clouds

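Visualize the original dataset's point clouds for the two eye states side by side. The cells from here on treat `target0` as eyes closed. In the UCI EEG Eye State dataset, `1` marks eyes closed and `0` eyes open, so check how the `target` column in your CSV is encoded before trusting these labels.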


In [None]:
# Visualize original dataset (closed vs open eyes)
if 'original' in comparison_results:
    original_pcs = comparison_results['original']
    
    # Keep only the selected channel (trailing '_' stops e.g. ch1 from matching ch10)
    original_pcs_filtered = {
        k: v for k, v in original_pcs.items() 
        if f'_ch{channel_idx}_' in k
    }
    
    if len(original_pcs_filtered) >= 2:
        # Create comparison visualization
        framework.visualize_comparison(
            original_pcs_filtered,
            titles={
                k: k.replace(f'original_ch{channel_idx}_', '').replace('target', 'Eye ').title()
                for k in original_pcs_filtered.keys()
            },
            colors={
                k: 'blue' if 'target0' in k else 'red'
                for k in original_pcs_filtered.keys()
            }
        )
        plt.suptitle('Original Dataset - Eye Closed vs Eye Open', y=1.02)
        plt.show()
        print("✓ Original dataset visualization created")


## Step 5: Compare Original vs Imputed Datasets

Visualize how different imputation methods affect the point cloud structure.


In [None]:
# Compare original with imputed datasets
if len(comparison_results) > 1:
    # Select one target group for comparison (e.g., closed eyes)
    target_key = f'_ch{channel_idx}_target0'  # Closed eyes
    
    comparison_pcs = {}
    for dataset_name, pcs in comparison_results.items():
        key = f"{dataset_name}{target_key}"
        if key in pcs:
            # Shorten name for display
            display_name = dataset_name.replace('_data_', '_').replace('_', ' ').title()
            comparison_pcs[display_name] = pcs[key]
    
    if len(comparison_pcs) > 1:
        framework.visualize_comparison(
            comparison_pcs,
            figsize=(6 * len(comparison_pcs), 5)
        )
        plt.suptitle('Original vs Imputed Datasets (Eye Closed)', y=1.02)
        plt.show()
        print(f"✓ Compared {len(comparison_pcs)} datasets")


## Step 6: PCA Projection for Easier Comparison

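Use PCA to project the embedded point clouds onto 2D for easier side-by-side comparison. With `dimension=3` this mainly flattens the 3D view. It matters more for larger embedding dimensions.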
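How `visualize_pca_projection` fits its PCA isn't documented here. If each cloud gets its own PCA, the 2D axes of different panels are not directly comparable. The NumPy sketch below instead fits the principal axes on one reference cloud (e.g. the original) and projects every cloud onto those same axes. The helper names are hypothetical.

```python
import numpy as np

def pca_fit(reference, n_components=2):
    """Return (mean, components) of a PCA fitted on `reference` (n_points x dim)."""
    reference = np.asarray(reference, dtype=float)
    mean = reference.mean(axis=0)
    # Rows of vt are the principal axes of the centred data, by decreasing variance
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_project(cloud, mean, components):
    """Project a cloud onto previously fitted principal axes."""
    return (np.asarray(cloud, dtype=float) - mean) @ components.T

def project_all(clouds, reference_name, n_components=2):
    """Project every cloud onto the axes fitted on clouds[reference_name]."""
    mean, components = pca_fit(clouds[reference_name], n_components)
    return {name: pca_project(pc, mean, components) for name, pc in clouds.items()}
```

With the dictionaries built in this notebook, `project_all(comparison_pcs, 'Original')` would return 2D arrays ready for a shared-axis scatter plot.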


In [None]:
# PCA projection comparison
if len(comparison_results) > 1:
    target_key = f'_ch{channel_idx}_target0'  # Closed eyes
    
    comparison_pcs = {}
    for dataset_name, pcs in comparison_results.items():
        key = f"{dataset_name}{target_key}"
        if key in pcs:
            display_name = dataset_name.replace('_data_', '_').replace('_', ' ').title()
            comparison_pcs[display_name] = pcs[key]
    
    if len(comparison_pcs) > 1:
        framework.visualize_pca_projection(comparison_pcs)
        plt.show()
        print("✓ PCA projection created")


## Step 7: Analyze Point Cloud Properties

Compute statistics and distances to quantify differences between datasets.
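The exact definitions behind the `mean`, `std`, `range` and `mean_pairwise_distance` returned by `analyze_point_cloud` aren't given here. The NumPy sketch below assumes the first three are taken over all coordinates and the last over all unordered point pairs. Because the number of pairs grows quadratically, large clouds are subsampled first.

```python
import numpy as np

def point_cloud_stats(pc, max_points_for_pairs=1000, random_state=42):
    """Summary statistics for an (n_points x dim) cloud, under assumed definitions."""
    pc = np.asarray(pc, dtype=float)
    stats = {
        "n_points": len(pc),
        "mean": float(pc.mean()),              # over all coordinates
        "std": float(pc.std()),
        "range": float(pc.max() - pc.min()),
    }
    sample = pc
    if len(pc) > max_points_for_pairs:
        rng = np.random.default_rng(random_state)
        sample = pc[rng.choice(len(pc), size=max_points_for_pairs, replace=False)]
    if len(sample) >= 2:
        diffs = sample[:, None, :] - sample[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=-1))
        upper = np.triu_indices(len(sample), k=1)  # each unordered pair once
        stats["mean_pairwise_distance"] = float(dists[upper].mean())
    return stats
```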


In [None]:
# Analyze point cloud properties
target_key = f'_ch{channel_idx}_target0'  # Closed eyes

print("Point Cloud Statistics:")
print("=" * 80)

for dataset_name, pcs in comparison_results.items():
    key = f"{dataset_name}{target_key}"
    if key in pcs:
        pc = pcs[key]
        stats = framework.analyze_point_cloud(pc, dataset_name)
        
        print(f"\n{dataset_name}:")
        print(f"  Points: {stats['n_points']}")
        print(f"  Mean: {stats['mean']:.4f}")
        print(f"  Std: {stats['std']:.4f}")
        print(f"  Range: {stats['range']:.4f}")
        if 'mean_pairwise_distance' in stats:
            print(f"  Mean pairwise distance: {stats['mean_pairwise_distance']:.4f}")
        print("-" * 80)


## Step 8: Compute Distance Matrix

Calculate distances between point clouds to quantify how different imputation methods affect the structure.
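The cell below labels its output as a centroid distance, which compares only where two clouds sit, not their shape. The sketch below builds the same kind of labelled matrix with `pandas`. As a shape-sensitive alternative, it also offers the symmetric Chamfer distance (mean nearest-neighbour distance in both directions). This alternative is swapped in here for illustration and is not necessarily what `compute_distance_matrix` does.

```python
import numpy as np
import pandas as pd

def centroid_distance(a, b):
    """Euclidean distance between the mean points of two clouds."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))

def chamfer_distance(a, b):
    """Mean nearest-neighbour distance from a to b plus from b to a."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def distance_matrix(clouds, distance=centroid_distance):
    """Symmetric DataFrame of pairwise distances between named clouds."""
    names = list(clouds)
    arrays = {n: np.asarray(clouds[n], dtype=float) for n in names}
    out = pd.DataFrame(0.0, index=names, columns=names)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            out.loc[a, b] = out.loc[b, a] = distance(arrays[a], arrays[b])
    return out
```

For example, `distance_matrix(comparison_pcs, chamfer_distance)` gives a view that also reflects how the shape of the cloud changes after imputation.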


In [None]:
# Compute distance matrix
target_key = f'_ch{channel_idx}_target0'  # Closed eyes

comparison_pcs = {}
for dataset_name, pcs in comparison_results.items():
    key = f"{dataset_name}{target_key}"
    if key in pcs:
        display_name = dataset_name.replace('_data_', '_').replace('_', ' ').title()
        comparison_pcs[display_name] = pcs[key]

if len(comparison_pcs) > 1:
    distance_matrix = framework.compute_distance_matrix(comparison_pcs, metric='euclidean')
    
    print("Distance Matrix (Centroid Distance):")
    print("=" * 80)
    print(distance_matrix.round(4))
    print("\nLower values mean the point clouds are closer under this metric "
          "(for a centroid distance: closer mean position, not necessarily the same shape).")


## Step 9: Generate Comprehensive Report

Generate a detailed report comparing all datasets.
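`generate_report` writes its own format to `takens_analysis_report.txt`. If a different layout is needed, a report can be assembled directly from per-dataset statistics such as those printed in Step 7. The `write_report` helper below is hypothetical. It assumes only a `{dataset_name: {stat_name: value}}` mapping.

```python
from pathlib import Path

def write_report(stats_by_dataset, output_file, title="Takens Embedding Report"):
    """Format {dataset: {stat: value}} as plain text, write it to disk, and return it."""
    lines = [title, "=" * len(title), ""]
    for name, stats in stats_by_dataset.items():
        lines.append(f"{name}:")
        for key, value in stats.items():
            # Floats get fixed precision; counts and other values print as-is
            shown = f"{value:.4f}" if isinstance(value, float) else str(value)
            lines.append(f"  {key}: {shown}")
        lines.append("")
    text = "\n".join(lines)
    Path(output_file).write_text(text, encoding="utf-8")
    return text
```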


In [None]:
# Generate report
report = framework.generate_report(
    dataset_names=list(comparison_results.keys()),
    channel_idx=channel_idx,
    output_file='takens_analysis_report.txt'
)

print(report)


## Step 10: Compare Multiple Imputation Methods

Compare all imputation methods for a specific missing pattern (e.g., MCAR).


In [None]:
# Compare all imputation methods for MCAR pattern
mcar_datasets = [name for name in framework.datasets.keys() 
                 if 'mcar' in name.lower() and name != 'original']
mcar_datasets.insert(0, 'original')  # Add original at the beginning

if len(mcar_datasets) > 1:
    print(f"Comparing MCAR imputation methods: {mcar_datasets}")
    
    mcar_comparison = framework.compare_datasets(
        dataset_names=mcar_datasets,
        channel_idx=channel_idx,
        group_by_target=True,
        max_samples_per_group=500
    )
    
    # Visualize for closed eyes
    target_key = f'_ch{channel_idx}_target0'
    mcar_pcs = {}
    for name, pcs in mcar_comparison.items():
        key = f"{name}{target_key}"
        if key in pcs:
            display_name = name.replace('_data_', '_').replace('_', ' ').title()
            mcar_pcs[display_name] = pcs[key]
    
    if len(mcar_pcs) > 1:
        framework.visualize_comparison(mcar_pcs, figsize=(6 * len(mcar_pcs), 5))
        plt.suptitle('MCAR Pattern: Original vs All Imputation Methods (Eye Closed)', y=1.02)
        plt.show()
        
        # Distance matrix
        mcar_distances = framework.compute_distance_matrix(mcar_pcs)
        print("\nMCAR Imputation Methods - Distance Matrix:")
        print(mcar_distances.round(4))


## Summary

This framework provides:

1. **Easy Dataset Loading**: Load original and imputed datasets with one function call
2. **Automatic Point Cloud Creation**: Convert time series to point clouds automatically
3. **Comparison Tools**: Compare multiple datasets side-by-side
4. **Visualization**: Automatic plotting for 3D and 2D (PCA) views
5. **Quantitative Analysis**: Statistics and distance metrics
6. **Report Generation**: Comprehensive text reports

### Key Advantages:

- **Dataset-agnostic**: Works with any time series dataset
- **Flexible**: Easy to customize parameters and comparisons
- **Scalable**: Handles multiple datasets and imputation methods
- **Reproducible**: Built-in random state control

### Next Steps:

- Experiment with different embedding dimensions and time delays
- Compare across different missing patterns (MCAR, MAR, MNAR)
- Analyze multiple channels simultaneously
- Apply to other datasets (climate, traffic, etc.)
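When experimenting with `time_delay`, a common starting point is the lag at which the autocorrelation first drops below 1/e. The first minimum of the average mutual information is the more standard criterion. Autocorrelation is used here as a simpler stand-in that needs only NumPy. The sketch below uses a hypothetical helper, `estimate_time_delay`.

```python
import numpy as np

def estimate_time_delay(x, max_lag=100, threshold=np.exp(-1)):
    """Smallest lag whose autocorrelation falls below `threshold` (heuristic)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x)
    if var == 0:
        raise ValueError("constant series has no defined autocorrelation")
    for lag in range(1, min(max_lag, len(x) - 1) + 1):
        # Biased autocorrelation estimate at this lag
        acf = np.dot(x[:-lag], x[lag:]) / var
        if acf < threshold:
            return lag
    return None  # no crossing within max_lag
```

For a single EEG channel, `estimate_time_delay(series)` gives a candidate `time_delay` to pass to `TakensEmbeddingFramework`, which can then be refined by comparing the resulting point clouds.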
