# LA-DT Reproducibility Guide

This notebook provides a quick start for reproducing the main results from the paper.

**Expected time:** 10-15 minutes

## Outline
1. Verify environment setup
2. Check data availability
3. Run real-world validation (Table 7)
4. Inspect the generated results
5. Compare results with paper expectations

## Step 1: Verify Environment

In [None]:
import sys
import os
from pathlib import Path

# Check Python version
print(f"Python: {sys.version}")
print(f"Working directory: {os.getcwd()}")

# Check required packages
try:
    import torch
    import numpy as np
    import pandas as pd
    from sklearn import __version__ as sklearn_version
    
    print(f"✓ PyTorch: {torch.__version__}")
    print(f"✓ NumPy: {np.__version__}")
    print(f"✓ Pandas: {pd.__version__}")
    print(f"✓ scikit-learn: {sklearn_version}")
except ImportError as e:
    print(f"✗ Missing package: {e}")
    print("\nRun: pip install torch numpy pandas scikit-learn")
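
The `try` block above stops at the first `ImportError`, so an environment missing several packages needs several reruns to find them all. Below is a minimal sketch that reports every missing package in one pass using only the standard library (`importlib.util.find_spec` and `importlib.metadata.version`). The `check_packages` helper is hypothetical and not part of the repository. The import-to-pip name mapping comes from the install hint above.

```python
import importlib.util
from importlib.metadata import version, PackageNotFoundError

# Import name -> pip distribution name (from the install hint in Step 1)
REQUIRED = {
    "torch": "torch",
    "numpy": "numpy",
    "pandas": "pandas",
    "sklearn": "scikit-learn",
}

def check_packages(required=REQUIRED):
    """Return (found, missing): found maps dist name -> version, missing lists dist names."""
    found, missing = {}, []
    for import_name, dist_name in required.items():
        # find_spec locates the module without importing it (torch is slow to import)
        if importlib.util.find_spec(import_name) is None:
            missing.append(dist_name)
            continue
        try:
            found[dist_name] = version(dist_name)
        except PackageNotFoundError:
            # Importable but without installed metadata (e.g. a source checkout)
            found[dist_name] = "unknown"
    return found, missing

found, missing = check_packages()
for name, ver in found.items():
    print(f"✓ {name}: {ver}")
if missing:
    print(f"✗ Missing: {', '.join(missing)}")
    print(f"Run: pip install {' '.join(missing)}")
```

Checking with `find_spec` instead of importing also keeps this cell fast, because it never loads PyTorch itself.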

## Step 2: Check Data Availability

In [None]:
# Check if datasets exist
from pathlib import Path

data_path = Path('./src/data/raw')

datasets = {
    'SWAT': ['swat/normal.csv', 'swat/attack.csv'],
    'AI Dataset': ['ai-data/scaled_PV_data.csv'],
    'NASA Bearings': ['bearings/IMS.7z']
}

for dataset_name, files in datasets.items():
    print(f"\n{dataset_name}:")
    all_found = True
    for file in files:
        filepath = data_path / file
        if filepath.exists():
            size = filepath.stat().st_size / (1024**2)  # Convert to MB
            print(f"  ✓ {file} ({size:.1f} MB)")
        else:
            print(f"  ✗ {file} (NOT FOUND)")
            all_found = False
    
    if all_found:
        print(f"  Status: ✓ Ready")
    else:
        print(f"  Status: ⚠ Missing files")
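
If you want to reuse the data check elsewhere (for example, to skip a dataset whose files are absent), you can factor the loop above into a function. The sketch below assumes the same `./src/data/raw` layout and file list. `missing_files` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

def missing_files(data_path, datasets):
    """Map each dataset name to the list of its expected files that are absent."""
    root = Path(data_path)
    return {
        name: [f for f in files if not (root / f).is_file()]
        for name, files in datasets.items()
    }

# Same layout as the check above
DATASETS = {
    'SWAT': ['swat/normal.csv', 'swat/attack.csv'],
    'AI Dataset': ['ai-data/scaled_PV_data.csv'],
    'NASA Bearings': ['bearings/IMS.7z'],
}

gaps = missing_files('./src/data/raw', DATASETS)
ready = [name for name, files in gaps.items() if not files]
print(f"Ready: {ready}")
# Report only the datasets that are incomplete
incomplete = {name: files for name, files in gaps.items() if files}
if incomplete:
    print(f"Missing: {incomplete}")
```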

## Step 3: Run Phase 5 Real-World Validation

This reproduces Table 7 from the paper using the SWAT and solar power grid datasets, along with a synthetic power grid baseline.

In [None]:
import sys
sys.path.insert(0, './src')

print("Running Phase 5: Real-World Multi-Domain Validation...")
print("This may take 30-60 seconds depending on your hardware.")
print()

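# Run the Phase 5 script as __main__; unlike exec(open(...).read()),
# runpy closes the file and sets __file__ for the script
import runpy
phase5_globals = runpy.run_path('./src/training/phase_5_real_data_validation.py', run_name='__main__')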
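
Running the script inside the kernel means the notebook and the script share one process, so state from earlier cells can leak into the run. An alternative is to run it in a fresh interpreter via `subprocess` and time it. The sketch below assumes the script can be run standalone from the repository root with `./src` on `PYTHONPATH`. `run_script` is a hypothetical helper.

```python
import os
import subprocess
import sys
import time
from pathlib import Path

def run_script(script, src_dir="./src"):
    """Run `script` in a fresh interpreter with `src_dir` prepended to PYTHONPATH.

    Prints the child's stdout (and stderr on failure) so it shows up in the
    notebook, and returns the wall-clock runtime in seconds. Raises
    subprocess.CalledProcessError if the script exits with a non-zero code.
    """
    env = dict(os.environ)
    # Mirror sys.path.insert(0, './src') for the child process
    parts = [str(Path(src_dir).resolve()), env.get("PYTHONPATH", "")]
    env["PYTHONPATH"] = os.pathsep.join(p for p in parts if p)
    start = time.perf_counter()
    result = subprocess.run(
        [sys.executable, str(script)], env=env, capture_output=True, text=True
    )
    elapsed = time.perf_counter() - start
    print(result.stdout, end="")
    if result.returncode != 0:
        print(result.stderr, end="", file=sys.stderr)
        result.check_returncode()  # raises CalledProcessError
    return elapsed

script = Path("./src/training/phase_5_real_data_validation.py")
if script.exists():
    print(f"Phase 5 finished in {run_script(script):.1f}s")
else:
    print(f"✗ {script} not found; run this notebook from the repository root")
```

The trade-off is that variables the script defines are not available in the notebook afterward, and its output appears only once it finishes because it is captured rather than streamed. Step 4 reads the results from disk, so it works either way.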

## Step 4: Inspect Results

In [None]:
import json
from pathlib import Path

# Load results
results_file = Path('./results/table_7_real_data_validation.json')

if results_file.exists():
    with open(results_file, 'r') as f:
        results = json.load(f)
    
    print("="*80)
    print("TABLE 7: REAL-WORLD MULTI-DOMAIN VALIDATION")
    print("="*80)
    print(f"{'Domain':<25} {'Sensors':<10} {'Samples':<10} {'Train Time':<12} {'Train Acc.':<10}")
    print("-"*80)
    
    for result in results:
        domain = result['domain'].replace('_', ' ')
        sensors = result['num_sensors']
        samples = result['num_samples']
        train_time = f"{result['training_time']:.2f}s"
        accuracy = f"{result['train_accuracy']:.3f}"
        
        print(f"{domain:<25} {sensors:<10} {samples:<10} {train_time:<12} {accuracy:<10}")
    
    print("="*80)
    print()
    print("✓ Results loaded")
    print(f"\nResults saved to: {results_file}")
else:
    print("✗ Results file not found. Did Phase 5 complete successfully?")
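
Steps 4 and 5 index each record by key, so a results file from an older or interrupted run fails with a bare `KeyError`. The sketch below is a loader that checks the schema up front. The key set is read off the fields the cells in this notebook use, and `load_results` is a hypothetical helper.

```python
import json
from pathlib import Path

# Keys that Steps 4 and 5 read from each record
REQUIRED_KEYS = {"domain", "num_sensors", "num_samples", "training_time", "train_accuracy"}

def load_results(path):
    """Load the Table 7 JSON and check each record has the keys the report uses."""
    with open(path, "r") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError(f"{path}: expected a JSON list of records, got {type(records).__name__}")
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            raise ValueError(f"{path}: record {i} is not a JSON object")
        missing = REQUIRED_KEYS.difference(rec)
        if missing:
            raise ValueError(f"{path}: record {i} is missing {sorted(missing)}")
    return records

results_file = Path('./results/table_7_real_data_validation.json')
if results_file.exists():
    results = load_results(results_file)
    print(f"✓ {len(results)} records with the expected fields")
```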

## Step 5: Compare with Paper Expectations

In [None]:
import json
from pathlib import Path

# Expected values from paper (Table 7)
expected = {
    'synthetic_power_grid': {'accuracy': 0.998, 'sensors': 5, 'tolerance': 0.01},
    'swat_real': {'accuracy': 0.755, 'sensors': 51, 'tolerance': 0.05},
    'ai_solar_grid': {'accuracy': 1.000, 'sensors': 51, 'tolerance': 0.05},
}

results_file = Path('./results/table_7_real_data_validation.json')
if results_file.exists():
    with open(results_file, 'r') as f:
        results = json.load(f)
    
    print("="*90)
    print("REPRODUCTION VERIFICATION")
    print("="*90)
    
    all_match = True
    for result in results:
        domain = result['domain']
        if domain in expected:
            exp = expected[domain]
            actual_acc = result['train_accuracy']
            expected_acc = exp['accuracy']
            tolerance = exp['tolerance']
            
            match = abs(actual_acc - expected_acc) <= tolerance
            status = '✓' if match else '⚠'
            
            print(f"\n{status} {domain}:")
            print(f"   Expected accuracy: {expected_acc:.3f} ± {tolerance:.3f}")
            print(f"   Actual accuracy:   {actual_acc:.3f}")
            print(f"   Difference:        {actual_acc - expected_acc:+.3f}")
            print(f"   Match:             {'YES' if match else 'NO (outside tolerance)'}")
            
            if not match:
                all_match = False
    
    print("\n" + "="*90)
    if all_match:
        print("✓ ALL RESULTS MATCH PAPER EXPECTATIONS (within tolerance)")
    else:
        print("⚠ Some results fall outside tolerance (small variations can come from random initialization)")
    print("="*90)
else:
    print("✗ Results file not found. Did Phase 5 complete successfully?")
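
The loop above only checks domains that appear in the results file, so a domain that Phase 5 failed to produce is skipped without a warning. The sketch below is a pure comparison function that reports such domains as `missing`. `compare_to_paper` is a hypothetical helper and reuses the accuracies and tolerances from the `expected` dict above.

```python
import json
from pathlib import Path

def compare_to_paper(results, expected):
    """Classify each expected domain as 'match', 'mismatch', or 'missing'.

    A domain matches when |actual - expected| <= tolerance; a domain listed in
    `expected` but absent from `results` is reported as 'missing' instead of
    being skipped.
    """
    actual = {r["domain"]: r["train_accuracy"] for r in results}
    report = {}
    for domain, exp in expected.items():
        if domain not in actual:
            report[domain] = "missing"
        elif abs(actual[domain] - exp["accuracy"]) <= exp["tolerance"]:
            report[domain] = "match"
        else:
            report[domain] = "mismatch"
    return report

# Same expectations as the Step 5 cell (Table 7)
EXPECTED = {
    'synthetic_power_grid': {'accuracy': 0.998, 'tolerance': 0.01},
    'swat_real': {'accuracy': 0.755, 'tolerance': 0.05},
    'ai_solar_grid': {'accuracy': 1.000, 'tolerance': 0.05},
}

results_file = Path('./results/table_7_real_data_validation.json')
if results_file.exists():
    with open(results_file) as f:
        report = compare_to_paper(json.load(f), EXPECTED)
    for domain, status in report.items():
        print(f"{domain:<25} {status}")
```

For a CI job, `assert all(s == "match" for s in report.values())` turns the report into a pass/fail gate.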

## Summary

**Reproducibility Checklist** (tick each item once the corresponding step succeeds):
- [ ] Python environment verified
- [ ] Required packages installed
- [ ] Datasets located and verified
- [ ] Phase 5 executed successfully
- [ ] Results match paper expectations

## Next Steps

1. **SWAT Data Exploration:** See `01_swat_data_exploration.ipynb`
2. **Model Training Details:** See `02_model_training.ipynb`  
3. **Results Visualization:** See `03_results_visualization.ipynb`
4. **Paper Verification:** See `REPRODUCIBILITY.md` for detailed instructions

## Questions?

- Check `REPRODUCIBILITY.md` for detailed troubleshooting
- Review paper Section 5 (Experimental Setup) for methodology
- Open a GitHub issue with reproducibility questions