# 🚀 TFT Monitoring System - Complete Pipeline

## Temporal Fusion Transformer for Server Monitoring & Prediction

### 📋 **What This Notebook Does:**
1. **Setup & Validation** - Check environment and dependencies
2. **Demo Data Generation** - Create a realistic 5-minute server fleet simulation
3. **Production Data Generation** - Generate large-scale training datasets
4. **Model Training** - Train TFT models (fleet-wide or per-server)
5. **Live Dashboards** - Interactive monitoring with real-time visualization

### 🎯 **Key Features:**
- **Multi-horizon forecasting**: Predict up to 8 hours ahead (96 timesteps)
- **Parquet-first**: Fast binary format for large datasets
- **Attention mechanism**: Automatic feature importance detection
- **Per-server models**: Train specialized models for each server
- **Live dashboards**: Real-time monitoring with anomaly detection
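
The horizon figure is a step count multiplied by the sampling interval. Below is a minimal sketch of that arithmetic, assuming a 5-minute interval (which is what "96 timesteps = 8 hours" implies). The helper name `steps_to_hours` is made up for illustration and is not part of the project.

```python
def steps_to_hours(steps: int, interval_minutes: float = 5.0) -> float:
    """Convert a number of forecast steps into hours of wall-clock time."""
    return steps * interval_minutes / 60.0


# 96 steps is the prediction horizon; 288 steps is the context length
# printed by Cell 1.
horizon_hours = steps_to_hours(96)
context_hours = steps_to_hours(288)
print(f"Horizon: {horizon_hours:g} h, context: {context_hours:g} h")
```

If the data is sampled at a different interval, the same step counts cover a different time span, so it is worth checking the interval of the generated dataset.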

### ⏱️ **Estimated Cell Execution Times:**
- **Cells 1-3** (Setup): < 5 seconds each
- **Cell 4** (Demo Data): ~5-10 seconds
- **Cell 5** (Demo Dashboard): Runs continuously (5 minutes total)
- **Cell 6** (24h Dataset): ~30-60 seconds | (720h Dataset): ~5-10 minutes
- **Cell 7** (Train 20 epochs): **~6-10 hours** | (Train 5 epochs): ~1.5-2.5 hours
- **Cell 8** (Inspect): < 1 second
- **Cell 9** (Dashboard): Runs continuously
- **Cells 10 and 12**: < 2 seconds each
- **Cell 11** (One-command demo): ~5 minutes (runs the dashboard)

### ⚡ **Quick Start:**
Run cells in order for a complete demo, or jump to specific sections:
- **Demo** → Cells 1-5 (about 5 minutes)
- **Production Training** → Cells 6-8 (1-10 hours depending on epochs)
- **Dashboard** → Cell 5 (demo data) or Cell 9 (training data), both continuous

---

**Architecture:** PyTorch 2.0+ | Lightning 2.0+ | PyTorch Forecasting 1.0  
**Model:** TemporalFusionTransformer with multi-head attention  
**Hardware:** Best with GPU (RTX 4090 or similar)
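
The requirements above can be checked before running anything else. The following is a rough sketch using only the standard library. It is not the `setup()` function from `main` (whose implementation is not shown here), and the package list is simply the one from the `pip install` hint in Cell 2.

```python
from importlib import metadata

# Distribution names taken from the pip command printed by Cell 2.
REQUIRED = ["torch", "lightning", "pytorch-forecasting",
            "safetensors", "pandas", "pyarrow"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    status = "OK" if version else "MISSING"
    print(f"{status:8}{name}: {version or 'not installed'}")
```

This only reports what is installed; comparing against minimum versions (2.0+ for PyTorch and Lightning) would need an extra parsing step.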

In [1]:
# Cell 1: Import and Setup
# Expected time: < 5 seconds

import time
_cell_start = time.time()

import warnings
warnings.filterwarnings('ignore')

from pathlib import Path
from config import CONFIG

print("🎯 TFT Monitoring System - Optimized Pipeline")
print("=" * 60)
print("✅ Configuration loaded")
print(f"📊 Training time span: {CONFIG['time_span_hours']} hours")
print(f"🖥️  Fleet size: {CONFIG['servers_count']} servers")
print(f"🔮 Prediction horizon: {CONFIG['prediction_horizon']} steps (8 hours)")
print(f"📈 Context length: {CONFIG['context_length']} steps (24 hours)")
print("=" * 60)

_cell_elapsed = time.time() - _cell_start
print(f"\n⏱️  Cell execution time: {_cell_elapsed:.2f} seconds")

🎯 TFT Monitoring System - Optimized Pipeline
✅ Configuration loaded
📊 Training time span: 720 hours
🖥️  Fleet size: 15 servers
🔮 Prediction horizon: 96 steps (8 hours)
📈 Context length: 288 steps (24 hours)

⏱️  Cell execution time: 0.00 seconds


In [2]:
# Cell 2: Environment Validation
# Expected time: < 5 seconds

import time
_cell_start = time.time()

from main import setup

print("🔍 STEP 1: Environment Validation")
print("-" * 60)

if setup():
    print("\n✅ Environment ready for TFT training!")
    print("📦 All dependencies installed")
    print("🚀 Ready to proceed with data generation")
else:
    print("\n❌ Setup failed - install missing dependencies")
    print("   pip install torch lightning pytorch-forecasting safetensors pandas pyarrow")

_cell_elapsed = time.time() - _cell_start
print(f"\n⏱️  Cell execution time: {_cell_elapsed:.2f} seconds")

🔍 STEP 1: Environment Validation
------------------------------------------------------------
🔍 Validating environment...
✅ PyTorch: 2.0.1+cu118
✅ Lightning: 2.0.2
✅ PyTorch Forecasting: 1.0.0
✅ Pandas: 2.2.2
✅ PyArrow (Parquet): 14.0.2
🔥 Device: GPU (NVIDIA GeForce RTX 4090)

✅ Environment ready for TFT training!
📦 All dependencies installed
🚀 Ready to proceed with data generation

⏱️  Cell execution time: 7.84 seconds


In [3]:
# Cell 3: System Status Check
# Expected time: < 2 seconds

import time
_cell_start = time.time()

from main import status

print("📊 STEP 2: System Status")
print("-" * 60)
status()
print("\n💡 Tip: Run this cell anytime to check datasets, models, and device status")

_cell_elapsed = time.time() - _cell_start
print(f"\n⏱️  Cell execution time: {_cell_elapsed:.2f} seconds")

📊 STEP 2: System Status
------------------------------------------------------------
🔍 System Status
✅ Datasets (Parquet): 1 file(s)
   Latest: server_metrics.parquet
   Rows: 432,000
   Servers: 25
   Time range: 2025-10-09 14:22:07.728110+00:00 to 2025-10-10 14:22:02.728110+00:00
✅ Models: 1 trained model(s)
   Latest: tft_model_20251010_122056
   Format: Safetensors ✅
🔥 Device: GPU (NVIDIA GeForce RTX 4090)

💡 Tip: Run this cell anytime to check datasets, models, and device status

⏱️  Cell execution time: 0.16 seconds


---

## 🎬 DEMO MODE: Quick 5-Minute Simulation

Generate demo data with configurable health scenarios for testing and visualization.

**Scenarios:**
- **HEALTHY** - 100% healthy system, stable baselines, no issues
- **DEGRADING** (default) - System starts healthy and gradually degrades (CPU, RAM, IOWait increase)
- **CRITICAL** - System starts healthy, then shows acute failure signs with severe spikes

**DEGRADING Pattern:**
- `0:00-1:30` - Stable baseline (healthy)
- `1:30-2:30` - Gradual escalation (warnings)
- `2:30-3:30` - Incident peak (critical)
- `3:30-5:00` - Recovery to normal

Useful for testing dashboards and getting familiar with the system.
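
The DEGRADING timeline above can be expressed as a small lookup. This is only an illustration of the phase boundaries listed in this cell; `PHASES` and `phase_at` are hypothetical names, and the real `demo_data_generator` may model the transitions differently.

```python
# Phase boundaries in seconds, copied from the DEGRADING pattern above.
PHASES = [
    (0, 90, "stable baseline"),
    (90, 150, "gradual escalation"),
    (150, 210, "incident peak"),
    (210, 300, "recovery"),
]


def phase_at(elapsed_sec: float) -> str:
    """Return the phase label for a given offset into the 5-minute demo."""
    for start, end, label in PHASES:
        if start <= elapsed_sec < end:
            return label
    return "outside demo window"


for t in (0, 100, 180, 240):
    print(f"{t:>3}s -> {phase_at(t)}")
```

Each interval is treated as half-open, so the 1:30 mark already counts as escalation.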

In [None]:
# Cell 4: Generate Demo Dataset (5 minutes with configurable scenario)
# Expected time: 5-10 seconds

import time
_cell_start = time.time()

from demo_data_generator import generate_demo_dataset

print("🎬 STEP 3: Demo Data Generation")
print("-" * 60)

# Configuration - ADJUST THESE VALUES
DEMO_DURATION_MIN = 5      # 5-minute simulation
DEMO_FLEET_SIZE = 10       # 10 servers
DEMO_SEED = 42            # Reproducible results
DEMO_SCENARIO = 'degrading'  # Options: 'healthy', 'degrading', 'critical'
DEMO_OUTPUT_DIR = "./demo_data/"

print(f"⚙️  Configuration:")
print(f"   Duration: {DEMO_DURATION_MIN} minutes")
print(f"   Fleet size: {DEMO_FLEET_SIZE} servers")
print(f"   Scenario: {DEMO_SCENARIO.upper()}")
print(f"   Random seed: {DEMO_SEED}")
print(f"   Output: {DEMO_OUTPUT_DIR}")
print()

scenario_descriptions = {
    'healthy': '100% healthy system, no issues',
    'degrading': 'Gradual resource exhaustion (CPU, RAM, IOWait)',
    'critical': 'Acute failures with severe spikes'
}
print(f"📝 Scenario: {scenario_descriptions.get(DEMO_SCENARIO, 'Unknown')}")
print()

# Generate demo data
success = generate_demo_dataset(
    output_dir=DEMO_OUTPUT_DIR,
    duration_minutes=DEMO_DURATION_MIN,
    fleet_size=DEMO_FLEET_SIZE,
    seed=DEMO_SEED,
    scenario=DEMO_SCENARIO
)

if success:
    print("\n✅ Demo dataset generated successfully!")
    print(f"📁 Location: {DEMO_OUTPUT_DIR}demo_dataset.parquet")
    print(f"📊 Data points: ~{DEMO_DURATION_MIN * 12 * DEMO_FLEET_SIZE:,}")
    print("\n🎯 Ready for dashboard visualization!")
else:
    print("\n❌ Demo generation failed")

_cell_elapsed = time.time() - _cell_start
print(f"\n⏱️  Cell execution time: {_cell_elapsed:.2f} seconds")

---

## 📊 DEMO DASHBOARD: Interactive Visualization

Run the live dashboard with demo data. This shows:
- Fleet-wide aggregate metrics
- Per-server trends
- Anomaly detection and highlighting
- Real-time metric updates

**Note:** With the settings in Cell 5, the dashboard ingests data every 5 seconds and refreshes its visualizations every 30 seconds, showing the incident pattern as it evolves.
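
Cell 5 passes two separate intervals to `run_dashboard`: `tick_interval_sec=5` for data ingestion and `refresh_sec=30` for redrawing. The sketch below shows one way two such intervals could interact, assuming a redraw happens on every tick that lands on a multiple of the refresh interval. The actual scheduling inside `run_dashboard` is not shown in this notebook, and `refresh_ticks` is a made-up helper.

```python
def refresh_ticks(total_sec: int, tick_interval_sec: int = 5, refresh_sec: int = 30):
    """Return the tick times (in seconds) at which a redraw would be due."""
    ticks = range(tick_interval_sec, total_sec + 1, tick_interval_sec)
    return [t for t in ticks if t % refresh_sec == 0]


# A 5-minute demo window with the intervals used in Cell 5.
total = 300
redraws = refresh_ticks(total)
print(f"{total // 5} ticks, {len(redraws)} redraws at {redraws}")
```

Under that assumption, six ingestion ticks happen between consecutive redraws.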

In [None]:
# Cell 5: Run Demo Dashboard (Interactive)
# Expected time: Runs continuously for ~5 minutes (duration of demo data)

import time
_cell_start = time.time()

from tft_dashboard_refactored import run_dashboard

print("📊 STEP 4: Demo Dashboard")
print("-" * 60)
print("🚀 Launching interactive dashboard...")
print("⏱️  Will run until data is exhausted")
print("📡 Tick interval: 5 seconds (data ingestion)")
print("🔄 Refresh interval: 30 seconds (visualization)")
print("⚠️  Interrupt the kernel (or press Ctrl+C in a terminal) to stop early")
print()

# Run dashboard with demo data
try:
    run_dashboard(
        data_path="./demo_data/demo_dataset.parquet",
        data_format='parquet',
        tick_interval_sec=5,   # Ingest data every 5 seconds (like real-time)
        refresh_sec=30,        # Refresh visualizations every 30 seconds
        save_plots=False
    )
    print("\n✅ Dashboard completed!")
except KeyboardInterrupt:
    print("\n⏸️  Dashboard stopped by user")
except Exception as e:
    print(f"\n❌ Dashboard error: {e}")
    import traceback
    traceback.print_exc()

_cell_elapsed = time.time() - _cell_start
_mins = int(_cell_elapsed // 60)
_secs = int(_cell_elapsed % 60)
print(f"\n⏱️  Cell execution time: {_mins}m {_secs}s ({_cell_elapsed:.1f} seconds)")

---

## 🏭 PRODUCTION MODE: Large-Scale Training

Generate larger datasets for full model training.

**Options:**
- **24 hours** (default) - Quick training, good for testing (~30-60 seconds generation)
- **168 hours** (1 week) - Captures weekly patterns (~2-3 minutes generation)
- **720 hours** (30 days) - Best performance, captures all patterns (~5-10 minutes generation)

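Generated data is stored in Parquet format (3-5x faster than CSV). The generation times above are rough estimates; the recorded 24-hour run in Cell 6 took 4m 21s.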
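
Dataset size follows from the time span, the number of servers, and the sampling interval. The sketch below is an estimate only: the 5-second interval is inferred from the recorded outputs in this notebook (432,000 rows for 25 servers over roughly 24 hours), not read from `metrics_generator`, and `estimate_rows` is a hypothetical helper.

```python
def estimate_rows(hours: float, servers: int, interval_sec: float) -> int:
    """Rows = samples per server (duration / interval) times number of servers."""
    samples_per_server = int(hours * 3600 / interval_sec)
    return samples_per_server * servers


# 25 servers sampled every 5 seconds for 24 hours: matches the row count
# in the recorded status output.
print(f"{estimate_rows(24, 25, 5):,}")

# 15 servers sampled every 5 minutes for 24 hours: matches the much smaller
# "Estimated samples" figure printed by Cell 6.
print(f"{estimate_rows(24, 15, 300):,}")
```

The gap between the two figures suggests the in-notebook estimate and the generator use different sampling intervals, so the printed estimate is best treated as a lower bound.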

In [4]:
# Cell 6: Generate Production Training Dataset
# Expected time: 24h=30-60s | 168h=2-3min | 720h=5-10min

import time
_cell_start = time.time()

from metrics_generator import generate_dataset

print("🏭 STEP 5: Production Data Generation")
print("-" * 60)

# Configuration - ADJUST THESE VALUES
# TRAINING_HOURS = 720  # 30 days recommended for best results
# TRAINING_HOURS = 168  # 1 week for faster training
TRAINING_HOURS = 24   # Quick test

print(f"⚙️  Configuration:")
print(f"   Time span: {TRAINING_HOURS} hours ({TRAINING_HOURS/24:.1f} days)")
print(f"   Servers: {CONFIG['servers_count']}")
# Note: the factor 12 assumes one sample every 5 minutes; the recorded run wrote 432,000 rows
print(f"   Estimated samples: ~{TRAINING_HOURS * 12 * CONFIG['servers_count']:,}")
print(f"   Output: {CONFIG['training_dir']}")
print()

print("🚀 Generating production dataset...")
print("⏱️  This may take several minutes for large datasets...")

success = generate_dataset(hours=TRAINING_HOURS)

if success:
    print("\n✅ Production dataset generated!")
    print(f"📁 Location: {CONFIG['training_dir']}")
    print("🎯 Ready for model training!")
    
    # Show what was created
    training_path = Path(CONFIG['training_dir'])
    parquet_files = list(training_path.glob("*.parquet"))
    if parquet_files:
        latest = max(parquet_files, key=lambda p: p.stat().st_mtime)
        size_mb = latest.stat().st_size / (1024 * 1024)
        print(f"📊 Latest file: {latest.name} ({size_mb:.1f} MB)")
else:
    print("\n❌ Generation failed")

_cell_elapsed = time.time() - _cell_start
_mins = int(_cell_elapsed // 60)
_secs = int(_cell_elapsed % 60)
print(f"\n⏱️  Cell execution time: {_mins}m {_secs}s ({_cell_elapsed:.1f} seconds)")

🏭 STEP 5: Production Data Generation
------------------------------------------------------------
⚙️  Configuration:
   Time span: 24 hours (1.0 days)
   Servers: 15
   Estimated samples: ~4,320
   Output: ./training/

🚀 Generating production dataset...
⏱️  This may take several minutes for large datasets...

⏰ Time Range:
   Start: 2025-10-09 16:33:54.663072+00:00
   End:   2025-10-10 16:33:49.663072+00:00
   Duration: 0 days 23:59:55
   (Data ends at current time, starts 24 hours ago)

📊 Parquet written: training\server_metrics.parquet (432,000 rows, 27.9 MB)

✅ Production dataset generated!
📁 Location: ./training/
🎯 Ready for model training!
📊 Latest file: server_metrics.parquet (27.9 MB)

⏱️  Cell execution time: 4m 21s (261.8 seconds)


---

## 🤖 MODEL TRAINING: TFT Neural Network

Train the Temporal Fusion Transformer on your data.

**Options:**
- **Fleet-wide model** - Single model for all servers (faster)
- **Per-server models** - Specialized model per server (better accuracy)

**Training Features:**
- Automatic GPU acceleration
- Early stopping to prevent overfitting
- Learning rate monitoring
- Checkpoint saving
- TensorBoard logging
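
Early stopping halts training once the validation loss has stopped improving for a set number of epochs. Here is a minimal pure-Python sketch of that rule, using the `patience=8` value reported in the training log. The pipeline itself presumably relies on a Lightning callback rather than code like this, and the loss values below are invented for the example.

```python
def early_stop_epoch(val_losses, patience=8, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # New best validation loss: reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Made-up validation losses: three improving epochs, then a plateau.
losses = [3.57, 3.10, 2.95] + [3.00] * 10
print(early_stop_epoch(losses, patience=8))
```

With a patience of 8, a run of 10 epochs or fewer can only stop early if the loss stalls within the first two epochs.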

**⏱️ Expected Training Times (RTX 4090):**
- **5 epochs**: ~1.5-2.5 hours
- **10 epochs**: ~3-5 hours
- **20 epochs**: ~6-10 hours
- **30 epochs**: ~9-15 hours

*Times vary with dataset size and GPU. Each epoch takes ~20-30 minutes on 432K samples.*
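
Once the first epoch has finished, its measured duration gives a better estimate than the table above. A small sketch of that calculation, assuming every remaining epoch takes as long as the one just measured (`eta_minutes` is a hypothetical helper, not a function from `main`):

```python
def eta_minutes(epoch_sec: float, epochs_done: int, total_epochs: int) -> float:
    """Estimate remaining minutes assuming each remaining epoch takes epoch_sec."""
    remaining = total_epochs - epochs_done
    return remaining * epoch_sec / 60.0


# Figures from the recorded run: epoch 1 of 10 finished in 1587.5 s.
print(f"ETA: {eta_minutes(1587.5, 1, 10):.1f} min")
```

This agrees with the ETA shown in the recorded training log after epoch 1.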

In [None]:
# Cell 7: Train TFT Model (Fleet-wide)
# Expected time: 5 epochs=1.5-2.5h | 20 epochs=6-10h | 30 epochs=9-15h

import time
_cell_start = time.time()

from main import train

print("🤖 STEP 6: Model Training")
print("-" * 60)

# Configuration - ADJUST THESE VALUES
TRAINING_EPOCHS = 10  # Default: 20, increase for better accuracy
PER_SERVER_MODE = False  # Set to True for per-server models

print(f"⚙️  Configuration:")
print(f"   Epochs: {TRAINING_EPOCHS}")
print(f"   Batch size: {CONFIG['batch_size']}")
print(f"   Learning rate: {CONFIG['learning_rate']}")
print(f"   Mode: {'Per-server' if PER_SERVER_MODE else 'Fleet-wide'}")
print(f"   Dataset: {CONFIG['training_dir']}")
print()

# Estimate training time
est_mins_low = TRAINING_EPOCHS * 20
est_mins_high = TRAINING_EPOCHS * 30
print(f"⏱️  Estimated training time: {est_mins_low//60}h {est_mins_low%60}m - {est_mins_high//60}h {est_mins_high%60}m")
print("   (Based on ~20-30 minutes per epoch on RTX 4090 with 432K samples)")
print()

print("🚀 Starting TFT training...")
print("🔥 Using GPU if available")
print()

model_path = train(
    dataset_path=CONFIG['training_dir'],
    epochs=TRAINING_EPOCHS,
    per_server=PER_SERVER_MODE
)

if model_path:
    print("\n✅ Training completed successfully!")
    print(f"📁 Model saved: {model_path}")
    print("🎯 Ready for inference and prediction!")
else:
    print("\n❌ Training failed - check logs above")

_cell_elapsed = time.time() - _cell_start
_hours = int(_cell_elapsed // 3600)
_mins = int((_cell_elapsed % 3600) // 60)
_secs = int(_cell_elapsed % 60)
print(f"\n⏱️  Cell execution time: {_hours}h {_mins}m {_secs}s ({_cell_elapsed:.1f} seconds total)")

🤖 STEP 6: Model Training
------------------------------------------------------------
⚙️  Configuration:
   Epochs: 10
   Batch size: 32
   Learning rate: 0.01
   Mode: Fleet-wide
   Dataset: ./training/

⏱️  Estimated training time: 3h 20m - 5h 0m
   (Based on ~20-30 minutes per epoch on RTX 4090 with 432K samples)

🚀 Starting TFT training...
🔥 Using GPU if available

[TRAIN] Starting TFT training (Phase 2 optimized)...
 Random seed set to 42 for reproducibility
[SEARCH] Looking for dataset in: D:\machine_learning\MonitoringPrediction\training
[DIR] Files found: ['desktop.ini', 'metrics_metadata.json', 'server_metrics.parquet']
[INFO] Loading parquet dataset: training\server_metrics.parquet
[OK] Loaded 432,000 records from parquet
[PREP] Preparing data for TFT training...
[INFO] Original columns: ['timestamp', 'server_name', 'profile', 'state', 'problem_child', 'cpu_pct', 'mem_pct', 'disk_io_mb_s', 'net_in_mb_s', 'net_out_mb_s', 'latency_ms', 'error_rate', 'gc_pause_ms', 'container_oo

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


[OK] Training samples: 348000
[OK] Validation samples: 25
 Data loading: 4 workers, pin_memory=True
[OK] Model created with 86,841 parameters
[SAVE] Checkpointing enabled: ./checkpoints/
[INFO] TensorBoard logging: ./logs/tft_training\20251010_124323
[TRAIN] Learning rate monitoring enabled
[INFO] Enhanced progress reporting enabled
[STOP] Early stopping: patience=8
[START] Training for 10 epochs...


You are using a CUDA device ('NVIDIA GeForce RTX 4090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

   | Name                               | Type                            | Params
----------------------------------------------------------------------------------------
0  | loss                               | QuantileLoss                    | 0     
1  | logging_metrics                    | ModuleList                      | 0     
2  | input_embeddings                   | MultiEmbedding                  | 40    
3  | prescalers                         | ModuleDict                      | 384   
4  | static_variable_selection          | VariableSelectionNetwork        | 5.7 K 
5  | e

Sanity Checking: 0it [00:00, ?it/s]


[START] TRAINING STARTED


Training: 0it [00:00, ?it/s]

Validation: 0it [00:00, ?it/s]


[INFO] Epoch 1/10 completed in 1587.5s
   Train Loss: 2.8871 | Val Loss: 3.5738 [BEST] NEW BEST
   Progress: [1/10] 10.0%
   ETA: 238.1 min | Elapsed: 26.5 min


Metric val_loss improved. New best score: 3.574
Epoch 0, global step 10875: 'val_loss' reached 3.57376 (best 3.57376), saving model to 'checkpoints\\tft-epoch=00-val_loss=3.5738.ckpt' as top 3


In [None]:
# Cell 8: Inspect Trained Model
# Expected time: < 1 second

import time
_cell_start = time.time()

from pathlib import Path

print("🔍 STEP 7: Model Inspection")
print("-" * 60)

models_dir = Path(CONFIG['models_dir'])

if models_dir.exists():
    model_dirs = sorted(models_dir.glob('tft_model_*'), key=lambda p: p.stat().st_mtime)
    
    if model_dirs:
        print(f"✅ Found {len(model_dirs)} trained model(s)\n")
        
        # Show latest model
        latest_model = model_dirs[-1]
        print(f"📦 Latest Model: {latest_model.name}")
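        # st_mtime is a raw epoch timestamp (last modification), so format it for display
        _mtime = time.localtime(latest_model.stat().st_mtime)
        print(f"📅 Last modified: {time.strftime('%Y-%m-%d %H:%M:%S', _mtime)}")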
        print()
        
        # List model files
        print("📁 Model Contents:")
        for file in sorted(latest_model.glob('*')):
            size_kb = file.stat().st_size / 1024
            if size_kb > 1024:
                print(f"   {file.name}: {size_kb/1024:.1f} MB")
            else:
                print(f"   {file.name}: {size_kb:.1f} KB")
        
        # Check for safetensors
        if (latest_model / "model.safetensors").exists():
            print("\n✅ Model format: Safetensors (secure)")
        elif (latest_model / "model.pth").exists():
            print("\n⚠️  Model format: PyTorch (legacy)")
        else:
            print("\n⚠️  No model.safetensors or model.pth file found")
            
    else:
        print("❌ No models found - run training first")
else:
    print("❌ Models directory not found")

_cell_elapsed = time.time() - _cell_start
print(f"\n⏱️  Cell execution time: {_cell_elapsed:.2f} seconds")

---

## 📊 PRODUCTION DASHBOARD: Live Monitoring

Run the dashboard with production training data for continuous monitoring.

This dashboard reads from your training dataset and shows:
- Real-time metric trends
- Fleet-wide aggregates
- Anomaly detection
- Critical server identification

**Use Cases:**
- Monitor model training data quality
- Validate data generation patterns
- Test alerting thresholds
- Demonstrate system capabilities
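
To make "testing alerting thresholds" concrete, here is a minimal sketch of threshold-based flagging. The column names `cpu_pct` and `mem_pct` appear in the training log, but the threshold values, the `flag_servers` helper, and the sample readings are all assumptions for illustration; the dashboard's real anomaly rules are not shown in this notebook.

```python
# Hypothetical limits; adjust to whatever the dashboard actually uses.
THRESHOLDS = {"cpu_pct": 90.0, "mem_pct": 90.0}


def flag_servers(latest_metrics, thresholds=None):
    """Return {server_name: [metrics over their limit]} for servers breaching any limit."""
    limits = THRESHOLDS if thresholds is None else thresholds
    flagged = {}
    for server, metrics in latest_metrics.items():
        breaches = [name for name, limit in limits.items()
                    if metrics.get(name, 0.0) > limit]
        if breaches:
            flagged[server] = breaches
    return flagged


# Invented sample readings for two servers.
sample = {
    "server-01": {"cpu_pct": 42.0, "mem_pct": 55.0},
    "server-02": {"cpu_pct": 97.5, "mem_pct": 91.2},
}
print(flag_servers(sample))
```

Replaying the training dataset through a check like this is one way to see how often a given threshold would fire before using it for alerting.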

In [None]:
# Cell 9: Run Production Dashboard with Training Data
# Expected time: Runs continuously (user-controlled)

import time
_cell_start = time.time()

from tft_dashboard_refactored import run_dashboard

print("📊 STEP 8: Production Dashboard")
print("-" * 60)

# Configuration
DASHBOARD_TICK_INTERVAL_SEC = 5   # Ingest data every 5 seconds
DASHBOARD_REFRESH_SEC = 30         # Refresh dashboard every 30 seconds

# Find latest training data
training_path = Path(CONFIG['training_dir'])
parquet_files = list(training_path.glob("*.parquet"))

if parquet_files:
    latest_data = max(parquet_files, key=lambda p: p.stat().st_mtime)
    
    print(f"📁 Data source: {latest_data.name}")
    print(f"📡 Tick interval: every {DASHBOARD_TICK_INTERVAL_SEC} seconds (data ingestion)")
    print(f"🔄 Refresh interval: every {DASHBOARD_REFRESH_SEC} seconds (visualization)")
    print("⚠️  Interrupt the kernel (or press Ctrl+C in a terminal) to stop")
    print()
    
    try:
        run_dashboard(
            data_path=str(latest_data),
            data_format='parquet',
            tick_interval_sec=DASHBOARD_TICK_INTERVAL_SEC,
            refresh_sec=DASHBOARD_REFRESH_SEC,
            save_plots=False
        )
        print("\n✅ Dashboard completed!")
    except KeyboardInterrupt:
        print("\n⏸️  Dashboard stopped by user")
    except Exception as e:
        print(f"\n❌ Dashboard error: {e}")
        import traceback
        traceback.print_exc()
else:
    print("❌ No training data found")
    print("   Run Cell 6 to generate production dataset first")

_cell_elapsed = time.time() - _cell_start
_mins = int(_cell_elapsed // 60)
_secs = int(_cell_elapsed % 60)
print(f"\n⏱️  Cell execution time: {_mins}m {_secs}s ({_cell_elapsed:.1f} seconds)")

---

## 🎯 QUICK REFERENCE

### Common Workflows

**Quick Demo (5 minutes):**
```python
# Run cells 1-5 for complete demo with dashboard
```

**Production Training:**
```python
# 1. Generate data (Cell 6)
# 2. Train model (Cell 7)
# 3. Inspect model (Cell 8)
```

**Monitoring:**
```python
# Demo dashboard: Cell 5
# Production dashboard: Cell 9
```

### Configuration Shortcuts

In [None]:
# Cell 10: Quick Configuration Reference
# Expected time: < 1 second

import time
_cell_start = time.time()

print("⚙️  CONFIGURATION QUICK REFERENCE")
print("=" * 60)

print("\n📊 Data Generation:")
print(f"   CONFIG['time_span_hours'] = {CONFIG['time_span_hours']}")
print(f"   CONFIG['servers_count'] = {CONFIG['servers_count']}")
print(f"   CONFIG['training_dir'] = '{CONFIG['training_dir']}'")

print("\n🤖 Model Architecture:")
print(f"   CONFIG['hidden_size'] = {CONFIG['hidden_size']}")
print(f"   CONFIG['attention_heads'] = {CONFIG['attention_heads']}")
print(f"   CONFIG['dropout'] = {CONFIG['dropout']}")

print("\n🏋️  Training:")
print(f"   CONFIG['epochs'] = {CONFIG['epochs']}")
print(f"   CONFIG['batch_size'] = {CONFIG['batch_size']}")
print(f"   CONFIG['learning_rate'] = {CONFIG['learning_rate']}")
print(f"   CONFIG['early_stopping_patience'] = {CONFIG['early_stopping_patience']}")

print("\n🔮 Prediction:")
print(f"   CONFIG['prediction_horizon'] = {CONFIG['prediction_horizon']} steps (8 hours)")
print(f"   CONFIG['context_length'] = {CONFIG['context_length']} steps (24 hours)")
print(f"   CONFIG['min_encoder_length'] = {CONFIG['min_encoder_length']}")

print("\n📁 Directories:")
print(f"   Training: {CONFIG['training_dir']}")
print(f"   Models: {CONFIG['models_dir']}")
print(f"   Checkpoints: {CONFIG['checkpoints_dir']}")
print(f"   Logs: {CONFIG['logs_dir']}")

print("\n💡 Tip: Edit config.py to change these defaults")

_cell_elapsed = time.time() - _cell_start
print(f"\n⏱️  Cell execution time: {_cell_elapsed:.2f} seconds")

In [None]:
# Cell 11: One-Command Demo Runner (Alternative)
# Expected time: ~5 minutes (continuous dashboard)
# This is equivalent to running run_demo.py from command line

import time
_cell_start = time.time()

from run_demo import run_demo

print("🚀 ONE-COMMAND DEMO")
print("=" * 60)
print("This cell combines data generation + dashboard in one step")
print()

# Run complete demo
run_demo(
    duration_minutes=5,
    fleet_size=10,
    seed=42,
    scenario='degrading',  # Options: 'healthy', 'degrading', 'critical'
    tick_interval_seconds=5,   # Ingest data every 5 seconds (real-time simulation)
    refresh_seconds=30,        # Refresh dashboard every 30 seconds
    output_dir="./demo_data/",
    regenerate=False  # Set to True to force new data
)

_cell_elapsed = time.time() - _cell_start
_mins = int(_cell_elapsed // 60)
_secs = int(_cell_elapsed % 60)
print(f"\n⏱️  Cell execution time: {_mins}m {_secs}s ({_cell_elapsed:.1f} seconds)")

In [None]:
# Cell 12: Final System Status
# Expected time: < 2 seconds

import time
_cell_start = time.time()

from main import status

print("🎉 PIPELINE COMPLETE - FINAL STATUS")
print("=" * 60)
status()

print("\n" + "=" * 60)
print("✅ TFT Monitoring System Ready!")
print("=" * 60)

print("\n📚 Next Steps:")
print("   • Run demo dashboard (Cell 5)")
print("   • Train production model (Cell 7)")
print("   • Monitor with live dashboard (Cell 9)")
print("   • Check model performance in TensorBoard")
print()
print("💡 Command Line:")
print("   python run_demo.py              # Quick demo")
print("   python main.py status           # Check status")
print("   python main.py generate --hours 720")
print("   python main.py train --epochs 30")
print()
print("📖 Documentation: See README.md and REPOMAP.md")

_cell_elapsed = time.time() - _cell_start
print(f"\n⏱️  Cell execution time: {_cell_elapsed:.2f} seconds")