# TFT MONITORING SYSTEM

## Temporal Fusion Transformer for Server Monitoring

### 🚀 STREAMLINED TFT WORKFLOW:
1. `setup()` - Initialize TFT environment
2. `generate_dataset()` - Generate realistic server metrics
3. `train()` - Train TFT model with PyTorch Forecasting
4. `test()` - Test multi-horizon predictions
5. `demo()` - Run live monitoring demo
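
The steps are meant to run in order, and the cells later in this notebook check each step's return value before moving on. A minimal sketch of chaining them, assuming every step is a zero-argument callable that returns a truthy value on success. `run_pipeline` is a hypothetical helper, not part of `main_notebook`, and the lambdas stand in for the real functions:

```python
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: Sequence[Step]) -> List[str]:
    """Run steps in order; stop at the first one that reports failure.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, step in steps:
        if not step():
            print(f"Step '{name}' failed; stopping.")
            break
        completed.append(name)
    return completed


# Stand-in callables; in the notebook these would be setup, generate_dataset, ...
demo_steps = [
    ("setup", lambda: True),
    ("generate_dataset", lambda: True),
    ("train", lambda: False),  # simulate a training failure
    ("test", lambda: True),
]
completed_steps = run_pipeline(demo_steps)
print(completed_steps)
```

Stopping at the first failure mirrors the `if ..._success:` checks in the cells: there is little point in testing a model that did not train.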

### 🎯 TFT FEATURES:
- **Multi-horizon forecasting**: Predict 6 steps ahead (30 minutes)
- **Attention mechanism**: Identify important features automatically
- **Uncertainty quantification**: Get confidence intervals with predictions
- **GPU acceleration**: Optimized for CUDA with mixed precision
- **Secure storage**: Models saved with Safetensors format
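
Confidence intervals from a TFT usually come from predicting several quantiles at once; PyTorch Forecasting's `TemporalFusionTransformer` defaults to a quantile loss for this. A minimal, library-free sketch of the pinball loss for a single quantile follows. `pinball_loss` and the sample numbers are illustrative assumptions, not code or data from this project:

```python
from typing import Sequence


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], quantile: float) -> float:
    """Mean pinball (quantile) loss for a single quantile in (0, 1)."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        error = actual - predicted
        # Under-prediction is weighted by q, over-prediction by (1 - q)
        total += max(quantile * error, (quantile - 1.0) * error)
    return total / len(y_true)


# Illustrative CPU utilisation values (percent), not real measurements
actual_cpu = [40.0, 50.0, 60.0]
p90_forecast = [45.0, 55.0, 58.0]
loss_p90 = pinball_loss(actual_cpu, p90_forecast, quantile=0.9)
print(round(loss_p90, 4))
```

For a high quantile such as 0.9, forecasts that fall below the actual value are penalised far more than forecasts above it, which pushes the prediction towards the upper edge of the band.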

### 📊 MONITORING:
- `status()` - Check system status
- `cleanup()` - Clean old files

**Architecture:** PyTorch 2.0.1 + PyTorch Lightning 2.0.2 + PyTorch Forecasting

**Model:** TemporalFusionTransformer with attention-based feature importance
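
Each training sample for a TFT pairs a window of history (the context) with the future steps to predict (the horizon). A minimal sketch of slicing one series that way, assuming the 6-step horizon listed above and a 24-step context (2 hours at 5-minute steps). `make_windows` is a hypothetical helper for illustration; in PyTorch Forecasting, `TimeSeriesDataSet` handles this windowing internally:

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], context_length: int, prediction_horizon: int
) -> List[Tuple[List[float], List[float]]]:
    """Split a series into (history, future) pairs with a stride of one step."""
    if context_length <= 0 or prediction_horizon <= 0:
        raise ValueError("window lengths must be positive")
    window = context_length + prediction_horizon
    pairs = []
    # range() is empty when the series is shorter than one full window
    for start in range(len(series) - window + 1):
        split = start + context_length
        history = list(series[start:split])
        future = list(series[split:start + window])
        pairs.append((history, future))
    return pairs


# 40 synthetic points standing in for one server's CPU series
synthetic_cpu = [float(i) for i in range(40)]
windows = make_windows(synthetic_cpu, context_length=24, prediction_horizon=6)
print(len(windows))
```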

#### Generate dataset
`python metrics_generator.py --hours 168 --output training/metrics_dataset.json`

#### Train model
`python tft_model_trainer.py --epochs 30 --batch-size 32`

#### Run inference
`python tft_inference.py --input-file test_data.json --output-file predictions.json`
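
The three scripts can also be chained from Python. A minimal sketch using `subprocess`, assuming the scripts sit in the current working directory and accept exactly the flags shown above. `build_commands` and `run_all` are hypothetical helpers, and the `dry_run` default keeps the example from launching anything:

```python
import subprocess
import sys
from typing import List


def build_commands(hours: int = 168, epochs: int = 30, batch_size: int = 32) -> List[List[str]]:
    """Return the argument lists for the generate, train, and inference steps."""
    python = sys.executable  # reuse the interpreter running this code
    return [
        [python, "metrics_generator.py", "--hours", str(hours),
         "--output", "training/metrics_dataset.json"],
        [python, "tft_model_trainer.py", "--epochs", str(epochs),
         "--batch-size", str(batch_size)],
        [python, "tft_inference.py", "--input-file", "test_data.json",
         "--output-file", "predictions.json"],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Run each command in order; check=True stops at the first non-zero exit."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


commands = build_commands()
run_all(commands)  # dry run: only prints the commands
```

Passing argument lists rather than a single shell string avoids quoting problems if a path ever contains spaces.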

## 🛠️ Troubleshooting & Advanced Usage

### Common Commands:
- `status()` - Complete system status
- `cleanup()` - Clean old checkpoints and logs
- `generate_dataset(hours=X, force_regenerate=True)` - Regenerate data
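
Regenerating with a larger `hours` value grows the dataset linearly. A back-of-the-envelope sketch of the row count, assuming the 5-minute polling interval and 57 servers that the dataset generation cell reports, and assuming one row per server per poll. `expected_rows` is a hypothetical helper:

```python
def expected_rows(hours: int, servers: int = 57, poll_minutes: int = 5) -> int:
    """Estimate rows as hours * polls-per-hour * servers (one row per poll assumed)."""
    if hours < 0 or servers < 0 or poll_minutes <= 0:
        raise ValueError("hours and servers must be non-negative, poll_minutes positive")
    # Integer division: intervals that do not divide 60 evenly are rounded down
    polls_per_hour = 60 // poll_minutes
    return hours * polls_per_hour * servers


week_rows = expected_rows(168)   # one week
month_rows = expected_rows(720)  # 30 days
print(week_rows, month_rows)
```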

### Configuration:
Modify `CONFIG` in `config.py` for advanced settings:
```python
CONFIG['epochs'] = 50              # More training
CONFIG['batch_size'] = 64          # Larger batches (if GPU allows)
CONFIG['prediction_horizon'] = 12  # Predict 1 hour ahead
CONFIG['context_length'] = 48      # Use 4 hours of history
```

In [None]:
# Import the streamlined TFT system
from main_notebook import *
from config import CONFIG, get_system_info

print("🎯 TFT Monitoring System")
print("📈 Temporal Fusion Transformer for Server Prediction")
print("⚡ Ready for GPU-accelerated time-series forecasting")

In [None]:
# 1. Setup TFT environment
print("🚀 Setting up TFT Monitoring System...")
print("This includes: environment validation, directories, GPU detection")

setup_success = setup()

if setup_success:
    print("\n✅ TFT setup complete!")
    
    # Show system information
    info = get_system_info()
    print("\n🖥️  System Configuration:")
    print(f"   Environment: {info['environment']}")
    print(f"   Framework: {info['framework']}")
    if info.get('gpu_name'):
        print(f"   GPU: {info['gpu_name']} ({info['gpu_memory_gb']}GB)")
    
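    print("\n🎯 TFT Model Configuration:")
    # Durations are derived from the 5-minute polling interval instead of being hard-coded
    print(f"   Context length: {CONFIG['context_length']} steps ({CONFIG['context_length'] * 5} minutes)")
    print(f"   Prediction horizon: {CONFIG['prediction_horizon']} steps ({CONFIG['prediction_horizon'] * 5} minutes)")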
    print(f"   Batch size: {CONFIG['batch_size']} (optimized for your hardware)")
    print(f"   Mixed precision: {CONFIG['mixed_precision']}")
    
    print("\nNext: generate_dataset() to create training data")
else:
    print("\n❌ Setup failed. Check PyTorch Forecasting installation.")

In [None]:
# 2. Check current system status
status()

In [None]:
# 3. Generate TFT training dataset
print("📊 GENERATING TFT TRAINING DATASET")
print("="*50)
print("🎯 Creating realistic server metrics with temporal patterns")
print("⏱️  Default: 168 hours (1 week) across 57 servers")
print("📈 Features: CPU, Memory, Disk, Load, Network, Java metrics")
print("🔄 Patterns: idle, healthy, spikes, critical, recovery states")
print()

# Generate 1 week of data (can be customized)
HOURS = 168  # 1 week - adjust as needed
print(f"Generating {HOURS} hours of data...")

generation_success = generate_dataset(hours=HOURS)

if generation_success:
    print("\n✅ Dataset generation completed!")
    print("📊 Dataset includes:")
    print("   - Realistic server behavioral patterns")
    print("   - Time-series with 5-minute polling intervals")
    print("   - Normal operations and anomaly conditions")
    print("   - Multiple server profiles (production, staging, etc.)")
    print("\nNext: train() to train the TFT model")
else:
    print("\n❌ Dataset generation failed")

In [None]:
# 4. Train TFT model
print("🏋️ TRAINING TFT MODEL")
print("="*40)
print("🤖 Model: Temporal Fusion Transformer")
print("⚡ Framework: PyTorch Forecasting")
print("🎯 Task: Multi-horizon time-series prediction")
print()

training_success = train()

if training_success:
    print("\n🎉 TFT training completed!")
    print("💡 Model capabilities:")
    print("   ✅ Multi-step ahead forecasting")
    print("   ✅ Attention-based feature selection")
    print("   ✅ Uncertainty quantification")
    print("   ✅ Anomaly detection")
    print("\nNext: test() to validate predictions")
else:
    print("\n❌ Training failed - check dataset and GPU memory")

In [None]:
# 5. Test TFT model predictions
print("🧪 TESTING TFT MODEL")
print("="*30)
print("Testing scenarios:")
print("• Normal operation trends")
print("• Gradual performance degradation")
print("• Spike pattern detection")
print()

test_success = test()

if test_success:
    print("\n✅ TFT model testing successful!")
    print("💡 Model demonstrates:")
    print("   - Accurate multi-horizon predictions")
    print("   - Uncertainty quantification")
    print("   - Automatic alert generation")
    print("   - Feature importance analysis")
    print("\nNext: demo() to run live monitoring")
else:
    print("\n❌ Testing failed - check model training")

In [None]:
# 6. Run TFT monitoring demo
print("🎭 TFT MONITORING DEMO")
print("="*25)
print("Features:")
print("• Real-time multi-horizon forecasting")
print("• Attention-based predictions")
print("• Automated alert generation")
print("• Uncertainty quantification")
print()

# Customize demo duration
DEMO_MINUTES = 3

print(f"Running {DEMO_MINUTES}-minute live demo...")
print("🔮 Will simulate real-time server monitoring with TFT predictions")
print()

try:
    demo_success = demo(minutes=DEMO_MINUTES)
    
    if demo_success:
        print("\n✅ TFT demo completed!")
        print("🎯 Demo showcased:")
        print("   - Multi-step ahead predictions")
        print("   - Real-time anomaly detection")
        print("   - Attention mechanism insights")
        print("   - Uncertainty-aware forecasting")
    else:
        print("\n❌ Demo encountered issues")

except KeyboardInterrupt:
    print("\n⏹️  Demo stopped by user")

In [None]:
# 7. Final system status and summary
print("📋 FINAL TFT SYSTEM STATUS")
print("="*40)

status()

print("\n🎉 TFT SYSTEM COMPLETE!")
print("="*30)
print("Your TFT monitoring system includes:")
print("  ✅ Temporal Fusion Transformer model")
print("  ✅ Multi-horizon forecasting (6 steps ahead)")
print("  ✅ Attention-based feature importance")
print("  ✅ GPU-accelerated training pipeline")
print("  ✅ Safetensors secure model storage")
print("  ✅ Real-time anomaly detection")
print("  ✅ Uncertainty quantification")
print()
print("🔧 NEXT STEPS:")
print("  • Connect to real MongoDB data sources")
print("  • Customize alert thresholds in config.py")
print("  • Set up continuous retraining pipeline")
print("  • Deploy for production monitoring")
print()
print("💡 COMMAND LINE USAGE:")
print("  python metrics_generator.py --hours 168")
print("  python tft_model_trainer.py --epochs 30")
print("  python tft_inference.py --input-file data.json")

In [None]:
# 8. Quick command reference
def show_commands():
    """Show available TFT system commands."""
    print("🔧 TFT SYSTEM COMMANDS")
    print("="*30)
    print()
    print("📊 Dataset Management:")
    print("generate_dataset(hours=168)     # Generate training data")
    print("generate_dataset(hours=720, force_regenerate=True)  # 30 days, force regen")
    print()
    print("🏋️ Model Training:")
    print("train()                         # Train TFT model")
    print("train(resume=True)              # Resume from checkpoint")
    print()
    print("🧪 Testing & Demo:")
    print("test()                          # Test model predictions")
    print("demo(minutes=5)                 # Live monitoring demo")
    print()
    print("🔍 System Management:")
    print("status()                        # System status")
    print("cleanup()                       # Clean old files")
    print("quick_start_guide()             # Full documentation")
    print()
    print("⚙️  Configuration:")
    print("CONFIG['epochs'] = 50           # Adjust training epochs")
    print("CONFIG['batch_size'] = 64       # Adjust batch size")
    print("CONFIG['prediction_horizon'] = 12  # Predict further ahead")

# Show commands
show_commands()