# Phase 10: Production Hardening & Operations

Preparing the system for reliable, continuous operation.

This notebook covers:
- **System monitoring**: health checks and metrics
- **Alerting**: drawdown and anomaly detection
- **Configuration management**: parameter updates
- **Performance reporting**: daily/weekly summaries

---

```bash
pip install "pandas>=2.2" numpy "matplotlib>=3.6"
```

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
from dataclasses import dataclass, field
from typing import List, Dict, Optional
import json

plt.style.use('seaborn-v0_8-darkgrid')
np.random.seed(42)

---
## 10.1 System Health Monitoring

Track key metrics to ensure the trading system is operating correctly.

In [None]:
@dataclass
class SystemMetrics:
    timestamp: datetime
    equity: float
    open_positions: int
    daily_pnl: float
    daily_trades: int
    latency_ms: float
    api_errors: int
    cpu_percent: float
    memory_mb: float


class HealthMonitor:
    def __init__(self):
        self.metrics_history: List[SystemMetrics] = []
        self.alerts: List[Dict] = []
        
        # Thresholds
        self.max_latency_ms = 500
        self.max_api_errors = 5
        self.max_cpu_percent = 80
        self.max_memory_mb = 4000
        self.max_daily_drawdown = 0.03  # 3%
    
    def record(self, metrics: SystemMetrics):
        self.metrics_history.append(metrics)
        self._check_alerts(metrics)
    
    def _check_alerts(self, m: SystemMetrics):
        if m.latency_ms > self.max_latency_ms:
            self._add_alert('WARNING', f'High latency: {m.latency_ms:.0f}ms', m.timestamp)
        
        if m.api_errors > self.max_api_errors:
            self._add_alert('ERROR', f'API errors: {m.api_errors}', m.timestamp)
        
        if m.cpu_percent > self.max_cpu_percent:
            self._add_alert('WARNING', f'High CPU: {m.cpu_percent:.0f}%', m.timestamp)
        
        if m.memory_mb > self.max_memory_mb:
            self._add_alert('WARNING', f'High memory: {m.memory_mb:.0f}MB', m.timestamp)
        
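        if len(self.metrics_history) > 1:
            # Compares against the previous sample only: with hourly samples this is
            # an hour-over-hour change, not a drawdown from the start of the day
            prev = self.metrics_history[-2]
            if prev.equity > 0:
                daily_dd = (m.equity - prev.equity) / prev.equity
                if daily_dd < -self.max_daily_drawdown:
                    self._add_alert('CRITICAL', f'Daily drawdown: {daily_dd:.2%}', m.timestamp)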
    
    def _add_alert(self, level: str, message: str, timestamp: datetime):
        alert = {'level': level, 'message': message, 'timestamp': timestamp}
        self.alerts.append(alert)
        print(f"[{level}] {timestamp}: {message}")
    
    def get_status(self) -> Dict:
        if not self.metrics_history:
            return {'status': 'NO_DATA'}
        
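        latest = self.metrics_history[-1]
        # Measure "recent" against the latest sample's timestamp rather than the wall
        # clock, so the one-hour window also works for simulated or replayed data
        cutoff = latest.timestamp - timedelta(hours=1)
        recent_alerts = [a for a in self.alerts if a['timestamp'] > cutoff]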
        
        status = 'HEALTHY'
        if any(a['level'] == 'CRITICAL' for a in recent_alerts):
            status = 'CRITICAL'
        elif any(a['level'] == 'ERROR' for a in recent_alerts):
            status = 'ERROR'
        elif any(a['level'] == 'WARNING' for a in recent_alerts):
            status = 'WARNING'
        
        return {
            'status': status,
            'equity': latest.equity,
            'open_positions': latest.open_positions,
            'daily_pnl': latest.daily_pnl,
            'latency_ms': latest.latency_ms,
            'recent_alerts': len(recent_alerts)
        }


# Simulate system operation
monitor = HealthMonitor()
base_time = datetime.now()
equity = 100_000

print("Simulating 24 hours of system operation...\n")

for hour in range(24):
    # Simulate some metrics with occasional issues
    pnl = np.random.normal(100, 500)
    equity += pnl
    
    latency = np.random.exponential(50)
    if hour == 14:  # Simulate a latency spike
        latency = 800
    
    api_errors = np.random.poisson(1)
    if hour == 18:  # Simulate API issues
        api_errors = 8
    
    metrics = SystemMetrics(
        timestamp=base_time + timedelta(hours=hour),
        equity=equity,
        open_positions=np.random.randint(0, 5),
        daily_pnl=pnl,
        daily_trades=np.random.randint(5, 20),
        latency_ms=latency,
        api_errors=api_errors,
        cpu_percent=np.random.uniform(20, 60),
        memory_mb=np.random.uniform(1000, 2000)
    )
    monitor.record(metrics)

print("\n=== Current Status ===")
status = monitor.get_status()
for k, v in status.items():
    print(f"  {k}: {v}")
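
The monitor above compares each sample with the one recorded just before it. A drawdown measured against the day's running peak may be closer to what a "daily drawdown" limit is meant to catch. The following is a minimal sketch under that assumption; `intraday_drawdown` is a hypothetical helper, not part of `HealthMonitor`, and it assumes the samples arrive in chronological order.

```python
from datetime import date, datetime, timedelta
from typing import Dict, Iterable, Tuple


def intraday_drawdown(samples: Iterable[Tuple[datetime, float]]) -> Dict[date, float]:
    """Return the worst peak-to-trough drawdown per calendar day (negative fraction)."""
    peak: Dict[date, float] = {}
    worst: Dict[date, float] = {}
    for ts, equity in samples:
        day = ts.date()
        # Highest equity seen so far on this day
        peak[day] = max(peak.get(day, equity), equity)
        dd = (equity - peak[day]) / peak[day] if peak[day] > 0 else 0.0
        # Keep the most negative drawdown observed on this day
        worst[day] = min(worst.get(day, 0.0), dd)
    return worst


# Hypothetical samples: equity peaks at 102,000 and falls to 98,000 on the same day
start = datetime(2024, 1, 2, 9, 0)
samples = [
    (start, 100_000.0),
    (start + timedelta(hours=1), 102_000.0),
    (start + timedelta(hours=2), 99_500.0),
    (start + timedelta(hours=3), 98_000.0),
]
result = intraday_drawdown(samples)
for day, dd in result.items():
    print(f"{day}: worst intraday drawdown {dd:.2%}")
```

Here no single hour-over-hour move exceeds 3%, yet the fall from the day's peak does, which is the case a sample-to-sample comparison would miss.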

---
## 10.2 Alerting System

Configurable alerts for key events.

In [None]:
@dataclass
class AlertRule:
    name: str
    condition: str  # Human-readable, e.g. "daily_dd > 0.03"; evaluate() only matches the metric name in it
    severity: str   # INFO, WARNING, CRITICAL
    cooldown_minutes: int = 60  # Don't repeat alert within this period
    last_triggered: Optional[datetime] = None


class AlertManager:
    def __init__(self):
        self.rules: List[AlertRule] = [
            AlertRule('Daily Drawdown', 'daily_dd > 0.03', 'WARNING', 120),
            AlertRule('Weekly Drawdown', 'weekly_dd > 0.10', 'CRITICAL', 240),
            AlertRule('Max Drawdown', 'max_dd > 0.15', 'CRITICAL', 480),
            AlertRule('Consecutive Losses', 'consec_losses >= 5', 'WARNING', 60),
            AlertRule('Win Rate Drop', 'win_rate < 0.35', 'WARNING', 120),
            AlertRule('High Exposure', 'exposure > 0.8', 'WARNING', 30),
            AlertRule('Connection Lost', 'connected == False', 'CRITICAL', 5),
        ]
        self.triggered_alerts: List[Dict] = []
    
    def evaluate(self, metrics: Dict, timestamp: datetime):
        for rule in self.rules:
            # Check cooldown
            if rule.last_triggered:
                if timestamp < rule.last_triggered + timedelta(minutes=rule.cooldown_minutes):
                    continue
            
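            # Evaluate condition (simplified): the condition string only selects the
            # metric; the thresholds are hardcoded here and must match the rules above
            triggered = False
            if 'daily_dd' in rule.condition and metrics.get('daily_dd', 0) > 0.03:
                triggered = True
            elif 'weekly_dd' in rule.condition and metrics.get('weekly_dd', 0) > 0.10:
                triggered = True
            elif 'max_dd' in rule.condition and metrics.get('max_dd', 0) > 0.15:
                triggered = True
            elif 'consec_losses' in rule.condition and metrics.get('consec_losses', 0) >= 5:
                triggered = True
            elif 'win_rate' in rule.condition and metrics.get('win_rate', 1) < 0.35:
                triggered = True
            elif 'exposure' in rule.condition and metrics.get('exposure', 0) > 0.8:
                triggered = True
            elif 'connected' in rule.condition and not metrics.get('connected', True):
                triggered = True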
            
            if triggered:
                rule.last_triggered = timestamp
                alert = {
                    'rule': rule.name,
                    'severity': rule.severity,
                    'timestamp': timestamp,
                    'metrics': metrics
                }
                self.triggered_alerts.append(alert)
                self._send_alert(alert)
    
    def _send_alert(self, alert: Dict):
        # In production: send email, Slack, SMS, etc.
        print(f"🚨 [{alert['severity']}] {alert['rule']} triggered at {alert['timestamp']}")


# Demo alert manager
alert_mgr = AlertManager()

# Simulate some concerning metrics
test_metrics = [
    {'daily_dd': 0.02, 'weekly_dd': 0.05, 'consec_losses': 2, 'win_rate': 0.45},
    {'daily_dd': 0.04, 'weekly_dd': 0.08, 'consec_losses': 5, 'win_rate': 0.40},
    {'daily_dd': 0.05, 'weekly_dd': 0.12, 'consec_losses': 7, 'win_rate': 0.30},
]

print("Evaluating metrics...\n")
for i, metrics in enumerate(test_metrics):
    timestamp = datetime.now() + timedelta(hours=i*3)
    print(f"Hour {i*3}: {metrics}")
    alert_mgr.evaluate(metrics, timestamp)
    print()
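
The `evaluate` method above hardcodes each threshold in an if/elif chain, so a rule's `condition` string and its actual check can drift apart. One alternative is to declare each rule as data and evaluate it with the standard `operator` module. This is a rough sketch, not the notebook's `AlertManager`: the `ThresholdRule` class and `OPS` table are hypothetical names, and cooldown handling is left out for brevity.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Map comparison symbols to functions so rules can be declared as data
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    '>': operator.gt,
    '>=': operator.ge,
    '<': operator.lt,
    '<=': operator.le,
    '==': operator.eq,
}


@dataclass
class ThresholdRule:  # hypothetical name, for illustration only
    name: str
    metric: str
    op: str
    threshold: Any
    severity: str

    def is_triggered(self, metrics: Dict[str, Any]) -> bool:
        # A metric missing from the snapshot never triggers the rule
        if self.metric not in metrics:
            return False
        return OPS[self.op](metrics[self.metric], self.threshold)


rules: List[ThresholdRule] = [
    ThresholdRule('Daily Drawdown', 'daily_dd', '>', 0.03, 'WARNING'),
    ThresholdRule('Win Rate Drop', 'win_rate', '<', 0.35, 'WARNING'),
    ThresholdRule('Connection Lost', 'connected', '==', False, 'CRITICAL'),
]

snapshot = {'daily_dd': 0.04, 'win_rate': 0.40, 'connected': False}
fired = [r.name for r in rules if r.is_triggered(snapshot)]
print(fired)
```

With this layout, adding a rule means adding one line of data, and the threshold lives in exactly one place.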

---
## 10.3 Configuration Management

Manage strategy parameters with versioning and validation.

In [None]:
class ConfigManager:
    def __init__(self):
        self.current_config = {}
        self.config_history: List[Dict] = []
        self.schema = {
            'risk_per_trade': {'type': float, 'min': 0.001, 'max': 0.05},
            'max_leverage': {'type': float, 'min': 1.0, 'max': 10.0},
            'max_positions': {'type': int, 'min': 1, 'max': 20},
            'stop_loss_atr_mult': {'type': float, 'min': 0.5, 'max': 5.0},
            'take_profit_atr_mult': {'type': float, 'min': 1.0, 'max': 10.0},
            'ma_fast_period': {'type': int, 'min': 5, 'max': 50},
            'ma_slow_period': {'type': int, 'min': 20, 'max': 200},
        }
    
    def validate(self, config: Dict) -> List[str]:
        errors = []
        for key, value in config.items():
            if key not in self.schema:
                errors.append(f"Unknown parameter: {key}")
                continue
            
            spec = self.schema[key]
            if not isinstance(value, spec['type']):
                errors.append(f"{key}: expected {spec['type'].__name__}, got {type(value).__name__}")
            elif value < spec['min'] or value > spec['max']:
                errors.append(f"{key}: {value} out of range [{spec['min']}, {spec['max']}]")
        
        # Cross-parameter validation
        if config.get('ma_fast_period', 0) >= config.get('ma_slow_period', float('inf')):
            errors.append("ma_fast_period must be less than ma_slow_period")
        
        return errors
    
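    def update(self, new_config: Dict, reason: str = "") -> bool:
        # Validate the merged result so cross-parameter checks also cover partial updates
        errors = self.validate({**self.current_config, **new_config})
        if errors:
            print("Config validation failed:")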
            for e in errors:
                print(f"  - {e}")
            return False
        
        # Save history
        self.config_history.append({
            'timestamp': datetime.now(),
            'config': self.current_config.copy(),
            'reason': reason
        })
        
        self.current_config.update(new_config)
        print(f"Config updated successfully. Reason: {reason}")
        return True
    
    def rollback(self, steps: int = 1) -> bool:
        if steps < 1 or steps > len(self.config_history):
            print("Cannot rollback: invalid step count or not enough history")
            return False
        
        old_config = self.config_history[-steps]['config']
        self.current_config = old_config.copy()
        # Drop the undone entries so repeated rollbacks keep stepping back
        del self.config_history[-steps:]
        print(f"Rolled back {steps} step(s)")
        return True


# Demo config management
config_mgr = ConfigManager()

# Initial config
initial = {
    'risk_per_trade': 0.01,
    'max_leverage': 3.0,
    'max_positions': 5,
    'stop_loss_atr_mult': 2.0,
    'take_profit_atr_mult': 4.0,
    'ma_fast_period': 20,
    'ma_slow_period': 50
}
config_mgr.update(initial, "Initial configuration")

print("\nCurrent config:")
for k, v in config_mgr.current_config.items():
    print(f"  {k}: {v}")

# Try invalid update
print("\nAttempting invalid update...")
config_mgr.update({'risk_per_trade': 0.10}, "Increase risk")  # Too high

# Valid update
print("\nAttempting valid update...")
config_mgr.update({'risk_per_trade': 0.015, 'max_leverage': 4.0}, "Increase risk slightly")
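
The history kept by `ConfigManager` lives in memory only, so it is lost when the process restarts. A possible way to persist each version is to write it out as JSON together with a checksum that is verified on load. The sketch below assumes the config holds only JSON-serializable values; `snapshot_config` and `load_snapshot` are hypothetical helpers, not methods of the class above.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict


def snapshot_config(config: Dict, reason: str) -> str:
    """Serialize one config version to JSON with a content checksum."""
    # sort_keys makes the serialization, and therefore the checksum, deterministic
    payload = json.dumps(config, sort_keys=True)
    record = {
        'saved_at': datetime.now(timezone.utc).isoformat(),
        'reason': reason,
        'checksum': hashlib.sha256(payload.encode('utf-8')).hexdigest(),
        'config': config,
    }
    return json.dumps(record, indent=2)


def load_snapshot(text: str) -> Dict:
    """Parse a snapshot and verify that its config matches the stored checksum."""
    record = json.loads(text)
    payload = json.dumps(record['config'], sort_keys=True)
    if hashlib.sha256(payload.encode('utf-8')).hexdigest() != record['checksum']:
        raise ValueError('Config snapshot failed checksum verification')
    return record['config']


text = snapshot_config({'risk_per_trade': 0.01, 'max_positions': 5}, 'Initial configuration')
restored = load_snapshot(text)
print(restored)
```

The checksum catches accidental edits or truncated writes; it is not a security measure, since anyone who can edit the file can also recompute the hash.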

---
## 10.4 Performance Reporting

Generate daily/weekly performance summaries.

In [None]:
def generate_performance_report(trades_df: pd.DataFrame, equity_curve: pd.Series) -> Dict:
    """Generate comprehensive performance report."""
    returns = equity_curve.pct_change().dropna()
    
    # Basic stats
    total_return = equity_curve.iloc[-1] / equity_curve.iloc[0] - 1
    ann_return = (1 + total_return) ** (252 / len(returns)) - 1
    ann_vol = returns.std() * np.sqrt(252)
    # Sharpe ratio with the risk-free rate assumed to be zero
    sharpe = ann_return / ann_vol if ann_vol > 0 else 0
    
    # Drawdown
    peak = equity_curve.expanding().max()
    drawdown = (equity_curve - peak) / peak
    max_dd = drawdown.min()
    
    # Trade stats
    n_trades = len(trades_df)
    wins = trades_df.loc[trades_df['pnl'] > 0, 'pnl']
    losses = trades_df.loc[trades_df['pnl'] <= 0, 'pnl']
    win_rate = len(wins) / n_trades if n_trades > 0 else 0
    avg_win = wins.mean() if len(wins) > 0 else 0
    avg_loss = losses.mean() if len(losses) > 0 else 0
    # Profit factor = gross profit / gross loss; guard on the loss total, not the loss count
    gross_loss = abs(losses.sum())
    profit_factor = wins.sum() / gross_loss if gross_loss > 0 else float('inf')
    
    return {
        'period': f"{equity_curve.index[0].date()} to {equity_curve.index[-1].date()}",
        'total_return': f"{total_return:.2%}",
        'annualized_return': f"{ann_return:.2%}",
        'annualized_volatility': f"{ann_vol:.2%}",
        'sharpe_ratio': f"{sharpe:.2f}",
        'max_drawdown': f"{max_dd:.2%}",
        'total_trades': n_trades,
        'win_rate': f"{win_rate:.1%}",
        'avg_winning_trade': f"${avg_win:,.2f}",
        'avg_losing_trade': f"${avg_loss:,.2f}",
        'profit_factor': f"{profit_factor:.2f}"
    }


# Generate sample data
np.random.seed(42)
n_days = 252
dates = pd.date_range('2024-01-01', periods=n_days, freq='B')

# Equity curve
daily_returns = np.random.normal(0.0005, 0.015, n_days)
equity = 100_000 * np.exp(np.cumsum(daily_returns))
equity_curve = pd.Series(equity, index=dates)

# Trade log
n_trades = 150
trades = pd.DataFrame({
    'date': np.random.choice(dates, n_trades),
    'symbol': np.random.choice(['AAPL', 'GOOGL', 'MSFT', 'BTC'], n_trades),
    'side': np.random.choice(['long', 'short'], n_trades),
    'pnl': np.random.normal(200, 800, n_trades)
})

report = generate_performance_report(trades, equity_curve)

print("=" * 50)
print("       PERFORMANCE REPORT")
print("=" * 50)
for k, v in report.items():
    print(f"  {k.replace('_', ' ').title():<25} {v:>20}")
print("=" * 50)

# Visualize
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Equity curve
axes[0, 0].plot(equity_curve.index, equity_curve, color='steelblue')
axes[0, 0].set_ylabel('Equity ($)')
axes[0, 0].set_title('Equity Curve')

# Drawdown
peak = equity_curve.expanding().max()
dd = (equity_curve - peak) / peak * 100
axes[0, 1].fill_between(dd.index, dd, 0, color='red', alpha=0.3)
axes[0, 1].set_ylabel('Drawdown (%)')
axes[0, 1].set_title('Drawdown')

# Monthly returns ('ME' is the month-end alias in pandas >= 2.2; older versions use 'M')
monthly = equity_curve.resample('ME').last().pct_change().dropna() * 100
colors = ['green' if x > 0 else 'red' for x in monthly]
axes[1, 0].bar(monthly.index, monthly, width=20, color=colors, alpha=0.7)
axes[1, 0].axhline(y=0, color='gray', linewidth=0.5)
axes[1, 0].set_ylabel('Return (%)')
axes[1, 0].set_title('Monthly Returns')

# Trade P&L distribution
axes[1, 1].hist(trades['pnl'], bins=30, color='steelblue', alpha=0.7, edgecolor='white')
axes[1, 1].axvline(x=0, color='red', linestyle='--')
axes[1, 1].axvline(x=trades['pnl'].mean(), color='green', linestyle='--', label=f"Mean: ${trades['pnl'].mean():.0f}")
axes[1, 1].set_xlabel('P&L ($)')
axes[1, 1].set_title('Trade P&L Distribution')
axes[1, 1].legend()

plt.tight_layout()
plt.show()
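
The report above covers the whole period at once. For the weekly summaries mentioned at the start of this section, one option is to resample the equity curve by week. This is a minimal sketch on hypothetical data; `weekly_summary` is an illustrative helper, and it measures each week from its first to its last observation, so the move from the previous week's close is not included.

```python
import numpy as np
import pandas as pd


def weekly_summary(equity_curve: pd.Series) -> pd.DataFrame:
    """Summarize an equity curve by calendar week (weeks ending on Sunday)."""
    weekly = equity_curve.resample('W')
    week_open = weekly.first()
    week_close = weekly.last()
    week_low = weekly.min()
    summary = pd.DataFrame({
        'start_equity': week_open,
        'end_equity': week_close,
        # Return within the week, from first to last observation
        'return_pct': (week_close / week_open - 1) * 100,
        # Lowest equity of the week relative to the week's first observation
        'worst_vs_start_pct': (week_low / week_open - 1) * 100,
    })
    # Weeks without any observations produce NaN rows; drop them
    return summary.dropna()


# Hypothetical data: 20 business days starting on a Monday
rng = np.random.default_rng(0)
dates = pd.date_range('2024-01-01', periods=20, freq='B')
equity_curve = pd.Series(
    100_000 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, len(dates)))),
    index=dates,
)
summary = weekly_summary(equity_curve)
print(summary.round(2))
```

The same function works for daily summaries of intraday data by swapping `'W'` for `'D'`.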

---
## 10.5 Comprehension Check

1. Your bot is live. What's the first thing you should monitor?
2. An alert triggers for 3% daily drawdown. Should you:
   - (a) Immediately stop trading
   - (b) Investigate the cause
   - (c) Reduce position sizes
   - (d) All of the above. If so, in what order?
3. Why is configuration versioning important? What happens if a parameter change causes losses?
4. Your Sharpe ratio drops from 1.5 to 0.8 over a month. What would you investigate?
5. Design a checklist for deploying a new strategy to production.

In [None]:
# YOUR ANSWERS HERE


---
## Congratulations!

You've completed all 10 phases of the Trading Bot Roadmap:

1. ✅ Core Infrastructure
2. ✅ Risk Management & Position Sizing
3. ✅ Indicator Engine & Basic Strategies
4. ✅ Advanced Strategy Implementation
5. ✅ Backtesting & Simulation
6. ✅ Pyramiding & Scaling
7. ✅ Quantitative & Adaptive Analysis
8. ✅ Derivatives & Hedging
9. ✅ HFT & Algorithmic Execution
10. ✅ Production Hardening & Operations

**Remember the key principles:**
- Risk management is more important than the strategy itself
- Start with simple strategies before adding complexity
- Always backtest thoroughly before deploying
- Monitor your system continuously
- Consistency beats maximum leverage

Good luck with your trading bot!