# ML Workflow Example

This notebook demonstrates how to integrate machine learning models into the trading system workflow.

## Overview

The ML workflow includes:
1. Training models on historical backtest data
2. Using models for signal prediction/enhancement
3. Integrating ML predictions into backtesting
4. Model versioning and management

## ML Integration Modes

- **Score Enhancement**: The ML prediction is added to the signal score as a weighted component (`ml_weight`)
- **Filter**: Signals below the confidence threshold (`confidence_threshold`) are filtered out
- **Replace**: The signal score is replaced with the ML prediction
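
The three modes can be sketched in a few lines of plain Python. This is only an illustration: `apply_ml_prediction` is a hypothetical helper, and the exact combination rules (an additive blend for score enhancement, comparing the raw prediction against the threshold for filtering) are assumptions, not the `trading_system` implementation.

```python
from typing import Optional


def apply_ml_prediction(
    signal_score: float,
    ml_prediction: float,
    mode: str = "score_enhancement",
    ml_weight: float = 0.3,
    confidence_threshold: float = 0.5,
) -> Optional[float]:
    """Return the adjusted signal score, or None if the signal is filtered out.

    Hypothetical helper: the rules below are assumptions made for illustration.
    """
    if mode == "score_enhancement":
        # Assumed additive blend: original score plus a weighted ML component.
        return signal_score + ml_weight * ml_prediction
    if mode == "filter":
        # Drop the signal when the prediction is below the threshold;
        # otherwise keep the original score unchanged.
        return signal_score if ml_prediction >= confidence_threshold else None
    if mode == "replace":
        # Ignore the original score entirely.
        return ml_prediction
    raise ValueError(f"Unknown prediction mode: {mode!r}")


for mode in ("score_enhancement", "filter", "replace"):
    print(mode, apply_ml_prediction(0.8, 0.4, mode=mode))
```

With a prediction of 0.4 and the default threshold of 0.5, the filter mode returns `None`, which a caller would treat as "skip this signal".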


In [None]:
import sys
from pathlib import Path
import pandas as pd
import numpy as np

# Add project root to path (assumes the working directory is three levels below it)
project_root = Path().resolve().parent.parent.parent
sys.path.insert(0, str(project_root))

from trading_system.integration.runner import BacktestRunner
from trading_system.configs.run_config import RunConfig
from trading_system.ml.training import MLTrainer
from trading_system.ml.predictor import MLPredictor
from trading_system.ml.models import MLModel
from trading_system.ml.feature_engineering import MLFeatureEngineer
from trading_system.ml.versioning import MLModelVersioning

print("Setup complete!")


## Step 1: Generate Training Data

First, run a backtest to generate training data (features and labels).


In [None]:
# Run backtest to generate training data
config_path = project_root / "tests" / "fixtures" / "configs" / "run_test_config.yaml"
config = RunConfig.from_yaml(str(config_path))

runner = BacktestRunner(config)
runner.initialize()

# Run on training period
print("Running backtest to generate training data...")
train_results = runner.run_backtest(period="train")

print("Backtest complete! Training data available.")
print(f"Results directory: {train_results.output_dir}")


## Step 2: Train ML Model

Extract features and labels, then train a model. Note: this simplified example trains on randomly generated dummy data, so the resulting model has no predictive value. In production, extract the actual features and labels from the backtest results.
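
As a rough idea of what real labels might look like, the sketch below derives R-multiples and win/loss flags from closed trades. An R-multiple is the trade's profit or loss divided by its initial risk. The `Trade` class and its field names are hypothetical, and only long trades are handled. The actual backtest output may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    """Hypothetical closed long trade; field names are illustrative only."""
    entry_price: float
    stop_price: float
    exit_price: float


def r_multiple(trade: Trade) -> float:
    """Profit or loss expressed in units of the initial risk (entry - stop)."""
    initial_risk = trade.entry_price - trade.stop_price
    if initial_risk <= 0:
        raise ValueError("For a long trade, the stop must be below the entry price")
    return (trade.exit_price - trade.entry_price) / initial_risk


trades = [
    Trade(entry_price=100.0, stop_price=95.0, exit_price=110.0),  # risked 5, gained 10
    Trade(entry_price=50.0, stop_price=48.0, exit_price=48.0),    # stopped out
]

y_regression = [r_multiple(t) for t in trades]          # target for "r_multiple"
y_classification = [int(r > 0) for r in y_regression]   # 1 = win, 0 = loss
print(y_regression, y_classification)
```

The regression labels would feed a trainer configured with `target_variable="r_multiple"`. The binary labels are an alternative if a win/loss classifier is preferred.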


In [None]:
# Initialize trainer
trainer = MLTrainer(
    model_type="random_forest",  # or "xgboost", "lightgbm"
    target_variable="r_multiple",  # Predict R-multiples
    random_seed=42
)

# Create dummy training data (in practice, extract from backtest results)
print("Creating training data...")
n_samples = 1000
n_features = 20

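rng = np.random.default_rng(42)  # seed the dummy data so reruns are reproducible
X_train = rng.standard_normal((n_samples, n_features))
y_train = rng.standard_normal(n_samples)  # stand-in for R-multiples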

# Train model
print("Training model...")
model = trainer.train(X_train, y_train)

print("Model trained successfully!")
print(f"  Model type: {type(model).__name__}")
print(f"  Training samples: {n_samples}")

# Save model
model_dir = Path("models/ml_model_v1")
model_dir.mkdir(parents=True, exist_ok=True)
model.save(str(model_dir))
print(f"Model saved to: {model_dir}")


## Step 3: Configure ML in Strategy

To use ML in backtests, add an `ml` section to your strategy config YAML:

```yaml
ml:
  enabled: true
  model_path: "models/ml_model_v1"
  prediction_mode: "score_enhancement"  # or "filter", "replace"
  ml_weight: 0.3  # For score_enhancement mode
  confidence_threshold: 0.5  # For filter mode
```
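
Before running a backtest, it can help to sanity-check the parsed `ml` section. The sketch below works on a plain dict that mirrors the YAML above. `validate_ml_config` is a hypothetical helper, and the `[0, 1]` bounds on `ml_weight` and `confidence_threshold` are assumptions rather than documented constraints.

```python
VALID_MODES = {"score_enhancement", "filter", "replace"}

# Which numeric key matters for which mode (per the comments in the YAML).
MODE_PARAMETER = {
    "score_enhancement": "ml_weight",
    "filter": "confidence_threshold",
}


def validate_ml_config(ml: dict) -> dict:
    """Check a parsed `ml:` section and return it unchanged if it looks sane.

    Hypothetical helper; the [0, 1] bounds are assumptions.
    """
    if not ml.get("enabled", False):
        return ml  # nothing else to check when ML is disabled
    if not ml.get("model_path"):
        raise ValueError("ml.model_path is required when ml.enabled is true")
    mode = ml.get("prediction_mode")
    if mode not in VALID_MODES:
        raise ValueError(
            f"ml.prediction_mode must be one of {sorted(VALID_MODES)}, got {mode!r}"
        )
    # Only the parameter used by the selected mode is range-checked.
    key = MODE_PARAMETER.get(mode)
    if key is not None:
        value = ml.get(key)
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            raise ValueError(f"ml.{key} must be a number in [0, 1], got {value!r}")
    return ml


ml_config = {
    "enabled": True,
    "model_path": "models/ml_model_v1",
    "prediction_mode": "score_enhancement",
    "ml_weight": 0.3,
    "confidence_threshold": 0.5,
}
validate_ml_config(ml_config)
print("ML config looks valid")
```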


In [None]:
# Example: Load and use model for predictions
model_dir = Path("models/ml_model_v1")

if model_dir.exists():
    # Load model
    model = MLModel.load(str(model_dir))
    print(f"Model loaded from: {model_dir}")
    
    # Create feature engineer (must match training setup)
    feature_engineer = MLFeatureEngineer(
        include_technical_indicators=True,
        include_price_features=True,
        include_volume_features=True,
        normalize=True
    )
    
    # Create predictor
    predictor = MLPredictor(
        model=model,
        feature_engineer=feature_engineer,
        prediction_mode="score_enhancement",
        confidence_threshold=0.5
    )
    
    print(f"Predictor created with mode: {predictor.prediction_mode}")
else:
    print("Model directory not found. Please run training cells first.")


## Summary

This notebook demonstrated:
1. Training an ML model (here on dummy data standing in for backtest features)
2. Saving and loading models
3. Creating predictors for signal enhancement
4. Configuring ML in strategy configs

**Next Steps:**
- Extract actual features from backtest results
- Create labels from trade outcomes (R-multiples, win/loss)
- Train on train period, validate on validation period
- Test on holdout period
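
The period split in the steps above could be done chronologically, so that the model never trains on data from after the period it is evaluated on. The following is a minimal sketch using only the standard library. `split_by_date`, the row layout `(date, features, label)`, and the cutoff dates are all hypothetical.

```python
from datetime import date


def split_by_date(rows, train_end, validation_end):
    """Split (date, features, label) rows into train / validation / holdout.

    Rows dated up to and including train_end go to train, rows after that up
    to and including validation_end go to validation, the rest to holdout.
    """
    train, validation, holdout = [], [], []
    for row in sorted(rows, key=lambda r: r[0]):  # oldest first
        if row[0] <= train_end:
            train.append(row)
        elif row[0] <= validation_end:
            validation.append(row)
        else:
            holdout.append(row)
    return train, validation, holdout


# Illustrative rows: (trade date, feature vector, R-multiple label)
rows = [
    (date(2022, 6, 1), [0.3], -1.0),
    (date(2020, 1, 15), [0.1], 2.0),
    (date(2021, 9, 30), [0.2], 0.5),
    (date(2023, 2, 10), [0.4], 1.2),
]

train, validation, holdout = split_by_date(
    rows,
    train_end=date(2020, 12, 31),
    validation_end=date(2021, 12, 31),
)
print(len(train), len(validation), len(holdout))
```

Splitting by date rather than shuffling avoids look-ahead leakage, which matters for trading data where neighboring samples are correlated in time.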

See `examples/ml_workflow.py` for a complete Python script example.
