# Neural Network Fitter for Alpha Strategy Optimization

This notebook documents the design and implementation of a neural network-based parameter optimization engine for trading strategies. The goal is to create a system that can automatically tune strategy parameters to optimize for PnL and Sharpe ratio.

# 1. Requirements Analysis and Input/Output Design

## Problem Statement
We need to create a neural network-based engine that can optimize the parameters of trading strategies by learning from historical data and backtest results. The initial use case is optimizing RSI strategy parameters, but the system should be flexible enough to handle other strategies.

## Input Requirements
1. Historical Data:
   - Multi-index DataFrame with (symbol, timestamp) index
   - OHLCV data for each symbol
   - Same format as required by existing AlphaEngine

2. Alpha Strategy:
   - The alpha strategy function (e.g., rsi_reversion)
   - Default parameters and their ranges
   - Parameter types and constraints

3. Initial Parameters:
   - For RSI strategy:
     - rsi_period: int [5-50]
     - overbought: float [50-90]
     - oversold: float [10-50]
     - smoothing: int [1-10]

4. Training Configuration:
   - Training/validation split
   - Optimization metrics weights (PnL vs Sharpe)
   - Training hyperparameters
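
The parameter ranges listed above could be captured as a small search-space specification. The sketch below is illustrative only: `ParamSpec`, `RSI_SPACE`, and `validate_params` are hypothetical names that do not exist in the current codebase, and the rule that `oversold` must stay below `overbought` is a standard RSI convention added here, not a stated requirement.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ParamSpec:
    """Bounds and type for one tunable parameter (hypothetical helper)."""
    low: float
    high: float
    is_int: bool = False

    def clamp(self, value: float) -> Any:
        # Clip into [low, high], then round if the parameter is an integer.
        clipped = min(max(value, self.low), self.high)
        return int(round(clipped)) if self.is_int else float(clipped)


# Ranges copied from the "Initial Parameters" list.
RSI_SPACE: Dict[str, ParamSpec] = {
    "rsi_period": ParamSpec(5, 50, is_int=True),
    "overbought": ParamSpec(50, 90),
    "oversold": ParamSpec(10, 50),
    "smoothing": ParamSpec(1, 10, is_int=True),
}


def validate_params(params: Dict[str, float]) -> Dict[str, Any]:
    """Clamp each value into its range and check that thresholds do not cross."""
    unknown = set(params) - set(RSI_SPACE)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    clean = {name: RSI_SPACE[name].clamp(value) for name, value in params.items()}
    # Assumed rule: oversold must be strictly below overbought.
    if "oversold" in clean and "overbought" in clean:
        if clean["oversold"] >= clean["overbought"]:
            raise ValueError("oversold must be strictly below overbought")
    return clean


candidate = {"rsi_period": 3.7, "overbought": 95.0, "oversold": 30.0, "smoothing": 4.4}
print(validate_params(candidate))
```

Clamping keeps out-of-range proposals usable instead of discarding them, which matters when the proposals come from an unconstrained network output.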

## Output Requirements
1. Optimized Parameters:
   - Dictionary of optimized parameter values
   - Parameter history during optimization
   - Confidence metrics for each parameter

2. Performance Metrics:
   - Training history
   - Backtest results with optimized parameters
   - Comparison with baseline parameters

## Key Challenges
1. Non-differentiable backtesting process
2. Complex relationship between parameters and performance
3. Need for parameter constraints and validation
4. Balancing multiple optimization objectives (PnL, Sharpe)
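
One way to handle the fourth challenge is to collapse PnL and Sharpe ratio into a single scalar score. The following is a minimal sketch under stated assumptions: it takes a plain sequence of per-period returns rather than a `SimulationResults` object, assumes 252 periods per year, and uses the hypothetical names `annualized_sharpe` and `combined_objective`.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def annualized_sharpe(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Mean over sample standard deviation of per-period returns, annualized."""
    if len(returns) < 2:
        return 0.0
    sigma = stdev(returns)
    if sigma == 0.0:
        return 0.0
    return mean(returns) / sigma * math.sqrt(periods_per_year)


def combined_objective(
    returns: Sequence[float],
    pnl_weight: float = 0.5,
    sharpe_weight: float = 0.5,
) -> float:
    """Weighted sum of total PnL and annualized Sharpe (higher is better)."""
    total_pnl = sum(returns)
    return pnl_weight * total_pnl + sharpe_weight * annualized_sharpe(returns)


baseline = [-0.01, 0.005, -0.02, 0.0]
tuned = [0.01, -0.005, 0.02, 0.0]
print(combined_objective(baseline), combined_objective(tuned))
```

PnL and Sharpe live on different scales, so the weights only express a meaningful trade-off once both terms are normalized, for example against the baseline parameters.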

# 2. Machine Learning Solution Design

## Approach Options

### Option 1: Direct Parameter Prediction
- Input: Market state features
- Output: Optimal parameters for next period
- Pros: Simple architecture, direct optimization
- Cons: May not capture temporal dependencies well

### Option 2: Reinforcement Learning
- State: Market conditions + current parameters
- Action: Parameter adjustments
- Reward: PnL and Sharpe ratio
- Pros: Natural fit for sequential decision making
- Cons: Complex training, stability issues

### Option 3: Hybrid Architecture (Recommended)
- Combines supervised learning with reinforcement learning
- Two-stage process:
  1. Parameter initialization network (supervised)
  2. Parameter refinement network (RL)
- Pros: Stable training, better exploration
- Cons: More complex implementation

## Input Feature Design

1. Market State Features:
   - Price momentum at multiple timeframes
   - Volatility metrics
   - Volume profiles
   - Market regime indicators

2. Parameter State Features:
   - Current parameter values
   - Parameter gradients from previous iterations
   - Performance metrics with current parameters

3. Performance Metrics:
   - Rolling PnL
   - Rolling Sharpe ratio
   - Rolling turnover
   - Maximum drawdown
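
A few of the features listed above can be computed directly from a price or equity series. This sketch uses only the standard library and plain lists for clarity; the function names and the 5- and 20-bar lookbacks are assumptions, and a real implementation would more likely operate on the (symbol, timestamp) DataFrame.

```python
import math
from statistics import stdev
from typing import Dict, Sequence


def momentum(closes: Sequence[float], lookback: int) -> float:
    """Simple return over the last `lookback` bars."""
    return closes[-1] / closes[-1 - lookback] - 1.0


def realized_volatility(closes: Sequence[float], lookback: int) -> float:
    """Sample standard deviation of log returns over the last `lookback` bars."""
    window = closes[-(lookback + 1):]
    log_returns = [math.log(b / a) for a, b in zip(window, window[1:])]
    return stdev(log_returns)


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def market_state_features(closes: Sequence[float], lookbacks=(5, 20)) -> Dict[str, float]:
    """Momentum and volatility at several timeframes."""
    features = {}
    for n in lookbacks:
        features[f"momentum_{n}"] = momentum(closes, n)
        features[f"volatility_{n}"] = realized_volatility(closes, n)
    return features


closes = [100.0 + 0.5 * i for i in range(30)]
print(market_state_features(closes))
print(max_drawdown([100.0, 120.0, 90.0, 110.0]))
```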

## Output Structure

1. Parameter Updates:
   - Direct values for each parameter
   - Scaling factors to apply to current parameters
   - Confidence scores for each prediction

2. Auxiliary Outputs:
   - Expected performance metrics
   - Parameter stability indicators
   - Market regime classification

## Training Strategy

1. Initial Phase:
   - Supervised pretraining on historical data
   - Learn parameter mappings for different market regimes

2. Fine-tuning Phase:
   - Online learning with reinforcement
   - Adaptive parameter updates based on performance

3. Validation:
   - Out-of-sample backtesting
   - Parameter stability analysis
   - Performance attribution

# 3. Neural Network Architecture Design

## Network Components

### 1. Feature Extraction Module
```python
class FeatureExtractor(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.conv1d = nn.Conv1d(input_dim, 64, kernel_size=3)
        self.lstm = nn.LSTM(64, 128, num_layers=2, dropout=0.1)
        self.attention = MultiHeadAttention(128, num_heads=4)
```

- Input dimension: 5 (OHLCV) + derived features
- Convolutional layers for local pattern detection
- LSTM layers for temporal dependencies
- Multi-head attention for market regime focus
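
The class above only declares layers. A forward pass has to reconcile the tensor layouts the three layers expect, which the sketch below illustrates. It is one possible wiring, not the final design: it assumes `(batch, seq_len, features)` inputs, adds `padding=1` so the convolution preserves sequence length, uses `nn.MultiheadAttention` directly instead of the custom wrapper, and mean-pools over time to get one vector per sequence.

```python
import torch
import torch.nn as nn


class FeatureExtractorSketch(nn.Module):
    """Conv1d -> LSTM -> self-attention -> mean pooling (illustrative wiring)."""

    def __init__(self, input_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.conv1d = nn.Conv1d(input_dim, 64, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(64, hidden_dim, num_layers=2, dropout=0.1, batch_first=True)
        self.attention = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim); Conv1d wants (batch, channels, seq_len).
        h = torch.relu(self.conv1d(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.lstm(h)             # (batch, seq_len, hidden_dim)
        h, _ = self.attention(h, h, h)  # self-attention over time steps
        return h.mean(dim=1)            # (batch, hidden_dim)


model = FeatureExtractorSketch(input_dim=5).eval()
with torch.no_grad():
    out = model(torch.randn(2, 30, 5))
print(out.shape)
```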

### 2. Parameter Prediction Module
```python
class ParameterPredictor(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(hidden_dim, 256)
        self.fc2 = nn.Linear(256, 128)
        self.param_heads = nn.ModuleDict({
            'rsi_period': nn.Linear(128, 1),
            'overbought': nn.Linear(128, 1),
            'oversold': nn.Linear(128, 1),
            'smoothing': nn.Linear(128, 1)
        })
```

- Separate prediction heads for each parameter
- Parameter-specific activation functions
- Range constraints via sigmoid/softplus
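
The sigmoid-based range constraint can be sketched as follows. `BoundedHeads` and `BOUNDS` are hypothetical names; the bounds are the RSI ranges from the requirements section. Integer parameters such as `rsi_period` come out as floats here, and rounding them is assumed to happen outside the computation graph, since rounding has zero gradient.

```python
import torch
import torch.nn as nn

# (low, high) per parameter, taken from the input requirements.
BOUNDS = {
    "rsi_period": (5.0, 50.0),
    "overbought": (50.0, 90.0),
    "oversold": (10.0, 50.0),
    "smoothing": (1.0, 10.0),
}


class BoundedHeads(nn.Module):
    """One linear head per parameter, squashed into its allowed range."""

    def __init__(self, in_features: int = 128):
        super().__init__()
        self.heads = nn.ModuleDict({name: nn.Linear(in_features, 1) for name in BOUNDS})

    def forward(self, h: torch.Tensor) -> dict:
        out = {}
        for name, head in self.heads.items():
            low, high = BOUNDS[name]
            # sigmoid maps to (0, 1); rescale to (low, high).
            out[name] = low + (high - low) * torch.sigmoid(head(h)).squeeze(-1)
        return out


heads = BoundedHeads()
params = heads(torch.randn(4, 128))
print({name: values.shape for name, values in params.items()})
```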

### 3. Performance Prediction Module
```python
class PerformancePredictor(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(hidden_dim, 128)
        self.fc2 = nn.Linear(128, 64)
        self.metrics = nn.Linear(64, 3)  # PnL, Sharpe, Turnover
```

- Predicts expected performance metrics
- Used for parameter validation
- Helps in early stopping

## Layer Sizes and Activation Functions

1. Feature Extraction:
   - Conv1D: in_channels=input_dim (5 for raw OHLCV, more with derived features), out_channels=64, kernel_size=3
   - LSTM: input_size=64, hidden_size=128, num_layers=2
   - Attention: embed_dim=128, num_heads=4

2. Parameter Prediction:
   - FC1: 128 → 256 (ReLU)
   - FC2: 256 → 128 (ReLU)
   - Parameter Heads: 128 → 1 (Custom activation)

3. Performance Prediction:
   - FC1: 128 → 128 (ReLU)
   - FC2: 128 → 64 (ReLU)
   - Metrics: 64 → 3 (Linear)

## Custom Components

### 1. Multi-Head Attention
```python
class MultiHeadAttention(nn.Module):
    def __init__(self, embed_dim, num_heads):
        super().__init__()
        self.mha = nn.MultiheadAttention(
            embed_dim, num_heads, dropout=0.1
        )
```

### 2. Parameter Constraints
```python
class ParameterConstraints(nn.Module):
    def __init__(self):
        super().__init__()
        self.constraints = {
            'rsi_period': (5, 50),
            'overbought': (50, 90),
            'oversold': (10, 50),
            'smoothing': (1, 10)
        }
```

### 3. Loss Components
```python
class HybridLoss(nn.Module):
    def __init__(self, pnl_weight=0.5, sharpe_weight=0.5):
        super().__init__()
        self.pnl_weight = pnl_weight
        self.sharpe_weight = sharpe_weight
```
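
A possible `forward` for such a loss is sketched below. It assumes the loss receives a differentiable tensor of per-period returns (for example from a surrogate model, since the backtest itself is not differentiable), annualizes Sharpe with 252 periods, and negates the score so that minimizing the loss maximizes performance. `HybridLossSketch` is a hypothetical name.

```python
import math

import torch
import torch.nn as nn


class HybridLossSketch(nn.Module):
    """Negative weighted sum of total PnL and annualized Sharpe ratio."""

    def __init__(self, pnl_weight: float = 0.5, sharpe_weight: float = 0.5,
                 periods_per_year: int = 252, eps: float = 1e-8):
        super().__init__()
        self.pnl_weight = pnl_weight
        self.sharpe_weight = sharpe_weight
        self.annualizer = math.sqrt(periods_per_year)
        self.eps = eps

    def forward(self, returns: torch.Tensor) -> torch.Tensor:
        # returns: (batch, n_periods) of per-period strategy returns.
        pnl = returns.sum(dim=1)
        sharpe = returns.mean(dim=1) / (returns.std(dim=1) + self.eps) * self.annualizer
        # Negate so that a lower loss means better PnL and Sharpe.
        return -(self.pnl_weight * pnl + self.sharpe_weight * sharpe).mean()


loss_fn = HybridLossSketch()
good = torch.tensor([[0.01, 0.02, 0.03, 0.02]])
bad = good - 0.01  # same volatility, lower mean return
print(loss_fn(good).item(), loss_fn(bad).item())
```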

## Integration Architecture
```python
class NeuralAlphaOptimizer(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.feature_extractor = FeatureExtractor(input_dim)
        self.param_predictor = ParameterPredictor(128)
        self.perf_predictor = PerformancePredictor(128)
        self.constraints = ParameterConstraints()
```

## Training Configuration

1. Optimizer:
   - Adam with learning rate scheduling
   - Initial lr: 0.001
   - Weight decay: 0.0001

2. Batch Processing:
   - Batch size: 32 sequences
   - Sequence length: 252 days
   - Sliding window with 126-day overlap

3. Loss Weights:
   - PnL component: 0.5
   - Sharpe ratio component: 0.5
   - Parameter stability regularization: 0.1
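
The batching scheme above implies a stride of 252 - 126 = 126 days between consecutive windows. A minimal sketch of the index arithmetic, with `sliding_windows` as a hypothetical helper name:

```python
from typing import List, Tuple


def sliding_windows(n_days: int, seq_len: int = 252, overlap: int = 126) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index pairs for overlapping windows."""
    if not 0 <= overlap < seq_len:
        raise ValueError("overlap must satisfy 0 <= overlap < seq_len")
    stride = seq_len - overlap
    # Only full-length windows are kept; a trailing partial window is dropped.
    return [(start, start + seq_len) for start in range(0, n_days - seq_len + 1, stride)]


print(sliding_windows(504))
```

Two years of daily data (504 rows) therefore yield three training sequences, and a history shorter than 252 days yields none.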

# 4. Engine Interface Design and Implementation

```python
from dataclasses import dataclass
from typing import Optional, Callable, Any, Dict, List
import torch
import torch.nn as nn
import pandas as pd
from src.engines.alpha_engine import AlphaSignals
from src.engines.backtesting_engine import BacktestingEngine, SimulationResults

@dataclass
class OptimizationResults:
    """Container for optimization results"""
    optimized_parameters: Dict[str, Any]
    training_history: pd.DataFrame
    backtest_results: SimulationResults
    parameter_history: List[Dict[str, Any]]
    confidence_scores: Dict[str, float]

class NeuralNetworkFitter:
    """Engine for optimizing alpha strategy parameters using neural networks"""

    def __init__(
        self,
        model: nn.Module,
        backtesting_engine: BacktestingEngine,
        device: str = "cuda" if torch.cuda.is_available() else "cpu"
    ):
        self.model = model.to(device)
        self.device = device
        self.backtesting_engine = backtesting_engine
        self.optimizer = torch.optim.Adam(model.parameters())
        self._validation_window = 126  # ~6 months

    def optimize_parameters(
        self,
        historical_data: pd.DataFrame,
        alpha_function: Callable[..., pd.DataFrame],
        initial_parameters: Dict[str, Any],
        n_iterations: int = 100,
        learning_rate: float = 0.001,
        pnl_weight: float = 0.5,
        sharpe_weight: float = 0.5,
        early_stopping_patience: int = 10,
        show_progress: bool = True
    ) -> OptimizationResults:
        """
        Optimize strategy parameters using neural network predictions.

        Args:
            historical_data: Multi-index DataFrame with market data
            alpha_function: The alpha strategy to optimize
            initial_parameters: Starting parameter values
            n_iterations: Number of optimization iterations
            learning_rate: Learning rate for parameter updates
            pnl_weight: Weight for PnL in optimization objective
            sharpe_weight: Weight for Sharpe ratio in optimization
            early_stopping_patience: Iterations before early stopping
            show_progress: Whether to show progress bar

        Returns:
            OptimizationResults with optimized parameters and metrics
        """
        pass

    def _prepare_features(
        self,
        historical_data: pd.DataFrame
    ) -> torch.Tensor:
        """Prepare input features for the neural network"""
        pass

    def _validate_parameters(
        self,
        parameters: Dict[str, Any]
    ) -> Dict[str, Any]:
        """Ensure parameters meet constraints"""
        pass

    def _calculate_objective(
        self,
        backtest_results: SimulationResults,
        pnl_weight: float,
        sharpe_weight: float
    ) -> torch.Tensor:
        """Calculate optimization objective from backtest results"""
        pass

    def save_model(
        self,
        filepath: str
    ) -> None:
        """Save trained model and optimization state"""
        pass

    @classmethod
    def load_model(
        cls,
        filepath: str,
        backtesting_engine: BacktestingEngine
    ) -> 'NeuralNetworkFitter':
        """Load trained model and optimization state"""
        pass
```

# 5. Implementation Plan and Next Steps

## Development Phases

### Phase 1: Core Infrastructure
1. Set up project structure:
   - Create `engines/neural_network_fitter.py`
   - Add tests in `tests/engines/test_neural_network_fitter.py`
   - Create utility modules for feature extraction

2. Implement base classes:
   - NeuralNetworkFitter engine
   - Custom PyTorch modules
   - Data preprocessing utilities

3. Create validation framework:
   - Parameter validation
   - Performance metrics tracking
   - Early stopping logic
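
The early stopping logic in the validation framework could look like the sketch below. `EarlyStopping` is a hypothetical helper; it assumes a higher validation score is better, and its `patience` argument corresponds to `early_stopping_patience` in the engine interface.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive non-improving scores."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.stale = 0

    def step(self, score: float) -> bool:
        """Record a validation score (higher is better); return True to stop."""
        if score > self.best + self.min_delta:
            self.best = score
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


stopper = EarlyStopping(patience=2)
decisions = [stopper.step(score) for score in [1.0, 1.2, 1.1, 1.15]]
print(decisions, stopper.best)
```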

### Phase 2: Model Development
1. Implement neural network components:
   - Feature extraction module
   - Parameter prediction module
   - Performance prediction module

2. Create training pipeline:
   - Data loading and batching
   - Training loop
   - Validation cycle

3. Add optimization logic:
   - Parameter constraints
   - Gradient calculations
   - Update strategies

### Phase 3: Integration and Testing
1. Integrate with existing engines:
   - AlphaEngine integration
   - BacktestingEngine integration
   - Logging and monitoring

2. Implement persistence:
   - Model saving/loading
   - Parameter history tracking
   - Performance logging

3. Create examples and documentation:
   - Usage examples
   - Parameter tuning guide
   - Performance optimization tips
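
For the persistence step, one common PyTorch pattern is to store the model and optimizer state dictionaries together with the parameter history in a single checkpoint. The sketch below is an assumption about how `save_model` and `load_model` might be implemented, with hypothetical function names and checkpoint keys; an in-memory buffer stands in for a file path.

```python
import io

import torch
import torch.nn as nn


def save_checkpoint(model, optimizer, parameter_history, target) -> None:
    """Write model weights, optimizer state, and parameter history to `target`."""
    torch.save(
        {
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
            "parameter_history": parameter_history,
        },
        target,
    )


def load_checkpoint(model, optimizer, source):
    """Restore state in place and return the stored parameter history."""
    checkpoint = torch.load(source, map_location="cpu")
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    return checkpoint["parameter_history"]


model = nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
buffer = io.BytesIO()  # stands in for a file on disk
save_checkpoint(model, optimizer, [{"rsi_period": 14}], buffer)

buffer.seek(0)
restored = nn.Linear(4, 1)
restored_optimizer = torch.optim.Adam(restored.parameters(), lr=1e-3, weight_decay=1e-4)
history = load_checkpoint(restored, restored_optimizer, buffer)
print(history)
```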

## Testing Strategy

1. Unit Tests:
   - Parameter validation
   - Feature extraction
   - Model components

2. Integration Tests:
   - End-to-end optimization
   - Backtest integration
   - Performance metrics

3. Performance Tests:
   - Training speed
   - Memory usage
   - Optimization convergence

## Documentation Requirements

1. Class Documentation:
   - Detailed docstrings
   - Usage examples
   - Parameter explanations

2. Architecture Documentation:
   - Component diagrams
   - Data flow descriptions
   - Integration points

3. User Guide:
   - Setup instructions
   - Optimization examples
   - Troubleshooting guide

## Timeline and Milestones

1. Week 1:
   - Core infrastructure setup
   - Basic model implementation
   - Initial testing framework

2. Week 2:
   - Neural network implementation
   - Training pipeline development
   - Integration with existing engines

3. Week 3:
   - Testing and validation
   - Documentation
   - Performance optimization

4. Week 4:
   - User acceptance testing
   - Performance tuning
   - Production deployment

# 6. Enhanced System Design

## Weights & Biases Integration

### W&B Configuration
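```python
from dataclasses import dataclass
from typing import Any, Dict, List

import wandb

@dataclass
class WandBConfig:
    project_name: str
    entity: str
    tags: List[str]
    config: Dict[str, Any]

    def setup(self) -> None:
        # Start a W&B run; subsequent wandb.log calls attach to it.
        wandb.init(
            project=self.project_name,
            entity=self.entity,
            tags=self.tags,
            config=self.config
        )
```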

### Tracked Metrics
1. Training Metrics:
   - Loss components (PnL, Sharpe, Turnover)
   - Parameter gradients
   - Learning rate changes
   
2. Validation Metrics:
   - Backtest performance
   - Parameter stability
   - Market regime detection

3. Model Artifacts:
   - Best performing models
   - Parameter evolution
   - Training configs
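
The training metrics above could be flattened into one dictionary per iteration before logging. This sketch only builds the payload and does not import `wandb`, so it runs without an account. The key names (`train/...`, `params/...`) and the helper `build_log_payload` are assumptions; the slash prefix is a naming convention that W&B typically uses to group charts into sections.

```python
from typing import Dict, Mapping


def build_log_payload(
    loss_components: Mapping[str, float],
    learning_rate: float,
    params: Mapping[str, float],
) -> Dict[str, float]:
    """Flatten per-iteration metrics into a single dict of floats."""
    payload = {f"train/loss_{name}": float(value) for name, value in loss_components.items()}
    payload["train/learning_rate"] = float(learning_rate)
    payload.update({f"params/{name}": float(value) for name, value in params.items()})
    return payload


payload = build_log_payload(
    {"pnl": -0.12, "sharpe": -0.8, "turnover": 0.05},
    learning_rate=1e-3,
    params={"rsi_period": 14, "overbought": 70.0},
)
# With an active run this would be sent as: wandb.log(payload, step=iteration)
print(payload)
```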

## Market Configuration System

### Market Config Interface
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List

@dataclass
class MarketConfig:
    universe: str  # USTOP100, USTOP200, USTOP500, USTOP1000
    timeframe: str  # minute, hour, day
    start_date: datetime
    end_date: datetime
    metrics: List[str]  # PnL, Sharpe, Turnover
    constraints: Dict[str, Any]
```
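
Because `universe` and `timeframe` are free-form strings, a typo would only surface much later. A validating variant is sketched below; `CheckedMarketConfig`, the allowed-value tuples, and the default values for `metrics` and `constraints` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List

ALLOWED_UNIVERSES = ("USTOP100", "USTOP200", "USTOP500", "USTOP1000")
ALLOWED_TIMEFRAMES = ("minute", "hour", "day")


@dataclass
class CheckedMarketConfig:
    """MarketConfig variant that rejects inconsistent settings (hypothetical)."""
    universe: str
    timeframe: str
    start_date: datetime
    end_date: datetime
    metrics: List[str] = field(default_factory=lambda: ["PnL", "Sharpe", "Turnover"])
    constraints: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.universe not in ALLOWED_UNIVERSES:
            raise ValueError(f"unknown universe: {self.universe!r}")
        if self.timeframe not in ALLOWED_TIMEFRAMES:
            raise ValueError(f"unknown timeframe: {self.timeframe!r}")
        if self.start_date >= self.end_date:
            raise ValueError("start_date must precede end_date")


config = CheckedMarketConfig("USTOP500", "day", datetime(2020, 1, 1), datetime(2023, 1, 1))
print(config.universe, config.metrics)
```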

### Universe-Specific Features
1. USTOP100:
   - Market cap weighted features
   - Sector correlation features
   - High liquidity metrics

2. USTOP500:
   - Cross-sector indicators
   - Market breadth features
   - Volume profile analysis

3. Timeframe Adaptations:
   - Minute: Microstructure features
   - Hour: Intraday patterns
   - Day: Overnight gaps, earnings impact

## Deployable Model Interface

### AlphaModelInterface Class
```python
class AlphaModelInterface:
    """Interface for deploying optimized alpha models"""
    
    def __init__(
        self,
        model_path: str,
        market_config: MarketConfig,
        update_frequency: str = "daily"
    ):
        self.model = self._load_model(model_path)
        self.market_config = market_config
        self.update_frequency = update_frequency
        self._last_update = None
        
    def get_optimal_parameters(
        self, 
        current_market_state: pd.DataFrame
    ) -> Dict[str, Any]:
        """Get optimal parameters for current market state"""
        
    def update_model(
        self, 
        new_market_data: pd.DataFrame
    ) -> None:
        """Update model with new market data"""
        
    def validate_performance(
        self
    ) -> Dict[str, float]:
        """Validate model performance"""
```

### Deployment Pipeline
1. Model Export:
   ```python
   class ModelExporter:
       def export_for_production(
           self,
           model: NeuralNetworkFitter,
           market_config: MarketConfig
       ) -> AlphaModelInterface:
           """Export model for production deployment"""
   ```

2. Real-time Integration:
   ```python
   class AlphaStrategyRunner:
       def __init__(
           self,
           alpha_function: Callable,
           model_interface: AlphaModelInterface
       ):
           self.alpha = alpha_function
           self.model = model_interface
           
       def generate_signals(
           self,
           market_data: pd.DataFrame
       ) -> pd.DataFrame:
           """Generate signals with optimized parameters"""
   ```
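
The intended data flow in `generate_signals` is presumably: ask the model interface for parameters, then call the alpha function with them. The sketch below shows that flow with stand-in objects. It assumes the alpha function takes market data as its first argument and strategy parameters as keyword arguments; `RunnerSketch`, `FixedParameters`, and `toy_alpha` are hypothetical, and plain lists replace DataFrames to keep the example dependency-free.

```python
from typing import Any, Callable, Dict, Protocol


class ParameterSource(Protocol):
    """Anything exposing get_optimal_parameters, e.g. AlphaModelInterface."""

    def get_optimal_parameters(self, current_market_state: Any) -> Dict[str, Any]:
        ...


class RunnerSketch:
    def __init__(self, alpha_function: Callable[..., Any], model_interface: ParameterSource):
        self.alpha = alpha_function
        self.model = model_interface

    def generate_signals(self, market_data: Any) -> Any:
        # 1. Query the model for parameters suited to the current market state.
        params = self.model.get_optimal_parameters(market_data)
        # 2. Run the alpha with those parameters (assumed keyword-argument API).
        return self.alpha(market_data, **params)


class FixedParameters:
    """Stand-in model interface that always returns the same parameters."""

    def get_optimal_parameters(self, current_market_state: Any) -> Dict[str, Any]:
        return {"threshold": 1.0}


def toy_alpha(prices, threshold):
    """Long above the threshold, short otherwise."""
    return [1 if price > threshold else -1 for price in prices]


signals = RunnerSketch(toy_alpha, FixedParameters()).generate_signals([0.5, 1.5, 2.0])
print(signals)
```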

## Neural Network Adaptations

### Market-Specific Architectures
1. Large Universe (USTOP1000):
   - Hierarchical attention networks
   - Sector-level feature aggregation
   - Sparse input processing

2. High-Frequency (Minute):
   - Temporal convolutional networks
   - Real-time feature processing
   - Adaptive parameter updates

3. Market Regime Components:
   - Regime detection subnet
   - Parameter adaptation layers
   - Cross-asset attention mechanism

### Enhanced Feature Processing
```python
class MarketAwareFeatureExtractor(nn.Module):
    def __init__(self, market_config: MarketConfig):
        super().__init__()
        self.market_config = market_config
        self.features = self._build_feature_layers()

    def _build_feature_layers(self) -> nn.ModuleDict:
        """Build market-specific feature layers"""
        universe = self.market_config.universe
        if universe == "USTOP100":
            return self._build_concentrated_market_layers()
        elif universe == "USTOP1000":
            return self._build_broad_market_layers()
        # USTOP200 and USTOP500 have no dedicated builder in this design yet;
        # fail loudly instead of silently returning None.
        raise NotImplementedError(
            f"No feature layers defined for universe {universe!r}"
        )
```