In [1]:
import random
import numpy as np
import torch
import tensorflow as tf
import os

# utility functions
from utils import *

# Models
from models.chromosome_model import create_ga_chromosome_metrics
from models.cnn_model import *
from models.lstm_model import *
from models.transformer_model import *
from models.mlp_model import *

# Scripts
from scripts.train_ga import *
from scripts.train_mlp import *
from scripts.train_cnn import *
from scripts.train_lstm import *
from scripts.train_transformer import *
from scripts.ensemble import *

# Set seeds
seed = 42
random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# Ensure deterministic behavior in PyTorch
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

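# PYTHONHASHSEED is read at interpreter start-up, so setting it here only affects
# Python processes launched afterwards (e.g. subprocess workers), not this kernel
os.environ['PYTHONHASHSEED'] = str(seed)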



# Ensemble Trading Strategy Pipeline

## Table of Contents

1. [Introduction](#Introduction)
2. [Data Preparation](#Data-Preparation)
3. [Genetic Algorithm Best Chromosome](#Genetic-Algorithm-Best-Chromosome)
4. [Model Loading and Predictions](#Model-Loading-and-Predictions)
   - 4.1 [MLP Model](#MLP-Model)
   - 4.2 [CNN Model](#CNN-Model)
   - 4.3 [LSTM Model](#LSTM-Model)
   - 4.4 [Transformer Model](#Transformer-Model)
5. [Ensemble Voting Strategies](#Ensemble-Voting-Strategies)
   - 5.1 [Majority Voting](#Majority-Voting)
   - 5.2 [Probabilistic Voting](#Probabilistic-Voting)
6. [Backtesting and Evaluation](#Backtesting-and-Evaluation)
   - 6.1 [Performance Metrics](#Performance-Metrics)
   - 6.2 [Advanced Trading Metrics](#Advanced-Trading-Metrics)
7. [Visualization](#Visualization)
   - 7.1 [Equity Curve](#Equity-Curve)
   - 7.2 [Prediction Signals](#Prediction-Signals)
   - 7.3 [Confusion Matrix](#Confusion-Matrix)
8. [Conclusions and Next Steps](#Conclusions-and-Next-Steps)


## Introduction

This notebook demonstrates a comprehensive ensemble trading strategy pipeline integrating:

- Genetic Algorithm optimization for technical indicator parameters
- Machine learning models including MLP, CNN, LSTM, and Transformer architectures
- Ensemble voting methods, both majority and probabilistic
- Advanced backtesting and evaluation metrics for trading performance

The goal of this pipeline is to combine multiple predictive models and optimized technical signals to improve trading decision robustness, profitability, and risk-adjusted returns. The models are trained and tuned on historical financial data, and their outputs are evaluated through systematic backtesting with both static and dynamic position sizing strategies.

#### Key Features

- Genetic Algorithm for RSI parameter tuning  
- MLP, CNN, LSTM, and Transformer models with hyperparameter tuning  
- Majority voting and probabilistic ensemble integration  
- Backtesting with advanced trading performance metrics (Sharpe, Sortino, Calmar, Omega, Gain to Pain, etc.)  
- Modular, reproducible pipeline for systematic trading research

---



## Data Preparation

In this section, we load historical financial data for the selected ticker, apply technical indicators, and prepare the dataset for model input.

#### Steps:

1. Download historical price data using the Yahoo Finance API wrapper.
2. Calculate technical indicators:
   - Relative Strength Index (RSI) over multiple intervals
   - Simple Moving Averages (SMA) for trend identification
3. Define trend labels based on SMA crossovers to classify uptrends and downtrends.
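Steps 2 and 3 can be sketched with pandas as below. This is a hypothetical stand-in for `add_technical_indicators`, whose internals are not shown here. The RSI uses Wilder's smoothing, and the look-back range (5 to 20), the 50/200-day SMA pair, and the `Close` column name are all assumed values.

```python
import pandas as pd


def wilder_rsi(close: pd.Series, period: int) -> pd.Series:
    """RSI with Wilder smoothing (exponential average with alpha = 1 / period)."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)        # upward moves, 0 otherwise
    loss = (-delta).clip(lower=0.0)     # downward moves as positive numbers
    avg_gain = gain.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss            # avg_loss == 0 gives inf, i.e. RSI = 100
    return 100.0 - 100.0 / (1.0 + rs)


def add_indicators_sketch(df: pd.DataFrame, rsi_periods=range(5, 21),
                          fast: int = 50, slow: int = 200) -> pd.DataFrame:
    """Hypothetical helper: RSI over several look-backs plus an SMA-crossover trend label."""
    out = df.copy()
    for p in rsi_periods:
        out[f"RSI_{p}"] = wilder_rsi(out["Close"], p)
    out[f"SMA_{fast}"] = out["Close"].rolling(fast).mean()
    out[f"SMA_{slow}"] = out["Close"].rolling(slow).mean()
    # Trend label from the crossover: 1 = uptrend (fast SMA above slow SMA), 0 = downtrend
    out["Trend"] = (out[f"SMA_{fast}"] > out[f"SMA_{slow}"]).astype(int)
    # Drop warm-up rows where the slow SMA (or an RSI) is not yet defined
    return out.dropna()
```

Dropping the warm-up rows avoids labelling the first `slow - 1` days as downtrends merely because the slow SMA is still undefined.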

The prepared dataset is used to generate the features and labels required by each machine learning model in the pipeline. The ticker can be changed to any other symbol; this notebook uses Tesla (TSLA).

Other tickers you could try:
- ticker = "BTC-USD"  # Bitcoin
- ticker = "GLD"  # SPDR Gold Shares
- ticker = "XLF"  # Financial sector ETF
- ticker = "TSLA"  # Tesla, a high-volatility equity (the default here)


In [2]:
ticker = "TSLA"
start_date = "1997-01-01"  # TSLA only began trading on 2010-06-29, so the data effectively starts there
end_date = "2017-01-01"
df = download_stock_data(ticker, start_date, end_date)
df = add_technical_indicators(df, ticker)

[*********************100%***********************]  1 of 1 completed

Data saved: TSLA_technical_indicators.csv





## Genetic Algorithm Best Chromosome

In this section, we load the optimized RSI and interval parameters generated by the Genetic Algorithm (GA) tuning process.

### Purpose

The GA searches for the combination of RSI thresholds and interval lengths that maximize a predefined *fitness function* related to trading performance. These optimized parameters are critical for generating features and labels for model training and evaluation.

### Optimization Problem

The GA seeks to maximize the *fitness function* defined as:

$$
\text{Fitness} = \frac{R_a}{|\text{Max Drawdown}|}
$$

where:

- $ R_a $ is the *annualized return*, calculated as:

$$
R_a = \left( \frac{P_{end}}{P_{start}} \right)^{\frac{252}{N}} - 1
$$

Here:
  - $ P_{end} $ = portfolio value at end
  - $ P_{start} $ = portfolio value at start
  - $ N $ = number of trading days

- *Max Drawdown* is the largest observed loss from a peak to a trough before a new peak is reached, calculated as:

$$
\text{Max Drawdown} = \min_t \left( \frac{P_t - \max_{i \leq t} P_i}{\max_{i \leq t} P_i} \right)
$$
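The three quantities above can be sketched in NumPy as follows. This is a minimal illustration, assuming the portfolio curve is an array of daily values, and is not the notebook's own implementation. $N$ is taken as the number of daily steps, and the zero-drawdown guard is an added assumption. This fitness ratio is also known as the Calmar ratio.

```python
import numpy as np

TRADING_DAYS_PER_YEAR = 252


def annualized_return(portfolio) -> float:
    """R_a = (P_end / P_start) ** (252 / N) - 1."""
    p = np.asarray(portfolio, dtype=float)
    n_days = len(p) - 1  # N taken as the number of daily steps between first and last value
    return float((p[-1] / p[0]) ** (TRADING_DAYS_PER_YEAR / n_days) - 1.0)


def max_drawdown(portfolio) -> float:
    """Most negative (P_t - running peak) / running peak; 0.0 if the curve never falls."""
    p = np.asarray(portfolio, dtype=float)
    running_peak = np.maximum.accumulate(p)  # max_{i <= t} P_i
    return float(np.min((p - running_peak) / running_peak))


def calmar_fitness(portfolio) -> float:
    """Fitness = R_a / |Max Drawdown|."""
    mdd = max_drawdown(portfolio)
    if mdd == 0.0:
        # Assumption: a curve that never falls is unbeatable if it gained anything
        return float("inf") if annualized_return(portfolio) > 0 else 0.0
    return annualized_return(portfolio) / abs(mdd)
```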

---

### Chromosome Representation

Each *chromosome* in the GA population encodes a set of RSI-based trading parameters:

- Downtrend buy value and interval
- Downtrend sell value and interval
- Uptrend buy value and interval
- Uptrend sell value and interval
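Below is one way to hold these eight genes and turn them into a daily signal. The field names follow the GA results table printed below. The decision rule itself is an assumption about how the thresholds are applied: buy when the RSI over the buy interval drops below the buy threshold, sell when the RSI over the sell interval rises above the sell threshold, checked in that order.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Chromosome:
    """Eight genes: an RSI threshold and look-back interval per (trend, action) pair."""
    down_buy_val: float
    down_buy_int: int
    down_sell_val: float
    down_sell_int: int
    up_buy_val: float
    up_buy_int: int
    up_sell_val: float
    up_sell_int: int

    def signal(self, rsi_by_interval: Mapping[int, float], uptrend: bool) -> str:
        """Return 'Buy', 'Sell' or 'Hold' for one day (hypothetical rule).

        rsi_by_interval maps an RSI look-back length to that day's RSI value.
        """
        if uptrend:
            buy_val, buy_int = self.up_buy_val, self.up_buy_int
            sell_val, sell_int = self.up_sell_val, self.up_sell_int
        else:
            buy_val, buy_int = self.down_buy_val, self.down_buy_int
            sell_val, sell_int = self.down_sell_val, self.down_sell_int
        if rsi_by_interval[buy_int] < buy_val:     # oversold on the buy look-back
            return "Buy"
        if rsi_by_interval[sell_int] > sell_val:   # overbought on the sell look-back
            return "Sell"
        return "Hold"
```

For example, `Chromosome(27.379938, 5, 85.954267, 12, 12.812376, 8, 83.684482, 7)` encodes the first row of the results table.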

### Steps

1. *Load GA results* from the saved CSV file containing the final tuned chromosomes.
2. *Extract the best chromosome* based on the highest fitness score.
3. *Prepare these optimized parameters* for feature generation and labeling in the machine learning pipeline.

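These parameters mean the models are trained on the RSI thresholds and intervals that performed best on the historical (in-sample) data under the chosen fitness function, which is intended to make the resulting trading signals more relevant.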
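Steps 1 and 2 amount to an arg-max over the results table. The pandas sketch below assumes `results` is the DataFrame read from the saved GA results CSV (its file name is not shown in this notebook) and uses the column names printed below. `best_chromosome` is a hypothetical helper.

```python
import pandas as pd

GENE_COLUMNS = [
    "down_buy_val", "down_buy_int", "down_sell_val", "down_sell_int",
    "up_buy_val", "up_buy_int", "up_sell_val", "up_sell_int",
]


def best_chromosome(results: pd.DataFrame, metric: str = "fitness") -> dict:
    """Return the genes of the row with the highest value of `metric`."""
    best_row = results.loc[results[metric].idxmax()]
    genes = {}
    for col in GENE_COLUMNS:
        value = best_row[col]
        # Interval genes are look-back window lengths, so cast them back to int
        genes[col] = int(value) if col.endswith("_int") else float(value)
    return genes
```

Passing `metric="sharpe_ratio"` ranks the chromosomes by Sharpe ratio instead, which is similar in spirit to the `override_selection_metric` argument used later.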


In [3]:
create_ga_chromosome_metrics(ticker)


=== Genetic Algorithm Chromosome Performance Comparison ===
 down_buy_val  down_buy_int  down_sell_val  down_sell_int  up_buy_val  up_buy_int  up_sell_val  up_sell_int  total_return  annualized_return  sharpe_ratio  sortino_ratio  max_drawdown  fitness
    27.379938             5      85.954267             12   12.812376           8    83.684482            7      1.127715           0.123018      0.554271       0.449230     -0.425369 0.289203
    25.667238             6      61.042903             11   13.143131           5    79.643577           18      3.381593           0.254851      0.935329       0.826542     -0.406097 0.627563
    12.715422            13      88.330066              5   31.558258          10    84.434879           15      3.359156           0.253862      0.752369       0.937224     -0.458625 0.553528
    14.725497            11      93.502458             15    8.577360          17    63.385073           16      1.415102           0.145094      0.882258       0.5757

### Genetic Algorithm Chromosome Performance Interpretation

The table below summarizes the performance of multiple chromosomes, each representing a set of RSI thresholds and intervals optimized by the Genetic Algorithm (GA).

| Metric | Explanation |
|--------|-------------|
| down_buy_val / down_buy_int | RSI buy threshold and interval during downtrend |
| down_sell_val / down_sell_int | RSI sell threshold and interval during downtrend |
| up_buy_val / up_buy_int | RSI buy threshold and interval during uptrend |
| up_sell_val / up_sell_int | RSI sell threshold and interval during uptrend |
| total_return | Net return over the full period, $P_{end}/P_{start} - 1$ (e.g. 1.12 = +112%) |
| annualized_return | Annualized return scaled to a yearly basis |
| sharpe_ratio | Risk-adjusted return accounting for standard deviation |
| sortino_ratio | Risk-adjusted return penalizing only downside volatility |
| max_drawdown | Largest peak-to-trough decline observed |
| fitness | Defined as annualized return divided by absolute max drawdown |

#### Key insights

- Highest fitness:  
  - Chromosome index 4 (the fourth row above) with fitness = 0.8909, annualized return = 14.5%, and a much smaller max drawdown of -16.3%.  
  - This represents the best risk-adjusted performance under the defined fitness metric.

- Highest total return:  
  - Chromosome index 2 with total return = 3.38 (+338%, so the portfolio ends at about 4.38x its starting value), annualized return = 25.5%, and max drawdown = -40.6%.  
  - While it achieves the highest returns, its drawdown is much deeper, so its fitness (0.628) is lower than that of index 4.

#### Strategic takeaway

The optimal chromosome selection depends on trading objectives:

- If prioritizing maximum absolute returns, chromosome index 2 is preferable despite higher drawdown.
- If prioritizing risk-adjusted returns with controlled drawdown, chromosome index 4 is optimal under the GA fitness definition.

These optimized parameters will guide feature generation and labeling for all subsequent machine learning models in the pipeline.


## Alternative Fitness Functions for Genetic Algorithm Optimization

Below are alternative fitness functions, implemented in this code, that can be used to optimize trading strategy performance, each with its mathematical definition and interpretation.

---

### Sharpe Ratio

Measures *risk-adjusted return*, penalizing both upside and downside volatility.

$$
\text{Sharpe Ratio} = \frac{R_p - R_f}{\sigma_p}
$$

- $ R_p $: portfolio return  
- $ R_f $: risk-free rate  
- $ \sigma_p $: standard deviation of portfolio returns

### Sortino Ratio

Adjusts for *downside risk only*, ignoring upside volatility.

$$
\text{Sortino Ratio} = \frac{R_p - R_f}{\sigma_D}
$$

- $ \sigma_D $: standard deviation of negative returns (downside deviation)
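The sketch below computes both ratios from daily returns and annualizes them with a factor of $\sqrt{252}$. It assumes a constant annual risk-free rate spread evenly over 252 trading days. Following the definition above, $\sigma_D$ is the standard deviation of the negative excess returns. Some libraries instead use the root-mean-square shortfall below the target, which gives a different value.

```python
import numpy as np

PERIODS_PER_YEAR = 252


def sharpe_ratio(daily_returns, risk_free_annual: float = 0.0) -> float:
    """Annualized Sharpe: sqrt(252) * mean(excess) / std(excess)."""
    excess = np.asarray(daily_returns, dtype=float) - risk_free_annual / PERIODS_PER_YEAR
    return float(np.sqrt(PERIODS_PER_YEAR) * excess.mean() / excess.std(ddof=1))


def sortino_ratio(daily_returns, risk_free_annual: float = 0.0) -> float:
    """Annualized Sortino using the standard deviation of the negative excess returns."""
    excess = np.asarray(daily_returns, dtype=float) - risk_free_annual / PERIODS_PER_YEAR
    downside = excess[excess < 0]
    if downside.size < 2:
        return float("nan")  # too few losing days to estimate sigma_D
    return float(np.sqrt(PERIODS_PER_YEAR) * excess.mean() / downside.std(ddof=1))
```

Because only losing days enter the denominator, a strategy with large upside swings scores higher on Sortino than on Sharpe.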

---

### Strategic Note

Selecting a fitness function depends on your *trading objectives* (max return, max risk-adjusted return, drawdown minimization) and your *risk tolerance* for the strategy.

Consider testing multiple objectives to identify which yields the most robust out-of-sample performance.


In [4]:
genetic_algorithm(ticker, override_selection_metric='sharpe_ratio')

Generation 1: Best Fitness = 1.0541
Generation 2: Best Fitness = 1.0541
  No improvement. Stagnation count: 1/10
Generation 3: Best Fitness = 1.1009
Generation 4: Best Fitness = 1.1853
Generation 5: Best Fitness = 1.1853
  No improvement. Stagnation count: 1/10
Generation 6: Best Fitness = 1.1853
  No improvement. Stagnation count: 2/10
Generation 7: Best Fitness = 1.4514
Generation 8: Best Fitness = 1.4514
  No improvement. Stagnation count: 1/10
Generation 9: Best Fitness = 1.4514
  No improvement. Stagnation count: 2/10
Generation 10: Best Fitness = 1.4514
  No improvement. Stagnation count: 3/10
Generation 11: Best Fitness = 1.5112
Generation 12: Best Fitness = 1.5112
  No improvement. Stagnation count: 1/10
Generation 13: Best Fitness = 1.5112
  No improvement. Stagnation count: 2/10
Generation 14: Best Fitness = 1.5112
  No improvement. Stagnation count: 3/10
Generation 15: Best Fitness = 1.5112
  No improvement. Stagnation count: 4/10
Generation 16: Best Fitness = 1.5112
  No im
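The log reports the best fitness of each generation and a stagnation counter against a limit of 10. This corresponds to early stopping after too many consecutive generations without improvement. Below is a minimal, generic sketch of such a loop. Tournament selection, elitism, and the operator callbacks are assumptions, not a description of `genetic_algorithm`. The best fitness in the log never decreases, which is consistent with elitism.

```python
import random


def tournament(scored, k, rng):
    """Pick the fittest of k randomly sampled (fitness, individual) pairs."""
    return max(rng.sample(scored, k), key=lambda pair: pair[0])[1]


def run_ga(fitness_fn, random_individual, crossover, mutate,
           pop_size=50, max_generations=100, patience=10,
           n_elite=2, tournament_k=3, seed=42):
    """Maximize fitness_fn; stop after `patience` generations without a new best."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best_ind, best_fit = None, float("-inf")
    stagnation = 0
    for gen in range(1, max_generations + 1):
        scored = sorted(((fitness_fn(ind), ind) for ind in population),
                        key=lambda pair: pair[0], reverse=True)
        gen_fit, gen_best = scored[0]
        print(f"Generation {gen}: Best Fitness = {gen_fit:.4f}")
        if gen_fit > best_fit:
            best_fit, best_ind, stagnation = gen_fit, gen_best, 0
        else:
            stagnation += 1
            print(f"  No improvement. Stagnation count: {stagnation}/{patience}")
            if stagnation >= patience:
                break
        # Elitism: the top n_elite individuals survive unchanged, so the best fitness never drops
        next_population = [ind for _, ind in scored[:n_elite]]
        while len(next_population) < pop_size:
            parent_a = tournament(scored, tournament_k, rng)
            parent_b = tournament(scored, tournament_k, rng)
            next_population.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = next_population
    return best_ind, best_fit
```

In this notebook's setting, an individual would be the eight-gene chromosome and `fitness_fn` would backtest it; both are left as callbacks here.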

## Model Loading and Predictions

In this section, we load each trained machine learning model and generate their predictions on the prepared labeled dataset.

### Purpose

The ensemble pipeline integrates predictions from multiple models to make trading signals more robust. Each model is trained on features and labels built from the GA-optimized technical indicator parameters. Its hyperparameters are then selected by a search over learning rate, batch size, and architecture settings; the tuning logs below list each combination tested.

### Models included:

1. *MLP (Multi-Layer Perceptron)*
2. *CNN (Convolutional Neural Network)*
3. *LSTM (Long Short-Term Memory Network)*
4. *Transformer Model*

### Steps:

1. *Load trained model weights* from saved `.pth` files.
2. *Prepare model input data* with correct feature structure.
3. *Generate predictions* using each model in evaluation mode.
4. *Store predictions for ensemble integration.*

The outputs from this section will be used to construct ensemble voting strategies in subsequent analyses.
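The four steps above could look roughly like this in PyTorch. The sketch assumes each `.pth` file holds a `state_dict` (saved with `torch.save(model.state_dict(), path)`) and that the models return raw logits. If the notebook's models already apply softmax, drop the `torch.softmax` call. `load_trained_model` and `predict_classes` are hypothetical helper names.

```python
import numpy as np
import torch
import torch.nn as nn


def load_trained_model(model: nn.Module, weights_path: str, device: str = "cpu") -> nn.Module:
    """Load a saved state_dict into an already-constructed architecture."""
    state_dict = torch.load(weights_path, map_location=device)
    model.load_state_dict(state_dict)
    model.to(device)
    model.eval()  # evaluation mode: disables dropout, uses running batch-norm statistics
    return model


@torch.no_grad()
def predict_classes(model: nn.Module, features: np.ndarray, device: str = "cpu"):
    """Return (class labels, class probabilities) for a batch of feature rows."""
    x = torch.as_tensor(features, dtype=torch.float32, device=device)
    probs = torch.softmax(model(x), dim=-1)  # assumes the model outputs raw logits
    return probs.argmax(dim=-1).cpu().numpy(), probs.cpu().numpy()
```

Returning the probabilities alongside the labels keeps both forms available: majority voting uses the labels, and probabilistic voting uses the probabilities.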

---

### MLP Model

In this subsection, we load the trained Multi-Layer Perceptron (MLP) model and generate its predictions on the prepared labeled dataset.

#### Model overview

The *MLP model* is a fully connected feedforward neural network trained to classify trading actions (Buy, Sell, Hold) based on RSI features and trend information.

#### Mathematical formulation

An MLP with $ L $ layers can be formulated as:


$$
\begin{aligned}
& h^{(0)} = x \\
& h^{(l)} = \sigma(W^{(l)} h^{(l-1)} + b^{(l)}), \quad l = 1, \ldots, L-1 \\
& \hat{y} = \text{softmax}(W^{(L)} h^{(L-1)} + b^{(L)})
\end{aligned}
$$


where:

- $ x $ is the input feature vector (e.g. RSI value, interval, trend).  
- $ W^{(l)} $ and $ b^{(l)} $ are the weights and biases of layer $ l $.  
- $ \sigma $ is the activation function (e.g. ReLU).  
- softmax output yields class probabilities for Buy, Sell, Hold.
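The formulation above translates directly into NumPy. The sketch uses the row-vector convention: a batch of inputs is stacked as rows, so $W h$ becomes `h @ W.T`. ReLU plays the role of $\sigma$. The layer sizes in the usage below are hypothetical: 3 input features (the CNN and LSTM summaries later in this section take 3 input channels) and hidden sizes `[30, 20, 10]`, one of the configurations in the tuning log.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mlp_forward(x, weights, biases, activation=relu):
    """x: (batch, d_in); weights[l]: (d_out, d_in) as in W^(l); biases[l]: (d_out,)."""
    h = x                                                # h^(0) = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = activation(h @ W.T + b)                      # h^(l) = sigma(W^(l) h^(l-1) + b^(l))
    return softmax(h @ weights[-1].T + biases[-1])       # y_hat = softmax(W^(L) h^(L-1) + b^(L))
```

For sizes `[3, 30, 20, 10, 3]`, `weights` holds matrices of shape (30, 3), (20, 30), (10, 20) and (3, 10), and each output row sums to 1.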

#### Steps:

1. Load model architecture with the tuned hidden layer configuration.
2. Load trained weights from the saved `.pth` file.
3. Prepare input tensor with required feature columns.
4. Generate predictions and extract class labels for ensemble integration.


In [5]:
run_tuning_mlp(ticker)
best_mlp(ticker)


Testing LR=0.001, Batch Size=64, Hidden Sizes=[20, 10, 8, 6, 5]
Epoch [1/100], Loss: 1.0544
Epoch [20/100], Loss: 0.2437
Epoch [40/100], Loss: 0.0776
Epoch [60/100], Loss: 0.0599
Epoch [80/100], Loss: 0.0452
Epoch [100/100], Loss: 0.0409

Testing LR=0.001, Batch Size=64, Hidden Sizes=[30, 20, 10]
Epoch [1/100], Loss: 1.4262
Epoch [20/100], Loss: 0.2561
Epoch [40/100], Loss: 0.1050
Epoch [60/100], Loss: 0.0790
Epoch [80/100], Loss: 0.0560
Epoch [100/100], Loss: 0.0433

Testing LR=0.001, Batch Size=64, Hidden Sizes=[40, 20]
Epoch [1/100], Loss: 1.1517
Epoch [20/100], Loss: 0.3399
Epoch [40/100], Loss: 0.1613
Epoch [60/100], Loss: 0.1113
Epoch [80/100], Loss: 0.0829
Epoch [100/100], Loss: 0.0706

Testing LR=0.001, Batch Size=128, Hidden Sizes=[20, 10, 8, 6, 5]
Epoch [1/100], Loss: 1.1170
Epoch [20/100], Loss: 0.4482
Epoch [40/100], Loss: 0.1892
Epoch [60/100], Loss: 0.0929
Epoch [80/100], Loss: 0.0650
Epoch [100/100], Loss: 0.0568

Testing LR=0.001, Batch Size=128, Hidden Sizes=[30, 20, 

### CNN Model

In this subsection, we load the trained Convolutional Neural Network (CNN) model and generate its predictions on the prepared labeled dataset.

#### Model overview

The CNN model captures local temporal patterns in RSI and trend features by applying convolutional filters over input sequences, learning hierarchical representations relevant for trading signal classification.

#### Mathematical formulation

For a 1D CNN layer:

$$
h^{(l)}_i = \sigma \left( \sum_{k=0}^{K-1} W_k^{(l)} x_{i+k} + b^{(l)} \right)
$$

where:

- $ x $ is the input sequence.  
- $ W_k^{(l)} $ are the convolutional kernel weights of size $ K $.  
- $ b^{(l)} $ is the bias term.  
- $ \sigma $ is the activation function (e.g. ReLU).  
- $ h^{(l)}_i $ is the output feature map at position $ i $ in layer $ l $.

The final output is passed through fully connected layers and a softmax function to produce class probabilities for Buy, Sell, Hold.
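The following is a direct NumPy transcription of the 1D convolution formula, extended with an input-channel dimension: the sum then runs over channels as well as kernel offsets $k$. It assumes stride 1 and no padding, which matches the `Conv1d(..., stride=(1,))` layers in the `CNNClassifier` summary printed below. Like PyTorch, it computes cross-correlation ($x_{i+k}$, no kernel flip).

```python
import numpy as np


def conv1d_layer(x, W, b, activation=lambda z: np.maximum(z, 0.0)):
    """x: (C_in, T); W: (C_out, C_in, K); b: (C_out,) -> (C_out, T - K + 1)."""
    c_out, _, k = W.shape
    t_out = x.shape[1] - k + 1
    out = np.empty((c_out, t_out))
    for i in range(t_out):
        window = x[:, i:i + k]  # x_i, ..., x_{i+K-1} for every input channel
        # sum_k W_k x_{i+k} (also summed over input channels) + b
        out[:, i] = np.tensordot(W, window, axes=([1, 2], [0, 1])) + b
    return activation(out)
```

With 3 input channels, a kernel of size 5 and 32 filters (the shape of `conv1` below), a 20-step window produces a (32, 16) feature map.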

#### Steps:

1. Load model architecture with tuned kernel size and hidden channels.  
2. Load trained weights from the saved `.pth` file.  
3. Prepare input tensors as rolling window sequences.  
4. Generate predictions and extract class labels for ensemble integration.


In [None]:
tune_cnn_hyperparameters(ticker)
best_cnn(ticker)

Testing Kernel=2, Hidden Channels=8, LR=0.001, Batch Size=32
Epoch [10/50], Loss: 0.7898
Epoch [20/50], Loss: 0.7296
Epoch [30/50], Loss: 0.7005
Epoch [40/50], Loss: 0.6960
Epoch [50/50], Loss: 0.6794
Testing Kernel=2, Hidden Channels=8, LR=0.001, Batch Size=64
Epoch [10/50], Loss: 0.7340
Epoch [20/50], Loss: 0.6976
Epoch [30/50], Loss: 0.6731
Epoch [40/50], Loss: 0.6537
Epoch [50/50], Loss: 0.6371
Testing Kernel=2, Hidden Channels=8, LR=0.0005, Batch Size=32
Epoch [10/50], Loss: 0.7685
Epoch [20/50], Loss: 0.7229
Epoch [30/50], Loss: 0.7031
Epoch [40/50], Loss: 0.6745
Epoch [50/50], Loss: 0.6743
Testing Kernel=2, Hidden Channels=8, LR=0.0005, Batch Size=64
Epoch [10/50], Loss: 0.7433
Epoch [20/50], Loss: 0.7073
Epoch [30/50], Loss: 0.6831
Epoch [40/50], Loss: 0.6731
Epoch [50/50], Loss: 0.6670
Testing Kernel=2, Hidden Channels=16, LR=0.001, Batch Size=32
Epoch [10/50], Loss: 0.7530
Epoch [20/50], Loss: 0.6992
Epoch [30/50], Loss: 0.6800
Epoch [40/50], Loss: 0.6531
Epoch [50/50], Loss:

CNNClassifier(
  (conv1): Conv1d(3, 32, kernel_size=(5,), stride=(1,))
  (conv2): Conv1d(32, 64, kernel_size=(5,), stride=(1,))
  (global_avg_pool): AdaptiveAvgPool1d(output_size=1)
  (fc1): Linear(in_features=64, out_features=32, bias=True)
  (fc2): Linear(in_features=32, out_features=3, bias=True)
)

### LSTM Model

In this subsection, we load the trained Long Short-Term Memory (LSTM) model and generate its predictions on the prepared labeled dataset.

#### Model overview

The LSTM model is a recurrent neural network architecture designed to capture sequential dependencies and temporal patterns in time series data, mitigating the vanishing gradient problem that affects traditional RNNs.

#### Mathematical formulation

An LSTM cell operates with the following equations at each time step $ t $:

$$
\begin{aligned}
& f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
& i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
& o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
& \tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
& c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
& h_t = o_t \odot \tanh(c_t)
\end{aligned}
$$

where:

- $ x_t $ is the input vector at time $ t $.  
- $ h_{t-1} $ is the previous hidden state.  
- $ c_t $ is the cell state.  
- $ f_t, i_t, o_t $ are the forget, input, and output gates, respectively.  
- $ \tilde{c}_t $ is the candidate cell state.  
- $ W $ and $ U $ are weight matrices, $ b $ are biases.  
- $ \sigma $ denotes the sigmoid activation function, and $ \odot $ denotes element-wise multiplication.
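Here is one LSTM time step, written line for line from the equations above in NumPy. The parameter shapes are assumptions: each $W$ is (hidden, input), each $U$ is (hidden, hidden) and each $b$ is (hidden,). `lstm_sequence` runs the cell over a window and returns the final hidden and cell states. A classifier head such as the `fc` layer in the `LSTMClassifier` summary below could then map the final hidden state to three class scores.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell(x_t, h_prev, c_prev, params):
    """params maps gate name 'f', 'i', 'o', 'c' to a (W, U, b) tuple."""
    def affine(gate):
        W, U, b = params[gate]
        return W @ x_t + U @ h_prev + b

    f_t = sigmoid(affine("f"))          # forget gate
    i_t = sigmoid(affine("i"))          # input gate
    o_t = sigmoid(affine("o"))          # output gate
    c_tilde = np.tanh(affine("c"))      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde  # element-wise products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


def lstm_sequence(xs, params, hidden_size):
    """Run the cell over xs (T, input) from zero initial states; return the final (h, c)."""
    h, c = np.zeros(hidden_size), np.zeros(hidden_size)
    for x_t in xs:
        h, c = lstm_cell(x_t, h, c, params)
    return h, c
```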

#### Steps:

1. Load model architecture with tuned hidden size and number of layers.  
2. Load trained weights from the saved `.pth` file.  
3. Prepare input tensors as sequential rolling windows.  
4. Generate predictions and extract class labels for ensemble integration.


In [None]:
tune_lstm_hyperparameters(ticker)
best_lstm(ticker)

Testing Hidden Size=16, Num Layers=1, LR=0.001, Batch Size=32
Epoch [10/50], Loss: 0.6334
Epoch [20/50], Loss: 0.5853
Epoch [30/50], Loss: 0.5620
Epoch [40/50], Loss: 0.5547
Epoch [50/50], Loss: 0.5505
Testing Hidden Size=16, Num Layers=1, LR=0.001, Batch Size=64
Epoch [10/50], Loss: 0.6731
Epoch [20/50], Loss: 0.5902
Epoch [30/50], Loss: 0.5662
Epoch [40/50], Loss: 0.5555
Epoch [50/50], Loss: 0.5479
Testing Hidden Size=16, Num Layers=1, LR=0.0005, Batch Size=32
Epoch [10/50], Loss: 0.7553
Epoch [20/50], Loss: 0.6273
Epoch [30/50], Loss: 0.5897
Epoch [40/50], Loss: 0.5718
Epoch [50/50], Loss: 0.5601
Testing Hidden Size=16, Num Layers=1, LR=0.0005, Batch Size=64
Epoch [10/50], Loss: 0.8111
Epoch [20/50], Loss: 0.6292
Epoch [30/50], Loss: 0.5901
Epoch [40/50], Loss: 0.5791
Epoch [50/50], Loss: 0.5753
Testing Hidden Size=16, Num Layers=2, LR=0.001, Batch Size=32
Epoch [10/50], Loss: 0.5952
Epoch [20/50], Loss: 0.5630
Epoch [30/50], Loss: 0.5511
Epoch [40/50], Loss: 0.5377
Epoch [50/50], L

LSTMClassifier(
  (lstm): LSTM(3, 64, num_layers=2, batch_first=True)
  (fc): Linear(in_features=64, out_features=3, bias=True)
)

### Transformer Model

In this subsection, we load the trained Transformer model and generate its predictions on the prepared labeled dataset.

#### Model overview

The Transformer model uses self-attention mechanisms to capture dependencies across the entire input sequence, enabling parallel computation and effective long-range pattern recognition in time series data.

#### Mathematical formulation

The Scaled Dot-Product Attention mechanism in the Transformer is defined as:

$$
\text{Attention}(Q, K, V) = \text{softmax} \left( \frac{QK^T}{\sqrt{d_k}} \right) V
$$

where:

- $ Q $ = Query matrix  
- $ K $ = Key matrix  
- $ V $ = Value matrix  
- $ d_k $ = dimension of the key vectors

---

The Multi-Head Attention combines multiple attention heads:

$$
\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \ldots, \text{head}_h) W^O
$$

where each head is computed as:

$$
\text{head}_i = \text{Attention}(Q W_i^Q, K W_i^K, V W_i^V)
$$
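Both equations can be written in NumPy for a single (unbatched) sequence. The per-head projection matrices are passed as lists, and $W^O$ has shape $(h \cdot d_v, d_{model})$. These shapes are assumptions chosen to mirror the formulas rather than the notebook's own model code.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V


def multi_head_attention(X_q, X_k, X_v, W_Q, W_K, W_V, W_O):
    """Concat(head_1, ..., head_h) W^O with head_i = Attention(X_q W_i^Q, X_k W_i^K, X_v W_i^V)."""
    heads = [scaled_dot_product_attention(X_q @ wq, X_k @ wk, X_v @ wv)
             for wq, wk, wv in zip(W_Q, W_K, W_V)]
    return np.concatenate(heads, axis=-1) @ W_O
```

With one head and identity projections, multi-head attention reduces to plain scaled dot-product attention. When every key is identical, the attention weights are uniform and the output is the mean of the value rows.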

---

#### Steps:

1. Load model architecture with tuned dimensions, number of heads, and layers.  
2. Load trained weights from the saved `.pth` file.  
3. Prepare input tensors as sequential feature windows.  
4. Generate predictions and extract class labels for ensemble integration.


In [None]:
tune_transformer(ticker)
best_transformer(ticker)


Training Transformer: dim_model=64, num_heads=4, num_layers=1, lr=0.001, batch_size=64
Epoch [1/30], Loss: 1.0001
Epoch [10/30], Loss: 0.6452
Epoch [20/30], Loss: 0.5743
Epoch [30/30], Loss: 0.6010

Training Transformer: dim_model=64, num_heads=4, num_layers=2, lr=0.001, batch_size=64
Epoch [1/30], Loss: 1.0637
Epoch [10/30], Loss: 0.7357
Epoch [20/30], Loss: 0.5832
Epoch [30/30], Loss: 0.5632

Training Transformer: dim_model=64, num_heads=8, num_layers=1, lr=0.001, batch_size=64
Epoch [1/30], Loss: 1.0220
Epoch [10/30], Loss: 0.7138
Epoch [20/30], Loss: 0.6098
Epoch [30/30], Loss: 0.5601

Training Transformer: dim_model=64, num_heads=8, num_layers=2, lr=0.001, batch_size=64
Epoch [1/30], Loss: 1.1155
Epoch [10/30], Loss: 0.7246
Epoch [20/30], Loss: 0.5988
Epoch [30/30], Loss: 0.6063

Training Transformer: dim_model=128, num_heads=4, num_layers=1, lr=0.001, batch_size=64
Epoch [1/30], Loss: 1.0381
Epoch [10/30], Loss: 0.7545
Epoch [20/30], Loss: 0.6222
Epoch [30/30], Loss: 0.6186

Tra

## Ensemble Model Comparison and Evaluation

In this section, we compare the performance of different *ensemble strategies* on the held-out, out-of-sample dataset. Rather than evaluating each model individually, we focus on ensemble approaches that combine the predictions of the MLP, CNN, LSTM, and Transformer models.

#### Ensemble Methods Evaluated

1. Weighted Majority Voting Ensemble

   Combines model predictions using fixed assigned weights to each model’s vote.

   Formula:

   $$
   \text{Final Decision} = \arg \max_{c} \sum_{m=1}^{M} w_m \cdot \mathbb{1}_{\{p_m = c\}}
   $$

   where:
   - $ w_m $ is the weight of model $ m $
   - $ p_m $ is its predicted class
   - $ \mathbb{1}_{\{p_m = c\}} $ is an indicator that equals 1 if model $ m $ predicts class $ c $ and 0 otherwise.

2. Probabilistic Voting Ensemble (Unconstrained)

   Averages the softmax probabilities across all models to derive the final prediction.

   Formula:

   $$
   \text{Final Decision} = \arg \max_{c} \left( \frac{1}{M} \sum_{m=1}^{M} P_m(c) \right)
   $$

   where $ P_m(c) $ is the predicted probability for class $ c $ from model $ m $.

3. Probabilistic Voting Ensemble (Constrained)

   Same as above, but executed with trading constraints such as:

   - Maximum position size
   - Stop-loss thresholds
   - Take-profit thresholds

   This reflects more realistic portfolio constraints.

4. Probabilistic Voting Ensemble with Kelly Criterion Sizing

   Uses averaged probabilistic predictions for signals, and dynamically sizes each trade based on the Kelly criterion calculated from historical win/loss ratios.

   Formula:

   $$
   f^* = \frac{bp - q}{b}
   $$

   where:
   - $ f^* $: optimal fraction of capital to bet
   - $ b $: odds received (average win / average loss)
   - $ p $: probability of winning
   - $ q = 1 - p $.
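The decision and sizing rules above can be sketched in NumPy. Class ids are assumed to be integers `0..C-1` (the Buy/Sell/Hold mapping used by the notebook is not shown here), and ties in the weighted vote go to the lowest class id. The Kelly fraction is estimated from past trade returns, with $p$ the share of winning trades and $b$ the average win divided by the absolute average loss. Clipping $f^*$ at 0 (no position when the estimated edge is negative) is an added assumption. In practice, traders often also scale $f^*$ down (fractional Kelly).

```python
import numpy as np


def weighted_majority_vote(predictions, weights, n_classes: int = 3):
    """predictions: (M, T) class ids from M models; weights: (M,). Returns (T,) class ids."""
    preds = np.asarray(predictions)
    w = np.asarray(weights, dtype=float)
    # scores[c, t] = sum_m w_m * 1{p_m[t] = c}
    scores = np.stack([w @ (preds == c).astype(float) for c in range(n_classes)])
    return scores.argmax(axis=0)  # ties go to the lowest class id


def probabilistic_vote(probabilities):
    """probabilities: (M, T, C) softmax outputs. Returns argmax_c of the mean P_m(c)."""
    return np.asarray(probabilities, dtype=float).mean(axis=0).argmax(axis=-1)


def kelly_fraction(trade_returns) -> float:
    """f* = (b p - q) / b estimated from historical trade returns, clipped to [0, 1]."""
    r = np.asarray(trade_returns, dtype=float)
    wins, losses = r[r > 0], r[r < 0]
    if wins.size == 0 or losses.size == 0:
        return 0.0  # assumption: not enough history to estimate both b and p
    p = wins.size / r.size
    q = 1.0 - p
    b = wins.mean() / abs(losses.mean())
    return float(np.clip((b * p - q) / b, 0.0, 1.0))
```

For example, trade returns `[0.1, 0.1, -0.05, 0.1, -0.05]` give $p = 0.6$ and $b = 2$, so $f^* = (2 \cdot 0.6 - 0.4)/2 = 0.4$.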

#### Evaluation Metrics

For each ensemble method, we report:

- Total Return
- Annualized Return
- Sharpe Ratio
- Sortino Ratio
- Maximum Drawdown
- Win Rate
- Profit Factor


In [25]:
%%capture
results_weighted_majority, portfolio_weighted_majority, trade_returns_weighted_majority, ensemble_predictions_weighted_majority, df_eval_weighted_majority = run_ensemble_backtest(ticker)
results_kelly, portfolio_kelly, trade_returns_kelly, ensemble_predictions_kelly, df_eval_kelly = run_probabilistic_ensemble_backtest_with_kelly(ticker)
results_prob_voting_uncon, portfolio_prob_voting_uncon, trade_returns_prob_voting_uncon, ensemble_predictions_prob_voting_uncon, df_eval_prob_voting_uncon = run_probabilistic_ensemble_backtest(ticker, position_size=1, stop_loss=1, take_profit=1)
results_prob_voting_con_min_pos, portfolio_prob_voting_con_min_pos, trade_returns_prob_voting_con_min_pos, ensemble_predictions_prob_voting_con_min_pos, df_eval_prob_voting_con_min_pos = run_probabilistic_ensemble_backtest(ticker, position_size=1)
results_prob_voting_tot_con, portfolio_prob_voting_tot_con, trade_returns_prob_voting_tot_con, ensemble_predictions_prob_voting_tot_con, df_eval_prob_voting_tot_con = run_probabilistic_ensemble_backtest(ticker)

In [28]:
metrics_weighted_majority = calculate_additional_metrics(portfolio_weighted_majority, trade_returns_weighted_majority)
metrics_kelly = calculate_additional_metrics(portfolio_kelly, trade_returns_kelly)
metrics_prob_voting_uncon = calculate_additional_metrics(portfolio_prob_voting_uncon, trade_returns_prob_voting_uncon)
metrics_prob_voting_con_min_pos = calculate_additional_metrics(portfolio_prob_voting_con_min_pos, trade_returns_prob_voting_con_min_pos)
metrics_prob_voting_tot_con = calculate_additional_metrics(portfolio_prob_voting_tot_con, trade_returns_prob_voting_tot_con)

combined_metrics_weighted_majority = {**results_weighted_majority, **metrics_weighted_majority}
combined_metrics_kelly = {**results_kelly, **metrics_kelly}
combined_metrics_prob_voting_uncon = {**results_prob_voting_uncon, **metrics_prob_voting_uncon}
combined_metrics_prob_voting_con_min_pos = {**results_prob_voting_con_min_pos, **metrics_prob_voting_con_min_pos}
combined_metrics_prob_voting_tot_con = {**results_prob_voting_tot_con, **metrics_prob_voting_tot_con}

table = ensemble_comparison_summary(combined_metrics_weighted_majority, combined_metrics_kelly, combined_metrics_prob_voting_uncon, combined_metrics_prob_voting_con_min_pos, combined_metrics_prob_voting_tot_con)


=== Ensemble Model Comparison Summary ===
                            Ensemble Method  Total Return  Annualized Return  Sharpe Ratio  Sortino Ratio  Max Drawdown  Win Rate  Profit Factor
                   Weighted Majority Voting        2.5765             0.1659        0.5823         0.4820       -0.5828    0.6867         1.7498
       Probabilistic Voting (Unconstrained)        1.1019             0.0936        0.4226         0.3604       -0.5484    0.6625         1.4621
Probabilistic Voting (Constrained, Min Pos)        0.6302             0.0606        0.3472         0.2851       -0.5484    0.4493         1.1799
 Probabilistic Voting (Totally Constrained)        0.2870             0.0309        0.3500         0.2881       -0.1682    0.4493         1.1799
        Probabilistic Voting + Kelly Sizing       -0.0225            -0.0027       -0.3128        -0.0287       -0.0348    0.3333         0.2599


### Calmar Ratio

Compares *annualized return to maximum drawdown* to assess return-risk tradeoff.

$$
\text{Calmar Ratio} = \frac{R_a}{|\text{Max Drawdown}|}
$$

- $ R_a $: annualized return

This is the same ratio used as the default GA fitness function.

---

### Omega Ratio

Considers the *entire distribution of returns relative to a target threshold*.

$$
\text{Omega Ratio} = \frac{\int_{r_T}^{\infty} [1 - F(r)] \, dr}{\int_{-\infty}^{r_T} F(r) \, dr}
$$

- $ r_T $: target return (e.g. 0)  
- $ F(r) $: cumulative distribution function of returns
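Integrating by parts gives $\int_{r_T}^{\infty} [1 - F(r)]\,dr = E[(R - r_T)^+]$ and $\int_{-\infty}^{r_T} F(r)\,dr = E[(r_T - R)^+]$. With a sample of returns, the Omega ratio is therefore the sum of gains above the target divided by the sum of shortfalls below it; the $1/n$ factors cancel. Below is a minimal sketch; returning infinity when nothing falls below the target is an assumption.

```python
import numpy as np


def omega_ratio(returns, target: float = 0.0) -> float:
    """Empirical Omega: total gain above target / total shortfall below target."""
    r = np.asarray(returns, dtype=float)
    upside = np.clip(r - target, 0.0, None).sum()    # sample version of E[(R - r_T)^+]
    downside = np.clip(target - r, 0.0, None).sum()  # sample version of E[(r_T - R)^+]
    return float(upside / downside) if downside > 0 else float("inf")
```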

---

### Gain to Pain Ratio

Measures the *net sum of all returns relative to the absolute sum of the losing returns*.

$$
\text{Gain to Pain} = \frac{\sum_t r_t}{\left| \sum_{t:\, r_t < 0} r_t \right|}
$$

---

### Profit Factor

Calculates *gross profit divided by gross loss*.

$$
\text{Profit Factor} = \frac{\text{Gross Profit}}{|\text{Gross Loss}|}
$$

---

### Expectancy per Trade

Indicates the *average expected profit per trade*, combining the win and loss probabilities with the average win and loss sizes (Avg Loss is taken as a positive magnitude).

$$
\text{Expectancy} = (\text{Win Rate} \times \text{Avg Win}) - (\text{Loss Rate} \times \text{Avg Loss})
$$
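The trade-level metrics above (gain to pain, profit factor, expectancy), plus the win rate reported in the ensemble comparison, can be computed from a list of per-trade returns. This sketch makes several assumptions. A trade with a return of exactly 0 counts as neither a win nor a loss, and ratios with no losing trades return infinity. Gain to pain is often applied to monthly returns; here it is applied to whatever return series is passed in.

```python
import numpy as np


def trade_metrics(trade_returns) -> dict:
    """Win rate, profit factor, gain to pain and expectancy from per-trade returns."""
    r = np.asarray(trade_returns, dtype=float)
    wins, losses = r[r > 0], r[r < 0]
    gross_profit, gross_loss = wins.sum(), abs(losses.sum())
    win_rate = wins.size / r.size
    loss_rate = losses.size / r.size
    avg_win = wins.mean() if wins.size else 0.0
    avg_loss = abs(losses.mean()) if losses.size else 0.0  # positive magnitude
    return {
        "win_rate": win_rate,
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else float("inf"),
        "gain_to_pain": r.sum() / gross_loss if gross_loss > 0 else float("inf"),
        "expectancy": win_rate * avg_win - loss_rate * avg_loss,
    }
```

Under these conventions, the expectancy equals the mean trade return. For `[0.1, -0.05, 0.2, -0.05, 0.0]` it is 0.04, with a profit factor of 3.0 and a gain to pain of 2.0.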

---

### CAGR to Max Drawdown Ratio

Compares *compound annual growth rate to maximum drawdown*.

$$
\text{CAGR/MDD} = \frac{\text{CAGR}}{|\text{Max Drawdown}|}
$$

- *CAGR* is calculated as:

$$
\text{CAGR} = \left( \frac{P_{end}}{P_{start}} \right)^{\frac{1}{n}} - 1
$$

where:
  - $ P_{end} $: ending portfolio value  
  - $ P_{start} $: starting portfolio value  
  - $ n $: number of years
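CAGR with $n$ in calendar years is the calendar-time counterpart of $R_a$, which annualizes with 252 trading days; over long periods the two give nearly the same number. The sketch below measures $n$ from start and end dates using 365.25-day years, which is an assumption. It takes `max_drawdown` as the negative number produced by the Max Drawdown formula above.

```python
from datetime import date


def cagr(p_start: float, p_end: float, start: date, end: date) -> float:
    """CAGR = (P_end / P_start) ** (1 / n) - 1, with n in calendar years."""
    n_years = (end - start).days / 365.25  # assumption: 365.25-day years
    return (p_end / p_start) ** (1.0 / n_years) - 1.0


def cagr_to_max_drawdown(cagr_value: float, max_drawdown: float) -> float:
    """CAGR / |Max Drawdown|."""
    return cagr_value / abs(max_drawdown)
```

For example, growing from 100 to 121 between 2020-01-01 and 2022-01-01 gives a CAGR of about 10%.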