<div style="display: flex; align-items: center; width: 100%;">  
  <div style="display: flex; flex-direction: column; align-items: center; justify-content: center; width: 100px; margin-right: 0px;">    
    <a href="https://risklab.ai" style="border: 0; line-height: 0.5;">
      <img src="../../utils/risklab_ai.gif" width="60px" style="border: 0; margin-bottom:-10px; vertical-align: middle;"/>
    </a>
  </div>  
  <div style="flex-grow: 1;">
    <h1 style="margin: 0; margin-left:0; font-weight: bold; text-align: left; font-size: 38px;">
      Backtest Overfitting Simulation
    </h1>
  </div>  
</div>

## 0. Setup and Imports



In [1]:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier
from joblib import Parallel, delayed
from joblib_progress import joblib_progress

from RiskLabAI.data.synthetic_data import drift_volatility_burst, parallel_generate_prices
from RiskLabAI.backtest import (
    overall_backtest_overfitting_simulation,
    temporal_backtest_overfitting_simulation,
    overall_novel_methods_backtest_overfitting_simulation,
    measure_all_cv_computational_requirements,
    measure_cpcv_parallelization,
    measure_cpcv_scalability,
    get_cpu_info,
    format_cpu_info
)
from RiskLabAI.backtest.validation import CrossValidatorController

## 1. Simulation Parameters
We first define the simulation parameters: the number of parallel jobs (`N_JOBS`), the number of simulated price paths (`N_PATHS`), the horizon in years (`TOTAL_TIME`), and the number of time steps (`N_STEPS`, at 252 steps per year). Together with the per-step risk-free rate, these parameters set up a controlled synthetic environment in which we can compare cross-validation techniques.


In [2]:
N_JOBS = 24
N_PATHS = 1000
TOTAL_TIME = 40
N_STEPS = int(252 * TOTAL_TIME)
RISK_FREE_RATE = 0.05
STEP_RISK_FREE_RATE = np.log(1 + RISK_FREE_RATE) / N_STEPS * TOTAL_TIME
RANDOM_STATE = 0
OVERFITTING_PARTITIONS_LENGTH = 252
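
As a quick sanity check on `STEP_RISK_FREE_RATE`: because `N_STEPS = 252 * TOTAL_TIME`, the expression reduces to `log(1 + RISK_FREE_RATE) / 252`, a per-step log rate. The short sketch below uses only the constants defined above (the names `step_rate` and `annual_growth` are introduced here for illustration) and confirms that compounding over 252 steps recovers the 5% annual rate.

```python
import math

TOTAL_TIME = 40
N_STEPS = int(252 * TOTAL_TIME)
RISK_FREE_RATE = 0.05

# Same expression as STEP_RISK_FREE_RATE, written with the standard library
step_rate = math.log(1 + RISK_FREE_RATE) / N_STEPS * TOTAL_TIME

# Compounding the per-step log rate over one year of 252 steps
annual_growth = math.exp(step_rate * 252)

print(N_STEPS)                       # 10080
print(round(annual_growth - 1, 10))  # 0.05
```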

## 3. Market Regime Parameters and Custom Pipeline
We define three market regimes: calm, volatile, and speculative bubble. The calm and volatile regimes are characterized by Heston model parameters such as the mean return (`mu`), the rate at which variance reverts to its long-run level (`kappa`), the long-run average variance (`theta`), and others. The speculative bubble regime instead takes its time-varying drift and volatility from the `drift_volatility_burst` function. We also define `CustomPipeline`, a `Pipeline` subclass that forwards a `sample_weight` argument to every step.


In [3]:
x = 0.35

bubble_drifts, bubble_volatilities = drift_volatility_burst(
    bubble_length=5 * 252, 
    a_before=x, 
    a_after=-x, 
    b_before=0.6 * x, 
    b_after=0.6 * x, 
    alpha=0.75, 
    beta=0.45,
    explosion_filter_width=0.1
)
# Dictionary of Heston parameters for different market regimes
regimes = {
    'calm': {
        'mu': 0.1,
        'kappa': 3.98,
        'theta': 0.029,
        'xi': 0.389645311,
        'rho': -0.7,
        'lam': 121,
        'm': -0.000709,
        'v': 0.0119
    },
    'volatile': {
        'mu': 0.1,
        'kappa': 3.81,
        'theta': 0.25056,
        'xi': 0.59176974,
        'rho': -0.7,
        'lam': 121,
        'm': -0.000709,
        'v': 0.0119
    },
    'speculative_bubble': {
        'mu': list(bubble_drifts),
        'kappa': 1,
        'theta': list(bubble_volatilities),
        'xi': 0,
        'rho': 0,
        'lam': 0,
        'm': 0,
        'v': 0.00000001
    },
}
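
For intuition about how `kappa`, `theta`, `xi`, and `rho` shape a path, here is a minimal Euler sketch of Heston dynamics. It is an illustration under stated assumptions, not the implementation behind `parallel_generate_prices`: the function name `simulate_heston_path`, the full-truncation treatment of negative variance, the starting price, and the omission of the remaining parameters (`lam`, `m`, `v`) are all choices made here.

```python
import numpy as np

def simulate_heston_path(mu, kappa, theta, xi, rho,
                         total_time, n_steps, s0=100.0, seed=0):
    """Euler scheme with full truncation (hypothetical helper, no jumps)."""
    rng = np.random.default_rng(seed)
    dt = total_time / n_steps
    prices = np.empty(n_steps + 1)
    prices[0] = s0
    variance = theta  # assumption: start at the long-run variance
    for t in range(n_steps):
        z_price, z_indep = rng.standard_normal(2)
        # Correlate the variance shock with the price shock through rho
        z_var = rho * z_price + np.sqrt(1.0 - rho ** 2) * z_indep
        v_pos = max(variance, 0.0)  # full truncation keeps the square root real
        prices[t + 1] = prices[t] * np.exp(
            (mu - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z_price
        )
        variance += kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z_var
    return prices

# One year of the 'calm' regime, using the parameter values from the dictionary above
calm_path = simulate_heston_path(
    mu=0.1, kappa=3.98, theta=0.029, xi=0.389645311, rho=-0.7,
    total_time=1.0, n_steps=252,
)
print(calm_path.shape)  # (253,)
```

A larger `theta` (as in the volatile regime) raises the level that variance reverts to, while a negative `rho` makes variance tend to rise when prices fall.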

In [4]:
class CustomPipeline(Pipeline):
    @classmethod
    def from_existing_pipeline(cls, existing_pipeline, memory=None, verbose=False):
        return cls(steps=existing_pipeline.steps, memory=memory, verbose=verbose)
        
    def fit(self, X, y=None, **fit_params):
        if 'sample_weight' in fit_params:
            sample_weight = fit_params.pop('sample_weight')
            for step_name, _ in self.steps:
                fit_params[f"{step_name}__sample_weight"] = sample_weight
        return super().fit(X, y, **fit_params)
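
The routing performed by `CustomPipeline.fit` can be isolated in a few lines. The sketch below mirrors that logic in a standalone helper (`route_sample_weight` is a hypothetical name used only here) so that the resulting fit parameters can be inspected without fitting anything. The step names are the lowercase class names that `make_pipeline` assigns.

```python
def route_sample_weight(step_names, fit_params):
    """Copy of the routing logic in CustomPipeline.fit (illustrative helper)."""
    routed = dict(fit_params)
    if "sample_weight" in routed:
        sample_weight = routed.pop("sample_weight")
        # Every step receives the same weights under '<step>__sample_weight'
        for name in step_names:
            routed[f"{name}__sample_weight"] = sample_weight
    return routed

routed = route_sample_weight(
    ["standardscaler", "kneighborsclassifier"],
    {"sample_weight": [0.5, 1.0, 2.0]},
)
print(sorted(routed))
# ['kneighborsclassifier__sample_weight', 'standardscaler__sample_weight']
```

Because the weights go to every step, each step's `fit` method must accept a `sample_weight` argument. This is worth verifying for the estimators in use: as far as we are aware, `KNeighborsClassifier.fit` takes only `X` and `y`, so passing `sample_weight` through a k-NN pipeline may raise a `TypeError`.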

## 4. Transition Matrix and Strategy Parameters
A transition matrix is established to represent the probability of transitioning between market states. Additionally, we define `strategy_parameters` for various trading strategies to be tested in our simulation.

In [5]:
dt = TOTAL_TIME / N_STEPS

transition_matrix = np.array([
    [1 - 1 * dt,   1 * dt - 0.00001,        0.00001],  # State 0 transitions
    [20 * dt,      1 - 20 * dt - 0.00001,   0.00001],  # State 1 transitions
    [1 - 1 * dt,   1 * dt,                      0.0],  # State 2 transitions
])

strategy_parameters = {
    'fast_window' : [5, 20, 50, 70],
    'slow_window' : [10, 50, 100, 140],
    'exponential' : [False],
    'mean_reversion' : [False]
}
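
Two properties of the transition matrix are easy to check numerically: each row should sum to one, and the diagonal implies an expected dwell time of `1 / (1 - p_ii)` steps per state under a plain Markov chain. The sketch below assumes that states 0, 1, and 2 correspond to the calm, volatile, and speculative bubble regimes in dictionary order. How the price generator handles the bubble state, which has a fixed `bubble_length`, is not shown in this notebook.

```python
import numpy as np

TOTAL_TIME = 40
N_STEPS = int(252 * TOTAL_TIME)
dt = TOTAL_TIME / N_STEPS  # 1/252 of a year per step

transition_matrix = np.array([
    [1 - 1 * dt,   1 * dt - 0.00001,        0.00001],
    [20 * dt,      1 - 20 * dt - 0.00001,   0.00001],
    [1 - 1 * dt,   1 * dt,                      0.0],
])

# Each row is a probability distribution over the next state
row_sums = transition_matrix.sum(axis=1)
assert np.allclose(row_sums, 1.0)

# Expected consecutive steps in each state for a plain Markov chain
expected_dwell = 1.0 / (1.0 - np.diag(transition_matrix))
print(np.round(expected_dwell, 1))  # roughly 252, 12.6 and 1 steps
```

Read this way, the calm state lasts about one year on average and the volatile state about two and a half weeks.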

## 5. Model Definition and Parameter Grids
We declare a dictionary of machine learning models: k-Nearest Neighbors (k-NN), Decision Tree, and XGBoost. For each model we specify an estimator (k-NN is wrapped in a `CustomPipeline` with standard scaling) and a grid of hyperparameters to be searched during model training.

In [6]:
# Define models and parameter grids

models = {
    'k-NN' : {
        'Model': CustomPipeline.from_existing_pipeline(existing_pipeline=make_pipeline(StandardScaler(), KNeighborsClassifier())),
        'Parameters': {
            'kneighborsclassifier__n_neighbors': [1, 2, 3],
        }
    },
    'Decision Tree' : {
        'Model': DecisionTreeClassifier(random_state=RANDOM_STATE),
        'Parameters': {
            'max_depth': [None],
            'min_samples_split': [2],
            'min_samples_leaf': [1],
        }
    },
    'XGBoost': {
        'Model': XGBClassifier(use_label_encoder=False, eval_metric='mlogloss', seed=RANDOM_STATE),
        'Parameters': {
            'n_estimators': [1000],
            'max_depth': [1000000000],
            'learning_rate': [1, 10, 100],
            'subsample': [1.0],
            'colsample_bytree': [1.0],
        }
    },
}
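
To see how many configurations these grids imply, the sketch below expands each grid into its Cartesian product using only the standard library. The helper `expand_grid` is introduced here for illustration. How the RiskLabAI simulation combines the strategy grid with the model grids internally is not shown in this notebook, so the counts are per grid only.

```python
from itertools import product

def expand_grid(grid):
    """Return every combination of a {name: [values]} grid as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

parameter_grids = {
    "k-NN": {"kneighborsclassifier__n_neighbors": [1, 2, 3]},
    "Decision Tree": {"max_depth": [None], "min_samples_split": [2], "min_samples_leaf": [1]},
    "XGBoost": {
        "n_estimators": [1000],
        "max_depth": [1000000000],
        "learning_rate": [1, 10, 100],
        "subsample": [1.0],
        "colsample_bytree": [1.0],
    },
}
strategy_parameters = {
    "fast_window": [5, 20, 50, 70],
    "slow_window": [10, 50, 100, 140],
    "exponential": [False],
    "mean_reversion": [False],
}

model_counts = {name: len(expand_grid(grid)) for name, grid in parameter_grids.items()}
strategy_count = len(expand_grid(strategy_parameters))

print(model_counts)    # {'k-NN': 3, 'Decision Tree': 1, 'XGBoost': 3}
print(strategy_count)  # 16
```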

## 6. Synthesizing Prices
Next, we use the `parallel_generate_prices` function to synthesize asset prices for different market regimes, which will serve as the dataset for our backtesting and model validation.

In [None]:
print('Synthesizing Prices...')
all_prices, all_regimes = parallel_generate_prices(
    N_PATHS,
    regimes,
    transition_matrix,
    TOTAL_TIME,
    N_STEPS,
    RANDOM_STATE,
    N_JOBS
)

Synthesizing Prices...


## 7. Evaluation
The code block executes a parallelized backtesting simulation on all price columns to evaluate the risk of overfitting in different cross-validation (CV) methods. Results from the simulation are collected into lists, which are then transformed into DataFrames for more detailed analysis.

In [None]:
# Run the overall simulation once per price path, parallelizing across paths
with joblib_progress("Overall Simulation...", total=all_prices.shape[1]):    
    results = Parallel(n_jobs=N_JOBS)(delayed(overall_backtest_overfitting_simulation)(
        all_prices[column], 
        strategy_parameters, 
        models, 
        STEP_RISK_FREE_RATE, 
        random_state=RANDOM_STATE,
        n_jobs=1  # Run inner backtests serially, parallelize paths
    ) for column in all_prices.columns)

# Assuming results is already populated from the joblib Parallel call
cv_pbo_list = [result[0] for result in results]  # Collect all cv_pbo dicts
cv_deflated_sr_list = [result[1] for result in results]  # Collect all cv_deflated_sr dicts

# Initialize dicts to collect lists for each CV method
cv_pbo_data = {cv: [] for cv in cv_pbo_list[0].keys()}
cv_deflated_sr_data = {cv: [] for cv in cv_deflated_sr_list[0].keys()}

# Populate the cv_pbo_data and cv_deflated_sr_data with concatenated lists from each result
for cv_pbo in cv_pbo_list:
    for cv, values in cv_pbo.items():
        cv_pbo_data[cv].append(values)

for cv_deflated_sr in cv_deflated_sr_list:
    for cv, values in cv_deflated_sr.items():
        cv_deflated_sr_data[cv].append(values)

# Convert the collected lists into DataFrames
cv_pbo_dfs = {cv: pd.DataFrame(cv_pbo_data[cv]).T for cv in cv_pbo_data}
cv_deflated_sr_dfs = {cv: pd.DataFrame(cv_deflated_sr_data[cv]).T for cv in cv_deflated_sr_data}

# Mapping from descriptive CV names to filesystem-friendly names
cv_name_map = {
    'Walk-Forward': 'walkforward',
    'K-Fold': 'kfold',
    'Purged K-Fold': 'purgedkfold',
    'Combinatorial Purged': 'combinatorialpurged',
}

# Save each cv_pbo DataFrame to CSV using the mapping for file names
for cv_name, df in cv_pbo_dfs.items():
    file_name = cv_name_map.get(cv_name, cv_name)  # Fallback to cv_name if not found in the map
    df.to_csv(f'overall_simulated_pbo_{file_name}.csv', index=False)

# Save each cv_deflated_sr DataFrame to CSV using the mapping for file names
for cv_name, df in cv_deflated_sr_dfs.items():
    file_name = cv_name_map.get(cv_name, cv_name)  # Fallback to cv_name if not found in the map
    df.to_csv(f'overall_simulated_deflated_sr_{file_name}.csv', index=False)

KeyError: "[Timestamp('2000-01-10 00:00:00'), Timestamp('2000-01-13 00:00:00'), Timestamp('2000-01-14 00:00:00'), ..., Timestamp('2015-05-14 00:00:00')] not in index"
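
The `KeyError ... not in index` above is the error pandas raises when a label-based lookup requests timestamps that are absent from the index. The traceback does not show which lookup inside the simulation failed, so the following is only a generic diagnostic sketch on toy data (the names `prices` and `requested` are hypothetical): it reproduces the error and shows how to list the missing labels or restrict the lookup to labels that exist.

```python
import pandas as pd

prices = pd.Series(
    [100.0, 101.0, 102.0],
    index=pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"]),
)
requested = pd.to_datetime(["2000-01-04", "2000-01-10"])

# Label lookup with a missing timestamp raises KeyError
try:
    prices.loc[requested]
except KeyError as error:
    print(type(error).__name__)

# Diagnose: which labels are missing, and which can be selected safely
missing = requested.difference(prices.index)
present = requested.intersection(prices.index)

print(list(missing.strftime("%Y-%m-%d")))  # ['2000-01-10']
print(prices.loc[present].tolist())        # [101.0]
```

Whether dropping the missing labels is appropriate depends on why they are missing; in a backtest it is usually safer to find the step that removed them than to silently filter.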

## 8. Partitioned Data Evaluation
In a temporal analysis, the same simulation is conducted on partitions of the data to understand how each CV method performs over time. The results are stored in CSV files for each CV method, mapping descriptive names to filesystem-friendly names for consistency and ease of access.

In [None]:
# Run the temporal (partitioned) simulation once per price path, parallelizing across paths
with joblib_progress("Temporal Simulation...", total=all_prices.shape[1]):    
    results = Parallel(n_jobs=N_JOBS)(delayed(temporal_backtest_overfitting_simulation)(
        all_prices[column], 
        strategy_parameters, 
        models, 
        STEP_RISK_FREE_RATE, 
        OVERFITTING_PARTITIONS_LENGTH,
        n_jobs=1 # Run inner backtests serially
    ) for column in all_prices.columns)

# Assuming results is already populated from the joblib Parallel call
cv_pbo_list = [result[0] for result in results]  # Collect all cv_pbo dicts
cv_deflated_sr_list = [result[1] for result in results]  # Collect all cv_deflated_sr dicts

# Initialize dicts to collect lists for each CV method
cv_pbo_data = {cv: [] for cv in cv_pbo_list[0].keys()}
cv_deflated_sr_data = {cv: [] for cv in cv_deflated_sr_list[0].keys()}

# Populate the cv_pbo_data and cv_deflated_sr_data with concatenated lists from each result
for cv_pbo in cv_pbo_list:
    for cv, values in cv_pbo.items():
        cv_pbo_data[cv].append(values)

for cv_deflated_sr in cv_deflated_sr_list:
    for cv, values in cv_deflated_sr.items():
        cv_deflated_sr_data[cv].append(values)

# Convert the collected lists into DataFrames
cv_pbo_dfs = {cv: pd.DataFrame(cv_pbo_data[cv]).T for cv in cv_pbo_data}
cv_deflated_sr_dfs = {cv: pd.DataFrame(cv_deflated_sr_data[cv]).T for cv in cv_deflated_sr_data}

# Mapping from descriptive CV names to filesystem-friendly names
cv_name_map = {
    'Walk-Forward': 'walkforward',
    'K-Fold': 'kfold',
    'Purged K-Fold': 'purgedkfold',
    'Combinatorial Purged': 'combinatorialpurged',
}

# Save each cv_pbo DataFrame to CSV using the mapping for file names
for cv_name, df in cv_pbo_dfs.items():
    file_name = cv_name_map.get(cv_name, cv_name)  # Fallback to cv_name if not found in the map
    df.to_csv(f'simulated_pbo_{file_name}.csv', index=False)

# Save each cv_deflated_sr DataFrame to CSV using the mapping for file names
for cv_name, df in cv_deflated_sr_dfs.items():
    file_name = cv_name_map.get(cv_name, cv_name)  # Fallback to cv_name if not found in the map
    df.to_csv(f'simulated_deflated_sr_{file_name}.csv', index=False)
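
The aggregation pattern used after each `Parallel` call can be sketched on toy data. The example assumes that each simulation returns a pair of dictionaries keyed by CV method whose values are list-like. The numbers are made up for illustration and are not simulation output.

```python
import pandas as pd

# Two hypothetical per-path results, each a (pbo_dict, deflated_sr_dict) pair
results = [
    ({"K-Fold": [0.4, 0.5, 0.6]}, {"K-Fold": [0.9, 0.8, 0.7]}),
    ({"K-Fold": [0.3, 0.6, 0.2]}, {"K-Fold": [0.7, 0.95, 0.5]}),
]

# Collect one list per path under each CV method
pbo_data = {}
for pbo, _ in results:
    for cv, values in pbo.items():
        pbo_data.setdefault(cv, []).append(values)

# Rows of the intermediate frame are paths; transposing makes each path a column
pbo_dfs = {cv: pd.DataFrame(rows).T for cv, rows in pbo_data.items()}

print(pbo_dfs["K-Fold"].shape)  # (3, 2)
```

After the transpose, each column holds the values for one simulated price path, which is the layout written to the CSV files.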

## 9. CPCV, B-CPCV, A-CPCV Evaluations

This section runs the simulation using the `overall_novel_methods_backtest_overfitting_simulation` function, which tests the advanced cross-validators: `CombinatorialPurged`, `BaggedCombinatorialPurged`, and `AdaptiveCombinatorialPurged`.

In [None]:
print("\n--- Starting Novel Methods Simulation ---")
with joblib_progress("Novel CV Simulation...", total=all_prices.shape[1]):    
    novel_results = Parallel(n_jobs=N_JOBS)(delayed(overall_novel_methods_backtest_overfitting_simulation)(
        all_prices[column], 
        strategy_parameters, 
        models, 
        STEP_RISK_FREE_RATE, 
        random_state=RANDOM_STATE,
        n_jobs=1
    ) for column in all_prices.columns)

# Process results
novel_pbo_list = [result[0] for result in novel_results]
novel_dsr_list = [result[1] for result in novel_results]
novel_pbo_data = {cv: [] for cv in novel_pbo_list[0].keys()}
novel_dsr_data = {cv: [] for cv in novel_dsr_list[0].keys()}

for cv_pbo in novel_pbo_list:
    for cv, values in cv_pbo.items():
        novel_pbo_data[cv].append(values)
for cv_deflated_sr in novel_dsr_list:
    for cv, values in cv_deflated_sr.items():
        novel_dsr_data[cv].append(values)

novel_pbo_dfs = {cv: pd.DataFrame(novel_pbo_data[cv]).T for cv in novel_pbo_data}
novel_dsr_dfs = {cv: pd.DataFrame(novel_dsr_data[cv]).T for cv in novel_dsr_data}

# Save results
print("\nSaving Novel CV results...")
for cv_name, df in novel_pbo_dfs.items():
    file_name = cv_name.lower().replace(' ', '').replace('-', '')
    df.to_csv(f'novel_simulated_pbo_{file_name}.csv', index=False)
for cv_name, df in novel_dsr_dfs.items():
    file_name = cv_name.lower().replace(' ', '').replace('-', '')
    df.to_csv(f'novel_simulated_deflated_sr_{file_name}.csv', index=False)
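
The string normalization used for these file names yields the same names as the explicit `cv_name_map` from the earlier sections. The sketch below checks this with a small helper (`to_file_name` is a name introduced here). The key `'Bagged Combinatorial Purged'` is a hypothetical example, since the actual keys returned by the novel-methods simulation are not shown in this notebook.

```python
cv_name_map = {
    "Walk-Forward": "walkforward",
    "K-Fold": "kfold",
    "Purged K-Fold": "purgedkfold",
    "Combinatorial Purged": "combinatorialpurged",
}

def to_file_name(cv_name):
    """Lowercase the CV name and drop spaces and hyphens."""
    return cv_name.lower().replace(" ", "").replace("-", "")

# The rule reproduces every entry of the explicit mapping
for name, expected in cv_name_map.items():
    assert to_file_name(name) == expected

print(to_file_name("Bagged Combinatorial Purged"))  # baggedcombinatorialpurged
```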

## 10. Hardware & Performance Profiling

This section uses the profiling functions from the simulation module to assess the computational cost of the different CV methods and the parallelization efficiency of CPCV.

In [None]:
print("\n--- Starting Hardware Profiling ---")

# 1. Get CPU Info
try:
    cpu_info = get_cpu_info()
    print("--- CPU Information ---")
    print(format_cpu_info(cpu_info))
except Exception as e:
    print(f"Could not get CPU info: {e}")

# 2. Define CVs for profiling
dummy_times = pd.Series(pd.date_range('2000-01-01', periods=100))
profiling_cvs = {
    'Walk-Forward': CrossValidatorController('walkforward', n_splits=4).cross_validator,
    'K-Fold': CrossValidatorController('kfold', n_splits=4).cross_validator,
    'Purged K-Fold': CrossValidatorController('purgedkfold', n_splits=4, times=dummy_times, embargo=0.02).cross_validator,
    'Combinatorial Purged': CrossValidatorController('combinatorialpurged', n_splits=8, n_test_groups=2, times=dummy_times, embargo=0.02).cross_validator,
}

# 3. Measure computational requirements
print("\n--- Measuring Computational Requirements (30 repeats) ---")
# Use smaller sample size for a quick notebook run
comp_req_df = measure_all_cv_computational_requirements(
    cross_validators=profiling_cvs,
    n_samples=5000, 
    n_features=20, 
    n_jobs=N_JOBS,
    n_repeats=30
)
print(comp_req_df.to_markdown(floatfmt=".4f"))
comp_req_df.to_csv('computational_requirements.csv')

# 4. Measure CPCV Parallelization
print("\n--- Measuring CPCV Parallelization (30 repeats) ---")
parallel_df = measure_cpcv_parallelization(
    n_samples=5000, 
    n_features=20, 
    n_repeats=30,
    n_jobs_list=[1, 2, 4, 8, -1] # Test 1, 2, 4, 8, and max cores
)
print(parallel_df.to_markdown(floatfmt=".4f"))
parallel_df.to_csv('cpcv_parallelization.csv')
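
To put the CPCV timings in context, it helps to count how many train/test splits the method generates. In the standard CPCV construction, `N` groups with `k` test groups give `C(N, k)` splits and `C(N, k) * k / N` backtest paths. The sketch below applies this to the `n_splits=8, n_test_groups=2` setting used for profiling. It assumes the RiskLabAI cross-validator follows the standard construction, and `cpcv_counts` is a helper named here for illustration.

```python
from math import comb

def cpcv_counts(n_splits, n_test_groups):
    """Number of train/test combinations and backtest paths in standard CPCV."""
    n_combinations = comb(n_splits, n_test_groups)
    n_paths = n_combinations * n_test_groups // n_splits
    return n_combinations, n_paths

n_combinations, n_paths = cpcv_counts(8, 2)
print(n_combinations, n_paths)  # 28 7
```

With 28 independent model fits, compared with 4 for the K-Fold configuration above, CPCV is a natural candidate for parallel execution, which is what `measure_cpcv_parallelization` examines.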

By conducting these simulations, we can quantify the Probability of Backtest Overfitting (PBO) and the performance of the Deflated Sharpe Ratio (DSR) across different CV methods, providing a comprehensive view of their robustness in temporal contexts.