### RLDynamicHedger Hyper-parameter Tuning For All RL Algorithms
 - This notebook demonstrates hyper-parameter tuning for the DDPG, TD3, SAC and PPO RL algorithms
 - It also persists the best model, trained with the optimal hyper-parameters found by the search
 - It is currently set up for the DDPG RL algorithm, the low_moneyness use case and the GBM hedging simulation (you can change the RL algorithm, use cases and hedging simulation settings)
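The search itself is delegated to TuneHyperparametersForRLModels, whose internals are not shown here; the log further down suggests it drives an Optuna study. To illustrate the general tune-and-keep-best idea, here is a minimal sketch using plain random search, which is a swapped-in technique and not the notebook's sampler. The search space and its ranges are hypothetical (names and choices loosely echo the trial log), and `toy_objective` is a made-up stand-in for "train an agent and return its mean evaluation reward". The real pipeline also saves the best trial's model to disk; this sketch only returns it.

```python
import math
import random

# Hypothetical search space: names and choices loosely echo the trial log,
# but the real ranges live in TuneHyperparametersForRLModels (not shown).
SEARCH_SPACE = {
    "gamma": [0.9, 0.95, 0.98, 0.99, 0.995, 0.999, 0.9999],
    "batch_size": [16, 32, 64, 100, 128, 256, 512, 1024],
    "tau": [0.001, 0.005, 0.01, 0.02, 0.05, 0.08],
}


def sample_params(rng: random.Random) -> dict:
    """Draw one candidate configuration from the search space."""
    params = {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}
    # Learning rate drawn log-uniformly from [1e-5, 1e-1] (assumed range)
    params["learning_rate"] = 10 ** rng.uniform(-5, -1)
    return params


def toy_objective(params: dict) -> float:
    """Hypothetical stand-in for 'train the agent, return its mean evaluation reward'."""
    return -abs(math.log10(params["learning_rate"]) + 3) - (1 - params["gamma"])


def random_search(n_trials: int = 10, seed: int = 100):
    """Plain random search that keeps the best (highest-reward) trial."""
    rng = random.Random(seed)
    best_trial, best_value, best_params = None, -math.inf, None
    history = []
    for trial in range(n_trials):
        params = sample_params(rng)
        value = toy_objective(params)
        history.append(value)
        # Strict '>' means the earliest trial wins ties
        if value > best_value:
            best_trial, best_value, best_params = trial, value, params
    return best_trial, best_value, best_params, history


best_trial, best_value, best_params, history = random_search()
print(f"Best is trial {best_trial} with value: {best_value:.5f}")
print(best_params)
```

Optuna's default sampler (TPE) uses past trials to focus later samples, whereas random search samples each trial independently; the keep-the-best bookkeeping is the same in both.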

#### Imports

In [1]:
import os, sys
import numpy as np
import time

SEED = 100
NEW_LINE = "\n"
LINE_DIVIDER = "==========" * 5

np.random.seed(SEED)

#### Import the experiment use cases module

In [2]:
from experiment_use_cases import run_scenario_map, getRunScenarioParams

Current path is: C:\Development\Training\MLI Certificate of Finance\Final-Project\Project\RLDynamicHedgerV2...

Root folder: C:\Development\Training\MLI Certificate of Finance\Final-Project\Project\RLDynamicHedgerV2\scripts\..




#### Set the current working directory

In [3]:
ROOT_PATH = "../"
os.chdir(ROOT_PATH)
sys.path.insert(1, ROOT_PATH)
print(f"Current path is: {os.getcwd()}...{NEW_LINE}")

Current path is: C:\Development\Training\MLI Certificate of Finance\Final-Project\Project...

#### Libraries for RLDynamicHedger

In [4]:
from src.main.utility.enum_types import PlotType, AggregationType, HedgingType, RLAgorithmType
from src.main.market_simulator.parameters import Parameters
from src.main.utility.utils import Helpers
from scripts.tune_hedger_rl_model import TuneHyperparametersForRLModels
import src.main.configs as configs

#### Set demo use cases

In [5]:
HEDGING_TYPE = HedgingType.gbm
RL_ALGO_TYPE = RLAgorithmType.ppo
MODEL_USE_CASE = "low_moneyness"
# USE_CASES = ["low_expiry", "high_expiry", "low_trading_cost", "high_trading_cost", "low_trading_freq","high_trading_freq", "high_moneyness", "low_moneyness"]
USE_CASES = ["low_moneyness"]
# ALGO_TYPES = [RLAgorithmType.ppo, RLAgorithmType.ddpg, RLAgorithmType.td3, RLAgorithmType.sac]
ALGO_TYPES = [RLAgorithmType.ddpg]
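USE_CASES and ALGO_TYPES are combined by the nested loops in run(), so the number of tuning cycles is their product. A minimal sketch of that planning step with itertools.product; `AlgoType` is a hypothetical stand-in for the project's RLAgorithmType enum, and `plan_runs` is an illustrative helper, not part of the project.

```python
from enum import Enum
from itertools import product


class AlgoType(Enum):
    """Hypothetical stand-in for the project's RLAgorithmType enum."""
    ppo = "ppo"
    ddpg = "ddpg"
    td3 = "td3"
    sac = "sac"


def plan_runs(use_cases, algo_types):
    """List (use_case, algo_type) pairs in the order run() visits them."""
    return list(product(use_cases, algo_types))


ALL_USE_CASES = ["low_expiry", "high_expiry", "low_trading_cost", "high_trading_cost",
                 "low_trading_freq", "high_trading_freq", "high_moneyness", "low_moneyness"]

full_plan = plan_runs(ALL_USE_CASES, list(AlgoType))
demo_plan = plan_runs(["low_moneyness"], [AlgoType.ddpg])
print(len(full_plan))  # 32 (8 use cases x 4 algorithms)
print(demo_plan)       # [('low_moneyness', <AlgoType.ddpg: 'ddpg'>)]
```

If every cycle took roughly as long as the single DDPG/low_moneyness run logged below (about 28 minutes of measured processing time), the full 32-cycle grid would take on the order of 15 hours, which is why the demo runs only one combination.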

#### Run the hyper-parameter tuning/training cycles

In [6]:
def runTuningCyles(
    model_use_case: str = MODEL_USE_CASE,
    algo_type: RLAgorithmType = RL_ALGO_TYPE,
    hedging_type: HedgingType = HEDGING_TYPE
):
    """
    Entry point to run the RL hyper-parameter tuning/training cycles
    :param model_use_case: Model use case
    :param algo_type: Algorithm type
    :param hedging_type: Hedging type    
    :return: None
    """
    start_time = time.process_time()
    print(f"Start of RL agent hyper-parameter tuning/training cycles for RL algorithm: {algo_type} and simulation hedging use case: {hedging_type}\n")
    parameter_settings_data = Helpers.getParameterSettings(configs.DEFAULT_SETTINGS_NAME)
    is_test_env = False
    parameters = Parameters(**parameter_settings_data)
    
    run_scenario_parameters = getRunScenarioParams(parameters, scenario=model_use_case, is_test_env=is_test_env)
    model = TuneHyperparametersForRLModels(algo_type, hedging_type, run_scenario_parameters, model_use_case=model_use_case)
    
    best_model_path = model.hyperparameterTuningCyle()
    end_time = time.process_time()
    elapsed_time_sec = round(end_time - start_time, 4)
    elapsed_time_min = round(elapsed_time_sec/60, 4)
    print(f"End of tuning cycles, the best hyper-parameter model is saved at this folder: {best_model_path}")
    print(f"Processing time was: {elapsed_time_sec} seconds | {elapsed_time_min} minutes")
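runTuningCyles measures its duration with time.process_time(), which returns the CPU time (user plus system) consumed by the current process across all its threads and excludes time spent sleeping or waiting. For training runs that use several threads or wait on I/O, this can differ from elapsed wall-clock time. A minimal sketch of a timer that records both, using time.perf_counter() for wall-clock time; the `timed` helper is illustrative, not part of the project.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Record CPU time (time.process_time) and wall-clock time (time.perf_counter)."""
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    try:
        yield
    finally:
        results[label] = {
            "cpu_sec": round(time.process_time() - cpu_start, 4),
            "wall_sec": round(time.perf_counter() - wall_start, 4),
        }


timings = {}
with timed("sleep", timings):
    time.sleep(0.2)  # waiting consumes almost no CPU time
print(timings["sleep"])
```

Wrapping the body of runTuningCyles in such a block would report both figures, so a multi-threaded training run is not mistaken for a slow one (or vice versa).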




#### Run the RL tuning/training cycles for all the use cases

In [7]:
def run():
    """
    Run the RL tuning/training cycles
    """
    use_cases = USE_CASES
    algo_types = ALGO_TYPES    
    
    for use_case in use_cases:
        for algo_type in algo_types:
            runTuningCyles(
                model_use_case=use_case,
                algo_type=algo_type,
                hedging_type=HEDGING_TYPE
            )
            print(f"{LINE_DIVIDER}\n\n\n")

run()

2025-04-20 16:52:29,940 - INFO - tune_hedger_rl_model.py:__init__ - : RL Delta Hedger for ddpg algorithm type in C:\Development\Training\MLI Certificate of Finance\Final-Project\Project\RLDynamicHedgerV2\scripts\tune_hedger_rl_model.py:50
2025-04-20 16:52:29,947 - INFO - env_v2.py:__init__ - : parameters:
Parameters(n_paths=1000, n_time_steps=252, n_days_per_year=252, trading_frequency=1, option_expiry_time=1.0, start_stock_price=80.0, strike_price=100, volatility=0.2, start_volatility=0.2, volatility_of_volatility=0.6, risk_free_rate=0.0, dividend_rate=0.0, return_on_stock=0.05, cost_per_traded_stock=0.01, rho=-0.4, stdev_coefficient=1.5, central_difference_spacing=0.01, notional=100, is_reset_path=False, is_test_env=False, hedging_type=<HedgingType.gbm: 1>, maturity_in_months=12, n_business_days=20, volatility_mean_reversion=1.0, long_term_volatility=0.04, volatility_correlation=-0.7, hedging_time_step=0.003968253968253968, trading_cost_parameter=0.003, risk_averse_level=0.1, is_incl

Start of RL agent hyper-parameter tuning/training cycles for RL algorithm: RLAgorithmType.ddpg and simulation hedging use case: HedgingType.gbm



2025-04-20 16:52:30,100 - INFO - caching.py:option_price_data - : Getting option price data from data/12month/1d/OTM/option_price_gbm_simulation.csv in C:\Development\Training\MLI Certificate of Finance\Final-Project\Project\RLDynamicHedgerV2\src\main\market_simulator\caching.py:88
2025-04-20 16:52:30,209 - INFO - caching.py:option_delta_data - : Getting option delta data from data/12month/1d/OTM/option_delta_gbm_simulation.csv in C:\Development\Training\MLI Certificate of Finance\Final-Project\Project\RLDynamicHedgerV2\src\main\market_simulator\caching.py:114
[I 2025-04-20 16:52:30,350] A new study created in memory with name: no-name-e6ad07b3-657c-4a1a-b36d-1cca5c6002b1
[I 2025-04-20 17:00:33,787] Trial 0 finished with value: -2.50748 and parameters: {'gamma': 0.9999, 'learning_rate': 0.09129660360504312, 'batch_size': 1024, 'buffer_size': 1000000, 'tau': 0.001, 'train_freq': 256, 'noise_type': 'normal', 'noise_std': 0.6636636392431936, 'net_arch': 'big', 'activati

Training reward: -2.50748 +/-0.0 for 6300 steps
New best model saved with mean_reward: -2.51
Number of reward values = 134410


[I 2025-04-20 17:03:19,266] Trial 1 finished with value: -2.5074879999999995 and parameters: {'gamma': 0.9, 'learning_rate': 6.623556125014992e-05, 'batch_size': 32, 'buffer_size': 1000000, 'tau': 0.08, 'train_freq': 64, 'noise_type': None, 'noise_std': 0.9376361871441395, 'net_arch': 'big', 'activation_fn': 'relu'}. Best is trial 0 with value: -2.50748.


Training reward: -2.5074879999999995 +/-4.440892098500626e-16 for 6300 steps


[I 2025-04-20 17:05:42,438] Trial 2 finished with value: -27.052115000000004 and parameters: {'gamma': 0.995, 'learning_rate': 2.623559867509604e-05, 'batch_size': 512, 'buffer_size': 100000, 'tau': 0.02, 'train_freq': 128, 'noise_type': None, 'noise_std': 0.5392938225513774, 'net_arch': 'small', 'activation_fn': 'tanh'}. Best is trial 0 with value: -2.50748.


Training reward: -27.052993000000004 +/-3.552713678800501e-15 for 6300 steps


[I 2025-04-20 17:08:44,329] Trial 3 finished with value: -2.5074870000000002 and parameters: {'gamma': 0.9, 'learning_rate': 0.0005604719625679205, 'batch_size': 100, 'buffer_size': 100000, 'tau': 0.01, 'train_freq': 8, 'noise_type': None, 'noise_std': 0.23358901359925022, 'net_arch': 'big', 'activation_fn': 'elu'}. Best is trial 0 with value: -2.50748.


Training reward: -2.5074870000000002 +/-4.440892098500626e-16 for 6300 steps


[I 2025-04-20 17:10:54,805] Trial 4 finished with value: -2.5105180000000002 and parameters: {'gamma': 0.95, 'learning_rate': 0.0002477532689757198, 'batch_size': 128, 'buffer_size': 10000, 'tau': 0.05, 'train_freq': 128, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.6239015357809244, 'net_arch': 'small', 'activation_fn': 'elu'}. Best is trial 0 with value: -2.50748.


Training reward: -2.510106 +/-0.0 for 6300 steps


[I 2025-04-20 17:11:22,638] Trial 5 pruned. 


Training reward: -27.064134000000003 +/-3.552713678800501e-15 for 6300 steps


[I 2025-04-20 17:14:23,851] Trial 6 finished with value: -2.50748 and parameters: {'gamma': 0.9999, 'learning_rate': 0.0623668295222346, 'batch_size': 64, 'buffer_size': 1000000, 'tau': 0.001, 'train_freq': 256, 'noise_type': 'normal', 'noise_std': 0.05240783717258801, 'net_arch': 'big', 'activation_fn': 'relu'}. Best is trial 0 with value: -2.50748.


Training reward: -2.50748 +/-0.0 for 6300 steps


[I 2025-04-20 17:16:51,312] Trial 7 finished with value: -2.50748 and parameters: {'gamma': 0.999, 'learning_rate': 0.015210125506577413, 'batch_size': 16, 'buffer_size': 10000, 'tau': 0.005, 'train_freq': 4, 'noise_type': 'normal', 'noise_std': 0.7294958607852513, 'net_arch': 'medium', 'activation_fn': 'relu'}. Best is trial 0 with value: -2.50748.


Training reward: -2.50748 +/-0.0 for 6300 steps


[I 2025-04-20 17:16:57,921] Trial 8 pruned. 


Training reward: -9.987779 +/-0.0 for 6300 steps


[I 2025-04-20 17:20:47,920] Trial 9 finished with value: -2.50748 and parameters: {'gamma': 0.98, 'learning_rate': 0.005789174625490958, 'batch_size': 256, 'buffer_size': 1000000, 'tau': 0.05, 'train_freq': 1, 'noise_type': 'normal', 'noise_std': 0.7491674414056981, 'net_arch': 'big', 'activation_fn': 'leaky_relu'}. Best is trial 0 with value: -2.50748.


Training reward: -2.50748 +/-0.0 for 6300 steps
Number of finished trials:  10
Best trial:
  Value:  -2.50748
  Params: 
    gamma: 0.9999
    learning_rate: 0.09129660360504312
    batch_size: 1024
    buffer_size: 1000000
    tau: 0.001
    train_freq: 256
    noise_type: normal
    noise_std: 0.6636636392431936
    net_arch: big
    activation_fn: relu
  User attrs:
Hyper-parameter tuning results will be written to this file: tuning_results.csv
Plot results of the optimization can be found here: model/trained-tuned-models/ddpg/low_moneyness/_tuning_optimization_history.html and model/trained-tuned-models/ddpg/low_moneyness/tuning_param_importance.html
The best hyper-parameters computed have been written to model/trained-tuned-models/ddpg/low_moneyness/tuning_best_values.pkl
End of tuning cycles, the best hyper-parameter model is saved at this folder: model/trained-tuned-models/ddpg/low_moneyness/best_model
Processing time was: 1684.3281 seconds | 28.0721 minutes
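The summary reports trial 0 as best even though trials 6, 7 and 9 finished with the same value, -2.50748. The study appears to maximize reward (-2.50748 is the largest value logged), and the repeated "Best is trial 0" lines are consistent with ties going to the earlier trial. A minimal sketch for recovering the best trial from such a log outside Optuna; `best_trial_from_log` is a hypothetical helper, the regex assumes the "Trial N finished with value: V" line format seen above, and pruned trials (which log no value) are skipped.

```python
import re

# Matches Optuna-style "Trial N finished with value: V" log lines
TRIAL_RE = re.compile(
    r"Trial (\d+) finished with value: ([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def best_trial_from_log(lines):
    """Return (trial_number, value) of the highest-valued finished trial."""
    finished = []
    for line in lines:
        match = TRIAL_RE.search(line)
        if match:  # pruned trials have no value and are skipped
            finished.append((int(match.group(1)), float(match.group(2))))
    if not finished:
        raise ValueError("no finished trials found")
    # max() returns the first maximal item, so the earliest trial wins ties
    return max(finished, key=lambda item: item[1])


# Excerpts from the log above (parameter dicts elided)
SAMPLE_LOG = [
    "[I 2025-04-20 17:00:33,787] Trial 0 finished with value: -2.50748 and parameters: {...}",
    "[I 2025-04-20 17:03:19,266] Trial 1 finished with value: -2.5074879999999995 and parameters: {...}",
    "[I 2025-04-20 17:05:42,438] Trial 2 finished with value: -27.052115000000004 and parameters: {...}",
    "[I 2025-04-20 17:08:44,329] Trial 3 finished with value: -2.5074870000000002 and parameters: {...}",
    "[I 2025-04-20 17:11:22,638] Trial 5 pruned. ",
    "[I 2025-04-20 17:14:23,851] Trial 6 finished with value: -2.50748 and parameters: {...}",
]
print(best_trial_from_log(SAMPLE_LOG))  # (0, -2.50748)
```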




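The best hyper-parameters are written to tuning_best_values.pkl. Assuming that file holds a plain dict mapping parameter names to values (an assumption; its exact structure is not shown here), a minimal sketch for reloading it is below. `load_best_params` is a hypothetical helper, and the demo round-trips the best values printed in the log above through a temporary file. Only unpickle files you produced yourself, since pickle.load can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def load_best_params(path) -> dict:
    """Load pickled best hyper-parameters; assumes the file holds a plain dict."""
    # Only unpickle trusted files: pickle can execute arbitrary code on load
    with open(path, "rb") as f:
        params = pickle.load(f)
    if not isinstance(params, dict):
        raise TypeError(f"Expected a dict, got {type(params).__name__}")
    return params


# Round-trip demo with the best values printed in the log above
best_values = {
    "gamma": 0.9999,
    "learning_rate": 0.09129660360504312,
    "batch_size": 1024,
    "buffer_size": 1000000,
    "tau": 0.001,
    "train_freq": 256,
    "noise_type": "normal",
    "noise_std": 0.6636636392431936,
    "net_arch": "big",
    "activation_fn": "relu",
}
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = Path(tmp_dir) / "tuning_best_values.pkl"
    pkl_path.write_bytes(pickle.dumps(best_values))
    loaded = load_best_params(pkl_path)

print(loaded["net_arch"], loaded["learning_rate"])
```

From the project root (the working directory after the os.chdir cell), the real file would be loaded with load_best_params("model/trained-tuned-models/ddpg/low_moneyness/tuning_best_values.pkl"), provided it does hold a dict.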