# Hidden Dataset Notebook

Welcome, Eneco! We are excited to show the performance of our models on the hidden dataset.

All our models are MLP-LSTM Recurrent PPO models, trained with the Stable-Baselines3 library in our self-made trading gym.

All plots and pickled results are saved automatically in the /output folder (the summary tables in section 4.1 are only printed). Please send these results to us so we can see the performance too.

Alternatively, you can open a pull request so we can obtain the results.

## 0. Import and set up

### 0.1 Install dependencies

In [1]:
%pip install -r requirements.txt

^C
Note: you may need to restart the kernel to use updated packages.
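
An interrupted or failed install only shows up later as a `ModuleNotFoundError`, so it can help to check up front that the packages this notebook imports are available. The sketch below is one way to do that; the helper name `missing_packages` is ours and not part of any library.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level packages imported by this notebook
required = ["pandas", "matplotlib", "dill", "trademodels"]
missing = missing_packages(required)

if missing:
    print(f"Missing packages: {missing}. Re-run the install cell before continuing.")
else:
    print("All required packages are importable.")
```

If anything is reported missing, re-run the install cell and restart the kernel before moving on to the imports.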


### 0.2 Import packages

In [2]:
import os
import pandas as pd
import matplotlib.pyplot as plt
import dill as pickle

from trademodels.dataclasses import OrderType, RLResults

ModuleNotFoundError: No module named 'matplotlib'

Obtaining file:///C:/Users/pelpi/Documents/Repositories/trademodels (from -r requirements.txt (line 1))
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting dill==0.3.8 (from -r requirements.txt (line 3))
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Collecting matplotlib==3.8.2 (from -r requirements.txt (line 5))
  Downloading matplotlib-3.8.2-cp311-cp311-win_amd64.whl (7.6 MB)

ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'C:\\Users\\pelpi\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\~andas.libs\\msvcp140-59fdf63e48138046aebeb6ddb5b4e960.dll'
Consider using the `--user` option or check the permissions.


[notice] A new release of pip is available: 23.1.2 -> 24.0
[notice] To update, run: python.exe -m pip install --upgrade pip


### 0.3 Create the output directories

In [None]:
output_dir = "output"

result_dir = os.path.join(output_dir, "results")
plots_dir = os.path.join(output_dir, "plots")
agg_dir = os.path.join(plots_dir, "aggresiveness")
vol_dir = os.path.join(plots_dir, "volume")
pnl_dir = os.path.join(plots_dir, "pnl")

# Create the directories if they don't exist. os.makedirs also creates the
# missing parents (output_dir, plots_dir), and exist_ok=True makes the cell
# safe to re-run.
for directory in [result_dir, agg_dir, vol_dir, pnl_dir]:
    os.makedirs(directory, exist_ok=True)


## 1. Set up the data

You can store the hidden dataset in the /data directory.
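
Before loading, it can be worth confirming that the file is where the notebook expects it and that it has the columns the loading cell reads (`from_timestamp` and `open`). The sketch below does this with the standard library only. It assumes a comma-separated file with a header row, and the helper name `missing_columns` is ours.

```python
import csv
import os


def missing_columns(csv_path, required_columns=("from_timestamp", "open")):
    """Return the required columns that are absent from the CSV header."""
    if not os.path.isfile(csv_path):
        raise FileNotFoundError(f"No data file found at {csv_path!r}")

    # Only the first row is read, so this stays fast on large files
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f), [])

    return [column for column in required_columns if column not in header]
```

An empty list means the file has every column checked. Other columns that the preprocessing step may need are not covered by this check.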

### 1.1 Load the local data

In [None]:
# Get the current working directory (the notebook's directory when run from there)
notebook_dir = os.getcwd()

# Construct the path to the data file using a relative path
data_file_name = ""  # Fill in the file name of the hidden dataset (CSV)
data_file_path = os.path.join(notebook_dir, 'data', data_file_name)

# Load the data
raw_data = pd.read_csv(data_file_path)

# Plot the open price over time and preview the first rows
raw_data.plot(x='from_timestamp', y='open')
raw_data.head()

### 1.2 Preprocess the data

In [None]:
from trademodels.dataclasses import ProcessedData

data = ProcessedData.from_raw_data(raw_data)

## 2. Set up our models

We have three models:
- One that buys every interval
- One that sells every interval
- One that chooses whether to buy or sell every interval

### 2.1 Set up model parameters

In [None]:
volume = 20
num_lags = 20
interval_length = 5
cancel_after_one_minute = True
holdable = False

### 2.2 Load the models

In [None]:
from trademodels.model_results.reinforcement_result import RLModelResultAlt

path_to_buy_model = os.path.join("models", "buy_final_model")
path_to_sell_model = os.path.join("models", "sell_final_model")
path_to_both_model = os.path.join("models", "both_final_model")

# Settings shared by all three models
shared_kwargs = dict(cancel_after_one_minute=cancel_after_one_minute, holdable=holdable)

buy_res = RLModelResultAlt(path_to_buy_model, num_lags, interval_length, volume,
                           always_buy=True, **shared_kwargs)
sell_res = RLModelResultAlt(path_to_sell_model, num_lags, interval_length, volume,
                            always_sell=True, **shared_kwargs)
both_res = RLModelResultAlt(path_to_both_model, num_lags, interval_length, volume,
                            **shared_kwargs)

## 3. Obtain the orders put out by the models

### 3.1 Go over the hidden dataset and get the results

The results of each model are stored in a custom NamedTuple (`RLResults`) with these fields:

| Field                     | Description                                                             |
|---------------------------|-------------------------------------------------------------------------|
| **inserted_orders**       | Orders put out by the model                                             |
| **executed_limit_orders** | Limit orders that were executed                                         |
| **signal_rewards**        | Signal rewards the model would have received in training                |
| **execution_rewards**     | Execution rewards the model would have received in training             |
| **execution_rewards_bm**  | Execution rewards the benchmark model would have received in training   |
| **benchmark_orders**      | Orders put out by the benchmark model                                   |
| **times**                 | List of the times of all steps                                          |
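
To make the shape of this structure concrete, here is a minimal sketch of a NamedTuple with the fields from the table. It is a hypothetical stand-in called `RLResultsSketch`. The real `RLResults` in `trademodels.dataclasses` may order or type its fields differently, and the element types used here are assumptions.

```python
from typing import Any, List, NamedTuple


class RLResultsSketch(NamedTuple):
    """Hypothetical stand-in for RLResults; field names follow the table above."""
    inserted_orders: List[Any]
    executed_limit_orders: List[Any]
    signal_rewards: List[float]
    execution_rewards: List[float]
    execution_rewards_bm: List[float]
    benchmark_orders: List[Any]
    times: List[Any]


# Made-up values, only to show how the fields are accessed
example = RLResultsSketch([], [], [0.0], [0.5], [0.25], [], [])

# Fields are read by name, as the later cells do with `result.execution_rewards`
print(example.execution_rewards)
print(example._fields)
```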

WARNING: THIS CAN TAKE SOME TIME (~30 minutes with 30% of the original data on an AMD 4700U)

In [None]:
buy_result: RLResults = buy_res.get_output(data)
sell_result: RLResults = sell_res.get_output(data)
both_result: RLResults = both_res.get_output(data)

### 3.2 Store the results in a dictionary

In [None]:
results = {
    "buy_model": buy_result,
    "sell_model": sell_result,
    "both_model": both_result,
}

### 3.3 Save the results

In [None]:
# Store the results as pickles
for name, result in results.items():
    saving_path = os.path.join(result_dir, name)

    with open(f"{saving_path}.pkl", "wb") as f:
        pickle.dump(result, f)

### 3.4 (Optional) Load the results

In [None]:
names = ["buy_model", "sell_model", "both_model"]

# Dictionary to store loaded results
results = {}

# Load the pickled data
for name in names:
    loading_path = os.path.join(result_dir, name)

    with open(f"{loading_path}.pkl", "rb") as f:
        loaded_result = pickle.load(f)

    results[name] = loaded_result
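
The loading cell above raises an error as soon as one of the three files is absent, for example when only some of the models were run. A possible variant that loads whatever is present and reports the rest is sketched below. The function name `load_available_results` is ours, and the sketch uses the standard-library `pickle` (the notebook imports `dill as pickle`, which offers the same `load` call), so swap the import if the stored objects need `dill`.

```python
import os
import pickle  # the notebook uses `dill as pickle`; the load call is the same


def load_available_results(result_dir, names):
    """Load `<name>.pkl` for each name that exists; return (results, missing_names)."""
    results, missing = {}, []
    for name in names:
        path = os.path.join(result_dir, f"{name}.pkl")
        if not os.path.isfile(path):
            missing.append(name)
            continue
        with open(path, "rb") as f:
            results[name] = pickle.load(f)
    return results, missing
```

As with any pickle file, only load results that come from a source you trust.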

## 4. Evaluate the execution performance

### 4.1 Execution reward

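There are many intervals where the price doesn't move, which yield a reward of 0. The `threshold` in the cell below controls which rewards are included: only rewards strictly greater than `threshold` are kept, so the default of -1 still keeps zero rewards, while a value of 0 drops them (along with any negative rewards).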

In [None]:
threshold = -1  # Only rewards strictly greater than this value are kept

for name, result in results.items():
    print(name)

    rewards_model = [reward for reward in result.execution_rewards if reward > threshold]
    rewards_bm = [reward for reward in result.execution_rewards_bm if reward > threshold]

    # Summary statistics of the model and benchmark rewards, side by side
    df = pd.DataFrame({
        "Model": pd.Series(rewards_model).describe(),
        "Benchmark": pd.Series(rewards_bm).describe(),
    })

    print(df)
    print("\n")
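
The filter in the cell above keeps rewards strictly greater than `threshold`. The sketch below uses made-up rewards to show how three possible filters differ, which matters when deciding whether zero-reward intervals should count towards the statistics. The values are illustrative only and are not results from our models.

```python
rewards = [0.0, 0.0, 0.5, -0.25, 0.0, 1.0]  # made-up values for illustration

kept_above_minus_one = [r for r in rewards if r > -1]  # what threshold = -1 does
kept_above_zero = [r for r in rewards if r > 0]        # what threshold = 0 would do
kept_nonzero = [r for r in rewards if r != 0]          # drops only exact zeros

print(len(kept_above_minus_one), len(kept_above_zero), len(kept_nonzero))  # 6 2 3
```

Only the last filter removes the flat intervals while keeping negative rewards.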

### 4.2 Aggressiveness

The aggressiveness is the amount by which an order is priced below the best ask when buying, or above the best bid when selling.
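
As a worked example of this definition, the sketch below computes the value for a single order. It is an illustration only: the function name, the `side` strings and the prices are assumptions, and `get_aggresiveness_from_trades` in `trademodels.utils` may compute it differently.

```python
def aggressiveness(side, order_price, best_bid, best_ask):
    """Distance of an order's price from the opposite best quote, per the definition above."""
    if side == "buy":
        return best_ask - order_price
    if side == "sell":
        return order_price - best_bid
    raise ValueError(f"Unknown side: {side!r}")


# Made-up quotes: best bid 98.5, best ask 100.0
print(aggressiveness("buy", order_price=99.0, best_bid=98.5, best_ask=100.0))    # 1.0
print(aggressiveness("sell", order_price=101.5, best_bid=98.5, best_ask=100.0))  # 3.0
```

Under this definition, a buy order priced exactly at the best ask has a value of 0.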

In [None]:
from trademodels.utils import get_aggresiveness_from_trades, plot_aggresiveness_over_time

agg_dict = {}
times_dict = {}

for name, result in results.items():
    output = get_aggresiveness_from_trades(result.inserted_orders, data)

    # output holds (time, aggressiveness) pairs; keep both per model so that
    # each model is later plotted against its own timestamps
    times_dict[name] = [time for time, _ in output]
    agg_dict[name] = [agg for _, agg in output]

#### 4.2.1 Histogram

In [None]:
for name, agg in agg_dict.items():
    print(name)

    plt.hist(agg, bins=50)
    plt.xlim((0, 5))
    plt.xlabel("Aggressiveness (Euros)")
    plt.ylabel("Frequency")

    save_path = os.path.join(agg_dir, f"{name}_hist")
    plt.savefig(save_path)

    plt.show()

#### 4.2.2 Over time

WARNING: THIS CAN TAKE SOME TIME (~5 minutes with 30% of the original data on an AMD 4700U)

In [None]:
window_size = 2000

for name, agg in agg_dict.items():
    print(name)

    save_path = os.path.join(agg_dir, f"{name}_time")
    plot_aggresiveness_over_time(times_dict[name], agg, window_size, save_path)
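
The `window_size` argument suggests that the values are smoothed over a moving window before plotting. We have not reproduced the internals of `plot_aggresiveness_over_time` here. The sketch below only illustrates the general idea with a trailing rolling mean, under the assumption that this is the kind of smoothing applied.

```python
from collections import deque


def rolling_mean(values, window_size):
    """Mean of the last `window_size` values at each position (shorter at the start)."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")

    window = deque(maxlen=window_size)
    running_sum = 0.0
    means = []
    for value in values:
        if len(window) == window.maxlen:
            running_sum -= window[0]  # this value is evicted by the append below
        window.append(value)
        running_sum += value
        means.append(running_sum / len(window))
    return means


print(rolling_mean([1.0, 2.0, 3.0, 4.0], window_size=2))  # [1.0, 1.5, 2.5, 3.5]
```

A larger window gives a smoother curve at the cost of reacting more slowly to changes.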

### 4.3 Volume

This is the volume of the trades that are put out by the models

#### 4.3.1 Volume of executed limit orders

In [None]:
for name, result in results.items():
    print(name)

    executed_orders_volume = [order.volume for order in result.executed_limit_orders]

    plt.hist(executed_orders_volume, bins=20)
    plt.xlabel("Volume")
    plt.ylabel("Frequency")

    save_path = os.path.join(vol_dir, f"{name}_limit")
    plt.savefig(save_path)

    plt.show()

#### 4.3.2 Volume of market orders

In [None]:
for name, result in results.items():
    print(name)

    market_orders_volume = [order.volume for order in result.inserted_orders
                              if str(order.type) == str(OrderType.MARKET_ORDER)]

    plt.hist(market_orders_volume, bins=20)
    plt.xlabel("Volume")
    plt.ylabel("Frequency")

    save_path = os.path.join(vol_dir, f"{name}_market")
    plt.savefig(save_path)

    plt.show()

### 4.4 PnL

The PnL indicates the trading performance of the models. In our setup, the position is liquidated after every interval.
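
To illustrate what liquidating after every interval means for the PnL, here is a minimal sketch with made-up prices. It is not the logic of `StrategyEvaluatorRL`, which may handle fills, timing and costs differently. The function name and the signed-volume convention are assumptions.

```python
from itertools import accumulate


def interval_pnl(fills, liquidation_price):
    """PnL of one interval: apply the fills, then close the remaining position."""
    cash = 0.0
    position = 0.0
    for signed_volume, price in fills:  # positive volume = buy, negative = sell
        cash -= signed_volume * price
        position += signed_volume
    return cash + position * liquidation_price


# Two made-up intervals: buy 20 at 50, then sell 20 at 50, each liquidated at 51
pnls = [
    interval_pnl([(20, 50.0)], liquidation_price=51.0),
    interval_pnl([(-20, 50.0)], liquidation_price=51.0),
]
portfolio_values = list(accumulate(pnls))

print(pnls)              # [20.0, -20.0]
print(portfolio_values)  # [20.0, 0.0]
```

Because the position is closed at the end of each interval, the portfolio value is the running sum of the per-interval PnLs.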

In [None]:
from trademodels.strategy import StrategyEvaluatorRL

times = data["to_timestamp"]

for name, result in results.items():
    print(name)

    interval_starts = [order.time - pd.Timedelta(minutes=1) for order in result.benchmark_orders]

    evaluator = StrategyEvaluatorRL(data, interval_starts, result.inserted_orders, interval_length, cancel_after_one_minute)
    evaluator_bm = StrategyEvaluatorRL(data, interval_starts, result.benchmark_orders, interval_length, cancel_after_one_minute)

    pnl_result = evaluator.evaluate(liquidate_after_interval=True)
    pnl_result_bm = evaluator_bm.evaluate(liquidate_after_interval=True)

    plt.plot(times, pnl_result["Portfolio_values"], label='Model')
    plt.plot(times, pnl_result_bm["Portfolio_values"], label='Benchmark')
    plt.xlabel('Time')
    plt.ylabel('Portfolio value')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.legend()

    save_path = os.path.join(pnl_dir, f"{name}")
    plt.savefig(save_path)

    plt.show()