# 8. Model evaluation

We test the models on out-of-sample (OOS) data and use their predictions to build a simple trading system.

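The strategy goes long when a model's smoothed prediction (a rolling mean of its predicted return, with a window length chosen per model) is positive, and short when it is negative. A prediction of exactly zero is treated as long.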
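To make the rule concrete, here is a toy sketch with made-up numbers (not data from this project). Each smoothed prediction maps to a position of +1 (long) or -1 (short), and the strategy earns the position times the return over the following period. As in the backtest code, only a negative prediction triggers a short.

```python
import numpy as np

# Hypothetical smoothed predictions and the BTC returns over the following period
preds = np.array([0.002, -0.001, 0.0, -0.003])
next_rets = np.array([0.010, -0.020, 0.005, 0.015])

# +1 = long, -1 = short (negative prediction -> short, otherwise long)
positions = np.where(preds < 0, -1, 1)
strategy_rets = positions * next_rets

# positions     -> [1, -1, 1, -1]
# strategy_rets -> [0.01, 0.02, 0.005, -0.015]
```

The short on the second bar turns a -2% market move into a +2% gain, while the short on the last bar loses when the market rises.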

In [52]:
```python
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
```

In [53]:
```python
# Out-of-sample predictions, one column per model
preds_output = pd.read_csv('data/oos_preds.csv', index_col=0, parse_dates=True)
# Price and volume data (OHLCV)
complete_data = pd.read_csv('data/ohlcv.csv', index_col=0, parse_dates=True)
```

In [54]:
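```python
# Smooth each model's predictions with a rolling mean; the window length is set
# per model (a window of 1 leaves the cluster models' predictions unchanged)
roll_means = {'cuberoot_all': 4, 'arsinh_all': 5, 'none_all': 2,
              'bycluster_cuberoot': 1, 'bycluster_arsinh': 1, 'bycluster_none': 1}
for col, window in roll_means.items():
    preds_output[col] = preds_output[col].rolling(window).mean()
```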
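A small illustration of the per-model rolling mean on a made-up series: a window of n averages the current and previous n-1 predictions and leaves NaN in the first n-1 rows (these rows are removed later by `dropna`), while a window of 1 returns the original values.

```python
import pandas as pd

# Hypothetical raw predictions for one model
raw = pd.Series([0.004, -0.002, 0.001, -0.005, 0.003])

smoothed_2 = raw.rolling(2).mean()  # mean of the current and previous value
smoothed_1 = raw.rolling(1).mean()  # window of 1: same values as raw

# smoothed_2 -> [NaN, 0.001, -0.0005, -0.002, -0.001]
```

In this toy series the sign of the raw predictions flips four times, while the 2-bar average flips only once, so the smoothed signal would trigger fewer fee-paying position changes.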

In [55]:
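```python
# Add BTC's next-period return (from t to t+1) next to the predictions made at t;
# the new column keeps the name 'Close'
preds_output = pd.concat([preds_output, complete_data['Close'].pct_change().shift(-1)], axis=1)
# Drop rows with any missing value: rolling-window warm-up, timestamps without
# predictions, and the final bar, which has no next-period return
preds_output = preds_output.dropna()
```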
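Why `shift(-1)`: `pct_change()` at row t is the return from t-1 to t, which is already known when the prediction at t is made. Shifting it up one row places the return from t to t+1 next to the prediction made at t, so each signal is scored on the period that follows it. A toy check with made-up prices:

```python
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0, 108.9], name='Close')  # hypothetical closes

past_ret = close.pct_change()            # return from t-1 to t (first row NaN)
next_ret = close.pct_change().shift(-1)  # return from t to t+1 (last row NaN)

# next_ret -> approximately [0.10, -0.10, 0.10, NaN]; the last row has no
# future return and is removed by dropna()
```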

In [56]:
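```python
strat_cols = [col for col in preds_output.columns if col != 'Close']
capital = {}
# Limit-order fee on Bybit (a cryptocurrency exchange): 0.01% per fill, x2,
# presumably because a flip closes one position and opens the opposite one
trading_fee = 0.01 / 100 * 2
next_rets = preds_output['Close'].to_numpy()
for col in strat_cols:
    # Short when the smoothed prediction is negative, long otherwise
    is_short = (preds_output[col] < 0).to_numpy()
    # Bars on which the position flips (no fee is charged on the first bar)
    pos_change = np.zeros_like(is_short)
    pos_change[1:] = is_short[1:] != is_short[:-1]
    # A short position earns the negative of the next-period return
    rets = np.where(is_short, -next_rets, next_rets)
    rets[pos_change] -= trading_fee
    capital[col] = pd.Series(np.cumprod(1 + rets), index=preds_output.index)
```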
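The loop above can be read as a small reusable function. The sketch below is hypothetical (the name `backtest_sign_strategy` is ours, not from the project) and applies the same rule to made-up data: short on negative predictions, long otherwise, and subtract the fee from the return of every bar on which the position flips.

```python
import numpy as np
import pandas as pd


def backtest_sign_strategy(preds: pd.Series, next_rets: pd.Series, fee: float) -> pd.Series:
    """Hypothetical helper: equity curve of a long/short strategy driven by the sign of preds.

    preds[t] is the (smoothed) prediction at t and next_rets[t] the return from t to t+1.
    """
    is_short = (preds < 0).to_numpy()
    rets = np.where(is_short, -next_rets.to_numpy(), next_rets.to_numpy())
    # Charge the fee on bars whose position differs from the previous bar
    flips = np.zeros_like(is_short)
    flips[1:] = is_short[1:] != is_short[:-1]
    rets[flips] -= fee
    # Compound the per-bar returns into an equity curve starting from 1 unit
    return pd.Series(np.cumprod(1 + rets), index=preds.index)


preds = pd.Series([0.002, -0.001, -0.004, 0.003])
next_rets = pd.Series([0.01, -0.02, 0.01, 0.02])
equity = backtest_sign_strategy(preds, next_rets, fee=0.0002)
```

Here the position flips twice (long to short, then short to long), so the final equity is 1.01 × 1.0198 × 0.99 × 1.0198 ≈ 1.0399 instead of about 1.0403 without fees.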

In [57]:
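```python
# One column of cumulative capital per strategy
capital_df = pd.DataFrame(capital)

# Buy-and-hold benchmark: cumulative BTC return from the first OOS timestamp
btc_base1 = (complete_data.loc[capital_df.index[0]:, 'Close']
             .pct_change().add(1).cumprod()
             .reindex(capital_df.index)
             .rename('BTC'))

# The benchmark's first row is NaN (no earlier close in the slice) and is dropped
capital_df = pd.concat([capital_df, btc_base1], axis=1).dropna()
```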
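The benchmark compounds BTC's per-bar returns from the first out-of-sample timestamp. Compounding simple returns telescopes back to the price ratio, so from its second row on `btc_base1` is just each close divided by the starting close. A quick check on made-up prices:

```python
import numpy as np
import pandas as pd

close = pd.Series([40000.0, 42000.0, 39900.0, 43890.0])  # hypothetical closes

compounded = close.pct_change().add(1).cumprod()  # first value is NaN
price_ratio = close / close.iloc[0]               # first value is 1.0

# From the second row on, both give the value of 1 unit invested at the start
print(np.allclose(compounded.iloc[1:], price_ratio.iloc[1:]))  # True
```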

In [58]:
```python
# Total return over the OOS period: final capital / initial capital - 1
capital_df.pct_change(len(capital_df)-1).dropna().T
```

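| Strategy | Total return to 2022-02-19 19:00:00 |
|---|---|
| none_all | 7.326018 |
| arsinh_all | 6.844284 |
| cuberoot_all | 5.383571 |
| bycluster_none | 2.829433 |
| bycluster_arsinh | 1.509206 |
| bycluster_cuberoot | 0.621468 |
| BTC | 4.345851 |

Values are fractional total returns over the out-of-sample period: for example, 7.33 for none_all means the capital grew by about 733% (to roughly 8.3 times its initial value).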
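The table comes from `pct_change(len(capital_df) - 1)`, which compares the last row with the first, i.e. total return = final capital / initial capital - 1. A toy check, on made-up equity curves, that this matches the direct formula:

```python
import pandas as pd

# Hypothetical equity curves for two strategies
capital_df = pd.DataFrame({'strat_a': [1.0, 1.5, 2.0], 'strat_b': [1.0, 0.9, 1.2]})

# Form used in the notebook: percent change over len - 1 periods, last row only
via_pct_change = capital_df.pct_change(len(capital_df) - 1).dropna().T.iloc[:, 0]
# Direct form: final / initial - 1
direct = capital_df.iloc[-1] / capital_df.iloc[0] - 1

# strat_a -> 1.0 (+100%), strat_b -> approximately 0.2 (+20%)
```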


# 9. Conclusions and next steps

Conclusions:

1. Creating a profitable BTC trading strategy with machine learning models is possible. Even though forecasting returns is extremely difficult because they are so noisy, the models did well enough to generate a positive return, after trading fees, over the out-of-sample period.
2. The idea of clustering the market structure and fitting a separate model for each cluster did not work well. While all non-cluster models outperformed a buy-and-hold strategy over the tested period, every cluster-based strategy underperformed it. This is probably due to the lack of data for some clusters. Trying other clustering algorithms might help, as KMeans has its own limitations.
3. Because the tree models' results depend on their random seed, it is advisable to run several tests with different random_state values and average their predictions to get a more reliable estimate of out-of-sample performance. Still, during the tested time period all non-cluster models performed better than a buy-and-hold strategy.
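A minimal sketch of the seed-averaging idea in point 3, using only NumPy. The notebook's tree models are not shown in this section, so a bootstrapped least-squares fit stands in as a hypothetical randomized model (`fit_predict_one_seed` and `seed_averaged_predictions` are our names). The part that carries over is the structure: fit once per random seed, then average the predictions.

```python
import numpy as np


def fit_predict_one_seed(X_train, y_train, X_test, seed):
    """Hypothetical randomized model: least-squares fit on a bootstrap sample."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, len(X_train), size=len(X_train))  # sample with replacement
    coef, *_ = np.linalg.lstsq(X_train[rows], y_train[rows], rcond=None)
    return X_test @ coef


def seed_averaged_predictions(X_train, y_train, X_test, seeds):
    """Average one model's predictions over several seeds; also return their spread."""
    preds = np.stack([fit_predict_one_seed(X_train, y_train, X_test, s) for s in seeds])
    return preds.mean(axis=0), preds.std(axis=0)


# Made-up noisy data: 200 training rows, 3 features, 20 test rows
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = X_train @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=1.0, size=200)
X_test = rng.normal(size=(20, 3))

mean_pred, seed_spread = seed_averaged_predictions(X_train, y_train, X_test, seeds=range(10))
```

`seed_spread` gives a rough sense of how much a single run's predictions depend on the seed; the averaged `mean_pred` is what would feed the trading rule.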

Next steps:

1. Productionize the solution
2. Test RNN and Conv1D models and compare the results with the existing solution
3. Try different clustering algorithms to cluster market structure
4. Gather more data from different sources (derivative markets data, liquidity data, etc.)
5. Try to implement reinforcement learning to teach an agent how to trade using these predictions
6. Extend the project to other tradeable assets
7. Upsample market structure data to increase the dataset size for low-density clusters
8. Use stacked autoencoders (with some restriction) to reduce the dimensionality of the dataset and test if it improves performance
9. Frame the problem as classification and predict positions rather than expected returns
10. Denoise the target variable
11. Use other assets' trading data to fit the models and test if this improves performance
12. Train a model to detect outliers and freeze the strategy whenever new outliers appear (check whether this improves performance)
13. Build a scikit-learn pipeline with all feature-engineering operations and optimize its parameters using time-series cross-validation
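For the last step, time-series cross-validation keeps every validation fold strictly after its training data, so no future information leaks into training. scikit-learn provides this as `TimeSeriesSplit`; the dependency-free sketch below (`expanding_window_splits` is a hypothetical name) is intended to show the default expanding-window layout it uses.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) with training rows always before test rows.

    The last n_splits blocks of size n_samples // (n_splits + 1) are the validation
    folds, and each fold trains on every row that comes before it.
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("n_samples is too small for the requested n_splits")
    for k in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - k + 1) * test_size
        yield list(range(train_end)), list(range(train_end, train_end + test_size))


# Example: 10 time-ordered rows, 3 folds
for train_idx, test_idx in expanding_window_splits(10, n_splits=3):
    print(f"train rows 0-{train_idx[-1]}, validate rows {test_idx}")
# train rows 0-3, validate rows [4, 5]
# train rows 0-5, validate rows [6, 7]
# train rows 0-7, validate rows [8, 9]
```

In a pipeline, each fold would refit all feature-engineering steps on the training rows only before scoring the validation rows, which is what keeps the parameter search free of look-ahead.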