# Introduction

This notebook explores many of the ideas described by Marcos Lopez de Prado in his book "Advances in Financial Machine Learning." I made use of many of the code snippets provided throughout the book as well as a library called mlfinlab, which has aggregated and expanded on much of the code in the book. I tried to give a brief primer on each concept being utilized from the book for my own reference and anyone interested.

# Data

Lopez de Prado's book works primarily with tick data, which is generally very expensive to acquire. I was, however, able to freely download Forex tick data from TrueFx.com.

This notebook focuses strictly on tick data for the USD/JPY currency pair. USD/JPY tick data spanning 2014-2018 was used for research, and the best-performing strategy was backtested on tick data from 2011-2013.

The data includes the timestamp of the tick as well as the bid and ask price. Because Forex does not have a central exchange, volume data for each tick is not a part of the data set. As a result of this, data was sampled in tick bars, every 2800 ticks. 
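As a rough illustration of what sampling every N ticks means, the sketch below groups consecutive ticks into bars and takes the open, high, low, and close of the mid-price. It is not mlfinlab's implementation: the function name `make_tick_bars`, the use of the mid-price, and keeping the final partial bar are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def make_tick_bars(ticks: pd.DataFrame, ticks_per_bar: int = 2800) -> pd.DataFrame:
    """Aggregate bid/ask ticks into OHLC bars of the mid-price (hypothetical helper)."""
    mid = (ticks["bid"] + ticks["ask"]) / 2.0
    # Ticks 0..N-1 belong to bar 0, the next N ticks to bar 1, and so on.
    bar_id = np.arange(len(ticks)) // ticks_per_bar
    bars = mid.groupby(bar_id).agg(["first", "max", "min", "last"])
    bars.columns = ["open", "high", "low", "close"]
    # Stamp each bar with the timestamp of its last tick.
    bars.index = ticks.index.to_series().groupby(bar_id).last().to_numpy()
    return bars


# Toy example: five ticks, two ticks per bar (the last bar is partial).
stamps = pd.to_datetime([
    "2014-01-01 00:00:01", "2014-01-01 00:00:02", "2014-01-01 00:00:03",
    "2014-01-01 00:00:04", "2014-01-01 00:00:05",
])
bid = np.array([100.0, 100.2, 100.1, 100.4, 100.3])
ticks = pd.DataFrame({"bid": bid, "ask": bid + 0.02}, index=stamps)
bars = make_tick_bars(ticks, ticks_per_bar=2)
```

With 2800 ticks per bar, a quiet session produces few bars and a busy session many, which is the property the rest of this section relies on.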

The tick bars function in mlfinlab was used to sample the data. The function returns a DataFrame with open, high, low, and close prices for each bar. The motivation for using tick bars as opposed to time bars is that tick bars exhibit a distribution of returns that is much closer to normal than that of time bars. This is exhibited in the graph below:

Tick bars also more realistically represent the market as trading frequency varies throughout the day. 

One of the drawbacks of sampling tick bars, however, is that the sampling frequency is often very inconsistent over time. This is shown below:

The sampling frequency is much more consistent over time when bars are sampled as a function of the volume traded or the dollar amount traded. Both methods, however, require volume data, which was not available in this dataset.

In [None]:
from multiprocessing import cpu_count

import empyrical
import numpy as np
import pandas as pd
from keras import Sequential
from keras import optimizers
from keras.layers import Dense
from keras.layers import Dropout
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.decomposition import PCA
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

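from _pipeline_scripts_examples.fml.Cross_Validation import getTrainTimes
from _pipeline_scripts_examples.fml.Fractionally_Differentiated_Features import plotMinFFD
from _pipeline_scripts_examples.fml.Utility_Functions import get_daily_volatility

# Helpers called later in the notebook. The module paths below are ASSUMPTIONS that mirror
# the file names linked in the text (Sample_Weights.py, Feature_Extraction.py, ...).
# Adjust them to your project layout. cvScorenn is not covered by any linked file.
from _pipeline_scripts_examples.fml.Cross_Validation import cvScore
from _pipeline_scripts_examples.fml.Sample_Weights import get_weights_and_avgu
from _pipeline_scripts_examples.fml.Feature_Extraction import orthoFeats
from _pipeline_scripts_examples.fml.Hyperparameter_Tuning import clfHyperFitnn
from _pipeline_scripts_examples.fml.Backtest import percentage_normalized_profit, returns_series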

import mlfinlab_src as mlf



In [None]:
# Reading in dataframe with precomputed features

tick_bars = pd.read_csv('Datas/USDJPY_Tick_Data/Research_Data.csv', index_col=0, parse_dates=True)
ask = tick_bars['ask'].copy()
bid = tick_bars['bid'].copy()

# Feature Engineering

For feature engineering and labeling, the mid-price of each tick was used. When calculating the returns based on the model's predictions, however, the bid and ask prices were used.

The initial features included various technical analysis and signal processing features calculated at different time periods. The majority of these features were derived using the library ta-lib. The full list of initial features can be found [here](https://github.com/JackBrady/Financial-Machine-Learning-Research/blob/master/Initial_Feature_List). The inspiration for many of these features comes from their ubiquity in academic research pertaining to machine learning methods in the Forex market.

Additionally, 7 fractionally differentiated features were generated. Fractionally differentiated features are a concept presented by Dr. Lopez de Prado. The idea is that when a price series is differenced to calculate log returns, we lose all memory of the underlying series in an effort to achieve stationarity. Prices, in contrast to returns, have memory but are not stationary. Dr. Lopez de Prado proposes a method to difference a price series just enough to achieve stationarity, without fully differencing the series and thereby losing all memory. The motivation is that conserving memory will yield more predictive power. This fractionally differentiated series can then be used as a feature.
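To make the idea concrete, the sketch below computes the weights of the fractional differencing operator with the recursion w_k = -w_{k-1} (d - k + 1) / k and applies them over a fixed-width window. It is a simplified illustration rather than the code used in this notebook: the names `ffd_weights` and `frac_diff` are hypothetical, and the window length is passed in directly instead of being derived from a weight threshold.

```python
import numpy as np


def ffd_weights(d: float, size: int) -> np.ndarray:
    """First `size` weights of the (1 - B)^d expansion, newest observation first."""
    w = [1.0]
    for k in range(1, size):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)


def frac_diff(series, d: float, size: int) -> np.ndarray:
    """Fixed-width-window fractional differencing; the first size-1 outputs are NaN."""
    series = np.asarray(series, dtype=float)
    w = ffd_weights(d, size)
    out = np.full(len(series), np.nan)
    for t in range(size - 1, len(series)):
        window = series[t - size + 1 : t + 1][::-1]  # newest value first
        out[t] = np.dot(w, window)
    return out


half_weights = ffd_weights(0.5, 4)                # slowly decaying weights keep memory
first_diff = frac_diff([1.0, 2.0, 4.0, 7.0], d=1.0, size=2)  # d = 1 is an ordinary difference
```

With d = 1 the weights collapse to (1, -1) and all memory beyond one lag is lost. With a fractional d, older observations keep small non-zero weights, which is where the retained memory comes from.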

In [None]:
plotMinFFD(tick_bars, 'close')

Here we can see that differencing our USD/JPY price series by an order of 0.2 achieves stationarity (p < 0.05) while still having a very strong correlation to the original price series. The function for plotting the minimum amount of differencing needed to achieve stationarity, as well as the functions to actually calculate the fractionally differentiated features, are provided by Dr. Lopez de Prado and can be found [here](https://github.com/JackBrady/Financial-Machine-Learning-Research/blob/master/Code/Fractionally_Differentiated_Features.py).

While the initial feature set consisted of 57 features, our final feature set contained 31 features that were selected using two feature selection methods as well as by testing various subsets of features. These feature selection methods are discussed below.

# Feature Selection

Two feature selection methods were used to generate the final set of features. The first method was mean decrease impurity (MDI), a tree based method, which calculates how much each individual feature decreases the overall impurity. The functions to calculate and plot feature importance based on MDI were provided by Dr. Lopez de Prado and were used to generate the figure below. This plot shows the feature importance for the initial set of 57 features. The code for each function can be found [here](https://github.com/JackBrady/Financial-Machine-Learning-Research/blob/master/Code/Feature_Importance.py).


The second method used is an input perturbation feature ranking algorithm demonstrated by Dr. Jeff Heaton in the following video: [link](https://www.youtube.com/watch?v=RVIGVkj5aXo&t=1105s)

The idea behind this method is that a feature's column is shuffled and the model's accuracy is re-evaluated with the shuffled column. The larger the resulting decrease in accuracy, the more important the feature.
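A minimal sketch of this shuffling procedure is given below. It is not the code from the video or from this notebook: `perturbation_importance` is a hypothetical helper, and `toy_predict` stands in for a fitted model's `predict` method so that the example stays self-contained.

```python
import numpy as np


def perturbation_importance(predict, X, y, seed=0):
    """Accuracy drop caused by shuffling each column of X in turn."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    drops = []
    for col in range(X.shape[1]):
        X_shuffled = X.copy()
        X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
        drops.append(baseline - np.mean(predict(X_shuffled) == y))
    return np.array(drops)


# Toy data: the label depends only on column 0; column 1 is pure noise.
data_rng = np.random.default_rng(1)
X_toy = data_rng.normal(size=(500, 2))
y_toy = (X_toy[:, 0] > 0).astype(int)


def toy_predict(data):
    """Stand-in for a fitted model: predicts from column 0 only."""
    return (data[:, 0] > 0).astype(int)


importance = perturbation_importance(toy_predict, X_toy, y_toy)
```

Shuffling the informative column destroys most of the accuracy, while shuffling the noise column leaves it unchanged, so the first feature is ranked as the more important one.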

In [None]:
# Correlation between feature importance methods

pert_rank['importance'].corr(mdi_p['mean'])

The list of the final 31 features used can be found [here](https://github.com/JackBrady/Financial-Machine-Learning-Research/blob/master/Final_Feature_List).

# Downsampling

The tick bars sampled previously were downsampled using the CUSUM filter provided by Dr. Lopez de Prado, which, as stated by Dr. Lopez de Prado, is "designed to detect a shift in the mean value of a measured quantity away from a target value." The intuition is that we want to make a prediction on an observation after a certain threshold is reached, as opposed to predicting at a random point in time. In our case, a bar was sampled if the cumulative sum of the price differences in either direction surpassed 1/10 of the mean daily volatility. The CUSUM filter supplied by mlfinlab was utilized below:

In [None]:
closing = tick_bars['close']
volatility = get_daily_volatility(closing)
times = mlf.filters.cusum_filter(closing, volatility.mean() * .1)

Additionally, the get_daily_volatility function was provided by Dr. Lopez de Prado, and is available [here](https://github.com/JackBrady/Financial-Machine-Learning-Research/blob/master/Code/Utility_Functions.py).
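The logic of a symmetric CUSUM filter can be sketched in a few lines. The version below works on a plain list of prices and returns the positions at which an event fires. It is an illustration under the assumption that raw price differences are accumulated, not a copy of the mlfinlab function, and `cusum_events` is a hypothetical name.

```python
def cusum_events(prices, threshold):
    """Return the positions at which the symmetric CUSUM filter fires."""
    events = []
    s_pos, s_neg = 0.0, 0.0
    for i in range(1, len(prices)):
        change = prices[i] - prices[i - 1]
        # Accumulate moves separately per direction, flooring/capping at zero.
        s_pos = max(0.0, s_pos + change)
        s_neg = min(0.0, s_neg + change)
        if s_neg < -threshold:
            s_neg = 0.0  # reset after a downward event
            events.append(i)
        elif s_pos > threshold:
            s_pos = 0.0  # reset after an upward event
            events.append(i)
    return events


toy_prices = [100.0, 100.4, 100.9, 101.3, 101.2, 100.1, 100.0]
toy_events = cusum_events(toy_prices, threshold=1.0)
```

In this toy series the upward drift triggers an event at position 3 and the subsequent drop triggers another at position 5. Bars in between are skipped, which is the downsampling effect.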

# Labeling

Instead of labeling observations based on the sign of their returns after a given amount of time, observations are given a label based on whether or not they reached a target return that is based on their respective volatilities. This allows observations with more risk to have a higher expected return and vice versa. If an observation does not reach its target return in a predetermined amount of time, it is given a label of 0, representing a return that is too low for us to make a bet on. If it reaches its return in the positive direction (upper barrier), a label of 1 is given, representing a long position. If it reaches its return in the negative direction (lower barrier), a label of -1 is given, representing a short position.

This concept was presented by Dr. Lopez de Prado, and is known as the triple-barrier method, as we have a vertical (time) barrier and a horizontal barrier for long and short positions.
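The labeling rule for a single observation can be sketched as follows. This is a simplified illustration, not the mlfinlab implementation: `triple_barrier_label` is a hypothetical helper, the vertical barrier is expressed as a number of bars rather than one day, and the same `target` is used for both horizontal barriers.

```python
def triple_barrier_label(path, target, max_holding):
    """Label one observation from its forward price path.

    path[0] is the entry price, and path[1:] are the prices that follow.
    target is the return that defines the upper (+) and lower (-) barriers.
    max_holding is the vertical barrier, in bars.
    """
    entry = path[0]
    for price in path[1 : max_holding + 1]:
        ret = price / entry - 1.0
        if ret >= target:
            return 1   # upper barrier touched first
        if ret <= -target:
            return -1  # lower barrier touched first
    return 0           # vertical barrier reached without touching either


label_up = triple_barrier_label([100.0, 100.2, 100.5], target=0.004, max_holding=2)
label_down = triple_barrier_label([100.0, 99.8, 99.5], target=0.004, max_holding=2)
label_flat = triple_barrier_label([100.0, 100.1, 100.2, 100.9], target=0.004, max_holding=2)
```

In the last case the price does eventually move by more than the target, but only after the vertical barrier, so the observation is labeled 0.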

As stated previously, the mid-price of each tick was used for labeling observations.

mlfinlab was utilized for the following labeling functions. Additionally, Dr. Lopez de Prado provides a multiprocessing engine to speed up computation, which was made use of in the get_events function.

In [None]:
# 1 day was the amount of time set for the vertical barrier
# The minimum return for an observation to be considered was set to 0.004
# The upper and lower barrier were not scaled

vertical_barriers = mlf.labeling.add_vertical_barrier(times, closing, num_days=1)
pt_sl = [1, 1]
min_ret = 0.004
threads = cpu_count() - 1

In [None]:
triple_barrier_events = mlf.labeling.get_events(closing,
                                                times,
                                                pt_sl,
                                                volatility,
                                                min_ret,
                                                threads,
                                                vertical_barriers)

In [None]:
labels_one = mlf.labeling.get_bins(triple_barrier_events, closing)

In [None]:
labels_one['bin'].value_counts()

# Model Architecture/Meta-Labeling

The model architecture used in this project is based on the concept of meta-labeling formulated by Dr. Lopez de Prado. 

The idea behind meta-labeling is that we have a primary and a secondary binary classifier. The primary classifier predicts the side of the bet (-1,1), while the secondary classifier predicts whether or not we want to take the bet (0,1). It has been shown previously how labels are derived for the primary classifier. The labels for the secondary classifier, i.e. meta-labels, are obtained using a modified version of the get_events function which takes in the side predicted by the primary classifier.

For example, if our primary classifier predicts a 1 for an observation, but the actual label is a -1, then this observation will receive a meta-label of 0, and the secondary classifier will be trained to predict a 0. Additionally, as seen previously, many observations did not reach their target return and thus received a label of 0. These observations will also receive a meta-label of 0. A meta-label of 1 is only given when the primary classifier's prediction is accurate.

The motivation for this technique is that the secondary classifier can learn from the error of the primary classifier, allowing it to act as a filter for bets.
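The rule described in this section reduces to a single comparison, sketched below. In the notebook the meta-labels are produced by get_events and get_bins with the predicted side passed in; the hypothetical `meta_labels` helper here only illustrates the resulting mapping from (predicted side, actual label) to a 0/1 meta-label.

```python
import numpy as np


def meta_labels(primary_side, true_label):
    """1 where the primary prediction matches a non-zero true label, else 0."""
    primary_side = np.asarray(primary_side)
    true_label = np.asarray(true_label)
    return ((true_label != 0) & (primary_side == true_label)).astype(int)


# Predicted sides from the primary model vs. the original triple-barrier labels.
toy_meta = meta_labels(primary_side=[1, 1, -1, -1, 1], true_label=[1, -1, -1, 0, 0])
```

The second observation has the wrong side and the last two never reached their target return, so all three receive a meta-label of 0.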

# Training Data

The observations that initially received a label of 0, i.e. they did not reach their target return in time, were not used to train the primary model. They were, however, used to train the secondary model. The motivation for this is that there is no point in training the primary model to recognize observations that will ideally receive a prediction of 0 from the secondary model, regardless of the primary model's prediction. The full training process then goes as follows:

1. The primary model is trained on observations receiving a label of -1 or 1.
2. Then, the observations receiving a label of 0 are aggregated with the training data.
3. This full set of observations is then passed through the trained primary model and meta-labels are derived on these predictions.
4. The secondary model is then trained using the same features as the primary model with the predicted side of the primary model being an additional feature.

The following code prepares our data for training:

In [None]:
# Full DataFrame including all observations regardless of label

full_df = pd.DataFrame(tick_bars.loc[labels_one['bin'].index], index=labels_one['bin'].index)
full_df.drop(columns=['open', 'high', 'close', 'low', 'bid', 'ask'], inplace=True)
full_df['labels'] = labels_one['bin'].copy()
y = full_df['labels'].copy()
full_df.drop(columns=['labels'], inplace=True)
x = full_df.copy()

In [None]:
X_train_full, X_test_full, y_train_full, y_test_full = train_test_split(x, y, test_size=0.20,
                                                                        shuffle=False)

In [None]:
# Start and end times for an observation

t1 = triple_barrier_events['t1'].copy()

In [None]:
#training observations purged. This will be discussed in the next section

train_i = t1.loc[X_train_full.index]
test_i = t1.loc[X_test_full.index]
train_times = getTrainTimes(train_i, test_i)

It should be noted that while the test data for the primary model does not contain any observations with a label of 0, the test data being used for the secondary model is an aggregation of all test data regardless of label. Additionally, if the training set was transformed in any way, the same scaling was applied to the testing set.

In [None]:
X_train_full = X_train_full.loc[train_times.index]
y_train_full = y_train_full.loc[train_times.index]

# Getting dataframe for observations with labels -1 and 1

X_train = X_train_full[y_train_full != 0].copy()

# Getting a dataframe for observations with label 0

X_train_addit = X_train_full[y_train_full == 0].copy()

y_train = y_train_full.loc[X_train.index].copy()

X_test = X_test_full[y_test_full != 0].copy()

y_test = y_test_full.loc[X_test.index].copy()

X = pd.concat([X_train, X_test])
y = pd.concat([y_train, y_test])

In [None]:
#Standardizing the data

scaler = StandardScaler()
new_training = scaler.fit_transform(X_train)

new_testing = scaler.transform(X_test)

new_testing_full = scaler.transform(X_test_full)

X_train_stand = pd.DataFrame(new_training, index=X_train.index)
X_test_stand = pd.DataFrame(new_testing, index=X_test.index)
X_test_stand_full = pd.DataFrame(new_testing_full, index=X_test_full.index)

scaler = StandardScaler()
full_stand = scaler.fit_transform(X)

In [None]:
'''
This standardizes the observations with a label of 0 separately from the primary model's training set. 
These observations will not be used for training the primary model.
They will be aggregated with the original training set post-training and the full output will be fed to the
secondary model.
'''

scaler_addit = StandardScaler()
new_training_addit = scaler_addit.fit_transform(X_train_addit)
X_train_addit_stand = pd.DataFrame(new_training_addit, index=X_train_addit.index)
X_train_stand_full = pd.concat([X_train_stand, X_train_addit_stand])

# Sample Weights

Observations are weighted as a function of their respective uniqueness. An observation is deemed completely unique if the time interval used to generate its label has no overlap with the time interval used to generate the label of another observation. The more overlap an observation has and the longer those overlaps last, the less unique the observation is. Additionally, a linear time-decay is applied to the sample weights, giving older observations less importance. This method for calculating sample weights was demonstrated by Dr. Lopez de Prado as well as the method for calculating the uniqueness of an observation. 
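The uniqueness calculation can be sketched as follows: count how many labels are active at each bar, then average the reciprocal of that count over each label's lifespan. This is a simplified illustration with integer bar positions. `average_uniqueness` is a hypothetical helper, and neither the return-based weighting nor the time decay used in this notebook is included.

```python
import numpy as np


def average_uniqueness(starts, ends, n_bars):
    """Average uniqueness per label; starts/ends are inclusive bar positions."""
    concurrency = np.zeros(n_bars)
    for s, e in zip(starts, ends):
        concurrency[s : e + 1] += 1  # number of labels active at each bar
    return np.array(
        [np.mean(1.0 / concurrency[s : e + 1]) for s, e in zip(starts, ends)]
    )


# Labels A (bars 0-2) and B (bars 2-4) share bar 2; label C (bar 5) overlaps nothing.
toy_uniqueness = average_uniqueness(starts=[0, 2, 5], ends=[2, 4, 5], n_bars=6)
```

Labels A and B each share one of their three bars, giving an average uniqueness of 5/6, while label C is fully unique with a value of 1.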



The following function calls many of the functions provided by Dr. Lopez de Prado and returns the average uniqueness across all observations as well as the sample weights. It also makes use of the multiprocessing engine mentioned previously. Code for these functions can be found [here](https://github.com/JackBrady/Financial-Machine-Learning-Research/blob/master/Code/Sample_Weights.py).


In [None]:
training_weights = get_weights_and_avgu(closing, X_train, threads, t1)[1]

In [None]:
avgu, sample_weights = get_weights_and_avgu(closing, X, threads, t1)

In the previous section, observations were removed from the training set if the time interval used for generating their labels had any overlap with the observations in the testing set. This technique, proposed by Dr. Lopez de Prado, is known as "purging" and aims to remove any leakage between the train and test sets. Code provided by Dr. Lopez de Prado for purging the training set can be found [here](https://github.com/JackBrady/Financial-Machine-Learning-Research/blob/master/Code/Cross_Validation.py).
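A simplified sketch of purging is shown below. Each label is represented as a Series entry whose index is the label's start time and whose value is its end time, mirroring the `t1` Series used in this notebook. The `purge_train` helper is hypothetical and treats the test set as one contiguous block, whereas the getTrainTimes function handles each test interval separately.

```python
import pandas as pd


def purge_train(train_t1: pd.Series, test_t1: pd.Series) -> pd.Series:
    """Drop training labels whose [start, end] interval overlaps the test period."""
    test_start = test_t1.index.min()
    test_end = test_t1.max()
    # Two intervals overlap when each one starts before the other one ends.
    overlaps = (train_t1 >= test_start) & (train_t1.index <= test_end)
    return train_t1[~overlaps]


# Integer times for readability: index = label start, value = label end.
toy_train_t1 = pd.Series([2, 4, 6], index=[0, 3, 5])
toy_test_t1 = pd.Series([8, 9], index=[6, 7])
toy_purged = purge_train(toy_train_t1, toy_test_t1)
```

The training label that starts at 5 and ends at 6 is dropped because its outcome is still being determined when the test period begins at 6.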

# Model Selection

Three different models were used for the primary classifier in this research:
<br>
The first model was a bagged decision tree.
<br>
The second was a bagged SVM.
<br>
The third was an MLP.

The reason I used bagged decision trees as opposed to sklearn's random forest classifier was due to severe overfitting issues I was having with the traditional random forest model. This is very likely a result of observational redundancy in our training set, i.e. many overlapping outcomes. To mitigate the issue of redundancy, Dr. Lopez de Prado recommended using a bagging classifier and setting the max_samples parameter to the average uniqueness of our observations, which helps prevent the creation of many redundant trees. While the bagging classifier supports the max_samples parameter, it is not supported in sklearn's random forest classifier.

The initial plan was to use these three models as both primary and secondary classifiers and to test every combination of models yielding 9 potential combinations. As secondary classifiers, however, the bagged SVM and MLP were severely overpredicting the majority class even after adding class weights, upsampling the minority class, and attempting to tune hyperparameters using f1 as the scoring metric. Because of these issues, the bagged decision tree was the only model used as a secondary classifier.

# Feature Extraction

PCA was the only feature extraction technique used and showed improvement in accuracy for the primary bagged decision tree classifier. A function to calculate the minimum number of orthogonal features that account for 95% of the variance of the standardized data was provided by Dr. Lopez de Prado. This function was utilized to calculate the number of principal components to be used, which was found to be 7. Code for this function can be found [here](https://github.com/JackBrady/Financial-Machine-Learning-Research/blob/master/Code/Feature_Extraction.py).
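The component count can be reproduced in spirit with a few lines of NumPy: standardize the features, take the eigenvalues of their correlation matrix, and count how many are needed to reach the variance threshold. This is a sketch of the idea rather than the orthoFeats function itself, and `n_components_for_variance` is a hypothetical name.

```python
import numpy as np


def n_components_for_variance(X, var_threshold=0.95):
    """Smallest number of principal components explaining `var_threshold` of the variance."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column
    corr = Z.T @ Z / len(Z)                    # correlation matrix
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # eigenvalues, largest first
    cum_ratio = np.cumsum(eigvals) / eigvals.sum()
    return int(np.argmax(cum_ratio >= var_threshold) + 1)


# Toy data: column 1 is a rescaled copy of column 0, column 2 is orthogonal to both.
a = np.array([1.0, -1.0, 1.0, -1.0])
b = np.array([1.0, 1.0, -1.0, -1.0])
X_redundant = np.column_stack([a, 2.0 * a, b])
n_comp = n_components_for_variance(X_redundant)
```

Three features collapse to two components here because one column carries no information beyond another, which is the kind of redundancy PCA is meant to remove.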

In [None]:
pc = orthoFeats(X_train)
num_feat = pc.shape[1]

In [None]:
scaler = StandardScaler()
new_training = scaler.fit_transform(X_train)

new_testing = scaler.transform(X_test)
new_testing_full = scaler.transform(X_test_full)

pca = PCA(n_components=num_feat)
X_train_pca = pca.fit_transform(new_training)
X_test_pca = pca.transform(new_testing)
X_test_full_pca = pca.transform(new_testing_full)

pca_addit = PCA(n_components=num_feat)
X_train_addit_pca = pca_addit.fit_transform(X_train_addit)

X_train_pca_df = pd.DataFrame(X_train_pca, index=X_train.index)
X_train_addit_pca_df = pd.DataFrame(X_train_addit_pca, index=X_train_addit.index)

X_train_full_pca = pd.concat([X_train_pca_df, X_train_addit_pca_df])
X_test_full_pca_df = pd.DataFrame(X_test_full_pca, index=X_test_full.index)

# Use the dedicated scaler for the full data set so the train-fitted `scaler` is not refit
scaler_full = StandardScaler()
full_scaled = scaler_full.fit_transform(X)
full = PCA(n_components=num_feat)
full_pca = full.fit_transform(full_scaled)

# Primary Model Training/Hyperparameter Tuning

In [None]:
dt = DecisionTreeClassifier(criterion='entropy', max_features='auto',
                            class_weight='balanced', min_weight_fraction_leaf=0.05, random_state=20)

bagged_dt = BaggingClassifier(base_estimator=dt, n_estimators=1000, max_samples=avgu,
                              max_features=1., random_state=20)


In [None]:
bagged_dt.fit(X_train_pca, y_train, sample_weight=training_weights)

The default parameters for C and gamma were used.

In [None]:
svc = SVC(probability=True, gamma='auto', random_state=20)

SVC_bagged = BaggingClassifier(base_estimator=svc, n_estimators=1000, max_samples=avgu,
                               random_state=20, max_features=1.)

In [None]:
SVC_bagged.fit(X_train_stand, y_train, sample_weight=training_weights)

In [None]:
def build_mlp(input_size, lr, classes):
    from numpy.random import seed
    seed(0)
    from tensorflow import set_random_seed
    set_random_seed(0)

    model = Sequential()
    model.add(Dense(10, input_dim=input_size, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(10))
    model.add(Dropout(0.2))
    model.add(Dense(classes, activation='sigmoid'))
    optimizer = optimizers.adam(lr)
    model.compile(loss='binary_crossentropy', optimizer=optimizer)
    return model


MLP = KerasClassifier(build_fn=build_mlp, input_size=X_train_stand.shape[1], classes=2)

# Hyperparameter Tuning (Grid Search)

As stated previously, observations from the training set that overlap the testing set must be purged. This is also necessary in k-fold cross validation, where we have potential overlap on both sides of the test set. The purged k-fold class, provided by Dr. Lopez de Prado ([link](https://github.com/JackBrady/Financial-Machine-Learning-Research/blob/master/Code/Cross_Validation.py)), is implemented in the hyperparameter tuning function below. This function makes use of sklearn's GridSearchCV. Instead of traditional k-fold cross validation, however, purged k-fold cross validation is used.
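The splitting logic of purged k-fold can be sketched as a small generator. This is an illustrative simplification, not the class linked above: `purged_kfold` is a hypothetical name, label start and end times are given as sorted integer arrays, and the embargo (an extra gap left after each test fold) is expressed as a fraction of the number of observations.

```python
import numpy as np


def purged_kfold(starts, ends, n_splits=3, pct_embargo=0.0):
    """Yield (train_idx, test_idx) with overlapping training labels purged."""
    starts = np.asarray(starts)
    ends = np.asarray(ends)
    positions = np.arange(len(starts))
    embargo = int(len(starts) * pct_embargo)
    for test_idx in np.array_split(positions, n_splits):
        test_start = starts[test_idx[0]]
        test_end = ends[test_idx].max()
        # Keep labels that finish before the test fold begins ...
        before = positions[ends < test_start]
        # ... and labels that start after it ends, skipping the embargo gap.
        first_after = np.searchsorted(starts, test_end, side="right") + embargo
        after = positions[first_after:]
        yield np.concatenate([before, after]), test_idx


# Nine labels, each spanning two bars (start i, end i + 1).
toy_starts = np.arange(9)
toy_ends = toy_starts + 1
folds = list(purged_kfold(toy_starts, toy_ends, n_splits=3))
folds_embargo = list(purged_kfold(toy_starts, toy_ends, n_splits=3, pct_embargo=0.12))
```

For the middle fold (observations 3 to 5), observations 2 and 6 are purged because their label intervals touch the test period, and with the embargo observation 7 is dropped as well.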


In [None]:
param_grid = {'lr': [0.00001, 0.0001, 0.001, 0.01, .1]}

t1cv = t1.loc[X_train_stand.index].copy()
# Use the names defined above (MLP) and referenced below (tuned_MLP)
tuned_MLP = clfHyperFitnn(X_train_stand, y_train, t1cv, MLP, param_grid, training_weights, cv=5)
tuned_MLP = tuned_MLP.best_estimator_

In [None]:
tuned_MLP.sk_params

The hyperparameter tuning function used previously to tune the learning rate is a modified version of Dr. Lopez de Prado's and can be found [here](https://github.com/JackBrady/Financial-Machine-Learning-Research/blob/master/Code/Hyperparameter_Tuning.py).

# Primary Model Results

The results from the test set for each primary classifier are aggregated in the DataFrame below. The code and output for the classification reports and CV score of each model are also available below.

In [None]:
primary_res_df

# Primary Classifier: Bagged Decision Trees

# Training Data

In [None]:
y_pred = bagged_dt.predict(X_train_pca)
print(classification_report(y_train, y_pred))

# Test Data

In [None]:
y_pred2 = bagged_dt.predict(X_test_pca)
print(classification_report(y_test, y_pred2))

# Mean CV Score on Full Data Set 
(Metric: Accuracy)

The cvScore function was provided by Dr. Lopez de Prado and also makes use of the purged k-fold class. Code for this function can be found [here](https://github.com/JackBrady/Financial-Machine-Learning-Research/blob/master/Code/Cross_Validation.py).

In [None]:
y = y.loc[X.index].copy()
sample_weights = get_weights_and_avgu(closing, X, threads, t1)[1]
vert_barr = t1.loc[X.index].copy()
full_pca_df = pd.DataFrame(full_pca, index=X.index)

scores = cvScore(bagged_dt, full_pca_df, y, sample_weights, scoring='accuracy',
                 t1=vert_barr, cv=10, pctEmbargo=0.01)

In [None]:
scores.mean()

# Primary Classifier: Bagged SVM

# Training Data

In [None]:
y_pred = SVC_bagged.predict(X_train_stand)
print(classification_report(y_train, y_pred))

# Test Data

In [None]:
y_pred = SVC_bagged.predict(X_test_stand)
print(classification_report(y_test, y_pred))

# Mean CV Score on Full Data Set 
(Metric: Accuracy)

In [None]:
full_stand_df = pd.DataFrame(full_stand, index=X.index)

scores = cvScore(SVC_bagged, full_stand_df, y, sample_weights, scoring='accuracy',
                 t1=vert_barr, cv=10, pctEmbargo=0.01)

In [None]:
scores.mean()

# Primary Classifier: MLP

# Training Data

In [None]:
y_pred = tuned_MLP.predict(X_train_stand)
print(classification_report(y_train, y_pred))

# Test Data

In [None]:
y_pred = tuned_MLP.predict(X_test_stand)
print(classification_report(y_test, y_pred))

# Mean CV Score on Full Data Set 
(Metric: Accuracy)

In [None]:
scores = cvScorenn(tuned_MLP, full_stand_df, y, sample_weights, 200, scoring='accuracy',
                   t1=vert_barr, cv=10, pctEmbargo=0.01)

In [None]:
scores.mean()

# Primary/Secondary Model Training Data

The code for preparing data for the secondary model is shown below. The get_events function used previously is used once more, this time taking in the side predicted by the primary model in order to generate the meta-labels. Additionally, the prediction from the primary model is used as a feature for the secondary model. The following code is for the MLP as the primary classifier, however, the process is the same regardless of which model gave the predictions.

In [None]:
y_pred_full = tuned_MLP.predict(X_train_stand_full)
primary_l = y_pred_full.copy()

In [None]:
y_pred_full_test = tuned_MLP.predict(X_test_stand_full)
primary_l_test = y_pred_full_test.copy()

In [None]:
side_train = pd.DataFrame(primary_l.copy(), index=X_train_stand_full.index)
side_test = pd.DataFrame(primary_l_test.copy(), index=X_test_stand_full.index)
side = pd.concat([side_train, side_test])
side.sort_index(inplace=True)

In [None]:
times = side.index
vertical_barriers = vertical_barriers.loc[side.index]
pt_sl = [1, 1]
min_ret = 0.004
threads = cpu_count() - 1

In [None]:
triple_barrier_events = mlf.labeling.get_events(closing,
                                                times,
                                                pt_sl,
                                                volatility,
                                                min_ret,
                                                threads,
                                                vertical_barriers,
                                                side[0])

In [None]:
labels = mlf.labeling.get_bins(triple_barrier_events, closing)

In [None]:
labels['bin'].value_counts()

In [None]:
t1 = triple_barrier_events['t1'].copy()

In [None]:
new_y_train = labels['bin'].loc[X_train_full_pca.index].copy()
new_x_train = tick_bars.loc[X_train_full_pca.index].copy()

new_y_test = labels['bin'].loc[X_test_full_pca.index].copy()
new_x_test = tick_bars.loc[X_test_full_pca.index].copy()

In [None]:
new_x_train.dropna(inplace=True)
new_x_train.drop(columns=['open', 'high', 'close', 'low', 'volume', 'bid', 'ask'], inplace=True)

new_x_test.dropna(inplace=True)
new_x_test.drop(columns=['open', 'high', 'close', 'low', 'volume', 'bid', 'ask'], inplace=True)

In [None]:
new_x_train['predicted_side'] = side[0].loc[new_x_train.index].copy()
new_x_test['predicted_side'] = side[0].loc[new_x_test.index].copy()

In [None]:
new_x_train.sort_index(inplace=True)
new_y_train.sort_index(inplace=True)
new_x_test.sort_index(inplace=True)
new_y_test.sort_index(inplace=True)

In [None]:
full_new_x = pd.concat([new_x_train, new_x_test]).copy()
full_new_y = pd.concat([new_y_train, new_y_test]).copy()

In [None]:
avgmu, training_weights_meta = get_weights_and_avgu(closing, new_x_train, threads, t1)

In [None]:
sample_weights_meta = get_weights_and_avgu(closing, full_new_x, threads, t1)[1]

# Primary/Secondary Model Training

In [None]:
dt_meta = DecisionTreeClassifier(criterion='entropy', max_features='auto',
                                 class_weight='balanced', min_weight_fraction_leaf=0.05, random_state=20)

bagged_dt_meta = BaggingClassifier(base_estimator=dt_meta, n_estimators=1000,
                                   max_samples=avgmu, max_features=1., random_state=20)

bagged_dt_meta.fit(new_x_train, new_y_train, sample_weight=training_weights_meta)

# Primary/Secondary Model Results

The aggregated results from the test set for the three model combinations can be seen in the following DataFrame. The code and output for the results are also available below.

In [None]:
secondary_res_df

# Performance and Risk Metrics

The metrics used to evaluate the success and risk of a model were: the Sharpe Ratio, cumulative returns, max drawdown, and percentage normalized profit (PNP). The function I wrote for PNP is inspired by the following paper: Baasheer and Fakhr [2011] [link](http://www.wseas.us/e-library/conferences/2011/Penang/ACRE/ACRE-05.pdf), and calculates the percentage of return we achieved out of the total return we could have achieved, had we predicted everything correctly in the test set. 

It should be stated that PNP is not a perfect metric for comparing our model's profit with the ideal profit. A bet that did not reach its target return should be classified as a 0. Suppose, however, that our secondary model misclassifies such a bet as a 1 while the primary model predicted the sign of the return correctly. This misclassification would earn some profit that the ideal model could not have earned. Regardless, PNP is still a useful metric for evaluating the profit of the secondary model given the primary model's predictions.

The code for the PNP function can be found [here](https://github.com/JackBrady/Financial-Machine-Learning-Research/blob/master/Code/Backtest.py). Note that the returns calculated for this function are non-cumulative.

Additionally, the function for calculating returns using the bid and ask prices can be found in the previous link.
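As a rough sketch of the PNP idea, the ratio can be computed from two per-bet return arrays. This is not the function linked above, whose signature and inputs differ. `pnp_sketch` is a hypothetical helper that assumes the realized and ideal returns have already been calculated and, in line with the note above, sums them without compounding.

```python
import numpy as np


def pnp_sketch(realized, ideal):
    """Realized profit as a percentage of the profit of a perfect classifier.

    realized: per-bet returns actually earned (0 where no bet was taken).
    ideal:    per-bet returns earned if every prediction had been correct.
    """
    realized = np.asarray(realized, dtype=float)
    ideal = np.asarray(ideal, dtype=float)
    return 100.0 * realized.sum() / ideal.sum()


# Four bets: one win, one loss, one skipped, one win.
toy_pnp = pnp_sketch(
    realized=[0.004, -0.004, 0.0, 0.004],
    ideal=[0.004, 0.004, 0.004, 0.004],
)
```

Here the strategy keeps a quarter of the profit that a perfect classifier would have made.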


The DataFrame for our performance and risk metrics can be seen below. Additionally, it must be noted that our returns do exhibit significant serial correlation. I partitioned the returns series into less correlated subsets, but did not notice any significant change in the Sharpe Ratio estimate when calculated on these subsets. It still must be noted, however, that the Sharpe Ratio will be inflated as a result of this serial correlation.

In [None]:
performance_risk_met_df
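For reference, cumulative return and max drawdown can be computed from a series of per-bet returns as sketched below. The notebook relies on empyrical for these metrics. The functions here are a plain NumPy illustration of the definitions, not empyrical's implementation, and they assume simple (non-log) returns.

```python
import numpy as np


def cumulative_return(returns):
    """Total compounded return over the whole series."""
    return float(np.prod(1.0 + np.asarray(returns, dtype=float)) - 1.0)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the wealth curve (a negative number)."""
    growth = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    wealth = np.concatenate([[1.0], growth])  # start from one unit of capital
    running_peak = np.maximum.accumulate(wealth)
    return float(np.min(wealth / running_peak - 1.0))


toy_returns = [0.10, -0.20, 0.05]
toy_cum = cumulative_return(toy_returns)
toy_mdd = max_drawdown(toy_returns)
```

The wealth curve in this example peaks at 1.10 and falls to 0.88, a drawdown of 20%, and ends 7.6% below the starting capital.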

# Primary Classifier: Bagged Decision Trees
# Secondary Classifier: Bagged Decision Trees


# Training Data

In [None]:
y_pred = bagged_dt_meta.predict(new_x_train)
print(classification_report(new_y_train, y_pred))

# Test Data


In [None]:
y_pred = bagged_dt_meta.predict(new_x_test)
print(classification_report(new_y_test, y_pred))
filt = y_pred.copy()

# Mean CV Score on Full Data Set 
(Metric: Accuracy)

In [None]:
vert_barr = t1.loc[full_new_x.index].copy()
scores = cvScore(bagged_dt_meta, full_new_x, full_new_y, sample_weights_meta, scoring='accuracy',
                 t1=vert_barr, cv=10, pctEmbargo=0.01)

In [None]:
scores.mean()

# Performance and Risk Metrics

In [None]:
percentage_normalized_profit(new_y_test, new_x_test, labels_one, side, t1, bid, ask, filt)

In [None]:
bag_bag_ret = returns_series(new_x_test, side, t1, bid, ask, filt)

In [None]:
np.around(empyrical.stats.cum_returns(bag_bag_ret).iloc[-1] * 100, 2)

In [None]:
np.around(empyrical.stats.sharpe_ratio(bag_bag_ret), 2)

In [None]:
np.around(empyrical.stats.max_drawdown(bag_bag_ret) * 100, 2)

# Primary Classifier: Bagged SVM
# Secondary Classifier: Bagged Decision Trees


# Training Data

In [None]:
y_pred = bagged_dt_meta.predict(new_x_train)
print(classification_report(new_y_train, y_pred))

# Test Data


In [None]:
y_pred = bagged_dt_meta.predict(new_x_test)
print(classification_report(new_y_test, y_pred))
filt = y_pred.copy()

# Mean CV Score on Full Data Set 
(Metric: Accuracy)

In [None]:
scores = cvScore(bagged_dt_meta, full_new_x, full_new_y, sample_weights_meta, scoring='accuracy',
                 t1=vert_barr, cv=10, pctEmbargo=0.01)

In [None]:
scores.mean()

# Performance and Risk Metrics

In [None]:
percentage_normalized_profit(new_y_test, new_x_test, labels_one, side, t1, bid, ask, filt)

In [None]:
SVM_bag_ret = returns_series(new_x_test, side, t1, bid, ask, filt)

In [None]:
np.around(empyrical.stats.cum_returns(SVM_bag_ret).iloc[-1] * 100, 2)

In [None]:
np.around(empyrical.stats.sharpe_ratio(SVM_bag_ret), 2)

In [None]:
np.around(empyrical.stats.max_drawdown(SVM_bag_ret) * 100, 2)

# Primary Classifier: MLP
# Secondary Classifier: Bagged Decision Trees

# Training Data

In [None]:
y_pred = bagged_dt_meta.predict(new_x_train)
print(classification_report(new_y_train, y_pred))

# Test Data


In [None]:
y_pred = bagged_dt_meta.predict(new_x_test)
print(classification_report(new_y_test, y_pred))
filt = y_pred.copy()

# Mean CV Score on Full Data Set 
(Metric: Accuracy)

In [None]:
scores = cvScore(bagged_dt_meta, full_new_x, full_new_y, sample_weights_meta, scoring='accuracy',
                 t1=vert_barr, cv=10, pctEmbargo=0.01)

In [None]:
scores.mean()

# Performance and Risk Metrics

In [None]:
percentage_normalized_profit(new_y_test, new_x_test, labels_one, side, t1, bid, ask, filt)

In [None]:
MLP_bag_ret = returns_series(new_x_test, side, t1, bid, ask, filt)

In [None]:
np.around(empyrical.stats.cum_returns(MLP_bag_ret).iloc[-1] * 100, 2)

In [None]:
np.around(empyrical.stats.sharpe_ratio(MLP_bag_ret), 2)

In [None]:
np.around(empyrical.stats.max_drawdown(MLP_bag_ret) * 100, 2)

# Backtest

The SVM/Bagged Decision Tree model achieved the best performance out of the 3 models on the test dataset and will be backtested on 2011-2013 USD/JPY tick data. The exact same features as well as procedures that were used throughout this research will be used during backtesting.

Backtesting will be conducted using the cross-validation method. We will use 10 folds, giving us 10 different outcomes to compare. Backtesting using cross-validation has advantages over the traditional walk-forward method as we are not just evaluating our strategy using one scenario, but instead k scenarios. The purged k-fold class mentioned previously will be used to prevent data leakage. The cumulative returns, Sharpe Ratio, and max drawdown will be calculated for each fold and displayed in a DataFrame below. The code for the backtesting function used below can be found [here](https://github.com/JackBrady/Financial-Machine-Learning-Research/blob/master/Code/Backtest.py). Please note this function is not very elegant as it essentially runs all the previous code at each iteration/fold.

In [2]:
from _pipeline_scripts_examples.fml.Backtest import backtest_cv

backtest_df = backtest_cv('Datas/USDJPY_Tick_Data/Research_Data.csv')

In [None]:
backtest_df