# Meta-Labeling for bet side and size

Implementation of the triple-barrier method for determining the side, and meta-labeling for determining the size, of the bet. Meta-labeling is a technique introduced by Marcos López de Prado in *Advances in Financial Machine Learning*.

## Imports

In [22]:
%load_ext autoreload
%autoreload 2

# standard imports
from pathlib import PurePath, Path 
import sys
import time
from collections import OrderedDict as od 
import re 
import os 
import json 

# scientific stack
import pandas as pd 
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
import math
import mlfinlab as ml

# visual tools and plotting
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


ModuleNotFoundError: No module named 'mlfinlab'

## Import Dataset

In [16]:
# read in and store raw tick data in pandas dataframe to be cleaned and transformed
raw_tick_data = pd.read_csv('tick_data.csv')
raw_tick_data['date_time'] = pd.to_datetime(raw_tick_data.date_time)
raw_tick_data.set_index('date_time', drop=True, inplace=True)
print(raw_tick_data)

                            open     high      low    close  cum_vol  \
date_time                                                              
2011-07-31 23:31:58.810  1306.00  1308.75  1301.75  1305.75    53658   
2011-08-01 02:55:17.443  1305.75  1309.50  1304.00  1306.50    53552   
2011-08-01 07:25:56.319  1306.75  1309.75  1304.75  1305.00    53543   
2011-08-01 08:33:10.903  1305.00  1305.00  1299.00  1300.00    53830   
2011-08-01 10:51:41.842  1300.00  1307.75  1299.00  1307.75    53734   
...                          ...      ...      ...      ...      ...   
2012-07-30 12:30:28.642  1379.25  1380.00  1377.50  1377.75    50843   
2012-07-30 13:29:21.258  1377.75  1380.00  1377.00  1379.25    50782   
2012-07-30 13:35:05.407  1379.25  1383.25  1379.00  1382.50    50675   
2012-07-30 13:43:43.711  1382.50  1383.25  1380.00  1381.00    50667   
2012-07-30 13:54:26.158  1380.75  1381.75  1379.75  1380.75    50698   

                          cum_dollar  cum_ticks  
date_time    

## Create Dollar Bars

Here we transform the raw OHLC data into financial data structures that better represent the flow of trade information through the market. These structures include dollar, tick, and volume bars.
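As a rough illustration of the idea, the sketch below builds dollar bars from tick-level trades: a bar closes as soon as the dollar value traded since the previous bar reaches a threshold. The function name `dollar_bars_sketch`, the input columns `price` and `volume`, and the threshold are assumptions made for this example; the output columns mirror the ones in the dataset printed above.

```python
import pandas as pd


def dollar_bars_sketch(ticks: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Aggregate ticks into OHLC bars of roughly equal traded dollar value.

    Assumes `ticks` has a DatetimeIndex and `price` / `volume` columns
    (hypothetical names chosen for this sketch).
    """
    columns = ["date_time", "open", "high", "low", "close",
               "cum_vol", "cum_dollar", "cum_ticks"]
    prices = ticks["price"].to_numpy()
    volumes = ticks["volume"].to_numpy()

    bars = []
    running_dollar = 0.0  # dollar value accumulated in the current bar
    start = 0             # position of the first tick in the current bar
    for i in range(len(ticks)):
        running_dollar += prices[i] * volumes[i]
        if running_dollar >= threshold:
            window = prices[start:i + 1]
            bars.append({
                "date_time": ticks.index[i],  # bar is stamped at its last tick
                "open": window[0],
                "high": window.max(),
                "low": window.min(),
                "close": window[-1],
                "cum_vol": volumes[start:i + 1].sum(),
                "cum_dollar": running_dollar,
                "cum_ticks": i + 1 - start,
            })
            running_dollar = 0.0  # reset the accumulator for the next bar
            start = i + 1
    # Ticks after the last completed bar are left out (incomplete bar).
    return pd.DataFrame(bars, columns=columns).set_index("date_time")
```

Tick and volume bars follow the same pattern, with the accumulator counting ticks or traded volume instead of dollar value.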

## Fitting the Primary Model

This is the primary model that drives our strategy: a moving-average crossover (computed here with simple rolling means rather than exponential ones).

In [17]:
fast_window = 12
slow_window = 26

raw_tick_data['fast_mavg'] = raw_tick_data['close'].rolling(window=fast_window, min_periods=fast_window, center=False).mean()
raw_tick_data['slow_mavg'] = raw_tick_data['close'].rolling(window=slow_window, min_periods=slow_window, center=False).mean()

### Compute side

In [18]:
raw_tick_data['side'] = np.nan
long_signals = raw_tick_data['fast_mavg'] >= raw_tick_data['slow_mavg']
short_signals = raw_tick_data['fast_mavg'] < raw_tick_data['slow_mavg']
raw_tick_data.loc[long_signals, 'side'] = 1
raw_tick_data.loc[short_signals, 'side'] = -1
data = raw_tick_data.dropna()
print(long_signals)

date_time
2011-07-31 23:31:58.810    False
2011-08-01 02:55:17.443    False
2011-08-01 07:25:56.319    False
2011-08-01 08:33:10.903    False
2011-08-01 10:51:41.842    False
                           ...  
2012-07-30 12:30:28.642     True
2012-07-30 13:29:21.258    False
2012-07-30 13:35:05.407    False
2012-07-30 13:43:43.711    False
2012-07-30 13:54:26.158    False
Length: 10000, dtype: bool


In [26]:
# set daily volatility
daily_vol = ml.util.get_daily_vol(close=data['close'], lookback=50)
print(daily_vol)

NameError: name 'ml' is not defined
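Because the `mlfinlab` import failed in this session, a pandas-only approximation of the volatility estimate may be useful. The sketch below computes one-day returns (each bar against the last bar at least one day earlier) and takes their exponentially weighted standard deviation, in the spirit of the estimator described in *Advances in Financial Machine Learning*. The name `daily_vol_sketch` is hypothetical, and the result is not guaranteed to match `ml.util.get_daily_vol` exactly.

```python
import pandas as pd


def daily_vol_sketch(close: pd.Series, lookback: int = 50) -> pd.Series:
    """Exponentially weighted std of one-day returns (illustrative only).

    Assumes `close` has a sorted DatetimeIndex.
    """
    idx = close.index
    values = close.to_numpy()
    # For each timestamp t: position of the first bar at or after t - 1 day.
    pos = idx.searchsorted(idx - pd.Timedelta(days=1))
    # Keep only timestamps that have a bar strictly before t - 1 day.
    has_prev = pos > 0
    prev_close = values[pos[has_prev] - 1]
    returns = pd.Series(values[has_prev] / prev_close - 1.0, index=idx[has_prev])
    return returns.ewm(span=lookback).std()
```

With this in place, `daily_vol = daily_vol_sketch(data['close'], lookback=50)` could stand in for the failing call while the library is unavailable.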

In [25]:
# convert from daily vol to hourly vol (since our data is hourly)
trading_hours_in_day = 8
trading_days_in_year = 252
# square-root-of-time scaling: a daily figure shrinks by sqrt(hours per trading day)
hourly_vol = daily_vol / math.sqrt(trading_hours_in_day)
hourly_vol_mean = hourly_vol.mean()
print(hourly_vol)

NameError: name 'daily_vol' is not defined

### Apply CUSUM Filter

The CUSUM filter flags timestamps where the cumulative drift of the series since the last reset exceeds a threshold. These timestamps are later used as `t_events` for the triple-barrier method.
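To make the mechanics concrete, here is a minimal sketch of a symmetric CUSUM filter. It keeps one running sum for upward moves and one for downward moves, emits an event when either sum breaches the threshold, and resets that sum. The name `cusum_events_sketch` is hypothetical, and applying the filter to log returns (so that the threshold is expressed in return units) is an assumption of this sketch rather than a statement about `ml.filters.cusum_filter`.

```python
import numpy as np
import pandas as pd


def cusum_events_sketch(close: pd.Series, threshold: float) -> pd.DatetimeIndex:
    """Symmetric CUSUM filter on log returns (illustrative only)."""
    log_ret = np.log(close).diff().dropna()
    events = []
    s_pos, s_neg = 0.0, 0.0  # running sums of upward / downward drift
    for t, r in log_ret.items():
        s_pos = max(0.0, s_pos + r)
        s_neg = min(0.0, s_neg + r)
        if s_neg < -threshold:
            s_neg = 0.0       # reset after a downward event
            events.append(t)
        elif s_pos > threshold:
            s_pos = 0.0       # reset after an upward event
            events.append(t)
    return pd.DatetimeIndex(events)
```

A smaller threshold yields more events; the cell below uses half of the mean hourly volatility as its threshold.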

In [27]:
# apply symmetric CUSUM filter and get timestamps for events
cusum_events = ml.filters.cusum_filter(data['close'], threshold=hourly_vol_mean * 0.5)
print(cusum_events)

NameError: name 'ml' is not defined

In [28]:
# compute vertical barrier
vertical_barriers = ml.labeling.add_vertical_barrier(t_events=cusum_events, close=data['close'], num_days=1)
print(vertical_barriers)

NameError: name 'ml' is not defined
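The vertical barrier caps how long a bet may stay open: for each event it is the timestamp of the first bar at or after the event time plus a fixed horizon. The following pandas-only sketch shows that lookup under the assumption of a sorted `DatetimeIndex`; `vertical_barriers_sketch` is a hypothetical name and is not claimed to reproduce `ml.labeling.add_vertical_barrier` exactly.

```python
import pandas as pd


def vertical_barriers_sketch(t_events: pd.DatetimeIndex,
                             close: pd.Series,
                             num_days: int = 1) -> pd.Series:
    """Map each event to the first bar at or after event + num_days.

    Events whose horizon falls beyond the last bar are dropped.
    """
    target = t_events + pd.Timedelta(days=num_days)
    # Position of the first bar whose timestamp is >= the target time.
    pos = close.index.searchsorted(target)
    in_range = pos < len(close)
    return pd.Series(close.index[pos[in_range]], index=t_events[in_range])
```

The returned series is indexed by event time, with the barrier timestamp as the value, which is the shape the triple-barrier step expects for its time limit.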