## Labeling: Fixed Horizon

Originally described by Marcos López de Prado in the book Advances in Financial Machine Learning, Chapter 3, Section 3.2, pp. 43-44.

The work [__"Classification-based Financial Markets Prediction using Deep Neural Networks"__](https://arxiv.org/abs/1603.08604) by _Dixon et al._ (2016) describes how
data labeled this way can be used to train deep neural networks to predict price movements.

## Introduction

Fixed horizon is a labeling technique in which each observation of time-indexed data is labeled according to whether its forward return over a fixed horizon exceeds a threshold, falls between the negative and positive threshold, or is below the negative threshold. The method is most commonly used with time bars, but it can also be applied to any time-indexed data such as dollar or volume bars. The labeled data can then be used as training and test data for ML algorithms.

## How it works

Fixed time horizon is a common method for labeling financial data, usually applied to time bars. The forward rate of return relative
to $t_0$ over time horizon $h$ is calculated as follows:


   $$r_{t0,t1} = \frac{p_{t1}}{p_{t0}} - 1$$

Here $t_1 = t_0 + h$ is the index of the bar reached after a fixed horizon of $h$ bars has passed, and $p_{t0}, p_{t1}$
are the prices at times $t_0, t_1$. The method assigns a label by comparing the rate of return with a threshold $\tau$:

$$
L_{t0, t1} = \begin{cases}
-1 & \text{if } r_{t0, t1} < -\tau\\
0 & \text{if } -\tau \leq r_{t0, t1} \leq \tau\\
1 & \text{if } r_{t0, t1} > \tau
\end{cases}
$$

Though time bars are the most common format for financial data, relying on them too heavily can cause problems. Time bars exhibit
strong intraday seasonality, since trading behavior at the open or close can be quite different from midday; applying the same
threshold to such a non-uniform distribution is therefore not very informative. Solutions include applying the fixed horizon method
to tick or volume bars instead of time bars, using data sampled at the same time every day (e.g. closing prices), or passing a dynamic
threshold as a `pd.Series` indexed by the times in the dataset. If desired, a `pd.DataFrame`-like object can be passed to scale the
data by a known mean and standard deviation for each time index.
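A dynamic threshold replaces the constant $\tau$ with one value per timestamp. The sketch below assumes a hypothetical helper `labels_with_dynamic_threshold` that accepts either a float or a `pd.Series` aligned with the price index; the hourly prices and the wider threshold on the opening bar are made-up numbers chosen only to show how a per-bar threshold can absorb a larger opening move.

```python
import numpy as np
import pandas as pd


def labels_with_dynamic_threshold(prices: pd.Series, tau, h: int = 1) -> pd.Series:
    """Hypothetical helper: tau is a float or a pd.Series with one threshold per bar."""
    fwd_ret = prices.shift(-h) / prices - 1
    if isinstance(tau, pd.Series):
        # Align thresholds to the price index so comparisons are element-wise by timestamp
        tau = tau.reindex(prices.index)
    conditions = [fwd_ret < -tau, (fwd_ret >= -tau) & (fwd_ret <= tau), fwd_ret > tau]
    # Rows where every comparison is False (NaN return or NaN threshold) get NaN
    labels = np.select(conditions, [-1.0, 0.0, 1.0], default=np.nan)
    return pd.Series(labels, index=prices.index)


idx = pd.date_range("2020-01-02 09:30", periods=5, freq="60min")
prices = pd.Series([100.0, 101.0, 101.5, 101.4, 101.0], index=idx)
# Wider band for the opening bar, narrower for the rest of the day
tau = pd.Series([0.02, 0.002, 0.002, 0.002, 0.002], index=idx)

print(labels_with_dynamic_threshold(prices, tau).tolist())
# [0.0, 1.0, 0.0, -1.0, nan]
print(labels_with_dynamic_threshold(prices, 0.002).tolist())
# [1.0, 1.0, 0.0, -1.0, nan]
```

With the constant threshold, the 1% opening move is labeled 1; with the wider opening band, it is treated as ordinary noise and labeled 0.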

### Examples of use

In [1]:
import numpy as np
import pandas as pd
import yfinance as yf

from mlfinlab.labeling import fixed_time_horizon

In [2]:
# Load price data
msft = yf.Ticker('MSFT')
msft_df = msft.history(start='2010-1-1', end='2020-5-18')

close = msft_df['Close']
close.head()

Date
2010-01-04    24.29
2010-01-05    24.30
2010-01-06    24.15
2010-01-07    23.90
2010-01-08    24.07
Name: Close, dtype: float64

In [3]:
# Getting labels for a constant threshold of 1%. Will return 1 if the daily return is greater than 1%, -1 if less than -1%, and 
# 0 if in between.
fixed_time_horizon(close, 0.01, look_forward=1)

array([ 0.,  0., -1., ...,  0.,  1., nan])

A major problem with the fixed time horizon method is the __seasonality__ of the data: time bars in the middle of the trading day can look very different from those at the open or close. One way around this is to use only data collected at the same time each day, such as close prices. Another is to apply a dynamic threshold that adjusts for such differences. Even though the data here are daily closes only, we will apply a dynamic threshold using the return of SPY as the market return, and label each period based on a comparison of the returns of MSFT and SPY.



In [4]:
spy = yf.Ticker('SPY')
spy_df = spy.history(start='2010-1-1', end='2020-5-18')
market_close = spy_df['Close']

# One-period forward return of SPY, aligned to the start of each period
market_returns = market_close.pct_change(1).shift(-1)
market_returns.head()

In [5]:
fixed_time_horizon(close, market_returns, look_forward=1)

Standardizing each return by a rolling mean and standard deviation is another way around the non-homogeneity of the data, as it can adjust for differences in volatility between bars. If standardization is desired, a rolling window must be given.
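One way to standardize is sketched below, under the assumption that each forward return is z-scored by the rolling mean and sample standard deviation of the last `window` forward returns before being compared with ±`tau`. The helper `standardized_labels` is hypothetical and mlfinlab's implementation may differ in details; in this sketch the first `window - 1` labels are NaN because the rolling statistics are not yet defined.

```python
import numpy as np
import pandas as pd


def standardized_labels(prices: pd.Series, tau: float, window: int, h: int = 1) -> pd.Series:
    """Hypothetical helper: label rolling z-scores of forward returns against +/- tau."""
    # Forward return over h bars; the last h values are NaN
    fwd_ret = prices.shift(-h) / prices - 1
    # Rolling mean and sample std over the last `window` forward returns
    # (NaN until `window` valid values are available)
    mean = fwd_ret.rolling(window).mean()
    std = fwd_ret.rolling(window).std()
    z = (fwd_ret - mean) / std
    # NaN z-scores fail every comparison and fall through to the NaN default
    labels = np.select([z < -tau, (z >= -tau) & (z <= tau), z > tau], [-1.0, 0.0, 1.0], default=np.nan)
    return pd.Series(labels, index=prices.index)


prices = pd.Series([100.0, 101.0, 100.0, 102.0, 101.0, 105.0])
print(standardized_labels(prices, tau=0.5, window=3).tolist())
# [nan, nan, 1.0, -1.0, 1.0, nan]
```

Because `tau` now applies to z-scores rather than raw returns, it is expressed in units of rolling standard deviations.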

In [6]:
fixed_time_horizon(close, 0.005, look_forward=1, standardized=True, window=4)

array([nan, nan, nan, ...,  1.,  1., nan])

Warnings are raised if `look_forward` or `window` is greater than the length of the data. In this case the function still runs, but all labels will be NaN. An `AssertionError` is raised if `standardized` is set to True but `window` is not given as an int.
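The input checks described above could look roughly like the following. `check_fixed_horizon_inputs` is a hypothetical helper written to mirror the described behavior and the assertion message in the last example; it is not mlfinlab's source. For the horizon check it warns when `look_forward` is at least the number of observations, since from that point no bar has a forward price.

```python
import warnings


def check_fixed_horizon_inputs(n_obs: int, look_forward: int, standardized: bool = False, window=None) -> None:
    """Hypothetical validation: assert on a bad window, warn when every label would be NaN."""
    if standardized:
        # Mirrors the error text shown in the example output
        assert isinstance(window, int), "when standardized is True, window must be int"
    # With look_forward >= n_obs, no bar has a price look_forward steps ahead
    if look_forward >= n_obs:
        warnings.warn("look_forward is not smaller than the data length; all labels will be NaN")
    # A rolling window longer than the data never fills, so every z-score is NaN
    if standardized and window > n_obs:
        warnings.warn("window is greater than the data length; all labels will be NaN")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_fixed_horizon_inputs(100, look_forward=99999999)
    check_fixed_horizon_inputs(100, look_forward=1, standardized=True, window=99999999)
print(len(caught))  # 2
```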

In [7]:
fixed_time_horizon(close, 0.01, look_forward=99999999)



array([nan, nan, nan, ..., nan, nan, nan])

In [8]:
fixed_time_horizon(close, 0.01, look_forward=1, standardized=True, window=99999999)



array([nan, nan, nan, ..., nan, nan, nan])

In [9]:
fixed_time_horizon(close, 0.01, look_forward=1, standardized=True)

AssertionError: when standardized is True, window must be int

## Conclusion

This notebook presents the fixed horizon method, a simple method of labeling data for later use in machine learning algorithms. In this process:
 - Forward rates of return for assets are calculated from price data, usually sampled as time bars
 - The forward return is compared to a threshold. The threshold can be a constant, or a dynamic series with a value for every timestamp in the data
 - Each forward return is labeled -1 if it is below -threshold, 0 if it is between -threshold and +threshold, and 1 if it is above +threshold

In Dixon et al.'s paper, the data are labeled using this method, and a neural network is then trained to predict the label from the data. A simple trading strategy that takes a long position if a ticker is classified as 1 and a short position if it is classified as -1 can then be applied.
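A minimal sketch of that long/short rule is shown below. It assumes the labels are a classifier's out-of-sample predictions (true fixed horizon labels are built from future prices and cannot be traded on directly) and ignores transaction costs and position sizing; `label_strategy_returns` is a hypothetical helper and the numbers are made up.

```python
import numpy as np
import pandas as pd


def label_strategy_returns(predicted_labels: pd.Series, fwd_returns: pd.Series) -> pd.Series:
    """Hypothetical helper: per-period returns of a long/short rule driven by labels."""
    # +1 -> long, -1 -> short, 0 -> flat; missing predictions are treated as flat
    positions = predicted_labels.fillna(0.0)
    # Each position earns the forward return over the horizon it was predicted for
    # (no transaction costs or position sizing)
    return positions * fwd_returns.fillna(0.0)


predicted = pd.Series([1.0, -1.0, 0.0, np.nan])
fwd = pd.Series([0.02, -0.01, 0.005, np.nan])
print(label_strategy_returns(predicted, fwd).tolist())
# [0.02, 0.01, 0.0, 0.0]
```

The short position on the second period profits from the -1% move, while the flat position on the third period forgoes the 0.5% gain.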

## References

1. Dixon, M., Klabjan, D. and Bang, J.H., 2020. Classification-based Financial Markets Prediction using Deep Neural Networks. arXiv, [online] Available at: <https://arxiv.org/abs/1603.08604>

2. López de Prado, M., 2018. Advances in Financial Machine Learning. pp.43-44.

3. López de Prado, M., 2020. Machine Learning for Asset Managers. pp.65-66.
