<a href="https://colab.research.google.com/github/dgunning/financial-ml/blob/master/Chapter_3_Labeling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Labeling
Supervised learning algorithms require that the rows in X are associated with an array of labels y, so that those labels can be predicted on unseen feature samples.

## The Fixed-Time Horizon Method
Virtually all ML papers label observations using the fixed-time horizon method.

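Consider a features matrix **X** with **l** rows. Each row is labeled -1, 0, or 1 by comparing the price return over a fixed number of subsequent bars against a constant threshold.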
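As a minimal sketch of the fixed-time horizon rule, the block below labels each bar by the return over the next `h` bars against a constant threshold `tau`. The function name `fixed_horizon_labels`, the parameter names, and the toy prices are illustrative assumptions, and the sketch assumes each feature row is aligned with the bar at which the return starts.

```python
import pandas as pd

def fixed_horizon_labels(close: pd.Series, h: int, tau: float) -> pd.Series:
    """Label each bar -1, 0, or 1 from the return over the next h bars (hypothetical helper)."""
    # Forward return over h bars: close[t + h] / close[t] - 1
    fwd_ret = close.shift(-h) / close - 1
    labels = pd.Series(0, index=close.index, dtype="int64")
    labels[fwd_ret > tau] = 1    # return above the constant threshold
    labels[fwd_ret < -tau] = -1  # return below the negative threshold
    # The last h bars have no forward return, so they are dropped
    return labels[fwd_ret.notna()]

# Toy prices, made up for illustration
close = pd.Series([100.0, 101.0, 103.0, 102.0, 99.0, 100.0])
labels = fixed_horizon_labels(close, h=2, tau=0.02)
print(labels.tolist())  # [1, 0, -1, 0]
```

Note that `tau` is the same for every bar, which is exactly the property criticized in the list that follows.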

Disadvantages:
1. Time bars do not exhibit good statistical properties.
2. The same threshold is applied regardless of the observed volatility.

Alternatives:
1. Label using a varying threshold, estimated from a rolling exponentially weighted standard deviation of returns.
2. Use dollar or volume bars, as their volatility is much closer to constant.
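The first alternative can be sketched as below: the threshold at each bar is a multiple of an exponentially weighted standard deviation of past returns, so the label adapts to the prevailing volatility. The function name `dynamic_threshold_labels`, the multiplier `k`, the `span` default, and the simulated prices are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def dynamic_threshold_labels(close: pd.Series, h: int, span: int = 50, k: float = 1.0) -> pd.Series:
    """Fixed-horizon labels with a volatility-scaled threshold (hypothetical helper)."""
    # One-bar returns and their exponentially weighted standard deviation.
    # The estimate at bar t only uses returns up to and including bar t.
    sigma = close.pct_change().ewm(span=span).std()
    # Forward return over h bars
    fwd_ret = close.shift(-h) / close - 1
    # Threshold varies bar by bar; k is assumed to absorb any horizon scaling
    thresh = k * sigma
    labels = pd.Series(0, index=close.index, dtype="int64")
    labels[fwd_ret > thresh] = 1
    labels[fwd_ret < -thresh] = -1
    # Keep only bars where both the forward return and the threshold are defined
    return labels[fwd_ret.notna() & thresh.notna()]

# Simulated prices from a seeded random walk, for illustration only
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 200))))
labels = dynamic_threshold_labels(close, h=5, span=20, k=2.0)
print(labels.value_counts())
```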

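But even these two alternatives miss a key flaw of the fixed-time horizon method: it ignores the path followed by prices. Every investment strategy has stop-loss limits, whether they are self-imposed by the portfolio manager, enforced by the risk department, or triggered by a margin call. A label based only on the return at the end of the horizon can therefore mark as profitable a position that would have been stopped out along the way.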

## Daily Volatility Estimates

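```python
import pandas as pd

def getDailyVol(close, span0=100):
  # Daily vol, reindexed to close (expects a sorted DatetimeIndex)
  # Position where (timestamp - 1 day) would be inserted into the index
  df0 = close.index.searchsorted(close.index-pd.Timedelta(days=1))
  df0 = df0[df0>0]  # drop timestamps with no bar more than one day earlier
  # Map each remaining timestamp to the last bar strictly before (timestamp - 1 day)
  df0 = pd.Series(close.index[df0-1], index=close.index[close.shape[0]-df0.shape[0]:])
  df0 = close.loc[df0.index]/close.loc[df0.values].values-1 # daily returns
  df0 = df0.ewm(span=span0).std()  # exponentially weighted std of those returns
  return df0
```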

We can use the output of this function to set default profit-taking and stop-loss limits throughout the rest of this chapter.

## The Triple Barrier Method
The triple barrier method labels an observation according to the first of three barriers touched. First, we set two horizontal barriers and one vertical barrier. The two horizontal barriers are defined by profit-taking and stop-loss limits, which are a dynamic function of estimated volatility. The third barrier is defined in terms of the number of bars elapsed since the position was taken. If the upper barrier is touched first, we label the observation as 1. If the lower barrier is touched first, we label the observation as -1. If the vertical barrier is touched first, we have two choices: the sign of the return, or 0.
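To make the rule concrete, here is a minimal single-observation sketch. It walks the price path bar by bar and returns the label of the first barrier touched. The function name `triple_barrier_label`, its parameters (`pt`, `sl`, `max_bars`, `sign_on_timeout`), and the toy price paths are hypothetical. The sketch also assumes the volatility estimate `vol` is supplied by the caller and that barriers are checked on closing prices only. It is not a vectorized or production implementation.

```python
import pandas as pd

def triple_barrier_label(close: pd.Series, start: int, vol: float,
                         pt: float = 1.0, sl: float = 1.0,
                         max_bars: int = 10, sign_on_timeout: bool = False) -> int:
    """Label the position opened at integer position `start` (hypothetical helper)."""
    entry = close.iloc[start]
    upper = pt * vol    # profit-taking barrier, expressed as a return
    lower = -sl * vol   # stop-loss barrier, expressed as a return
    # Vertical barrier: max_bars after entry, truncated if the data ends earlier
    end = min(start + max_bars, len(close) - 1)
    ret = 0.0
    for i in range(start + 1, end + 1):
        ret = close.iloc[i] / entry - 1
        if ret >= upper:
            return 1    # upper barrier touched first
        if ret <= lower:
            return -1   # lower barrier touched first
    # Vertical barrier touched first: either the sign of the return, or 0
    if sign_on_timeout:
        return int(ret > 0) - int(ret < 0)
    return 0

# Toy price paths, made up for illustration, with a 2% volatility estimate
up_path = pd.Series([100.0, 101.0, 103.0, 99.0])
down_path = pd.Series([100.0, 99.5, 97.0, 104.0])
flat_path = pd.Series([100.0, 100.5, 101.0, 100.8])

print(triple_barrier_label(up_path, 0, vol=0.02, max_bars=3))    # 1
print(triple_barrier_label(down_path, 0, vol=0.02, max_bars=3))  # -1
print(triple_barrier_label(flat_path, 0, vol=0.02, max_bars=3))  # 0
print(triple_barrier_label(flat_path, 0, vol=0.02, max_bars=3, sign_on_timeout=True))  # 1
```

In the second path the price later rallies to 104, but the stop-loss was hit first, so the label is -1. This is the path dependence that the fixed-time horizon method ignores.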