# Introduction to Testing

In [1]:
from fastcore.all import *
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from polygon import RESTClient

from datetime import datetime, timedelta
import time

In [2]:
path = Path('../data')

In chapter 1 we built models and generated the actions we want to take under several approaches.  The question now is: how do we know if they are profitable?  How should we measure them?  How do we know whether we simply got lucky or whether they are reliable?

As we mentioned in chapter 2, testing is the most important part of the process.  Done well, it gives you a sound way to decide which strategies should be implemented.  Done poorly, it leaves you at risk of implementing unprofitable strategies.

This chapter will lay the groundwork and cover the basics of testing.  Additional information about testing will be covered throughout the book and will assume knowledge of the content in this chapter.

This chapter will cover many of the rules of testing.  People often like to point out exceptions for specific scenarios, and almost nothing is 100% true in 100% of scenarios.  But you need to understand the ideal testing setup, and you need to understand what you sacrifice when you choose to, need to, or are asked to deviate from it.

## The Data

The first question we have to ask is what data do we use for testing?  Ideally we have 3 subsets of our data (training, validation, and test).  Let's go through what they are used for and why they are important.
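As a rough illustration of what three subsets can look like for time-ordered data, here is a minimal sketch. The prices are synthetic and the cutoff dates are hypothetical, chosen only to show the mechanics of a chronological split.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices purely for illustration (not the book's data).
dates = pd.date_range("2015-01-01", "2020-12-31", freq="D")
prices = pd.DataFrame({"XYZ": np.linspace(100, 200, len(dates))}, index=dates)

# Hypothetical cutoffs: 2015-2018 trains, 2019 validates, 2020 tests.
train_demo = prices.loc[:"2018-12-31"]
valid_demo = prices.loc["2019-01-01":"2019-12-31"]
test_demo = prices.loc["2020-01-01":]

# The subsets are ordered in time and share no dates.
assert train_demo.index.max() < valid_demo.index.min()
assert valid_demo.index.max() < test_demo.index.min()
print(len(train_demo), len(valid_demo), len(test_demo))
```

Label-based slicing with `.loc` includes both endpoints, so the cutoffs are chosen so that no date can land in two subsets.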

### Training Set

The training set is unique because it has no restrictions on what we can do with it.  We can look at any piece of data in it.  We can normalize data using values in the training set.  We can train machine learning models on the training set.  This is often the largest subset of our data.

The training set is fairly self-explanatory: we use it for understanding our data and developing our model.

We can load it in using the same method as we did in chapter 1.

In [3]:
raw = pd.read_csv(path/'eod-quotemedia.csv',parse_dates=['date'])
df = raw.pivot(index='date', columns='ticker',values='adj_close')
train = df.loc[:pd.Timestamp('2017-1-1')]

### Validation Set

The goal of creating a trading strategy is to have it perform well on data that was not used to develop it.  We may use data from 2015 - 2020 to create a trading strategy, but the goal is to apply it to 2021 and 2022 to make a profit.

Because we want our model to perform on *unseen* data, we place restrictions on how we use the validation set.  We do not train any models on it, and we do not use statistics or data from the validation set when creating our model.  It's data our model has never seen.  The validation set is something we can only use to see how well our strategy or model performs.

The entire purpose of the validation set is to give us unseen data to evaluate our approaches on.  By having this separate validation set we can more accurately determine what works and what doesn't.
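To make the "no statistics from the validation set" rule concrete, here is a minimal sketch of normalizing prices. The data is synthetic and the 200/100 row split is an arbitrary assumption. The point is only that the mean and standard deviation come from the training rows and are then reused on the validation rows.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices, for illustration only.
rng = np.random.default_rng(0)
dates = pd.date_range("2016-01-01", periods=300, freq="D")
prices = pd.DataFrame({"XYZ": 100 + rng.normal(0, 1, 300).cumsum()}, index=dates)

# Hypothetical split: first 200 rows train, last 100 rows validate.
train_demo, valid_demo = prices.iloc[:200], prices.iloc[200:]

# Statistics are computed on the training rows only...
mu, sigma = train_demo.mean(), train_demo.std()

# ...and then reused, unchanged, on the validation rows.
train_norm = (train_demo - mu) / sigma
valid_norm = (valid_demo - mu) / sigma
```

Recomputing `mu` and `sigma` on the validation rows would leak information about the unseen period into the model's inputs.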

We can get our validation set using the same method as we did in chapter 1.

In [4]:
valid = df.loc[pd.Timestamp('2017-1-1'):]

### Test Set


The test set is very similar to the validation set, but it takes things a step further.  It carries a further restriction because it is the final check before deployment.  The main difference is how often you can use it.  You can evaluate anything on the validation set as many times as you want, but you only get to look at the test set once for your particular approach.

For example, you may try 300 different approaches and parameter changes to your strategy to see what works best.  You can check the profitability on each of them using the validation set.  Then once you have chosen a strategy, you do a final check to ensure it also performs on the test set.  Once you have done that you need a new test set or your project is over.

The reason this is important is that you want to ensure that you didn't get lucky and find a configuration out of your 300 attempts that just happens to work on the validation set but doesn't work elsewhere.  If you try enough combinations, eventually you will find something that works.  The test set gives you confidence that your model works because it's a good strategy, and not because you tried enough things to find something that works by coincidence.
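A small simulation can illustrate the risk. This is a sketch under made-up assumptions: 300 "strategies" whose daily returns are pure noise with a true mean of zero, 250 days each of validation and test data, and total return approximated by summing daily returns.

```python
import numpy as np

rng = np.random.default_rng(42)
n_strategies, n_days = 300, 250

# Daily returns with a true mean of zero: none of these strategies has an edge.
valid_returns = rng.normal(0.0, 0.01, size=(n_strategies, n_days))
test_returns = rng.normal(0.0, 0.01, size=(n_strategies, n_days))

# Pick the strategy that looks best on the validation period.
valid_totals = valid_returns.sum(axis=1)
best = int(valid_totals.argmax())

print(f"best validation total return: {valid_totals[best]:.3f}")
print(f"same strategy on test data:   {test_returns[best].sum():.3f}")
```

The best of 300 noise strategies will typically look impressive on the validation data simply because it was selected for that. Its test-period result is an independent draw and carries no such selection advantage.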


:::{note} Many people re-use or have more lax rules on the test set.  Many people do not use one at all.  In this text I am laying out the ideal state I believe we should strive for.  If you choose to loosen these restrictions on the test set or do without one, I would strongly encourage you to think hard about it.


To get our test set, we could have split our initial data into 3.  Because we are a bit concerned about survivorship bias, let's instead pull a new test set that uses recent data and test how these strategies would have performed over the last year and a half.

We need to get the adjusted close price.  A variety of services have APIs to pull from; I have picked polygon here because it's free for what we need.

:::{note} We are using a free API key and putting the key in this notebook in an effort to show everything to the reader, but best practice would be to read the API key from an environment variable.  According to the polygon docs, the client pulls the API key from the `POLYGON_API_KEY` environment variable by default, so you would initiate the client as follows and no credentials are exposed.
`client = RESTClient()`

In [5]:
polygon_free_api_key = 'wUv2tpS05klv9ebAQKyLD610FBWllpan'
client = RESTClient(polygon_free_api_key)
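If you prefer to fail loudly when the environment variable is missing, one option is a small helper like the sketch below. `load_api_key` is a hypothetical name introduced here for illustration and is not part of the polygon client.

```python
import os

def load_api_key(env=os.environ, name="POLYGON_API_KEY"):
    """Return the API key stored under `name`, or raise a clear error if it is missing."""
    key = env.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook")
    return key

# Usage (not executed here, requires the variable to be set):
# client = RESTClient(load_api_key())
```

Passing the mapping in as `env` keeps the helper easy to check with a plain dictionary instead of the real environment.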

In [6]:
if not (path/'polytest_eod-quotemedia.csv').exists():
    dfs = L()
    errors = L()
    for ticker in valid:
        try:
            aggs = client.get_aggs(ticker, 1, "day", "2021-01-01", "2022-05-31",adjusted=True)
            close = {ticker:[o.close for o in aggs]}
            
            # Convert millisecond time stamp to date
            date = L(o.timestamp/1e3 for o in aggs).map(datetime.fromtimestamp)
            dfs.append(pd.DataFrame(close,index=date))
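        except Exception as e:
            # Record the ticker and the error; `aggs` may be undefined or stale if the request failed
            errors.append((ticker, e))
            print(f"FAILURE: {ticker}")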
        
        # Free api gives 5 API calls / minute - so we need to pace our api calls!
        time.sleep(60/5)
    df_test = pd.concat(dfs,axis=1)
    df_test.to_csv(path/'polytest_eod-quotemedia.csv')

df_test = pd.read_csv(path/'polytest_eod-quotemedia.csv',index_col=0,parse_dates=True)
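One caveat about the timestamp conversion in the loop above: `datetime.fromtimestamp` interprets the value in the local time zone of the machine running the notebook, so the resulting dates can differ between machines. A possible alternative, sketched below, is to parse the milliseconds as UTC and convert explicitly. The timestamps are made up for illustration, and the choice of US Eastern time as the exchange time zone is an assumption.

```python
import pandas as pd

# Hypothetical millisecond timestamps, made up for illustration.
ms = [1609736400000, 1609822800000]

# unit="ms" parses epoch milliseconds; utc=True makes the time zone explicit.
stamps = pd.to_datetime(ms, unit="ms", utc=True)

# Convert to US Eastern time (assumed exchange time zone), keep only the calendar date,
# then drop the time zone so the index is a plain date index.
dates = stamps.tz_convert("America/New_York").normalize().tz_localize(None)
print(dates)
```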


In [7]:
df_test.iloc[:5,:5]

| | A | AAL | AAP | AAPL | ABBV |
|---|---|---|---|---|---|
| 2021-01-04 | 118.64 | 15.13 | 157.34 | 129.41 | 105.41 |
| 2021-01-05 | 119.61 | 15.43 | 157.17 | 131.01 | 106.5 |
| 2021-01-06 | 122.89 | 15.52 | 166.25 | 126.6 | 105.58 |
| 2021-01-07 | 126.16 | 15.38 | 167.67 | 130.92 | 106.71 |
| 2021-01-08 | 127.06 | 15.13 | 170.06 | 132.05 | 107.27 |


## Returns

Now that we understand what data we will use for testing, let's start using it to calculate how well our models from chapter 1 perform.  We will walk through the process for one model, then at the end put it all together to compare the different approaches and parameters.

### Dollars

Let's take our first model from chapter 1 and measure how well it does in terms of dollars.  After all, dollars are what we want to make, so it seems like a reasonable starting point.

In [8]:
def get_momentum_actions(df, n_periods,threshold):
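    # Prices n_periods rows earlier, aligned to the current row
    prev = df.shift(n_periods)
    # Fractional price change over the lookback window; the first n_periods rows are NaN and dropped
    momentum_rate = ((df - prev) / prev)[n_periods:]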
    actions = pd.DataFrame(np.where(momentum_rate < -threshold, 'Short',
                           np.where(momentum_rate > threshold,  'Buy',
                                                                 '')),
                   columns=momentum_rate.columns,index=momentum_rate.index)
    # Act the day after the signal.  This adds one calendar day, so a Friday signal is dated Saturday.
    actions.index = actions.index + timedelta(1)
    
    return actions
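To see what the momentum rate and threshold in `get_momentum_actions` do, here is a minimal sketch on a tiny made-up price series. It also checks the manual formula against pandas' built-in `pct_change`, which computes the same quantity. The prices, lookback of 2, and threshold of 0.08 are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Made-up prices for a hypothetical ticker.
s = pd.Series([100.0, 102.0, 110.0, 99.0, 90.0],
              index=pd.date_range("2021-01-04", periods=5, freq="D"))
n_periods, threshold = 2, 0.08

# (price now - price n periods ago) / price n periods ago
manual = ((s - s.shift(n_periods)) / s.shift(n_periods)).iloc[n_periods:]

# pandas computes the same quantity directly
builtin = s.pct_change(n_periods, fill_method=None).iloc[n_periods:]

# Large drops become 'Short', large rises become 'Buy', everything else is left blank.
labels = np.select([manual < -threshold, manual > threshold], ["Short", "Buy"], default="")
print(pd.DataFrame({"momentum": manual, "action": labels}))
```

Here a 10% rise over two rows is labelled `Buy`, a roughly 3% fall is left blank, and a roughly 18% fall is labelled `Short`.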

In [9]:
transactions = pd.DataFrame(columns=['open_date','ticker','action'])

valid_mom = get_momentum_actions(valid,28,0.08)
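# Only the first 5 dates and 5 tickers are used, to keep the example small
for dte,vals in valid_mom.iloc[:5,:5].iterrows():
    # Keep only tickers with a 'Buy' or 'Short' signal on this date
    for val in vals[vals.values != ''].items():
        row = pd.DataFrame([[dte.date()]+[o for o in val]],columns=['open_date','ticker','action'])
        transactions = pd.concat([transactions,row])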
        

In [10]:
transactions

| | open_date | ticker | action |
|---|---|---|---|
| 0 | 2017-02-14 | A | Buy |
| 0 | 2017-02-14 | AAPL | Buy |
| 0 | 2017-02-15 | AAPL | Buy |
| 0 | 2017-02-16 | A | Buy |
| 0 | 2017-02-16 | AAPL | Buy |
| 0 | 2017-02-17 | AAPL | Buy |
| 0 | 2017-02-18 | AAPL | Buy |
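To express a transaction like those above in dollars, we need an entry price, an exit price, and a position size. The sketch below is one simple way to do that. `dollar_pnl` is a hypothetical helper, the prices and share count are made up, and it ignores commissions, borrowing costs, and any rule for when a position is closed.

```python
def dollar_pnl(action, entry_price, exit_price, shares):
    """Dollar profit or loss for a single closed position.

    'Buy' profits when the price rises; 'Short' profits when it falls.
    """
    direction = {"Buy": 1, "Short": -1}[action]
    return direction * (exit_price - entry_price) * shares

print(dollar_pnl("Buy", 100.0, 110.0, 10))    # 100.0
print(dollar_pnl("Short", 100.0, 110.0, 10))  # -100.0
```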


### Percent Return

### Log Return

### Model Comparisons

## Statistical Tests