# Introduction to Testing

In [67]:
from fastcore.all import *
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from dateutil.relativedelta import relativedelta
from polygon import RESTClient

In [2]:
path = Path('../data')

In Chapter 1 we built models and defined the actions we want to take under several approaches.  The question now is: how do we know whether they are profitable?  How should we measure them?  How do we know whether they are reliable or we simply got lucky?

As we mentioned in chapter 2, testing is the most important part of the process.  Done well, it gives you a sound way to decide which strategies to implement.  Done poorly, you risk implementing unprofitable ones.

This chapter will lay the groundwork and cover the basics of testing.  Additional information about testing will be covered throughout the book and will assume knowledge of the content in this chapter.

This chapter will cover many of the rules of testing.  People often like to point out exceptions for specific scenarios, and almost nothing is 100% true in 100% of scenarios.  But you need to understand the ideal testing setup, and you need to understand what you sacrifice when you choose to, need to, or are asked to deviate from it.

## The Data

The first question we have to ask is: what data do we use for testing?  Ideally we have three subsets of our data (training, validation, and test).  Let's go through what they are used for and why they are important.

### Training Set

The training set is unique because it has no restrictions on what we can do with it.  We can look at any piece of data in it.  We can normalize data using values in the training set.  We can train machine learning models on the training set.  This is often the largest subset of our data.

The training set is fairly self-explanatory: we use it to understand our data and develop our model.

We can load it in using the same method as we did in chapter 1.

In [11]:
raw = pd.read_csv(path/'eod-quotemedia.csv',parse_dates=['date'])
df = raw.pivot(index='date', columns='ticker',values='adj_close')
train = df.loc[:pd.Timestamp('2017-1-1')]

### Validation Set

The goal of creating a trading strategy is to have it perform well on data that was not used to develop it.  We may use data from 2015 - 2020 to create a trading strategy, but the goal is to apply it to 2021 and 2022 to make a profit.

Because we want our model to perform on *unseen* data, we place some restrictions on how we use the validation set.  We do not train any models on it, and we do not use statistics or data from the validation set when creating our model.  It's data our model has never seen.  The validation set is something we can only use to see how well our strategy or model performs.
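To make the rule about statistics concrete, here is a minimal sketch of normalizing with training-set values only.  The prices are synthetic and the names `train_demo`, `valid_demo`, `AAA`, and `BBB` are made up for illustration; they are not the chapter's data.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices standing in for real data (illustrative only)
rng = np.random.default_rng(0)
dates = pd.date_range('2016-01-01', periods=500, freq='B')
prices = pd.DataFrame(
    100 + rng.normal(0, 1, size=(500, 2)).cumsum(axis=0),
    index=dates, columns=['AAA', 'BBB'],
)

# Split by date into a training part and a validation part
split = pd.Timestamp('2017-01-01')
train_demo = prices.loc[prices.index < split]
valid_demo = prices.loc[prices.index >= split]

# Statistics are computed on the training set only...
mu, sigma = train_demo.mean(), train_demo.std()

# ...and then applied unchanged to both sets
train_norm = (train_demo - mu) / sigma
valid_norm = (valid_demo - mu) / sigma
```

The normalized validation data will generally not have mean 0 and standard deviation 1, and that is expected: the model never got to see how the validation period was distributed.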

The entire purpose of the validation set is to give us unseen data to evaluate our approaches on.  By having this separate validation set we can more accurately determine what works and what doesn't.

We can get our validation set using the same method as we did in chapter 1.

In [14]:
valid = df.loc[pd.Timestamp('2017-1-1'):]
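One detail worth knowing about this kind of split: label slicing with `.loc` includes both endpoints.  January 1 is a market holiday, so the split above probably has no row on the boundary, but with a different split date the same row could land in both sets.  The toy frame below is made up purely to show the behavior and one way to keep the sets disjoint.

```python
import pandas as pd

# Toy frame whose index contains the split date itself (illustrative only)
idx = pd.date_range('2016-12-28', periods=6, freq='D')
toy = pd.DataFrame({'price': range(6)}, index=idx)
split = pd.Timestamp('2017-01-01')

# Label slicing with .loc includes both endpoints,
# so the split date appears in both slices
train_slice = toy.loc[:split]
valid_slice = toy.loc[split:]
overlap = train_slice.index.intersection(valid_slice.index)

# A boolean mask keeps the two sets disjoint
train_mask = toy.loc[toy.index < split]
valid_mask = toy.loc[toy.index >= split]
no_overlap = train_mask.index.intersection(valid_mask.index)
```

Here `overlap` contains the split date, while `no_overlap` is empty.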

### Test Set


The test set is very similar to the validation set, but it takes things a step further.  It carries a further restriction because it is the final step before deployment.  The main difference is how often you can use it.  You can test anything on the validation set as many times as you want.  You only get to look at the test set once for your particular approach.

For example, you may try 300 different approaches and parameter changes to your strategy to see what works best.  You can check the profitability on each of them using the validation set.  Then once you have chosen a strategy, you do a final check to ensure it also performs on the test set.  Once you have done that you need a new test set or your project is over.

The reason this is important is that you want to ensure you didn't get lucky and find a configuration out of your 300 attempts that just happens to work on the validation set but doesn't work elsewhere.  If you try enough combinations, eventually you will find something that works.  The test set gives you confidence that your model works because it's a good strategy, and not because you tried enough things to find something that works by coincidence.
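A small simulation can illustrate the point.  This is a sketch under strong assumptions, not a model of real markets: each of the 300 "strategies" is pure noise with a true average return of zero, and the function name `best_of_n` and its parameters are made up for this example.

```python
import numpy as np

def best_of_n(rng, n_strategies=300, n_days=250, daily_vol=0.01):
    """Pick the best of n zero-edge strategies on validation, then score it on test."""
    # Every strategy's daily returns are pure noise with a true mean of zero
    valid_returns = rng.normal(0.0, daily_vol, size=(n_strategies, n_days))
    test_returns = rng.normal(0.0, daily_vol, size=(n_strategies, n_days))

    # Select the strategy with the highest average validation return
    best = valid_returns.mean(axis=1).argmax()
    return valid_returns[best].mean(), test_returns[best].mean()

# Repeat the whole selection process 100 times and average the outcomes
rng = np.random.default_rng(42)
results = np.array([best_of_n(rng) for _ in range(100)])
avg_valid, avg_test = results.mean(axis=0)

print(f"average validation return of the selected strategy: {avg_valid:.5f}")
print(f"average test return of the same strategy:           {avg_test:.5f}")
```

The selected strategy looks profitable on the validation data only because it was the luckiest of 300.  On the test data, which played no part in the selection, its average return falls back toward zero.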


:::{note}
Many people re-use or have more lax rules on the test set.  Many people do not use one at all.  In this text I am laying out the ideal state I believe we should strive for.  If you choose to loosen these restrictions on the test set or do without one, I would strongly encourage you to think hard about it.
:::


To get our test set, we could have split our initial data into three.  Because we are a bit concerned about survivorship bias, let's instead pull a new test set of recent data and test how these strategies would have performed over the last year and a half.

We need the adjusted close price.  A variety of services have APIs to pull from; I have picked Polygon here because it's free for what we need.

In [151]:
polygon_free_api_key = 'wUv2tpS05klv9ebAQKyLD610FBWllpan'
client = RESTClient(polygon_free_api_key)

In [133]:
from datetime import datetime
import time
dfs = L()
for ticker in valid:
    aggs = client.get_aggs(ticker, 1, "day", "2021-01-01", "2022-05-31",adjusted=True)
    close = {ticker:[o.close for o in aggs]}
    
    # Convert millisecond time stamp to date
    date = L(o.timestamp/1e3 for o in aggs).map(datetime.fromtimestamp)
    dfs.append(pd.DataFrame(close,index=date))
    
    # Free api gives 5 API calls / minute - so we need to pace out api calls!
    time.sleep(60/5)
test = pd.concat(dfs,axis=1)
test.to_csv(path/'polytest_eod-quotemedia.csv')
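One caveat about the timestamp conversion in the loop above: `datetime.fromtimestamp` converts to the local time zone of the machine running the code, so the resulting time of day (and possibly the calendar date) can depend on where it runs.  A hedged alternative is `pd.to_datetime` with `unit='ms'`, which interprets the values as UTC.  The timestamps below are made-up values for illustration, and dropping the time of day assumes we only care about the calendar date of each daily bar.

```python
import pandas as pd

# Hypothetical millisecond timestamps shaped like the ones used above (illustrative values)
timestamps_ms = [1609736400000, 1609822800000]

# Interpret the values as milliseconds since the Unix epoch (UTC),
# then drop the time of day so only the calendar date remains
dates = pd.to_datetime(timestamps_ms, unit='ms').normalize()
```

The result is a `DatetimeIndex` that can be passed directly as the `index` of a DataFrame.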

In [140]:
path.ls()

(#1) [Path('../data/eod-quotemedia.csv')]

| date | A | AAL | AAP | AAPL | ABBV | ABC | ABT | ACN | ADBE | ADI |
|---|---|---|---|---|---|---|---|---|---|---|
| 2021-01-04 | 118.64 | 15.13 | 157.34 | 129.41 | 105.41 | 96.50 | 109.11 | 256.46 | 485.34 | 146.02 |
| 2021-01-05 | 119.61 | 15.43 | 157.17 | 131.01 | 106.50 | 97.76 | 110.46 | 257.92 | 485.69 | 148.63 |
| 2021-01-06 | 122.89 | 15.52 | 166.25 | 126.60 | 105.58 | 106.17 | 110.23 | 260.74 | 466.31 | 149.30 |
| 2021-01-07 | 126.16 | 15.38 | 167.67 | 130.92 | 106.71 | 110.13 | 111.30 | 263.20 | 477.74 | 155.61 |
| 2021-01-08 | 127.06 | 15.13 | 170.06 | 132.05 | 107.27 | 110.03 | 111.61 | 264.16 | 485.10 | 156.74 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2022-05-24 | 124.41 | 15.50 | 180.23 | 140.36 | 149.11 | 153.08 | 113.77 | 279.31 | 398.41 | 161.85 |
| 2022-05-25 | 120.38 | 16.13 | 185.31 | 140.52 | 151.96 | 154.27 | 113.19 | 279.64 | 402.50 | 162.32 |
| 2022-05-26 | 123.85 | 17.24 | 190.90 | 143.78 | 150.57 | 155.80 | 114.87 | 291.55 | 408.60 | 164.00 |
| 2022-05-27 | 130.55 | 18.13 | 193.05 | 149.64 | 150.00 | 156.86 | 116.69 | 304.15 | 428.22 | 167.55 |


In [88]:
pd.Series([o.close for o in aggs])

0      129.41
1      131.01
2      126.60
3      130.92
4      132.05
        ...  
350    140.36
351    140.52
352    143.78
353    149.64
354    148.84
Length: 355, dtype: float64

In [64]:
test = 'https://api.polygon.io/v2/aggs/ticker/AAPL/range/1/day/2020-06-01/2020-06-17?apiKey=wUv2tpS05klv9ebAQKyLD610FBWllpan'

In [65]:
pip install polygon-api-client



We will use Yahoo Finance to pull the dataset.

In [53]:
import yfinance as yf
tickers = " ".join(valid.columns)
data = yf.download(tickers, start=str(test_start_date), end=str(test_end_date))

[*********************100%***********************]  495 of 495 completed

45 Failed downloads:
- GGP: No data found for this date range, symbol may be delisted
- HCP: Data doesn't exist for startDate = 1498881600, endDate = 1625112000
- ETFC: No data found, symbol may be delisted
- MYL: No data found, symbol may be delisted
- CBG: No data found for this date range, symbol may be delisted
- SPLS: No data found for this date range, symbol may be delisted
- KORS: No data found for this date range, symbol may be delisted
- VAR: No data found, symbol may be delisted
- UTX: No data found, symbol may be delisted
- APC: No data found, symbol may be delisted
- COG: No data found, symbol may be delisted
- LB: No data found, symbol may be delisted
- ALXN: No data found, symbol may be delisted
- CBS: No data found, symbol may be delisted
- HRS: No data found, symbol may be delisted
- LUK: No data found for this date range, symbol may be delisted
- PBCT: No data found, symbol may be delisted
- DPS:

In [61]:
data['Adj Close']

| Date | A | AAL | AAP | AAPL | ABBV | ABC | ABT | ACN | ADBE | ADI | ... | XL | XLNX | XOM | XRAY | XRX | XYL | YUM | ZBH | ZION | ZTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017-07-03 | 56.979954 | 48.906921 | 114.348335 | 33.961643 | 57.618118 | 87.006866 | 44.808376 | 115.109688 | 138.410004 | 69.029694 | ... |  |  | 63.052326 | 62.259335 | 24.006258 | 52.238720 | 67.236877 | 120.194000 | 39.363590 | 60.915688 |
| 2017-07-05 | 57.596680 | 49.751316 | 101.601128 | 34.101288 | 57.665840 | 87.484619 | 45.075150 | 115.221153 | 141.210007 | 70.508049 | ... |  |  | 62.092323 | 62.692833 | 23.673414 | 51.758255 | 66.989395 | 119.622910 | 39.372398 | 60.799034 |
| 2017-07-06 | 57.066696 | 50.518063 | 99.621445 | 33.779408 | 57.069134 | 84.994774 | 44.100052 | 114.190147 | 140.750000 | 70.136192 | ... |  |  | 61.531723 | 61.931808 | 23.590206 | 51.230682 | 66.595222 | 117.619347 | 38.826775 | 60.060318 |
| 2017-07-07 | 57.461777 | 51.469215 | 98.433647 | 34.122570 | 57.307835 | 86.281052 | 44.560001 | 115.369759 | 142.220001 | 71.170128 | ... |  |  | 61.608524 | 62.230438 | 23.948008 | 51.767685 | 67.365211 | 119.004997 | 39.108376 | 60.837917 |
| 2017-07-10 | 57.683411 | 51.110107 | 97.699692 | 34.330845 | 57.148712 | 84.976418 | 44.366825 | 115.230438 | 143.339996 | 71.405968 | ... |  |  | 61.562431 | 61.883644 | 24.031219 | 51.447369 | 67.411034 | 119.276497 | 39.301983 | 60.721275 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-06-25 | 146.480530 | 22.219999 | 201.123581 | 132.353851 | 108.341377 | 114.764740 | 111.024910 | 291.335785 | 579.659973 | 164.742859 | ... | 8.71 |  | 61.531834 | 63.775944 | 23.252203 | 115.544868 | 115.243767 | 160.152023 | 54.121216 | 186.144943 |
| 2021-06-28 | 147.127014 | 21.389999 | 201.202286 | 134.014359 | 108.360558 | 113.186401 | 113.960106 | 290.584320 | 588.799988 | 167.602234 | ... | 8.97 |  | 59.961655 | 62.821991 | 22.774351 | 117.868614 | 114.123283 | 158.075882 | 52.140938 | 185.995804 |
| 2021-06-29 | 148.002228 | 21.080000 | 201.979523 | 135.555542 | 107.689293 | 112.890472 | 115.447395 | 293.085754 | 590.750000 | 168.417770 | ... | 8.49 |  | 59.590523 | 62.851799 | 22.542553 | 117.908180 | 113.582687 | 155.642471 | 51.409302 | 186.900558 |
| 2021-06-30 | 147.007645 | 21.209999 | 201.822098 | 136.181976 | 108.015327 | 112.939789 | 114.186646 | 291.464325 | 585.640015 | 169.164566 | ... | 8.33 |  | 60.028271 | 62.861732 | 22.687429 | 118.620132 | 113.061760 | 155.294830 | 51.565388 | 185.279999 |


## Returns

Now that we understand what data we will use for testing, let's start using it to calculate how well our models from chapter 1 perform.  We will walk through the process for one model.  Then at the end we will put it all together to compare the different approaches and their parameters.

### Dollars

Let's start with the simplest 

### Percent Return

### Log Return

### Model Comparisons

## Statistical Tests