# Backtest 2019

## What is the backtest?

The backtest analyzes data from a past period to determine the ideal values of seven metrics (discussed later) for each sector. This version of the backtest uses the Jupyter notebook format, both to document each part and to improve efficiency.

### Library Imports

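- pandas - used to load and manipulate tabular data in DataFrames
- datetime - used to work with dates and times (today's date, date ordinals, runtime formatting)
- time - used to measure runtime
- numpy - numerical arrays and math functions, such as the percentiles used for range generation
- itertools.product - creates the Cartesian product of several iterables
- IPython.display - renders DataFrames and other objects in the notebook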

In [6]:
import time
from datetime import date, datetime, timedelta
from itertools import product

import numpy as np
import pandas as pd
from pandas import read_csv, to_datetime  # explicit instead of a wildcard import
from IPython.display import display

### Efficiency and timing

In the past, the backtests have taken dozens of hours to complete. That is costly in both time and electricity, which matters for a sustainability-oriented organization. The cell below records the start time in `st` so that the total runtime can be reported at the end.

We also define the sectors and metrics as lists.

In [7]:
st = time.time()

# Sectors to backtest (each listed once; 'Materials' was previously duplicated)
sectors = ['Consumer Discretionary', 'Consumer Staples', 'Energy', 'Financials', 'Health Care',
           'Industrials', 'Information Technology', 'Materials', 'Real Estate',
           'Telecommunication Services', 'Utilities']
# The seven metrics whose ideal values the backtest searches for
metrics = ['PE_RATIO', 'PX_TO_BOOK_RATIO', 'TRAIL_12M_EPS', 'TOT_DEBT_TO_TOT_EQY',
           'PRICE_TO_FCF', 'RETURN_COM_EQY', 'RETURN_ON_ASSET']

pd.options.mode.chained_assignment = None  # silence SettingWithCopyWarning (does not change assignment behavior)
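Setting `chained_assignment` to `None` only silences pandas' `SettingWithCopyWarning`; it does not change how assignments behave. A safer habit is to assign through `.loc` in a single step, so the write targets the original DataFrame rather than a possibly temporary copy. The toy DataFrame below is made up purely for illustration.

```python
import pandas as pd

# Toy data (illustrative only): two companies with a P/E ratio each
df = pd.DataFrame({'PE_RATIO': [12.0, 35.0], 'flag': [False, False]},
                  index=['AAA', 'BBB'])

# Chained form, which can trigger SettingWithCopyWarning and may not write back:
#   df[df['PE_RATIO'] > 20]['flag'] = True
# Single-step .loc form: selects rows and column together and assigns in place
df.loc[df['PE_RATIO'] > 20, 'flag'] = True

print(df['flag'].tolist())  # [False, True]
```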

### Risk-free rates

In [9]:
# Risk-free rates, converted from percent to decimal
TBill3Mth = 2.34 / 100  # 3-month Treasury bill rate: 2.34%
Libor3Mth = 2.62 / 100  # 3-month LIBOR: 2.62%

# Step 1: Data Input and Setup, Organic Range Generation

The backtest is built on frames. A frame is one combination of candidate values, one value per metric, and every frame holds a different combination. To decide which candidate values to try, we first generate a range for each metric in each sector.

We also import the data file used for range generation: the last 10 years of data for each company. The data is loaded into a pandas **DataFrame**, a table that can be thought of as an Excel sheet manipulated through Python. The DataFrame is then sorted by date.
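As a minimal sketch of what `read_csv(..., index_col='DATE', parse_dates=True)` followed by `sort_index()` produces, here is a tiny in-memory CSV. The `TICKER` column and all values are hypothetical; only `DATE` and `PE_RATIO` come from the notebook.

```python
import io
import pandas as pd

# Hypothetical three-row CSV, deliberately out of chronological order
csv_text = """DATE,TICKER,PE_RATIO
2014-06-30,AAA,15.2
2014-03-31,BBB,22.8
2014-03-31,AAA,14.9
"""

df = pd.read_csv(io.StringIO(csv_text), index_col='DATE', parse_dates=True)
df = df.sort_index()  # chronological order by the DATE index

print(type(df.index).__name__)                  # DatetimeIndex
print(df.index[0].date(), df.index[-1].date())  # 2014-03-31 2014-06-30
```

Because the index is a `DatetimeIndex`, later cells can compare and slice rows by date directly.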

### Data Import

In [17]:
#sdf = pd.read_excel("data/spyx2007-2017 dat.xlsx", index_col=[0,1])
# ^ The line above reads the .xlsx file that uses QYears instead of dates.
# I'm unsure of the status of QYear; I don't think Bob ever fully implemented it.
sdf = pd.read_csv("Backtest VALUES.csv", index_col='DATE', parse_dates=True)

sdf = sdf.sort_index()

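# Drop every row that has a missing value in ANY column (not only rows without a date).
# To drop only undated rows instead, use: sdf = sdf[sdf.index.notna()]
sdf.dropna(axis='index', how='any', inplace=True)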

# First date to keep, as a proleptic Gregorian ordinal (0001-01-01 is day 1)
startDate = pd.to_datetime('2014-03-28').toordinal()

# Keep only rows dated on or after startDate, comparing ordinal day numbers
sdf['ordinalTime'] = sdf.index
sdf['ordinalTime'] = sdf['ordinalTime'].apply(date.toordinal)
sdf = sdf.loc[sdf['ordinalTime'] >= startDate]
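Because `parse_dates=True` gives `sdf` a `DatetimeIndex`, the ordinal round-trip above can likely be replaced by comparing the index against a `Timestamp` directly, which also avoids adding the helper `ordinalTime` column. A sketch on made-up dates, assuming the index is a `DatetimeIndex`:

```python
import pandas as pd

# Illustrative quarterly index; values are arbitrary
idx = pd.to_datetime(['2013-12-31', '2014-03-28', '2014-06-30'])
df = pd.DataFrame({'PE_RATIO': [10.0, 11.0, 12.0]}, index=idx)

start = pd.Timestamp('2014-03-28')
filtered = df.loc[df.index >= start]  # boolean mask built from the DatetimeIndex

print(len(filtered))  # 2
```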

### getRanges function

The `getRanges` function returns a list of seven ranges, one for each metric, for the given sector. Each range is a Python `range` defined by a low value, a high value (excluded), and a step (the increment between values). The code is repetitive, but it is straightforward.

Each range runs from the 25th to the 75th percentile of the metric within the sector, i.e. the interquartile range, which covers the middle 50% of observed values for any distribution. Companies in the top 25% for a metric may be outliers or overvalued on that measure, and companies in the bottom 25% may be underperforming.

The step values were chosen to produce a reasonable number of frames for typical data. Remember, runtime grows **_exponentially_**: the number of frames is the product of the seven range lengths, so halving any one step roughly doubles the total.*

_\*Proposed change: generate the step sizes programmatically instead of hard-coding them.\*_
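To see why the step sizes matter, assume (as the Step 2 heading "Frame Generation and Product" suggests) that frames are the Cartesian product of the seven ranges, built with `itertools.product`. The frame count is then the product of the range lengths. The ranges below are invented for illustration, not real sector output.

```python
from itertools import product
from math import prod

# Hypothetical per-metric ranges for one sector: range(low, high_exclusive, step)
ranges = [
    range(12, 20, 2),   # PE: 12, 14, 16, 18   -> 4 values
    range(1, 3),        # P/B: 1, 2            -> 2 values
    range(2, 6, 2),     # EPS: 2, 4            -> 2 values
    range(40, 80, 10),  # D/E: 40, 50, 60, 70  -> 4 values
    range(10, 25, 5),   # P/FCF: 10, 15, 20    -> 3 values
    range(8, 14, 2),    # ROE: 8, 10, 12       -> 3 values
    range(2, 6, 2),     # ROA: 2, 4            -> 2 values
]

n_frames = prod(len(r) for r in ranges)
frames = list(product(*ranges))  # each frame is a 7-tuple of candidate values

print(n_frames, len(frames))  # 1152 1152
print(frames[0])              # (12, 1, 2, 40, 10, 8, 2)
```

Halving the PE step from 2 to 1 would give `range(12, 20)` with 8 values instead of 4, doubling the frame count to 2304.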

In [16]:
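def getRanges(sector):
    # Returns [PE, P/B, EPS, D/E, P/FCF, ROE, ROA] ranges for one sector, each running from the
    # rounded 25th to the rounded 75th percentile (upper bound excluded) with a fixed step.
    # NOTE: (slice(None), sector) indexing assumes sdf has a (date, sector) MultiIndex, as the
    # commented-out read_excel(..., index_col=[0,1]) call produced; the CSV import indexes by DATE only.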
    peRange =  range(int(round(np.percentile(sdf.loc[(slice(None),sector), 'PE_RATIO'].dropna(), 25))), \
                     int(round(np.percentile(sdf.loc[(slice(None),sector), 'PE_RATIO'].dropna(), 75))), 2)
    pbRange =  range(int(round(np.percentile(sdf.loc[(slice(None),sector), 'PX_TO_BOOK_RATIO'].dropna(), 25))), \
                     int(round(np.percentile(sdf.loc[(slice(None),sector), 'PX_TO_BOOK_RATIO'].dropna(), 75))))
    epsRange =  range(int(round(np.percentile(sdf.loc[(slice(None),sector), 'TRAIL_12M_EPS'].dropna(), 25))), \
                      int(round(np.percentile(sdf.loc[(slice(None),sector), 'TRAIL_12M_EPS'].dropna(), 75))), 2)
    deRange =  range(int(round(np.percentile(sdf.loc[(slice(None),sector), 'TOT_DEBT_TO_TOT_EQY'].dropna(), 25))), \
                     int(round(np.percentile(sdf.loc[(slice(None),sector), 'TOT_DEBT_TO_TOT_EQY'].dropna(), 75))), 10)
    fcfRange =  range(int(round(np.percentile(sdf.loc[(slice(None),sector), 'PRICE_TO_FCF'].dropna(), 25))), \
                     int(round(np.percentile(sdf.loc[(slice(None),sector), 'PRICE_TO_FCF'].dropna(), 75))), 5)
    roeRange =  range(int(round(np.percentile(sdf.loc[(slice(None),sector), 'RETURN_COM_EQY'].dropna(), 25))), \
                     int(round(np.percentile(sdf.loc[(slice(None),sector), 'RETURN_COM_EQY'].dropna(), 75))), 2)
    roaRange =  range(int(round(np.percentile(sdf.loc[(slice(None),sector), 'RETURN_ON_ASSET'].dropna(), 25))), \
                     int(round(np.percentile(sdf.loc[(slice(None),sector), 'RETURN_ON_ASSET'].dropna(), 75))), 2)
    return [peRange, pbRange, epsRange, deRange, fcfRange, roeRange, roaRange]
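The seven near-identical lines in `getRanges` could be collapsed into a loop over a metric-to-step mapping. The sketch below is a hypothetical refactor, not the notebook's code: `get_ranges` and `STEPS` are illustrative names, and the function takes a DataFrame already restricted to one sector, so it does not depend on how the sector is stored in `sdf`. The step values mirror those hard-coded above, and rounding uses the same `int(round(...))` as the original.

```python
import numpy as np
import pandas as pd

# Step per metric, copied from getRanges (P/B uses range's default step of 1)
STEPS = {
    'PE_RATIO': 2, 'PX_TO_BOOK_RATIO': 1, 'TRAIL_12M_EPS': 2,
    'TOT_DEBT_TO_TOT_EQY': 10, 'PRICE_TO_FCF': 5,
    'RETURN_COM_EQY': 2, 'RETURN_ON_ASSET': 2,
}

def get_ranges(sector_df, steps=STEPS):
    """Hypothetical helper: one range per metric, from the rounded 25th to the
    rounded 75th percentile (upper bound excluded, as with Python's range)."""
    result = []
    for metric, step in steps.items():  # dict order matches getRanges' return order
        values = sector_df[metric].dropna()
        low, high = np.percentile(values, [25, 75])
        result.append(range(int(round(low)), int(round(high)), step))
    return result

# Toy sector data (made up): the values 0..100 for every metric
toy = pd.DataFrame({m: np.arange(101, dtype=float) for m in STEPS})
ranges = get_ranges(toy)
print(ranges[0])  # range(25, 75, 2)
```

With this shape, generating steps programmatically (the proposed change above) only means computing `STEPS` instead of writing it by hand.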

# Step 2: Frame Generation and Product

### Output Dataframe

In [18]:
# Sorted list of the unique dates in the data
datesList = sorted(set(sdf.index))

# Output DataFrame for portfolio results: one row per date, columns filled in later
dfPort = pd.DataFrame(index=datesList)
outputCols = ['return', 'beta', 'tBill3Mth', 'treynor', 'peTest', 'pbTest', 'epsTest',
              'deTest', 'fcfTest', 'roeTest', 'roaTest', 'maxHSFScore']
for col in outputCols:
    dfPort[col] = None
dfPort['tBill3Mth'] = TBill3Mth  # same risk-free rate on every date
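`dfPort` holds a `return`, a `beta`, the risk-free rate `tBill3Mth`, and a `treynor` column. The Treynor ratio is conventionally defined as excess return per unit of systematic risk, $(R_p - R_f)/\beta_p$. The sketch below assumes the notebook follows that standard definition; `treynor_ratio` is a hypothetical helper and the portfolio numbers are made up. The return and the risk-free rate should cover the same period (2.34% is a quoted annual rate).

```python
def treynor_ratio(portfolio_return, risk_free_rate, beta):
    """Excess return over the risk-free rate per unit of beta.
    All rates are decimals over the same period (e.g. 0.08 for 8%)."""
    if beta == 0:
        raise ValueError("Treynor ratio is undefined for beta == 0")
    return (portfolio_return - risk_free_rate) / beta

TBill3Mth = 2.34 / 100  # risk-free rate as defined earlier in the notebook

# Hypothetical portfolio: 8% return with a beta of 1.2
print(round(treynor_ratio(0.08, TBill3Mth, 1.2), 4))  # 0.0472
```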

# Step 3: Output

### Runtime

The cell below displays the total runtime, measured from when `st` was recorded at the start of the notebook.

In [45]:
from datetime import timedelta  # avoids rebinding the name `datetime`
et = time.time()
str(timedelta(seconds=et - st))

'0:01:21.227196'
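`time.time()` reads the system clock, which the Python documentation notes can return a lower value than a previous call if the clock is set back between calls. For measuring durations, `time.perf_counter()` is a high-resolution clock intended for exactly that purpose. A minimal sketch of the same runtime report using it, with a placeholder computation standing in for the backtest:

```python
import time
from datetime import timedelta

start = time.perf_counter()
total = sum(i * i for i in range(100_000))  # placeholder for the backtest work
elapsed = time.perf_counter() - start

# Same H:MM:SS.ffffff format as the runtime cell above
runtime = str(timedelta(seconds=elapsed))
print(runtime)
```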