## PEAD (Post–Earnings Announcement Drift)

- Post-earnings announcement drift (PEAD) is the tendency for a stock’s cumulative abnormal returns to keep drifting in the direction of an earnings surprise for several weeks (even several months) after the earnings announcement.
- It is an academically well-documented anomaly, first discovered by Ball and Brown in 1968.
- Research also shows that small-capitalization stocks contribute most of the strategy’s performance, so caution is recommended when implementing it. This is why the strategy here focuses on microcaps.

## Implementation of Strategy
- Enter at the market open when an earnings announcement was made after the previous close: buy the stock if the return from the previous close to today’s open is very positive, and short it if that return is very negative.
- Liquidate the position at the same day’s close.
- Notice that this strategy does not require the trader to interpret whether the earnings announcement is “good” or “bad.” It does not even require the trader to know whether the earnings are above or below analysts’ expectations.
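
The entry and exit rules above can be sketched as follows. This is a minimal sketch, not the notebook's final backtest: `signal` is a hypothetical DataFrame of +1 (long), -1 (short) and 0 (flat) with the same shape as the open and close price frames, and capital is assumed to be split equally across the positions held each day.

```python
import pandas as pd

def intraday_strategy_returns(signal, open_px, close_px):
    """Daily return of entering at the open and exiting at the same day's close."""
    # open-to-close return of every stock on every day
    intraday_ret = close_px / open_px - 1
    # longs earn the intraday return, shorts earn its negative
    pnl = signal * intraday_ret
    # equal weight across the positions held that day (assumption)
    n_positions = signal.abs().sum(axis=1)
    # days without any position get a return of 0
    return (pnl.sum(axis=1) / n_positions).where(n_positions > 0, 0.0)

# toy data: two made-up tickers over two days
idx = pd.to_datetime(["2015-01-05", "2015-01-06"])
open_px = pd.DataFrame({"AAA": [10.0, 10.0], "BBB": [20.0, 20.0]}, index=idx)
close_px = pd.DataFrame({"AAA": [11.0, 10.0], "BBB": [19.0, 20.0]}, index=idx)
signal = pd.DataFrame({"AAA": [1, 0], "BBB": [-1, 0]}, index=idx)

daily_ret = intraday_strategy_returns(signal, open_px, close_px)
print(daily_ret)
```

On the first day the long in `AAA` gains 10% and the short in `BBB` gains 5%, so the equal-weighted return is 7.5%; the second day has no positions.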

## Variants of the same Strategy
- add in valuation metrics
- add in biotech announcements
- add in corporate actions
- (might not be doable) only buy if the stock price rises more than a certain amount above the previous day's level


transaction table
- number of longs and shorts
- the stocks bought each day
- the amount bought of each stock (naive allocation)

portfolio table
- portfolio value in numbers (net value)


In [1]:
# maybe i can't use oct2py after all
#from oct2py import octave

In [2]:
from datetime import datetime
#import datetime so we can handle data with date and time
# datetime tutorial: https://www.dataquest.io/blog/python-datetime-tutorial/

#usual import from workflow
import pandas as pd
import numpy as np
import seaborn as sns 
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score
from scipy.stats import linregress

from pandas.plotting import autocorrelation_plot
plt.rcParams["figure.figsize"] = (10, 6) # (w, h) 
plt.ioff()

In [3]:
#Now letʼs create our momentum measurement function. We can compute the exponential regression of a stock by 
#performing linear regression on the natural log of the stockʼs daily closes:

In [4]:
# go read the Momentum Strategy from "Stocks on the Move" in Python pdf and run his jupyter notebook
# if still cannot then continue googling 'this momentum strategy for stock basket python'

In [5]:
indicators = pd.read_csv('../dataset/indicators.csv')

In [6]:
#understanding the event code
#the indicators table field equals EVENTCODES.
pd.set_option('display.max_rows', None)
indicators

Unnamed: 0,None,table,indicator,isfilter,isprimarykey,title,description,unittype
0,0,SF1,revenue,N,N,Revenues,[Income Statement] Amount of Revenue recognize...,currency
1,1,SF1,cor,N,N,Cost of Revenue,[Income Statement] The aggregate cost of goods...,currency
2,2,SF1,sgna,N,N,Selling General and Administrative Expense,[Income Statement] A component of [OpEx] repre...,currency
3,3,SF1,rnd,N,N,Research and Development Expense,[Income Statement] A component of [OpEx] repre...,currency
4,4,SF1,opex,N,N,Operating Expenses,[Income Statement] Operating expenses represen...,currency
5,5,SF1,intexp,N,N,Interest Expense,[Income Statement] Amount of the cost of borro...,currency
6,6,SF1,taxexp,N,N,Income Tax Expense,[Income Statement] Amount of current income ta...,currency
7,7,SF1,netincdis,N,N,Net Loss Income from Discontinued Operations,[Income Statement] Amount of loss (income) fro...,currency
8,8,SF1,consolinc,N,N,Consolidated Income,[Income Statement] The portion of profit or lo...,currency
9,9,SF1,netincnci,N,N,Net Income to Non-Controlling Interests,[Income Statement] The portion of income which...,currency


## Reading in Data

### Converting Dataset to Datetime Data

#### Price Volume Data

In [7]:
price_volume_sorted = pd.read_csv('../dataset/price_volume_sorted.csv')
price_volume_sorted['date'] = price_volume_sorted['date'].astype('datetime64[ns]')
price_volume_sorted.set_index('date', inplace = True)
#we set date as the index

In [8]:
price_volume_sorted.head()

Unnamed: 0_level_0,ticker,open,high,low,close,volume,dividends,closeunadj,lastupdated
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
2015-01-02,AAME,3.99,4.03,3.98,4.03,11443.0,0.0,4.03,2018-06-13
2015-01-05,AAME,3.9,4.01,3.9,4.01,13727.0,0.0,4.01,2018-06-13
2015-01-06,AAME,3.95,3.95,3.75,3.92,9743.0,0.0,3.92,2018-06-13
2015-01-07,AAME,3.899,3.92,3.87,3.92,1486.0,0.0,3.92,2018-06-13
2015-01-08,AAME,3.92,3.95,3.915,3.95,2200.0,0.0,3.95,2018-06-13


In [9]:
price_volume_sorted.tail()

Unnamed: 0_level_0,ticker,open,high,low,close,volume,dividends,closeunadj,lastupdated
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
2020-12-14,ZYXI,13.1,13.59,13.1,13.34,446678.0,0.0,13.34,2020-12-14
2020-12-15,ZYXI,13.45,13.45,13.03,13.18,372095.0,0.0,13.18,2020-12-15
2020-12-16,ZYXI,13.24,13.95,13.24,13.87,560581.0,0.0,13.87,2020-12-16
2020-12-17,ZYXI,13.94,14.29,13.701,14.25,409889.0,0.0,14.25,2020-12-17
2020-12-18,ZYXI,14.25,14.39,13.73,14.0,551332.0,0.0,14.0,2020-12-18


In [10]:
price_volume_sorted.dtypes
#date does not appear under dtypes because it is already the index

ticker          object
open           float64
high           float64
low            float64
close          float64
volume         float64
dividends      float64
closeunadj     float64
lastupdated     object
dtype: object

#### Earnings Data and date

In [None]:
events = pd.read_csv('../dataset/.csv')


In [11]:
fundamentals_filtered = pd.read_csv('../dataset/fundamentals_filtered.csv')

In [12]:
fundamentals_filtered.head()

Unnamed: 0,datekey,ticker,dimension,calendardate,reportperiod,lastupdated,accoci,assets,assetsavg,assetsc,...,sharesbas,shareswa,shareswadil,sps,tangibles,taxassets,taxexp,taxliabilities,tbvps,workingcapital
0,2020-10-27,ZYXI,ARQ,2020-09-30,2020-09-30,2020-10-27,0.0,64965000.0,,57553000.0,...,34740990.0,34486000.0,35476000.0,0.581,64965000.0,985000.0,71000.0,429000.0,1.884,50258000.0
1,2020-07-28,ZYXI,ARQ,2020-06-30,2020-06-30,2020-10-27,0.0,36759000.0,,29833000.0,...,34705556.0,33283000.0,34454000.0,0.579,36759000.0,545000.0,1063000.0,0.0,1.104,23813000.0
2,2020-04-28,ZYXI,ARQ,2020-03-31,2020-03-31,2020-10-27,0.0,33222000.0,,25698000.0,...,33192517.0,32913000.0,34204000.0,0.463,33222000.0,985000.0,-483000.0,39000.0,1.009,19864000.0
3,2020-02-27,ZYXI,ARQ,2019-12-31,2019-12-31,2020-10-27,0.0,28277000.0,,22566000.0,...,32811832.0,32706000.0,34101000.0,0.433,28277000.0,513000.0,778000.0,52000.0,0.865,17369000.0
4,2019-10-29,ZYXI,ARQ,2019-09-30,2019-09-30,2020-10-27,0.0,24724000.0,,18638000.0,...,32743250.0,32490000.0,34076000.0,0.364,24724000.0,716000.0,463000.0,132000.0,0.761,14119000.0


**Tickers**

In [13]:
tickers = pd.read_csv('../dataset/tickers.csv', header = None)
tickers.head()

Unnamed: 0,0
0,ZYXI
1,ZSAN
2,ZQKSQ
3,ZPCM
4,ZNOG


In [14]:
len(tickers)

1951

In [15]:
tickers = tickers[0].tolist()
#save ticker as a list

In [16]:
tickers

['ZYXI',
 'ZSAN',
 'ZQKSQ',
 'ZPCM',
 'ZNOG',
 'ZMTP',
 'ZIXI',
 'ZIVO',
 'ZGNX',
 'ZEUS',
 'ZAZA',
 'ZAGG',
 'ZAAP',
 'YUME',
 'YUMA',
 'YTEN',
 'YPPN',
 'YGYI',
 'YEWB',
 'YCB',
 'XXII',
 'XTNT',
 'XSPY',
 'XSPA',
 'XRM',
 'XRDC',
 'XPLR',
 'XPL',
 'XONE',
 'XENE',
 'XELB',
 'XBKS1',
 'WYY',
 'WWR',
 'WVVI',
 'WVFC',
 'WTXR',
 'WTT',
 'WSTL',
 'WSTG',
 'WSCI',
 'WRESQ',
 'WOLV',
 'WNEB',
 'WMTM',
 'WMLPQ',
 'WMAR',
 'WLTGQ',
 'WLKR',
 'WLFC',
 'WLDN',
 'WKHS',
 'WINT',
 'WHLR',
 'WHLM',
 'WHHT',
 'WHF',
 'WGBS',
 'WG',
 'WFBI',
 'WEIN',
 'WEBK',
 'WDDD',
 'WBKC',
 'WBBW',
 'WAYN',
 'WAVXQ',
 'WATT',
 'WARM',
 'VYFC',
 'VXRT',
 'VVUSQ',
 'VUZI',
 'VTNR',
 'VTG',
 'VTEQ',
 'VTAE',
 'VSYS',
 'VSRI',
 'VSCP',
 'VRTB',
 'VRTA',
 'VRSZQ',
 'VRME',
 'VPG',
 'VPCO',
 'VOXX',
 'VOLT',
 'VOIL',
 'VODG',
 'VNRX',
 'VMEMQ',
 'VLTC',
 'VKTX',
 'VIVE',
 'VISL',
 'VIRC',
 'VIDE',
 'VICA',
 'VGZ',
 'VERU',
 'VEC',
 'VCYT',
 'VCSY',
 'VCRA',
 'VCON',
 'VCEL',
 'VBTX',
 'VBLT',
 'VBIV1',
 'VBIV',
 'VB

#### Opening and Closing Stock Price

- we create a dataframe for open and another for close

In [17]:
#dataframe for stock opening price
stocks_open = (
    (pd.concat(
        [pd.read_csv(f"../dataset/price_volume_tickers/{ticker}.csv", index_col='date', parse_dates=True)[
            'open'
        ].rename(ticker)
        for ticker in tickers],
        axis=1,
        sort=True)
    )
)
stocks_open = stocks_open.loc[:,~stocks_open.columns.duplicated()]

In [18]:
stocks_open.head()
#some stocks start later because they were not yet listed on 2015-01-02, which is our starting date

Unnamed: 0_level_0,ZYXI,ZSAN,ZQKSQ,ZPCM,ZNOG,ZMTP,ZIXI,ZIVO,ZGNX,ZEUS,...,ABMC,ABIO,ABHD,ABEO,ABDC,ABCP,ABCD,AAPC,AAOI,AAME
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2015-01-02,0.18,,2.23,2.0,1.39,0.25,3.6,0.08,11.2,17.9,...,0.13,123.48,0.29,3.37,12.54,1.7,1.78,,11.28,3.99
2015-01-05,0.22,,2.16,2.0,1.38,0.16,3.52,0.09,11.52,18.25,...,0.13,133.56,0.26,3.71,12.36,1.65,1.65,,10.75,3.9
2015-01-06,0.16,,2.1,2.0,1.42,0.16,3.5,0.08,11.36,16.6,...,0.13,126.0,0.29,3.4,12.89,1.64,1.73,,10.65,3.95
2015-01-07,0.14,,2.0,2.0,1.41,0.16,3.4,0.08,10.88,16.0,...,0.14,113.589,0.26,3.47,13.0,1.65,1.81,,10.35,3.899
2015-01-08,0.15,,2.08,2.0,1.4,0.16,3.46,0.08,11.2,14.91,...,0.12,104.58,0.29,3.3,13.05,1.7,1.75,,9.99,3.92


In [19]:
#dataframe for stock closing price
stocks_close = (
    (pd.concat(
        [pd.read_csv(f"../dataset/price_volume_tickers/{ticker}.csv", index_col='date', parse_dates=True)[
            'close'
        ].rename(ticker)
        for ticker in tickers],
        axis=1,
        sort=True)
    )
)
stocks_close = stocks_close.loc[:,~stocks_close.columns.duplicated()]

In [20]:
stocks_close.head(5)

Unnamed: 0_level_0,ZYXI,ZSAN,ZQKSQ,ZPCM,ZNOG,ZMTP,ZIXI,ZIVO,ZGNX,ZEUS,...,ABMC,ABIO,ABHD,ABEO,ABDC,ABCP,ABCD,AAPC,AAOI,AAME
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2015-01-02,0.18,,2.2,2.0,1.38,0.25,3.55,0.09,11.36,18.28,...,0.13,131.04,0.29,3.4,12.4,1.65,1.76,,10.79,4.03
2015-01-05,0.16,,2.1,2.0,1.39,0.16,3.48,0.08,11.2,16.6,...,0.13,123.48,0.28,3.21,12.83,1.65,1.77,,10.65,4.01
2015-01-06,0.16,,2.0,2.0,1.4,0.16,3.39,0.09,10.72,15.91,...,0.13,118.453,0.29,3.25,12.97,1.6,1.73,,10.25,3.92
2015-01-07,0.15,,2.06,2.0,1.41,0.16,3.38,0.08,11.12,14.84,...,0.13,107.1,0.29,3.34,13.09,1.7,1.75,,9.85,3.92
2015-01-08,0.15,,2.19,2.0,1.43,0.16,3.64,0.08,11.36,16.02,...,0.12,112.14,0.31,3.3,13.09,1.71,1.79,,9.96,3.95


In [21]:
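# we backfill NaN with the next available price
# note: for stocks not yet listed, this copies their first traded price back in time (look-ahead)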
stocks_close = stocks_close.bfill()
stocks_open = stocks_open.bfill()

In [22]:
stocks_close.head()

Unnamed: 0_level_0,ZYXI,ZSAN,ZQKSQ,ZPCM,ZNOG,ZMTP,ZIXI,ZIVO,ZGNX,ZEUS,...,ABMC,ABIO,ABHD,ABEO,ABDC,ABCP,ABCD,AAPC,AAOI,AAME
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2015-01-02,0.18,220.2,2.2,2.0,1.38,0.25,3.55,0.09,11.36,18.28,...,0.13,131.04,0.29,3.4,12.4,1.65,1.76,10.06,10.79,4.03
2015-01-05,0.16,220.2,2.1,2.0,1.39,0.16,3.48,0.08,11.2,16.6,...,0.13,123.48,0.28,3.21,12.83,1.65,1.77,10.06,10.65,4.01
2015-01-06,0.16,220.2,2.0,2.0,1.4,0.16,3.39,0.09,10.72,15.91,...,0.13,118.453,0.29,3.25,12.97,1.6,1.73,10.06,10.25,3.92
2015-01-07,0.15,220.2,2.06,2.0,1.41,0.16,3.38,0.08,11.12,14.84,...,0.13,107.1,0.29,3.34,13.09,1.7,1.75,10.06,9.85,3.92
2015-01-08,0.15,220.2,2.19,2.0,1.43,0.16,3.64,0.08,11.36,16.02,...,0.12,112.14,0.31,3.3,13.09,1.71,1.79,10.06,9.96,3.95


In [23]:
stocks_open.head()

Unnamed: 0_level_0,ZYXI,ZSAN,ZQKSQ,ZPCM,ZNOG,ZMTP,ZIXI,ZIVO,ZGNX,ZEUS,...,ABMC,ABIO,ABHD,ABEO,ABDC,ABCP,ABCD,AAPC,AAOI,AAME
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2015-01-02,0.18,243.2,2.23,2.0,1.39,0.25,3.6,0.08,11.2,17.9,...,0.13,123.48,0.29,3.37,12.54,1.7,1.78,10.01,11.28,3.99
2015-01-05,0.22,243.2,2.16,2.0,1.38,0.16,3.52,0.09,11.52,18.25,...,0.13,133.56,0.26,3.71,12.36,1.65,1.65,10.01,10.75,3.9
2015-01-06,0.16,243.2,2.1,2.0,1.42,0.16,3.5,0.08,11.36,16.6,...,0.13,126.0,0.29,3.4,12.89,1.64,1.73,10.01,10.65,3.95
2015-01-07,0.14,243.2,2.0,2.0,1.41,0.16,3.4,0.08,10.88,16.0,...,0.14,113.589,0.26,3.47,13.0,1.65,1.81,10.01,10.35,3.899
2015-01-08,0.15,243.2,2.08,2.0,1.4,0.16,3.46,0.08,11.2,14.91,...,0.12,104.58,0.29,3.3,13.05,1.7,1.75,10.01,9.99,3.92


In [24]:
# need to create another dataframe that holds the difference between opening and closing prices
# for that dataframe, NaN should probably be set to 0

In [25]:
stocks_prev_cl_next_op_diff = stocks_close - stocks_open.shift()
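# note: as written, this is today's closing price minus the previous day's opening price
# the overnight gap (previous close to today's open) would be stocks_open - stocks_close.shift()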

In [26]:
stocks_prev_cl_next_op_diff.head()

Unnamed: 0_level_0,ZYXI,ZSAN,ZQKSQ,ZPCM,ZNOG,ZMTP,ZIXI,ZIVO,ZGNX,ZEUS,...,ABMC,ABIO,ABHD,ABEO,ABDC,ABCP,ABCD,AAPC,AAOI,AAME
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2015-01-02,,,,,,,,,,,...,,,,,,,,,,
2015-01-05,-0.02,-23.0,-0.13,0.0,0.0,-0.09,-0.12,0.0,0.0,-1.3,...,0.0,0.0,-0.01,-0.16,0.29,-0.05,-0.01,0.05,-0.63,0.02
2015-01-06,-0.06,-23.0,-0.16,0.0,0.02,0.0,-0.13,0.0,-0.8,-2.34,...,0.0,-15.107,0.03,-0.46,0.61,-0.05,0.08,0.05,-0.5,0.02
2015-01-07,-0.01,-23.0,-0.04,0.0,-0.01,0.0,-0.12,0.0,-0.24,-1.76,...,0.0,-18.9,0.0,-0.06,0.2,0.06,0.02,0.05,-0.8,-0.03
2015-01-08,0.01,-23.0,0.19,0.0,0.02,0.0,0.24,0.0,0.48,0.02,...,-0.02,-1.449,0.05,-0.17,0.09,0.06,-0.02,0.05,-0.39,0.051


In [27]:
stocks_prev_cl_next_op_diff.tail()

Unnamed: 0_level_0,ZYXI,ZSAN,ZQKSQ,ZPCM,ZNOG,ZMTP,ZIXI,ZIVO,ZGNX,ZEUS,...,ABMC,ABIO,ABHD,ABEO,ABDC,ABCP,ABCD,AAPC,AAOI,AAME
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2020-12-14,-0.05,-0.002,,,-0.04,0.45,-0.005,-0.01,0.51,0.25,...,-0.006,-0.07,,0.0,,0.0,,,0.09,0.015
2020-12-15,0.08,-0.023,,,-0.21,-0.22,-0.16,-0.025,0.75,0.58,...,-0.004,-0.29,,0.04,,-0.006,,,0.54,-0.09
2020-12-16,0.42,-0.048,,,-0.1,-0.1,0.05,0.0,-1.27,0.06,...,-0.002,-0.14,,0.08,,0.0,,,0.06,0.02
2020-12-17,1.01,-0.056,,,0.102,0.2,0.06,0.02,-0.93,-0.96,...,-0.001,-0.06,,-0.04,,0.0,,,0.01,0.03
2020-12-18,0.06,-0.037,,,-0.085,-0.05,0.89,-0.001,-0.19,-1.84,...,0.002,-0.14,,-0.1,,0.0,,,0.11,-0.17
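
For reference, the overnight gap that the PEAD signal relies on (previous day's close to today's open) might be computed as a percentage return like this. It is a sketch under the assumption that both price frames share the same date index and columns; `overnight_return` is a hypothetical helper, not part of the notebook. The sample prices are the first three AAME rows shown earlier.

```python
import pandas as pd

def overnight_return(open_px, close_px):
    """Return from the previous trading day's close to today's open."""
    # shift(1) moves each close down one row so it lines up with the next day's open
    return open_px / close_px.shift(1) - 1

idx = pd.to_datetime(["2015-01-02", "2015-01-05", "2015-01-06"])
open_px = pd.DataFrame({"AAME": [3.99, 3.90, 3.95]}, index=idx)
close_px = pd.DataFrame({"AAME": [4.03, 4.01, 3.92]}, index=idx)

gap = overnight_return(open_px, close_px)
print(gap)
```

The first row is NaN because there is no previous close. Using a return rather than a price difference keeps stocks with very different price levels comparable.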


## Initiate Portfolio and Transaction table dataframe

- We initiate a portfolio of 1,000,000 USD in a dataframe indexed by date, from 2015-01-02 to 2020-12-18.

transaction table
- number of longs and shorts
- the stocks bought each day
- the amount bought of each stock (naive allocation)

portfolio table
- portfolio value in numbers (net value)

In [28]:
stocks_close.index

DatetimeIndex(['2015-01-02', '2015-01-05', '2015-01-06', '2015-01-07',
               '2015-01-08', '2015-01-09', '2015-01-12', '2015-01-13',
               '2015-01-14', '2015-01-15',
               ...
               '2020-12-07', '2020-12-08', '2020-12-09', '2020-12-10',
               '2020-12-11', '2020-12-14', '2020-12-15', '2020-12-16',
               '2020-12-17', '2020-12-18'],
              dtype='datetime64[ns]', name='date', length=1503, freq=None)

In [29]:
days = stocks_close.index
days

DatetimeIndex(['2015-01-02', '2015-01-05', '2015-01-06', '2015-01-07',
               '2015-01-08', '2015-01-09', '2015-01-12', '2015-01-13',
               '2015-01-14', '2015-01-15',
               ...
               '2020-12-07', '2020-12-08', '2020-12-09', '2020-12-10',
               '2020-12-11', '2020-12-14', '2020-12-15', '2020-12-16',
               '2020-12-17', '2020-12-18'],
              dtype='datetime64[ns]', name='date', length=1503, freq=None)

In [39]:
#initiate portfolio with 1 million USD
portfolio = pd.DataFrame()
portfolio['date'] = days
portfolio['NAV'] = 1_000_000
portfolio.set_index('date', inplace = True)
portfolio.head()

Unnamed: 0_level_0,NAV
date,Unnamed: 1_level_1
2015-01-02,1000000
2015-01-05,1000000
2015-01-06,1000000
2015-01-07,1000000
2015-01-08,1000000


In [41]:
## Transaction Table
# the table will be updated with the number of longs and shorts (aka stocks bought and sold) per day 
transactions = pd.DataFrame()
transactions['date'] = days
transactions['longs'] = None
transactions['shorts'] = None
transactions['total number of transactions'] = None
transactions

Unnamed: 0,date,longs,shorts,total number of transactions
0,2015-01-02,,,
1,2015-01-05,,,
2,2015-01-06,,,
3,2015-01-07,,,
4,2015-01-08,,,
5,2015-01-09,,,
6,2015-01-12,,,
7,2015-01-13,,,
8,2015-01-14,,,
9,2015-01-15,,,


In [42]:
transactions.set_index('date', inplace=True)
transactions

Unnamed: 0_level_0,longs,shorts,total number of transactions
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2015-01-02,,,
2015-01-05,,,
2015-01-06,,,
2015-01-07,,,
2015-01-08,,,
2015-01-09,,,
2015-01-12,,,
2015-01-13,,,
2015-01-14,,,
2015-01-15,,,
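
Once a daily signal exists, the two tables could be filled along these lines. This is only a sketch: `signal` is a hypothetical +1/-1/0 DataFrame (one column per ticker) and `daily_returns` a hypothetical Series of strategy returns; neither exists in the notebook yet, and the tickers are made up.

```python
import pandas as pd

def fill_transactions(signal):
    """Count long and short positions per day from a +1/-1/0 signal frame."""
    transactions = pd.DataFrame(index=signal.index)
    transactions["longs"] = (signal > 0).sum(axis=1)
    transactions["shorts"] = (signal < 0).sum(axis=1)
    transactions["total number of transactions"] = (
        transactions["longs"] + transactions["shorts"]
    )
    return transactions

def compound_nav(daily_returns, start_nav=1_000_000):
    """Grow the starting capital by the strategy's daily returns."""
    return start_nav * (1 + daily_returns).cumprod()

idx = pd.to_datetime(["2015-01-05", "2015-01-06"])
signal = pd.DataFrame({"AAA": [1, 0], "BBB": [-1, 1], "CCC": [0, 1]}, index=idx)
daily_returns = pd.Series([0.01, -0.02], index=idx)

transactions = fill_transactions(signal)
nav = compound_nav(daily_returns)
print(transactions)
print(nav)
```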


## Momentum Measurement Function

#### Momentum Measurement Function 1
Momentum is calculated by multiplying the annualized exponential regression slope of the past 90 days by the $R^2$ coefficient of the regression. We then look at the highest momentum values in the dataset.



- we create a momentum measurement function here
- we do this by computing the exponential regression of a stock by performing linear regression on the natural log of the stock’s daily closes
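
A minimal sketch of such a function is shown below. It assumes 252 trading days per year for the annualization, and the name `momentum_score` is hypothetical. It uses `np.polyfit` and `np.corrcoef` to stay NumPy-only; `scipy.stats.linregress`, imported earlier, would give the same slope and r-value.

```python
import numpy as np
import pandas as pd

TRADING_DAYS_PER_YEAR = 252  # assumption used for annualizing the daily slope

def momentum_score(closes, window=90):
    """Annualized exponential-regression slope times R^2 over the last `window` closes."""
    recent = closes.dropna().iloc[-window:]
    log_closes = np.log(recent.to_numpy())
    x = np.arange(len(log_closes))
    # linear regression of log price on time: the slope is the daily log growth rate
    slope = np.polyfit(x, log_closes, 1)[0]
    # R^2 of a one-variable regression is the squared correlation coefficient
    r_squared = np.corrcoef(x, log_closes)[0, 1] ** 2
    annualized_slope = np.exp(slope) ** TRADING_DAYS_PER_YEAR - 1
    return annualized_slope * r_squared

# synthetic series growing at exactly 0.1% per day in log terms
closes = pd.Series(100 * np.exp(0.001 * np.arange(90)))
score = momentum_score(closes)
print(score)
```

For this perfectly exponential series the fit is exact, so $R^2$ is 1 and the score equals the annualized growth rate, `exp(0.252) - 1`.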

In [None]:
# skip Momentum Measurement Function 1 for now

## Post–Earnings Announcement Drift

- Before we backtest this strategy, we need historical data on the times of earnings announcements.
- The important feature of this program is that it carefully selects only earnings announcements occurring after the previous trading day’s market close and before today’s market open.
- Earnings announcements occurring at other times should not trigger our entry trades, because entries are made only at today’s market open.

We just need to compute the 90-day moving standard deviation of the return from the previous close to the next day’s open, and use it as the benchmark for deciding whether the announcement is “surprising” enough to generate the post-announcement drift.
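
The benchmark described above might look like this in pandas. It is a hedged sketch: `overnight_ret` and `has_announcement` are hypothetical DataFrames (dates by tickers), and the multiple `n_std` of the standard deviation is an assumed parameter that the text does not specify. The toy example uses a 3-day lookback only to keep the data short.

```python
import pandas as pd

def pead_signal(overnight_ret, has_announcement, lookback=90, n_std=1.0):
    """+1 / -1 / 0 entry signal from overnight returns and announcement flags."""
    # rolling std of overnight returns, shifted one day so that
    # today's threshold is built from past data only
    threshold = n_std * overnight_ret.rolling(lookback).std().shift(1)
    # comparisons against a NaN threshold are False, so early days stay flat
    longs = (overnight_ret > threshold) & has_announcement
    shorts = (overnight_ret < -threshold) & has_announcement
    return longs.astype(int) - shorts.astype(int)

idx = pd.date_range("2015-01-05", periods=5, freq="B")
overnight_ret = pd.DataFrame({"AAA": [0.01, -0.01, 0.01, 0.05, -0.05]}, index=idx)
has_announcement = pd.DataFrame({"AAA": [False, False, False, True, False]}, index=idx)

signal = pead_signal(overnight_ret, has_announcement, lookback=3)
print(signal)
```

The large gap on the fourth day coincides with an announcement and exceeds the threshold, so it produces a long signal. The equally large negative gap on the fifth day is ignored because no announcement is flagged.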


The function takes a 1 × N cell array of stock symbols, `allsyms`, and creates a 1 × N logical array, `earnann`, which tells us (true or false) whether the corresponding stock had an earnings announcement after the previous day’s 4:00 P.M. ET (U.S. market close) and before today’s 9:30 A.M. ET (U.S. market open). The inputs `prevDate` and `todayDate` should be in yyyymmdd format.
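
A pandas analogue of that check could be sketched as follows. It assumes announcement timestamps are available as a Series indexed by ticker, in U.S. Eastern time without timezone information; `announced_overnight` and the sample tickers are hypothetical.

```python
import pandas as pd

def announced_overnight(announce_times, prev_date, today_date):
    """True where an announcement falls after prev_date 16:00 and before today_date 09:30."""
    # dates arrive as yyyymmdd strings, as in the description above
    prev_close = pd.to_datetime(prev_date, format="%Y%m%d") + pd.Timedelta(hours=16)
    today_open = pd.to_datetime(today_date, format="%Y%m%d") + pd.Timedelta(hours=9, minutes=30)
    # comparisons with NaT (no announcement) evaluate to False
    return (announce_times > prev_close) & (announce_times < today_open)

announce_times = pd.Series({
    "AAA": pd.Timestamp("2015-01-05 16:30"),  # after the previous close
    "BBB": pd.Timestamp("2015-01-05 12:00"),  # during the previous session
    "CCC": pd.Timestamp("2015-01-06 08:00"),  # before today's open
    "DDD": pd.NaT,                            # no announcement
})

earnann = announced_overnight(announce_times, "20150105", "20150106")
print(earnann)
```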

In [52]:
## we use datekey because it represents the SEC filing date
earnings_date = pd.DataFrame()
earnings_date['earnings_date'] = fundamentals_filtered['datekey']
earnings_date['ticker'] = fundamentals_filtered['ticker']
earnings_date.set_index('earnings_date', inplace = True)
earnings_date.head()


Unnamed: 0_level_0,ticker
earnings_date,Unnamed: 1_level_1
2020-10-27,ZYXI
2020-07-28,ZYXI
2020-04-28,ZYXI
2020-02-27,ZYXI
2019-10-29,ZYXI


In [None]:

#i have to split it up into its individual tickers
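
One possible way to split the earnings dates by ticker is to build a boolean table with one row per trading day and one column per ticker. The sketch below assumes a frame with `datekey` and `ticker` columns, as in `fundamentals_filtered`; `earnings_flags` is a hypothetical helper. Note that `datekey` is a filing date with no time of day, so on its own it cannot tell whether an announcement came before the open or after the close.

```python
import pandas as pd

def earnings_flags(earnings, days, tickers):
    """Boolean frame (days x tickers): True where a ticker has a datekey on that day."""
    dates = pd.to_datetime(earnings["datekey"])
    # crosstab counts filings per (date, ticker); > 0 turns the counts into flags
    flags = pd.crosstab(dates, earnings["ticker"]) > 0
    # align to the trading calendar and the full ticker list; missing cells mean no filing
    return flags.reindex(index=days, columns=tickers, fill_value=False).astype(bool)

# two ZYXI filing dates taken from the table above
earnings = pd.DataFrame({
    "datekey": ["2020-10-27", "2020-07-28"],
    "ticker": ["ZYXI", "ZYXI"],
})
days = pd.to_datetime(["2020-10-26", "2020-10-27", "2020-10-28"])
tickers = ["ZYXI", "ZSAN"]

flags = earnings_flags(earnings, days, tickers)
print(flags)
```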

#### Valuation Data

In [None]:
# maybe I don't even have to use event_filtered; the earnings data may be enough
# #do the same for event_filtered
# event_filtered = pd.read_csv('../dataset/event_filtered.csv')
# event_filtered['date'] = event_filtered['date'].astype('datetime64[ns]')
# event_filtered.set_index('date', inplace = True)

In [None]:
event_filtered.dtypes

In [None]:
(event_filtered['eventcodes'] == '0').sum()

In [None]:
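# .loc['eventcodes'] would look for an index label; filter on the 'table' column instead
indicators[indicators['table'] == 'EVENTCODES']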