# <h1><center>EPS ESTIMATE REVISION MOMENTUM MODELLING</center></h1>

# BUSINESS UNDERSTANDING

## OVERVIEW
    
When it comes to the stock market, some adages sound pithy but often prove powerful and true in practice. One of the best known is “Don’t fight the Fed”, which argues that investors should respect the trend set by the Federal Reserve’s monetary policy decisions. Another, less well known (probably because we made it up), concerns the power of a stock’s EPS trend: “Earnings trump and trend”. It reflects our observation that the best longer-term stock investments are often those where the investor is on the right side of the earnings trend. Both phrases speak to the power of trends in the stock market. This analysis focuses on the latter, “Earnings trump and trend”: we set out to test whether the statement holds statistically and to model it effectively, and we hope the reader comes away with a foundational understanding of, and some conviction in, its truthfulness.

A few examples from companies most readers will know illustrate this point:

 •	Google – During the past 5 years, GOOGL is up 244% with over 90% of this return coming from earnings per share (EPS) growth.
 
 •	Domino’s Pizza – During the past 5 years, DPZ is up 201% with nearly 100% of this return coming from earnings per share (EPS) growth.  
 
 •	Home Depot – During the past 5 years, HD is up 222% with approximately 70% of this return coming from earnings per share (EPS) growth.
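To make the attribution above concrete: since price = EPS × P/E, a stock's total return compounds EPS growth with the change in its P/E multiple. The sketch below shows one common way (a log decomposition) to split a return between the two. It is an assumption about the method, not necessarily how the figures above were computed, and the inputs in the usage lines are hypothetical.

```python
import math

def eps_share_of_return(price_return: float, eps_growth: float) -> float:
    """Share of a total price return attributable to EPS growth.

    Because price = EPS * (P/E), (1 + price_return) = (1 + eps_growth) * (1 + pe_change).
    Taking logs makes the two contributions additive, so the EPS share is
    log(1 + eps_growth) / log(1 + price_return).
    """
    return math.log1p(eps_growth) / math.log1p(price_return)

def implied_pe_change(price_return: float, eps_growth: float) -> float:
    """P/E multiple change implied by the price return and the EPS growth."""
    return (1 + price_return) / (1 + eps_growth) - 1

# Hypothetical inputs: price up 200% while EPS grew 170% over the same window
share = eps_share_of_return(2.00, 1.70)
pe_chg = implied_pe_change(2.00, 1.70)
print(f"EPS share of return: {share:.1%}, implied P/E change: {pe_chg:.1%}")
```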

Therefore, while getting the valuation “right” for a given stock is important, we would argue that getting the earnings “right” is even more important, hence the “trump” notion. Accompanying this is the power of trends, which Newton’s first law of motion captures in general terms: “an object in motion stays in motion with the same speed and in the same direction unless acted upon by an unbalanced force”. We believe such “momentum” is prevalent in the stock market and manifests itself in various ways, one of which is a company’s operating performance and earnings trajectory. This can be seen in earnings trends that ebb and flow over reasonable periods of time as factors such as industry dynamics and product cycles strengthen and weaken.

These foundational and experiential beliefs underpin the analysis that follows, in which we use machine learning and related tools to analyze and model, from various angles, the earnings estimates of a basket of stocks, an exercise we call EPS estimate revision momentum modelling.

The goal of this analysis is not only to use powerful algorithms to predict EPS but also to better understand the drivers behind those predictions, and the variables in the data set were gathered and organized with this goal in mind. The analysis has several use cases for both professional and individual investors. For professionals, it could help with:

•	Predicting EPS, which in turn could be combined with a target multiple and/or a DCF to establish an estimated target price.

•	Predicting EPS and comparing growth potential across a universe of stocks.

•	Comparing the model’s outcomes with human-driven modelling and analytical efforts.

•	Generating investment ideas based on the model’s output.

•	Providing insight into the predictive power of individual metrics and groups of metrics, which can guide the focus of an analyst’s research and point to additional metrics worth digging into.

Individual investors could likewise use the model’s predictions to generate buy and sell ideas.

# DATA MEANING & TYPE

To begin, let’s provide some background information on some “what’s” and “why’s” of our analysis.

What are some of the reasons for using consensus earnings estimate data instead of historical or other measures?

As discussed above, the premise of our analysis is to model the consensus EPS estimates.  We chose to model the consensus EPS estimates for several reasons:

•	While historical EPS is typically reported only quarterly, consensus EPS estimates are updated much more frequently: whenever an analyst changes their earnings forecast. We consider the insight gleaned from this near-daily movement, as new information enters the market, to be valuable.

•	Crowdsource for insight; don’t reinvent the wheel. We approach our analysis primarily by studying those who know the subject (EPS in this case) best, then seek to glean insight from them and leverage or improve upon it with powerful statistical calculations and algorithms. While a few of our variables are unrelated to consensus EPS estimates, most seek to capture changes in such metrics. We believe these estimates embed information from the two parties that arguably know each company best, and we use statistics to test that belief. The first is the sell-side analysts who cover a given stock, whose livelihood depends on this skill; the second is company management, which supplies the financial information those analysts rely on and whose livelihood is likewise tied to the company. Hence, we are not trying to outsmart them per se, but rather to learn from them, using other fundamental metrics, underlying trends and statistical tools to capture their collective wisdom in a way we hope yields good predictive power.

What basket of stocks did we choose and why?

We narrowed the universe of stocks for this analysis down to 400 using Bloomberg and FactSet, based on the following parameters:

•	US domiciled and traded companies

•	The real estate sector was excluded because such stocks do not “trade” on EPS but rather on funds from operations (FFO).

•	Market cap greater than 1.5 billion USD.

•	More than four analysts covering each stock, so that sufficient estimate data is available for comparability.

•	At least 10 years of trading history, so that we can compute the desired metrics and compare accordingly.

•	Five year EPS growth must be above zero and the R-squared of historical EPS versus time must be above 0.1. This further narrows the universe to companies deemed higher quality, with more stable historical EPS profiles, which we expect to aid our ability to analyze and model consensus EPS. On the other side of the coin are companies with extremely volatile EPS profiles, at times in negative territory, which would understandably be much harder, and less insightful, to model.
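The screening parameters above can be expressed as a single boolean filter. This is a minimal pandas sketch; the column names (`country`, `sector`, `market_cap`, `num_analysts`, `years_history`, `eps_growth_5y`, `eps_r2_5y`) and the sample rows are hypothetical placeholders, not the actual Bloomberg/FactSet fields.

```python
import pandas as pd

def screen_universe(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the universe filters; all column names here are hypothetical placeholders."""
    mask = (
        (df["country"] == "US")                # US domiciled and traded
        & (df["sector"] != "Real Estate")      # real estate trades on FFO, not EPS
        & (df["market_cap"] > 1.5e9)           # market cap above 1.5 billion
        & (df["num_analysts"] > 4)             # more than four covering analysts
        & (df["years_history"] >= 10)          # at least 10 years of trading history
        & (df["eps_growth_5y"] > 0)            # positive 5 year EPS growth
        & (df["eps_r2_5y"] > 0.1)              # EPS vs. time R-squared above 0.1
    )
    return df.loc[mask]

# Hypothetical candidates: BBB fails on sector, CCC fails on market cap
candidates = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "country": ["US", "US", "US"],
    "sector": ["Industrials", "Real Estate", "Health Care"],
    "market_cap": [5e9, 8e9, 1e9],
    "num_analysts": [12, 9, 6],
    "years_history": [25, 15, 12],
    "eps_growth_5y": [0.08, 0.05, 0.10],
    "eps_r2_5y": [0.85, 0.70, 0.60],
})
print(screen_universe(candidates)["ticker"].tolist())
```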

What was done to enable comparability across the basket of stocks given they are all unique businesses and grow at different rates?

•	We adjusted some of the variables to ensure comparability (more details below). Because some companies have grown faster than others, we included both absolute growth levels and relative (proportional) levels. The analyst estimate figures, on the other hand, are already comparable across the universe and thus needed no adjustment.

What explanatory variables did we include and why?
<b>We gathered weekly data over the past four and a half years (12/30/16 - 7/1/21) and grouped the explanatory variables into six categories: GROWTH, STABILITY, ANALYST REVISIONS, INCOME STATEMENT, RETURNS and OTHER. In total there are 53 explanatory variables, 6 potential response variables (though we focus on only two of them to start; see the response variables summary below) and a total of ~99k rows of data.</b> Note that we stopped the data at 7/1/21 so that the response variables, which depend on forward-looking values, could be calculated. Details on each category and the variables included are as follows:

#### GROWTH
Trends in median consensus (i.e., sell-side analyst) EPS estimates were gathered to capture recent growth in these figures, which we expect to be helpful given the momentum theory discussed above:

Annualized growth rate for each of the following time periods on a daily basis for the past 5 year period (1 month, 2 month, 3 month, 6 month, 1 year, 3 year and 5 year)

•	Labeled as “Best_EPS_##” in the data set.
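A minimal sketch of the annualized growth calculation, assuming weekly observations per ticker, roughly 52 weeks per year (so 1M ≈ 4 weeks, 3M ≈ 13 weeks) and strictly positive EPS. The input column `Best_EPS` (the median consensus EPS level) is a hypothetical name, and the window-to-week mapping is our assumption rather than FactSet's definition.

```python
import numpy as np
import pandas as pd

# Approximate weekly look-back for each window (assumption: 52 weeks per year)
WINDOWS = {"1M": 4, "2M": 9, "3M": 13, "6M": 26, "1Y": 52, "3Y": 156, "5Y": 260}

def annualized_growth(eps: pd.Series, weeks: int) -> pd.Series:
    """Compound annualized growth of a weekly EPS series over `weeks` periods.

    Assumes strictly positive EPS; zero or negative EPS makes the ratio meaningless.
    """
    ratio = eps / eps.shift(weeks)
    return ratio ** (52 / weeks) - 1

def add_growth_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Add Best_EPS_## growth columns per ticker (expects Ticker, Date and a hypothetical Best_EPS column)."""
    df = df.sort_values(["Ticker", "Date"]).copy()
    for label, weeks in WINDOWS.items():
        df[f"Best_EPS_{label}"] = df.groupby("Ticker")["Best_EPS"].transform(
            lambda s, w=weeks: annualized_growth(s, w)
        )
    return df

# Hypothetical ticker whose EPS compounds at exactly 10% per year
dates = pd.date_range("2016-12-30", periods=300, freq="W-FRI")
demo = pd.DataFrame({"Ticker": "XYZ", "Date": dates,
                     "Best_EPS": 2.0 * 1.10 ** (np.arange(300) / 52)})
out = add_growth_columns(demo)
print(out[["Best_EPS_3M", "Best_EPS_1Y", "Best_EPS_5Y"]].tail(1))
```

Every window recovers the same 10% annualized rate here, while the first `weeks` rows of each window are NaN because there is no earlier observation to compare against.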

Each of the growth rates above (except the 5 year) expressed relative to the respective 5 year growth rate. The 5 year growth rate approximates a company's longer-term growth rate, so this ratio captures the current trend relative to the longer term.

•	Labeled as “Best_EPS_##_vs_5Y”.
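The ratio versus the 5 year rate is a simple element-wise division. This sketch assumes the `Best_EPS_##` growth columns already exist and maps divisions by a zero 5 year rate to NaN. The summary list below spells the suffix `_v5Y` while this paragraph uses `_vs_5Y`, so the exact column names here are an assumption.

```python
import numpy as np
import pandas as pd

SHORT_WINDOWS = ["1M", "2M", "3M", "6M", "1Y", "3Y"]

def add_vs_5y_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Divide each shorter-window growth rate by the 5 year growth rate."""
    df = df.copy()
    for p in SHORT_WINDOWS:
        df[f"Best_EPS_{p}_vs_5Y"] = df[f"Best_EPS_{p}"] / df["Best_EPS_5Y"]
    # A 5 year rate of exactly zero would give +/-inf; treat it as missing instead
    return df.replace([np.inf, -np.inf], np.nan)

# Hypothetical rows: the second one has a zero 5 year growth rate
rows = {f"Best_EPS_{p}": [0.20, 0.20] for p in SHORT_WINDOWS}
rows["Best_EPS_5Y"] = [0.10, 0.0]
out = add_vs_5y_ratios(pd.DataFrame(rows))
print(out.filter(like="_vs_5Y"))
```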

3M, 6M and 1Y growth rate ranks within our universe, which aid comparability.

•	Labeled as “Best_EPS_##_Rank”.
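Ranks are computed cross-sectionally, i.e., within each date across the universe. The sketch assumes percentile ranks where higher growth receives a higher rank; whether the original ranks are percentiles or ordinal positions (1 to 400), and in which direction, is not stated, so treat this as illustrative. `Best_EPS_1M_Rank` from the summary list can be added by passing `"1M"` in `windows`.

```python
import pandas as pd

def add_growth_ranks(df: pd.DataFrame, windows=("3M", "6M", "1Y")) -> pd.DataFrame:
    """Percentile rank of each growth rate among all stocks on the same date (higher growth, higher rank)."""
    df = df.copy()
    for p in windows:
        df[f"Best_EPS_{p}_Rank"] = df.groupby("Date")[f"Best_EPS_{p}"].rank(pct=True)
    return df

# Hypothetical three-stock universe observed on two dates
demo = pd.DataFrame({
    "Date": ["2021-06-25"] * 3 + ["2021-07-01"] * 3,
    "Ticker": ["AAA", "BBB", "CCC"] * 2,
    "Best_EPS_3M": [0.10, 0.30, 0.20, 0.05, 0.01, 0.09],
})
ranked = add_growth_ranks(demo, windows=("3M",))
print(ranked)
```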

A continuous and a classification variable seeking to capture short-term EPS acceleration, which is deemed attractive because EPS is not only improving but doing so in a somewhat parabolic manner. For the continuous variable we averaged the 3M, 6M and 1Y EPS growth ranks described above. For the classification variable we flagged whether each observation meets the following criteria: EPS growth of 3M > 6M AND 6M > 1Y.

•	Labeled as "ST_Accel" and "ST_Accel_Class".
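Both acceleration variables follow directly from the definitions above. A sketch, assuming the rank and growth columns already exist; rows with missing growth values compare as False and therefore fall into class 0.

```python
import pandas as pd

def add_st_accel(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Continuous version: mean of the 3M, 6M and 1Y growth ranks
    df["ST_Accel"] = df[["Best_EPS_3M_Rank", "Best_EPS_6M_Rank", "Best_EPS_1Y_Rank"]].mean(axis=1)
    # Classification version: 1 if growth is accelerating (3M > 6M and 6M > 1Y), else 0
    accel = (df["Best_EPS_3M"] > df["Best_EPS_6M"]) & (df["Best_EPS_6M"] > df["Best_EPS_1Y"])
    df["ST_Accel_Class"] = accel.astype(int)
    return df

# Hypothetical rows: the first accelerates, the second decelerates
demo = pd.DataFrame({
    "Best_EPS_3M": [0.30, 0.10], "Best_EPS_6M": [0.20, 0.20], "Best_EPS_1Y": [0.10, 0.30],
    "Best_EPS_3M_Rank": [0.9, 0.2], "Best_EPS_6M_Rank": [0.6, 0.5], "Best_EPS_1Y_Rank": [0.3, 0.8],
})
out = add_st_accel(demo)
print(out[["ST_Accel", "ST_Accel_Class"]])
```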

#### STABILITY
Stability measures capture the stability of EPS, which not only aids modelling efforts but also signals confidence in a company's outlook on the part of the analysts and management who provide much of the underlying information.

R-squared (Pearson) of the weekly 5 year EPS figures, both the value and its rank.

•	Labeled as “Best_EPS_5Y_R2" and “Best_EPS_5Y_R2_Rank".

R-squared (Spearman) of the weekly 5 year EPS figures, value only. We are curious whether the two R-squared measures differ, given that the former is parametric and captures linear relationships, while the latter is non-parametric (rank-based) and captures any monotonic relationship, linear or not.

•	Labeled as “Best_EPS_5Y_R2_sp".
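The sketch below illustrates the difference between the two measures on EPS against time (the week index). It uses SciPy's `pearsonr` and `spearmanr`; squaring Spearman's rho to obtain an "R-squared" is our assumption about how `Best_EPS_5Y_R2_sp` is built. A steadily compounding EPS series is perfectly monotonic, so its Spearman value is 1, while its Pearson value is lower because the path is curved.

```python
import numpy as np
from scipy import stats

def eps_r2(eps):
    """Return (Pearson r^2, Spearman rho^2) of EPS against the week index."""
    t = np.arange(len(eps))
    r, _ = stats.pearsonr(t, eps)
    rho, _ = stats.spearmanr(t, eps)
    return r ** 2, rho ** 2

weeks = np.arange(260)                  # 5 years of weekly observations
curved = 2.0 * 1.5 ** (weeks / 52)      # hypothetical EPS compounding at 50% per year
pearson_r2, spearman_r2 = eps_r2(curved)
print(f"Pearson R2: {pearson_r2:.3f}, Spearman R2: {spearman_r2:.3f}")
```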

Standard deviation of the weekly 5 year EPS figures.  

•	Labeled as “Best_EPS_5Y_SD".

FactSet-derived measure of EPS stability, defined as the consistency of an estimate item over the past 5 years.

•	Labeled as “Best_EPS_5Y_Stability".

#### ANALYST REVISIONS
See the crowdsourcing discussion above (analysts and management) for the reasoning behind these revision metrics.

Sell-side analyst EPS revisions (upward, downward and unchanged). A “revision” is a change (regardless of magnitude) during the past 3 months in an analyst’s EPS estimate for the company’s next 12 month period. For each category we calculated its percentage of total revisions.

•	Labeled as “An_Up", "An_Down", "An_Unch".
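Converting raw revision counts into percentages of the total can be sketched as follows, assuming hypothetical count columns `n_up`, `n_down` and `n_unch` and that the total is the sum of the three (as the `An_Unch` definition in the summary suggests). Rows with no estimates return NaN rather than dividing by zero.

```python
import pandas as pd

def add_revision_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Convert hypothetical count columns (n_up, n_down, n_unch) into % of total revisions."""
    df = df.copy()
    total = df[["n_up", "n_down", "n_unch"]].sum(axis=1)
    total = total.where(total > 0)  # NaN instead of dividing by zero when no estimates exist
    for src, dst in [("n_up", "An_Up"), ("n_down", "An_Down"), ("n_unch", "An_Unch")]:
        df[dst] = 100 * df[src] / total
    return df

counts = pd.DataFrame({"n_up": [6, 0], "n_down": [2, 0], "n_unch": [4, 0]})
out = add_revision_shares(counts)
print(out[["An_Up", "An_Down", "An_Unch"]])
```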

We included the current value of each of these variables as well as its change over 3 and 6 months, labeled like the metrics above with “_#M” appended. These changes capture second-derivative movements in the revision metrics.
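The 3 and 6 month changes are per-ticker differences against a lagged value. This sketch assumes weekly rows, so 3 months ≈ 13 rows and 6 months ≈ 26 rows; that lag mapping is an assumption, and the same function also covers the `An_Mark` and `NRR` changes discussed in this section.

```python
import pandas as pd

LAGS = {"3M": 13, "6M": 26}  # assumption: weekly rows, about 13 weeks per quarter

def add_revision_changes(df: pd.DataFrame, cols=("An_Up", "An_Down", "An_Unch", "An_Mark", "NRR")) -> pd.DataFrame:
    """Add <col>_3M and <col>_6M: current value minus the value 13 or 26 weeks earlier, per ticker."""
    df = df.sort_values(["Ticker", "Date"]).copy()
    for col in cols:
        for label, lag in LAGS.items():
            df[f"{col}_{label}"] = df.groupby("Ticker")[col].diff(lag)
    return df

# Hypothetical ticker whose An_Up rises by one point per week
demo = pd.DataFrame({
    "Ticker": "XYZ",
    "Date": pd.date_range("2021-01-01", periods=30, freq="W-FRI"),
    "An_Up": [float(i) for i in range(30)],
})
out = add_revision_changes(demo, cols=("An_Up",))
print(out[["Date", "An_Up", "An_Up_3M", "An_Up_6M"]].tail(3))
```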

Analyst revision ratios

We pulled a predefined FactSet metric labeled “Mark_Rev”, which quantifies the relative trend in analyst revisions (lower is better).

•	Labeled as “An_Mark"

We also created our own metric, “Net_Est_Rev_Ratio”, defined as (Upward Revisions – Downward Revisions) / Total Revisions and expressed as a percentage. The ratio ranges from -100 to 100 and captures the net direction of revisions, on a proportional basis, over the past 3 months.

•	Labeled as “NRR"
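The net revision ratio as a small function. It assumes `total` counts all revisions including unchanged estimates, which makes it equal to `An_Up` minus `An_Down` as in the summary list; the function name is hypothetical.

```python
def net_revision_ratio(up: int, down: int, total: int) -> float:
    """(Upward - Downward) / Total revisions, scaled to the -100..100 range."""
    if total == 0:
        return float("nan")  # no estimates, so the ratio is undefined
    return 100 * (up - down) / total

print(net_revision_ratio(6, 2, 12))  # 6 up, 2 down, 4 unchanged
```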

We also used both metrics to create variables capturing the change in Net_Est_Rev_Ratio and Mark_Rev over 3 and 6 months. As above, these capture second-derivative changes and are labeled with “_#M” appended.

#### RETURNS
Technical analysis (aka price momentum) presumes that price leads fundamentals, as the market collectively begins to price in changes in fundamentals before they become quantifiable. To capture some of this collective wisdom, on the same rationale as the analyst and management information above, we included relative return data on a 1-, 3- and 6-month basis. Relative returns were calculated by subtracting the return of the S&P 500 Equal Weighted Index from each stock's return over the same period, which removes much of the noise caused by overall market movements and gives a cleaner measure of a stock's own performance.

•	Labeled as “Return_#M".
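A sketch of the relative return calculation, assuming aligned weekly price series for the stock and for the S&P 500 Equal Weighted Index, and roughly 4, 13 and 26 weeks for the 1, 3 and 6 month windows. This is a simple return difference, not a beta-adjusted excess return.

```python
import pandas as pd

def relative_returns(prices: pd.Series, index_level: pd.Series, weeks: int) -> pd.Series:
    """Stock return minus the index return over the same `weeks`-long window."""
    stock_ret = prices / prices.shift(weeks) - 1
    index_ret = index_level / index_level.shift(weeks) - 1
    return stock_ret - index_ret

# Hypothetical one-week example: stock +10%, index +5%
rel = relative_returns(pd.Series([100.0, 110.0]), pd.Series([100.0, 105.0]), weeks=1)
print(rel)
```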

Additionally, with the same technical analysis view in mind, we included a relative price momentum measure, defined as the change over the last 6 months in the stock's one month moving average relative to the index.

•	Labeled as “Rel_PMO".
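The Rel_PMO definition leaves room for interpretation. This sketch assumes it is the 6 month (26 week) percentage change in a 4 week moving average of the stock-to-index price ratio; the function name and window lengths are hypothetical.

```python
import pandas as pd

def rel_pmo(prices: pd.Series, index_level: pd.Series,
            ma_weeks: int = 4, lookback_weeks: int = 26) -> pd.Series:
    """Percentage change over `lookback_weeks` in the moving average of price relative to the index."""
    relative = prices / index_level                  # stock price relative to the index
    smoothed = relative.rolling(ma_weeks).mean()     # roughly one month moving average
    return smoothed / smoothed.shift(lookback_weeks) - 1

index_level = pd.Series([100.0] * 40)
flat = rel_pmo(index_level, index_level)                       # stock tracks the index exactly
rising = rel_pmo(pd.Series([100.0 + i for i in range(40)]), index_level)  # steady outperformance
print(flat.dropna().head(2), rising.dropna().head(2), sep="\n")
```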

#### OTHER
We included market capitalization and sector classification variables to capture any explanatory power coming from a company's size and/or its sector.

•	Labeled as “Mkt_Cap" and "Sector".
    
We also left "Date" and "Ticker" in the dataset because they were used to build it. Although they will not be used as explanatory variables, we keep them for now in case they prove helpful during our initial exploratory analysis.
    
#### EXPLANATORY VARIABLES SUMMARY (53 total)

Best_EPS_1M = 1 month historical consensus EPS growth

Best_EPS_3M = 3 month historical consensus EPS growth

Best_EPS_6M = 6 month historical consensus EPS growth

Best_EPS_1Y = 1 year historical consensus EPS growth

Best_EPS_3Y = 3 year historical consensus EPS growth

Best_EPS_5Y = 5 year historical consensus EPS growth

Best_EPS_1M_v5Y = ratio of 1 month to 5 year historical consensus EPS growth

Best_EPS_3M_v5Y = ratio of 3 month to 5 year historical consensus EPS growth

Best_EPS_6M_v5Y = ratio of 6 month to 5 year historical consensus EPS growth

Best_EPS_1Y_v5Y = ratio of 1 year to 5 year historical consensus EPS growth

Best_EPS_3Y_v5Y = ratio of 3 year to 5 year historical consensus EPS growth

Best_EPS_1M_Rank = 1 month historical EPS rank within this 400 stock universe

Best_EPS_3M_Rank = 3 month historical EPS rank within this 400 stock universe

Best_EPS_6M_Rank = 6 month historical EPS rank within this 400 stock universe

Best_EPS_1Y_Rank = 1 year historical EPS rank within this 400 stock universe

ST_Accel = average of Best_EPS_3M_Rank, Best_EPS_6M_Rank and Best_EPS_1Y_Rank

ST_Accel_Class = binary measure (1 or 0) if Best_EPS_3M > Best_EPS_6M AND Best_EPS_6M > Best_EPS_1Y or not

Best_EPS_5Y_R2 = 5 year R-squared of weekly EPS (Pearson)

Best_EPS_5Y_R2_Rank = 5 year R-squared EPS rank within this 400 stock universe (Pearson)

Best_EPS_5Y_R2_sp = 5 year R-squared of weekly EPS (Spearman)

Best_EPS_5Y_SD = 5 year standard deviation of weekly EPS

Best_EPS_5Y_Stability = FactSet calculated stability of 5 year weekly EPS

An_Up = % of analysts that revised their EPS estimate UP during the past 3 months

An_Down = % of analysts that revised their EPS estimate DOWN during the past 3 months

An_Unch = % of analysts that left their EPS estimate UNCHANGED during the past 3 months

An_Mark = FactSet calculated analyst revision measure

NRR = "An_Up" minus "An_Down"

An_Up_3M = "An_Up" minus "An_Up" 3 months ago

An_Down_3M = "An_Down" minus "An_Down" 3 months ago

An_Unch_3M = "An_Unch" minus "An_Unch" 3 months ago

An_Mark_3M = "An_Mark" minus "An_Mark" 3 months ago

NRR_3M = "NRR" minus "NRR" 3 months ago

An_Up_6M = "An_Up" minus "An_Up" 6 months ago

An_Down_6M = "An_Down" minus "An_Down" 6 months ago

An_Unch_6M = "An_Unch" minus "An_Unch" 6 months ago

An_Mark_6M = "An_Mark" minus "An_Mark" 6 months ago

NRR_6M = "NRR" minus "NRR" 6 months ago

ROIC = historical 12 month return on invested capital (ROIC)

ROIC_1Y_Chg = 1 year change in ROIC

ROIC_SD = 5 year standard deviation of ROIC

ROE = historical 12 month return on equity (ROE)

ROE_1Y_Chg = 1 year change in ROE

ROE_SD = 5 year standard deviation of ROE

FCF_Mgn = historical 12 month free cash flow margin (FCF margin)

FCF_Mgn_1Y_Chg = 1 year change in FCF margin

FCF_Mgn_SD = 5 year standard deviation of FCF margin

Op_Mgn = historical 12 month operating margin (Op margin)

Op_Mgn_1Y_Chg = 1 year change in Op margin

Op_Mgn_SD = 5 year standard deviation of Op margin

Return_1M = 1 month historical relative price return

Return_3M = 3 month historical relative price return

Return_6M = 6 month historical relative price return

Rel_PMO = relative price momentum

Market_Cap = current market capitalization

Sector = GICS sector classification

#### RESPONSE VARIABLES SUMMARY (6 total)

Fwd_Best_EPS_6M = 6 month annualized FORWARD consensus EPS growth (captures absolute EPS growth)

Fwd_ST_Accel_Class_3M = FORWARD binary measure (1 or 0) if Best_EPS_3M > Best_EPS_6M AND Best_EPS_6M > Best_EPS_1Y or not (captures the EPS trends that are increasing at an increasing rate thus evidencing parabolic trends / high amounts of momentum)
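The forward response variables look ahead rather than back, which is why the sample ends at 7/1/21. A sketch, assuming weekly rows per ticker, a hypothetical `Best_EPS` level column, 26 weeks for 6 months (squared to annualize) and 13 weeks for 3 months. The final rows of each ticker get NaN targets because their future is not yet observed.

```python
import numpy as np
import pandas as pd

def add_forward_targets(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values(["Ticker", "Date"]).copy()
    # Look forward within each ticker: EPS 26 weeks ahead, acceleration flag 13 weeks ahead
    eps_ahead = df.groupby("Ticker")["Best_EPS"].shift(-26)
    accel_ahead = df.groupby("Ticker")["ST_Accel_Class"].shift(-13)
    # 6 month growth annualized: (EPS_ahead / EPS_now)^(52/26) - 1
    df["Fwd_Best_EPS_6M"] = (eps_ahead / df["Best_EPS"]) ** (52 / 26) - 1
    df["Fwd_ST_Accel_Class_3M"] = accel_ahead
    return df

# Hypothetical ticker: EPS compounding at 10% per year, alternating acceleration flag
demo = pd.DataFrame({
    "Ticker": "XYZ",
    "Date": pd.date_range("2020-07-03", periods=40, freq="W-FRI"),
    "Best_EPS": 2.0 * 1.10 ** (np.arange(40) / 52),
    "ST_Accel_Class": [0, 1] * 20,
})
out = add_forward_targets(demo)
print(out[["Date", "Fwd_Best_EPS_6M", "Fwd_ST_Accel_Class_3M"]].head(3))
```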

<i>OTHER POTENTIAL VARIABLES TO INVESTIGATE LATER</i>

Fwd_Best_EPS_6M_v5Y = 6 month annualized FORWARD consensus EPS growth / 5 year historical consensus EPS growth (captures near-term growth relative to longer-term levels)

Fwd_ST_Accel_3M = FORWARD average of Best_EPS_3M_Rank, Best_EPS_6M_Rank and Best_EPS_1Y_Rank (a continuous counterpart of the classification measure above)

Fwd_Return_1M = 1 month FORWARD price return (so we have the ability to test power of model and factors on not just EPS growth but also price return)

Fwd_Return_3M = 3 month FORWARD price return (same)

Fwd_Return_6M = 6 month FORWARD price return (same)


## DATA QUALITY

FactSet was used to gather all of the data. As SMU students, we can obtain free FactSet licenses through the business library for the duration of the program. We attempted to pull the data through FactSet's API, but our student license did not provide that access, so we used the FactSet Excel add-in instead. Given FactSet's reputation in the marketplace, we have high confidence in the accuracy of the data gathered. After gathering the data in Excel, we used R to format it and create summary CSV files for use in our analysis.

In [68]:
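# Module-level imports so they are visible to every later cell
# (imports placed inside a function are local to that function)
import csv
import os

import numpy as np
from scipy import stats

def chwd(notebook_dir=r'C:\Users\jaywo\workspace\ML1_Project\notebooks'):
    """
    Please change notebook_dir to the current path of this notebook.
    Moves the working directory to the project root (the parent of notebook_dir).
    """
    os.chdir(notebook_dir)
    os.chdir("..")
    print(os.getcwd())

chwd()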

C:\Users\jaywo\workspace\ML1_Project


In [79]:
import numpy as np

def load(filepath):
    """
    Read the CSV into a float64 array, print a summary and the NaN count, and return the array.
    Note: np.genfromtxt returns NaN for the header row and for every cell of a
    non-numeric column (e.g. Ticker), so those cells are included in the NaN count.
    """
    data = np.genfromtxt(filepath, delimiter=',')
    print(data)
    np.info(data)  # np.info prints the summary itself and returns None
    print("NA Values: {}".format(np.isnan(data).sum()))
    return data

data = load(r'data\ML1_data_adj_v2.csv')

[[            nan             nan             nan ...             nan
              nan             nan]
 [            nan             nan -1.66852060e-02 ... -2.17389146e+00
   2.80154502e+00  2.01600000e+03]
 [            nan             nan  8.19502075e-01 ...  1.32135415e+00
   8.67805108e-01  2.01600000e+03]
 ...
 [            nan             nan  7.21311480e-02 ...  8.10432211e-01
   2.59920551e-01  2.02100000e+03]
 [            nan             nan  2.18487395e-01 ...  1.66640074e+00
   2.84649392e+00  2.02100000e+03]
 [            nan             nan  1.85781991e-01 ...  1.50527491e+00
   1.18149480e+00  2.02100000e+03]]
class:  ndarray
shape:  (52664, 81)
strides:  (648, 8)
itemsize:  8
aligned:  True
contiguous:  True
fortran:  False
data pointer: 0x266f3fbc040
byteorder:  little
byteswap:  False
type: float64
None
NA Values: 158070


### MISSING VALUES

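There are 158,070 NA values, which I would like to explore before moving forward to rule out any bias they might cause downstream. Note that np.genfromtxt reads the header row and any non-numeric column as NaN. The count equals exactly 81 + 3 × 52,663: one full header row of 81 columns plus three columns that are NaN in every one of the 52,663 data rows (the first two columns are NaN in every row shown above). This suggests most or all of these NAs are parsing artifacts from text columns rather than genuinely missing values.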
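Reading the file with pandas instead keeps the text columns as text, so the NaN count reflects genuinely missing values. A sketch using an in-memory sample; the sample columns and values are hypothetical, and on the real data you would pass `r'data\ML1_data_adj_v2.csv'` instead.

```python
import io

import pandas as pd

def missing_report(csv_source) -> pd.Series:
    """Count genuinely missing values per column, keeping text columns (Date, Ticker, Sector) as text."""
    df = pd.read_csv(csv_source)
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)

# Hypothetical sample: one genuinely missing numeric value
sample = io.StringIO(
    "Date,Ticker,Sector,Best_EPS_3M\n"
    "2021-06-25,AAA,Industrials,0.12\n"
    "2021-06-25,BBB,Health Care,\n"
    "2021-07-01,AAA,Industrials,0.10\n"
)
report = missing_report(sample)
print(report)
```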

In [None]:
# Replacing the 