# EPS ESTIMATE REVISION MOMENTUM MODELLING OVERVIEW

When speaking of the stock market, there are adages that may sound pithy but in practice often prove to be powerful and true. One of the most well known is “Don’t fight the Fed”, which argues that investors should respect the trend set by the Federal Reserve’s monetary policy decisions. Another, far less well known (if at all), concerns the power of a stock’s EPS trend: “Earnings trump and trend”.  It reflects the observation that the best longer-term stock investments are often those where the investor is on the right side of the earnings trend. Both phrases speak to the power of trends in the stock market. This analysis focuses on the latter, “Earnings trump and trend”. We hope the reader will come away with a foundational understanding of, and conviction in, the idea that this statement may in fact hold statistical power, which we set out to analyze and attempt to model.

A few examples from companies most readers have heard of illustrate the point:

 •	Google – During the past 5 years, GOOGL is up 279% with over 80% of this return coming from earnings per share (EPS) growth.
 
 •	Domino’s Pizza – During the past 5 years, DPZ is up 228% with over 90% of this return coming from earnings per share (EPS) growth.  
 
 •	Home Depot – During the past 5 years, HD is up 219% with approximately 70% of this return coming from earnings per share (EPS) growth.

Therefore, while getting the valuation “right” for a given stock is important, we would argue that getting the earnings “right” is even more important, hence the “trump” notion. Accompanying this is the power of trends, for which Newton’s first law of motion offers an analogy: “an object in motion stays in motion with the same speed and in the same direction unless acted upon by an unbalanced force”.  We believe that such “momentum” is prevalent in the stock market and manifests itself in various ways, one of which is the operating performance / earnings trajectory of a given company.  This can be seen in earnings trends that ebb and flow as industry dynamics, product cycles and similar factors strengthen and weaken over time.

These foundational and experiential beliefs serve as the basis of the forthcoming analysis, in which we use machine learning and related tools to analyze and model the earnings estimates of a basket of stocks. We call this exercise EPS estimate revision momentum modelling.

# BUSINESS UNDERSTANDING

The goal of this analysis is not only to use powerful algorithms to predict EPS but also to better understand the drivers behind such predictions.  The variables / data set were gathered and organized with this goal in mind.  The use case for the analysis is multifaceted, serving both professional and individual investors.  For the professional, this analysis could aid in the following:

•	Predicting EPS, which in turn could be combined with a target multiple and/or DCF to establish an estimated target price.

•	Predicting EPS and comparing growth potential across a universe of stocks.

•	Comparing the model’s outcomes with human driven modelling / analytical efforts.

•	Providing investment ideas based on the model’s output.

•	Providing insight into the relative power of individual metrics and groupings of metrics, which can guide the focus of an analyst’s research efforts and point to additional metrics worth digging into.

For individual investors, the use case could be generating investment buy and sell ideas based on the predictions made by the model.

#### ???? Goals – what outcomes (NEED HELP HERE)
#### ???? Accuracy in EPS predictions, determine if / which EPS variables have predictive price power, segmentation of data set besides by sector, drivers behind EPS predictions, 
#### ???? Measure a good outcome???

# DATA UNDERSTANDING

To begin, let’s provide some background information on some “what’s” and “why’s” of our analysis.

What are some of the reasons for using consensus earnings estimate data instead of historical or other measures?

As discussed above, the premise of our analysis is to model the consensus EPS estimates.  We chose to model the consensus EPS estimates for several reasons:

•	While historical EPS is typically reported only quarterly, consensus EPS estimates are updated far more frequently. We consider the insight gleaned from these day-to-day movements, as information enters the market, to be valuable.

•	Crowdsource for insight; don’t reinvent the wheel.  We approach the analysis primarily by studying those who know the subject (i.e. EPS trajectory) best, then seek to glean insight from them and improve upon it with statistical methods.  While a few of our variables are unrelated to consensus EPS estimates, the majority seek to capture changes in those estimates. We believe that embedded in the estimates is information from the two parties that arguably know each company best, and we use statistics to test that belief. One is the sell side analysts who cover a given stock, whose livelihood depends on such skill. The other is company management, which is the source of the financial information fed to those analysts and whose livelihood is likewise tied to the company. Hence, we are not trying to outsmart them with other information but rather to capture their collective wisdom in a statistically efficacious manner.

What basket of stocks did we choose and why?

We narrowed the universe of stocks for this analysis to 400, using Bloomberg and FactSet with the following parameters: 

•	US domiciled and traded companies

•	The real estate sector was excluded due to the fact such stocks do not “trade” based on EPS (but rather funds from operations or FFO)

•	Market cap greater than $1.5 billion.

•	We set a floor on the number of analysts covering each stock as greater than 4 so that there is sufficient estimate data available for comparability purposes.

•	At least 10 years of trading history, so that we can compute the desired metrics and compare accordingly.

•	Five year EPS growth must be above zero and the R-squared (historical EPS vs time) must be above 0.1. This further narrows the universe to companies that are deemed higher quality and have had a more stable historical EPS profile, which we expect to aid our ability to analyze and model consensus EPS.  On the other side of the coin are companies with extremely volatile EPS profiles that at times fall into negative territory, which understandably would be much more difficult and less insightful to model.
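
The last screen above can be sketched in a few lines. This is an illustration only: the source does not say how the growth test or the R-squared were computed, so the choices here (growth measured as last value versus first, R-squared from a straight-line fit of EPS against an evenly spaced time index) are assumptions, as are the function names and the sample figures.

```python
import numpy as np
import pandas as pd

def eps_trend_r2(eps: pd.Series) -> float:
    """R-squared of a straight-line fit of EPS against time (0, 1, 2, ...)."""
    y = eps.dropna().to_numpy(dtype=float)
    if len(y) < 3 or np.std(y) == 0:
        return float("nan")
    t = np.arange(len(y), dtype=float)
    # For a one-variable linear fit, R-squared is the squared correlation
    r = np.corrcoef(t, y)[0, 1]
    return float(r ** 2)

def passes_quality_screen(eps: pd.Series, min_r2: float = 0.1) -> bool:
    """True when EPS grew over the window and follows a trend closely enough."""
    clean = eps.dropna()
    if len(clean) < 3:
        return False
    grew = clean.iloc[-1] > clean.iloc[0]   # assumed definition of "growth above zero"
    r2 = eps_trend_r2(clean)
    return bool(grew and r2 > min_r2)       # a NaN R-squared fails the screen

# Hypothetical EPS histories, for illustration only
steady = pd.Series([1.0, 1.1, 1.2, 1.3, 1.4, 1.5])
erratic = pd.Series([1.0, -0.5, 2.0, -1.0, 1.5, 0.2])
print(passes_quality_screen(steady), passes_quality_screen(erratic))
```

A threshold as low as 0.1 removes only the most erratic profiles, so most of the narrowing in a screen like this would come from the growth condition.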

What was done to enable comparability across the basket of stocks given they are all unique businesses and grow at different rates?

•	We normalized some of the variables in a way that ensured comparability (more details on this below).  Thus while some companies have grown faster than others we included both the absolute growth levels as well as relative / proportional levels.  Additionally, the analyst estimate figures are comparable across the universe thus needed no adjustments.

What explanatory variables did we include and why?

We gathered weekly data for the past five years (12/30/16 - 12/30/21) and assigned each explanatory variable to one of six categories:  GROWTH, STABILITY, ANALYST REVISIONS, INCOME STATEMENT, RETURNS and OTHER.  In total there are 51 explanatory variables, 7 potential response variables and ~104k rows of data.  Details on each category and the variables included are as follows:

#### GROWTH
Consensus (i.e. sell side analysts) EPS median estimates trends:

Annualized growth rate for each of the following time periods on a daily basis for the past 5 year period (1 month, 2 month, 3 month, 6 month, 1 year, 3 year and 5 year)

•	Labeled as “Best_EPS_##” in the data set.

Percent relative to their respective 5 year growth rate for all the aforementioned time periods except the 5 year.  The 5 year growth rate approximates a company's longer term growth rate thus this ratio captures the current trend relative to the longer term.

•	Labeled as “Best_EPS_##_v5Y”.
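
A minimal sketch of how the annualized growth rates and the ratio to the long-term rate might be computed from a weekly consensus EPS series. The source does not give the formula. Linear (non-compounded) annualization is assumed here because the summary statistics later in the notebook show values below -1 (for example a minimum near -4.0 for Best_EPS_3M), which hints at linear scaling. The week counts per horizon, the function names and the sample series are all assumptions.

```python
import pandas as pd

# Assumed weekly lookbacks for each horizon label (not stated in the source)
WEEKS = {"1M": 4, "3M": 13, "6M": 26, "1Y": 52}

def annualized_growth(eps: pd.Series, label: str) -> pd.Series:
    """Growth over the lookback window, scaled linearly to a one-year rate."""
    weeks = WEEKS[label]
    period_growth = eps / eps.shift(weeks) - 1.0
    return period_growth * (52 / weeks)

def versus_long_term(short_growth: pd.Series, long_growth: pd.Series) -> pd.Series:
    """Short-horizon growth as a multiple of the long-term rate (NaN when the base is zero)."""
    return short_growth / long_growth.where(long_growth != 0)

# Hypothetical weekly consensus EPS for one ticker, rising from 1.00 to 1.52
eps = pd.Series([1.0 + 0.01 * i for i in range(53)])
g_3m = annualized_growth(eps, "3M")
g_1y = annualized_growth(eps, "1Y")
long_term = pd.Series(0.15, index=eps.index)   # made-up stand-in for Best_EPS_5Y
ratio = versus_long_term(g_3m, long_term)
print(round(g_1y.iloc[-1], 4), round(g_3m.iloc[-1], 4), round(ratio.iloc[-1], 4))
```

A ratio above 1 means the recent trend is running ahead of the long-term rate, which is the signal this group of variables is meant to capture.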

3M, 6M and 1Y growth rate ranks within our universe thus aiding in comparability.

•	Labeled as “Best_EPS_##_Rank”.

A continuous variable and a classification variable, both seeking to capture short-term EPS acceleration, which is deemed attractive because EPS is not only improving but doing so in a somewhat parabolic manner.  For the continuous variable we averaged the three aforementioned EPS growth ranks (3M, 6M and 1Y).  For the classification variable we flagged whether each observation meets the following criteria: EPS growth of 3M > 6M AND 6M > 1Y.

•	Labeled as "ST_Accel" and "ST_Accel_Class".
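
The rank and acceleration variables described above could be built roughly as follows. This is a sketch under assumptions: ranks are taken as percentile ranks within each date's cross-section on a 0 to 1 scale, and ST_Accel is their average rescaled to 0 to 100. Those scales are inferred from the medians shown later in the notebook (0.5 for the ranks, about 49.7 for ST_Accel) and are not stated in the text. The tickers and values are made up.

```python
import pandas as pd

HORIZONS = ["Best_EPS_3M", "Best_EPS_6M", "Best_EPS_1Y"]

def add_acceleration_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for col in HORIZONS:
        # Percentile rank of each stock among all stocks on the same date
        out[f"{col}_Rank"] = out.groupby("Date")[col].rank(pct=True)
    rank_cols = [f"{c}_Rank" for c in HORIZONS]
    # Average of the three ranks; NaN if any rank is missing
    out["ST_Accel"] = out[rank_cols].mean(axis=1, skipna=False) * 100
    # 1 when growth accelerates across horizons; comparisons with NaN give 0
    out["ST_Accel_Class"] = (
        (out["Best_EPS_3M"] > out["Best_EPS_6M"])
        & (out["Best_EPS_6M"] > out["Best_EPS_1Y"])
    ).astype(int)
    return out

# Hypothetical one-date cross-section
sample = pd.DataFrame({
    "Date": ["12/30/2016"] * 4,
    "ticker": ["AAA", "BBB", "CCC", "DDD"],
    "Best_EPS_3M": [0.30, 0.10, 0.20, 0.05],
    "Best_EPS_6M": [0.20, 0.15, 0.25, 0.04],
    "Best_EPS_1Y": [0.10, 0.20, 0.15, 0.03],
})
featured = add_acceleration_features(sample)
print(featured[["ticker", "ST_Accel", "ST_Accel_Class"]])
```

Note that the two variables measure different things: a stock can rank low on every horizon (low ST_Accel) and still be flagged as accelerating, as the fourth row of the sample shows.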

#### STABILITY
Stability measures seek to capture the stability of EPS which not only aids in modelling efforts but also evidences confidence in the outlook of a company by analysts and management that provide much of the underlying information.

R-squared of the weekly 5 year EPS figures.  Both the value and the rank of this measure.

•	Labeled as “Best_EPS_5Y_R2" and “Best_EPS_5Y_R2_Rank".

Standard deviation of the weekly 5 year EPS figures.  

•	Labeled as “Best_EPS_5Y_SD".

FactSet derived measure of EPS stability defined as: measuring the consistency for an estimate item over the past 5 years.

•	Labeled as “Best_EPS_5Y_Stability".

#### ANALYST REVISIONS
See details above regarding crowdsourcing for the reasoning behind these revisions metrics.

Sell side analyst EPS revisions (Upward, Downward and Unchanged).  A “revision” is a change (regardless of magnitude) in an analyst’s estimate during the past 3 months for a company’s EPS for the next 12 month period.  Calculated the % of total revisions for each metric.

•	Labeled as “An_Up", "An_Down", "An_Unch".

We included the current value as well as the change in each of these variables on a 3- and 6-month basis and labeled them similarly to the previous metrics but added “_#M” on the end.  This captures second derivative changes in the revisions metrics.

Analyst revision ratios:

We pulled a predetermined metric labeled “Mark_Rev” from FactSet, which seeks to quantify the relative trend in analyst revisions, with lower values being better.

•	Labeled as “An_Mark"

We also created our own metric labeled as “Net_Est_Rev_Ratio” which equates to the following:  (Upward Revisions – Downward Revisions) / Total Revisions.  This ratio ranges from -100 to 100 and seeks to capture the change in such revision on a proportion basis during the past 3 month period.

•	Labeled as “NRR"

We also created variables that capture the change in Net_Est_Rev_Ratio and Mark_Rev on a 3- and 6-month basis. As above, these capture second derivative changes and are labeled like the underlying metrics with “_#M” added on the end.
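
A short sketch of the revision ratio and its 3- and 6-month changes, assuming weekly rows per ticker and treating 13 and 26 weeks as 3 and 6 months. The lag lengths, the function name and the sample values are assumptions. Column names follow the labels used in this section.

```python
import pandas as pd

def add_revision_features(df: pd.DataFrame, weeks_3m: int = 13,
                          weeks_6m: int = 26) -> pd.DataFrame:
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    out = out.sort_values(["ticker", "Date"])
    # An_Up and An_Down are already percentages of total revisions,
    # so their difference is the net revision ratio (-100 to 100)
    out["NRR"] = out["An_Up"] - out["An_Down"]
    base_cols = [c for c in ["An_Up", "An_Down", "An_Unch", "An_Mark", "NRR"]
                 if c in out.columns]
    for col in base_cols:
        # Current value minus the same ticker's value 3 and 6 months earlier
        out[f"{col}_3M"] = out[col] - out.groupby("ticker")[col].shift(weeks_3m)
        out[f"{col}_6M"] = out[col] - out.groupby("ticker")[col].shift(weeks_6m)
    return out

# Hypothetical 14 weeks of revision data for a single ticker
sample = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=14, freq="W-FRI"),
    "ticker": "AAA",
    "An_Up": [20 + 2 * i for i in range(14)],
    "An_Down": 30,
    "An_Unch": [50 - 2 * i for i in range(14)],
})
res = add_revision_features(sample)
print(res[["Date", "NRR", "NRR_3M"]].tail(3))
```

Shifting within each ticker keeps one company's history from leaking into another's, and leaves the first 13 (or 26) rows of each ticker as NaN.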

#### RETURNS
Technical analysis presumes that price leads fundamentals as the collective market begins to price in changes in fundamentals prior to such changes becoming quantifiable.  As such, in hopes of capturing some of the collective market’s wisdom, following the same rationale as above for analyst and management information, we included return data on a 1-, 3- and 6-month basis.

•	Labeled as “Return_#M".

#### OTHER
Included Market Capitalization values and Sector classification variables.

•	Labeled as “Mkt_Cap" and "Sector".

## EXPLANATORY VARIABLES SUMMARY (51 total)

Best_EPS_1M = 1 month trailing consensus EPS growth

Best_EPS_3M = 3 month trailing consensus EPS growth

Best_EPS_6M = 6 month trailing consensus EPS growth

Best_EPS_1Y = 1 year trailing consensus EPS growth

Best_EPS_3Y = 3 year trailing consensus EPS growth

Best_EPS_5Y = 5 year trailing consensus EPS growth

Best_EPS_1M_v5Y = 1 month / 5 year trailing EPS ratio

Best_EPS_3M_v5Y = 3 month / 5 year trailing EPS ratio

Best_EPS_6M_v5Y = 6 month / 5 year trailing EPS ratio

Best_EPS_1Y_v5Y = 1 year / 5 year trailing EPS ratio

Best_EPS_3Y_v5Y = 3 year / 5 year trailing EPS ratio

Best_EPS_1M_Rank = 1 month trailing EPS rank within this 400 stock universe

Best_EPS_3M_Rank = 3 month trailing EPS rank within this 400 stock universe

Best_EPS_6M_Rank = 6 month trailing EPS rank within this 400 stock universe

Best_EPS_1Y_Rank = 1 year trailing EPS rank within this 400 stock universe

ST_Accel = average of Best_EPS_3M_Rank, Best_EPS_6M_Rank and Best_EPS_1Y_Rank

ST_Accel_Class = binary measure (1 or 0) if Best_EPS_3M > Best_EPS_6M AND Best_EPS_6M > Best_EPS_1Y or not

Best_EPS_5Y_R2 = 5 year R-squared of weekly EPS

Best_EPS_5Y_R2_Rank = 5 year R-squared EPS rank within this 400 stock universe

Best_EPS_5Y_SD =  5 year standard deviation of weekly EPS

Best_EPS_5Y_Stability = FactSet calculated stability of 5 year weekly EPS

An_Up = % of analysts that revised their EPS estimate UP during the past 3 months

An_Down = % of analysts that revised their EPS estimate DOWN during the past 3 months

An_Unch = % of analysts that left their EPS estimate UNCHANGED during the past 3 months

An_Mark = FactSet calculated analyst revision measure

NRR = "An_Up" minus "An_Down"

An_Up_3M = "An_Up" minus "An_Up" 3 months ago

An_Down_3M = "An_Down" minus "An_Down" 3 months ago

An_Unch_3M = "An_Unch" minus "An_Unch" 3 months ago

An_Mark_3M = "An_Mark" minus "An_Mark" 3 months ago

NRR_3M = "NRR" minus "NRR" 3 months ago

An_Up_6M = "An_Up" minus "An_Up" 6 months ago

An_Down_6M = "An_Down" minus "An_Down" 6 months ago

An_Unch_6M = "An_Unch" minus "An_Unch" 6 months ago

An_Mark_6M = "An_Mark" minus "An_Mark" 6 months ago

NRR_6M = "NRR" minus "NRR" 6 months ago

ROIC = trailing 12 month return on invested capital (ROIC)

ROIC_1Y_Chg = 1 year change in ROIC

ROIC_SD = 5 year standard deviation of ROIC

ROE = trailing 12 month return on equity (ROE)

ROE_1Y_Chg = 1 year change in ROE

ROE_SD = 5 year standard deviation of ROE

FCF_Mgn = trailing 12 month free cash flow margin (FCF margin)

FCF_Mgn_1Y_Chg	 = 1 year change in FCF margin

FCF_Mgn_SD	= 5 year standard deviation of FCF margin

Op_Mgn = trailing 12 month operating margin (Op margin)

Op_Mgn_1Y_Chg	 = 1 year change in Op margin

Op_Mgn_SD	= 5 year standard deviation of Op margin

Return_1M = 1 month trailing price return

Return_3M = 3 month trailing price return

Return_6M = 6 month trailing price return

Mkt_Cap = current market capitalization

Sector = GICS sector classification

## RESPONSE VARIABLES SUMMARY (7 total)

Fwd_Best_EPS_6M = 6 month FORWARD consensus EPS growth

Fwd_Best_EPS_6M_v5Y = 6 month FORWARD consensus EPS growth / 5 year trailing consensus EPS growth

Fwd_ST_Accel_3M = 3 month FORWARD value of ST_Accel (average of Best_EPS_3M_Rank, Best_EPS_6M_Rank and Best_EPS_1Y_Rank)

Fwd_ST_Accel_Class_3M = 3 month FORWARD value of ST_Accel_Class (binary measure, 1 or 0, of whether Best_EPS_3M > Best_EPS_6M AND Best_EPS_6M > Best_EPS_1Y)

Fwd_Return_1M = 1 month FORWARD price return

Fwd_Return_3M = 3 month FORWARD price return

Fwd_Return_6M = 6 month FORWARD price return
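
A forward response variable of this kind can be sketched by shifting each ticker's series backwards in time. The example below does this for a forward price return. The `Price` column is hypothetical (the data set stores returns, not prices), as are the function name and sample values. The missing-value counts reported later in the notebook for Fwd_Return_1M, Fwd_Return_3M and Fwd_Return_6M (1,600, 5,200 and 10,400) equal 4, 13 and 26 rows for each of 400 tickers, which is consistent with a shift of this kind at the end of each ticker's history.

```python
import pandas as pd

def add_forward_return(df: pd.DataFrame, weeks: int, name: str,
                       price_col: str = "Price") -> pd.DataFrame:
    """Append a forward percent return per ticker.

    The last `weeks` rows of each ticker have no future price and stay NaN.
    """
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    out = out.sort_values(["ticker", "Date"])
    # Negative shift pulls the future price back onto the current row
    future_price = out.groupby("ticker")[price_col].shift(-weeks)
    out[name] = (future_price / out[price_col] - 1.0) * 100.0   # in percent
    return out

# Hypothetical weekly prices for a single ticker
prices = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=6, freq="W-FRI"),
    "ticker": "AAA",
    "Price": [100.0, 101.0, 102.0, 103.0, 110.0, 120.0],
})
result = add_forward_return(prices, weeks=4, name="Fwd_Return_1M")
print(result[["Date", "Fwd_Return_1M"]])
```

Because every forward variable looks into the future, any train/test split for modelling would need to respect dates so that the forward window of a training row does not overlap the test period.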

## DATA QUALITY

•	FactSet was used to gather all of the data.  As SMU students, we are provided free licenses during our time in the program via the business library.  We attempted to get the data via FactSet's API, but our student license did not provide such access, so we used the FactSet Excel add-in.  Given FactSet's reputation in the marketplace, we have high confidence in the accuracy of the data gathered. After gathering the data in Excel, we used R code to format it and create summary CSV files for use in our analysis.

?????????? NAs, missing values......use median?  exclude????
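
One possible way to handle the open question above, offered as a sketch and not as the approach actually taken: drop rows where the chosen response is missing (a missing target cannot be learned from), and fill gaps in the numeric explanatory variables with medians computed on the training rows only, so that nothing from the test period leaks into training. The function name and the tiny sample frames are hypothetical.

```python
import pandas as pd

def prepare_for_model(train: pd.DataFrame, test: pd.DataFrame, response: str):
    """Drop rows lacking the response, then median-impute numeric features."""
    train = train.dropna(subset=[response]).copy()
    test = test.dropna(subset=[response]).copy()
    features = [c for c in train.select_dtypes("number").columns if c != response]
    medians = train[features].median()           # learned from training data only
    train[features] = train[features].fillna(medians)
    test[features] = test[features].fillna(medians)
    return train, test

nan = float("nan")
# Hypothetical rows using two of the column names from the data set
train_raw = pd.DataFrame({
    "Fwd_Best_EPS_6M": [0.1, nan, 0.3, 0.2],
    "Best_EPS_3M": [0.05, 0.10, nan, 0.15],
    "Sector": ["Tech", "Tech", "Energy", "Energy"],
})
test_raw = pd.DataFrame({
    "Fwd_Best_EPS_6M": [0.4],
    "Best_EPS_3M": [nan],
    "Sector": ["Tech"],
})
train_f, test_f = prepare_for_model(train_raw, test_raw, "Fwd_Best_EPS_6M")
print(train_f)
print(test_f)
```

Excluding rows outright is the simpler alternative. Given that the ratio columns have the most gaps (roughly 13,700 to 13,900 missing values each in the counts shown later), the choice mostly affects those variables.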



## ANALYSIS

In [1]:
# import data
import pandas as pd
import numpy as np

df = pd.read_csv('FS_DATA_ALL_ML_ADJ_5Y.csv')

In [3]:
df.head()

Unnamed: 0,Date,ticker,Fwd_Best_EPS_6M,Fwd_Best_EPS_6M_v5Y,Fwd_ST_Accel_3M,Fwd_ST_Accel_Class_3M,Fwd_Return_1M,Fwd_Return_3M,Fwd_Return_6M,Mkt_Cap,...,ROE_SD,FCF_Mgn,FCF_Mgn_1Y_Chg,FCF_Mgn_SD,Op_Mgn,Op_Mgn_1Y_Chg,Op_Mgn_SD,Return_1M,Return_3M,Return_6M
0,12/30/2016,AAPL-US,-0.016685,0.70242,24.5,0.0,5.29269,24.571205,25.39444,608683.06,...,5.867324,24.352012,-5.922154,2.692903,26.63497041,-2.173891,2.801545,5.38671,2.981949,22.064745
1,12/30/2016,MSFT-US,0.054237,0.861139,26.7,0.0,5.857742,6.626642,12.223983,480342.2,...,8.228453,32.736822,6.862099,4.760858,26.62114882,-2.56037,4.109913,4.877639,8.596635,23.033035
2,12/30/2016,GOOGL-US,-0.045126,1.133432,10.866667,0.0,6.635117,6.984663,17.317177,547815.2,...,1.694577,28.77871,8.073658,4.200644,25.8288478,1.499313,2.621106,3.661406,-1.443922,11.573386
3,12/30/2016,AMZN-US,0.819502,-0.352573,91.666667,0.0,11.455321,18.225824,29.089046,357688.0,...,3.975144,7.137447,1.771002,2.150023,3.200305912,1.321354,0.867805,1.287246,-10.442966,3.333426
4,12/30/2016,TSLA-US,,,,0.0,18.372404,30.235374,69.221756,34523.973,...,73.716209,-22.346721,,91.4723,-10.19323636,,77.175347,17.754995,4.7346,-1.297915


In [2]:
# summary of variables
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 104800 entries, 0 to 104799
Data columns (total 60 columns):
 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   Date                   104800 non-null  object 
 1   ticker                 104800 non-null  object 
 2   Fwd_Best_EPS_6M        93183 non-null   float64
 3   Fwd_Best_EPS_6M_v5Y    94385 non-null   float64
 4   Fwd_ST_Accel_3M        97928 non-null   float64
 5   Fwd_ST_Accel_Class_3M  99600 non-null   float64
 6   Fwd_Return_6M          94400 non-null   float64
 7   Fwd_Return_3M          99600 non-null   float64
 8   Fwd_Return_1M          103200 non-null  float64
 9   Mkt_Cap                104800 non-null  float64
 10  Sector                 104800 non-null  object 
 11  Best_EPS_1M            103850 non-null  float64
 12  Best_EPS_3M            103677 non-null  float64
 13  Best_EPS_6M            103495 non-null  float64
 14  Best_EPS_1Y            103299 non-nu

In [5]:
# Remove columns: Date, ticker
# These columns will not aid in modelling efforts
# errors='ignore' keeps the cell safe to re-run once the columns are gone
df = df.drop(columns=['Date', 'ticker'], errors='ignore')

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 104800 entries, 0 to 104799
Data columns (total 58 columns):
 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   Fwd_Best_EPS_6M        93183 non-null   float64
 1   Fwd_Best_EPS_6M_v5Y    94385 non-null   float64
 2   Fwd_ST_Accel_3M        97928 non-null   float64
 3   Fwd_ST_Accel_Class_3M  99600 non-null   float64
 4   Fwd_Return_6M          94400 non-null   float64
 5   Fwd_Return_3M          99600 non-null   float64
 6   Fwd_Return_1M          103200 non-null  float64
 7   Mkt_Cap                104800 non-null  float64
 8   Sector                 104800 non-null  object 
 9   Best_EPS_1M            103850 non-null  float64
 10  Best_EPS_3M            103677 non-null  float64
 11  Best_EPS_6M            103495 non-null  float64
 12  Best_EPS_1Y            103299 non-null  float64
 13  Best_EPS_3Y            103123 non-null  float64
 14  Best_EPS_5Y            102083 non-nu

In [6]:
df.describe()

Unnamed: 0,Fwd_Best_EPS_6M,Fwd_Best_EPS_6M_v5Y,Fwd_ST_Accel_3M,Fwd_ST_Accel_Class_3M,Fwd_Return_6M,Fwd_Return_3M,Fwd_Return_1M,Mkt_Cap,Best_EPS_1M,Best_EPS_3M,...,ROE_1Y_Chg,ROE_SD,FCF_Mgn,FCF_Mgn_1Y_Chg,FCF_Mgn_SD,Op_Mgn_1Y_Chg,Op_Mgn_SD,Return_1M,Return_3M,Return_6M
count,93183.0,94385.0,97928.0,99600.0,94400.0,99600.0,103200.0,104800.0,103850.0,103677.0,...,99620.0,103910.0,104381.0,104246.0,104535.0,104192.0,104276.0,104800.0,104800.0,104800.0
mean,0.286165,0.003789,49.799837,0.17751,11.260001,5.49743,1.710202,41331.16,0.248863,0.278318,...,-12.656894,31.014054,14.521395,0.846845,6.99909,0.567949,3.377957,1.693736,5.686946,11.521709
std,1.944933,1.058665,24.659762,0.382102,25.704648,16.877075,9.315008,136057.9,2.974737,2.214365,...,724.818312,368.797328,24.862584,34.759384,50.385462,7.987797,19.293566,9.264469,16.715521,25.085734
min,-1.989822,-254.140127,0.0,0.0,-84.9294,-85.844154,-85.27027,228.5343,-11.863636,-3.961353,...,-30933.25581,0.131123,-287.128563,-920.498721,0.171153,-180.859303,0.040187,-85.27027,-85.844154,-84.9294
25%,-0.006135,0.0,31.066667,0.0,-4.076254,-3.94778,-3.005865,3641.366,0.0,-0.0125,...,-3.290448,2.187613,5.194971,-2.269922,1.798806,-0.983081,0.917666,-2.98309,-3.677721,-3.453836
50%,0.11933,0.0,49.666667,0.0,8.857542,4.821521,1.718861,9664.999,0.0,0.08125,...,0.289353,4.248912,11.287048,0.572506,2.951995,0.248732,1.701807,1.691985,5.032736,9.262103
75%,0.354839,0.0,68.666667,0.0,22.810762,13.960698,6.330228,27895.0,0.09129,0.356599,...,4.404052,9.487071,21.777607,3.815235,4.954279,1.920652,3.226768,6.271484,14.122495,23.006031
max,258.0,149.667379,100.0,1.0,417.09818,255.21368,139.28723,2947786.0,288.385852,284.0,...,6544.700305,12579.83072,707.285829,1336.962231,12553.67782,242.811441,927.418155,139.28723,255.21368,417.09818
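
The summary statistics above show very long tails. Fwd_Best_EPS_6M, for instance, reaches 258 against a 75th percentile of about 0.35. One common way to limit the influence of such values is to clip each column at chosen percentiles (winsorizing). The sketch below is only an illustration of that technique. The 1st and 99th percentile cut-offs and the sample data are arbitrary assumptions, not choices made in this analysis.

```python
import pandas as pd

def winsorize(series: pd.Series, lower: float = 0.01, upper: float = 0.99) -> pd.Series:
    """Clip values outside the given quantiles to the quantile values."""
    low_cut, high_cut = series.quantile([lower, upper])
    return series.clip(lower=low_cut, upper=high_cut)

# Hypothetical column: 0..99 plus one extreme observation
raw = pd.Series(list(range(100)) + [10_000], dtype=float)
clipped = winsorize(raw)
print(raw.max(), clipped.max())
```

As with imputation, the cut-offs would ideally be estimated on training data only and then applied unchanged to the test data.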


In [7]:
# let's look at some stats of the data
df.median(numeric_only=True) # restrict to numeric columns; Sector is an object column


Fwd_Best_EPS_6M             0.119330
Fwd_Best_EPS_6M_v5Y         0.000000
Fwd_ST_Accel_3M            49.666667
Fwd_ST_Accel_Class_3M       0.000000
Fwd_Return_6M               8.857542
Fwd_Return_3M               4.821521
Fwd_Return_1M               1.718861
Mkt_Cap                  9664.999500
Best_EPS_1M                 0.000000
Best_EPS_3M                 0.081250
Best_EPS_6M                 0.121951
Best_EPS_1Y                 0.132812
Best_EPS_3Y                 0.131827
Best_EPS_5Y                 0.138798
Best_EPS_3M_v5Y             0.493125
Best_EPS_6M_v5Y             0.751213
Best_EPS_1Y_v5Y             0.821586
Best_EPS_3Y_v5Y             0.852350
Best_EPS_3M_Rank            0.500000
Best_EPS_6M_Rank            0.500000
Best_EPS_1Y_Rank            0.500000
ST_Accel                   49.666667
ST_Accel_Class              0.000000
Best_EPS_5Y_R2              0.857176
Best_EPS_5Y_R2_Rank        13.000000
Best_EPS_5Y_SD              0.601108
Best_EPS_5Y_Stability      15.361495
N

In [10]:
# count NA's
df.isna().sum()

Fwd_Best_EPS_6M          11617
Fwd_Best_EPS_6M_v5Y      10415
Fwd_ST_Accel_3M           6872
Fwd_ST_Accel_Class_3M     5200
Fwd_Return_1M             1600
Fwd_Return_3M             5200
Fwd_Return_6M            10400
Mkt_Cap                      0
Sector                       0
Best_EPS_1M                950
Best_EPS_3M               1123
Best_EPS_6M               1305
Best_EPS_1Y               1501
Best_EPS_3Y               1677
Best_EPS_5Y               2717
Best_EPS_3M_v5Y          13706
Best_EPS_6M_v5Y          13802
Best_EPS_1Y_v5Y          13913
Best_EPS_3Y_v5Y          13850
Best_EPS_3M_Rank          1123
Best_EPS_6M_Rank          1305
Best_EPS_1Y_Rank          1501
ST_Accel                  1752
ST_Accel_Class               0
Best_EPS_5Y_R2             486
Best_EPS_5Y_R2_Rank        486
Best_EPS_5Y_SD             486
Best_EPS_5Y_Stability      686
NRR_6M                      11
An_Mark_6M                  57
An_Unch_6M                  11
An_Down_6M                  11
An_Up_6M

# BAMBOOLIB ANALYSIS

In [11]:
import bamboolib as bam
import pandas as pd
df = pd.read_csv(r'C:\Users\jeffn\ML1\FS_DATA_ALL_ML_ADJ_5Y.csv', sep=',', decimal='.')
df['Op_Mgn'] = pd.to_numeric(df['Op_Mgn'], downcast='float', errors='coerce')
df['Fwd_ST_Accel_Class_3M'] = df['Fwd_ST_Accel_Class_3M'].astype('category')
df['ST_Accel_Class'] = df['ST_Accel_Class'].astype('category')

df

### outliers???
### what to do with NA's??? impute median???
### scale values???

Unnamed: 0,Date,ticker,Fwd_Best_EPS_6M,Fwd_Best_EPS_6M_v5Y,Fwd_ST_Accel_3M,Fwd_ST_Accel_Class_3M,Fwd_Return_6M,Fwd_Return_3M,Fwd_Return_1M,Mkt_Cap,...,ROE_SD,FCF_Mgn,FCF_Mgn_1Y_Chg,FCF_Mgn_SD,Op_Mgn,Op_Mgn_1Y_Chg,Op_Mgn_SD,Return_1M,Return_3M,Return_6M
0,12/30/2016,AAPL-US,-0.016685,-0.248072,24.500000,0.0,25.394440,24.571205,5.292690,608683.060,...,5.86732,24.35201,-5.92215,2.69290,26.634970,-2.17389,2.80155,5.38671,2.98195,22.06475
1,12/30/2016,MSFT-US,0.054237,2.036492,26.700000,0.0,12.223983,6.626642,5.857742,480342.200,...,8.22845,32.73682,6.86210,4.76086,26.621149,-2.56037,4.10991,4.87764,8.59664,23.03304
2,12/30/2016,GOOGL-US,-0.045126,-0.396018,10.866667,0.0,17.317177,6.984663,6.635117,547815.200,...,1.69458,28.77871,8.07366,4.20064,25.828850,1.49931,2.62111,3.66141,-1.44392,11.57339
3,12/30/2016,AMZN-US,0.819502,0.817332,91.666667,0.0,29.089046,18.225824,11.455321,357688.000,...,3.97514,7.13745,1.77100,2.15002,3.200310,1.32135,0.86781,1.28725,-10.44297,3.33343
4,12/30/2016,TSLA-US,,,,0.0,69.221756,30.235374,18.372404,34523.973,...,73.71621,-22.34672,35.02301,91.47230,-10.193240,3.82712,77.17535,17.75500,4.73460,-1.29792
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
104795,,,,,,,,,,,...,,,,,,,,,,
104796,,,,,,,,,,,...,,,,,,,,,,
104797,,,,,,,,,,,...,,,,,,,,,,
104798,,,,,,,,,,,...,,,,,,,,,,


In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 104800 entries, 0 to 104799
Data columns (total 60 columns):
 #   Column                 Non-Null Count  Dtype   
---  ------                 --------------  -----   
 0   Date                   99600 non-null  object  
 1   ticker                 99600 non-null  object  
 2   Fwd_Best_EPS_6M        93183 non-null  float64 
 3   Fwd_Best_EPS_6M_v5Y    82523 non-null  float64 
 4   Fwd_ST_Accel_3M        97928 non-null  float64 
 5   Fwd_ST_Accel_Class_3M  99600 non-null  category
 6   Fwd_Return_6M          94400 non-null  float64 
 7   Fwd_Return_3M          99600 non-null  float64 
 8   Fwd_Return_1M          99600 non-null  float64 
 9   Mkt_Cap                99600 non-null  float64 
 10  Sector                 99600 non-null  object  
 11  Best_EPS_1M            98671 non-null  float64 
 12  Best_EPS_3M            98503 non-null  float64 
 13  Best_EPS_6M            98312 non-null  float64 
 14  Best_EPS_1Y            98231 non-nul

In [None]:
# visualization