<a href="https://colab.research.google.com/github/Ingy10/Quantitative-Stock-Strategy-Analysis/blob/main/Quantitative%20Strategy%20Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [231]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Objective: Exploring the Viability of Active Investing Strategies in the Era of $0 Trading Commissions

In this analysis, we import a dataset containing the performance of 56 well-known investment strategies from 1998 to 2024. These strategies have been tracked and quantified by the American Association of Individual Investors (AAII). The goal is to compare their monthly performance against the S&P 500 to evaluate whether active investment strategies might offer advantages for retail investors in today's market environment.

Historically, renowned investors like Warren Buffett have recommended that retail investors focus on low-cost market ETFs (such as those tracking the S&P 500). One key reason for this advice was that active investing strategies often incurred significant trading costs, which could eat into returns. However, with the advent of $0 trading commissions on many brokerage platforms, this cost barrier has effectively disappeared. This shift prompts a fresh question: Could active investing strategies now be more feasible and profitable for retail investors?

Retail investors, who typically have other jobs and responsibilities, are not full-time investors. As a result, they cannot dedicate as much time to monitoring the markets or reacting to short-term fluctuations. My analysis will focus on identifying active strategies that can be effectively implemented by retail investors who may only be able to update their portfolios on a monthly basis. The objective is to determine whether such strategies could outperform the S&P 500 in this new era of low-cost investing, potentially offering opportunities for retail investors to achieve alpha—excess returns above the market benchmark.

In [236]:
import pandas as pd;
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/capstone1/Quantitative_strategies_performance_table_Capstone1.csv');

I have set the date as the index and printed a cursory look at the data.

In [237]:
df = df.rename(columns={'date_': 'Date'});
df = df.set_index('Date');
print(df.head());

            ADR Screen  All Stocks  Buffett: Hagstrom Screen  \
Date                                                           
1998-01-31        2.99        6.22                      1.48   
1998-02-28        5.54        6.60                     10.70   
1998-03-31        4.23        7.69                     -0.13   
1998-04-30       -0.97        4.09                      3.29   
1998-05-31       -2.24       -2.49                     -4.37   

            Buffettology: EPS Growth  Buffettology: Sustainable Growth Screen  \
Date                                                                            
1998-01-31                      0.81                                     2.16   
1998-02-28                      6.22                                     7.81   
1998-03-31                     -1.01                                    -0.80   
1998-04-30                      1.50                                     2.16   
1998-05-31                     -5.41                             

Here I am checking data types.  All are floats and I can see there are some null values present.

In [238]:
df.info();

<class 'pandas.core.frame.DataFrame'>
Index: 321 entries, 1998-01-31 to 2024-09-30
Data columns (total 56 columns):
 #   Column                                           Non-Null Count  Dtype  
---  ------                                           --------------  -----  
 0   ADR Screen                                       321 non-null    float64
 1   All Stocks                                       250 non-null    float64
 2   Buffett: Hagstrom Screen                         321 non-null    float64
 3   Buffettology: EPS Growth                         321 non-null    float64
 4   Buffettology: Sustainable Growth Screen          321 non-null    float64
 5   Cash Rich Firms Screen                           321 non-null    float64
 6   Dividend (High Relative Yield) Screen            321 non-null    float64
 7   Dogs of the Dow Screen                           321 non-null    float64
 8   Dogs of the Dow: Low Priced 5 Screen             321 non-null    float64
 9   Dreman Screen        

Now I will look to see how many null values exist in each screen.

In [239]:
print(df.isnull().sum());

ADR Screen                                          0
All Stocks                                         71
Buffett: Hagstrom Screen                            0
Buffettology: EPS Growth                            0
Buffettology: Sustainable Growth Screen             0
Cash Rich Firms Screen                              0
Dividend (High Relative Yield) Screen               0
Dogs of the Dow Screen                              0
Dogs of the Dow: Low Priced 5 Screen                0
Dreman Screen                                       0
Dreman With Est Revisions Screen                    0
Driehaus Revised Screen                            12
Driehaus Screen                                    12
Dual Cash Flow Screen                               0
Est Rev: Down 5% Screen                             0
Est Rev: Lowest 30 Down                             0
Est Rev: Top 30 Up                                  0
Est Rev: Up 5% Screen                               0
Fisher (Philip) Screen      

Because 'All Stocks' is not tracked for more recent dates and is an outlier in terms of missing data points (71 of 321 months), I will drop it from this dataset.

In [240]:
df = df.drop(columns=['All Stocks']);

With this data set I know that certain screens do not have data from the earliest dates, because AAII started tracking them after the data set begins in 1998. I will therefore drop every row that contains a missing value. This removes the 12 months of 1998 and leaves a complete data set that starts in January 1999, which is assigned as a data frame to the variable 'df_screens'.

In [241]:
df_screens = df.dropna();
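
Dropping rows only yields one unbroken history if the missing values sit at the start of the data. A minimal sketch of how that could be checked, using a toy frame with hypothetical screen names and values (not the AAII data):

```python
import pandas as pd

# Toy monthly returns; "Screen B" starts two months late (hypothetical data).
toy = pd.DataFrame(
    {
        "Screen A": [1.0, -0.5, 2.0, 0.3],
        "Screen B": [float("nan"), float("nan"), 1.5, -0.2],
    },
    index=pd.to_datetime(["1998-01-31", "1998-02-28", "1998-03-31", "1998-04-30"]),
)

complete = toy.dropna()                # keep only rows where every screen has a value
first_complete = complete.index.min()  # first date with full coverage

# If gaps exist only at the start, nothing from first_complete onward is missing.
gaps_after_start = int(toy.loc[first_complete:].isna().sum().sum())

print(first_complete.date(), len(complete), gaps_after_start)
```

A non-zero `gaps_after_start` would mean `dropna()` also removed months from the middle of the history, which would leave holes in the monthly series.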

Now I will describe the data to get key insights across both screens and dates. Because there are so many of each, I will print summaries of averages to keep things clean: the first line describes the cross-screen mean return for each date, and the second describes the mean monthly return of each screen. Remove mean() from either expression below to see data for individual dates or screens.

In [242]:
print(df_screens.T.mean().describe().head());
print(df_screens.mean().describe().head());

count    309.000000
mean       1.095655
std        5.183251
min      -23.299091
25%       -1.940727
dtype: float64
count    55.000000
mean      1.095655
std       0.422567
min       0.122524
25%       0.784207
dtype: float64


Set index 'Date' to datetime format.

In [243]:
df_screens.index = pd.to_datetime(df_screens.index);

To provide a benchmark to compare these strategies, I will take an average of the performance of all the screens and assign it to 'Average Performance'.

In [244]:
df_screens = df_screens.copy();
df_screens['Average Performance'] = df_screens.mean(axis=1);
print(df_screens['Average Performance']);

Date
1999-01-31    3.121273
1999-02-28   -4.845636
1999-03-31    0.074727
1999-04-30    7.922909
1999-05-31    2.706182
                ...   
2024-05-31    4.738000
2024-06-28   -2.554182
2024-07-31    5.810000
2024-08-30   -0.252182
2024-09-30    2.732727
Name: Average Performance, Length: 309, dtype: float64


As a second benchmark I will bring in data for the SP500.

In [245]:
df_sp500 = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/capstone1/SP500 monthly performance 1998 - 2024 Capstone1.csv');
df_sp500 = df_sp500.rename(columns={'Unnamed: 0': 'Date', 'Unnamed: 1': 'SP500 Performance'});
print(df_sp500.head());

         Date  SP500 Performance
0  1998-01-31               0.54
1  1998-02-28               4.80
2  1998-03-31               5.16
3  1998-04-30               0.32
4  1998-05-31              -2.69


Set date as index.

In [246]:
df_sp500 = df_sp500.set_index('Date');
print(df_sp500.head());

            SP500 Performance
Date                         
1998-01-31               0.54
1998-02-28               4.80
1998-03-31               5.16
1998-04-30               0.32
1998-05-31              -2.69


Ensure the index is in datetime format.

In [247]:
df_sp500.index = pd.to_datetime(df_sp500.index);
print(df_sp500);

            SP500 Performance
Date                         
1998-01-31               0.54
1998-02-28               4.80
1998-03-31               5.16
1998-04-30               0.32
1998-05-31              -2.69
...                       ...
2024-05-31               4.94
2024-06-28               3.08
2024-07-31               0.94
2024-08-30               2.00
2024-09-30               2.46

[321 rows x 1 columns]


Trim SP500 dataset so it matches the date ranges of the screens being analyzed.

In [248]:
df_sp500 = df_sp500.loc['1999-01-01':];
print(df_sp500);

            SP500 Performance
Date                         
1999-01-31               4.20
1999-02-28              -2.72
1999-03-31               4.06
1999-04-30               3.20
1999-05-31              -3.90
...                       ...
2024-05-31               4.94
2024-06-28               3.08
2024-07-31               0.94
2024-08-30               2.00
2024-09-30               2.46

[309 rows x 1 columns]
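
Slicing from a hard-coded start date works here because both tables use the same month-end dates. As a hedged alternative, the sketch below aligns two toy frames on the dates they share and confirms that the indexes match before combining them. The frame names and values are illustrative only.

```python
import pandas as pd

dates = pd.to_datetime(["1998-11-30", "1998-12-31", "1999-01-31", "1999-02-28"])
toy_benchmark = pd.DataFrame({"SP500 Performance": [1.0, 2.0, 3.0, 4.0]}, index=dates)
toy_screens = pd.DataFrame({"Screen A": [0.5, -0.5]}, index=dates[2:])

# Keep only the dates present in both frames.
shared = toy_screens.index.intersection(toy_benchmark.index)
trimmed = toy_benchmark.loc[shared]

# Confirm the two frames now cover exactly the same dates.
indexes_match = trimmed.index.equals(toy_screens.index)

combined = pd.concat([toy_screens, trimmed], axis=1)
print(indexes_match)
print(combined)
```

Checking `indexes_match` before the `concat` guards against silently introducing NaN rows when the two sources disagree on a date.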


Combine into single data frame named 'df_combined'.

In [249]:
df_sp500.index = pd.to_datetime(df_sp500.index);
df_screens.index = pd.to_datetime(df_screens.index);
df_combined = pd.concat([df_screens, df_sp500], axis=1);
print(df_combined);

            ADR Screen  Buffett: Hagstrom Screen  Buffettology: EPS Growth  \
Date                                                                         
1999-01-31        3.87                      7.96                      2.99   
1999-02-28       -4.04                     -5.72                     -9.88   
1999-03-31       -4.85                      5.56                     -2.61   
1999-04-30       12.30                      5.44                      9.23   
1999-05-31       -6.25                     -0.20                      1.10   
...                ...                       ...                       ...   
2024-05-31        5.38                      1.76                      6.13   
2024-06-28       -6.42                     -2.42                     -0.39   
2024-07-31        3.62                     10.54                      8.14   
2024-08-30        3.81                     -0.75                     -1.79   
2024-09-30        1.84                      2.78                

We can now see the cleaned and combined data set with both benchmarks included!

In [250]:
df_combined.info();

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 309 entries, 1999-01-31 to 2024-09-30
Data columns (total 57 columns):
 #   Column                                           Non-Null Count  Dtype  
---  ------                                           --------------  -----  
 0   ADR Screen                                       309 non-null    float64
 1   Buffett: Hagstrom Screen                         309 non-null    float64
 2   Buffettology: EPS Growth                         309 non-null    float64
 3   Buffettology: Sustainable Growth Screen          309 non-null    float64
 4   Cash Rich Firms Screen                           309 non-null    float64
 5   Dividend (High Relative Yield) Screen            309 non-null    float64
 6   Dogs of the Dow Screen                           309 non-null    float64
 7   Dogs of the Dow: Low Priced 5 Screen             309 non-null    float64
 8   Dreman Screen                                    309 non-null    float64
 9   Dreman With E

I will now analyze the stock screens to answer key questions related to active investing in the context of outperforming the S&P 500:

Macro Question: Does using stock screening strategies provide a significant advantage over simply investing in a low-cost S&P 500 ETF? If active investing through stock screens has historically provided better returns, how can retail investors identify which screens to use and when to invest in them?

1. On average do screens provide better returns than the SP500?
2. What are the best performing screens:
  1. What are the top performing screens by average/median monthly return?
  2. What are the top performing screens by most months showing outperformance of benchmarks?
3. Momentum:
  1. Is there a momentum factor associated with the performance of these stock screens?
  2. Is there a momentum factor associated with SP500 performance?
  3. Does the likelihood of momentum increase with longer streaks of outperformance or underperformance?
  4. Are higher performing screens more or less likely to have a momentum factor?

Best Performing screens by mean and median monthly return.

In [251]:
avg_returns = df_combined.mean(axis=0);
median_returns = df_combined.median(axis=0);
print(avg_returns.sort_values(ascending=False));
print(median_returns.sort_values(ascending=False));

O'Shaughnessy: Tiny Titans Screen                  2.116537
Est Rev: Up 5% Screen                              1.867638
Est Rev: Top 30 Up                                 1.841553
Driehaus Revised Screen                            1.836699
O'Neil's CAN SLIM Screen                           1.727120
O'Shaughnessy: Small Cap Growth & Value Screen     1.674207
Price-to-Free-Cash-Flow Screen                     1.613107
Piotroski: High F-Score Screen                     1.556570
O'Shaughnessy: Growth Screen II                    1.529061
O'Neil's CAN SLIM Revised 3rd Edition Screen       1.521327
Kirkpatrick Growth Screen                          1.424175
Stock Market Winners Screen                        1.421586
Driehaus Screen                                    1.405081
Dreman With Est Revisions Screen                   1.362848
Neff Screen                                        1.353333
Graham--Enterprising Investor Revised              1.352913
Value on the Move--PEG With Est Growth S

Now we can plot this data into a bar chart to better visualize screener performance vs. benchmarks, especially the SP500 benchmark.

In [252]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.transform import factor_cmap
from bokeh.palettes import Spectral6

output_notebook()

sorted_avg_returns = avg_returns.sort_values(ascending=False);

screens = sorted_avg_returns.index.tolist()
returns = sorted_avg_returns.values

colors = ['#ff0000' if screen in ['SP500 Performance', 'Average Performance'] else '#1f77b4' for screen in screens]

source = ColumnDataSource(data=dict(
    screens=screens,
    returns=returns,
    colors=colors
))

p = figure(x_range=screens, height=600, width=900, title="Mean Monthly Return of Each Screen and Benchmark",
           toolbar_location=None, tools="", x_axis_label='Screens/Benchmarks', y_axis_label='Mean Monthly Return (%)')

p.vbar(x='screens', top='returns', width=0.8, source=source,
       fill_color='colors')

hover = HoverTool()
hover.tooltips = [("Screen", "@screens"), ("Mean Return (%)", "@returns{0.00}")]
p.add_tools(hover)

p.xaxis.major_label_orientation = "vertical"

p.xgrid.grid_line_color = None
p.y_range.start = 0

show(p);

In [253]:
output_notebook()

sorted_median_returns = median_returns.sort_values(ascending=False);

screens = sorted_median_returns.index.tolist()
returns = sorted_median_returns.values

colors = ['#ff0000' if screen in ['SP500 Performance', 'Average Performance'] else '#1f77b4' for screen in screens]

source = ColumnDataSource(data=dict(
    screens=screens,
    returns=returns,
    colors=colors
))

p = figure(x_range=screens, height=600, width=900, title="Median Monthly Return of Each Screen and Benchmark",
           toolbar_location=None, tools="", x_axis_label='Screens/Benchmarks', y_axis_label='Median Monthly Return (%)')

p.vbar(x='screens', top='returns', width=0.8, source=source,
       fill_color='colors')

hover = HoverTool()
hover.tooltips = [("Screen", "@screens"), ("Median Return (%)", "@returns{0.00}")]
p.add_tools(hover)

p.xaxis.major_label_orientation = "vertical"

p.xgrid.grid_line_color = None
p.y_range.start = 0

show(p);

After comparing sorted mean returns and median returns we can see a few things:

1. On both a mean and median comparison, we see that screens generally outperform the SP500.

2. Fewer screens outperform the SP500 on a median basis than on a mean basis.  This could suggest the presence of outliers, as well as a higher standard deviation (volatility) among screens on average.
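
As a small illustration of point 2, the sketch below uses two made-up return series (hypothetical values, not AAII data) to show how a single outsized month can lift the mean above a steadier series while leaving the median below it.

```python
from statistics import mean, median, stdev

# Hypothetical monthly returns in percent.
steady = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0]
lumpy = [-1.0, 0.0, -0.5, 0.5, -0.5, 11.5]  # one outsized month

for name, returns in [("steady", steady), ("lumpy", lumpy)]:
    print(
        name,
        "mean:", round(mean(returns), 2),
        "median:", round(median(returns), 2),
        "stdev:", round(stdev(returns), 2),
    )
```

The lumpy series wins on mean but loses on median, and its standard deviation is far larger. This is the pattern described above.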

What does this mean?
Ideally, as an investor I would want screens that not only outperform in aggregate but also outperform on a more regular basis.  Therefore I would also like to see how often a screen outperforms the SP500 benchmark.

In [254]:
outperformance_df = df_combined.iloc[:, :-1].gt(df_combined['SP500 Performance'], axis=0)

outperformance_percentage = (outperformance_df.mean() * 100).sort_values(ascending=False)

summary_table = pd.DataFrame({
    'Outperform Percentage (%)': outperformance_percentage
})

print(summary_table.head());

                                   Outperform Percentage (%)
O'Shaughnessy: Tiny Titans Screen                  61.812298
Est Rev: Top 30 Up                                 61.488673
Buffett: Hagstrom Screen                           59.870550
Price-to-Sales Screen                              59.546926
Est Rev: Up 5% Screen                              59.546926


In [255]:
output_notebook()

outperform_sorted = summary_table['Outperform Percentage (%)'].sort_values(ascending=False)

screens = outperform_sorted.index.tolist()
outperform_percentage = outperform_sorted.values - 50
total_outperform_percentage = outperform_sorted.values

colors = ['#ff0000' if screen in ['SP500 Performance', 'Average Performance'] else '#1f77b4' for screen in screens]

source = ColumnDataSource(data=dict(
    screens=screens,
    outperform_percentage=outperform_percentage,
    colors=colors,
    total_outperform_percentage=total_outperform_percentage
))

p = figure(x_range=screens, height=600, width=900, title="Outperformance Rate of Each Screen Relative to SP500 (%)",
           toolbar_location=None, tools="", x_axis_label='Screens', y_axis_label='Outperformance Rate (%)')

bars = p.vbar(x='screens', top='outperform_percentage', width=0.8, source=source,
       fill_color='colors')

hover = HoverTool()
hover.tooltips = [("Screen", "@screens"), ("Outperformance Rate of SP500 (%)", "@total_outperform_percentage{0.00}")]
p.add_tools(hover)

p.xaxis.major_label_orientation = "vertical"

p.xgrid.grid_line_color = None
p.y_range.start = -10
p.y_range.end = 20

show(p)

These results show that not only do the screens outperform the market on average over time, but they also demonstrate superior performance on a median basis. Additionally, the data indicates that, on average, most screens outperform the market on a monthly basis, with a success rate of 58.25%.
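
Whether a hit rate like this can be told apart from a coin flip depends on how many months were observed. The sketch below computes an exact one-sided binomial tail probability with the standard library. It assumes months are independent, which is a simplification. It also takes 180 wins in 309 months as the input, since 180/309 is about 58.25%. That mapping is an assumption for illustration, not something the notebook states.

```python
from math import comb


def binomial_tail(wins: int, months: int, p: float = 0.5) -> float:
    """P(X >= wins) for X ~ Binomial(months, p)."""
    return sum(
        comb(months, k) * p**k * (1 - p) ** (months - k)
        for k in range(wins, months + 1)
    )


# Assumed inputs: 180 outperforming months out of 309 (about 58.25%).
p_value = binomial_tail(wins=180, months=309)
print(f"Chance of at least 180 wins in 309 fair coin flips: {p_value:.4f}")
```

Because dozens of screens are compared at once, some will clear any fixed threshold by chance, so a small probability for one screen is weaker evidence than it looks in isolation.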

Given this revelation I would like to create a bucket of 'Outperforming Screens' that will meet the following criteria:
1. Outperform the SP500 on a cumulative average basis.
2. Outperform the SP500 on a cumulative median basis.
3. Outperform the market on a monthly basis more than 50% of the time.

Screens not meeting these criteria will be grouped into a second bucket named 'Underperforming Screens'.

*I also ensured that neither 'SP500 Performance' nor 'Average Performance' is included in these buckets.

In [256]:
summary_returns = pd.concat([avg_returns, median_returns, outperformance_percentage], axis=1).fillna(0);
summary_returns.columns = ['Mean Monthly Return %', 'Median Monthly Return %', 'Outperform SP500 Monthly Basis Success Rate %'];

SP500_mean_monthly_return = avg_returns['SP500 Performance'];
SP500_median_monthly_return = median_returns['SP500 Performance'];

# Benchmark rows are excluded from both buckets; errors='ignore' avoids a KeyError
# if a benchmark happens not to fall in a given bucket.
benchmarks = ['SP500 Performance', 'Average Performance'];

outperforming_screens = summary_returns[
    (summary_returns['Mean Monthly Return %'] > SP500_mean_monthly_return) &
    (summary_returns['Median Monthly Return %'] > SP500_median_monthly_return) &
    (summary_returns['Outperform SP500 Monthly Basis Success Rate %'] > 50)
].drop(index=benchmarks, errors='ignore');

underperforming_screens = summary_returns[
    (summary_returns['Mean Monthly Return %'] <= SP500_mean_monthly_return) |
    (summary_returns['Median Monthly Return %'] <= SP500_median_monthly_return) |
    (summary_returns['Outperform SP500 Monthly Basis Success Rate %'] <= 50)
].drop(index=benchmarks, errors='ignore');

print(outperforming_screens.describe());
print(underperforming_screens.describe());

       Mean Monthly Return %  Median Monthly Return %  \
count              39.000000                39.000000   
mean                1.198263                 1.326923   
std                 0.355147                 0.387815   
min                 0.648673                 0.850000   
25%                 0.956408                 1.050000   
50%                 1.153042                 1.210000   
75%                 1.383964                 1.505000   
max                 2.116537                 2.370000   

       Outperform SP500 Monthly Basis Success Rate %  
count                                      39.000000  
mean                                       55.572152  
std                                         3.448138  
min                                        50.161812  
25%                                        52.427184  
50%                                        56.310680  
75%                                        58.252427  
max                                        61.

It would be nice to visualize these buckets on a line chart to see how their performance varies over time.  First I will bucket the groups and determine the cumulative % gains, along with the all screens average and SP500.

In [257]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, HoverTool, CrosshairTool
from bokeh.palettes import Category10
output_notebook()

df_cumulative = (1 + df_combined / 100).cumprod() - 1

initial_value = 10000
df_portfolio = initial_value * (1 + df_combined / 100).cumprod()

outperforming_portfolio = df_cumulative[outperforming_screens.index].mean(axis=1)
underperforming_portfolio = df_cumulative[underperforming_screens.index].mean(axis=1)
SP500_portfolio = df_cumulative['SP500 Performance']
avg_screen_portfolio = df_cumulative['Average Performance']

df_bucketed_mean = pd.concat([outperforming_portfolio, underperforming_portfolio, SP500_portfolio, avg_screen_portfolio], axis=1)
df_bucketed_mean.columns = ['Outperforming Screens', 'Underperforming Screens', 'SP500', 'All Screens Mean']

df_combined



Date,ADR Screen,Buffett: Hagstrom Screen,Buffettology: EPS Growth,Buffettology: Sustainable Growth Screen,Cash Rich Firms Screen,Dividend (High Relative Yield) Screen,Dogs of the Dow Screen,Dogs of the Dow: Low Priced 5 Screen,Dreman Screen,Dreman With Est Revisions Screen,...,Stock Market Winners Screen,T. Rowe Price Screen,Templeton Screen,Value on the Move--PEG With Est Growth Screen,Value on the Move--PEG With Hist Growth Screen,Wanger (Revised) Screen,Weiss Blue Chip Div Yield Screen,Zweig Screen,Average Performance,SP500 Performance
1999-01-31,3.87,7.96,2.99,4.81,3.21,0.03,-1.29,0.16,-7.05,-2.22,...,-3.42,0.25,0.08,3.19,0.57,1.50,-6.56,9.43,3.121273,4.20
1999-02-28,-4.04,-5.72,-9.88,-10.49,-9.79,-3.26,-1.07,-2.19,1.16,-3.68,...,10.31,-4.42,-3.92,-7.22,-6.25,-8.00,-5.25,-1.35,-4.845636,-2.72
1999-03-31,-4.85,5.56,-2.61,-2.31,-0.33,-1.24,4.16,1.71,-1.07,2.11,...,-4.78,-8.30,0.64,-4.56,-3.26,-2.38,-1.20,-9.19,0.074727,4.06
1999-04-30,12.30,5.44,9.23,9.12,6.20,9.10,16.12,18.74,12.63,4.89,...,-0.62,9.34,7.04,9.10,6.66,10.75,12.52,13.15,7.922909,3.20
1999-05-31,-6.25,-0.20,1.10,2.63,9.19,1.57,-3.26,0.27,8.63,0.45,...,6.62,1.89,6.86,0.55,0.49,4.88,3.54,-1.65,2.706182,-3.90
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-05-31,5.38,1.76,6.13,4.67,1.77,2.59,1.57,-0.43,2.05,3.13,...,12.10,-1.65,5.60,8.67,4.89,-3.24,1.83,5.72,4.738000,4.94
2024-06-28,-6.42,-2.42,-0.39,-1.57,-2.35,2.13,-2.60,-5.97,-3.45,-5.38,...,-4.16,-7.25,-1.44,-0.81,-0.82,0.00,4.46,1.94,-2.554182,3.08
2024-07-31,3.62,10.54,8.14,7.96,9.42,15.48,5.88,1.81,6.73,6.13,...,10.58,-8.23,6.19,-0.72,8.00,8.80,14.08,-15.74,5.810000,0.94
2024-08-30,3.81,-0.75,-1.79,-1.65,-3.60,-0.51,-0.01,-2.24,1.82,-0.71,...,2.47,-5.07,2.25,2.02,1.26,0.00,-5.38,0.72,-0.252182,2.00


The following line graph displays cumulative returns for the different investment strategies, computed by compounding their monthly returns. This provides a useful way to visualize and compare the performance of each strategy relative to one another, but it has limitations. The series are sampled only at month end, so movements within a month are not visible. The two bucket lines average the cumulative returns of their member screens rather than tracking a single investable portfolio. Despite these limitations, cumulative monthly returns still offer valuable insights into the overall trends and relative performance of these strategies over time.
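
To make the compounding step concrete, the sketch below compounds made-up monthly returns for two hypothetical screens. It also contrasts the two averaging orders used in the previous cell. The bucket lines average the cumulative returns of their member screens, while 'All Screens Mean' compounds the monthly average. The numbers are illustrative only.

```python
def cumulative_pct(monthly_pct):
    """Compound monthly percent returns into a running cumulative percent return."""
    growth = 1.0
    path = []
    for r in monthly_pct:
        growth *= 1 + r / 100
        path.append((growth - 1) * 100)
    return path


# Hypothetical returns for two screens over two months.
screen_a = [10.0, 10.0]
screen_b = [-10.0, -10.0]

# Bucket-style line: compound each screen, then average the cumulative returns.
bucket_style = [
    (a + b) / 2
    for a, b in zip(cumulative_pct(screen_a), cumulative_pct(screen_b))
]

# Average-style line: average the monthly returns first, then compound.
monthly_mean = [(a + b) / 2 for a, b in zip(screen_a, screen_b)]
average_style = cumulative_pct(monthly_mean)

print("bucket-style final %:", round(bucket_style[-1], 2))
print("average-style final %:", round(average_style[-1], 2))
```

The two orders give different end values for the same inputs, so the bucket lines and the 'All Screens Mean' line are not computed on exactly the same basis.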

In [258]:
chart = figure(title='Cumulative Performance of Screens & Benchmarks', x_axis_label='Date', y_axis_label='Cumulative Monthly Performance (%)', x_axis_type='datetime', toolbar_location='above')

source = ColumnDataSource(df_bucketed_mean.multiply(100).round(2))

colors = Category10[len(df_bucketed_mean.columns)]

for column, color in zip(df_bucketed_mean.columns, colors):
    chart.line(x='Date', y=column, line_color=color, line_width=2,
           legend_label=column, source=source)

chart.legend.location='top_left'
chart.legend.click_policy='hide'

chart.add_tools(HoverTool(
    tooltips=[
        ('Date', '@Date{%F}'),
        ('Gain %', '$y{0.00}')
    ],
    formatters={
        '@Date': 'datetime'
    }
), CrosshairTool())

show(chart)

We can observe that, regardless of whether the stock screens consistently outperform the benchmark (S&P 500) on a monthly basis, many of them still demonstrate cumulative outperformance over the S&P 500, likely because of some months with very high performance. This insight already offers valuable guidance for retail investors aiming to achieve alpha, or consistent excess returns, by utilizing stock screens.

However, while identifying outperforming screens is a strong start, the next logical step is determining when to invest in these strategies to maximize success. Timing can be crucial, especially given that momentum—the tendency of assets that have performed well to continue performing well in the short term—is widely regarded as an important factor in investment strategy. In this analysis, we aim to investigate whether momentum plays a role in the performance of these stock screens.

By answering this, we hope to provide retail investors not only with a list of potentially outperforming strategies but also with guidance on the optimal timing for entering and exiting these investments.

Let's begin with a general momentum analysis to see whether screens that have outperformed or underperformed the SP500 in a given month are more likely to continue the trend or to mean revert in the following month.

In [261]:
df_combined_relative = pd.DataFrame(index=df_combined.index)

for column in df_combined.columns:
    if column != 'SP500 Performance':
        df_combined_relative[f'{column}_relative'] = df_combined[column] - df_combined['SP500 Performance']


df_combined_relative

Date,ADR Screen_relative,Buffett: Hagstrom Screen_relative,Buffettology: EPS Growth_relative,Buffettology: Sustainable Growth Screen_relative,Cash Rich Firms Screen_relative,Dividend (High Relative Yield) Screen_relative,Dogs of the Dow Screen_relative,Dogs of the Dow: Low Priced 5 Screen_relative,Dreman Screen_relative,Dreman With Est Revisions Screen_relative,...,Schloss Screen_relative,Stock Market Winners Screen_relative,T. Rowe Price Screen_relative,Templeton Screen_relative,Value on the Move--PEG With Est Growth Screen_relative,Value on the Move--PEG With Hist Growth Screen_relative,Wanger (Revised) Screen_relative,Weiss Blue Chip Div Yield Screen_relative,Zweig Screen_relative,Average Performance_relative
1999-01-31,-0.33,3.76,-1.21,0.61,-0.99,-4.17,-5.49,-4.04,-11.25,-6.42,...,21.53,-7.62,-3.95,-4.12,-1.01,-3.63,-2.70,-10.76,5.23,-1.078727
1999-02-28,-1.32,-3.00,-7.16,-7.77,-7.07,-0.54,1.65,0.53,3.88,-0.96,...,5.79,13.03,-1.70,-1.20,-4.50,-3.53,-5.28,-2.53,1.37,-2.125636
1999-03-31,-8.91,1.50,-6.67,-6.37,-4.39,-5.30,0.10,-2.35,-5.13,-1.95,...,-8.04,-8.84,-12.36,-3.42,-8.62,-7.32,-6.44,-5.26,-13.25,-3.985273
1999-04-30,9.10,2.24,6.03,5.92,3.00,5.90,12.92,15.54,9.43,1.69,...,-1.87,-3.82,6.14,3.84,5.90,3.46,7.55,9.32,9.95,4.722909
1999-05-31,-2.35,3.70,5.00,6.53,13.09,5.47,0.64,4.17,12.53,4.35,...,8.78,10.52,5.79,10.76,4.45,4.39,8.78,7.44,2.25,6.606182
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-05-31,0.44,-3.18,1.19,-0.27,-3.17,-2.35,-3.37,-5.37,-2.89,-1.81,...,0.91,7.16,-6.59,0.66,3.73,-0.05,-8.18,-3.11,0.78,-0.202000
2024-06-28,-9.50,-5.50,-3.47,-4.65,-5.43,-0.95,-5.68,-9.05,-6.53,-8.46,...,-4.74,-7.24,-10.33,-4.52,-3.89,-3.90,-3.08,1.38,-1.14,-5.634182
2024-07-31,2.68,9.60,7.20,7.02,8.48,14.54,4.94,0.87,5.79,5.19,...,9.09,9.64,-9.17,5.25,-1.66,7.06,7.86,13.14,-16.68,4.870000
2024-08-30,1.81,-2.75,-3.79,-3.65,-5.60,-2.51,-2.01,-4.24,-0.18,-2.71,...,-15.83,0.47,-7.07,0.25,0.02,-0.74,-2.00,-7.38,-1.28,-2.252182


In [274]:
def check_momentum(screen):
    """Share of consecutive month pairs whose relative returns have the same sign."""
    momentum = 0
    mean_revert = 0
    for i in range(len(screen) - 1):
        current_month = screen.iloc[i]
        next_month = screen.iloc[i + 1]

        # Momentum: both months above, both below, or both exactly at the benchmark.
        if (current_month > 0 and next_month > 0) or (current_month < 0 and next_month < 0) or (current_month == 0 and next_month == 0):
            momentum += 1
        else:
            mean_revert += 1
    return momentum / (momentum + mean_revert)

momentum_likelihood = df_combined_relative.apply(check_momentum)

momentum_likelihood.mean()


0.5271915584415584
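
A persistence rate near 0.53 is easier to read next to a baseline. If each month were an independent draw in which a screen beats the S&P 500 with probability p, two consecutive months would land on the same side of the benchmark with probability p² + (1 − p)². The sketch below evaluates that baseline. The independence assumption and the p values are illustrative, not results from the notebook, and months that exactly tie the benchmark are ignored.

```python
def same_sign_baseline(p: float) -> float:
    """Chance that two independent months land on the same side of the benchmark."""
    return p * p + (1 - p) * (1 - p)


for p in (0.50, 0.55, 0.60):
    print(f"p={p:.2f} -> baseline {same_sign_baseline(p):.4f}")
```

Under this simplification, a screen that outperforms 60% of the time would show about 0.52 persistence with no momentum at all. The measured rate is therefore best compared against each screen's own baseline rather than against 0.50.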

In [275]:
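def assess_momentum_streaks(df):
    """Average, across columns, the likelihood that a streak continues into the next month.

    Streak length 1 counts continuation in either direction (same sign two months
    running); lengths 2 and up count only runs of outperformance (values > 0).
    """
    streak_lengths = [1, 2, 3, 4, 5]  # adjust this to check longer streaks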
    results = {}

    for column in df.columns:
        momentum_likelihood = {}

        for streak in streak_lengths:
            if streak == 1:
                momentum = 0
                total = 0
                for i in range(len(df[column]) - 1):
                    current_month = df[column].iloc[i]
                    next_month = df[column].iloc[i + 1]
                    if (current_month > 0 and next_month > 0) or (current_month < 0 and next_month < 0) or (current_month == 0 and next_month == 0):
                        momentum += 1
                    total += 1
                likelihood = momentum / total
            else:
                total_streaks = 0
                successful_streaks = 0
                for i in range(len(df[column]) - streak):
                    streak_values = df[column].iloc[i:i + streak]
                    next_month = df[column].iloc[i + streak]
                    if all(val > 0 for val in streak_values):
                        total_streaks += 1
                        if next_month > 0:
                            successful_streaks += 1
                likelihood = successful_streaks / total_streaks if total_streaks > 0 else 0

            momentum_likelihood[streak] = likelihood

        results[column] = momentum_likelihood

    # Calculate average likelihood for each streak length
    average_likelihood = {}
    for streak in streak_lengths:
        likelihoods = [results[column][streak] for column in df.columns]
        average_likelihood[streak] = sum(likelihoods) / len(likelihoods)

    return average_likelihood


# Run the analysis
average_momentum_streaks_results = assess_momentum_streaks(df_combined_relative)


# Print the results
print("Average Likelihood by Streak Length")
print("Streak Length\tLikelihood")
for streak, likelihood in average_momentum_streaks_results.items():
    print(f"{streak}\t\t{likelihood:.2f}")

Average Likelihood by Streak Length
Streak Length	Likelihood
1		0.53
2		0.58
3		0.60
4		0.60
5		0.60
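
Longer streaks occur less often, so the likelihoods for the longest streak lengths rest on fewer observations than those for shorter ones. The sketch below uses a hypothetical series of relative returns and reports the number of qualifying streaks alongside the number that continued. Unlike the function above, it returns NaN rather than 0 when no streak of the requested length exists. That is a design choice for this illustration.

```python
def streak_continuation(values, streak):
    """Return (continued, qualifying) for runs of `streak` consecutive positive months."""
    continued = 0
    qualifying = 0
    for i in range(len(values) - streak):
        window = values[i:i + streak]
        if all(v > 0 for v in window):
            qualifying += 1
            if values[i + streak] > 0:
                continued += 1
    return continued, qualifying


# Hypothetical relative returns (screen minus benchmark), in percent.
relative = [1.2, 0.4, -0.3, 2.1, 0.8, 0.5, -1.0, 0.2, 0.9, 1.1, -0.6, 0.7]

for streak in (2, 3, 4):
    continued, qualifying = streak_continuation(relative, streak)
    rate = continued / qualifying if qualifying else float("nan")
    print(f"streak={streak} continued={continued} qualifying={qualifying} rate={rate:.2f}")
```

Reporting the qualifying count next to each rate makes it easier to judge how much weight a given likelihood can bear. Overlapping windows also mean that neighbouring streaks are not independent observations.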
