# Problem

In some investment circles you often hear statistics such as ["90% of actively managed investment funds failed to beat the market"](https://www.businessinsider.com/personal-finance/investment-pros-cant-beat-the-stock-market-2020-7)
in support of the idea of just buying index funds. For whatever it's worth, I like index funds too and think indexing is a good strategy. But dogmatic indexers sometimes stretch these statistics too far, suggesting that it's very unlikely you could beat the market. Sometimes it's even implied that an individual investor has only a 10% chance of beating the market. Intuitively, this just doesn't make sense: the statistic counts how many funds lagged the market, not the odds that a given stock does.

In response to this idea that an average person has a minuscule chance of beating the market, I ask myself, "If I were to pick 1 stock, isn't there a fair chance that it will beat the market?" Warren Buffett has famously asserted that "diversification is a protection against ignorance." We may not have the knowledge of Mr. Buffett, but we can be confident that a minimally diversified portfolio has a high chance of deviating from the market average. The question at this point is the probability that it strays in the desired direction.

These claims about beating the market are so common that I decided to test the question I asked myself. My goal is to answer it for a set of "normal" companies over a period dating back as far as possible. The universe of companies is large and their profiles vary. I don't mean finding the next Apple or Amazon, but simply betting on an established, well-known company. Nothing special or complicated. To that end, I will look at DIA, the oldest ETF tracking the Dow Jones, and its components to see what portion of them have outperformed the index as a whole since 1998.

## Limitations of this approach

- Only 1 time period (1998-01-20 to 2021-01-14). We are limited by access to data here.
- Only the change in price between the start and end dates is measured. This says nothing about how you would have done investing at intervals during the period, such as with dollar-cost averaging.
- Dividends are not factored in.
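
To make the second limitation concrete, here is a minimal sketch comparing a lump-sum purchase with dollar-cost averaging. The prices and the function names `lump_sum_return` and `dca_return` are hypothetical, chosen only to show that the two approaches can yield different returns over the same period. None of it comes from the dataset used below.

```python
def lump_sum_return(prices, total_invested):
    """Percent return from investing everything at the first price."""
    shares = total_invested / prices[0]
    final_value = shares * prices[-1]
    return (final_value - total_invested) / total_invested * 100


def dca_return(prices, total_invested):
    """Percent return from splitting the money evenly across every price."""
    installment = total_invested / len(prices)
    shares = sum(installment / price for price in prices)
    final_value = shares * prices[-1]
    return (final_value - total_invested) / total_invested * 100


# Hypothetical prices at three purchase dates (not real market data).
prices = [10.0, 5.0, 20.0]

lump = lump_sum_return(prices, 300.0)
dca = dca_return(prices, 300.0)
print(f"lump sum: {lump:.2f}%, dollar-cost averaging: {dca:.2f}%")
```

With these made-up prices the dollar-cost-averaging investor ends up ahead because one installment buys at the dip. A different price path could just as easily favor the lump sum.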

# Analysis

## Import

In [14]:
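from common import *  # pipeline helpers defined in common.py
import pandas as pd  # imported explicitly so pd does not depend on the star import
import matplotlib.pyplot as plt
pd.set_option('display.max_rows', 10)  # truncate long DataFrame output to 10 rows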

## Data Pipeline
Uses functions defined in `common.py` to prepare our dataset.

In [15]:
df = pd.read_csv("data/dow.csv", index_col=[0,1])

df_processed = (df
 .pipe(startPipeline)
 .pipe(clean)
 .pipe(trim)
 .pipe(flatten_date)
 .pipe(remove_outliers)
 .pipe(add_percent_change)
)

df_processed

startPipeline:
  runtime=0:00:00, end shape=(63, 7)
clean:
  runtime=0:00:00.000979, end shape=(62, 7)
trim:
  runtime=0:00:00.000997, end shape=(62, 1)
flatten_date:
  runtime=0:00:00.007032, end shape=(31, 4)
remove_outliers:
  runtime=0:00:00, end shape=(31, 4)
add_percent_change:
  runtime=0:00:00.001985, end shape=(31, 5)


| Symbol | Start Date | End Date | Start Close | End Close | Percent Change |
|---|---|---|---|---|---|
| AA | 1998-01-20 | 2021-01-14 | 29.530121 | 25.090000 | -15.035904 |
| AXP | 1998-01-20 | 2021-01-14 | 17.698372 | 123.779999 | 599.386359 |
| BA | 1998-01-20 | 2021-01-14 | 27.150482 | 209.910004 | 673.135454 |
| CAT | 1998-01-20 | 2021-01-14 | 9.994546 | 197.399994 | 1875.077159 |
| COKE | 1998-01-20 | 2021-01-14 | 42.274349 | 258.570007 | 511.647517 |
| ... | ... | ... | ... | ... | ... |
| T | 1998-01-20 | 2021-01-14 | 12.724303 | 29.290001 | 130.189428 |
| TRV | 1998-01-20 | 2021-01-14 | 23.375343 | 142.320007 | 508.846704 |
| UK | 1998-01-20 | 2021-01-14 | NaN | NaN | -200.000000 |
| WMT | 1998-01-20 | 2021-01-14 | 14.052621 | 146.970001 | 945.854737 |
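
`common.py` is not shown here, so the exact implementation of `add_percent_change` is unknown. The values in the table are, however, consistent with the simple price return `(end - start) / start * 100`. The sketch below uses a hypothetical helper, `percent_change`, and applies it to two rows copied from the table.

```python
def percent_change(start_close, end_close):
    """Simple price return in percent, ignoring dividends."""
    return (end_close - start_close) / start_close * 100


# Start and end closes copied from the table above.
rows = {
    "AA": (29.530121, 25.090000),
    "CAT": (9.994546, 197.399994),
}

for symbol, (start, end) in rows.items():
    print(f"{symbol}: {percent_change(start, end):.2f}%")
```

Rounded to two decimals, the results agree with the `Percent Change` column for both rows (about -15.04% for AA and 1875.08% for CAT).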


## Benchmark (DIA) Performance Since 1998

In [16]:
def benchmark_percent_change(df):
    return df.loc["DIA"]["Percent Change"]

def print_benchmark_info(df):
    dia_pct_change = benchmark_percent_change(df)
    print(f"The Dow Jones has risen {dia_pct_change:.2f}% from "
          f"{df.loc['DIA']['Start Date']} to {df.loc['DIA']['End Date']}")

print_benchmark_info(df_processed)

The Dow Jones has risen 549.56% from 1998-01-20 to 2021-01-14


As we can see, DIA rose 549.56% between 1998-01-20 and 2021-01-14. Now we can compare this to each component to see how many of them rose more.

## Percent of Individual Dow Companies that Outperformed DIA

In [17]:
def outperformers(df):
    return df[df["Percent Change"] > benchmark_percent_change(df)]

def underperformers(df):
    return df[df["Percent Change"] < benchmark_percent_change(df)]

def print_outperformers_info(df):
    total_companies = len(df) - 1 # exclude DIA
    df_outperform = outperformers(df)
    total_above_dia = len(df_outperform)
    
    pct_above_string = "%.2f" % (total_above_dia/total_companies * 100)
    median_outperform_string = "%.2f" % df_outperform['Percent Change'].median()
    
    print(f"{pct_above_string}% performed better than DIA ({total_above_dia}/{total_companies})")
    print(f"{median_outperform_string}% was the median percent change for the outperformers")
    print(f"Outperformers:\n{df_outperform.index.to_list()}")
    
print_outperformers_info(df_processed)

43.33% performed better than DIA (13/30)
711.66% was the median percent change for the outperformers
Outperformers:
['AXP', 'BA', 'CAT', 'CVX', 'DIS', 'HON', 'JNJ', 'JPM', 'MCD', 'MMM', 'MO', 'RTX', 'WMT']
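
The counting logic can be reproduced without pandas. The sketch below is illustrative only: it uses just the rows visible in the truncated table above (the `UK` placeholder row is left out), so it does not reproduce the 13/30 figure, and the helper name `share_outperforming` is hypothetical.

```python
# DIA's percent change as printed above.
BENCHMARK = 549.561518336611

# Percent changes for the rows visible in the truncated table.
percent_changes = {
    "AA": -15.035904,
    "AXP": 599.386359,
    "BA": 673.135454,
    "CAT": 1875.077159,
    "COKE": 511.647517,
    "T": 130.189428,
    "TRV": 508.846704,
    "WMT": 945.854737,
}


def share_outperforming(changes, benchmark):
    """Return the symbols that beat the benchmark and their share in percent."""
    winners = sorted(s for s, pct in changes.items() if pct > benchmark)
    return winners, len(winners) / len(changes) * 100


winners, share = share_outperforming(percent_changes, BENCHMARK)
print(f"{share:.2f}% of this subset beat DIA: {winners}")
```

On this eight-row subset, four symbols (AXP, BA, CAT, WMT) clear the benchmark, all of which also appear in the full outperformer list.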


In [18]:
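def print_std(df):
    # Sample standard deviation of the components' percent changes (DIA itself excluded)
    std = df.drop("DIA")["Percent Change"].std()
    print(f"The standard deviation of the Dow components' percent changes is {std}")
    
print_std(df_processed)

The standard deviation of the Dow components' percent changes is 504.0683919778865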


## Conclusions

43.33% (13/30) of the companies in the Dow Jones in 1998 beat DIA, the ETF tracking the Dow Jones, over the same time period. The median outperformer rose 711.66%, compared to 549.56% for DIA. The percent changes of the Dow components have a standard deviation of 504.07 percentage points.

A few of these companies went bankrupt and were delisted. Their prices show up as `NaN` in our dataframe because there was no stock data for their start date, end date, or both. A stock that goes to zero has lost 100% of its value, but the pipeline records these companies with a placeholder of -200%, so the standard deviation above is somewhat overstated. Either way, they are counted among the 17 companies that did not beat DIA.
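
A quick way to see how much the placeholder for delisted companies matters is to recompute the sample standard deviation with different values. The sketch below compares -100 (the simple percent change for a price that falls to zero) with -200 (the value that appears in the table). It uses only the visible rows of the table plus one delisted company, so the numbers are illustrative and are not the 30-company result. The helper `percent_change` and the starting price of 42.0 are hypothetical.

```python
from statistics import stdev

# Percent changes for the visible rows of the table (an illustrative subset).
visible = [-15.035904, 599.386359, 673.135454, 1875.077159,
           511.647517, 130.189428, 508.846704, 945.854737]


def percent_change(start, end):
    """Simple price return in percent."""
    return (end - start) / start * 100


# A price that falls to zero loses 100%, whatever the (hypothetical) start price.
total_loss = percent_change(42.0, 0.0)

# Sample standard deviation (n - 1 denominator, like pandas' default .std()).
std_with_minus_100 = stdev(visible + [total_loss])
std_with_minus_200 = stdev(visible + [-200.0])

print(f"percent change to zero: {total_loss:.0f}%")
print(f"std with -100 placeholder: {std_with_minus_100:.2f}")
print(f"std with -200 placeholder: {std_with_minus_200:.2f}")
```

The -200 placeholder sits further from the mean than -100, so it produces the larger standard deviation. The count of outperformers is unaffected by which value is used, since both are far below the benchmark.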

This suggests that if you were to invest in a single Dow Jones company, your chance of beating the market is likely much higher than the minuscule 10% sometimes implied by dogmatic indexers.

Instead of suggesting that one's chances of beating the market are necessarily low, I think it's better to conclude that most people are not willing to accept the risk of straying from the market average.