# Looking at the historical US market returns

## Imports

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import math

## Data

### Shiller's historical S&P 500 data

In [3]:
df = pd.read_excel('../data/shiller_sp500.xls', sheet_name='Data', header=None, skiprows=8)

#### Manually adjust column names

The sheet has some weird formatting: a preamble sits above the first several columns, and two columns are unused.  The header is therefore skipped on load and the names are assigned by hand.  I chose slightly shorter names here for convenience and drop the columns with no actual data.

In [4]:
dropped_columns = ["No Data 1", "No Data 2"]

columns = [
    "Date",
    "S&P",
    "Dividend",
    "Earnings",
    "CPI",
    "Date Fraction",
    "Long Interest Rate",
    "Real Price",
    "Real Dividend",
    "Real Total Return Price",
    "Real Earnings",
    "Real TR Scaled Earnings",
    "CAPE",
    dropped_columns[0],
    "TR CAPE",
    dropped_columns[1],
    "Excess CAPE Yield",
    "Monthly Bond Returns",
    "Monthly Real Bond Returns",
    "10 Year Real Stock Return",
    "10 Year Real Bond Return",
    "10 Year Excess Return"
]

df.columns = columns

# Sanity check
filled_dropped_rows = df[df[dropped_columns].notna().any(axis=1)]
print("Was there any data in the dropped columns?", len(filled_dropped_rows) > 0)

# Drop the empty columns
df = df.drop(columns=dropped_columns)

Was there any data in the dropped columns? False
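
The same check can be generalized so that empty columns are found instead of named by hand. This is a minimal sketch on a made-up `toy` frame (the values are illustrative, not taken from Shiller's sheet), assuming the goal is simply to drop every column that contains no data at all.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few columns of the loaded sheet.
toy = pd.DataFrame({
    "CAPE": [np.nan, 15.2, 16.1],
    "No Data 1": [np.nan, np.nan, np.nan],
    "TR CAPE": [np.nan, 17.0, 18.3],
})

# A column is empty when every entry is NaN.
empty_columns = toy.columns[toy.isna().all()].tolist()
cleaned = toy.drop(columns=empty_columns)

print("Empty columns:", empty_columns)
print("Remaining columns:", list(cleaned.columns))
```

Naming the columns explicitly, as done above, has the advantage that an unexpected change in the sheet layout shows up in the sanity check instead of being dropped silently.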


#### Drop trailing row that contains text instead of numeric entries

In some columns, the last row contains text describing the extrapolation methodology instead of a number.  That row would confuse plots/analyses, so it is dropped.

In [5]:
last_index = len(df) - 1
df = df.drop(last_index)
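
Dropping the last row by position assumes the text always sits there. As a hedged alternative, the sketch below flags any row in which a non-empty cell fails to parse as a number, using `pd.to_numeric` with `errors="coerce"`. The `toy` frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: two numeric rows followed by a row of text notes.
toy = pd.DataFrame({
    "Date": [2024.01, 2024.02, "note"],
    "S&P": [4800.0, 5000.0, "extrapolated"],
    "CAPE": [None, 33.0, None],
})

# Coerce every column to numeric; unparseable cells become NaN.
numeric = toy.apply(pd.to_numeric, errors="coerce")

# A cell held text if it was present originally but is NaN after coercion.
text_cells = numeric.isna() & toy.notna()
text_rows = text_cells.any(axis=1)

cleaned = toy[~text_rows]
print("Rows containing text:", text_rows[text_rows].index.tolist())
```

Genuinely missing values (such as the early `CAPE` entries) are left alone, because they are NaN both before and after coercion.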

## Analysis

### Dividend re-invested Compound Annual Growth Rate (CAGR)

The purpose of these formulas is to provide a simple way to compare the CAGR between time periods, with dividends included.  Several assumptions are made along the way; they are noted in the docstrings and comments of the functions.
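
For reference, the calculation performed by the functions in the following cell can be written compactly. The symbols are notation introduced here, not Shiller's: $P_t$ is the "S&P" value and $D_t$ the "Dividend" value in month $t$, the sum runs over every month from the initial month $t_0$ to the final month $t_1$ inclusive, and $T$ is the difference between the final and initial "Date" values.

$$
f = \sum_{t = t_0}^{t_1} \frac{D_t / 12}{P_t},
\qquad
\text{CAGR} = \left( \frac{(1 + f)\, P_{t_1}}{P_{t_0}} \right)^{1/T} - 1
$$

When dividends are excluded, $f = 0$ and the expression reduces to the price-only CAGR.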

In [6]:
def cagr(initial: float, final: float, dividends: bool=True) -> float:
    """
    Calculates the Compound Annual Growth Rate (CAGR) between the initial and final periods,
    inclusive, where dividends are reinvested by default.
    - initial: The start of the period, encoded as (year).(month / 100), e.g. 2014.01
    - final: The end of the period, in the same encoding.
    - dividends: Whether reinvested dividends are included (default True).

    Example: cagr(2014.01, 2024.03) calculates the CAGR between January 2014 and March 2024, with dividends reinvested.
    """

    rows = df[(df["Date"] >= initial) & (df["Date"] <= final)]
    return df_cagr(rows, dividends)

def df_cagr(rows: pd.DataFrame, dividends: bool) -> float:
    """
    Calculates the CAGR for the input dataframe, where the format is expected to be the same as
    extracted from the Shiller data.
    """

    initial_row = rows.iloc[0]
    final_row = rows.iloc[-1]
    
    dividend_ratio = calculate_dividend_ratio(rows, dividends)

    initial = initial_row["S&P"]
    final = dividend_ratio * final_row["S&P"]
    period = final_row["Date"] - initial_row["Date"]

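    # Yearly compounding.  "Date" is encoded as year + month / 100, so the difference above is
    # an exact number of years only when both endpoints fall in the same calendar month;
    # fractional-year contributions are otherwise ignored.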
    return math.pow(final / initial, 1 / period) - 1

def calculate_dividend_ratio(rows: pd.DataFrame, dividends: bool) -> float:
    """
    Calculates the additional growth factor due to reinvesting dividends.
    """

    if not dividends:
        print("Dividends not reinvested.")
        return 1

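    # The "Dividend" column is taken to be the S&P dividend over the trailing 12 months
    # (definition inferred from multpl), treated as being in the same units as "S&P".
    # Each month's dividend is approximated as 1/12th of that trailing figure, and the
    # resulting monthly yields are summed rather than compounded.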
    dividend_fraction = (rows["Dividend"] / 12 / rows["S&P"]).sum()
    print("Dividend fraction:", dividend_fraction)
    return 1 + dividend_fraction
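
The period in `df_cagr` is a difference of "Date" values, which are encoded as year + month / 100. A minimal sketch of how that encoding could be converted into calendar years is shown below; `to_year_fraction` is a hypothetical helper, not part of the notebook. The "Date Fraction" column already present in the data would be another option, since differences of it also give calendar spans.

```python
def to_year_fraction(date: float) -> float:
    """Convert a year + month / 100 value (e.g. 2024.03) to fractional years."""
    year = int(date)
    # round() absorbs floating-point error, e.g. 2014.1 is October (month 10).
    month = round((date - year) * 100)
    return year + (month - 1) / 12

start, end = 2014.01, 2024.03

naive_period = end - start                                         # what df_cagr uses
calendar_period = to_year_fraction(end) - to_year_fraction(start)  # 10 years, 2 months

print("Naive period:", naive_period)
print("Calendar period:", calendar_period)
```

For spans that start and end in the same month, such as the January-to-January comparisons in this notebook, the two periods agree, so the results reported here are unaffected.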

##### Comparisons to online resources

The current calculation is somewhat ad hoc in how it uses Shiller's dividend data.  The definition of the "Dividend" column in Shiller's data is inferred from [multpl](https://www.multpl.com/s-p-500-dividend).  As a sanity check, the results are compared to the two online resources listed below.

- [Don't Quit Your Day Job (dqydj)](https://dqydj.com/sp-500-return-calculator/)
- [moneychimp](http://www.moneychimp.com/features/market_cagr.htm)

In [7]:
initial_date = 2014.01
final_date = 2025.01
print("January 2014 to January 2025 comparisons\n\n")

with_dividends = cagr(initial_date, final_date)
moneychimp_dividend = 0.1319
dqydj_dividend = 0.1340
print("Calculated CAGR with dividends:", with_dividends)
print("Relative to moneychimp:", with_dividends / moneychimp_dividend)
print("Relative to dqydj:", with_dividends / dqydj_dividend)

print("\n\n")

without_dividends = cagr(initial_date, final_date, dividends=False)
moneychimp_without = 0.1111
dqydj_without = 0.1141
print("Calculated CAGR without dividends:", without_dividends)
print("Relative to moneychimp:", without_dividends / moneychimp_without)
print("Relative to dqydj:", without_dividends / dqydj_without)

January 2014 to January 2025 comparisons


Dividend fraction: 0.19708216994567526
Calculated CAGR with dividends: 0.13243729097845636
Relative to moneychimp: 1.0040734721641877
Relative to dqydj: 0.98833799237654



Dividends not reinvested.
Calculated CAGR without dividends: 0.11406872818940905
Relative to moneychimp: 1.0267212258272642
Relative to dqydj: 0.9997259262875465


## Todo

- Figure out discrepancies in dividend reinvestment.  The current calculations seem to be in the right ballpark, but the cause of the discrepancies relative to (and between) the online resources isn't obvious.
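
One possible contributor, offered as an assumption and not verified against either site, is that `calculate_dividend_ratio` sums the monthly dividend yields instead of compounding them. The sketch below compares the two on made-up prices and dividends; `additive_growth` and `compounded_growth` are hypothetical names, and the compounded variant models each month's dividend as buying additional shares at that month's price.

```python
import pandas as pd

def additive_growth(prices: pd.Series, dividends: pd.Series) -> float:
    """Growth factor with monthly yields summed, as in calculate_dividend_ratio."""
    fraction = (dividends / 12 / prices).sum()
    return (1 + fraction) * prices.iloc[-1] / prices.iloc[0]

def compounded_growth(prices: pd.Series, dividends: pd.Series) -> float:
    """Growth factor with each month's yield reinvested into more shares."""
    shares = (1 + dividends / 12 / prices).prod()
    return shares * prices.iloc[-1] / prices.iloc[0]

# Made-up data: three months of prices and trailing 12-month dividends.
prices = pd.Series([100.0, 110.0, 121.0])
dividends = pd.Series([2.4, 2.4, 2.4])

additive = additive_growth(prices, dividends)
compounded = compounded_growth(prices, dividends)

print("Additive growth factor:", additive)
print("Compounded growth factor:", compounded)
```

With positive yields the compounded factor is always at least as large as the additive one, and the gap widens over longer periods, so this would be worth checking before looking at other explanations.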

## Tests/scratch

In [8]:
print("with:", cagr(1990.01, 2021.01))

print("\n")

print("without", cagr(1990.01, 2021.01, dividends=False))

Dividend fraction: 0.6417015514514912
with: 0.09834663521039322


Dividends not reinvested.
without 0.08092223223377548


In [15]:
print(df.iloc[0])
df[df["Dividend"] > 0.26].iloc[0]

Date                             1871.01
S&P                                 4.44
Dividend                            0.26
Earnings                             0.4
CPI                            12.464061
Date Fraction                1871.041667
Long Interest Rate                  5.32
Real Price                    115.921761
Real Dividend                   6.788211
Real Total Return Price       115.921761
Real Earnings                  10.443402
Real TR Scaled Earnings        10.443402
CAPE                                 NaN
TR CAPE                              NaN
Excess CAPE Yield                    NaN
Monthly Bond Returns            1.004177
Monthly Real Bond Returns            1.0
10 Year Real Stock Return       0.130609
10 Year Real Bond Return        0.092504
10 Year Excess Return           0.038106
Name: 0, dtype: object


Date                             1872.01
S&P                                 4.86
Dividend                          0.2633
Earnings                          0.4025
CPI                            12.654392
Date Fraction                1872.041667
Long Interest Rate                  5.36
Real Price                    124.978862
Real Dividend                   6.770974
Real Total Return Price       132.041182
Real Earnings                  10.350616
Real TR Scaled Earnings        10.935509
CAPE                                 NaN
TR CAPE                              NaN
Excess CAPE Yield                    NaN
Monthly Bond Returns             1.00306
Monthly Real Bond Returns        1.03567
10 Year Real Stock Return       0.107684
10 Year Real Bond Return        0.084931
10 Year Excess Return           0.022753
Name: 12, dtype: object