# Analyzing Recovery Times from Financial Crises

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import yfinance as yf

## Downloading and Exploring Data

Import [S&P 500](https://finance.yahoo.com/quote/%5EGSPC) data from Yahoo Finance, where the ticker is "^GSPC". The data is cached in a local CSV file so that it is downloaded only once.

In [None]:
file = "../data/gspc.csv"

try:
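    # Restore the DatetimeIndex that the date-based code below relies on
    df_sp500 = pd.read_csv(file, index_col="Date", parse_dates=True)
    print("Read historical data from", file)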
except FileNotFoundError:
    df_sp500 = yf.download("^GSPC", start="1900-01-01", multi_level_index=False, auto_adjust=True)
    df_sp500.to_csv(file)
    print("Downloaded historical data and wrote it to", file)

df_sp500.info()
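
A CSV round trip does not preserve the index type on its own, which matters for a cache like the one above because later cells compare the index against date strings. The following is a minimal sketch on a made-up three-row frame (the `toy` values and the in-memory buffer are illustrative assumptions, not part of the real data):

```python
import io

import pandas as pd

# Tiny stand-in for the downloaded frame (values are made up for illustration)
toy = pd.DataFrame(
    {"Close": [100.0, 101.5, 99.8]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)
toy.index.name = "Date"

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer)

# Plain read: "Date" comes back as an ordinary column and the index is a RangeIndex
buffer.seek(0)
plain = pd.read_csv(buffer)

# Read with index_col and parse_dates: the DatetimeIndex is restored
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="Date", parse_dates=True)

print(type(plain.index).__name__)
print(type(restored.index).__name__)
```

Checking `df_sp500.index` in the output of `info()` is a quick way to confirm which of the two cases applies after loading from the cache.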

Preliminary data exploration:

In [None]:
df_sp500.head()

In [None]:
df_sp500.describe()

The _Open_ column contains some zero values, which is unusual for financial data and probably denotes missing values. Let's investigate:

In [None]:
# Count the rows where the opening value is exactly zero
zeros = (df_sp500["Open"] == 0).sum()
print(f"There are {zeros} zeros in this column.")
df_sp500["Open"].plot()
plt.yscale("log")
plt.ylabel("S&P 500 Open")
plt.show()

Notice that the historical record of opening values is incomplete, but it becomes more reliable starting in the early 1980s, thanks to advancements in trading technology. Much of the pre-1980s data was reconstructed from newspapers, end-of-day reports, or monthly summaries, which often included only high, low, close, and volume. More accurate historical data exists, but it is not available for free on Yahoo Finance. In this dataset, missing opening values are filled with **zero**.
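
Because the zeros stand in for missing values, one option is to convert them to `NaN` so that statistics and plots ignore them. This is a minimal sketch on a made-up frame (the `toy` data and the names `open_clean` and `first_valid` are illustrative assumptions; the notebook itself does not perform this step):

```python
import numpy as np
import pandas as pd

# Made-up data: zeros mark the days on which no opening value was recorded
toy = pd.DataFrame(
    {"Open": [0.0, 0.0, 102.0, 103.5], "Close": [100.0, 101.0, 102.5, 103.0]},
    index=pd.to_datetime(["1975-01-02", "1975-01-03", "1985-01-02", "1985-01-03"]),
)

# Replace the zero placeholders with NaN so they are treated as missing
open_clean = toy["Open"].replace(0, np.nan)

# First date with a recorded opening value
first_valid = open_clean.first_valid_index()

print(open_clean.isna().sum())
print(first_valid)
print(open_clean.mean())  # the mean skips NaN by default
```

Applied to the real series, `first_valid_index()` would give a rough idea of where the usable opening values begin, although isolated zeros may still appear after that date.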

Let's plot _Close_ values, which we expect to be more reliable:

In [None]:
df_sp500["Close"].plot()
plt.yscale("log")
plt.ylabel("S&P 500 Close")
plt.show()

To create more complex analyses or representations, like candlestick graphs, we can shorten our time series and consider only recent data. Here we keep data after 2000-01-01, well within the period in which opening values were recorded:

In [None]:
# Build a new frame (rather than modifying a slice in place) with "Date" as a regular column
candlestick = df_sp500[df_sp500.index > "2000-01-01"].reset_index()

fig = go.Figure(data=[
    go.Candlestick(
        x=candlestick["Date"],
        open=candlestick['Open'],
        high=candlestick['High'],
        low=candlestick['Low'],
        close=candlestick['Close']
    )
])

fig.update_layout(
    title=dict(text='S&P 500 Candlestick Graph with Rangeslider'),
    yaxis=dict(title=dict(text='S&P 500 Index'))
)

fig.show()

## Market Recovery Times 

We want to evaluate market recovery times. We start by creating some utility columns. In _Previous Max Close_ we store cumulative max values from the _Close_ column:

In [None]:
df_sp500["Previous Max Close"] = df_sp500["Close"].cummax()

In [None]:
df_sp500.loc[:, ["Close", "Previous Max Close"]].plot()
plt.yscale("log")
plt.show()

The `cummax` method is useful, but we would also like to keep track of the date on which the previous maximum occurred:

In [None]:
# Create a mask specifying where a new max occurs
is_new_max = df_sp500["Close"] == df_sp500["Previous Max Close"]
# Find the corresponding dates. This creates a DatetimeIndex with NaT where is_new_max is False
new_max_dates = df_sp500.index.where(is_new_max)
# Forward-fill the last max date
last_max_dates = pd.Series(new_max_dates).ffill()
# Align index
last_max_dates.index = df_sp500.index

df_sp500["Previous Max Close Date"] = last_max_dates
df_sp500.head()
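
To see what the mask and forward-fill steps produce, here is a small sketch on a made-up six-day series (the prices and the names `close`, `running_max`, and `last_max_dates` are illustrative assumptions that mirror the cell above):

```python
import pandas as pd

# Made-up closing prices on six consecutive business days
close = pd.Series(
    [10.0, 12.0, 11.0, 9.0, 12.5, 12.0],
    index=pd.date_range("2021-01-04", periods=6, freq="B"),
)

# Running maximum of the closing price
running_max = close.cummax()

# True on the days where the close equals the running maximum
is_new_max = close == running_max

# Keep the date on those days (NaT elsewhere), then carry it forward
new_max_dates = close.index.where(is_new_max)
last_max_dates = pd.Series(new_max_dates).ffill()
last_max_dates.index = close.index

print(pd.DataFrame({
    "Close": close,
    "Previous Max Close": running_max,
    "Previous Max Close Date": last_max_dates,
}))
```

Note that the equality test also marks a day whose close merely ties the running maximum, so a tie would reset the date as well.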

In [None]:
recovery_days = df_sp500["Previous Max Close Date"].value_counts()

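# value_counts() sorts by count in descending order, so the longest periods come first
recovery_days = recovery_days[recovery_days > 90]
recovery_days = recovery_days.reset_index()
# "Crash Date" is the date of the peak that precedes each decline
recovery_days.columns = ["Crash Date", "Length (trading days)"]
# Convert trading days to years, assuming about 251 trading days per year
recovery_days["Length (years)"] = (recovery_days["Length (trading days)"] / 251).round(2)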

In [None]:
recovery_days.head(10)

This is a very rudimentary indicator of market crashes, defined here as the periods between successive record highs of the closing price.
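
The counting logic is easier to check on a small example. The sketch below uses made-up prices (the series and the names `peak_date`, `spell_length`, `days_below_peak`, and `ongoing` are illustrative assumptions) and points out two details of the approach: each count includes the peak day itself, and the most recent period may not have recovered yet.

```python
import pandas as pd

# Made-up closing prices on eight consecutive business days
close = pd.Series(
    [100.0, 105.0, 98.0, 95.0, 103.0, 106.0, 101.0, 104.0],
    index=pd.date_range("2022-03-07", periods=8, freq="B"),
)
running_max = close.cummax()

# Date of the most recent record close, carried forward day by day
peak_date = close.index.to_series().where(close == running_max).ffill()

# Number of rows sharing each peak date, longest first
spell_length = peak_date.value_counts()

# The count includes the peak day, so subtract one to get the days spent below the peak
days_below_peak = spell_length - 1

# If the final close is still below the running maximum, the latest period has not recovered
ongoing = close.iloc[-1] < running_max.iloc[-1]

print(spell_length)
print(days_below_peak)
print("Latest period still open:", ongoing)
```

On the real data the same check suggests reading the row for the most recent peak with care, since its length measures time elapsed so far rather than a completed recovery.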