# Analysis Supporting the Importance of Time in the Market

### Objective
- TODO: Incorporate an interactive widget that allows you to change the number of days to skip
    - [example](https://towardsdatascience.com/best-practices-for-writing-reproducible-and-maintainable-jupyter-notebooks-49fcc984ea68)


### Highlights
- A handful of daily returns make a significant difference to long-term performance
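
To make the highlight concrete, here is a minimal sketch of the kind of calculation involved. It assumes that "days to skip" means sitting out the best daily returns; the function name `growth_skipping_best_days` and the toy returns are hypothetical and are not results from the SPY data.

```python
from math import prod


def growth_skipping_best_days(daily_returns, n_skip=0):
    """Growth multiple of 1 unit invested when the n_skip best days are missed.

    Hypothetical helper: a missed day is treated as a 0% return (held in cash).
    """
    if n_skip < 0:
        raise ValueError("n_skip must be non-negative")
    # Positions of the n_skip largest daily returns
    ranked = sorted(range(len(daily_returns)),
                    key=lambda i: daily_returns[i],
                    reverse=True)
    missed = set(ranked[:n_skip])
    # Compound every return except the missed ones
    return prod(1.0 + r for i, r in enumerate(daily_returns) if i not in missed)


# Toy daily returns, made up for illustration only
toy_returns = [0.01, -0.02, 0.05, 0.00, -0.01, 0.03]

full_growth = growth_skipping_best_days(toy_returns)
missed_two = growth_skipping_best_days(toy_returns, n_skip=2)
print(f"fully invested: {full_growth:.4f}, missing best 2 days: {missed_two:.4f}")
```

A function shaped like this could later be wired to `ipywidgets.interact` with an integer range for `n_skip`, which is one possible way to approach the widget in the objective.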

In [1]:
# import libraries
import datetime as dt
import pandas as pd
import yfinance as yf
from dateutil.relativedelta import relativedelta
from ipywidgets import interact

## Import Data

In [2]:
# Get historical stock data
stock_ticker = yf.Ticker('SPY')
stock_df = stock_ticker.history(period="max")

To build the functions below, I'm importing historical SPY data, since SPY closely tracks the S&P 500, an index of 500 of the largest publicly traded companies in the United States. Below we'll examine the raw imported data before cleaning and preprocessing.

In [3]:
stock_df

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Capital Gains
1993-01-29 00:00:00-05:00,24.858349,24.858349,24.734675,24.840681,1003200,0.0,0.0,0.0
1993-02-01 00:00:00-05:00,24.858346,25.017355,24.858346,25.017355,480500,0.0,0.0,0.0
1993-02-02 00:00:00-05:00,24.999676,25.088014,24.946674,25.070347,201300,0.0,0.0,0.0
1993-02-03 00:00:00-05:00,25.105681,25.353027,25.088013,25.335360,529400,0.0,0.0,0.0
1993-02-04 00:00:00-05:00,25.423710,25.494381,25.141028,25.441378,531500,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...
2023-12-22 00:00:00-05:00,473.859985,475.380005,471.700012,473.649994,67126600,0.0,0.0,0.0
2023-12-26 00:00:00-05:00,474.070007,476.579987,473.989990,475.649994,55387000,0.0,0.0,0.0
2023-12-27 00:00:00-05:00,475.440002,476.660004,474.890015,476.510010,68000300,0.0,0.0,0.0
2023-12-28 00:00:00-05:00,476.880005,477.549988,476.260010,476.690002,77158100,0.0,0.0,0.0


In [4]:
stock_df.info()
stock_df.describe()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7786 entries, 1993-01-29 00:00:00-05:00 to 2023-12-29 00:00:00-05:00
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Open           7786 non-null   float64
 1   High           7786 non-null   float64
 2   Low            7786 non-null   float64
 3   Close          7786 non-null   float64
 4   Volume         7786 non-null   int64  
 5   Dividends      7786 non-null   float64
 6   Stock Splits   7786 non-null   float64
 7   Capital Gains  7786 non-null   float64
dtypes: float64(7), int64(1)
memory usage: 547.5 KB


Statistic,Open,High,Low,Close,Volume,Dividends,Stock Splits,Capital Gains
count,7786.0,7786.0,7786.0,7786.0,7786.0,7786.0,7786.0,7786.0
mean,142.049157,142.893496,141.125669,142.060268,84477590.0,0.012001,0.0,0.0
std,112.408535,113.032173,111.748362,112.437529,92334230.0,0.109004,0.0,0.0
min,24.540326,24.611003,24.204635,24.540319,5200.0,0.0,0.0,0.0
25%,70.642893,71.168676,70.059361,70.61734,10023300.0,0.0,0.0,0.0
50%,93.507949,94.276505,92.710733,93.469711,63153200.0,0.0,0.0,0.0
75%,181.19328,181.765231,180.398108,181.201038,115985200.0,0.0,0.0,0.0
max,476.880005,477.549988,476.26001,476.690002,871026300.0,1.906,0.0,0.0


### Raw Data Observation
- The data covers about 30 years of daily prices (1993 to 2023). I'll want to be able to filter this time frame to simulate different scenarios.
- There is no missing data: every column has 7786 non-null values.
- I'm only going to use the Date and Close columns for the analysis.
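
The first and last observations can be sketched together: pick a date window and keep only the closing price. This is a minimal sketch on a made-up frame that mimics the raw layout (a timezone-aware `DatetimeIndex` named `Date`); the helper name `select_close` and the toy values are assumptions, not part of the notebook.

```python
import pandas as pd


def select_close(df, start, end):
    """Return only the Close column for rows dated within [start, end].

    Hypothetical helper: relies on label slicing of a sorted DatetimeIndex,
    where both endpoints are inclusive.
    """
    return df.loc[start:end, ["Close"]]


# Toy frame shaped like the raw download (values are made up)
toy_index = pd.date_range("2020-01-01", periods=10, freq="B",
                          tz="America/New_York", name="Date")
toy_df = pd.DataFrame(
    {"Close": [100.0 + i for i in range(10)], "Volume": [1000] * 10},
    index=toy_index,
)

window = select_close(toy_df, "2020-01-03", "2020-01-08")
print(window)
```

Slicing on the index only works this way before the index is reset; once `Date` becomes a regular column, a boolean mask on that column would be needed instead.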

## Cleaning

In [5]:
# Move the datetime index into a 'Date' column, then keep only the calendar date (drops the time and timezone)
stock_df.reset_index(inplace=True)
stock_df['Date'] = pd.to_datetime(stock_df['Date']).dt.date

In [7]:
stock_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7786 entries, 0 to 7785
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Date           7786 non-null   object 
 1   Open           7786 non-null   float64
 2   High           7786 non-null   float64
 3   Low            7786 non-null   float64
 4   Close          7786 non-null   float64
 5   Volume         7786 non-null   int64  
 6   Dividends      7786 non-null   float64
 7   Stock Splits   7786 non-null   float64
 8   Capital Gains  7786 non-null   float64
dtypes: float64(7), int64(1), object(1)
memory usage: 547.6+ KB
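
The `object` dtype on `Date` above is a side effect of `.dt.date`, which stores plain Python `date` objects. If vectorized date operations are wanted later, one possible alternative is to drop the timezone and normalize to midnight, which keeps a `datetime64` dtype. The sketch below compares the two on a made-up frame; it is an option to consider, not what the notebook currently does.

```python
import pandas as pd

# Toy frame with a timezone-aware 'Date' column, as produced by reset_index()
toy = pd.DataFrame(
    {"Close": [100.0, 101.0, 102.0]},
    index=pd.date_range("2020-01-02", periods=3, freq="B",
                        tz="America/New_York", name="Date"),
).reset_index()

# Approach used in the cleaning cell: Python date objects, stored as object dtype
as_objects = toy["Date"].dt.date

# Alternative: strip the timezone and the time of day, keeping datetime64
as_datetime64 = toy["Date"].dt.tz_localize(None).dt.normalize()

print(as_objects.dtype)
print(as_datetime64.dtype)
```

Either form works for simple comparisons; the `datetime64` version additionally keeps the `.dt` accessor available for things like grouping by year.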
