# Introduction to ARMA Models with Financial Data

In [21]:
import yfinance as yf
import pandas as pd

We can use the yfinance package in Python, which pulls data from Yahoo's publicly available API. We will pull data for the S&P 500 using the ticker SPY, the SPDR S&P 500 ETF that tracks the index. We do this by creating a ticker object for SPY. Once we have the ticker object (spy), we can access its methods and attributes, such as history. The history method pulls historical data according to a few parameters, including:

- period: "1d", "5d", "1mo", "3mo", "6mo", "1y", "2y", "5y", "10y", "ytd", "max"
- interval: "1m", "2m", "5m", "15m", "30m", "60m", "90m", "1h", "1d", "5d", "1wk", "1mo", "3mo"
- start: formatted as (yyyy-mm-dd) or datetime.
- end: formatted as (yyyy-mm-dd) or datetime.

Below we use period to set how far back the data goes and interval to set how often it is sampled between then and now. Alternatively, we could pass start and end instead of period.
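To make the two ways of calling history concrete, here is a minimal sketch of a helper that builds the keyword arguments for either style. The helper name history_kwargs is hypothetical and not part of yfinance, the allowed values are copied from the lists above, and the rule that period and start/end cannot be mixed is our own assumption rather than documented library behavior.

```python
# Allowed values, copied from the parameter lists above.
VALID_PERIODS = {"1d", "5d", "1mo", "3mo", "6mo", "1y", "2y", "5y", "10y", "ytd", "max"}
VALID_INTERVALS = {"1m", "2m", "5m", "15m", "30m", "60m", "90m", "1h", "1d", "5d",
                   "1wk", "1mo", "3mo"}


def history_kwargs(interval="1d", period=None, start=None, end=None):
    """Build keyword arguments for Ticker.history (hypothetical helper)."""
    if interval not in VALID_INTERVALS:
        raise ValueError(f"unknown interval: {interval!r}")
    if period is not None:
        # Assumption: we treat period and start/end as mutually exclusive.
        if start is not None or end is not None:
            raise ValueError("pass either period or start/end, not both")
        if period not in VALID_PERIODS:
            raise ValueError(f"unknown period: {period!r}")
        return {"period": period, "interval": interval}
    if start is None:
        raise ValueError("pass either period or start")
    kwargs = {"start": start, "interval": interval}
    if end is not None:
        kwargs["end"] = end
    return kwargs


# Style 1: a look-back window ending today.
by_period = history_kwargs(period="1y")

# Style 2: explicit dates formatted as yyyy-mm-dd. The end date appears to be
# treated as exclusive, so we go one day past the last date we want.
by_dates = history_kwargs(start="2022-03-25", end="2023-03-25")

# With the ticker object from this notebook, either dictionary could be used as:
# hist = spy.history(**by_dates)
```

Building the arguments separately from the network call keeps the choice between the two styles explicit and easy to check before any data is requested.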

In [20]:
# Create a ticker object for SPY and pull one year of daily data
spy = yf.Ticker("SPY")
hist = spy.history(period="1y", interval="1d")
hist.head()

| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits | Capital Gains |
|---|---|---|---|---|---|---|---|---|
| 2022-03-25 00:00:00-04:00 | 443.643563 | 445.433248 | 440.959034 | 445.148071 | 77101300 | 0.0 | 0.0 | 0.0 |
| 2022-03-28 00:00:00-04:00 | 444.528589 | 448.314453 | 442.561909 | 448.314453 | 68529800 | 0.0 | 0.0 | 0.0 |
| 2022-03-29 00:00:00-04:00 | 452.355965 | 454.37183 | 449.563283 | 453.860474 | 86581500 | 0.0 | 0.0 | 0.0 |
| 2022-03-30 00:00:00-04:00 | 452.670645 | 453.516333 | 448.865125 | 451.057983 | 79666900 | 0.0 | 0.0 | 0.0 |
| 2022-03-31 00:00:00-04:00 | 450.261474 | 451.116975 | 443.643587 | 444.115601 | 121699900 | 0.0 | 0.0 | 0.0 |


To begin any analysis of time series data, it is useful to look at the trace plot, which here is a plot of the historical stock price against time.

In [24]:
# Keep only the closing price, the series used for the trace plot
df = pd.DataFrame(hist, columns=['Close'])
print(df)  # pandas truncates the middle rows of a long frame

                                Close
Date                                 
2022-03-25 00:00:00-04:00  445.148071
2022-03-28 00:00:00-04:00  448.314453
2022-03-29 00:00:00-04:00  453.860474
2022-03-30 00:00:00-04:00  451.057983
2022-03-31 00:00:00-04:00  444.115601
...                               ...
2023-03-20 00:00:00-04:00  393.739990
2023-03-21 00:00:00-04:00  398.910004
2023-03-22 00:00:00-04:00  392.109985
2023-03-23 00:00:00-04:00  393.170013
2023-03-24 00:00:00-04:00  395.750000

[251 rows x 1 columns]
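
The trace plot itself can be drawn with matplotlib. The following is a minimal sketch, not necessarily the plot originally intended here: the function name plot_trace, the axis labels, and the figure size are our own choices, and the small sample frame is built from the first five closing prices printed above so that the block runs without a network call. In the notebook we would pass df instead of sample.

```python
import matplotlib.pyplot as plt
import pandas as pd

# First five closing prices from the output above, standing in for the full frame.
sample = pd.DataFrame(
    {"Close": [445.148071, 448.314453, 453.860474, 451.057983, 444.115601]},
    index=pd.to_datetime(
        ["2022-03-25", "2022-03-28", "2022-03-29", "2022-03-30", "2022-03-31"]
    ),
)
sample.index.name = "Date"


def plot_trace(frame, column="Close", title="SPY daily close"):
    """Plot one price column against its date index (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(frame.index, frame[column])
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{column} (USD)")
    ax.set_title(title)
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return ax


ax = plot_trace(sample)  # in the notebook: ax = plot_trace(df)
```

In a notebook the figure renders when the cell finishes. Returning the axes makes it easy to add reference lines or annotations afterwards.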

## Stationarity