In [None]:
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import pmdarima as pm
import numpy as np

from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from pmdarima.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error

## Load Data

We begin by loading the data from yfinance; see arma_model.ipynb for the previous steps and for details on loading stock data from yfinance in Python.

In [None]:
spy = yf.Ticker("SPY")
# hist = spy.history(period = "1y", interval= "1d")
hist = spy.history(start = "2010-01-04", end = "2020-02-01")
df = pd.DataFrame(hist, columns=['Close'])
df.head()

## Background

Previously, we concluded that the closing price of $SPY could be modeled adequately by a random walk model. The reasons for this will become intuitive once we take a brief look at the efficient market hypothesis.

### Efficient Market Hypothesis

The efficient market hypothesis states that the price of a stock reflects an efficient market: all available information about the company is already reflected in the current value of the stock. Consequently, there can be no arbitrage, that is, no riskless or nearly riskless profit-taking based on market inefficiencies that cause a stock's current price to differ from its actual value. While this view is controversial, it underlies a great deal of economic theory. It implies that consistently beating the market using information about the price of a security up to time *t* is impossible, because all such information is publicly available through time *t*.

This can be mathematically expressed in the following way: 

$E[X_t|X_{t-1}, X_{t-2}, \dots , X_{-\infty}] = 0$

where $X_t$ is the current asset return (or, for prices, the price change) at time *t*, and we condition on all previous values $X_{-\infty: t-1}$.
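
As a rough illustration of this condition, the sketch below simulates returns as i.i.d. zero-mean Gaussian noise (a simplifying assumption, not something the hypothesis itself requires) and regresses each return on the previous one. The seed, sample size, and scale are arbitrary choices. Note that a near-zero slope only shows the absence of *linear* predictability, which is weaker than the full conditional-expectation statement.

```python
import numpy as np

# Assumption: returns are i.i.d. N(0, 0.01^2); values chosen for illustration only.
rng = np.random.default_rng(0)
n = 100_000
returns = rng.normal(loc=0.0, scale=0.01, size=n)

# Regress X_t on X_{t-1}; np.polyfit returns [slope, intercept] for deg=1.
x_prev = returns[:-1]
x_curr = returns[1:]
slope, intercept = np.polyfit(x_prev, x_curr, deg=1)

# Both values should be close to zero if the past carries no linear signal.
print(f"slope: {slope:.4f}, intercept: {intercept:.6f}")
```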


However, this does not imply that there is no way to make a profit. Recall that arbitrage refers to a riskless method of extracting profit. Investments that carry considerably more risk do exist, and that risk creates opportunities for profit. As a result, we can use the notion of risk, and model related quantities such as volatility, to make profits.

### Volatility

Volatility and risk are closely related terms in financial assets such as stocks. In the stock market, volatility refers to the amount and frequency of price fluctuations. Risk, meanwhile, refers to the probability of losing money when investing in a particular stock.

We can connect risk and volatility through the general principle that the higher the volatility, the higher the risk of a financial asset. A simple illustration:

- Company A has a stock price of $100, which we are confident (for practical purposes, ignore how this confidence is obtained) will move by no more than +/-5% in the next quarter.
- Company B has a stock price of $100, which we are confident can move by as much as +/-50% in the next quarter.

Although this is a very simplified example, we can see that the added volatility of Company B brings added risk to the asset.
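
To make the comparison concrete, the following small sketch computes the price range implied by each company's assumed maximum move. The dictionary and variable names are illustrative only.

```python
# Starting price and assumed maximum quarterly move for each company.
price = 100.0
moves = {"Company A": 0.05, "Company B": 0.50}

ranges = {}
for name, move in moves.items():
    # Lowest and highest price reachable under the assumed move.
    low, high = price * (1 - move), price * (1 + move)
    ranges[name] = (low, high)
    print(f"{name}: ${low:.2f} to ${high:.2f}")
```

Company B's range is ten times wider than Company A's, so both the potential loss and the potential gain are ten times larger.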

### Volatility Clustering

We can capture volatility clustering (periods in which volatility persists, i.e., stretches of high volatility or of low volatility) by modeling the process with ARCH. Thus, we can leverage these notions of risk and volatility in the hope of making profits on assets while still abiding by the principles of the Efficient Market Hypothesis, which limits us to producing only trivial predictions of the conditional expected value of a stock at time *t* given previous observations.
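
A minimal sketch of this idea, assuming an ARCH(1) process $\epsilon_t = \sigma_t z_t$ with $\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2$ and standard normal $z_t$: the simulated returns themselves are nearly uncorrelated, yet their squares are clearly autocorrelated, which is the signature of volatility clustering. The parameter values, seed, and helper function `lag1_autocorr` are hypothetical choices for illustration, not estimates from the SPY data.

```python
import numpy as np

# Hypothetical ARCH(1) parameters, chosen only for illustration.
omega, alpha = 0.2, 0.4
n = 20_000
rng = np.random.default_rng(42)
z = rng.standard_normal(n)

eps = np.zeros(n)     # simulated returns
sigma2 = np.zeros(n)  # conditional variances
sigma2[0] = omega / (1 - alpha)  # start at the unconditional variance
eps[0] = np.sqrt(sigma2[0]) * z[0]
for t in range(1, n):
    # Today's variance depends on yesterday's squared return.
    sigma2[t] = omega + alpha * eps[t - 1] ** 2
    eps[t] = np.sqrt(sigma2[t]) * z[t]

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D array."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

acf_returns = lag1_autocorr(eps)
acf_squared = lag1_autocorr(eps ** 2)
print(f"lag-1 autocorrelation of returns:         {acf_returns:.3f}")
print(f"lag-1 autocorrelation of squared returns: {acf_squared:.3f}")
```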


### Returns

Let's take a small detour to discuss modeling returns rather than prices. This is a convenient way to model stocks and other financial assets, and it is simple to conceptualize:

Let $R_t$ denote the return of an asset at time *t* and $P_t$ denote the price of the asset at time *t*. Then we can formulate the simple one-period return of an asset as follows:

$R_t = \frac{P_t - P_{t-1}}{P_{t-1}}$ 

The log return, which is often used, is given by the following, where $r_t$ denotes the log return:

$r_t = \log\left(\frac{P_t}{P_{t-1}}\right) = \log P_t - \log P_{t-1}$
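
As a quick check of these two definitions, the sketch below computes both kinds of return on a small made-up price series (the four prices are arbitrary, not SPY data) and verifies the identity $r_t = \log(1 + R_t)$ that links them.

```python
import numpy as np
import pandas as pd

# Made-up closing prices, for illustration only.
prices = pd.Series([100.0, 102.0, 101.0, 105.0], name="Close")

# Simple one-period return: (P_t - P_{t-1}) / P_{t-1}
simple_returns = prices.pct_change().dropna()

# Log return: log(P_t) - log(P_{t-1})
log_returns = np.log(prices).diff().dropna()

# The two are linked by r_t = log(1 + R_t).
assert np.allclose(log_returns, np.log1p(simple_returns))

print(pd.DataFrame({"simple": simple_returns, "log": log_returns}))
```

One convenient property of log returns is that they add across periods: their sum over the series equals the log of the total price ratio, $\log(P_T / P_0)$.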

In [None]:
# convert prices to log t