# ARIMA models in Python

Course Description: Have you ever tried to predict the future? What lies ahead is a mystery which is usually only solved by waiting. In this course, you will stop waiting and learn to use the powerful ARIMA class models to forecast the future. You will learn how to use the statsmodels package to analyze time series, to build tailored models, and to forecast under uncertainty. How will the stock market move in the next 24 hours? How will the levels of CO2 change in the next decade? How many earthquakes will there be next year? You will learn to solve all these problems and more.

## ARMA models
* Time series are everywhere:
    * Science
    * Technology
    * Business
    * Finance
    * Policy
* ARIMA models are one of the go-to time-series tools
* **Trend:** a positive trend is a line that generally slopes up; a negative trend is a line that generally slopes down
* **Seasonality:** has patterns that repeat at regular intervals, for example high sales every weekend
* **Cyclicality:** in contrast to seasonality, has a repeating pattern but no fixed periods/time intervals.
* **White noise:** has uncorrelated values
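
As a rough illustration of the components defined above, the sketch below builds a synthetic series from a trend, a weekly seasonal pattern, and white noise. The slope, the period of 7, and the helper `lag1_autocorr` are assumptions chosen for the example, not values from the course.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1000)

trend = 0.05 * t                               # positive trend: slopes up
seasonality = 2.0 * np.sin(2 * np.pi * t / 7)  # pattern repeats every 7 steps
white_noise = rng.normal(size=t.size)          # uncorrelated values

series = trend + seasonality + white_noise


def lag1_autocorr(x):
    """Sample correlation between x[t] and x[t-1] (hypothetical helper)."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]


print("white noise lag-1 autocorrelation:", round(lag1_autocorr(white_noise), 3))
print("full series lag-1 autocorrelation:", round(lag1_autocorr(series), 3))
```

The white noise shows almost no correlation between neighboring values, while the trending series is strongly autocorrelated.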

#### Stationarity
* To be modeled with an ARMA model, a time series must be stationary
* **Stationary:** Means that the distribution of the data doesn't change with time. For a time series to be stationary, it must fulfill three criteria:
    * **Trend stationary:** series has zero trend
    * **Variance is constant:** the average distance of the data points from the zero line isn't changing
    * **Autocorrelation is constant:** how each value in the time series is related to its neighbors stays the same.
* For train-test split, the data must be split in time (not shuffled or reordered)
* We train on the data earlier in the time series and test on the data that comes later
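
A minimal sketch of a time-ordered split, assuming a hypothetical daily DataFrame `df` and an 80/20 cutoff (both are illustrative choices):

```python
import numpy as np
import pandas as pd

# Hypothetical daily series used only for illustration
index = pd.date_range("2020-01-01", periods=100, freq="D")
df = pd.DataFrame({"close": np.arange(100, dtype=float)}, index=index)

# Split by position so that the order in time is preserved (no shuffling)
n_train = int(len(df) * 0.8)
train = df.iloc[:n_train]   # earlier observations
test = df.iloc[n_train:]    # later observations

print(train.index.max(), "<", test.index.min())
```

Every training observation comes before every test observation, so the model is never evaluated on data that precedes what it was trained on.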


#### Making time series stationary:
#### Augmented Dickey-Fuller Test:
* Tests for trend non-stationarity
* The null hypothesis is that the time series is non-stationary due to trend

```
from statsmodels.tsa.stattools import adfuller
results = adfuller(df['close'])
```
* 0th element: test statistic
    * More negative means more likely to be stationary
* 1st element: p-value
    * If p-value is small (smaller than 0.05) $\Rightarrow$ reject null hypothesis (reject non-stationarity)
* 4th element: dictionary of critical values of the test statistic
    
* **Plotting time series can stop you from making incorrect assumptions and ends up saving you time!**

* Remember: the Dickey-Fuller test only tests for trend stationarity; it does not check whether the variance or the autocorrelation changes over time.

* Making a time series stationary $\Rightarrow$ A bit like feature engineering in classic ML.

* One very common way to make a time series stationary is to **take first differences.**
* For some time series, **we may need to take the difference more than once.**
* **Sometimes, we will need to perform other transformations to make the time series stationary.**

#### Other transformations
* **Take the log:** `np.log(df)`
* **Take the square root:** `np.sqrt(df)`
* **Take the proportional change:** `df / df.shift(1)` (or `df.pct_change()` for the change relative to the previous value)
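
A small sketch of these transformations on a hypothetical price column. The proportional change is computed here with pandas' `pct_change`, which assumes the intended definition is $(x_t - x_{t-1}) / x_{t-1}$.

```python
import numpy as np
import pandas as pd

# Hypothetical prices growing by 10% per step
df = pd.DataFrame({"close": [100.0, 110.0, 121.0, 133.1]})

log_df = np.log(df)                 # log transform
sqrt_df = np.sqrt(df)               # square-root transform
pct_df = df.pct_change().dropna()   # (x_t - x_{t-1}) / x_{t-1}
log_diff = np.log(df).diff().dropna()  # log return, close to pct change when small

print(pct_df)
print(log_diff)
```

Each transformation that compares neighboring rows loses the first observation, which is why `dropna()` is applied.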




### Autoregressive (AR) Models
* In an AR(1) model:
    * Today's value = a mean + a fraction ($\phi$) of yesterday's value + noise
    * $R_t = \mu + \phi R_{t-1} + \epsilon_t$
* Since there is only 1 lagged value on the right hand side, this is called an AR model of order 1, or simply an AR(1) model.
* If the AR parameter **$\phi$ is 1**, then the process is a **random walk**.
* If **$\phi$ is 0**, then the process is **white noise**.
* In order for the process to be **stable** and **stationary**, $\phi$ has to be between -1 and 1
    * -1 < $\phi$ < 1
* **Negative $\phi$:** Mean reversion
* **Positive $\phi$:** Momentum
* The autocorrelation **decays exponentially at a rate of $\phi$.**
    * This means that if $\phi$ is 0.9:
        * the lag-1 autocorrelation is 0.9 
        * the lag-2 autocorrelation is $0.9^2$
        * the lag-3 autocorrelation is $0.9^3$
        * ... etc. ...
    * When $\phi$ is negative, the autocorrelation function still decays exponentially, but the signs of the autocorrelation function reverse at each lag.
    
* Higher Order AR Models:
    * AR(1)
        * $R_t = \mu + \phi_1 R_{t-1} + \epsilon_t$
    * AR(2)
        * $R_t = \mu + \phi_1 R_{t-1} + \phi_2 R_{t-2} + \epsilon_t$
    * AR(3)
        * $R_t = \mu + \phi_1 R_{t-1} + \phi_2 R_{t-2} + \phi_3 R_{t-3} + \epsilon_t$
    * etc.
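
The exponential decay of the AR(1) autocorrelation described above can be checked against the theoretical autocorrelation function that statsmodels computes. This is a sketch for two illustrative values of $\phi$; note that the `ar` array takes $-\phi$ as its second element.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

for phi in (0.9, -0.9):
    # The lag polynomial is written as 1 - phi*L, hence the sign flip
    process = ArmaProcess(np.array([1, -phi]), np.array([1]))
    acf = process.acf(lags=10)[:4]               # theoretical ACF at lags 0..3
    expected = np.array([phi ** k for k in range(4)])
    print("phi =", phi)
    print("  theoretical ACF:", np.round(acf, 3))
    print("  phi ** lag     :", np.round(expected, 3))
    print("  stationary     :", process.isstationary)
```

For the negative $\phi$, the magnitudes decay in the same way while the sign alternates from one lag to the next.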

#### Simulating an AR Process

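```
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess

# AR(1) with phi = +0.9: zero-lag coefficient 1, then -phi
ar = np.array([1, -0.9])
ma = np.array([1])
AR_object = ArmaProcess(ar, ma)
simulated_data = AR_object.generate_sample(nsample=1000)
plt.plot(simulated_data)
```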
* The convention for defining the order and parameters of the AR process is a little counterintuitive:
    * You must include the zero-lag coefficient of 1, and the sign of the other coefficient is the opposite of what we have been using. 
    * For example, for an AR(1) process with $\phi$ = **+0.9**, the second element of the ar array should be the opposite sign, **-0.9**
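
One way to avoid sign mistakes is to wrap the convention in a small function. The helper `make_ar_process` below is hypothetical (it is not part of statsmodels); it takes the coefficients as written in the equations and flips the signs for you.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess


def make_ar_process(phis):
    """Hypothetical helper: build an ArmaProcess from AR coefficients given
    in the R_t = mu + phi_1 * R_{t-1} + ... convention."""
    # Prepend the zero-lag coefficient 1 and negate the phis
    ar = np.r_[1, -np.asarray(phis, dtype=float)]
    ma = np.array([1.0])
    return ArmaProcess(ar, ma)


process = make_ar_process([0.9])
print(process.ar)  # array passed to ArmaProcess, with the sign reversed

np.random.seed(0)
sample = process.generate_sample(nsample=1000)
print(sample[:5])
```

The same helper works for higher orders, for example `make_ar_process([0.5, 0.3])` for an AR(2).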
    
#### Estimating and Forecasting an AR Model
* Statsmodels has another module for estimating the parameters of a given AR model

#### Estimating an AR Model
* To estimate parameters from data (simulated):

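```
# Legacy API: statsmodels.tsa.arima_model.ARMA was removed in statsmodels 0.13;
# newer releases use statsmodels.tsa.arima.model.ARIMA with order=(p, d, q).
from statsmodels.tsa.arima_model import ARMA
mod = ARMA(simulated_data, order=(1,0))  # order=(p, q): AR(1), no MA terms
result = mod.fit()
```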
* The arguments to `ARMA` are: 1) the data you are trying to fit and 2) the order of the model
    * An order (1,0) would mean you're fitting the data to an AR(1) model.
    * An order (2,0) would mean you're fitting the data to an AR(2) model.
    * The second part of the order is the MA part (discussed in the next chapter).
    * To see the full output, use the summary method on result:
        * `print(result.summary())`
            * `const` = $\mu$
            * `ar.L1.y` = $\phi$
    * If you just want to see the coefficients rather than the entire regression output, you can use:
        * `print(result.params)`
        * returns array of the fitted coefficients $\mu$ and $\phi$
        
#### Forecasting an AR Model 
* To do forecasting, both in sample and out of sample, you still create an instance of the class using ARMA, and use `.plot_predict` to do forecasting

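```
# Legacy API: statsmodels.tsa.arima_model.ARMA was removed in statsmodels 0.13
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_model import ARMA
mod = ARMA(simulated_data, order=(1,0))
res = mod.fit()
# Date strings require data with a DatetimeIndex; for a plain array, pass integer positions
res.plot_predict(start='2016-07-01', end='2017-06-01')
plt.show()
```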

#### Choosing the Right Model
* In practice, you will ordinarily not be told the order of the model that you're trying to estimate
* Two techniques to determine order: 
    * The **Partial Autocorrelation Function (PACF):** measures the incremental benefit of adding another lag
        * **`plot_pacf`:** the statsmodels function for plotting the partial autocorrelation function; same usage as `plot_acf`
    * The **Information criteria:** adjust the goodness-of-fit of a model by imposing a penalty based on the number of parameters used.
    * Two popular adjusted goodness-of-fit measures:
        * **AIC (Akaike Information Criterion)**
        * **BIC (Bayesian Information Criterion)**
        * In practice, the best way to use the information criteria is to fit several models, each with a different number of parameters, and choose the one with the lowest information criterion
        * Both AIC and BIC are included in the full estimation output of an ARMA model (`result.summary()`)
        * To get solely the AIC or BIC statistics:
            * `result.aic`
            * `result.bic`
    
#### PACF
```
from statsmodels.graphics.tsaplots import plot_pacf
plot_pacf(x, lags=20, alpha=0.05)
```
* The input `x` is a series or array 
* The argument `lags` indicates how many lags of the partial autocorrelation function will be plotted 
* The `alpha` argument sets the significance level of the confidence interval; `alpha=0.05` draws a 95% interval



### Moving Average (MA) and ARMA Models
#### Describe Model
* In a moving average (MA) model, today's value depends on current and past noise terms rather than on past values of the series
* Mathematical Description of an MA(1) Model:
    * Today's value equals a mean plus noise, plus a fraction $\theta$ of yesterday's noise
    * $R_t = \mu + \epsilon_t + \theta\epsilon_{t-1}$
    * Since there is only one lagged error on the right hand side, this is called an MA model of order 1, or simply an MA(1) model. 
    * If $\theta$ is 0, then the process is white noise
    * MA models are stationary for all values of $\theta$
    * **Negative $\theta$: One-Period Mean Reversion**; a shock two periods ago would have **no** effect on today's return, only the shocks now and last period
    * **Positive $\theta$: One-Period Momentum**
    * **Note:** One-period autocorrelation is $\theta / (1 + \theta^2)$, not $\theta$
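
The note on the one-period autocorrelation can be verified with the theoretical autocorrelation function from statsmodels. The values of $\theta$ below are illustrative.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

for theta in (0.5, -0.5):
    # MA(1): no AR terms, so the ar array holds only the zero-lag coefficient
    process = ArmaProcess(np.array([1]), np.array([1, theta]))
    acf = process.acf(lags=10)[:4]          # theoretical ACF at lags 0..3
    formula = theta / (1 + theta ** 2)      # lag-1 autocorrelation from the note
    print("theta =", theta)
    print("  theoretical ACF:", np.round(acf, 3))
    print("  theta / (1 + theta^2):", round(formula, 3))
```

The autocorrelation is zero beyond lag 1, which matches the statement that a shock two periods ago has no effect on today's value.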
    
#### Simulating an MA Process

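```
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess

ar = np.array([1])
# MA(1) with theta = +0.5: zero-lag coefficient 1, then theta (no sign flip)
ma = np.array([1, 0.5])
MA_object = ArmaProcess(ar, ma)
simulated_data = MA_object.generate_sample(nsample=1000)
plt.plot(simulated_data)
```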
* For an MA(1), the AR order is just an array containing 1 
* The MA order is an array containing 1 and the MA(1) parameter $\theta$
* Unlike with the AR simulation, no need to reverse the sign of $\theta$