# ARIMA models in Python

Course Description: Have you ever tried to predict the future? What lies ahead is a mystery which is usually only solved by waiting. In this course, you will stop waiting and learn to use the powerful ARIMA class models to forecast the future. You will learn how to use the statsmodels package to analyze time series, to build tailored models, and to forecast under uncertainty. How will the stock market move in the next 24 hours? How will the levels of CO2 change in the next decade? How many earthquakes will there be next year? You will learn to solve all these problems and more.

## ARMA models
* Time series are everywhere:
    * Science
    * Technology
    * Business
    * Finance
    * Policy
* ARIMA models are one of the go-to time-series tools
* **Trend:** a positive trend is a line that generally slopes up; a negative trend is a line that generally slopes down
* **Seasonality:** has patterns that repeat at regular intervals, for example high sales every weekend
* **Cyclicality:** in contrast to seasonality, has a repeating pattern but no fixed periods/time intervals.
* **White noise:** has uncorrelated values

#### Stationarity
* A time series must be stationary before we can model it
* **Stationary:** means that the distribution of the data doesn't change with time. For a time series to be stationary, it must fulfill three criteria:
    * **Trend stationary:** the series has zero trend
    * **Variance is constant:** the average distance of the data points from the zero line isn't changing
    * **Autocorrelation is constant:** how each value in the time series is related to its neighbors stays the same
* For train-test split, the data must be split in time (not shuffled or reordered)
* We train on the data earlier in the time series and test on the data that comes later
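
A minimal sketch of a time-ordered split, assuming a pandas `DataFrame` with a `DatetimeIndex`. The data, the `close` column values, and the cutoff date are hypothetical placeholders, not taken from the course.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series standing in for real data
index = pd.date_range("2020-01-01", periods=100, freq="D")
df = pd.DataFrame({"close": np.arange(100, dtype=float)}, index=index)

# Illustrative cutoff: everything up to and including this date is training data
split_date = pd.Timestamp("2020-03-20")

df_train = df.loc[df.index <= split_date]  # earlier observations
df_test = df.loc[df.index > split_date]    # later observations

print(len(df_train), len(df_test))
```

No shuffling takes place, so every training observation precedes every test observation.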


#### Making time series stationary:
#### Augmented Dickey-Fuller Test:
* Tests for trend non-stationarity
* The null hypothesis is that the time series is non-stationary due to trend

```
from statsmodels.tsa.stattools import adfuller

# Run the test on a single column; `results` is a tuple
results = adfuller(df['close'])
```
* 0th element: test statistic
    * More negative means more likely to be stationary
* 1st element: p-value
    * If p-value is small (smaller than 0.05) $\Rightarrow$ reject null hypothesis (reject non-stationarity)
* 4th element: dictionary of critical values of the test statistic
    
* **Plotting the time series can stop you from making incorrect assumptions and save you time!**

* Remember: the Dickey-Fuller test only tests for trend stationarity.

* Making a time series stationary $\Rightarrow$ A bit like feature engineering in classic ML.

* One very common way to make a time series stationary is to **take first differences.**
* For some time series, **we may need to take the difference more than once.**
* **Sometimes, we will need to perform other transformations to make the time series stationary.**
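
A short sketch of first and second differencing with pandas, assuming a `DataFrame` with a `close` column. The data here is a synthetic random walk, used only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Hypothetical non-stationary series: a random walk starting near 100
df = pd.DataFrame({"close": 100 + np.cumsum(rng.normal(size=200))})

# First difference: each value minus the previous one.
# The first row has no predecessor and becomes NaN, so drop it.
df_diff = df.diff().dropna()

# Second difference: difference the differenced series again
df_diff2 = df.diff().diff().dropna()

print(df_diff.head())
```

After each round of differencing, the result can be passed to `adfuller` again to check whether a further difference is needed.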

#### Other transformations
* **Take the log:** `np.log(df)`
* **Take the square root:** `np.sqrt(df)`
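* **Take the proportional change:** `df.shift(1)/df` (the ratio of each previous value to the current one; `df.pct_change()` gives the conventional fractional change instead)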
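
The transformations above can be sketched on a tiny hypothetical series that grows by 10% per step. The values are made up for the example, and `df.pct_change()` is shown alongside the ratio as a related pandas method rather than as part of the original list.

```python
import numpy as np
import pandas as pd

# Hypothetical series growing by 10% each step
df = pd.DataFrame({"close": [100.0, 110.0, 121.0, 133.1]})

df_log = np.log(df)           # compresses exponential growth into a straight line
df_sqrt = np.sqrt(df)         # milder variance-stabilising transform
df_ratio = df.shift(1) / df   # previous value divided by current value
df_pct = df.pct_change()      # (current - previous) / previous

# The first row of the last two results is NaN: there is no previous value
print(pd.concat(
    [df_log, df_sqrt, df_ratio, df_pct],
    axis=1,
    keys=["log", "sqrt", "ratio", "pct_change"],
))
```

For a series with constant percentage growth like this one, differencing the logged series gives a constant, which is one reason the log transform is often combined with differencing.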


