### ARIMA Models (AutoRegressive Integrated Moving Average)

ARIMA models are used when a time series shows evidence of non-stationarity. An initial differencing step (the "integrated" part of the model) can be applied one or more times to remove that non-stationarity.

Non-seasonal ARIMA models are generally denoted ARIMA(p,d,q), where the parameters p, d, and q are non-negative integers.

#### Parts of an ARIMA Model
##### AR (p): Autoregression

A regression model that uses the dependence between the current observation and a number of lagged (earlier) observations.

##### I (d): Integrated

Differencing of observations (subtracting the observation at the previous time step from the current observation) in order to make the time series stationary.

##### MA (q): Moving Average

A model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.
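
Putting the three parts together, an ARIMA(p,d,q) model is often written compactly with the backshift operator $B$, defined by $B y_t = y_{t-1}$. The symbols below ($\phi_i$ for AR coefficients, $\theta_j$ for MA coefficients, $\varepsilon_t$ for the error term, $c$ for a constant) follow one common textbook convention and are assumptions of this sketch; some references flip the sign of the $\theta_j$ terms.

$$
\left(1 - \sum_{i=1}^{p} \phi_i B^i\right) (1 - B)^d \, y_t = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right) \varepsilon_t
$$

Here $(1 - B)^d$ is the differencing step applied $d$ times, the left-hand sum is the AR part, and the right-hand sum is the MA part.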

##### Stationary vs. Non-Stationary Data
To use ARIMA effectively, we need to understand stationarity in our data. What makes a data set stationary? A stationary series has a constant mean and variance over time, so a model fit to it can assume that the mean and variance will stay the same in future periods.
* Mean needs to be constant
* Variance should not be a function of time
* Covariance between observations should not be a function of time; it should depend only on the lag between them

A common statistical test for stationarity is the Augmented Dickey-Fuller test. Its null hypothesis is that the series has a unit root (is non-stationary), so a small p-value is evidence in favor of stationarity. If your data is not stationary, you will need to transform it to be stationary. You can do this with differencing: take the value at time t and subtract the value at time t-1. If the result is still not stationary, take the second difference (the difference of the differenced series). You can continue differencing until you reach stationarity, but keep in mind that each differencing step costs you one observation.
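
As a minimal sketch of differencing, the snippet below uses plain Python lists. The helper name `difference` and the toy series are made up for illustration; in practice you would more likely use a library routine on real data.

```python
def difference(series, lag=1):
    """Return the differenced series y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical series with a quadratic trend, so one difference is not enough
y = [t * t for t in range(8)]   # [0, 1, 4, 9, 16, 25, 36, 49]

first = difference(y)           # still trending upward
second = difference(first)      # constant, so the trend is gone

print(first)                    # [1, 3, 5, 7, 9, 11, 13]
print(second)                   # [2, 2, 2, 2, 2, 2]
print(len(y), len(first), len(second))  # 8 7 6
```

Each pass shortens the series by one value, which is the data cost mentioned above.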

For seasonal data, you can also difference by season. For example, if you had monthly data with yearly seasonality, you could difference by a time unit of 12 instead of just 1. 
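
A seasonal difference works the same way with a larger lag. The sketch below assumes a made-up monthly series in which the same yearly pattern repeats with a level shift of 10 in the second year; `seasonal_difference` is a hypothetical helper name.

```python
def seasonal_difference(series, period):
    """Return y[t] - y[t - period], e.g. period=12 for monthly data with yearly seasonality."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

# Hypothetical monthly data: one yearly pattern, repeated with a shift of +10
pattern = [5, 7, 9, 12, 15, 20, 24, 23, 18, 12, 8, 6]
y = pattern + [v + 10 for v in pattern]

seasonal = seasonal_difference(y, period=12)

print(seasonal)                 # twelve values, all equal to 10
print(len(y) - len(seasonal))   # 12
```

The seasonal pattern cancels out, and the cost is a full season (12 observations) rather than a single one.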

#### Autocorrelation
An autocorrelation plot (correlogram) shows the correlation of the series with itself, lagged by x time units. The y-axis is the correlation and the x-axis is the number of time units of lag. Two patterns to look for:
* Type 1: Gradual decline
* Type 2: Sharp drop-off
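
To make the values behind a correlogram concrete, here is a rough sketch of the usual sample autocorrelation estimate in plain Python. The function name `autocorrelation` and the toy trending series are assumptions for illustration, not part of any library.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag (lag 0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

# Hypothetical trending (non-stationary) series
y = list(range(10))

acf = [autocorrelation(y, k) for k in range(4)]
print([round(r, 3) for r in acf])   # lag 1 works out to 0.7 for this series
```

For this trending series the values fall off slowly from lag to lag, which is the "gradual decline" pattern.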

The main goal is to decide whether to use the AR component, the MA component, or both in the ARIMA model, and how many lags to include. Using both is less common.

If the autocorrelation plot shows positive autocorrelation at the first lag (lag 1), it suggests using AR terms.

If the autocorrelation plot shows negative autocorrelation at the first lag, then it suggests using MA terms. 

This allows you to decide what actual values of p, d, and q to provide your ARIMA model. 

p: The number of lag observations included in the model

d: The number of times that the raw observations are differenced

q: The size of the moving average window, also called the order of moving average. 

#### Partial Correlation
In general, a partial correlation is a conditional correlation. It is the correlation between two variables under the assumption that we know and take into account the values of some other set of variables. 

Ex: Consider a regression context in which y is the response variable and x1, x2, and x3 are predictor variables. The partial correlation between y and x3 is the correlation between the two variables after taking into account how both y and x3 are related to x1 and x2.

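For a time series, the partial autocorrelation at lag k is the correlation between observations k steps apart after accounting for the observations in between. In a partial autocorrelation (PACF) plot, a sharp drop after lag k typically suggests that an AR(k) model should be used. A gradual decline suggests an MA model.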
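
The contrast between a sharp drop and a gradual decline can be checked numerically. The sketch below computes partial autocorrelations from autocorrelations with the Durbin-Levinson recursion, using the known theoretical ACFs of an AR(1) process with coefficient 0.6 and an MA(1) process with coefficient 0.5. The function name `pacf_from_acf` and both example processes are assumptions chosen for illustration.

```python
def pacf_from_acf(rho):
    """Durbin-Levinson recursion. rho[k] is the autocorrelation at lag k, with rho[0] == 1."""
    pacf = [1.0]
    prev = []  # prev[j] holds the coefficient phi_{k-1, j+1} from the previous step
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Theoretical ACF of an AR(1) process with coefficient 0.6: rho_k = 0.6 ** k
ar1_acf = [0.6 ** k for k in range(5)]

# Theoretical ACF of an MA(1) process with coefficient 0.5: rho_1 = 0.5 / (1 + 0.25), zero after
ma1_acf = [1.0, 0.4, 0.0, 0.0, 0.0]

ar1_pacf = pacf_from_acf(ar1_acf)
ma1_pacf = pacf_from_acf(ma1_acf)

print([round(v, 3) for v in ar1_pacf])  # 0.6 at lag 1, then essentially zero
print([round(v, 3) for v in ma1_pacf])  # shrinks in magnitude without cutting off
```

The AR(1) PACF cuts off after lag 1 while its ACF decays gradually. The MA(1) case is the mirror image: its ACF cuts off after lag 1 while its PACF tails off.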

Identification of an AR model is often best done with the PACF. Identification of an MA model is often best done with the ACF rather than the PACF. Once you have analyzed your data using the ACF and PACF, you are ready to apply ARIMA or Seasonal ARIMA, depending on your original data.