## Autoregressive Integrated Moving Average Model (ARIMA)

- AR: Autoregression. A model that uses the dependent relationship between an observation and some number of lagged observations.
- I: Integrated. The use of differencing of raw observations (i.e. subtracting the observation at the previous time step from the current observation) in order to make the time series stationary.
- MA: Moving Average. A model that uses the dependency between an observation and residual errors from a moving average model applied to lagged observations.

Each of these components is explicitly specified in the model as a parameter. The standard notation is __ARIMA(p,d,q)__, where the parameters are replaced with integer values to indicate the specific ARIMA model being used. The parameters of the ARIMA model are defined as follows:
- p: The number of lag observations included in the model, also called the lag order.
- d: The number of times that the raw observations are differenced, also called the degree of differencing.
- q: The size of the moving average window, also called the order of moving average.
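
To make the role of d concrete, here is a minimal sketch of repeated differencing in plain Python. The helper name `difference` and the toy series are illustrative assumptions, not part of any library.

```python
def difference(series, d=1):
    """Apply first-order differencing d times and return the result."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series with y[t] - y[t-1], shortening it by one.
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


# A quadratic trend: one difference leaves a linear trend, two leave a constant.
squares = [1, 4, 9, 16, 25, 36]
once = difference(squares, d=1)
twice = difference(squares, d=2)
print(once)   # [3, 5, 7, 9, 11]
print(twice)  # [2, 2, 2, 2]
```

In this toy case d=2 is enough to remove the trend entirely; real series rarely difference down to an exact constant.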

## Box-Jenkins Method

The Box-Jenkins method lays out a three-step approach:

1. __Identification__. Use the data and all related information to help select a sub-class of model that may best summarize the data.
2. __Estimation__. Use the data to train the parameters of the model (i.e. the coefficients).
3. __Diagnostic Checking__. Evaluate the fitted model in the context of the available data and check for areas where the model may be improved.


## __Identification__
This step involves determining whether the time series is stationary, whether differencing is required, and which parameters of an ARMA model suit the data.

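 - __Unit Root Test__: Use the Augmented Dickey-Fuller (ADF) test or the KPSS test to determine whether the series is stationary. The ADF test takes a unit root (non-stationarity) as its null hypothesis, whereas the KPSS test takes stationarity as its null. If the series is not stationary, apply further differencing.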
 
 - __Avoid over-differencing__: Differencing more than required introduces extra serial correlation and additional complexity.
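
The extra serial correlation caused by over-differencing can be seen in a small simulation. The sketch below uses only the standard library; the helper `lag1_autocorrelation`, the seed, and the sample size are arbitrary choices for illustration. White noise needs no differencing, so differencing it once is a case of over-differencing, and in theory it pushes the lag-1 autocorrelation toward -0.5.

```python
import random


def lag1_autocorrelation(series):
    """Sample autocorrelation at lag 1."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    return numer / denom


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
# White noise is already stationary, so the appropriate d is 0.
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Differencing it anyway is an instance of over-differencing.
over_differenced = [noise[t] - noise[t - 1] for t in range(1, len(noise))]

r_noise = lag1_autocorrelation(noise)
r_over = lag1_autocorrelation(over_differenced)
print(f"lag-1 autocorrelation, original:         {r_noise:+.3f}")
print(f"lag-1 autocorrelation, over-differenced: {r_over:+.3f}")
```

In practice a unit root test from a statistics package would guide the choice of d; this sketch only illustrates why stopping at the smallest adequate d matters.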
 
 ### Configuring AR and MA
 
- __Autocorrelation Function (ACF)__: the correlation of a series with its own lagged values.

- __Partial Autocorrelation Function (PACF)__: the correlation of a series with its own lagged values that is not accounted for by the shorter, intervening lags.

Helpful pointers:
- The model is AR if the ACF trails off gradually and the PACF has a hard cut-off after a lag. This lag is taken as the value for p.
- The model is MA if the PACF trails off gradually and the ACF has a hard cut-off after a lag. This lag is taken as the value for q.
- The model is a mix of AR and MA if both the ACF and PACF trail off.
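
The AR pattern described above can be checked on simulated data. The following is a sketch in plain Python, assuming a simulated AR(1) series with coefficient 0.7; the functions `acf` and `pacf` are hand-written helpers (the PACF uses the Durbin-Levinson recursion), not library calls.

```python
import random


def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Sample partial autocorrelations for lags 0..max_lag (Durbin-Levinson)."""
    rho = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # predictor coefficients of order k - 1
    for k in range(1, max_lag + 1):
        numer = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        denom = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = numer / denom
        # Update the lower-order coefficients, then append the new last one.
        updated = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_prev = updated + [phi_kk]
        result.append(phi_kk)
    return result


# Simulated AR(1) series: x[t] = 0.7 * x[t-1] + noise (assumed for illustration).
rng = random.Random(1)
x = [0.0]
for _ in range(3000):
    x.append(0.7 * x[-1] + rng.gauss(0.0, 1.0))

band = 1.96 / len(x) ** 0.5  # approximate 95% band for white noise
acf_vals = acf(x, 5)
pacf_vals = pacf(x, 5)
for lag in range(1, 6):
    print(f"lag {lag}: ACF={acf_vals[lag]:+.3f}  PACF={pacf_vals[lag]:+.3f}  band=+/-{band:.3f}")
```

For this series the ACF should decay gradually while the PACF drops to roughly zero after lag 1, which points to p = 1.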

 Note: see Rob Hyndman's work for further reference.
 
## __Estimation__
Estimation involves using numerical methods to minimize a loss or error term used for estimating model parameters.
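
As one possible illustration of minimizing a loss, the sketch below estimates the coefficient of an MA(1) model by a brute-force grid search over the conditional sum of squared errors. This is an assumed, simplified stand-in for the optimizers used by real packages; the function `css_loss`, the simulated data, and the grid are all hypothetical choices.

```python
import random


def css_loss(series, theta):
    """Conditional sum of squared one-step errors for x[t] = e[t] + theta * e[t-1]."""
    prev_error = 0.0  # assumption: the unobserved initial error is zero
    total = 0.0
    for value in series:
        error = value - theta * prev_error  # residual implied by this theta
        total += error * error
        prev_error = error
    return total


# Simulate an MA(1) series with a known coefficient so the estimate can be checked.
rng = random.Random(2)
true_theta = 0.6
shocks = [rng.gauss(0.0, 1.0) for _ in range(2001)]
series = [shocks[t] + true_theta * shocks[t - 1] for t in range(1, len(shocks))]

# Crude numerical minimization: evaluate the loss on a grid and keep the best value.
candidates = [i / 100 for i in range(-99, 100)]
theta_hat = min(candidates, key=lambda theta: css_loss(series, theta))
print(f"true theta = {true_theta}, estimated theta = {theta_hat}")
```

A grid search is used here only because it is easy to follow; gradient-based or likelihood-based optimizers are the usual choice when there are several parameters.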


## __Diagnostic Checking__

### Overfitting:
The model should perform at an equal (or similar) level on the in-sample and out-of-sample data sets. Performance that is much better in-sample than out-of-sample suggests overfitting.
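
A minimal sketch of this comparison, assuming a simulated AR(1) series, a least-squares AR(1) fit, and a chronological 1500/500 split (all illustrative choices; the helper names are hypothetical):

```python
import random


def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + error."""
    numer = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    denom = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return numer / denom


def one_step_rmse(series, phi):
    """Root mean squared error of one-step-ahead forecasts phi * x[t-1]."""
    errors = [series[t] - phi * series[t - 1] for t in range(1, len(series))]
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


rng = random.Random(3)
x = [0.0]
for _ in range(2000):
    x.append(0.5 * x[-1] + rng.gauss(0.0, 1.0))

# Split chronologically: never shuffle a time series before splitting.
split = 1500
train, test = x[:split], x[split:]

phi_hat = fit_ar1(train)
rmse_in = one_step_rmse(train, phi_hat)
rmse_out = one_step_rmse(test, phi_hat)
print(f"phi_hat={phi_hat:.3f}  in-sample RMSE={rmse_in:.3f}  out-of-sample RMSE={rmse_out:.3f}")
```

Because the fitted model here matches the process that generated the data, the two errors should be close; a large gap between them would be the warning sign.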

### Residual Errors:
Forecast residuals provide a great opportunity for diagnostics. A review of the distribution of errors can help tease out bias in the model. The errors from an ideal model would resemble white noise: uncorrelated, with a mean of zero and constant variance, and ideally following a Gaussian distribution.
For this, you may use density plots, histograms, and Q-Q plots that compare the distribution of errors to the expected distribution. A non-Gaussian distribution may suggest an opportunity for data pre-processing. A skew in the distribution or a non-zero mean may suggest a bias in forecasts that can be corrected.
Additionally, an ideal model would leave no temporal structure in the time series of forecast residuals. This can be checked by creating ACF and PACF plots of the residual error time series. The presence of serial correlation in the residual errors suggests that there is information left that the model could still use.
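
The numeric side of these checks can be sketched without plotting. The function `residual_report` below is a hypothetical helper that reports the residual mean, the skewness, and the lags whose autocorrelation falls outside an approximate 95% white-noise band of 1.96 / sqrt(n). The two residual series are simulated for illustration only.

```python
import random


def residual_report(residuals, max_lag=10):
    """Summarize bias (mean, skewness) and leftover serial correlation."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    var = sum(c * c for c in centered) / n
    skew = (sum(c ** 3 for c in centered) / n) / var ** 1.5
    band = 1.96 / n ** 0.5  # approximate 95% band under white noise
    flagged = []
    for k in range(1, max_lag + 1):
        r_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / (n * var)
        if abs(r_k) > band:
            flagged.append(k)
    return {"mean": mean, "skewness": skew, "flagged_lags": flagged}


rng = random.Random(4)
# Residuals from a hypothetical well-specified model: plain white noise.
white = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Residuals from a hypothetical under-specified model: AR(1) structure left over.
leftover = [0.0]
for _ in range(1999):
    leftover.append(0.7 * leftover[-1] + rng.gauss(0.0, 1.0))

good = residual_report(white)
bad = residual_report(leftover)
print("white noise:", good)
print("leftover AR:", bad)
```

Even for true white noise, about one lag in twenty is expected to fall outside the band by chance, so an occasional flagged lag is not by itself evidence of a poor model.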

## Summary

- The ARIMA model and the three steps of the general Box-Jenkins method.
- How to use ACF and PACF plots to choose the p and q parameters for an ARIMA model.
- How to use overfitting checks and residual errors to diagnose a fitted ARIMA model.