# Autoregressive Integrated Moving Average Model
An ARIMA model is a class of statistical models for analyzing and forecasting time series data.

It explicitly caters to a suite of standard structures in time series data, and as such provides a simple yet powerful method for making skillful time series forecasts.

ARIMA is an acronym that stands for AutoRegressive Integrated Moving Average. It is a generalization of the simpler AutoRegressive Moving Average and adds the notion of integration.

This acronym is descriptive, capturing the key aspects of the model itself. Briefly, they are:

#### AR: Autoregression. A model that uses the dependent relationship between an observation and some number of lagged observations.

#### I: Integrated. The use of differencing of raw observations (e.g. subtracting an observation from an observation at the previous time step) in order to make the time series stationary.

#### MA: Moving Average. A model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.

Each of these components is explicitly specified in the model as a parameter. The standard notation is ARIMA(p,d,q), where the parameters are replaced with integer values to indicate the specific ARIMA model being used.

The parameters of the ARIMA model are defined as follows:

##### p: The number of lag observations included in the model, also called the lag order.
##### d: The number of times that the raw observations are differenced, also called the degree of differencing.
##### q: The size of the moving average window, also called the order of moving average.

A linear regression model is constructed including the specified number and type of terms, and the data is prepared by a degree of differencing in order to make it stationary, i.e. to remove trend and seasonal structures that negatively affect the regression model.

A value of 0 can be used for a parameter, which means that element of the model is not used. This way, the ARIMA model can be configured to perform the function of an ARMA model, or even a simple AR, I, or MA model.

Adopting an ARIMA model for a time series assumes that the underlying process that generated the observations is an ARIMA process. This may seem obvious, but it helps to motivate the need to confirm the assumptions of the model in the raw observations and in the residual errors of forecasts from the model.

In this article, we will cover:

1. Methods used for converting non-stationary data into stationary data,
2. The ARIMA model,
3. The SARIMA model,
4. A real-world example of predicting the stock price of Microsoft,
5. Some hyper-parameter tuning to make the model more robust.

In machine learning, when the data does not follow a Gaussian distribution, we typically apply transformations such as Box-Cox or log. Similarly, when we have non-stationary time series data, there are two kinds of techniques for converting it into a stationary time series:

1. Differencing
2. Transformations

# Differencing
Differencing is one of the most important strategies for making a time series stationary. How does it work?

Here is the intuition. The first-order difference is:

$$y'_t = y_t - y_{t-1}$$

Instead of predicting $y_t$ directly, we predict the gap between $y_t$ and $y_{t-1}$. Differencing is the discrete analogue of differentiation: it measures the change between consecutive observations. If the time series is non-stationary, model $y'_t$ instead of $y_t$.

Once we can predict $y'_t$, we can recover $y_t$ by reconstruction:

$$y_t = y'_t + y_{t-1}$$
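The differencing and reconstruction steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library API: the function names `difference` and `reconstruct` and the sample values are assumptions made for this example.

```python
def difference(series):
    """Return first-order differences: y'_t = y_t - y_{t-1}."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def reconstruct(first_value, diffs):
    """Invert first-order differencing: y_t = y'_t + y_{t-1}."""
    values = [first_value]
    for step in diffs:
        values.append(values[-1] + step)
    return values


# Illustrative values with an upward trend (not real data).
y = [10, 12, 15, 19, 24]
y_diff = difference(y)              # one element shorter than y
y_back = reconstruct(y[0], y_diff)  # needs the first original value

print(y_diff)
print(y_back == y)
```

Note that the differenced series is one element shorter than the original, and that reconstruction needs the first original value as a starting point.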

In practice, $y'_t$ is often more likely to be stationary than $y_t$ itself. Suppose we are given a non-stationary series:

$$y_1, y_2, y_3, \dots, y_{t-1}, y_t, y_{t+1}$$

We take the difference between $y_2$ and $y_1$ and call it $y'_2$.

Similarly, we take the difference between $y_3$ and $y_2$ and call it $y'_3$.

We continue in this way up to $y'_t$:

$$y'_2, y'_3, y'_4, \dots, y'_t$$

These are the first-order differences, which are more likely to be stationary.

Instead of stopping at first-order differences, we can difference again to obtain second-order differences:

$$y''_3, y''_4, y''_5, \dots, y''_t$$

$$y''_t = y'_t - y'_{t-1}$$

If we plug the formula for $y'_t$ into the second-order difference, we get:

$$y''_t = y_t - 2y_{t-1} + y_{t-2}$$

Now we can reconstruct $y_t$ from the above equation as:

$$y_t = y''_t + 2y_{t-1} - y_{t-2}$$

If your time series data is non-stationary, try differencing. It is an extremely powerful technique. If first-order differencing does not make the series stationary, you can use second-order differencing.

We can difference to any order d. This is the hyper-parameter here: we need to experiment with d = 1, 2, 3, and so on to find the best value.
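As a rough sketch of d-th order differencing, the snippet below applies first-order differencing d times and checks the result against the closed-form second-order formula. The helper name `difference_d` and the quadratic sample series are assumptions chosen for illustration.

```python
def difference_d(series, d=1):
    """Apply first-order differencing d times."""
    result = list(series)
    for _ in range(d):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


# Illustrative series with a quadratic trend, y_t = t^2 (not real data).
y = [t * t for t in range(8)]

first = difference_d(y, d=1)   # still trending upward
second = difference_d(y, d=2)  # constant, so the trend is gone

# Closed form for the second-order difference: y_t - 2*y_{t-1} + y_{t-2}.
formula = [y[t] - 2 * y[t - 1] + y[t - 2] for t in range(2, len(y))]

print(first)
print(second)
print(second == formula)
```

Here one round of differencing still leaves a trend, while the second round removes it, which is the kind of comparison you would make when choosing d.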

# Log-Transform

Differencing works very well on practical time series data, but it cannot be applied to every time series. There are other methods, such as the log transform, just as there are different methods for converting data to a Gaussian distribution. The log transform is one of the most popular.

In a log transform, instead of modeling the $y_t$ values directly, we apply the logarithm to each point and model the time series using the log-transformed data. Let's call the new point $\tilde{y}_t$ (y tilde):

$$\tilde{y}_t = \log(y_t)$$
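A small sketch of the log transform and its inverse, using only Python's standard `math` module. The function names and the exponentially growing sample series are assumptions for illustration; the transform requires strictly positive values.

```python
import math


def log_transform(series):
    """Return log(y_t) for each point; every value must be positive."""
    return [math.log(value) for value in series]


def inverse_log_transform(series):
    """Undo the log transform with exp, to get back to the original scale."""
    return [math.exp(value) for value in series]


# Illustrative series growing by 10% per step (not real data).
y = [100 * 1.1 ** t for t in range(6)]

y_log = log_transform(y)
# On the log scale, multiplicative growth becomes a constant additive step.
log_steps = [b - a for a, b in zip(y_log, y_log[1:])]
y_back = inverse_log_transform(y_log)

print(log_steps)
```

Forecasts made on the log scale have to be passed through `exp` to return to the original units.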

All of these methods are from the 1970s, before modern computers took off or modern machine learning came into existence. The above two are the most popular methods to convert time series into stationary data. Of course, there are plenty of other methods that you can employ depending on your use case.

# ARIMA(p,d,q)

ARIMA stands for AutoRegressive Integrated Moving Average. It is a class of models that captures a suite of standard temporal structures in time series data, and as such provides a simple, powerful method for making skillful time series forecasts. It is a generalization of the simpler AutoRegressive Moving Average (ARMA) model, with the added notion of integration.


ARIMA combines three components: AutoRegressive (AR), Integrated (I, the opposite of differencing), and Moving Average (MA).
                                              

# How does the ARIMA model work?
Simply put, ARIMA(p,d,q) has three parameters.

p comes from autoregression, d comes from differencing, and q comes from the moving average.

d can be any order of differencing.

All three parameters are hyper-parameters: we need to experiment to find which values fit best, just like K in K-NN. If d = 2, then instead of modeling $y_t$ we model $y''_t$.

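Once the series has been differenced, what remains is ARMA(p,q). What is ARMA? ARMA is ARIMA without the I, the integration part.

p corresponds to AR and q corresponds to MA. The model looks like this:

$$y_t = \mu + \sum_{i=1}^{p} \alpha_i y_{t-i} + \sum_{j=1}^{q} \theta_j \epsilon_{t-j} + \epsilon_t$$

Here, $y_t$ is a constant $\mu$ + a linear combination of the previous p values + a linear combination of the previous q error terms + the error at this time step ($\epsilon_t$).

What happens if we take $\epsilon_t$ to the other side of the equation? This becomes:

$$y_t - \epsilon_t = \mu + \sum_{i=1}^{p} \alpha_i y_{t-i} + \sum_{j=1}^{q} \theta_j \epsilon_{t-j}$$

This equation is equivalent to the previous one. On the right we have a constant + a linear combination of the previous p values + a linear combination of the previous q error terms; on the left, $y_t - \epsilon_t$ is the model's prediction for $y_t$.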
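To make the structure concrete, here is a minimal sketch of a one-step ARMA prediction: a constant, plus a weighted sum of the previous p values, plus a weighted sum of the previous q errors. The function name `arma_predict`, its argument names, and all numeric values are hypothetical and chosen only for illustration; the coefficients are supplied by hand rather than estimated.

```python
def arma_predict(mu, alphas, thetas, history, errors):
    """One-step ARMA(p, q) prediction.

    alphas:  AR coefficients, alphas[0] weights the most recent value
    thetas:  MA coefficients, thetas[0] weights the most recent error
    history: past observations, most recent last (at least p values)
    errors:  past errors, most recent last (at least q values)
    """
    p, q = len(alphas), len(thetas)
    ar_part = sum(alphas[i] * history[-1 - i] for i in range(p))
    ma_part = sum(thetas[j] * errors[-1 - j] for j in range(q))
    return mu + ar_part + ma_part


# Hypothetical ARMA(2, 1) coefficients and inputs (not fitted to any data).
prediction = arma_predict(
    mu=1.0,
    alphas=[0.5, 0.2],
    thetas=[0.4],
    history=[3.0, 4.0],
    errors=[0.5],
)
print(prediction)
```

With these inputs the prediction is 1.0 + 0.5 * 4.0 + 0.2 * 3.0 + 0.4 * 0.5, that is, approximately 3.8.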

How can we cast this as a linear regression? We take the previous p values and the previous q errors as features, and $\alpha_i$ and $\theta_j$ are the regression coefficients. In this sense, ARIMA is essentially a linear regression. (In practice the past errors are not observed directly, so they have to be estimated along with the coefficients.)
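The regression view can be illustrated with the simplest case, ARIMA(1,1,0): difference once, then regress each differenced value on the previous one with ordinary least squares. This is a sketch under stated assumptions, not how a production library estimates ARIMA: the helper `fit_ar1` and the sample series are made up for this example, and the moving average terms are left out because they need iterative estimation.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = mu + alpha * x_{t-1}."""
    x_prev = series[:-1]
    x_curr = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_curr = sum(x_curr) / n
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(x_prev, x_curr))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    alpha = cov / var
    mu = mean_curr - alpha * mean_prev
    return mu, alpha


# Illustrative series (not real data); its differences follow
# x_t = 1 + 0.5 * x_{t-1} exactly, so the fit should recover those values.
y = [10.0, 14.0, 17.0, 19.5, 21.75, 23.875]

y_diff = [b - a for a, b in zip(y, y[1:])]  # d = 1
mu, alpha = fit_ar1(y_diff)                 # p = 1, q = 0

next_diff = mu + alpha * y_diff[-1]         # forecast the next difference
next_y = y[-1] + next_diff                  # integrate back to the original scale

print(mu, alpha, next_y)
```

The last step is the "integrated" part of ARIMA: the forecast is made on the differenced series and then added back to the last observed value.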

In a nutshell, ARIMA is:

1. p, d, and q are hyper-parameters,
2. ARIMA(p, d, q) is a linear regression model on the previous p values and previous q errors after differencing d times,
3. Also known as the Box-Jenkins model (1976).
    

# References:

1. https://neptune.ai/blog/time-series-forecasting
2. https://machinelearningmastery.com/sarima-for-time-series-forecasting-in-python/
3. https://facebook.github.io/prophet/
4. https://en.wikipedia.org/wiki/Power_transform