## Autoregressive Model: Definition & The AR Process

**What is an Autoregressive Model?**

An autoregressive (AR) model predicts future behavior based on past behavior. It’s used for forecasting when there is some correlation between values in a time series and the values that precede and succeed them. Only past values of the series itself are used to model the behavior, hence the name autoregressive (a regression of the variable against itself).


The AR process is an example of a stochastic process, which has a degree of uncertainty or randomness built in. The randomness means that you might be able to predict future trends pretty well with past data, but you’re never going to get 100 percent accuracy. Usually, the process gets “close enough” for it to be useful in most scenarios.



In [48]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# only plot_correlations is used in this notebook
from sktime.utils.plotting import plot_correlations

## Let's build an AR(1) process

In [50]:
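def ar_process(y_intercept, alpha, lag, noise):
    # one AR step: intercept + coefficient * lagged value + random shock
    return y_intercept + alpha*lag + noise

def build_ar_process(num_timesteps=300, intercept=0, lag_coef=0.9, p=1):
    # fixed seed so the simulated series is reproducible
    rng = np.random.RandomState(seed=42)
    noise = rng.normal(size=num_timesteps)

    # the first p values stay at zero and act as the starting point
    y = np.zeros(num_timesteps)

    # each value depends only on the value p steps back;
    # note that the shock for step i is read from noise[i-p]
    for i in range(p, num_timesteps):
        y[i] = ar_process(intercept, lag_coef, y[i-p], noise[i-p])

    return y, noise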





ar_1, white_noise = build_ar_process(p=1)

index = pd.date_range(start='2000-01-01', periods=300)
df = pd.DataFrame(data={'noise':white_noise, 'AR-1':ar_1}, index=index)

# AR-1 process
df.head()

| | noise | AR-1 |
|---|---|---|
| 2000-01-01 | 0.496714 | 0.0 |
| 2000-01-02 | -0.138264 | 0.496714 |
| 2000-01-03 | 0.647689 | 0.308778 |
| 2000-01-04 | 1.52303 | 0.925589 |
| 2000-01-05 | -0.234153 | 2.35606 |


In [20]:
df.plot(y='AR-1', figsize=(15,4))

<img src='./plots/AR-1-plot.png'>

In [21]:
df.plot(y='noise', figsize=(15,4))

<img src='./plots/whitenoise.png'>

## Analysis


## What is white noise?

A time series is white noise if the variables are independent and identically distributed with a mean of zero.

## Why Does it Matter?
White noise is an important concept in time series analysis and forecasting.

It is important for two main reasons:

* `Predictability`: If your time series is white noise, then, by definition, it is random. You cannot reasonably model it and make predictions.
* `Model Diagnostics`: The series of errors from a time series forecast model should ideally be white noise.
Model diagnostics is an important area of time series forecasting.

Time series data are expected to contain some white noise component on top of the signal generated by the underlying process.


## Is your Time Series White Noise?
Your time series is probably NOT white noise if the answer to one or more of the following questions is yes:

* Is the mean/level non-zero?
* Does the mean/level change over time?
* Does the variance change over time?
* Do values correlate with lag values?

Some tools that you can use to check if your time series is white noise are:

* Create a lag plot
* Calculate summary statistics. Check the mean and variance of the whole series against the mean and variance of meaningful contiguous blocks of values in the series (e.g. days, months, or years).
* Create an autocorrelation plot. Check for gross correlation between lagged variables.
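The summary-statistics check from the list above can be sketched as follows. This is a minimal sketch under stated assumptions: the series is freshly generated Gaussian noise, and the seed, the length of 300, and the block size of 50 are arbitrary illustrative choices rather than values taken from the notebook.

```python
import numpy as np
import pandas as pd

# Assumption: an illustrative white-noise series, generated here so the block is self-contained.
rng = np.random.default_rng(0)
series = pd.Series(rng.normal(size=300))

# Label each observation with the contiguous block of 50 values it belongs to.
block_id = np.arange(len(series)) // 50

# Mean and variance of every block, to compare against the whole-series values.
block_stats = series.groupby(block_id).agg(['mean', 'var'])

print('overall mean:', series.mean(), 'overall variance:', series.var())
print(block_stats)
```

For white noise, the block means should stay close to zero and the block variances close to the overall variance; a block that drifts away from the rest hints at a changing level or spread.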

In [71]:
# let's create a lag plot

pd.plotting.lag_plot(series=df['noise'], lag=1);
plt.title('Lag plot of whitenoise');

<img src='./plots/whitenoise-lag-plot.png'>

### White noise has no predictive information in its past values, and there is no correlation between any two points in time.

In [72]:
fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(16, 8))

ax = ax.ravel()

for i,frame in enumerate(ax):
    pd.plotting.lag_plot(series=df['noise'], lag=i+1, ax=frame)

fig.suptitle('White-noise has no predictive information in past values');

<img src='./plots/whitenoise-lag-plot-4x3.png'>

### We can create a correlogram and check for any autocorrelation with lagged values.

In [73]:
# plot_correlations returns the figure together with its axes
fig, axes = plot_correlations(df['noise']);
fig.set_constrained_layout(False)
fig.autofmt_xdate()

<img src='./plots/correlogram-whitenoise.png'>

## AR(1) process

#### An AR(1) autoregressive process is one in which the current value is based on the immediately preceding value, while an AR(2) process is one in which the current value is based on the previous two values. An AR(0) process is used for white noise and has no dependence between the terms.


### Example of an Autoregressive Model
Autoregressive models are based on the assumption that past values have an effect on current values. For example, an investor using an autoregressive model to forecast stock prices would need to assume that new buyers and sellers of that stock are influenced by recent market transactions.

### `AR-(1) = intercept + lag_coef * lag-value + noise `
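In more conventional notation, the recursion above can be written as follows. The symbols are notation introduced here, not names from the notebook: $y_t$ is the value at step $t$, $c$ stands for `intercept`, $\phi$ for `lag_coef`, and $\varepsilon_t$ for the white-noise shock at step $t$.

$$
y_t = c + \phi \, y_{t-1} + \varepsilon_t
$$

Substituting the recursion into itself $t$ times, starting from $y_0$, gives

$$
y_t = \phi^{t} y_0 + c \sum_{k=0}^{t-1} \phi^{k} + \sum_{k=0}^{t-1} \phi^{k} \varepsilon_{t-k}
$$

so the size of $\phi$ decides whether the influence of old values and old shocks fades ($|\phi| < 1$) or is amplified ($|\phi| > 1$).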

### Let's understand how an AR process behaves for different values of lag_coef and y_intercept

Let's build an AR process using `lag_coef` = `0.9` and `intercept` = `0`.

In [74]:
ar_1, noise = build_ar_process(intercept=0, lag_coef=0.9, p=1)

# let's create a dataframe and store the results
df = pd.DataFrame(data=ar_1, columns=['AR-1'], index=pd.date_range(start='2012-01-01', periods=300))

# let's plot it
df.plot(y=['AR-1'], figsize=(15,4));

<img src='./plots/AR-1-plot-0.9-0.png'>

### Is there any predictive information in this time series?
* By visual inspection it's not that easy to answer this.
    * We need more sophisticated methods like
        * ACF
        * PACF
        * CCF
        * etc...
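As a rough illustration of what the ACF measures, the sketch below compares sample autocorrelations with the theoretical value for a stationary AR(1) process, which is `lag_coef ** k` at lag `k`. The series is simulated inside the block with an arbitrary seed and length (both assumptions for illustration), and `pandas.Series.autocorr` is used as a simple stand-in for a full ACF routine.

```python
import numpy as np
import pandas as pd

# Assumption: an illustrative AR(1) series with zero intercept and coefficient 0.9.
rng = np.random.default_rng(1)
phi, n = 0.9, 5000
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()

series = pd.Series(y)
for k in range(1, 6):
    # Pearson correlation between the series and a copy of itself shifted by k steps
    sample_acf = series.autocorr(lag=k)
    print(f'lag {k}: sample {sample_acf:.3f}, theoretical {phi ** k:.3f}')
```

The sample values decay gradually with the lag instead of dropping straight to zero, which is the signature of predictive information in past values.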

### AR - 1 

* By definition it is predictive
    * `AR-(1) = intercept + lag_coef * lag-value + noise `
    * Our AR process uses `lag_coef = 0.9`
        * This means each value is highly correlated with its previous value
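One way to see this dependence in numbers is to regress each value on the one before it. The sketch below is only an illustration: it simulates its own AR(1) series (seed and length are arbitrary assumptions) and uses an ordinary least-squares line fit from `np.polyfit`, which is a simple stand-in and not necessarily how a dedicated time series library would estimate the coefficient.

```python
import numpy as np

# Assumption: an illustrative AR(1) series with zero intercept and coefficient 0.9.
rng = np.random.default_rng(2)
true_coef, n = 0.9, 5000
y = np.zeros(n)
for t in range(1, n):
    y[t] = true_coef * y[t - 1] + rng.normal()

# Fit y[t] = slope * y[t-1] + const; polyfit returns the highest-degree coefficient first.
slope, const = np.polyfit(y[:-1], y[1:], deg=1)
print('estimated lag_coef:', slope, 'estimated intercept:', const)
```

The fitted slope should land near the coefficient used in the simulation, and the fitted constant near zero.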
    

In [79]:
pd.plotting.lag_plot(series=df['AR-1'], lag=1);
plt.title('AR-1 LAG PLOT ');

<img src='./plots/AR-1-plot-LAG-plot.png'>

## Set Lag coef of AR-1 to `0.1` and intercept to `0`


When we set the lag coef to `0.1` our time series becomes very noisy.

In [87]:
ar_1, noise = build_ar_process(intercept=0, lag_coef=0.1, p=1)

# let's store the results
df['AR-1 0.1'] = ar_1

# let's plot it
df.plot(y=['AR-1 0.1'], figsize=(15,4));

<img src='./plots/AR-1-plot-0.1-0.png'>

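### When we set lag coef to `0.1` the AR-1 process becomes very noisy

The lag plot shows almost no correlation between consecutive values, because a coefficient of `0.1` carries very little of the previous value forward.

In [88]:
pd.plotting.lag_plot(series=df['AR-1 0.1'], lag=1);
plt.title('Lag plot showing almost no correlation');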

<img src='./plots/LAG-plot -no-correlation.png'>

## What will happen if we set the intercept term to a number greater than 1?

#### The time series settles at a new baseline. This follows from the fact that the mean of an AR(1) process with `abs(lag_coef) < 1` is given by `mean = intercept / (1 - lag_coef)`. With `intercept=2` and `lag_coef=0.9`, the baseline is `2 / (1 - 0.9) = 20`.
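The formula comes from taking expectations on both sides of the recursion: if the process is stationary with mean `m`, then `m = intercept + lag_coef * m`, which rearranges to `m = intercept / (1 - lag_coef)`. The sketch below checks this numerically on a freshly simulated series; the seed, the length, and the 500-step warm-up that is discarded are arbitrary assumptions for illustration.

```python
import numpy as np

# Assumption: an illustrative AR(1) series started at zero.
rng = np.random.default_rng(3)
intercept, lag_coef, n = 2.0, 0.9, 20000
y = np.zeros(n)
for t in range(1, n):
    y[t] = intercept + lag_coef * y[t - 1] + rng.normal()

theoretical_mean = intercept / (1 - lag_coef)
# Skip the early steps, where the series is still climbing from its starting value of 0.
sample_mean = y[500:].mean()
print('theoretical mean:', theoretical_mean, 'sample mean:', sample_mean)
```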

In [89]:
ar_1, noise = build_ar_process(intercept=2, lag_coef=0.9, p=1)

# let's store the results
df['AR-1 0.9 2'] = ar_1

# let's plot it
df.plot(y=['AR-1 0.9 2'], figsize=(15,4));

<img src='./plots/AR-1-plot-0.9-2.png'>

## What will happen when the lag coef is greater than 1?

Suppose we build an AR-process with lag-coef greater than 1 
* The time series will grow exponentially
* The time series will not be stationary 
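A quick way to see why the growth is exponential is to look at the ratio between consecutive values late in the series. The sketch below simulates its own series with an arbitrary seed and 200 steps (assumptions for illustration): once the values are large, the noise term is negligible by comparison and each step is close to a plain multiplication by `lag_coef`.

```python
import numpy as np

# Assumption: an illustrative AR(1) series with zero intercept and coefficient 1.5.
rng = np.random.default_rng(4)
lag_coef, n = 1.5, 200
y = np.zeros(n)
for t in range(1, n):
    y[t] = lag_coef * y[t - 1] + rng.normal()

# Ratio y[t] / y[t-1] for the last five steps of the series.
late_ratios = y[-5:] / y[-6:-1]
print(late_ratios)
```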


### Set lag-coef to 1.5 

In [101]:
ar_1, noise = build_ar_process(intercept=0, lag_coef=1.5, p=1)

# let's store the results
df['AR-1 1.5 0'] = ar_1

# let's plot it
df.plot(y=['AR-1 1.5 0'], figsize=(15,4), linewidth=4, c='r');
plt.title('Series grows exponentially: lag_coef is 1.5, so it grows by about 50% every time step');


<img src='./plots/AR-1-plot-exponential-growth-when-lag-coef-greater-than-1.png'>

## What will happen when lag-coef is less than -1

Suppose we set lag-coef to -1.5
* The series grows exponentially in magnitude
* The series oscillates between positive and negative values

In [104]:
ar_1, noise = build_ar_process(intercept=0, lag_coef=-1.5, p=1)

# let's store the results
df['AR-1 -1.5 0'] = ar_1

# let's plot it
df.plot(y=['AR-1 -1.5 0'], figsize=(15,4), linewidth=4, c='salmon');
plt.title('Series grows exponentially and oscillates: lag_coef is -1.5, so its magnitude grows by about 50% every time step');

<img src='./plots/AR-1-plot-exponential-growth-and-oscillate-when-lag-coef-less-than-neg1.png'>

### What will happen when we set the lag_coef to 1 and intercept greater than zero

When we set lag-coef to 1 and intercept to 1
* We can see the time series growing linearly
* We are adding an intercept term and noise to the previous value at every step
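With `lag_coef` equal to 1, the recursion simply accumulates the intercept and the noise, so the series is a straight line with slope `intercept` plus a running sum of shocks (often called a random walk with drift). The sketch below checks that decomposition on a freshly simulated series; the seed and length are arbitrary assumptions, and `steps` and `closed_form` are names introduced here for illustration.

```python
import numpy as np

# Assumption: an illustrative series with lag_coef = 1 and intercept = 1, started at zero.
rng = np.random.default_rng(5)
intercept, n = 1.0, 300
noise = rng.normal(size=n)

y = np.zeros(n)
for t in range(1, n):
    y[t] = intercept + 1.0 * y[t - 1] + noise[t]

# Closed form: linear trend with slope `intercept` plus the accumulated noise.
steps = np.arange(n)
accumulated_noise = np.concatenate(([0.0], np.cumsum(noise[1:])))
closed_form = intercept * steps + accumulated_noise

print(np.allclose(y, closed_form))
```

Because the accumulated noise never dies out, its spread keeps widening over time, which is why the series is not stationary even though the trend looks regular.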

In [110]:
ar_1, noise = build_ar_process(intercept=1, lag_coef=1, p=1)

# let's store the results
df['AR-1 1 1'] = ar_1

# let's plot it
df.plot(y=['AR-1 1 1'], figsize=(15,4), linewidth=4, c='b');
plt.title('Series grows linearly because lag-coef=1 and intercept=1');

<img src='./plots/AR-1-plot-linear-growth-when-lag-coef-is-1-and-intercept-greater-than-zero.png'>

## Summary 

### `AR-(1) = intercept + lag_coef * lag-value + noise `

* ### When `abs(lag-coef)` is greater than `1`
    *   #### The series will grow exponentially
    *   #### The series will `not be stationary`

* ### When `lag-coef` is equal to `1` and `intercept` term is greater than `0`
    *   #### The series will grow linearly
    *   #### The series will `not be stationary`

* ### When `abs(lag-coef)` is less than `1`
    *   #### The series will vary around the mean
    *   #### The series will be `stationary`

* ### When `lag-coef` is less than `0`
    *   #### The series will oscillate between positive and negative values
    *   #### `Stationary` only if `abs(lag_coef)` is less than `1`

* ### An `AR(1)` autoregressive process is one in which the current value is based on the immediately preceding value
* ### An `AR(2)` process is one in which the current value is based on the previous two values
* ### An `AR(0)` process is used for white noise and has no dependence between the terms
* ### An `AR(p)` process is used for a series that depends on multiple lagged values
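To make the AR(0), AR(2), and AR(p) cases concrete, here is a minimal sketch of a simulator that accepts any number of lag coefficients. `simulate_ar` is a hypothetical helper written for this illustration, not a function from the notebook or from any library, and the coefficients `[0.5, 0.3]` are arbitrary example values.

```python
import numpy as np

def simulate_ar(coefs, intercept=0.0, num_timesteps=300, seed=0):
    """Simulate y[t] = intercept + sum_k coefs[k] * y[t-1-k] + noise[t].

    Hypothetical helper: `coefs` holds the p lag coefficients, most recent lag first.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=num_timesteps)
    p = len(coefs)
    y = np.zeros(num_timesteps)
    for t in range(p, num_timesteps):
        # Previous p values in the order y[t-1], y[t-2], ..., y[t-p]
        lagged = y[t - p:t][::-1]
        y[t] = intercept + np.dot(coefs, lagged) + noise[t]
    return y, noise

# AR(0): no lag terms, so the series is just the noise.
ar_0, noise_0 = simulate_ar(coefs=[])

# AR(2): each value depends on the previous two values.
ar_2, noise_2 = simulate_ar(coefs=[0.5, 0.3])
```

Passing a single coefficient reproduces the AR(1) form used throughout this notebook, apart from the starting values and the noise indexing.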