# Autocorrelation or Serial Correlation

In this tutorial we will focus on:
- The definition of autocorrelation, the Durbin-Watson test statistic, and why autocorrelation matters;
- Using numpy or pandas to compute the autocorrelation coefficient.

## 1) Definition:

- **Autocorrelation**, also called **serial correlation**, is the correlation between the observations of a series and a lagged copy of the same series, expressed as a function of the time lag between them.


**More info:** https://en.wikipedia.org/wiki/Autocorrelation
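
The coefficient at a given lag can be estimated directly from the data. The sketch below is only an illustration: the helper name `lag_autocorr` and the sample values are made up for this example. It computes the Pearson correlation between a series and its lagged copy with numpy and compares the result with pandas' `Series.autocorr`.

```python
import numpy as np
import pandas as pd


def lag_autocorr(x, lag=1):
    """Pearson correlation between x[t] and x[t - lag] (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Pair each observation with the one `lag` steps earlier.
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]


# Illustrative data: a series that rises slowly and then falls.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 2.0]

r_numpy = lag_autocorr(values, lag=1)
r_pandas = pd.Series(values).autocorr(lag=1)

print(f"numpy : {r_numpy:.4f}")
print(f"pandas: {r_pandas:.4f}")
```

Both calls correlate the series with itself shifted by one step, so they should agree up to floating-point error. Neighbouring values in this series are close to each other, so the coefficient comes out positive.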

## 2) Test statistic - Durbin-Watson:

The null hypothesis of the test is that there is no first-order serial correlation in the residuals.

The Durbin-Watson test statistic is defined as:

$$d = \frac{\sum_{t=2}^T (e_t - e_{t-1})^2}{\sum_{t=1}^T e_t^2}$$

where $e_t$ is the residual at time $t$ and $T$ is the number of observations.

The test statistic is approximately equal to 2*(1-r), where ``r`` is the sample autocorrelation of the residuals at lag 1. Thus, for r == 0, indicating no serial correlation, the test statistic equals 2. The statistic always lies between 0 and 4. The closer the statistic is to 0, the stronger the evidence for positive serial correlation; the closer it is to 4, the stronger the evidence for negative serial correlation.
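
The approximation can be checked numerically. The following is a minimal sketch on simulated data, not a result from a real model: the AR(1) process, its coefficient `phi`, the sample size `T` and the random seed are all assumptions chosen for illustration, and `r` is computed about zero because residuals are assumed to have mean close to zero.

```python
import numpy as np

# Simulated AR(1) "residuals": e_t = phi * e_{t-1} + noise (illustrative only).
rng = np.random.default_rng(0)
phi, T = 0.7, 500
noise = rng.standard_normal(T)
e = np.zeros(T)
e[0] = noise[0]
for t in range(1, T):
    e[t] = phi * e[t - 1] + noise[t]

sum_sq = np.sum(e**2)

# Durbin-Watson statistic, following the definition above.
dw = np.sum(np.diff(e) ** 2) / sum_sq

# Lag-1 sample autocorrelation, computed about zero.
r = np.sum(e[1:] * e[:-1]) / sum_sq

# Expanding the square in the numerator gives the exact identity
#   dw = 2 * (1 - r) - (e_1^2 + e_T^2) / sum_sq,
# so the gap to 2 * (1 - r) consists of the two end terms only.
end_terms = (e[0] ** 2 + e[-1] ** 2) / sum_sq

print(f"dw          = {dw:.4f}")
print(f"2 * (1 - r) = {2 * (1 - r):.4f}")
print(f"end terms   = {end_terms:.4f}")
```

Because the end terms shrink as the sample grows, the two printed values should be close for a series of this length, and the positive `phi` should push the statistic below 2.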

**More info:** https://en.wikipedia.org/wiki/Durbin–Watson_statistic

## 3) Why is it important?

In many statistical models, autocorrelation violates the assumption that the observations are independent, which can lead to biased estimates and misleading inferences.

In regression models, autocorrelation of the residuals (errors) violates the ordinary least squares (OLS) assumption that the residual (error) terms are uncorrelated, meaning that the OLS estimators are no longer the Best Linear Unbiased Estimators (BLUE).
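
To make this concrete, the sketch below fits a straight line by OLS to simulated data whose errors follow an AR(1) process and then measures the lag-1 autocorrelation of the fitted residuals. Everything about the data (true intercept 1.0, true slope 0.5, `phi = 0.8`, the seed) is an assumption made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 300
x = np.arange(T, dtype=float)

# Errors follow an AR(1) process, so they are correlated over time.
phi = 0.8
noise = rng.standard_normal(T)
errors = np.zeros(T)
errors[0] = noise[0]
for t in range(1, T):
    errors[t] = phi * errors[t - 1] + noise[t]

# Hypothetical linear relationship plus the correlated errors.
y = 1.0 + 0.5 * x + errors

# Ordinary least squares fit of a straight line (highest power first).
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Lag-1 autocorrelation of the fitted residuals.
r1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

print(f"estimated slope: {slope:.4f}")
print(f"lag-1 autocorrelation of residuals: {r1:.4f}")
```

The fit still recovers a slope near the assumed value, but the residuals remain strongly correlated with their own past, which is the situation the Durbin-Watson statistic is meant to flag.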

## 4) Coding:

Source: https://www.statsmodels.org/stable/_modules/statsmodels/stats/stattools.html#durbin_watson

import numpy as np

def dw_test(residuals, axis=0):
    r"""
    Calculate the Durbin-Watson test statistic.

    Parameters
    ----------
    residuals : array_like
        Residuals of a fitted model, ordered in time.
    axis : int, optional
        Axis along which the statistic is computed. Default is 0.

    Returns
    -------
    dw_test_statistic : float, array_like
        The Durbin-Watson statistic.

    Notes
    -----
    H0: There is no serial correlation.
    H1: There is serial correlation.

    The Durbin-Watson test statistic is defined as:

    .. math::

       \sum_{t=2}^T((e_t - e_{t-1})^2)/\sum_{t=1}^Te_t^2

    This statistic will always be between 0 and 4.
    For r == 0, indicating no serial correlation, the statistic equals 2.
    The closer to 0 the statistic, the more evidence for positive serial correlation.
    The closer to 4, the more evidence for negative serial correlation.
    """
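    residuals = np.asarray(residuals)
    # First differences e_t - e_{t-1} along the chosen axis
    diff_residuals = np.diff(residuals, 1, axis=axis)
    # Sum of squared differences divided by the sum of squared residuals
    dw_statistic = np.sum(diff_residuals**2, axis=axis) / np.sum(residuals**2, axis=axis)
    return dw_statistic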
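
A short usage sketch follows. It repeats a compact copy of `dw_test` so that it runs on its own, and the input values are toy numbers chosen so the results can be checked by hand; they are not residuals from a real model.

```python
import numpy as np


def dw_test(residuals, axis=0):
    """Compact copy of the function above so the example runs on its own."""
    residuals = np.asarray(residuals)
    diff_residuals = np.diff(residuals, 1, axis=axis)
    return np.sum(diff_residuals**2, axis=axis) / np.sum(residuals**2, axis=axis)


unchanging = [1.0, 1.0, 1.0, 1.0]       # no change between steps
alternating = [1.0, -1.0, 1.0, -1.0]    # sign flips at every step

dw_unchanging = dw_test(unchanging)     # numerator 0, denominator 4
dw_alternating = dw_test(alternating)   # numerator 12, denominator 4

# A 2-D input is evaluated column by column with the default axis=0.
both = np.column_stack([unchanging, alternating])
dw_columns = dw_test(both)

print(dw_unchanging, dw_alternating, dw_columns)
```

The unchanging series gives 0, the extreme of positive serial correlation, while the alternating series gives 3 and moves toward 4 as it gets longer, because the two end terms in the numerator matter less.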