https://medium.com/data-science/non-stationarity-and-memory-in-financial-markets-fcef1fe76053

# Stationarity and Memory in Financial Markets

subtitle: Why you shouldn’t trust any stationarity test, and why memory has nothing to do with non-stationarity.

Stationarity and time series predictability, a special case of which is time series memory, are notions that are fundamental to the quantitative investment process. However, these are often misunderstood by practitioners and researchers alike, as Chapter 5 of the recent book Advances in Financial Machine Learning attests. I had the pleasure of clearing up these misconceptions with some attendees of The Rise Of Machine Learning in Asset Management at Yale last week after the conference, but I’ve come to think that the problem is so widespread that it deserves a public discussion.

In this post I make a few poorly documented points about non-stationarity and memory in financial markets, some going against the econometrics orthodoxy. All arguments are backed by logic, maths, counter-examples and/or experiments with python code at the end.

The arguments made here can be divided into practical and technical arguments:

#### Technical Takeaways:

- It is impossible to test whether a time series is non-stationary with a single path observed over a bounded time interval, no matter how long. Every statistical test of stationarity makes an additional assumption about the family of diffusions the underlying process belongs to. Thus, a null hypothesis rejection can either represent empirical evidence that the diffusion assumption is incorrect, or that the diffusion assumption is correct but the null hypothesis (e.g. the presence of a unit root) is false. The statistical test by itself is inconclusive about which scenario holds.
- Contrary to what is claimed in Advances in Financial Machine Learning, there is no “Stationarity vs. Memory Dilemma” (one has nothing to do with the other), and memory does not imply skewness or excess kurtosis.
- Iterated differentiation of a time series à la Box-Jenkins does not make a time series more stationary, it makes a time series more memoryless; a time series can be both memoryless and non-stationary.
- Crucially, non-stationary but memoryless time series can easily trick (unit-root) stationarity tests.

The notions of memory and predictability of time series are tightly related, and we discussed the latter in our Yellow Paper. I’ll take this opportunity to share our approach to quantifying memory in time series.

#### Practical Takeaways:

- That markets (financial time series specifically) are non-stationary makes intuitive sense, but any attempt to prove it statistically is doomed to be flawed.
- Quantitative investment management needs stationarity, but not stationarity of financial time series: what it needs is ‘stationarity’, or persistence, of tradable patterns or alphas over a long enough time horizon.


## Stationarity

Simply put, stationarity is the property of things that do not change over time.

### Quant Investment Managers Need Stationarity

At the core of every quantitative investment management endeavor is the assumption that there are patterns in markets that prevailed in the past, that will prevail in the future, and that one can use to make money in financial markets.

A successful search for those patterns, often referred to as alphas, and the expectation that they will persist over time, is typically required prior to deploying capital. Thus stationarity is a wishful assumption inherent to quantitative investment management.

### Stationarity In Financial Markets Is Self-Destructive

However, alphas are often victim of their own success. The better an alpha, the more likely it will be copied by competitors over time, and therefore the more likely it is to fade over time. Hence, every predictive pattern is bound to be a temporary or transient regime. How long the regime will last depends on the rigor used in the alpha search, and the secrecy around its exploitation.

The ephemerality of alphas is well documented; see for instance Igor Tulchinsky’s latest book, The Unrules: Man, Machines and the Quest to Master Markets, which I highly recommend.

As for the widespread perception that financial markets are highly non-stationary, non-stationarity is usually meant there in a mathematical sense, and refers to financial time series.

### Time Series Stationarity Can’t Be Disproved With One Finite Sample

In the case of time series (a.k.a. stochastic processes), stationarity has a precise meaning (as expected); in fact two.

A time series is said to be strongly stationary when all its properties are invariant by change of the origin of time, or time translation. A time series is said to be second-order stationary, or weakly stationary when its mean and auto-covariance functions are invariant by change of the origin of time, or time translation.
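To make the weak-stationarity definition concrete, here is a minimal sketch (not from the original post) that estimates the mean and the lag-1 auto-covariance at two different times across an ensemble of simulated paths. The two processes, the seed, and the helper `ensemble_moments` are illustrative assumptions. Note that the check relies on having many independent paths, which is exactly what a single observed market history does not provide.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps, phi = 4000, 500, 0.9

# Stationary AR(1): start each path from the stationary law N(0, 1 / (1 - phi^2)).
ar1 = np.empty((n_paths, n_steps))
ar1[:, 0] = rng.normal(scale=1.0 / np.sqrt(1.0 - phi**2), size=n_paths)
for t in range(1, n_steps):
    ar1[:, t] = phi * ar1[:, t - 1] + rng.normal(size=n_paths)

# Random walk: cumulative sum of i.i.d. shocks, so the variance grows with t.
walk = np.cumsum(rng.normal(size=(n_paths, n_steps)), axis=1)


def ensemble_moments(paths, t, lag=1):
    """Mean at time t and covariance between times t and t + lag, across paths."""
    mean_t = paths[:, t].mean()
    cov_t = np.cov(paths[:, t], paths[:, t + lag])[0, 1]
    return mean_t, cov_t


for name, paths in [("AR(1)", ar1), ("random walk", walk)]:
    for t in (50, 400):
        mean_t, cov_t = ensemble_moments(paths, t)
        print(f"{name:12s} t={t:3d}  mean={mean_t:+.3f}  cov(t, t+1)={cov_t:.3f}")
```

For the AR(1) ensemble the two covariances should roughly agree (time-translation invariance), while for the random walk the covariance keeps growing with t.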

Intuitively, a stationary time series is a time series whose local properties are preserved over time. It is therefore not surprising that it has been a pivotal assumption in econometrics over the past few decades, so much so that it is often thought that practitioners ought to first make a time series stationary before doing any modeling, at least in the Box-Jenkins school of thought.

This is absurd for the simple reason that (second-order) stationarity, as a property, cannot be disproved from a single finite sample path. Yes, you read that right! Read on to understand why.

But before delving into an almost philosophical argument, let’s take a concrete example.

IMAGE

Let’s consider the plot above. Is this the plot of a stationary time series? If you were to answer simply based on this plot, you would probably conclude that it is not. But I’m sure you see the trick coming, so you would probably want to run a so-called ‘stationarity test’, perhaps one of the most widely used, the Augmented-Dickey-Fuller test. Here’s what you’d get if you were to do so (source code at the end):

ADF Statistic: 4.264155
p-Value: 1.000000
Critical Values:
	1%: -3.4370
	5%: -2.8645
	10%: -2.5683

As you can see, the ADF test can’t reject the null hypothesis that the time series is an AR that has a unit root, which would (kind of) confirm your original intuition.

Now, if I told you that the plot above is a draw from a Gaussian process with mean 100 and auto-covariance function

EQUATION

then I am sure you’d agree that it is indeed a draw from a (strongly) stationary time series. After all, both its mean and auto-covariance functions are invariant by time translation.

If you’re still confused, here’s the same draw over a much longer time horizon:

IMAGE

I’m sure you must be thinking that it looks more like what you’d expect from a stationary time series (e.g. it is visually mean-reverting). Let’s confirm that with our ADF test:

ADF Statistic: -4.2702
p-Value: 0.0005
Critical Values:
	1%: -3.4440
	5%: -2.8676
	10%: -2.5700

Indeed, we can reject the null hypothesis that the time series is non-stationary with a p-value of 0.05%, which gives us strong confidence.

However, the process hasn’t changed between the two experiments. In fact even the random path used is the same, and both experiments have enough points (at least a thousand each). So what’s wrong?

Intuitively, although the first experiment had a large enough sample size, it didn’t span long enough a time interval to be characteristic of the underlying process, and there is no way we could have known that beforehand!

The takeaway is that it is simply impossible to test whether a time series is stationary from a single path observed over a finite time interval, without making any additional assumption.
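The effect can be reproduced with a small sketch. This is not the author's code: it swaps the Gaussian process of the post for a stationary AR(1) with a coefficient close to 1, and it uses a plain Dickey-Fuller regression written in NumPy (no lagged difference terms) as a simplified stand-in for the ADF test. The seed, the coefficient 0.999, and the helper name `dickey_fuller_tstat` are assumptions made for illustration.

```python
import numpy as np


def dickey_fuller_tstat(y):
    """t-statistic of gamma in the regression  dy_t = a + gamma * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(dy.size), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (dy.size - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(1)
phi, n_total = 0.999, 200_000            # stationary AR(1), correlation time ~1000 steps
shocks = rng.normal(size=n_total)
x = np.empty(n_total)
x[0] = shocks[0] / np.sqrt(1.0 - phi**2)  # draw the start from the stationary law
for t in range(1, n_total):
    x[t] = phi * x[t - 1] + shocks[t]
x += 100.0                                # stationary mean of 100

short_window = x[:1000]   # 1000 points spanning about one correlation time
long_window = x[::200]    # 1000 points spanning the whole path

stat_short = dickey_fuller_tstat(short_window)
stat_long = dickey_fuller_tstat(long_window)
print(f"short window: {stat_short:.2f}   long window: {stat_long:.2f}")
print("asymptotic 5% critical value (constant, no trend): about -2.86")
```

Both windows contain the same number of points from the same stationary path. Only the long window covers enough correlation times for the statistic to fall well below the critical value; on the short window the statistic will usually stay above it.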

Two assumptions are often made but routinely overlooked by practitioners and researchers alike, to an extent that results in misinformed conclusions; an implicit assumption and an explicit assumption.

### 1. The Implicit Assumption

Stationarity is a property of a stochastic process, not of a path. Attempting to test stationarity from a single path implicitly relies on the assumption that the path at hand is sufficiently informative about the nature of the underlying process. As we saw above, this might not be the case and, more importantly, one has no way of ruling out this possibility. That a path does not look mean-reverting does not mean that the underlying process is not stationary. You might not have observed enough data to characterize the whole process.

Along this line, any financial time series, whether it passes the ADF test or not, can always be extended into a time series that passes the ADF test (hint: there exist stationary stochastic processes whose space of paths is universal). Because we do not know what the future holds, strictly speaking, saying that financial time series are non-stationary is slightly abusive, at least as much so as saying that financial time series are stationary.

In the absence of evidence of stationarity, a time series should not be assumed to be non-stationary; we simply can’t favor one property over the other statistically. This works similarly to any logical reasoning about a binary proposition A: the absence of evidence that A holds is never evidence that A does not hold.

Assuming that financial markets are non-stationary might make more practical sense as an axiom than assuming that markets are stationary, for structural reasons. For instance, it wouldn’t be far-fetched to expect productivity, global population, and global output, all of which are related to stock markets, to increase over time. However, it would not make more statistical sense, and it is a working hypothesis that we simply cannot invalidate (in isolation) in light of data.

### 2. The Explicit Assumption

Every statistical test of stationarity relies on an assumption on the class of diffusions in which the underlying process’ diffusion must lie. Without this, we simply cannot construct the statistic to use for the test.

Commonly used (unit root) tests typically assume that the true diffusion is an Autoregressive or AR process, and test the absence of a unit root as a proxy for stationarity.

The implication is that such tests do not have as null hypothesis that the underlying process is non-stationary, but instead that the underlying process is a non-stationary AR process!

Hence, empirical evidence leading to reject the null hypothesis could point to either the fact that the underlying process is not an AR, or that it is not stationary, or both! Unit root tests by themselves are not enough to rule out the possibility that the underlying process might not be an AR process.

The same holds for other tests of stationarity that place different assumptions on the underlying diffusion. Without a model there is no statistical hypothesis test, and no statistical hypothesis test can validate the model assumption on which it is based.

### Seek Stationary Alphas, Not Stationary Time Series

Given that we cannot test whether a time series is stationary without making an assumption on its diffusion, we are faced with two options:

- Make an assumption on the diffusion and test stationarity
- Learn a predictive model, with or without assuming stationarity

The former approach is the most commonly used in the econometrics literature because of the influence of the Box-Jenkins method, whereas the latter is more consistent with the machine learning spirit consisting of flexibly learning the data generating distribution from observations.

Modeling financial markets is hard, very hard, as markets are complex, almost chaotic systems with very low signal-to-noise ratios. Any attempt to properly characterize market dynamics — for instance by attempting to construct stationary transformations — as a requirement for constructing alphas, is brave, counterintuitive, and inefficient.

Alphas are functions of market features that can somewhat anticipate market moves in absolute or relative terms. To be trusted, an alpha should be expected to be preserved over time (i.e. be stationary in a loose sense). However, whether the underlying process itself is stationary or not (in the mathematical sense) is completely irrelevant. Value, size, momentum and carry are some examples of well documented trading ideas that worked for decades, and are unrelated to the stationarity of price or returns series.

But enough with stationarity, let’s move on to the nature of memory in markets.

## Memory

Intuitively, a time series should be thought to have memory when its past values are related to its future values.

To illustrate a common misunderstanding about memory, let’s consider a simple but representative example. In Advances in Financial Machine Learning, the author argues that

“Most economic analyses follow one of two paradigms:

- Box-Jenkins: returns are stationary, however memory-less
- Engle-Granger: Log-prices have memory, however they are non-stationary, and co-integration is the trick that makes regression work on non-stationary time series […]”

To get the best of both worlds, the author suggests constructing the weighted moving average process

EQUATION

whose coefficients are determined based on the notion of fractional differentiation with a fixed-window, as an alternative to log-returns (order 1 differentiation on log-prices). The author recommends choosing the smallest degree of fractional differentiation 0 < d < 1 for which the moving average time series passes the ADF stationarity test (at a given p-Value).

The whole approach begs a few questions:

- Is there really a dilemma between stationarity and memory?
- How can we quantify memory in time series so as to confirm whether or not they are memoryless?
- Assuming we could find a stationary moving average transformation with a lot of memory, how would that help us generate better alphas?

### Quantifying Memory

Intuitively, it is easy to see that moving average processes exhibit memory by construction (consecutive observations of a moving average are bound to be related as they are computed in part using the same observations of the input time series). However, not every time series that has memory is a moving average. To determine whether stationary time series have memory, one ought to have a framework for quantifying memory in any time series. We’ve tackled this problem in our Yellow Paper, and here’s a brief summary.

The qualitative question guiding any approach to measuring memory in time series is the following. Does knowing the past inform us about the future? Said differently, does knowing all past values until now reduce our uncertainty about the next value of the time series?

A canonical measure of uncertainty in a random variable is its entropy, when it exists.

EQUATION

Similarly, the uncertainty left in a random variable after observing another random variable is typically measured by the conditional entropy.

A candidate measure of the memory in a time series is therefore the uncertainty reduction about a future value of the time series that can be achieved by observing all past values, in the limit case of an infinite number of such past values. We call this the measure of auto-predictability of a time series.

EQUATION

When it exists, the measure of auto-predictability is always non-negative, and is zero if and only if all samples of the time series across time are mutually independent (i.e. the past is unrelated to the future, or the time series is memoryless).

In the case of stationary time series, PR({X}) always exists and is given by the difference between the entropy of any observation and the entropy rate of the time series.

EQUATION

In our Yellow Paper, we propose a maximum-entropy based approach for estimating PR({X}). The following plot illustrates how much memory there is in stocks, futures and currencies.

IMAGE

### Memory Has Nothing To Do With Stationarity

A direct consequence of the discussion above is that a time series can both be stationary, and have a lot of memory. One does not preclude the other and, in fact, one is simply not related to the other.

Indeed, in the case of stationary Gaussian processes, it can be shown that the measure of auto-predictability reads

EQUATION

It’s worth noting that PR({X})=0 if and only if the power spectrum is constant, that is, the time series is a stationary Gaussian white noise, otherwise PR({X})>0. A stationary white noise doesn’t lack memory because it is stationary, it lacks memory because it is, well […], a white noise!
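As a numerical illustration, the sketch below evaluates the auto-predictability of a stationary Gaussian process from its power spectrum. It assumes the standard expression obtained by combining the Gaussian entropy with the Kolmogorov-Szegő formula for the entropy rate, namely one half of the difference between the log of the average spectrum and the average of the log spectrum over [-π, π); the normalisation may differ from the equation in the post. The function names and the AR(1) test case are illustrative assumptions.

```python
import numpy as np


def gaussian_auto_predictability(spectral_density, n_grid=2**16):
    """0.5 * (log of the mean spectrum - mean of the log spectrum) on a uniform grid."""
    lam = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    f = spectral_density(lam)
    return 0.5 * (np.log(np.mean(f)) - np.mean(np.log(f)))


def ar1_spectrum(phi, sigma2=1.0):
    """Power spectrum of a stationary AR(1) with coefficient phi (up to a constant factor)."""
    return lambda lam: sigma2 / (1.0 - 2.0 * phi * np.cos(lam) + phi**2)


def white_noise_spectrum(lam):
    """Constant power spectrum."""
    return np.ones_like(lam)


print("white noise:", gaussian_auto_predictability(white_noise_spectrum))
for phi in (0.5, 0.9, 0.99):
    numeric = gaussian_auto_predictability(ar1_spectrum(phi))
    closed_form = -0.5 * np.log(1.0 - phi**2)   # variance ratio of level to innovation
    print(f"AR(1) phi={phi}: numeric={numeric:.6f}  closed form={closed_form:.6f}")
```

A constant factor in the spectrum cancels out, so only its shape matters: the flat spectrum gives zero, and the more peaked AR(1) spectra give larger values, all for processes that are stationary.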

The more uneven the power spectrum is, the more memory there is in the time series. The flatter the auto-covariance function, the steeper the power spectrum, and therefore the higher the measure of auto-predictability, and the more memory the time series has. An example of such a flat auto-covariance function is the Squared-Exponential covariance function

EQUATION

in the limit where the input length scale parameter l goes to infinity.

IMAGE

In short, there is no stationarity vs. memory dilemma. The confusion in practitioners’ minds comes from a misunderstanding of what goes on during iterated differentiation, as advocated by the Box-Jenkins methodology. More on that in the following section.

### Memory Has Nothing To Do With Skewness/Kurtosis

Another misconception about memory (see for instance Chapter 5, page 83 of the aforementioned book) is that there is “skewness and excess kurtosis that comes with memory”. This is also incorrect. As previously discussed it is possible to generate time series that are Gaussian (hence neither skewed nor leptokurtic), stationary, and have arbitrarily long memories.

## Iterated Differentiation, Stationarity And Memory

### Iterated Differentiation Does Not Make A Time Series More Stationary, It Makes A Time Series More Memoryless!

Differentiation of (discrete-time) time series, in the Backshift Operator sense, works much like differentiation of curves learned in high-school.

The more we keep differentiating a curve, the more likely the curve will undergo a discontinuity/abrupt change (unless of course it is infinitely differentiable).

Intuitively, in the same vein, the more a time series is differentiated in the backshift operator sense, the more shocks (in a stochastic sense) the time series will undergo, and therefore the closer its samples will get to being mutually independent, but not necessarily identically distributed!

Once a time series has been differentiated enough times that it has become memoryless (i.e. it has mutually independent samples), it is essentially a random walk, although not necessarily a stationary one. We can always construct a non-stationary time series that, no matter how many times it is differentiated, will never become stationary. Here’s an example:

EQUATION

Its order-1 differentiation is completely memoryless as increments of the Wiener process are independent.

EQUATIONS

Its variance function g(t) is time-varying, and therefore {y} is non-stationary.

Similarly, the order-(d+1) differentiation of {y} is both memoryless and non-stationary for every d>0. Specifically, subsequent iterated differentiations read

EQUATION

and their time-dependent variance functions read

EQUATION

This expression clearly explodes in t for every d, and does not converge in d for any t. In other words, consecutive differentiations do not even-out the variance function, and therefore do not make this time series more stationary!
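A quick simulation can illustrate the same point. The process below is an assumed stand-in, not the exact example of the post: a cumulative sum of independent Gaussian shocks whose standard deviation grows linearly in time. Its first difference is memoryless, yet the variance of its d-th difference still depends strongly on t for every d tried. All names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, n_steps = 5000, 200
sigma = 1.0 + 0.1 * np.arange(n_steps)          # assumed time-varying volatility

increments = sigma * rng.normal(size=(n_paths, n_steps))
y = np.cumsum(increments, axis=1)               # non-stationary, independent increments


def variance_ratio(paths, d, early=10, late=180):
    """Ensemble variance of the d-th difference at a late time over an early time."""
    diffed = np.diff(paths, n=d, axis=1)
    var_t = diffed.var(axis=0)
    return var_t[late] / var_t[early]


ratios = {d: variance_ratio(y, d) for d in (1, 2, 3)}
for d, ratio in ratios.items():
    print(f"d={d}: variance(late) / variance(early) = {ratio:.1f}")

# Memorylessness of the first difference: neighbouring values are uncorrelated.
first_diff = np.diff(y, n=1, axis=1)
lag1_corr = np.corrcoef(first_diff[:, 100], first_diff[:, 101])[0, 1]
print(f"lag-1 correlation of the first difference: {lag1_corr:+.3f}")
```

A ratio near 1 would indicate a variance that has evened out. Here it stays far above 1 for each order of differencing.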

### A Random Walk, Stationary Or Not, Would Typically Pass Most Unit-Root Tests!

The confusion in practitioners’ minds about iterated differentiation and stationarity stems from the fact that most unit root tests will conclude that a memoryless time series is stationary, although this is not necessarily the case.

Let’s consider the ADF test for instance.

EQUATION

If a time series {y} is memoryless but not stationary, the Ordinary Least Squares (OLS) fit underpinning the ADF test cannot result in a perfect fit. How would this departure be accounted for by OLS with a large enough sample? As the time series is memoryless, each increment y(t) - y(t-1) is negatively related to y(t-1) with a slope of about -1, so OLS will typically find evidence that γ is close to -1, far from 0. The ADF test therefore ought to reject the null hypothesis that γ=0, and to conclude that the time series does not have a unit root (i.e. is a stationary AR). The time-varying variance of {y} will typically be absorbed by the stationary noise term {e}.

To illustrate this point, we generate 1000 random draws uniformly at random between 0 and 1, and we use these draws as standard deviations of 1000 independently generated mean-zero Gaussians. The result is plotted below.

IMAGE

An ADF test run on this sample clearly rejects the null hypothesis that the time series is a draw from an AR with unit root, as can be seen from the statistic below.

ADF Statistic: -34.0381
p-Value: 0.0000
Critical Values:
	1%: -3.4369
	5%: -2.8644
	10%: -2.5683
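The experiment can be sketched as follows. This is not the author's source code: the seed is arbitrary, and a plain Dickey-Fuller regression in NumPy (constant, no lagged difference terms) is used as a simplified stand-in for the ADF test, so the statistic will not match the figure above exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
stds = rng.uniform(0.0, 1.0, size=n)   # a fresh standard deviation for every time step
y = rng.normal(loc=0.0, scale=stds)    # independent, mean zero, not identically distributed

# Dickey-Fuller regression: dy_t = a + gamma * y_{t-1} + e_t
dy = np.diff(y)
X = np.column_stack([np.ones(dy.size), y[:-1]])
coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
resid = dy - X @ coef
sigma2 = resid @ resid / (dy.size - X.shape[1])
gamma_hat = coef[1]
t_stat = gamma_hat / np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

print(f"gamma estimate: {gamma_hat:.3f}")
print(f"Dickey-Fuller t-statistic: {t_stat:.2f}  (5% critical value about -2.86)")
```

Because the samples are independent, the fitted coefficient on y(t-1) lands near -1 and the statistic falls far below the critical values, even though the variance changes from one time step to the next.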

At this point, practitioners often jump to the conclusion that the time series ought to be stationary, which is incorrect.

As previously discussed, a time series that is not a non-stationary AR is not necessarily stationary; it is either not an AR time series at all, or it is an AR that is stationary. In general the ADF test itself is inconclusive about which of the two assertions holds. In this example however, we know that the assumption that is incorrect is not non-stationarity, it is the AR assumption.

## Concluding Thoughts

Much attention has been devoted to the impact AI can have on the investment management industry in the media, with articles riding the AI hype, warning about the risk of backtest overfitting, making the case that the signal-to-noise ratio in financial markets rules out an AI revolution, or even arguing that AI has been around in the industry for decades.

In this media coverage, machine learning is often considered to be a static field, exogenous to the finance community, a set of general methods developed by others. However, the specificities of the asset management industry warrant the emergence of new machine learning methodologies, crafted with a finance-first mindset from the ground up, and questioning long-held dogmas. One of the biggest hurdles to the emergence of such techniques is perhaps the widespread misunderstanding of simple but fundamental notions, such as stationarity and memory, that are at the core of the research process.

# ====================================================================
# ====================================================================
# ====================================================================
# ====================================================================
# ====================================================================


https://medium.com/data-science/when-a-time-series-only-quacks-like-a-duck-10de9e165e

# When A Time Series Only Quacks Like A Duck

subtitle: Testing for Stationarity Before Running Forecast Models. With Python. And A Duckling Picture.

ADF, KPSS, OCSB, and CH tests for stationarity and for a stable seasonal pattern; and how to deal with them if they provide contradictory results.

To avoid a trap that could lead to a deficient forecast model, we will apply the ADF and the KPSS tests in parallel to check if the time series not only quacks like a duck, but also waddles like waterfowl is supposed to. We will also run the OCSB and CH tests to check if seasonal differencing is required.

Our source consists of 1200 months of historical temperature records for the small (and entirely fictional) town of Lower Tidmarsh, East Dakotahoma. The Lower Tidmarsh town archive was destroyed by a kitchen fire in the 1980s before (or, as some residents told us, because) the volunteer fire brigade came to the rescue. The temperature records had to be reconstructed by interviewing the two centenarian residents. The time series is synthetic, consisting of a sinusoidal seasonal component that mirrors the harsh winters and moderate summers in East Dakotahoma; a global warming trend over the past century; and a white noise component representing the estimation uncertainty.

You can download the small Temp.csv source file (~33 kB) from Google Drive via the link shown above. The Jupyter notebook is available via the second link.

## 0. Dependencies

```python
# Stationarity

import pandas as pd
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt

import pmdarima as pmd

import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from sklearn.metrics import mean_squared_error

from math import sqrt
import warnings
import sys


ALPHA = 0.05                        # significance level

warnings.filterwarnings("ignore")
```

## 1. Data Processing

```python
# read the source data file
df = pd.read_csv("Temp.csv")
df
```

Download the source data file Temp.csv.

IMAGE

Pandas imports .csv date columns as objects/strings. Therefore, we convert the dates to datetime, set an index, and derive year and month from the index.


```python
# convert objects/strings to datetime and numbers; set datetime index
df = df.dropna().copy()                          # work on a copy to avoid chained-assignment issues
df.columns = ["idx", "Date", "Temp"]             # rename columns
df["Date"] = pd.to_datetime(df["Date"])          # convert imported object/string to datetime
df = df.set_index("Date", drop=False)            # set Date as index, keep the Date column
df["year"] = df.index.year
df["month"] = df.index.month

df["idx"] = df["idx"].astype("int64")            # make sure the row counter is an integer

df.info()
```

Let’s create a pandas pivot table to look at the source data in tabular form.

```python
# tabular view of our source data via pivot table
pivot = pd.pivot_table(
    df, values='Temp', index='month', columns='year', 
    aggfunc='mean', margins=True, margins_name="Avg", fill_value="")
pivot.transpose()
```

We use the pivot table to compute the 10-year rolling average temperature that will iron out the short-term fluctuations of seasonal peaks and dips, and then create a chart to study the long-term trend, if there is any.

The plot shows a rising trend — a first indication that our time series is not stationary.

```python
# use the pivot table aggregations to plot the 10-year rolling average temperature and see the trend

year_avg = pd.pivot_table(df, values='Temp', index='year', aggfunc='mean')
year_avg['10 Years RA'] = year_avg['Temp'].rolling(10).mean()

year_avg[['Temp','10 Years RA']].plot(figsize=(20,6))
plt.title('Average Temperature')
min_Y = df['year'].min()
max_Y = df['year'].max()
plt.xticks([x for x in range(max_Y, min_Y, -10)])
plt.show()
```

Before we can feed the temperature into a forecast model such as SARIMA, we need to test it for stationarity.

We may be tempted to just kick off some kind of grid-search for suitable hyperparameters and then leave it to the auto-tuning process to identify the model with the lowest Akaike information criterion. But this can lead to the forecast quality trap mentioned earlier.

- The information criteria represent the objective we want to minimize with respect to the autoregressive AR and moving average MA terms;
- whereas the order of differencing must be determined in advance, by running tests for stationarity.



## 2. Testing for Stationarity
## 2.1 Stationarity and Differencing

### Stationarity

“A stationary time series is one whose properties don’t depend on the time at which the series is observed.” (Hyndman: 8.1 Stationarity and differencing | Forecasting: Principles and Practice (2nd ed) (otexts.com))

A time series is stationary if its mean, variance, and autocorrelation structure do not change over time. If they are not time-invariant, the properties we use today to prepare a forecast would be different from the properties we would observe tomorrow. A process that is not stationary would elude our methods for using past observations to predict the future development. The time series itself does not need to remain a flat, constant line in past and future periods to be deemed stationary — but the patterns that determine its changes over time need to be stationary to make its future behavior predictable.

The time series needs to exhibit:

- time-invariant mean
- time-invariant variance
- time-invariant autocorrelations

Time series with observations that are not stationary a priori can often be transformed to reach stationarity.

### Inconstant Mean

A series that shows a robust upward or downward trend does not have a constant mean. But if its data points tend to revert to the trendline after disturbances, the time series is trend-stationary. A transformation such as de-trending may turn it into a stationary time series that can be used in the forecast model. If the trend follows a predictable pattern, we can fit a trendline to the observations and then subtract it before we feed the de-trended series into the forecast model. Alternatively, we may be able to insert a datetime index into the model as an additional independent variable.

If these de-trending measures do not suffice to realize a constant mean, we can investigate if the differences from one observation to the next have a constant mean.

By differencing the time series — taking the difference between an observation y(t) and an earlier observation y(t-n) — we could obtain a stationary (mean-reverting) series of the changes.

A time series in which each observation equals its predecessor plus a random error, y(t) = y(t-1) + e(t), is called a random walk. The differences between consecutive observations then consist of the error term alone, which by definition has a zero mean if it does not contain a signal with valuable information for the forecast. Random walks may exhibit long phases of apparent trends, up or down, followed by unpredictable changes of direction. A constant average trend requires one order of differencing.

A time series in which the differences between neighboring observations have a non-zero mean will tend to drift upwards (positive mean) or downwards (negative mean). We difference a time series with drift to get a series with constant mean.

Some time series require two rounds of differencing. The changes between observations are not constant (no constant “speed” between observations), but the change rate may be stable (constant “acceleration” or “deceleration”). If two rounds of differencing do not suffice to make a time series stationary, a third round is rarely justifiable. Rather, the properties of the time series should be investigated more closely.

A time series with seasonality will exhibit patterns that repeat after a constant number of periods: temperatures in January differ from those in July, but January temperatures will be at a similar level between years. Seasonal differencing takes the difference between an observation and its predecessor that is S lags removed, with S being the number of periods in a full season, like 12 months in a year or 7 days in a week.

If both the trend and the seasonal pattern are relatively time-invariant, the differenced time series (first-differenced with respect to the trend; and seasonally-differenced with respect to the seasonality) will have an approximately constant mean.
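The differencing steps described above can be sketched with pandas. The series below is a synthetic stand-in for the temperature data (a 12-month sinusoid, a small linear trend, and white noise); the amplitude, slope, seed, and start date are assumptions chosen for illustration, not values read from Temp.csv.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n_months, slope = 1200, 0.002                    # assumed: 100 years, +0.002 degrees per month
t = np.arange(n_months)
index = pd.date_range("1921-01-01", periods=n_months, freq="MS")

temp = pd.Series(
    10.0 * np.sin(2 * np.pi * t / 12) + slope * t + rng.normal(size=n_months),
    index=index,
    name="Temp",
)

first_diff = temp.diff().dropna()                # y(t) - y(t-1): removes the linear trend
seasonal_diff = temp.diff(12).dropna()           # y(t) - y(t-12): removes the 12-month cycle
both = temp.diff(12).diff().dropna()             # seasonal difference, then first difference

print(f"std of original series:     {temp.std():.2f}")
print(f"std after first difference: {first_diff.std():.2f}")
print(f"std after seasonal diff:    {seasonal_diff.std():.2f}")
print(f"mean after seasonal diff:   {seasonal_diff.mean():.3f}  (12 * slope = {12 * slope:.3f})")
print(f"mean after both:            {both.mean():.3f}")
```

The first difference alone leaves the seasonal swing in place, whereas the seasonal difference removes it and leaves a roughly constant mean equal to twelve months of trend.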

### Inconstant variance

If the time series takes on the shape of an expanding or narrowing funnel, then its observations fluctuate around its trend with an increasing or decreasing variance over time. Its variance is not time-invariant.

By taking the logarithm of the observations, their square root, or by applying a Box-Cox-transformation, we may be able to stabilize the variance through transformations. After the forecast, we could reverse these transformations.
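As a small illustration of variance stabilization, the sketch below builds an assumed synthetic series with multiplicative noise (a funnel shape), compares the spread of its changes early and late in the sample before and after a log transform, and then reverses the transform. The growth rate, noise level, and helper `spread_ratio` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
t = np.arange(n)
y = np.exp(0.005 * t + 0.1 * rng.normal(size=n))   # fluctuations scale with the level


def spread_ratio(series):
    """Std of the changes in the last quarter divided by the std in the first quarter."""
    changes = np.diff(series)
    quarter = changes.size // 4
    return changes[-quarter:].std() / changes[:quarter].std()


raw_ratio = spread_ratio(y)
log_ratio = spread_ratio(np.log(y))
recovered = np.exp(np.log(y))                      # reverse the transform after forecasting

print(f"spread ratio, raw series: {raw_ratio:.1f}")
print(f"spread ratio, log series: {log_ratio:.2f}")
```

A ratio close to 1 after the transform indicates that the funnel has been flattened. A Box-Cox transformation generalizes the same idea with a tunable power parameter.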

### Inconstant autocorrelation structure

If the correlation and covariance between two observations y(t) and y(t-n), separated by a fixed lag n, change with t, the autocorrelation structure is not constant over time. For stationarity, the autocorrelations should be time-invariant: they may depend on the lag n, but not on t.

### PSA #1: Determine Stationarity Before Fitting A Model

The required order of differencing is a parameter that should be determined in advance, before fitting a forecast model to the data. A tuning algorithm can test any combination of hyperparameters against a chosen benchmark such as the Akaike information criterion. But some of the hyperparameters may neutralize each other’s effects. A hyperparameter search in a SARIMA model will trade autoregressive AR and moving-average MA terms for changes in the order of differencing.

“It is important to note that these information criteria tend not to be good guides to selecting the appropriate order of differencing (d) of a model, but only for selecting the values of p and q. This is because the differencing changes the data on which the likelihood is computed, making the AIC values between models with different orders of differencing not comparable. So we need to use some other approach to choose d, and then we can use the AICc to select p and q.” (Hyndman, 8.6 Estimation and order selection | Forecasting: Principles and Practice (2nd ed) (otexts.com)).

Thus, if a hyperparameter search attempts to determine the order of differencing in parallel with the other parameters, we may obtain an inferior forecast model. The search would find an order of differencing that apparently minimizes AIC or BIC. But it may have missed a model that could lead to more accurate predictions despite its higher AIC. The search algorithm is unaware that its objective, the information criterion, cannot compare models with different orders of differencing.

Either the tuning algorithm should apply hypothesis tests to determine the appropriate order of differencing before it starts a grid search for the other hyperparameters; or the data scientist pins down the order of differencing and then limits the grid search to the remaining parameters such as the AR and MA terms.

### PSA #2: Conduct Parallel Tests for Stationarity

To find out if differencing is required, we can run four tests to obtain objective results, which a visual inspection of charts may miss:

- Augmented Dickey-Fuller ADF
- Kwiatkowski-Phillips-Schmidt-Shin KPSS
- Osborn-Chui-Smith-Birchenhall OCSB for seasonal differencing
- Canova-Hansen CH for seasonal differencing

I will skip some other unit root tests, such as Phillips-Perron.

These tests may return contradictory results in quite a few cases. The following example will demonstrate that ADF and KPSS should be evaluated in parallel, not in isolation. Many of us (myself included, when I prepared my first forecasts) are used to relying on the ADF test as our default stationarity test; others prefer the KPSS test. Few among us, I suppose, routinely apply and then compare both tests to decide on differencing.



## 2.2 Augmented Dickey-Fuller Test (pmdarima) — Quacks Like A Duck?

- Null hypothesis: the series contains a unit root: it is not stationary.
- Alternative hypothesis: there is no unit root.
- Low p-values are preferable. If the test returns a p-value below the chosen significance level (e.g. 0.05), we reject the null and conclude that the series does not contain a unit root.
- If the ADF test does not find a unit root, but the KPSS test does, the series is difference-stationary: it still requires differencing.
- The pmdarima tests, both ADF and KPSS, provide two outputs: the p-value, and a Boolean value that answers the question: “Should we difference?”

```python
# pmdarima - ADF test - should we difference?
# ADF null hypothesis: the series is not stationary 
def ADF_pmd(x):
    adf_test = pmd.arima.stationarity.ADFTest(alpha=ALPHA)
    res = adf_test.should_diff(x)
    conclusion = "non-stationary" if res[0] > ALPHA else "stationary"
    resdict = {"should we difference? ":res[1], "p-value ":res[0], "conclusion":conclusion}
    return resdict

# call the ADF test:
resADF = ADF_pmd(df["Temp"])

# print test result dictionary:
print("ADF test result for original data:")
[print(key, ":", value) for key,value in resADF.items()]
```
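To make the mechanics behind the ADF test more tangible, here is a simplified sketch of the plain (non-augmented) Dickey-Fuller regression in numpy. It is not the pmdarima or statsmodels implementation; the function name `dickey_fuller_tstat` and the synthetic series are assumptions made for illustration. The augmented test additionally includes lagged differences in the regression.

```python
import numpy as np

def dickey_fuller_tstat(y):
    """t-statistic of gamma in: diff(y)_t = alpha + gamma * y_(t-1) + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant and lagged level
    beta, _, _, _ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(42)
eps = rng.normal(size=500)

random_walk = np.cumsum(eps)        # unit root: gamma = 0
ar1 = np.zeros(500)                 # stationary AR(1) with coefficient 0.5
for t in range(1, 500):
    ar1[t] = 0.5 * ar1[t - 1] + eps[t]

t_rw = dickey_fuller_tstat(random_walk)
t_ar = dickey_fuller_tstat(ar1)
print(t_rw, t_ar)
```

The statistic has to be compared with Dickey-Fuller critical values (roughly -2.86 at the 5% level for a regression with a constant), not with the usual t-distribution. A strongly negative value, as for the stationary AR(1) series, speaks against a unit root.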

## 2.3 Kwiatkowski-Phillips-Schmidt-Shin Test (KPSS) (pmdarima) — But It Does Not Walk Like A Duck?

- Null hypothesis: the series is stationary around a deterministic trend (trend-stationary).
- Note that the KPSS test swaps the null hypothesis and alternative hypothesis, compared to the ADF test.
- Alternative hypothesis: the series has a unit root. It is non-stationary.
- High p-values are preferable. If the test returns a p-value above the chosen significance level (e.g. 0.05), we conclude that it appears to be (at least trend-)stationary.
- If the KPSS test does not find a unit root, but the ADF test does, the series is trend-stationary: it requires differencing (or other transformations such as de-trending) to remove the trend.

```python
# pmdarima - KPSS test -  should we difference?
# null hypothesis: the series is at least trend stationary 
def KPSS_pmd(x):
    kpss_test = pmd.arima.stationarity.KPSSTest(alpha=ALPHA)
    res = kpss_test.should_diff(x)
    conclusion = "not stationary" if res[0] <= ALPHA else "stationary"
    resdict = {"should we difference? ":res[1], "p-value ":res[0], "conclusion":conclusion}
    return resdict

# call the KPSS test:
resKPSS = KPSS_pmd(df["Temp"])

# print test result dictionary:
print("KPSS test result for original data:")
[print(key, ":", value) for key,value in resKPSS.items()]
```
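For comparison with the ADF regression, the KPSS statistic can be sketched in a few lines of numpy. This is a simplified version for the null of stationarity around a constant level, not the library implementation; the function name `kpss_level_stat` and the fixed `lags` argument are assumptions (the libraries choose the lag length by their own rules).

```python
import numpy as np

def kpss_level_stat(y, lags):
    """KPSS statistic for the null of stationarity around a constant level."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    e = y - y.mean()                     # residuals from a regression on a constant
    s = np.cumsum(e)                     # partial sums of the residuals
    # long-run variance estimate with Bartlett (Newey-West) weights
    lrv = e @ e / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)
        lrv += 2.0 * weight * (e[k:] @ e[:-k]) / n
    return (s @ s) / (n**2 * lrv)

rng = np.random.default_rng(7)
eps = rng.normal(size=500)

stat_wn = kpss_level_stat(eps, lags=5)              # white noise: stationary
stat_rw = kpss_level_stat(np.cumsum(eps), lags=5)   # random walk: unit root
print(stat_wn, stat_rw)
```

Large values speak against stationarity: the 5% critical value for the level-stationary case is about 0.463. The partial sums of a random walk wander far from zero, which inflates the statistic, whereas those of a stationary series stay close to zero.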

## 2.4 Compare the ADF and KPSS Test Results (pmdarima)

```python
# compare ADF and KPSS result
test_values = zip(resADF.values(), resKPSS.values())
dict_tests = dict(zip(resADF.keys(), test_values))
df_tests = pd.DataFrame().from_dict(dict_tests).transpose()
df_tests.columns = ["ADF", "KPSS"]
print("Stationarity test results for original data:")
df_tests
```

Thus, the pmdarima tests return conflicting results.

IMAGE



## 2.5 Order of Differencing Recommended by ADF and KPSS

pmdarima also offers a method that returns the recommended order of first-differencing.

The recommendations are contradictory as well, because the same ADF and KPSS tests are used to derive them.

But we will come back to these orders of differencing later, when we wrap up our findings and decide how to proceed.

```python
# pmdarima also offers methods that suggest the order of first differencing, based on either ADF or the KPSS test

n_adf = pmd.arima.ndiffs(df["Temp"], test="adf")
n_kpss = pmd.arima.ndiffs(df["Temp"], test="kpss")
n_diffs = {"ADF ndiff":n_adf, "KPSS ndiff":n_kpss}
print("recommended order of first differencing for original data:")
[print(key, ":", value) for key,value in n_diffs.items()]
```

Let’s check with the statsmodels.tsa.stattools tests whether this is just a quirk in the pmdarima algorithm (hint: it is not).



## 2.6 Augmented Dickey-Fuller Test (stattools) — Quacks Like A Duck?

- We use the adfuller test of statsmodels.tsa.stattools to obtain additional information compared to the pmdarima tests.
- Null hypothesis: the series contains a unit root, it is not stationary.
- Alternative hypothesis: there is no unit root.
- Low p-values are preferable. If the test returns a p-value below the chosen significance level (e.g. 0.05), we reject the null and conclude that the series does not contain a unit root. It appears to be stationary.
- If the ADF test does not find a unit root, but the KPSS test does, the series is difference-stationary: it requires differencing.

```python
# We apply the ADF and KPSS tests of statsmodels.stattools:


# statsmodels - ADF test
# null hypothesis: There is a unit root and the series is NOT stationary 
# Low p-values are preferable
# get results as a dictionary
def ADF_statt(x):
     adf_test = adfuller(x, autolag="aic")
     t_stat, p_value, _, _, _, _  = adf_test
     conclusion = "non-stationary (unit root)" if p_value > ALPHA else "stationary"
     res_dict = {"ADF statistic":t_stat, "p-value":p_value, "should we difference?": (p_value > ALPHA), "conclusion": conclusion}
     return res_dict


# call the ADF test:
resADF = ADF_statt(df["Temp"])

# print dictionary of test results:
print("ADF test result for original data:")
# [print(key, ":", f'{value:.3f}') for key,value in resADF.items()]
[print(key, ":", value) for key,value in resADF.items()]
```

## 2.7 Kwiatkowski-Phillips-Schmidt-Shin Test (KPSS) (stattools) — Does Not Walk Like A Duck?

- Null hypothesis: the series is stationary around a deterministic trend (trend-stationary).
- Alternative hypothesis: the series has a unit root. It is non-stationary.
- High p-values are preferable. If the test returns a p-value above the chosen significance level (e.g. 0.05), we conclude that it appears to be at least trend-stationary.
- If the KPSS test does not find a unit root, but the ADF test does, the series is trend-stationary: it requires differencing or other transformations to remove the trend.

```python
# statsmodels - KPSS test
# more detailed output than pmdarima
# null hypothesis: the series is stationary (with the default regression="c": around a constant level)
# High p-values are preferable
# get results as a dictionary
def KPSS_statt(x):
     kpss_test = kpss(x)
     t_stat, p_value, _, critical_values  = kpss_test
     conclusion = "stationary" if p_value > ALPHA else "not stationary"
     res_dict = {"KPSS statistic":t_stat, "p-value":p_value, "should we difference?": (p_value < ALPHA), "conclusion": conclusion}
     return res_dict


# call the KPSS test:
resKPSS = KPSS_statt(df["Temp"])

# print dictionary of test results:
# [print(key, ":", f'{value:.3f}') for key,value in resKPSS.items()]
print("KPSS test result for original data:")
[print(key, ":", value) for key,value in resKPSS.items()]
```

## 2.8 Compare the ADF and KPSS results — ADF quacks like a duck, but KPSS does not walk like waterfowl

```python
# compare ADF and KPSS result
test_values = zip(resADF.values(), resKPSS.values())
dict_tests = dict(zip(resADF.keys(), test_values))
df_tests = pd.DataFrame().from_dict(dict_tests).transpose()
df_tests.columns = ["ADF", "KPSS"]
print("Stationarity Tests for original data, before differencing:")
df_tests
```

## 2.9 Difference or Don’t Difference?

- So the ADF test does not find a unit root even though the chart above shows a clear upward trend.
- The KPSS test reports that the series is not stationary.

How do we deal with the conflict? Is the KPSS test always correct?

## 2.10 Visual Plausibility Check: Decomposition

```python
# decomposition - let's decompose the time series, so we can clearly see its rising trend
# which confirms that the series is not stationary

from statsmodels.stats.stattools import durbin_watson

def plot_stationarity(y, lags):
   
    y = pd.Series(y)

    # decompose the time series into trend, seasonality and residuals
    decomp = sm.tsa.seasonal_decompose(y)
    # decomp.plot()
    # plt.show()
    trend = decomp.trend
    seas = decomp.seasonal
   

    fig = plt.figure()
    fig.set_figheight(10)
    fig.set_figwidth(18)
    

    ax1 = plt.subplot2grid((3, 3), (0, 0), colspan=2)
    ax2 = plt.subplot2grid((3, 3), (1, 0))
    ax3 = plt.subplot2grid((3, 3), (1, 1))
    #ax4 = plt.subplot2grid((3, 3), (1, 1))
    ax5 = plt.subplot2grid((3, 3), (2, 0))
    ax6 = plt.subplot2grid((3, 3), (2, 1))

    y.plot(ax=ax1)
    ax1.set_title("Temperature")

    trend.plot(ax=ax2)
    ax2.set_title("Trend Component")

    seas.plot(ax=ax3)
    ax3.set_title("Seasonal Component")

    # resid.plot(ax=ax4)
    # ax4.set_title("Residual Component")
    
    plot_acf(y, lags=lags, zero=False, ax=ax5);
    plot_pacf(y, lags=lags, zero=False, ax=ax6);

    plt.tight_layout()


# get the plots for the time series before differencing
plot_stationarity(df["Temp"], 10)

```

- The trend chart does not show a constant mean, but rather an upward trend. The series cannot be stationary.
- The autocorrelation plot shows high and persistent autocorrelations in its ACF and PACF charts, with seasonal oscillations. The series cannot be stationary if it exhibits stable seasonality.



## 2.11 First-Difference: Reaching Stationarity

We apply the differencing method .diff() to the original time series; and then check for stationarity with both ADF and KPSS.

```python
# ADF and KPSS tests after differencing:

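# take the higher of the two recommended orders of first-differencing
n_diff = max(n_adf, n_kpss)

# .diff(k) takes differences at lag k, so we apply .diff() once per order
df_diff1 = df["Temp"]
for _ in range(n_diff):
    df_diff1 = df_diff1.diff().dropna()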

resADF = ADF_statt(df_diff1)
resKPSS = KPSS_statt(df_diff1)
test_values = zip(resADF.values(), resKPSS.values())
dict_tests = dict(zip(resADF.keys(), test_values))
df_tests = pd.DataFrame().from_dict(dict_tests).transpose()
df_tests.columns = ["ADF", "KPSS"]

print("Stationary after 1 round of first-differencing?")
df_tests
```

ADF and KPSS agree that the differenced series is stationary. The differenced series not only quacks like a duck, it also walks like one.

```python
# plot the differenced series
plot_stationarity(df_diff1, 25)
```



## 2.12 Stationary — But What About The Seasonality?

We have applied first-differencing and received favorable test results from ADF and KPSS, though the ACF plot still shows seasonal fluctuations.

Let’s run the OCSB and CH tests to decide if we need a helping of seasonal differencing as well.

The pmdarima implementations of both tests return the recommended orders of seasonal differencing.

Osborn-Chui-Smith-Birchenhall OCSB Test:

- Null hypothesis: the series contains a seasonal unit root
- It uses a Dickey-Fuller type regression. (ocsb: OCSB test in seastests: Seasonality Tests (rdrr.io) )

Canova-Hansen Test for Seasonal Stability:

- Null hypothesis: the seasonal pattern is stable over time

```python
# time series before first differencing
# OCSB test that returns the recommended order of seasonal differencing:
n_ocsb = pmd.arima.OCSBTest(m=12).estimate_seasonal_differencing_term(df["Temp"])


# CH test that returns the recommended order of seasonal differencing:
n_ch = pmd.arima.CHTest(m=12).estimate_seasonal_differencing_term(df["Temp"])


# seasonal differencing recommendation:
print("time series before first differencing -")
n_seasdiffs = {"recommended order of seasonal differencing":"", "OCSB recommendation":n_ocsb, "CH recommendation":n_ch}
[print(key, ":", value) for key,value in n_seasdiffs.items()]
```

2.12a When we investigate the original data, we observe another conflict, this time about seasonal differencing:

- The OCSB test does not identify a need for seasonal differencing, similar to the ADF test for first differencing.
- The CH test does recommend 1 order of seasonal differencing, similar to KPSS for first differencing.

2.12b When we run OCSB and CH on the first-differenced data we have generated in chapter 2.11, then OCSB and CH agree that first-differencing has obviated the need for any seasonal differencing.

```python
# time series after first differencing
# OCSB test that returns the recommended order of seasonal differencing:
n_ocsb = pmd.arima.OCSBTest(m=12).estimate_seasonal_differencing_term(df_diff1)


# CH test that returns the recommended order of seasonal differencing:
n_ch = pmd.arima.CHTest(m=12).estimate_seasonal_differencing_term(df_diff1)


# seasonal differencing recommendation:
print("time series after first differencing -")
n_seasdiffs = {"recommended order of seasonal differencing":"", "OCSB recommendation":n_ocsb, "CH recommendation":n_ch}
[print(key, ":", value) for key,value in n_seasdiffs.items()]
```

Conversely, if OCSB or CH had suggested differencing, we would have created a seasonally differenced series by appending the .diff(12) method to the original series.

Syntax for differencing in pandas: If y is the variable that represents the series of undifferenced data, then:

- y.diff(1) for first-differencing
- y.diff(12) for seasonal differencing if the seasonality has a periodicity of 12 months. The argument of the pandas function .diff() is the number of lags it uses to jump from one observation to the same point in the preceding season, not the order of differencing. A seasonal differencing order of 2 would therefore mean applying .diff(12) twice, not calling .diff(24).
- y.diff(1).diff(12) or y.diff(12).diff(1) — for combining both first- and seasonal differencing in a one-liner. The sequence of first- and seasonal differencing is not relevant — the results would be the same.

“Rule 12: If the series has a strong and consistent seasonal pattern, then you must use an order of seasonal differencing (otherwise the model assumes that the seasonal pattern will fade away over time). However, never use more than one order of seasonal differencing or more than 2 orders of total differencing (seasonal+nonseasonal).” (Rules for identifying ARIMA models (duke.edu)).
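The pandas syntax above can be checked on a synthetic series. The helper `difference` below is a hypothetical convenience function, not part of pandas or pmdarima; it applies each order of differencing as a separate .diff() call, because the argument of .diff() is a lag and not an order.

```python
import numpy as np
import pandas as pd

def difference(y, d=0, D=0, m=12):
    """Apply d rounds of first-differencing and D rounds of lag-m differencing."""
    out = pd.Series(y).astype(float)
    for _ in range(d):
        out = out.diff(1)
    for _ in range(D):
        out = out.diff(m)
    return out.dropna()

rng = np.random.default_rng(3)
y = pd.Series(rng.normal(size=60).cumsum())

a = y.diff(1).diff(12).dropna()     # first, then seasonal
b = y.diff(12).diff(1).dropna()     # seasonal, then first
c = difference(y, d=1, D=1, m=12)

# the sequence of the two differencing steps does not matter
print(np.allclose(a.to_numpy(), b.to_numpy()), np.allclose(a.to_numpy(), c.to_numpy()))
# orders of zero leave the series untouched
print(len(difference(y, d=0, D=0)))
```

Each round of first-differencing costs one observation and each round of seasonal differencing costs m observations, which is one more reason to keep the total order low.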



## 2.13 ADF and KPSS Conflicts — How Do We Deal With Them?

If the ADF and KPSS tests return conflicting results, how do we proceed: difference or don’t difference?

As a general rule:

- Neither the ADF test nor the KPSS test will confirm or disconfirm stationarity in isolation. Run both tests to decide if you should difference.
- If at least one of the tests claims to have found non-stationarity, you should difference. An unambiguous confirmation of duckiness (stationarity) requires that both tests confirm the quacking and the waddling.

A more specific explanation:

There are 4 possible combinations of KPSS and ADF test results:

- If KPSS and ADF agree that the series is stationary (KPSS with high p-value, ADF with low p-value): Consider it stationary. No need to difference it.
- ADF finds a unit root; but KPSS finds that the series is stationary around a deterministic trend (ADF and KPSS with high p-values). Then, the series is trend-stationary and it needs to be detrended. Difference it. Alternatively, a transformation may rid it of its trend.
- ADF does not find a unit root; but KPSS claims that it is non-stationary (ADF and KPSS with low p-values). Then, the series is difference-stationary. Difference it.
- If KPSS and ADF agree that the series is non-stationary (KPSS with low p-value; ADF with high p-value): Consider it non-stationary. Difference it.
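The four cases can be written down as a small lookup on the two p-values. This is only a sketch of the rule of thumb listed above; the function name `differencing_verdict` and its return format are assumptions made for illustration.

```python
def differencing_verdict(p_adf, p_kpss, alpha=0.05):
    """Map the ADF and KPSS p-values to one of the four cases."""
    adf_unit_root = p_adf > alpha        # ADF fails to reject its unit-root null
    kpss_stationary = p_kpss > alpha     # KPSS fails to reject its stationarity null

    if not adf_unit_root and kpss_stationary:
        return "stationary", False
    if adf_unit_root and kpss_stationary:
        return "trend-stationary", True
    if not adf_unit_root and not kpss_stationary:
        return "difference-stationary", True
    return "non-stationary", True

# one example pair of p-values for each of the four combinations
for p_adf, p_kpss in [(0.01, 0.10), (0.30, 0.10), (0.01, 0.01), (0.30, 0.01)]:
    label, should_diff = differencing_verdict(p_adf, p_kpss)
    print(p_adf, p_kpss, label, should_diff)
```

The second element of the returned tuple answers the question “Should we difference?”. It is False only when both tests agree on stationarity.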

Let’s translate this heuristic to Python:

For first-differencing, we take the higher of the orders which ADF and KPSS recommend: n_diff = max(n_adf, n_kpss), as already computed in chapter 2.11.

For seasonal differencing, we take the higher of the orders which OCSB and CH recommend. To avoid over-differencing, we should check if first-order differencing already arrives at stationarity.

```python
# seasonal differencing: combine the OCSB and CH test results
n_sdiff = max(n_ocsb, n_ch)




# ADF and KPSS tests after first differencing AND seasonal differencing:
# use the larger recommended order of first and seasonal differencing, respectively

m = 12                                   # seasonal period: 12 months per year
df_diff2 = df["Temp"]
for _ in range(n_diff):                  # one .diff(1) per order of first-differencing
    df_diff2 = df_diff2.diff(1)
for _ in range(n_sdiff):                 # one .diff(m) per order of seasonal differencing
    df_diff2 = df_diff2.diff(m)
df_diff2 = df_diff2.dropna()



resADF = ADF_statt(df_diff2)
resKPSS = KPSS_statt(df_diff2)
test_values = zip(resADF.values(), resKPSS.values())
dict_tests = dict(zip(resADF.keys(), test_values))
df_tests = pd.DataFrame().from_dict(dict_tests).transpose()
df_tests.columns = ["ADF", "KPSS"]
df_tests
```

```python
# after first AND seasonal differencing: compare ADF, KPSS, OCSB and CH results
n_adf = pmd.arima.ndiffs(df_diff2, test="adf")
n_kpss = pmd.arima.ndiffs(df_diff2, test="kpss")
n_ocsb = pmd.arima.OCSBTest(m=12).estimate_seasonal_differencing_term(df_diff2)
n_ch = pmd.arima.CHTest(m=12).estimate_seasonal_differencing_term(df_diff2)

print("after 1 round of differencing - do we need more?")
n_diffs = {"recommended additional differencing":"", "ADF first":n_adf, "KPSS first":n_kpss, 
    "OCSB seasonal":n_ocsb, "CH seasonal":n_ch}
[print(key, ":", value) for key,value in n_diffs.items()]
```

After one round of differencing, the code runs all four tests again (ADF, KPSS, OCSB, and CH) to check whether additional differencing is required. In our example, all four tests agree that the single order of first-differencing we applied in chapter 2.11 was enough to arrive at a stationary time series, which we can now hand over to a forecast model.