# Stationary timeseries

## Setup

In [None]:
import numpy as np
import pandas as pd
import warnings
import matplotlib.pyplot as plt
from matplotlib import rcParams

%matplotlib inline
plt.style.use('fivethirtyeight')
rcParams['axes.labelsize'] = 14
rcParams['xtick.labelsize'] = 12
rcParams['ytick.labelsize'] = 12
rcParams['text.color'] = 'g'  # single-letter color codes must be lowercase
rcParams['figure.figsize'] = 16,8

warnings.filterwarnings('ignore')

In [None]:
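def plot(ts, stat=False):
    """Plot a series; optionally overlay its rolling mean and +/- rolling std."""
    x = np.arange(len(ts))  # local x axis, so the function does not rely on a global
    plt.plot(x, ts)
    if stat:
        rolling = pd.Series(ts).rolling(25)  # 25-sample window
        plt.plot(x, rolling.mean(), '--o', linewidth=2)
        plt.plot(x, rolling.std(), '--r', linewidth=3)
        plt.plot(x, -rolling.std(), '--r', linewidth=3)
    plt.show()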

## Creating an artificial timeseries
We create different components and add them up:

- a trend: logarithmic
- noise: Gaussian distributed
- seasonality: sinusoid

In [None]:
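x = np.arange(400)
ts = np.random.normal(loc=0, scale=.5, size=x.shape)  # Gaussian noise
trend = np.log(x/100)  # log(0) = -inf at x = 0 ...
trend[0] = 0           # ... so the first value is overwritten
seasonality = np.sin(x/10)  # sinusoid with a period of 2*pi*10 (about 63) samples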

# Types of stationarity

### strict stationary
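Mean, variance and autocovariance are independent of time (not a function of _t_). Formally, this describes weak (covariance) stationarity; strict stationarity additionally requires the whole joint distribution to be invariant under time shifts. Many outlier detection methods mathematically assume stationarity!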

In [None]:
plot(ts)
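As a rough sanity check (not a formal test), one can compare the mean and standard deviation of the two halves of a series; for a stationary series they should be similar. The sketch below uses its own seeded noise series, so `rng`, `noise` and the halfway split are illustrative assumptions, not part of the notebook above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
noise = rng.normal(loc=0, scale=0.5, size=400)

# Split the series in two halves and compare their summary statistics.
first, second = noise[:200], noise[200:]
mean_gap = abs(first.mean() - second.mean())
std_gap = abs(first.std() - second.std())
print(f"mean gap: {mean_gap:.3f}, std gap: {std_gap:.3f}")
```

Small gaps are consistent with stationarity but do not prove it; the ADF and KPSS tests are the more principled tools.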

### trend stationary
A timeseries is trend stationary if it has a **time-dependent mean** (a deterministic trend) but becomes stationary once that trend is removed. This implies the absence of a unit root. The KPSS test takes trend stationarity as its null hypothesis.

In [None]:
plot(ts + trend)
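Removing a deterministic trend can be sketched as fitting the trend and subtracting it. The example below assumes a linear trend (the notebook's own trend is logarithmic), and the names `series` and `detrended` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
t = np.arange(400)
series = 0.01 * t + rng.normal(loc=0, scale=0.5, size=t.shape)  # assumed linear trend + noise

# Fit a straight line by least squares and subtract it.
slope, intercept = np.polyfit(t, series, deg=1)
detrended = series - (slope * t + intercept)
print(f"estimated slope: {slope:.4f}")
```

For a non-linear trend, the same idea applies with a different fitted function in place of the straight line.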

### difference stationary
If a timeseries can be made stationary by differencing, it is called difference stationary. Such a series has a unit root, which is the null hypothesis of the ADF test.

In [None]:
plot(ts+seasonality)

# 1. Augmented Dickey-Fuller Test (ADF)

## Theory
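If the test statistic is less than (more negative than) the critical value, we can reject the null hypothesis 
=> 
The series has no unit root. If we cannot reject the null hypothesis, the series may have a unit root, i.e. it is difference stationary: use **differencing to make the series stationary**.

**Null Hypothesis**: The series has a unit root (coefficient $a = 1$ in $y_t = a\,y_{t-1} + \varepsilon_t$)

**Alternate Hypothesis:** The series has no unit root.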

Further Resources:
- [Statistical Background](https://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test)
- [Python usage](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.adfuller.html)
- [R usage](https://www.rdocumentation.org/packages/aTSA/versions/3.1.2/topics/adf.test)

## Code

In [None]:
from statsmodels.tsa.stattools import adfuller

adfuller((ts+trend), autolag='AIC')
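The tuple returned by `adfuller(..., autolag='AIC')` holds the test statistic at index 0 and a dictionary of critical values at index 4. A minimal sketch of the decision rule follows; the helper name `adf_rejects_unit_root` and the numbers in `example` are made up for illustration and are not results from this notebook.

```python
def adf_rejects_unit_root(result, level="5%"):
    """Return True if the ADF statistic lies below the critical value at `level`."""
    statistic, critical_values = result[0], result[4]
    return statistic < critical_values[level]

# Hypothetical result in the layout
# (statistic, p-value, used lags, n observations, critical values, best IC).
example = (-3.9, 0.002, 2, 397, {"1%": -3.45, "5%": -2.87, "10%": -2.57}, 600.0)
print(adf_rejects_unit_root(example))  # True: -3.9 < -2.87
```

In practice one would pass the actual return value of `adfuller` instead of `example`.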

# 2. Kwiatkowski-Phillips-Schmidt-Shin Test (KPSS)

## Theory
The null and alternative hypotheses of the KPSS test are the reverse of those of the ADF test, which often creates confusion.

If the test statistic is less than the critical value, we fail to reject the null hypothesis and treat the series as stationary. If the test statistic is greater than the critical value, we reject the null hypothesis 
=> 
The series is not stationary.

**Null Hypothesis**: The process is trend stationary. Note that `kpss` in statsmodels tests stationarity around a constant by default (`regression='c'`); pass `regression='ct'` to test stationarity around a linear trend.

**Alternate Hypothesis:** The series has a unit root (series is not stationary).

Further Resources:
- [Statistical Background](https://www.statisticshowto.com/kpss-test/)
- [Python usage](https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.kpss.html)
- [R usage](https://www.rdocumentation.org/packages/tseries/versions/0.10-47/topics/kpss.test)

## Code

In [None]:
from statsmodels.tsa.stattools import kpss

kpss(ts+seasonality)
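The tuple returned by `kpss` holds the test statistic at index 0 and a dictionary of critical values at index 3. Because the hypotheses are reversed relative to ADF, the comparison flips as well. The helper name `kpss_rejects_stationarity` and the statistic values below are hypothetical; the critical values are the ones usually tabulated for the constant-only case.

```python
def kpss_rejects_stationarity(result, level="5%"):
    """Return True if the KPSS statistic exceeds the critical value at `level`."""
    statistic, critical_values = result[0], result[3]
    return statistic > critical_values[level]

# Hypothetical result in the layout (statistic, p-value, lags, critical values).
crit = {"10%": 0.347, "5%": 0.463, "2.5%": 0.574, "1%": 0.739}
example = (0.12, 0.1, 10, crit)
print(kpss_rejects_stationarity(example))  # False: 0.12 < 0.463, stationarity is not rejected
```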

## All stacked effects 

In [None]:
plot(ts + trend + seasonality)

### ADF

In [None]:
adfuller(ts+trend+seasonality)

### KPSS

In [None]:
kpss(ts+trend+seasonality)

# 3. Conclusions

_ADF_ means that the ADF test concludes that the series is stationary.

_KPSS_ means that the KPSS test concludes that the series is stationary.

_$\neg$ADF_ means that the ADF test concludes that the series is **not** stationary.

_ADF $\land$ KPSS_ means that ADF **and** KPSS both conclude that the series is stationary.

- **$\neg$ADF $\land$ $\neg$KPSS** $\Rightarrow$ Series not stationary
- **ADF $\land$ KPSS** $\Rightarrow$ Series is stationary
- **ADF $\land$ $\neg$KPSS** $\Rightarrow$ Series is **difference** stationary, use differencing to render the timeseries stationary
- **$\neg$ADF $\land$ KPSS** $\Rightarrow$ Series is **trend** stationary, remove the trend to render the timeseries stationary
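The four cases above can be written down as a small lookup. This is only a sketch of the decision table; the function name `classify` and its boolean arguments are illustrative assumptions.

```python
def classify(adf_stationary: bool, kpss_stationary: bool) -> str:
    """Map the two test verdicts onto the four cases listed above."""
    if adf_stationary and kpss_stationary:
        return "stationary"
    if adf_stationary and not kpss_stationary:
        return "difference stationary"
    if not adf_stationary and kpss_stationary:
        return "trend stationary"
    return "not stationary"

print(classify(adf_stationary=True, kpss_stationary=False))  # difference stationary
```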

# 4. Methods for rendering a timeseries stationary

## 4.1 Transforming

Applying functions and/or combinations of functions to the timeseries itself, e.g.
- log(ts)
- exp(ts)
- 1/ts
- sqrt(ts)
- etc.

In [None]:
plot(np.exp(ts + trend)/(ts+trend))
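To see why a transform can help, consider a series that grows exponentially: taking the logarithm turns the growth into a straight line, whose step-to-step change is constant. The sketch below uses an assumed noise-free growth rate of 0.02 purely for illustration.

```python
import numpy as np

t = np.arange(100)
growth = 100 * np.exp(0.02 * t)  # assumed exponential growth, no noise

log_growth = np.log(growth)      # linear in t after the transform
step = np.diff(log_growth)       # constant change between neighbouring points
print(step.min(), step.max())
```

With real, noisy data the steps would scatter around a constant rather than equal it exactly.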

## 4.2 Differencing

#### ts'(t) = ts(t) - ts(t-1)

Where ts(t) is the value of ts at timestamp t.

In [None]:
plt.plot(x[1:], np.diff(ts + seasonality))  # ts'(t) is defined for t >= 1
plt.show()
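Differencing loses the first value but is otherwise reversible: given the first observation, a cumulative sum restores the original series. A minimal sketch with a small made-up array:

```python
import numpy as np

series = np.array([1.0, 3.0, 6.0, 10.0])  # toy data, for illustration only

diffed = np.diff(series)                   # ts(t) - ts(t-1) for t >= 1
restored = np.concatenate(([series[0]], series[0] + np.cumsum(diffed)))
print(diffed, restored)
```

This is useful when a model is fitted on the differenced series and its predictions have to be mapped back to the original scale.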

## 4.3 Seasonal Differencing

#### ts'(t) = ts(t) - ts(t-n)

We shift the series by a lag _n_ that equals the assumed period of the seasonality.

In [None]:
lag = int(round(np.pi * 10 * 2, 0))
plt.plot(x,np.pad((ts+seasonality)[lag:] - (ts+seasonality)[:-lag], lag)[:-lag])
plt.show()
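When the lag matches the period exactly, seasonal differencing removes a purely periodic component completely. The sketch below assumes an integer period of 20 samples so that the match is exact; `period` and `seasonal` are illustrative names.

```python
import numpy as np

period = 20  # assumed integer period, in samples
t = np.arange(200)
seasonal = np.sin(2 * np.pi * t / period)

# ts(t) - ts(t - period): each value is compared with the one a full cycle earlier.
seasonal_diff = seasonal[period:] - seasonal[:-period]
print(np.abs(seasonal_diff).max())
```

In the cell above, the true period is 2π·10 (about 62.83 samples) and the lag is rounded to 63, so a small seasonal residue remains after differencing.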

# 5. A real example


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# reading the dataset
train = pd.read_csv('AirPassengers.csv')

# preprocessing: use the parsed month as a datetime index
train.index = pd.to_datetime(train['Month'], format='%Y-%m')
train.index.name = 'timestamp'
train.drop('Month', axis=1, inplace=True)

# looking at the last few rows
train.tail()

In [None]:
train['#Passengers'].plot(figsize=(12,8))

## Simple differencing

In [None]:
train['#Passengers_diff'] = train['#Passengers'].diff()
train['#Passengers_diff'].dropna().plot(figsize=(12,8))

## log-transform and differencing

In [None]:
train['#Passengers_log'] = np.log(train['#Passengers'])
train['#Passengers_log_diff'] = train['#Passengers_log'].diff()
train['#Passengers_log_diff'].dropna().plot(figsize=(12,8))