Stationarity essentially ensures that the probability laws governing the behavior of a process do not change over time.

A process $\{y_t\}$ is strictly stationary if $p(y_{t_1}, \cdots, y_{t_n}) = p(y_{t_1-k},\cdots, y_{t_n-k})$ for all time points $t_1,\cdots ,t_n$ and all choices of time lag $k$. In particular, the $y$'s are marginally identically distributed, so $E(y_t)=E(y_{t-k})$ and $Var(y_t)=Var(y_{t-k})$ for all $t,k$ (when these moments exist): the mean and variance are constant over time.

If a process is strictly stationary and has finite variance, then the covariance function depends only on the time lag: $\gamma_k=Cov(y_t,y_{t-k})$ and $\rho_k=Corr(y_t,y_{t-k})=\gamma_k/\gamma_0$, with

\begin{align}
\gamma_0&=Var(y_t) \quad & \rho_0&=1\\
\gamma_k&=\gamma_{-k} \quad & \rho_k&=\rho_{-k}\\
|\gamma_k|&\leq \gamma_0 \quad &  |\rho_k|&\leq1
\end{align}
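As a quick numerical check of these properties, here is a minimal sketch in plain Python. The helper names `sample_autocov` and `sample_autocorr` and the simulated AR(1) series are illustrative assumptions, not part of the original notes. The estimator divides by $n$ (the biased form), which is what guarantees $|\hat\gamma_k|\leq\hat\gamma_0$ in sample.

```python
import random

def sample_autocov(y, k):
    """Biased (divide-by-n) sample autocovariance at lag k; symmetric in k."""
    k = abs(k)
    n = len(y)
    mean = sum(y) / n
    return sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n)) / n

def sample_autocorr(y, k):
    """Sample autocorrelation rho_k = gamma_k / gamma_0."""
    return sample_autocov(y, k) / sample_autocov(y, 0)

# Illustrative data (an assumption): a stationary AR(1) with coefficient 0.6
random.seed(0)
y = [0.0]
for _ in range(499):
    y.append(0.6 * y[-1] + random.gauss(0.0, 1.0))

print("gamma_0:", sample_autocov(y, 0))
print("rho_0..rho_3:", [round(sample_autocorr(y, k), 3) for k in range(4)])
```

By construction `sample_autocorr(y, 0)` is exactly 1 and negative lags give the same value as positive ones, mirroring $\rho_0=1$ and $\rho_k=\rho_{-k}$.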



### Weak Stationarity

A process is weakly (second-order) stationary if:

1. the mean function is constant over time
2. $\gamma_{t,t-k}=\gamma_{0,k}$ for all times $t$ and lags $k$
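To make the second condition concrete, the sketch below simulates many paths of Gaussian white noise and of a random walk and compares the variance across paths at two dates. A random walk has a constant mean of zero, but its variance grows linearly with time, so it fails condition 2. The function names, path counts and seeds are assumptions chosen for illustration.

```python
import random

def cross_sectional_variance(paths, t):
    """Variance across simulated paths at a fixed time index t."""
    values = [p[t] for p in paths]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def simulate(n_paths, n_steps, random_walk, seed=0):
    """Simulate Gaussian white noise, or its cumulative sum (a random walk)."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        level, path = 0.0, []
        for _ in range(n_steps):
            shock = rng.gauss(0.0, 1.0)
            level = level + shock if random_walk else shock
            path.append(level)
        paths.append(path)
    return paths

noise = simulate(2000, 101, random_walk=False)
walk = simulate(2000, 101, random_walk=True)
for t in (10, 100):
    # white-noise variance stays near 1; random-walk variance grows with t
    print(t, cross_sectional_variance(noise, t), cross_sectional_variance(walk, t))
```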



## Tests of Stationarity

### Parametric Tests
#### Dickey-Fuller Test

An $AR(1)$ model is $y_t=\rho y_{t-1}+u_t$, where $\rho$ is a coefficient and $u_t$ is an error term. A unit root is present if $\rho=1$.

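The regression model can be written as

$$\Delta y_t=(\rho-1)y_{t-1}+u_t=\delta y_{t-1}+u_t$$ where $\Delta$ is the first-difference operator, so testing for a unit root is equivalent to testing $\delta=0$. The test comes in three versions: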

1. test for unit root
$$\Delta y_t=\delta y_{t-1}+u_t$$

2. test for unit root with drift
$$\Delta y_t=a_0+\delta y_{t-1}+u_t$$

3. test for unit root with drift and deterministic time trend
$$\Delta y_t=a_0+a_1t+\delta y_{t-1}+u_t$$

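The intuition: if $y$ is stationary, it tends to return to a constant mean, so the lagged level $y_{t-1}$ helps predict the next change ($\delta<0$). Under a unit root the level carries no such information ($\delta=0$).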
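The first regression above (no drift, no trend) can be sketched by hand in plain Python. This is an illustrative implementation under stated assumptions, not the statsmodels routine: the names `dickey_fuller_no_constant` and `simulate_ar1` are hypothetical, the errors are standard normal, and no lagged differences are included. Under the null the t-ratio does not follow a Student t distribution. For this no-constant case the large-sample 5% Dickey-Fuller critical value is roughly $-1.95$.

```python
import math
import random

def dickey_fuller_no_constant(y):
    """OLS of dy_t on y_{t-1} without intercept; returns (delta_hat, t_stat)."""
    lagged = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    sxx = sum(x * x for x in lagged)
    delta_hat = sum(x * d for x, d in zip(lagged, dy)) / sxx
    resid = [d - delta_hat * x for x, d in zip(lagged, dy)]
    s2 = sum(e * e for e in resid) / (len(dy) - 1)   # residual variance
    return delta_hat, delta_hat / math.sqrt(s2 / sxx)

def simulate_ar1(rho, n, seed):
    """y_t = rho * y_{t-1} + u_t with standard normal u_t and y_0 = 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    return y

stationary = simulate_ar1(0.5, 500, seed=1)   # delta = rho - 1 = -0.5
unit_root = simulate_ar1(1.0, 500, seed=1)    # delta = 0
print("rho=0.5:", dickey_fuller_no_constant(stationary))
print("rho=1.0:", dickey_fuller_no_constant(unit_root))
```

For the stationary series the estimate of $\delta$ should sit near $-0.5$ with a strongly negative t-ratio, while for the simulated unit-root series the t-ratio is much closer to zero.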

#### Augmented Dickey-Fuller Test
The testing procedure is the same as for the DF test, but the model is

$$\Delta y_t=\alpha + \beta t +\gamma y_{t-1}+\delta_1\Delta y_{t-1} + \cdots + \delta_{p-1}\Delta y_{t-p+1} +\epsilon_t$$

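where $\alpha$ is a constant, $\beta$ is the coefficient on a time trend, and $p$ is the lag order of the autoregressive process. Imposing $\alpha=\beta=0$ corresponds to modelling a random walk; imposing only $\beta=0$ corresponds to a random walk with drift. The unit-root null is $\gamma=0$ against the alternative $\gamma<0$.

The augmented test adds lagged differences, which allows for higher-order autoregressive processes, so the lag order $p$ has to be determined when applying the test. One option is to test down from higher orders; another is to compare an information criterion such as AIC across lag orders.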


In [33]:
import pandas as pd
from statsmodels.tsa.stattools import adfuller
aapl = pd.read_csv('apple.csv', index_col=0)['bidprice']
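aapl_diff = aapl.diff().dropna()  # first difference of the bid price
# returns (test statistic, p-value, lags used, number of observations,
#          critical values, best information criterion)
df = adfuller(aapl_diff)
df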

(-12.198594201379608,
 1.2380729620898548e-22,
 11,
 1531,
 {'1%': -3.4346284441307184,
  '5%': -2.863429668579316,
  '10%': -2.5677760318409732},
 5455.747461327768)

## Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test for Stationarity

The test breaks the series up into three parts: a deterministic trend $\beta t$, a random walk $r_t$, and a stationary error $\epsilon_t$, with equation

$$x_t=r_t+\beta t + \epsilon_t$$ where $r_t=r_{t-1}+u_t$ and $u_t\sim(0,\sigma^2)$

The null hypothesis is $\sigma^2=0$ and the alternative is $\sigma^2>0$.

Note that in this test the null assumes stationarity around a mean or linear trend, while the alternative is the presence of a unit root. This is the reverse of the (augmented) Dickey-Fuller tests, where the unit root is the null.
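A simplified version of the level-stationarity statistic can be computed by hand: demean the series, accumulate the partial sums $S_t$ of the residuals, and scale $\sum_t S_t^2$ by $T^2$ times an estimate of the error variance. The sketch below is an assumption-laden illustration, not the statsmodels implementation: `kpss_level_stat` is a hypothetical name, and the variance estimate uses lag 0 only, whereas `kpss` in statsmodels uses a lag-truncated long-run variance. For the level case the usual 5% critical value is about 0.463, and large values reject stationarity.

```python
import random

def kpss_level_stat(x):
    """KPSS level-stationarity statistic with a lag-0 variance estimate."""
    n = len(x)
    mean = sum(x) / n
    resid = [v - mean for v in x]        # residuals from a regression on a constant
    partial, sum_sq = 0.0, 0.0
    for e in resid:
        partial += e                     # S_t, the partial sum of residuals
        sum_sq += partial * partial
    s2 = sum(e * e for e in resid) / n   # lag-0 estimate of the error variance
    return sum_sq / (n * n * s2)

# Illustrative data (an assumption): white noise and its cumulative sum
rng = random.Random(2)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(400)]
random_walk, level = [], 0.0
for shock in white_noise:
    level += shock
    random_walk.append(level)

print("white noise:", kpss_level_stat(white_noise))
print("random walk:", kpss_level_stat(random_walk))
```

The random walk should produce a far larger statistic than the white noise, which is the direction in which the test rejects the stationarity null.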

In [43]:
from statsmodels.tsa.stattools import kpss
kps2 = kpss(aapl_diff, regression='c', nlags='auto' )
kps2_p = kps2[1]



## Fractionally Differentiated Features

Let $B$ be the backshift operator applied to a matrix of real-valued features $\{X_t\}$, where $B^kX_t=X_{t-k}$ for any integer $k\geq 0$. For example, $(1-B)^2=1-2B+B^2$, where $B^2X_t=X_{t-2}$, so that $(1-B)^2X_t=X_t-2X_{t-1}+X_{t-2}$.
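The identity for $(1-B)^2$ can be checked on a small series. In this sketch `backshift` and `first_difference` are hypothetical helper names, and the toy series of squares is chosen only because its second difference is constant.

```python
def backshift(x, k):
    """Apply B^k: position t holds x[t-k], with None where the lag is unavailable."""
    return [None] * k + list(x[:len(x) - k])

def first_difference(x):
    """(1 - B) x_t = x_t - x_{t-1}, dropping the first observation."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]

x = [1, 4, 9, 16, 25]

# (1 - B)^2 applied as two successive first differences
twice = first_difference(first_difference(x))

# the expanded form x_t - 2 x_{t-1} + x_{t-2}, built from explicit backshifts
b1, b2 = backshift(x, 1), backshift(x, 2)
expanded = [x[t] - 2 * b1[t] + b2[t] for t in range(2, len(x))]

print(twice, expanded)  # both are [2, 2, 2]
```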

Note that $(x+y)^n=\sum_{k=0}^n {n \choose k}x^ky^{n-k}$ for a positive integer $n$. For a real number $d$, $(1+x)^d=\sum_{k=0}^\infty{d \choose k}x^k$, the binomial series.

in the fractional model $d$ can be a real number with the following binomial expansion

\begin{align}
(1-B)^d=\sum_{k=0}^\infty {d \choose k}(-B)^k &=\sum_{k=0}^\infty\frac{\prod_{i=0}^{k-1}(d-i)}{k!}(-B)^k\\
&=1-dB+\frac{d(d-1)}{2!}B^2-\frac{d(d-1)(d-2)}{3!}B^3+\cdots
\end{align}

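The fractionally differenced series is then the dot product of the weights with the current and lagged values,

$$\tilde{X}_t=\sum_{k=0}^\infty \omega_k X_{t-k}$$ where

$$\omega = \left\{1,-d, \frac{d(d-1)}{2!},-\frac{d(d-1)(d-2)}{3!}, \cdots, (-1)^k\frac{\prod_{i=0}^{k-1}(d-i)}{k!}, \cdots\right \}$$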

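When $d$ is a positive integer, $\omega_k=0$ for all $k>d$ because the product then contains the factor $(d-d)=0$, so memory beyond lag $d$ is cancelled.

The weights can be generated iteratively: $\omega_0=1$ and $\omega_k=-\omega_{k-1}\frac{d-k+1}{k}$ for $k\geq 1$.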
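The recursion is easy to check against the closed-form product. The sketch below uses hypothetical helper names (`ffd_weights`, `closed_form_weight`) and a fixed number of weights rather than a threshold cutoff, which is a simplifying assumption.

```python
import math

def ffd_weights(d, size):
    """First `size` weights of (1 - B)^d from w_k = -w_{k-1} * (d - k + 1) / k."""
    w = [1.0]
    for k in range(1, size):
        w.append(-w[-1] * (d - k + 1) / k)
    return w

def closed_form_weight(d, k):
    """(-1)^k * prod_{i=0}^{k-1} (d - i) / k!, for comparison with the recursion."""
    prod = 1.0
    for i in range(k):
        prod *= d - i
    return (-1) ** k * prod / math.factorial(k)

print(ffd_weights(1, 4))    # integer d: weights vanish for k > d
print(ffd_weights(2, 4))
print(ffd_weights(0.5, 5))  # fractional d: weights decay but never reach zero
```

With $d=1$ the weights reduce to ordinary first differencing, $(1,-1,0,\dots)$. With $d=0.5$ every lag keeps a nonzero weight, which is how fractional differencing preserves memory.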

In [147]:
import numpy as np
from statsmodels.tsa.stattools import adfuller
import pandas as pd
import matplotlib
%matplotlib inline
aapl = pd.read_csv('apple.csv', index_col=0, usecols=['time','bidprice'])
aaplcumsum = aapl.cumsum()


def getWeights_FFD(d,thres):
    w,k=[1.],1
    while True:
        w_=-w[-1]/k*(d-k+1)
        if abs(w_)<thres:
            break
        w.append(w_)
        k+=1
    return np.array(w[::-1]).reshape(-1,1)

def frac_diff_FFD(series, d, thres=1e-5):
    w =getWeights_FFD(d,thres)
    width,df = len(w)-1,{}
    for name in series.columns:
        # forward-fill gaps, drop leading NaNs, and start an empty float series for the output
        seriesF,df_=series[[name]].ffill().dropna(), pd.Series(dtype=float)
        for iloc1 in range(width, seriesF.shape[0]):
            loc0, loc1 = seriesF.index[iloc1-width], seriesF.index[iloc1]
            if not np.isfinite(series.loc[loc1,name]): #exclude na
                continue
            df_[loc1]=np.dot(w.T,seriesF.loc[loc0:loc1])[0,0]
        df[name]=df_.copy(deep=True)
    df = pd.concat(df,axis=1)
    return df

def min_d_that_passes_ADF(series, thres=0.05, method='adfuller'):
    # log prices (not used below: the loops difference the raw series)
    df1 = np.log(series)
    if method == 'adfuller':
        out = pd.DataFrame(columns=['adfStat','pVal','lags','nObs','95% CI'])
        for d in np.linspace(0,2,22):
            df2 = frac_diff_FFD(series,d,thres=0.05)
            adf = adfuller(df2, maxlag=1, regression='c', autolag=None)
            out.loc[d]=list(adf[:4]) + [adf[4]['5%']]
            if adf[1]<=thres:  # unit-root null rejected: smallest d that passes
                return df2, out, d
    elif method == 'kpss':
        out = pd.DataFrame(columns=['kpsStat','pVal'])
        # FFD series and d from earlier iterations, where the stationarity null was rejected
        storage_ffdf = []
        storage_d = []
        for d in np.linspace(0,2,22):
            df2 = frac_diff_FFD(series,d,thres=0.05)
            kps = kpss(df2, regression='c')
            out.loc[d]=list(kps[:2])
            if kps[1]>=thres:  # stationarity null no longer rejected at this d
                # return the series from the previous d (or this one if d=0 already passes)
                if storage_ffdf:
                    return storage_ffdf[-1], out, storage_d[-1]
                return df2, out, d
            else:
                storage_ffdf.append(df2)
                storage_d.append(d)
    # raise instead of returning a string, which the 3-way unpacking below cannot handle
    raise ValueError('no value of d satisfied the %s test' % method)


ffd, out, d = min_d_that_passes_ADF(aapl, method='kpss')

# from this we can pass the adf test with a fractional difference of 0.3




In [148]:
out


| d | kpsStat | pVal |
|---|---|---|
| 0.0 | 5.763688 | 0.01 |
| 0.095238 | 5.757774 | 0.01 |
| 0.190476 | 5.753088 | 0.01 |
| 0.285714 | 5.747455 | 0.01 |
| 0.380952 | 5.745171 | 0.01 |
| 0.47619 | 5.741053 | 0.01 |
| 0.571429 | 5.73298 | 0.01 |
| 0.666667 | 5.736031 | 0.01 |
| 0.761905 | 5.712809 | 0.01 |
| 0.857143 | 5.622101 | 0.01 |


In [149]:
d


1.9047619047619047