# Stationarity in Time Series Data

## Stationarity
In some situations you may need to make your data stationary before modeling, though there are no hard and fast rules. A series is stationary when its statistical properties, such as the mean and variance, stay constant over time; removing trend and seasonality is the usual way to get there. In particular data domains, the time series should be stationary before you apply certain analyses. "Weak" stationarity (constant mean, constant variance, and autocovariance that depends only on the lag) is often acceptable. You can always work with the original data and compare the results against those from the data that has been made stationary.
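As a minimal sketch of the three transformations used in this notebook, the snippet below applies them to a small made-up series. The series name `sales` and its values are invented for illustration; they are not one of the datasets loaded later.

```python
import pandas as pd

# Hypothetical monthly series with an upward trend (values invented for illustration)
idx = pd.date_range("2020-01-01", periods=6, freq="MS")
sales = pd.Series([100.0, 110.0, 121.0, 130.0, 145.0, 150.0], index=idx, name="sales")

# First difference: y[t] - y[t-1]; the first value is NaN, so drop it
first_diff = sales.diff().dropna()

# Percentage change (return): (y[t] - y[t-1]) / y[t-1]
pct_change = sales.pct_change().dropna()

# Difference from the mean (anomaly): centers the series on zero
diff_mean = sales - sales.mean()

print(first_diff.tolist())  # [10.0, 11.0, 9.0, 15.0, 5.0]
```

Differencing and percentage change each lose the first observation, which is why the transformed files below start one period after the raw data would.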

Let's check for stationarity of our transformed datasets.

In [1]:
import pandas as pd
from pandas import read_csv
from matplotlib import pyplot

## Augmented Dickey-Fuller Test

Many time series models assume that the series is stationary, or at least weakly stationary. For a stationary series, the distribution of the data does not change over time: the series has no trend, its variance is constant, and its autocorrelation structure is constant. The augmented Dickey-Fuller (ADF) test is a statistical test for non-stationarity. Its null hypothesis is that the series has a unit root, that is, that it is non-stationary because of a stochastic trend.
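For reference, the test can be written as a regression on the differenced series. The form below is the standard textbook statement rather than anything specific to this notebook; the symbols $\alpha$, $\beta$, $\gamma$, $\delta_i$, and the lag order $p$ are notation introduced here.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad
H_0:\ \gamma = 0 \ \text{(unit root)}
\quad\text{vs.}\quad
H_1:\ \gamma < 0
```

The test statistic is the t-ratio of the estimated $\gamma$, which is why more negative values count as stronger evidence against the null. With the default `regression='c'` in `adfuller`, only the constant $\alpha$ is included and the trend term $\beta t$ is dropped.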

In [2]:
from statsmodels.tsa.stattools import adfuller

The more negative the test statistic, the stronger the evidence against the null hypothesis. If the p-value is below 0.05, we reject the null hypothesis of a unit root and treat the series as stationary. Rejecting the null is evidence of stationarity, not proof of it. The critical values express the same decision on the scale of the test statistic: to reject at the 5% level, the test statistic must be below (more negative than) the 5% critical value.

In [3]:
# Example 1:  Vacation data, first difference

vacation = pd.read_csv("~/Desktop/section_3/vacation_firstdiff.csv", index_col='Month', parse_dates=True)
vacation.head()

Month,first_diff
2004-04-01,-7.0
2004-05-01,10.0
2004-06-01,9.0
2004-07-01,-2.0
2004-08-01,-17.0


In [4]:
# Run test
vacation_result = adfuller(vacation['first_diff'])

# Print test statistic
print(vacation_result[0])

# Print p-value
print(vacation_result[1])

# Print critical values of the test statistic at the 1%, 5%, and 10% levels
print(vacation_result[4])

-3.640502950097702
0.005032272908021572
{'1%': -3.4687256239864017, '5%': -2.8783961376954363, '10%': -2.57575634100705}


The p-value is 0.005, which is below the 0.05 threshold, and the test statistic (-3.641) is more negative than the 5% critical value (-2.878). It is also below the 1% critical value (-3.469). We can reject the null hypothesis that our time series is non-stationary and treat the series as stationary.

In [5]:
# Example 2:  Furniture data, percentage change

furniture = pd.read_csv("~/Desktop/section_3/furn_pctchange.csv", index_col='Month', parse_dates=True)
furniture.head()

Month,furniture_pct_change
1992-02-01,0.0198
1992-03-01,0.069088
1992-04-01,-0.002419
1992-05-01,0.033839
1992-06-01,0.022829


In [6]:
# Run test
furniture_result = adfuller(furniture['furniture_pct_change'])

# Print test statistic
print(furniture_result[0])

# Print p-value
print(furniture_result[1])

# Print critical values
print(furniture_result[4])

-3.264232903927293
0.016554954891937087
{'1%': -3.4514843502727306, '5%': -2.8708485956333556, '10%': -2.571729625657462}


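The p-value is 0.0166, which is below the 0.05 threshold, and the test statistic (-3.264) is more negative than the 5% critical value (-2.871), though not the 1% critical value (-3.451). We can reject the null hypothesis of non-stationarity at the 5% level.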
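The decision rule applied to the vacation and furniture results can be wrapped in a small helper. This is only a sketch: `interpret_adf` is a hypothetical function written for this notebook, not part of statsmodels, and the numbers passed to it are copied from the two outputs printed above.

```python
def interpret_adf(stat, pvalue, critical_values, alpha=0.05):
    """Summarize an ADF result (hypothetical helper, not a statsmodels function)."""
    # Significance levels at which the statistic is more negative than the critical value
    rejected_at = [level for level, crit in critical_values.items() if stat < crit]
    return {"reject_null": pvalue < alpha, "rejected_at": rejected_at}


# Values copied from the vacation (first difference) output
vacation_summary = interpret_adf(
    -3.640502950097702,
    0.005032272908021572,
    {'1%': -3.4687256239864017, '5%': -2.8783961376954363, '10%': -2.57575634100705},
)

# Values copied from the furniture (percentage change) output
furniture_summary = interpret_adf(
    -3.264232903927293,
    0.016554954891937087,
    {'1%': -3.4514843502727306, '5%': -2.8708485956333556, '10%': -2.571729625657462},
)

print(vacation_summary)   # {'reject_null': True, 'rejected_at': ['1%', '5%', '10%']}
print(furniture_summary)  # {'reject_null': True, 'rejected_at': ['5%', '10%']}
```

Both series reject the null at the 5% level, but only the vacation series also rejects at the 1% level. With a live result, the same helper could be called as `interpret_adf(result[0], result[1], result[4])`.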

In [7]:
# Example 3:  Bank of America stock price data, percentage change

bac = pd.read_csv("~/Desktop/section_3/bac_return.csv", index_col='Date', parse_dates=True)
bac.head()

Date,return
1990-01-14,-0.05
1990-01-21,0.035088
1990-01-28,-0.028249
1990-02-04,-0.002907
1990-02-11,-0.017493


In [8]:
# Run test
bac_result = adfuller(bac['return'])

# Print test statistic
print(bac_result[0])

# Print p-value
print(bac_result[1])

# Print critical values
print(bac_result[4])

-9.48447333786571
3.795233165450951e-16
{'1%': -3.434634049963598, '5%': -2.863432142744973, '10%': -2.5677773493449725}


The result of the augmented Dickey-Fuller test shows that the p-value is far below the 5% threshold and the test statistic (-9.484) is well below the 5% critical value (-2.863). We can reject the null hypothesis of non-stationarity and treat the series as stationary.

In [9]:
# Example 4:  J.P. Morgan stock price data, percentage change

jpm = pd.read_csv("~/Desktop/section_3/jpm_return.csv", index_col='Date', parse_dates=True)
jpm.head()

Date,return
1990-01-03,0.033333
1990-01-04,0.004032
1990-01-05,0.004017
1990-01-08,0.0
1990-01-09,-0.032


In [10]:
# Run test
jpm_result = adfuller(jpm['return'])

# Print test statistic
print(jpm_result[0])

# Print p-value
print(jpm_result[1])

# Print critical values
print(jpm_result[4])

-15.350414128259912
3.72669397900783e-28
{'1%': -3.43122570930995, '5%': -2.8619269969903733, '10%': -2.5669759941147903}


Again, we can reject the null hypothesis: the test statistic (-15.350) is far below even the 1% critical value (-3.431). We treat the series as stationary.

In [11]:
# Example 5:  Average Temperature of St. Louis data, difference from the mean (anomaly)

temp = pd.read_csv("~/Desktop/section_3/temp_diffmean.csv", index_col='Date', parse_dates=True)
temp.head()

Date,diff_mean
1938-04-01,0.050412
1938-05-01,8.250412
1938-06-01,17.050412
1938-07-01,23.950412
1938-08-01,24.850412


In [12]:
# Run test
temp_result = adfuller(temp['diff_mean'])

# Print test statistic
print(temp_result[0])

# Print p-value
print(temp_result[1])

# Print critical values
print(temp_result[4])

-4.710506429028064
8.041048161620659e-05
{'1%': -3.437274090836024, '5%': -2.8645968274636933, '10%': -2.5683976306326097}


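Again, we can reject the null hypothesis: the p-value is about 0.00008 and the test statistic (-4.711) is below the 1% critical value (-3.437). We treat the series as stationary, keeping in mind that the test checks for a unit root and does not by itself rule out a remaining seasonal pattern.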

All five series test as stationary after first differencing, taking the percentage change (calculating the return), or taking the difference from the mean (anomaly). The augmented Dickey-Fuller test supports this: it rejects the null hypothesis of a unit root at the 5% level in every case.
