## Check if stationary

Determine how many times each dataset needs to be differenced before it can be considered stationary.

based on: https://github.com/MKB-Datalab/time-series-analysis-with-SARIMAX-and-Prophet/blob/master/notebooks/01-Intro_time_series_tutorial.ipynb
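To make "differenced d times" concrete, here is a minimal sketch on toy numbers (not taken from the datasets). It assumes the same `diff().dropna()` step that the notebook code uses further down.

```python
import pandas as pd

# Toy series for illustration only; the real data is loaded from CSV below
series = pd.Series([1, 3, 6, 10, 15])

# d = 1: each value minus its predecessor; the first row becomes NaN and is dropped
first_diff = series.diff().dropna()
# d = 2: difference the already differenced series once more
second_diff = first_diff.diff().dropna()

print(first_diff.tolist())   # [2.0, 3.0, 4.0, 5.0]
print(second_diff.tolist())  # [1.0, 1.0, 1.0]
```

Each round of differencing shortens the series by one observation.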


### CONCLUSION

All three datasets tested below need to be differenced only once (d = 1) before both the ADF and KPSS tests indicate stationarity.
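The criterion behind this conclusion combines two tests with opposite null hypotheses. The sketch below restates that rule as a standalone function. The name `is_stationary` and the `alpha` parameter are illustrative and not part of the notebook; the 0.05 threshold and the example p-values are taken from the code and result tables further down.

```python
def is_stationary(adf_p_value, kpss_p_value, alpha=0.05):
    """Return True only when both tests point to stationarity.

    ADF null hypothesis: the series has a unit root (non-stationary),
    so a small p-value suggests stationarity.
    KPSS null hypothesis: the series is stationary,
    so a small p-value suggests non-stationarity.
    """
    adf_says_stationary = adf_p_value <= alpha
    kpss_says_stationary = kpss_p_value > alpha
    return adf_says_stationary and kpss_says_stationary


# p-values reported for the residential-with-PV dataset at d = 0 and d = 1
print(is_stationary(2.312622e-30, 0.01))  # False: KPSS rejects stationarity
print(is_stationary(0.0, 0.1))            # True: both tests agree
```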

In [2]:
from pathlib import Path
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

usable_data_folder = Path(r"C:\Users\Flin\OneDrive - TU Eindhoven\Flin\Flin\01 - Uni\00_Internship\Nokia\00_Programming\forecasting\datasets\train")

In [4]:
from statsmodels.tsa.stattools import adfuller, kpss

def adf_test(timeseries):
    """Print the Augmented Dickey-Fuller results (null hypothesis: unit root, i.e. non-stationary)."""
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput[f'Critical Value ({key})'] = value
    print(dfoutput)

def kpss_test(timeseries):
    """Print the KPSS results (null hypothesis: stationary around a constant)."""
    print('Results of KPSS Test:')
    kpsstest = kpss(timeseries, regression='c', nlags="auto")
    kpss_output = pd.Series(kpsstest[0:3], index=['Test Statistic', 'p-value', 'Lags Used'])
    for key, value in kpsstest[3].items():
        kpss_output[f'Critical Value ({key})'] = value
    print(kpss_output)

def obtain_adf_kpss_results(timeseries, max_d):
    """Run the ADF and KPSS tests on a time series after 0, 1, ..., max_d - 1 rounds of differencing.

    Args:
        timeseries (pd.Series or pd.DataFrame): Univariate time series
        max_d (int): Number of differencing orders to test (d = 0 .. max_d - 1)

    Returns:
        DataFrame with one row per differencing order d.
        Note the column names: 'adf_stats' holds the ADF p-value (not the test
        statistic) and 'p-value' holds the KPSS p-value.
    """
    results = []

    for idx in range(max_d):
        adf_result = adfuller(timeseries, autolag='AIC')
        kpss_result = kpss(timeseries, regression='c', nlags="auto")
        # Difference once more so the next iteration tests order idx + 1
        timeseries = timeseries.diff().dropna()
        # ADF null hypothesis: unit root, so a small p-value indicates stationarity
        adf_stationary = adf_result[1] <= 0.05
        # KPSS null hypothesis: stationarity, so a small p-value indicates non-stationarity
        kpss_stationary = kpss_result[1] > 0.05
        # Treat the series as stationary only when both tests agree
        stationary = adf_stationary and kpss_stationary

        results.append((idx, adf_result[1], kpss_result[1], adf_stationary, kpss_stationary, stationary))

    # Construct DataFrame
    results_df = pd.DataFrame(results, columns=['d', 'adf_stats', 'p-value', 'is_adf_stationary', 'is_kpss_stationary', 'is_stationary'])

    return results_df

def lazy_obtain_adf_kpss_results(path, max_d=5):
    """Load the CSV at `path` and run obtain_adf_kpss_results on its "y" column."""
    df = pd.read_csv(path)
    timeseries = df[["y"]]
    return obtain_adf_kpss_results(timeseries, max_d)

## RESIDENTIAL WITH PV

In [6]:
fn = r"residential_with_pv\h=2_residential_2018_WITH_PV_SFH13_2018.csv"
path = usable_data_folder / fn

dfout = lazy_obtain_adf_kpss_results(path)
dfout


InterpolationWarning (raised once): The test statistic is outside of the range of p-values available in the look-up table. The actual p-value is smaller than the p-value returned.

InterpolationWarning (raised 4 times): The test statistic is outside of the range of p-values available in the look-up table. The actual p-value is greater than the p-value returned.



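| | d | adf_stats | p-value | is_adf_stationary | is_kpss_stationary | is_stationary |
|---|---|---|---|---|---|---|
| 0 | 0 | 2.312622e-30 | 0.01 | True | False | False |
| 1 | 1 | 0.0 | 0.1 | True | True | True |
| 2 | 2 | 0.0 | 0.1 | True | True | True |
| 3 | 3 | 0.0 | 0.1 | True | True | True |
| 4 | 4 | 0.0 | 0.1 | True | True | True |

Note: `adf_stats` holds the ADF p-value and `p-value` holds the KPSS p-value.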


## RESIDENTIAL WITHOUT PV

In [7]:
fn = r"residential_no_pv\h=2_residential_2018_NO_PV_SFH8_2018.csv"
path = usable_data_folder / fn

dfout = lazy_obtain_adf_kpss_results(path)
dfout


InterpolationWarning (raised 4 times): The test statistic is outside of the range of p-values available in the look-up table. The actual p-value is greater than the p-value returned.



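| | d | adf_stats | p-value | is_adf_stationary | is_kpss_stationary | is_stationary |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | 0.045651 | True | False | False |
| 1 | 1 | 0.0 | 0.1 | True | True | True |
| 2 | 2 | 0.0 | 0.1 | True | True | True |
| 3 | 3 | 0.0 | 0.1 | True | True | True |
| 4 | 4 | 0.0 | 0.1 | True | True | True |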


## INDUSTRIAL

In [8]:
fn = r"industrial\h=2_industrial_2016_LG_1.csv"
path = usable_data_folder / fn

dfout = lazy_obtain_adf_kpss_results(path)
dfout


InterpolationWarning (raised once): The test statistic is outside of the range of p-values available in the look-up table. The actual p-value is smaller than the p-value returned.

InterpolationWarning (raised 4 times): The test statistic is outside of the range of p-values available in the look-up table. The actual p-value is greater than the p-value returned.



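| | d | adf_stats | p-value | is_adf_stationary | is_kpss_stationary | is_stationary |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | 0.01 | True | False | False |
| 1 | 1 | 0.0 | 0.1 | True | True | True |
| 2 | 2 | 0.0 | 0.1 | True | True | True |
| 3 | 3 | 0.0 | 0.1 | True | True | True |
| 4 | 4 | 0.0 | 0.1 | True | True | True |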
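To tie the three result tables back to the conclusion, the sketch below picks the smallest differencing order flagged as stationary for each dataset. The helper `smallest_stationary_d` and the `flags` dictionary are illustrative and not part of the notebook; the boolean values are copied from the `is_stationary` column of the three tables.

```python
import pandas as pd

# is_stationary flags for d = 0..4, copied from the three result tables
flags = {
    "residential_with_pv": [False, True, True, True, True],
    "residential_no_pv": [False, True, True, True, True],
    "industrial": [False, True, True, True, True],
}


def smallest_stationary_d(results_df):
    """Return the smallest d whose row is flagged stationary, or None if no row is."""
    stationary_rows = results_df[results_df["is_stationary"]]
    if stationary_rows.empty:
        return None
    return int(stationary_rows["d"].min())


summary = {
    name: smallest_stationary_d(
        pd.DataFrame({"d": range(len(values)), "is_stationary": values})
    )
    for name, values in flags.items()
}
print(summary)
# {'residential_with_pv': 1, 'residential_no_pv': 1, 'industrial': 1}
```

The same helper could be applied directly to the `dfout` DataFrames returned by `lazy_obtain_adf_kpss_results`, since they carry the same `d` and `is_stationary` columns.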
