Augmented Dickey-Fuller (ADF) Test

 - The Augmented Dickey-Fuller (ADF) test is a statistical test of whether a given time series is stationary. Its null hypothesis is that the series has a unit root (that is, it is non-stationary). It is one of the most commonly used tests for analyzing the stationarity of a series.

In [156]:
import statsmodels.tsa.stattools as ts
import statsmodels.api as sm
from metatrader.utils.factory import MetatraderFactory
metatrader = MetatraderFactory.get_metatrader()
import pandas as pd
from datetime import datetime
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
from numpy import cumsum, log, polyfit, sqrt, std, subtract 
from numpy.random import randn
import matplotlib.pyplot as plt 
import matplotlib.dates as mdates
import pprint

mt5 = metatrader.connect()

Connected to Metatrader successfully.


In [157]:
symbol_1 = 'USDIDX'
timeframe = mt5.TIMEFRAME_D1
# Request 1000 daily bars starting at position 0
df1 = pd.DataFrame(mt5.copy_rates_from_pos(symbol_1, timeframe, 0, 1000))
df1['time'] = pd.to_datetime(df1['time'], unit='s')

symbol_2 = 'EURUSD'
df2 = pd.DataFrame(mt5.copy_rates_from_pos(symbol_2, timeframe, 0, 1000))
df2['time'] = pd.to_datetime(df2['time'], unit='s')


display(df1, df2)

Unnamed: 0,time,open,high,low,close,tick_volume,spread,real_volume
0,2019-12-18,96.775,97.041,96.720,96.925,1163,7,0
1,2019-12-19,96.930,97.056,96.806,96.960,1322,7,0
2,2019-12-20,96.955,97.346,96.955,97.260,1222,7,0
3,2019-12-23,97.275,97.401,97.166,97.240,1021,7,0
4,2019-12-24,97.235,97.386,97.196,97.241,615,7,0
...,...,...,...,...,...,...,...,...
995,2023-10-23,105.994,106.143,105.312,105.389,14532,9,0
996,2023-10-24,105.394,106.133,105.153,106.064,15331,9,0
997,2023-10-25,106.044,106.387,105.955,106.369,14296,9,0
998,2023-10-26,106.379,106.755,106.349,106.434,17094,9,0


Unnamed: 0,time,open,high,low,close,tick_volume,spread,real_volume
0,2020-04-24,1.07820,1.08197,1.07269,1.07981,95430,1,0
1,2020-04-27,1.08208,1.08602,1.08112,1.08291,94053,1,0
2,2020-04-28,1.08290,1.08883,1.08096,1.08326,77067,1,0
3,2020-04-29,1.08325,1.08855,1.08176,1.08751,74504,1,0
4,2020-04-30,1.08753,1.09724,1.08328,1.09560,84526,1,0
...,...,...,...,...,...,...,...,...
995,2023-10-23,1.05886,1.06776,1.05713,1.06697,43378,2,0
996,2023-10-24,1.06688,1.06941,1.05830,1.05947,44643,2,0
997,2023-10-25,1.05945,1.06065,1.05600,1.05662,42281,2,0
998,2023-10-26,1.05656,1.05689,1.05215,1.05639,47565,2,0


In [158]:
display(ts.adfuller(df1['close'], maxlag=1),
        ts.adfuller(df2['close'], maxlag=1))

(-1.0175344025187563,
 0.7468154304244174,
 1,
 998,
 {'1%': -3.4369193380671, '5%': -2.864440383452517, '10%': -2.56831430323573},
 1297.1908736162538)

(-1.0895062409387968,
 0.7192385009832837,
 0,
 999,
 {'1%': -3.4369127451400474,
  '5%': -2.864437475834273,
  '10%': -2.568312754566378},
 -7745.940834645742)

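The first value is the calculated test statistic and the second is the p-value. The third is the number of lags used in the test regression, and the fourth is the number of observations used once those lags are accounted for. The fifth value, the dictionary, contains the critical values of the test statistic at the 1, 5 and 10 percent levels. The sixth is the best value of the information criterion found during automatic lag selection. Here both test statistics (about -1.02 and -1.09) lie above even the 10 percent critical value (about -2.57), and both p-values exceed 0.7, so the null hypothesis of a unit root cannot be rejected for either series.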
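
The positional tuple is easier to read once its fields are named. The sketch below is only an illustration: the `AdfResult` container and the `rejects_unit_root` helper are hypothetical names, the field names follow the statsmodels documentation for `adfuller`, and the numbers are the USDIDX output above, rounded.

```python
from typing import NamedTuple


class AdfResult(NamedTuple):
    """Named view of the 6-tuple returned by adfuller (hypothetical helper)."""
    adf: float             # test statistic
    pvalue: float          # MacKinnon approximate p-value
    usedlag: int           # number of lags used
    nobs: int              # observations used in the regression
    critical_values: dict  # critical values at the 1%, 5% and 10% levels
    icbest: float          # best information criterion value


# Values copied (rounded) from the USDIDX output above.
usdidx = AdfResult(
    -1.0175, 0.7468, 1, 998,
    {"1%": -3.4369, "5%": -2.8644, "10%": -2.5683},
    1297.19,
)


def rejects_unit_root(result: AdfResult, level: str = "5%") -> bool:
    """True when the statistic is more negative than the critical value."""
    return result.adf < result.critical_values[level]


for level in ("1%", "5%", "10%"):
    print(level, rejects_unit_root(usdidx, level))
```

With an actual call, the same container can be filled with `AdfResult(*ts.adfuller(series, maxlag=1))`, assuming the default six-element return.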


<h2>
The Hurst exponent is a scalar value that helps identify (within the limits of statistical estimation) whether a series is mean reverting, a random walk, or trending.
</h2>

In [159]:
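def hurst_exponent(time_series):
    """
    Estimate the Hurst exponent of a time series from the scaling of its
    lagged differences (this is not the rescaled-range, R/S, method).

    Args:
    time_series (array-like): The time series data (list, numpy.ndarray or pandas.Series).

    Returns:
    hurst (float): The estimated Hurst exponent.
    """
    # Work on a plain NumPy array so that slicing is positional
    time_series = np.asarray(time_series, dtype=float)
    lags = range(2, len(time_series) // 2)

    # Standard deviation of the lag-differenced series, one value per lag
    tau = [np.std(time_series[lag:] - time_series[:-lag]) for lag in lags]

    # The standard deviation scales as lag**H, so the log-log slope estimates H
    hurst = np.polyfit(np.log(lags), np.log(tau), 1)[0]

    return hurst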
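
The function above fits a straight line to the log of the standard deviation of lagged differences against the log of the lag. A short sketch of why that slope can be read as the Hurst exponent, where the symbols $x_t$, $\tau$, $\sigma(\tau)$ and $c$ are notation introduced here for illustration:

```latex
\sigma(\tau) = \sqrt{\operatorname{Var}\left(x_{t+\tau} - x_t\right)} \propto \tau^{H}
\quad\Longrightarrow\quad
\log \sigma(\tau) = H \log \tau + c
```

Under this scaling, $H < 0.5$ suggests mean reversion, $H \approx 0.5$ a random walk, and $H > 0.5$ a trending series.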



In [160]:
# Create a Geometric Brownian Motion, Mean-Reverting and Trending Series
gbm = log(cumsum(randn(100000))+1000)
mr = log(randn(100000)+1000)
tr = log(cumsum(randn(100000)+1)+1000)
# Output the Hurst exponent for each of the above series
#print("Hurst(GBM) - random walk: %s" % hurst_exponent(gbm))
#print("Hurst(MR) - mean reverting: %s" % hurst_exponent(mr))
#print("Hurst(TR) - trending: %s" % hurst_exponent(tr))
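
The checks in this cell are commented out, and with 100000 points the function loops over roughly 50000 lags. A quicker sanity check is sketched below under a few assumptions: `hurst_capped` is a hypothetical variant that caps the lag range at `max_lag`, the series are shorter, and the random seed is fixed so the run is repeatable.

```python
import numpy as np


def hurst_capped(series, max_lag=100):
    """Hurst estimate from lagged-difference scaling, using lags 2..max_lag-1."""
    series = np.asarray(series, dtype=float)
    lags = np.arange(2, max_lag)
    # Standard deviation of the differences at each lag
    tau = [np.std(series[lag:] - series[:-lag]) for lag in lags]
    # Slope of the log-log fit is the Hurst estimate
    return np.polyfit(np.log(lags), np.log(tau), 1)[0]


rng = np.random.default_rng(42)
noise = rng.standard_normal(20000)

random_walk = np.cumsum(noise)  # expected to give H near 0.5
white_noise = noise             # strongly mean reverting, H near 0

h_rw = hurst_capped(random_walk)
h_wn = hurst_capped(white_noise)
print("random walk:", h_rw)
print("white noise:", h_wn)
```

Capping the lags keeps the fit on the short horizons where each standard deviation is estimated from many overlapping differences.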


In [161]:
df = pd.DataFrame(index=df1.index)
# Rows are paired by position; the 'time' column is taken from df1 only
df['time'] = df1['time']
df[symbol_1] = df1["close"]
df[symbol_2] = df2["close"]

df


Unnamed: 0,time,USDIDX,EURUSD
0,2019-12-18,96.925,1.07981
1,2019-12-19,96.960,1.08291
2,2019-12-20,97.260,1.08326
3,2019-12-23,97.240,1.08751
4,2019-12-24,97.241,1.09560
...,...,...,...
995,2023-10-23,105.389,1.06697
996,2023-10-24,106.064,1.05947
997,2023-10-25,106.369,1.05662
998,2023-10-26,106.434,1.05639
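
In the tables above, the USDIDX data starts on 2019-12-18 while the EURUSD data starts on 2020-04-24, so pairing rows by position puts closes from different days on the same row. One way to avoid that is to join on the timestamp. The sketch below uses two small stand-in frames (`usd` and `eur` are hypothetical names; the closes are copied from the last rows shown above and trimmed so the two calendars differ).

```python
import pandas as pd

# Stand-ins for df1 and df2 with deliberately different trading days
usd = pd.DataFrame({
    "time": pd.to_datetime(["2023-10-23", "2023-10-24", "2023-10-25"]),
    "close": [105.389, 106.064, 106.369],
})
eur = pd.DataFrame({
    "time": pd.to_datetime(["2023-10-24", "2023-10-25", "2023-10-26"]),
    "close": [1.05947, 1.05662, 1.05639],
})

# Inner join on the timestamp keeps only days present in both series
aligned = (
    pd.merge(
        usd[["time", "close"]].rename(columns={"close": "USDIDX"}),
        eur[["time", "close"]].rename(columns={"close": "EURUSD"}),
        on="time",
        how="inner",
    )
    .sort_values("time")
    .reset_index(drop=True)
)

print(aligned)
```

An inner join drops days that appear in only one series, which shortens the sample but guarantees that each row compares the same trading day.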


In [162]:
fig = px.line(data_frame=df, x='time', y=symbol_1)

fig.add_scatter(x=df['time'], y=df[f'{symbol_2}'])

fig.show()


The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result





In [163]:
px.scatter(df, x=symbol_1, y=symbol_2)

In [164]:

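# OLS of USDIDX on EURUSD without an intercept (regression through the origin)
res = sm.OLS(df[symbol_1].tolist(), df[symbol_2].tolist()).fit()
# The single fitted coefficient is used as the hedge ratio
beta_hr = res.params[0]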

beta_hr

88.08936796318773

In [165]:
# Residual series: USDIDX minus the hedge ratio times EURUSD
df['res'] = df[symbol_1] - beta_hr * df[symbol_2]

df

Unnamed: 0,time,USDIDX,EURUSD,res
0,2019-12-18,96.925,1.07981,1.805220
1,2019-12-19,96.960,1.08291,1.567143
2,2019-12-20,97.260,1.08326,1.836311
3,2019-12-23,97.240,1.08751,1.441931
4,2019-12-24,97.241,1.09560,0.730288
...,...,...,...,...
995,2023-10-23,105.389,1.06697,11.400287
996,2023-10-24,106.064,1.05947,12.735957
997,2023-10-25,106.369,1.05662,13.292012
998,2023-10-26,106.434,1.05639,13.377273


In [166]:
px.line(df, x='time', y='res')





In [167]:
cadf = ts.adfuller(df["res"])
pprint.pprint(cadf)

(-0.5190766048525833,
 0.8882397576064727,
 15,
 984,
 {'1%': -3.437013049776705,
  '10%': -2.5683363157264196,
  '5%': -2.864481711583566},
 1869.778430017299)
