In [8]:
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import statsmodels


import warnings
warnings.filterwarnings('ignore')

ticker = "^NSEI"
data = yf.download(ticker, start="2018-01-01", end="2024-01-01")



In [9]:
#ADF Test
# Threshold p-value is assumed to be 0.05 (sometimes also assumed to be 0.01)

from statsmodels.tsa.stattools import adfuller

adftest = adfuller(data['Close'],autolag='AIC')
print("ADF Statistic:", adftest[0])
print("p-value:", adftest[1])
print("Number of lags:", adftest[2])
print("Number of observations:", adftest[3])
print("Critical Values:", adftest[4])

ADF Statistic: 0.13285143289767753
p-value: 0.9682337504483467
Number of lags: 7
Number of observations: 1469
Critical Values: {'1%': -3.4348093353507494, '5%': -2.863509503599295, '10%': -2.5678185447142}


Since the p-value (0.968) is greater than the assumed threshold, the null hypothesis of a unit root cannot be rejected, so the Close series is not stationary.
If the p-value were less than the threshold, we could reject the null hypothesis and conclude that the series is stationary. Equivalently, the more negative the ADF statistic, the stronger the evidence against the null. The null is rejected at a given level when the statistic falls below the corresponding critical value. Here 0.133 is well above even the 5% critical value of about -2.86.

In [10]:
#KPSS Test

from statsmodels.tsa.stattools import kpss

kpsstest = kpss(data['Close'], regression='c', nlags="auto")
print("KPSS Statistic:", kpsstest[0])
print("p-value:", kpsstest[1])
print("Number of lags:", kpsstest[2])
print("Critical Values:", kpsstest[3])

KPSS Statistic: 5.1955078334484135
p-value: 0.01
Number of lags: 25
Critical Values: {'10%': 0.347, '5%': 0.463, '2.5%': 0.574, '1%': 0.739}


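In the KPSS test the hypotheses are reversed: the null hypothesis is that the series is stationary. Since the p-value is less than the assumed threshold, the null hypothesis is rejected and the series is not stationary, which agrees with the ADF result. Equivalently, the null is rejected when the test statistic is greater than the critical value; here 5.196 is far above even the 1% critical value of 0.739. Note that statsmodels only interpolates KPSS p-values between 0.01 and 0.10, so the reported 0.01 is a floor and the actual p-value is smaller (the warning that says so is hidden by warnings.filterwarnings('ignore')).
If the p-value were greater than the threshold, we could not reject the null hypothesis and would treat the series as stationary.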

In [11]:
from statsmodels.stats.diagnostic import acorr_ljungbox

lb_test = acorr_ljungbox(data['Close'], lags=7)
print(lb_test)

        lb_stat  lb_pvalue
1   1471.227105        0.0
2   2934.532379        0.0
3   4390.078653        0.0
4   5838.280229        0.0
5   7279.103192        0.0
6   8712.206832        0.0
7  10138.104821        0.0


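The Ljung-Box null hypothesis is that there is no autocorrelation up to the tested lag. Since the p-value is effectively 0 (well below 0.05) at every lag from 1 to 7, we reject the null hypothesis and conclude that the Close prices are not independently distributed, i.e. they exhibit serial correlation.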

In [12]:
# Granger Causality Test

from statsmodels.tsa.stattools import grangercausalitytests

# Tests whether the 2nd column ('Open') Granger-causes the 1st column ('Close')
grangercausalitytests(data[['Close','Open']], maxlag=[3])


Granger Causality
number of lags (no zero) 3
ssr based F test:         F=5.3162  , p=0.0012  , df_denom=1467, df_num=3
ssr based chi2 test:   chi2=16.0246 , p=0.0011  , df=3
likelihood ratio test: chi2=15.9381 , p=0.0012  , df=3
parameter F test:         F=5.3162  , p=0.0012  , df_denom=1467, df_num=3


{3: ({'ssr_ftest': (5.316161542771404, 0.00120698296968318, 1467.0, 3),
   'ssr_chi2test': (16.024585100296626, 0.0011208989187401236, 3),
   'lrtest': (15.938105702960456, 0.0011676028882839993, 3),
   'params_ftest': (5.31616154277193, 0.0012069829696823308, 1467.0, 3.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x18c180802c0>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x18c19d52e10>,
   array([[0., 0., 0., 1., 0., 0., 0.],
          [0., 0., 0., 0., 1., 0., 0.],
          [0., 0., 0., 0., 0., 1., 0.]])])}

The F test statistic is 5.3162 with a corresponding p-value of 0.0012. Since p-value < alpha (= 0.05), we reject the null hypothesis that Open does not Granger-cause Close. grangercausalitytests checks whether the series in the second column Granger-causes the series in the first column, so the result says that past values of data['Open'] help predict data['Close']. It says nothing about the reverse direction; testing that requires swapping the column order.

In [13]:
# Durbin Watson Test

from statsmodels.stats.stattools import durbin_watson

dw_statistic = durbin_watson(data['Close'])
dw_statistic

9.76394909527016e-05

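The Durbin-Watson statistic lies between 0 and 4: 2 indicates no first-order autocorrelation, values towards 0 indicate positive autocorrelation, and values towards 4 indicate negative autocorrelation. Since the statistic here (about 9.8e-05) is essentially 0, the Close series is strongly positively autocorrelated. Note that the test is designed for regression residuals. Applied directly to price levels it is bound to be tiny for a series like this, because the squared day-to-day changes in the numerator are small compared with the squared price levels in the denominator.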