In [5]:
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt

df = pd.read_csv('^NSEI.csv',index_col=0,parse_dates=True)
df.head()

Date,Open,High,Low,Close,Adj Close,Volume
2007-09-18,4494.100098,4551.799805,4481.549805,4546.200195,4546.200195,0.0
2007-09-19,4550.25,4739.0,4550.25,4732.350098,4732.350098,0.0
2007-09-20,4734.850098,4760.850098,4721.149902,4747.549805,4747.549805,0.0
2007-09-21,4752.950195,4855.700195,4733.700195,4837.549805,4837.549805,0.0
2007-09-24,4837.149902,4941.149902,4837.149902,4932.200195,4932.200195,0.0


In [10]:
from statsmodels.tsa.stattools import adfuller

In [18]:
df.dropna(inplace=True)

## Stationarity Test - ADF

**Null Hypothesis:** Adj Close values have a unit root (the series is non-stationary)
**Alternate Hypothesis:** Adj Close values are stationary

In [19]:
adf = adfuller(df['Adj Close'], maxlag=1)

In [23]:
adf

(0.7175798372421207,
 0.9901822541056554,
 0,
 3351,
 {'1%': -3.4323029442926765,
  '5%': -2.862402896767871,
  '10%': -2.567229336991118},
 40081.36097684344)
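The positional tuple above is easier to read with named fields. The sketch below is a minimal illustration: `ADFResult` is a hypothetical helper name, not part of statsmodels, and the field names follow the return values described in the `adfuller` documentation. The values are copied from the output above.

```python
from collections import namedtuple

# Hypothetical container; field names mirror the documented return values.
ADFResult = namedtuple(
    "ADFResult",
    ["adfstat", "pvalue", "usedlag", "nobs", "critical_values", "icbest"],
)

# Values copied from the adfuller output shown above.
raw = (
    0.7175798372421207,
    0.9901822541056554,
    0,
    3351,
    {"1%": -3.4323029442926765,
     "5%": -2.862402896767871,
     "10%": -2.567229336991118},
    40081.36097684344,
)

result = ADFResult(*raw)
print("test statistic:", result.adfstat)
print("p-value:", result.pvalue)
print("lags used:", result.usedlag)
print("observations:", result.nobs)
```

With the real result, `ADFResult(*adf)` should work the same way as long as `autolag` is left at its default, since the sixth element is only returned when `autolag` is not `None`.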

In [24]:
print('t-value',adf[0])

t-value 0.7175798372421207


The value printed here is the ADF test statistic, a t-type statistic on the coefficient of the lagged level. It is read on the left tail: the more negative the statistic, the stronger the evidence against the null hypothesis of a unit root. A statistic near zero or positive, such as 0.7176 here, gives no evidence against the null, so Adj Close cannot be treated as stationary. Unlike an ordinary two-sample t-test, this statistic does not follow Student's t distribution under the null, which is why `adfuller` reports its own critical values.
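To see how the sign and size of the statistic behave, here is a minimal NumPy-only sketch of the lag-0 Dickey-Fuller regression with a constant, applied to synthetic data. It is an illustration under stated assumptions, not a reproduction of `adfuller`, which also handles lag selection, p-values and critical values. `df_statistic` is a hypothetical helper name, and the series are simulated rather than taken from the NIFTY data.

```python
import numpy as np

def df_statistic(y):
    """Lag-0 Dickey-Fuller t-statistic with a constant term.

    Regresses (y_t - y_{t-1}) on [1, y_{t-1}] and returns the
    t-statistic of the coefficient on y_{t-1}.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                   # y_t - y_{t-1}
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant, lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)     # OLS coefficients
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
noise = rng.normal(size=500)              # stationary series
walk = np.cumsum(rng.normal(size=500))    # random walk (unit root)

stat_noise = df_statistic(noise)
stat_walk = df_statistic(walk)
print("white noise:", stat_noise)
print("random walk:", stat_walk)
```

The stationary series should produce a strongly negative statistic, far below the critical values, while the random walk should produce one much closer to zero.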

In [25]:
print('p-value',adf[1])

p-value 0.9901822541056554


A p-value is used in hypothesis testing to help you decide whether to reject the null hypothesis. **The p-value measures the evidence against the null hypothesis: the smaller the p-value, the stronger the evidence that you should reject it.**

P-values are expressed as decimals, although they may be easier to read as percentages: a p-value of 0.0254 is 2.54%. It means that, if the null hypothesis were true, a test statistic at least as extreme as the one observed would occur about 2.54% of the time. That is small, so such a result counts as evidence against the null hypothesis. A large p-value such as 0.9 (90%) means the observed statistic is entirely consistent with the null hypothesis; it does not prove the null hypothesis true. Here the p-value is 0.9902, far above common thresholds such as 0.05, so the null hypothesis of a unit root is not rejected.
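The decision rule based on the p-value can be written as a small helper. This is a sketch only: `adf_decision` is a hypothetical name, and the 0.05 threshold is a common convention assumed here, not something fixed by the test.

```python
def adf_decision(p_value, alpha=0.05):
    """Return the ADF test decision at significance level alpha.

    The null hypothesis is that the series has a unit root.
    """
    if p_value < alpha:
        return "reject H0: evidence that the series is stationary"
    return "fail to reject H0: no evidence against a unit root"

# p-value copied from the adfuller output above.
decision = adf_decision(0.9901822541056554)
print(decision)
```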

In [26]:
print('The number of lags used',adf[2])

The number of lags used 0


In [27]:
print('Number of observations',adf[3])

Number of observations 3351


In [28]:
print('Critical values',adf[4])

Critical values {'1%': -3.4323029442926765, '5%': -2.862402896767871, '10%': -2.567229336991118}


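**If the test statistic is lower (more negative) than the critical value, reject the null hypothesis; otherwise, fail to reject it.** Here 0.7176 is above all three critical values, so the unit-root null hypothesis is not rejected at the 1%, 5%, or 10% level.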
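The comparison against each critical value can be sketched as follows. For the ADF test, the unit-root null hypothesis is rejected at a given level only when the statistic is below (more negative than) that level's critical value. `compare_to_critical` is a hypothetical helper name, and the numbers are copied from the output above.

```python
def compare_to_critical(statistic, critical_values):
    """Map each significance level to True if the unit-root null is rejected."""
    return {level: statistic < value for level, value in critical_values.items()}

# Values copied from the adfuller output above.
statistic = 0.7175798372421207
critical_values = {
    "1%": -3.4323029442926765,
    "5%": -2.862402896767871,
    "10%": -2.567229336991118,
}

rejections = compare_to_critical(statistic, critical_values)
for level, rejected in rejections.items():
    verdict = "reject H0" if rejected else "fail to reject H0"
    print(f"{level}: {verdict}")
```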

## Understanding t-Tests and Critical Values

![tvalue.png](tvalue.png)

In [29]:
print('Maximized information criterion (returned when autolag is not None)',adf[5])

Maximized information criterion (returned when autolag is not None) 40081.36097684344
