In [1]:
import numpy as np 
import pandas as pd
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/tesla-stock-data-updated-till-28jun2021/TSLA.csv


## Dickey Fuller Test
The Dickey-Fuller test fits an autoregressive model to the series; in its augmented form, the number of lags is chosen by optimizing an information criterion across multiple lag values. <br>
It is a statistical test for stationarity in a time series. It is a type of unit root test: it checks whether the time series has a unit root.
<br>
### Unit Root
A unit root is a feature of a time series that indicates a stochastic trend, which drives the series away from its mean value. The presence of a unit root makes the time series non-stationary.
<br>
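To make the idea concrete, here is a minimal sketch that simulates two AR(1) series, one with a unit root and one without. The helper name `simulate_ar1`, the coefficient name `rho`, and the seed are assumptions chosen for illustration; they are not part of the Tesla analysis.

```python
import numpy as np

def simulate_ar1(rho, n=1000, seed=0):
    """Simulate y_t = rho * y_{t-1} + eps_t with standard normal noise and y_0 = 0."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = rho * y[t - 1] + eps[t]
    return y

stationary = simulate_ar1(rho=0.5)   # |rho| < 1: no unit root, shocks die out
random_walk = simulate_ar1(rho=1.0)  # rho = 1: unit root, shocks accumulate

# Compare the mean of each half: a unit-root series tends to wander away from
# its starting level, while a stationary series keeps returning to its mean.
for name, y in [("rho=0.5", stationary), ("rho=1.0", random_walk)]:
    half = len(y) // 2
    print(name, "first-half mean:", y[:half].mean(), "second-half mean:", y[half:].mean())
```

With `rho=1.0` each value is simply the running sum of all past shocks, which is why the series has no fixed mean to return to.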
There are three main versions of the Dickey-Fuller test, with slightly different regression models:
1. **Dickey-Fuller Test**, assumes no intercept or trend in the data.
2. **Dickey-Fuller with intercept**, adds an intercept term (a constant value) to the model.
3. **Augmented Dickey-Fuller (ADF)**, an extension of Dickey-Fuller that includes lagged differences of the dependent variable to account for autocorrelation.
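
As a sketch, the three versions can be written as the regressions below. The symbols $\mu$ (constant), $\delta$ (coefficient on the lagged level) and $\varepsilon_{t}$ (error) follow this notebook's notation; the lag order $p$ and the coefficients $\gamma_{i}$ are names assumed here for illustration, and the ADF form is shown with an intercept, which is one common choice rather than the only one.

$$
\begin{aligned}
\text{(1)}\quad \Delta y_{t} &= \delta y_{t-1} + \varepsilon_{t} \\
\text{(2)}\quad \Delta y_{t} &= \mu + \delta y_{t-1} + \varepsilon_{t} \\
\text{(3)}\quad \Delta y_{t} &= \mu + \delta y_{t-1} + \sum_{i=1}^{p} \gamma_{i}\, \Delta y_{t-i} + \varepsilon_{t}
\end{aligned}
$$

In all three, the quantity being tested is the same coefficient $\delta$; the versions differ only in which extra terms are included on the right-hand side.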

<div style="font-size: 1.5em;">
Checking for stationarity is a very important step in ARIMA modelling. The first step is to determine the number of differences required to make the series stationary, because the model assumes that the differenced series it is fitted to is stationary.
</div>
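
The effect of differencing can be sketched on synthetic data. The example below assumes a pure random walk (one unit root) generated with an arbitrary seed; it is not the Tesla series, and real data may need a different number of differences.

```python
import numpy as np

rng = np.random.default_rng(42)
eps = rng.standard_normal(500)

# A random walk: each value is the previous value plus noise (one unit root).
y = np.cumsum(eps)

# First difference: dy[t] = y[t+1] - y[t], which recovers the noise terms.
dy = np.diff(y)

print("variance of level series:      ", y.var())
print("variance of differenced series:", dy.var())
```

For a pandas column, `Series.diff()` followed by `dropna()` gives the same first difference while keeping the index.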


$$
\LARGE
\Delta y_{t} = \mu + \delta y_{t-1} + \varepsilon_{t}
$$


<br> <br>
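where:<br>
$\Delta y_{t}$: First difference, $y_{t} - y_{t-1}$<br>
$\mu$: Constant<br>
$\delta$: Coefficient on the lagged value $y_{t-1}$<br>
$y_{t-1}$: Value of the series at a time lag of 1 <br>
$\varepsilon_{t}$: Error component <br>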

The test statistic is the estimated coefficient $\hat{\delta}$ divided by its standard error: <br>

$$
\LARGE
\begin{array}{c}
t_{\hat{\delta}} = \frac{\hat{\delta}}{SE(\hat{\delta})}
\end{array}
$$
<br>
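
The statistic can be computed by hand with ordinary least squares. This is a minimal sketch of the intercept version of the regression using only NumPy; the function name `dickey_fuller_tstat` and the synthetic series are assumptions for illustration, and no lagged differences are included, so it is the plain Dickey-Fuller statistic rather than the augmented one.

```python
import numpy as np

def dickey_fuller_tstat(y):
    """t-statistic of delta in the OLS fit of: dy_t = mu + delta * y_{t-1} + eps_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                     # dy_t = y_t - y_{t-1}
    y_lag = y[:-1]                                      # y_{t-1}
    X = np.column_stack([np.ones_like(y_lag), y_lag])   # columns for mu and delta
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)       # OLS estimates [mu, delta]
    resid = dy - X @ beta
    dof = len(dy) - X.shape[1]                          # residual degrees of freedom
    sigma2 = resid @ resid / dof                        # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)               # covariance of the estimates
    se_delta = np.sqrt(cov[1, 1])                       # standard error of delta
    return beta[1] / se_delta

rng = np.random.default_rng(0)
white_noise = rng.standard_normal(500)               # stationary series
random_walk = np.cumsum(rng.standard_normal(500))    # series with a unit root

print("white noise t-stat:", dickey_fuller_tstat(white_noise))
print("random walk t-stat:", dickey_fuller_tstat(random_walk))
```

Under the null hypothesis this ratio does not follow the usual Student t distribution, which is why it is compared against Dickey-Fuller critical values instead of a standard t-table.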



## Augmented Dickey Fuller Test
An extension of the standard Dickey-Fuller test to models more complex than AR(1). The ADF test applies to a larger and more complicated set of time series models.
<br>
The test is based on the following hypotheses: <br>
1. **Null Hypothesis $(H_{0})$:** The time series has a unit root and is non-stationary. **Autoregressive coefficient = 1**, or equivalently **$\delta=0$**.
2. **Alternate Hypothesis $(H_{1})$:** The time series has no unit root and is stationary. **Autoregressive coefficient < 1**, or equivalently **$\delta<0$**.
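
The link between the two ways of stating each hypothesis can be shown with a short derivation. The symbol $\rho$ for the autoregressive coefficient is an assumption introduced here; the remaining symbols follow the regression used earlier. Starting from an AR(1) model and subtracting $y_{t-1}$ from both sides:

$$
\begin{aligned}
y_{t} &= \mu + \rho\, y_{t-1} + \varepsilon_{t} \\
y_{t} - y_{t-1} &= \mu + (\rho - 1)\, y_{t-1} + \varepsilon_{t} \\
\Delta y_{t} &= \mu + \delta\, y_{t-1} + \varepsilon_{t}, \qquad \delta = \rho - 1
\end{aligned}
$$

So $\rho = 1$ corresponds to $\delta = 0$, and $\rho < 1$ corresponds to $\delta < 0$, which is why the test is one-sided.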

## Condition to reject H0 and accept H1
If the test statistic is less than the critical value, or if the p-value is less than the pre-specified significance level (e.g. 0.05), then the null hypothesis is rejected and the time series is considered stationary. <br>
If the test statistic is greater than the critical value, the null hypothesis cannot be rejected, and the time series is treated as non-stationary. <br>
Critical values are taken from the Dickey-Fuller table (similar to the t-table used for a t-test).
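
The decision rule can be sketched as a small helper. The function name `reject_unit_root` and its parameters are hypothetical, and the value -2.86 is the approximate large-sample 5% critical value for the intercept-only case, used here purely as an illustration.

```python
def reject_unit_root(test_stat, critical_value=None, p_value=None, alpha=0.05):
    """Return True if the unit-root null hypothesis H0 is rejected.

    H0 is rejected when the test statistic is below the critical value,
    or when the p-value is below the significance level alpha.
    """
    if critical_value is not None and test_stat < critical_value:
        return True
    if p_value is not None and p_value < alpha:
        return True
    return False

# Illustrative numbers only (not results from the Tesla data).
print(reject_unit_root(-3.5, critical_value=-2.86))  # statistic below the critical value
print(reject_unit_root(-1.2, critical_value=-2.86))  # statistic above the critical value
print(reject_unit_root(-1.2, p_value=0.67))          # p-value above alpha
```

In practice the inputs would come from a library routine; for instance, `statsmodels.tsa.stattools.adfuller` returns the test statistic, the p-value, and a dictionary of critical values among its outputs.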

## ADF Test using Python

In [2]:
import statsmodels.api as sm
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('/kaggle/input/tesla-stock-data-updated-till-28jun2021/TSLA.csv')

In [3]:
df.head()

| | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2010-06-29 | 3.8 | 5.0 | 3.508 | 4.778 | 4.778 | 93831500 |
| 1 | 2010-06-30 | 5.158 | 6.084 | 4.66 | 4.766 | 4.766 | 85935500 |
| 2 | 2010-07-01 | 5.0 | 5.184 | 4.054 | 4.392 | 4.392 | 41094000 |
| 3 | 2010-07-02 | 4.6 | 4.62 | 3.742 | 3.84 | 3.84 | 25699000 |
| 4 | 2010-07-06 | 4.0 | 4.0 | 3.166 | 3.222 | 3.222 | 34334500 |


In [4]:
df.shape

(2956, 7)