<a href="https://colab.research.google.com/github/Rattapon-Insa/time-series/blob/main/ARIMA_on_NIFTY_50_Stock_Market_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
! mkdir ~/.kaggle

In [None]:
! cp kaggle.json ~/.kaggle/

In [None]:
! chmod 600 ~/.kaggle/kaggle.json

In [None]:
!kaggle datasets download -d rohanrao/nifty50-stock-market-data

In [None]:
! unzip nifty50-stock-market-data.zip

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
df = pd.read_csv('HCLTECH.csv')

In [None]:
df.head(10)

In [None]:
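df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
# Keep only the previous close for 2013, then drop missing values
# (the original df.dropna() discarded its result, so it had no effect)
df = df.loc['2013-01-01':'2013-12-02', ['Prev Close']].dropna()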


## Visualizing data
To fit an ARIMA model to time series data, it is important to first check whether the data is stationary. If the series is not stationary, we must transform it until it is. I will also discuss the tests used to check for stationarity.

Let us visualize first:

In [None]:
plt.figure(figsize= (6,4))
plt.plot(df)

In the plot above, the series does not appear to be stationary. The following rules of thumb help identify whether a series is stationary.

Rolling statistics test:

CHECK_1: Verify that the rolling mean is constant over time.

CHECK_2: Verify that the rolling standard deviation is constant over time.

CHECK_3: Verify that the autocovariance does not depend on time.


In [None]:
def rolling_plot(df):
  rolling_mean = df.rolling(window = 12).mean()
  rolling_sd = df.rolling(window = 12).std()
  plt.figure(figsize= (10,10))
  plt.plot(df, color = 'red', label = 'Raw')
  plt.plot(rolling_mean, color = 'blue', label = 'mean')
  plt.plot(rolling_sd, color = 'black', label = 'Standard deviation')
  plt.legend()

In [None]:
rolling_plot(df)

## Augmented Dickey-Fuller Test:

This is a hypothesis test and is simple to apply: check whether the p-value for your dataset is less than or equal to 0.05.

H0: The series has a unit root, i.e. it is not stationary.

H1: The series is stationary.

If p ≤ 0.05, we reject H0 and treat the series as stationary. If p > 0.05, we cannot reject H0, so the series should be transformed (for example by differencing) and tested again.

In [None]:
from statsmodels.tsa.stattools import adfuller

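def adfuller_test(df):
  # Run the ADF test on the first column and print each result next to its name
  data_test = df.iloc[:,0].values
  ad1 = adfuller(data_test,autolag = 'AIC')
  names = ['adf', 'pvalue', 'used lags', 'nobs', 'critical values']
  # adfuller also returns the best information criterion as a sixth value; zip stops before it
  for x,y in zip(ad1,names):
    print(y, " is ", x)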

In [None]:
adfuller_test(df)

The p-value is higher than 0.05, so we cannot reject the null hypothesis of non-stationarity. The rolling mean is also clearly not constant. A transformation will be needed.

In [None]:
# Log transform to stabilize the variance (df1 is not used in the cells below)
df1 = np.log(df)

In [None]:
df_diff = df - df.shift(1)
df_diff = df_diff.dropna()

In [None]:
df_diff2= df_diff - df_diff.shift(1)
df_diff2 = df_diff2.dropna()

In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

#Original Data
# Set the size on the subplots call; a separate plt.figure() only creates an extra empty figure
fig, axes = plt.subplots(3,2, sharex= True, figsize = (12,9))
axes[0,0].plot(df.values); axes[0,0].set_title('Original')
plot_acf(df.values,ax = axes[0,1])

#1st differencing
axes[1,0].plot(df_diff.values); axes[1,0].set_title('1st differencing')
plot_acf(df_diff.values,ax = axes[1,1])

#2nd differencing
axes[2,0].plot(df_diff2.values); axes[2,0].set_title('2nd differencing')
plot_acf(df_diff2.values,ax = axes[2,1])


plt.show()

In [None]:
plt.rcParams.update({'figure.figsize':(9,3), 'figure.dpi': 120})

fig, axes = plt.subplots(1,2, sharex= True)
axes[0].plot(df_diff.values); axes[0].set_title('1st differencing')
axes[1].set(xlim=(-1,30))
plot_acf(df_diff.values,ax = axes[1])

plt.show()

In [None]:
plt.rcParams.update({'figure.figsize':(9,3), 'figure.dpi': 120})

fig, axes = plt.subplots(1,2, sharex= True)
axes[0].plot(df_diff.values); axes[0].set_title('1st differencing')
axes[1].set(xlim=(-1,30))
plot_pacf(df_diff.values,ax = axes[1])

plt.show()

In [None]:
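# The old statsmodels.tsa.arima_model.ARIMA is no longer usable in recent statsmodels releases
from statsmodels.tsa.arima.model import ARIMA

## 1,1,0 ARIMA model

model = ARIMA(df.values, order = (1,1,0))
model_fit = model.fit()  # the newer fit() has no disp argument
print(model_fit.summary())

In [None]:
# plot residual errors
# Skip the first residual: with d = 1 it is roughly the first observation itself, not a model error

residuals = pd.DataFrame(model_fit.resid[1:])
fig, ax = plt.subplots(1,2)
residuals.plot(title = 'Residual', ax = ax[0])
residuals.plot(kind = 'kde', title = 'Density', ax = ax[1])

plt.show()

In [None]:
# Actual vs Fitted
# One-step-ahead in-sample predictions, starting at index 1 because the first value has no usable prediction

fitted = model_fit.predict(start = 1, dynamic = False)
plt.plot(df.values[1:], label = 'Actual')
plt.plot(fitted, label = 'Fitted')
plt.legend()
plt.show()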