In [17]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

# pmdarima is not installed by default (install with `pip install pmdarima`);
# import it only if available so the rest of the notebook still runs
try:
    import pmdarima
except ModuleNotFoundError:
    pmdarima = None

In [15]:
# Read the dataset and keep only the closing price, indexed by date
df_stock = pd.read_csv('PSEI(2014-2024).csv')
df_stock['Date'] = pd.to_datetime(df_stock['Date'])
df_stock = (
    df_stock.drop(columns=['Open', 'High', 'Low', 'Adj Close', 'Volume'])
    .dropna()                # drop days with a missing price
    .sort_values(by='Date')  # ensure chronological order
    .set_index('Date')
)
df_stock

| Date | Close |
|---|---|
| 2014-04-01 | 6514.720215 |
| 2014-04-02 | 6587.720215 |
| 2014-04-03 | 6587.080078 |
| 2014-04-04 | 6561.200195 |
| 2014-04-07 | 6614.399902 |
| ... | ... |
| 2024-03-21 | 6963.220215 |
| 2024-03-22 | 6881.970215 |
| 2024-03-25 | 6853.100098 |
| 2024-03-26 | 6898.169922 |
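The same frame can also be built in a single `read_csv` call by reading only the needed columns and parsing the dates on load. This is a sketch, not the notebook's own code: the helper name `load_close_prices` is hypothetical, and the tiny in-memory CSV reuses two closing prices from the table above with the unused columns left blank.

```python
import io
import pandas as pd

def load_close_prices(path_or_buffer) -> pd.DataFrame:
    """Hypothetical helper: load Date and Close only, drop gaps, sort by date."""
    return (
        pd.read_csv(
            path_or_buffer,
            usecols=['Date', 'Close'],  # skip Open/High/Low/Adj Close/Volume
            parse_dates=['Date'],       # convert Date to datetime64 on load
            index_col='Date',
        )
        .dropna()     # drop rows with a missing closing price
        .sort_index() # chronological order
    )

# In-memory stand-in for 'PSEI(2014-2024).csv': rows out of order, one missing Close
sample = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2014-04-02,,,,6587.720215,,\n"
    "2014-04-01,,,,6514.720215,,\n"
    "2014-04-03,,,,,,\n"
)
df_sample = load_close_prices(sample)
print(df_sample)
```

Called as `load_close_prices('PSEI(2014-2024).csv')`, it should produce the same single-column, date-indexed frame as the cell above.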


## Stationarity
A common way to make a series stationary is differencing: subtract the previous value from the current value. Differencing once may not be enough to make the series stationary, so we might need to repeat it.

The minimum number of differencing operations needed to make the series stationary is the order of differencing, which we pass to our ARIMA(p, d, q) model as the `d` parameter.
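A minimal illustration of first- and second-order differencing with pandas' `Series.diff()`. The series `close` holds hypothetical toy values, not PSEi prices; they were chosen so that one difference leaves a trend and a second difference removes it.

```python
import pandas as pd

# Hypothetical toy prices with an accelerating upward trend
close = pd.Series([100.0, 102.0, 105.0, 109.0, 114.0])

# First difference: y'_t = y_t - y_{t-1} (the first value becomes NaN, so drop it)
first_diff = close.diff().dropna()

# Second difference: difference the already-differenced series again
second_diff = close.diff().diff().dropna()

print(first_diff.tolist())   # [2.0, 3.0, 4.0, 5.0]
print(second_diff.tolist())  # [1.0, 1.0, 1.0]
```

The first difference still trends upward; only after the second difference is the series constant, so in this toy case `d` would be 2. Note that `close.diff(2)` is a lag-2 difference, not a second-order difference.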

### ADF Test
We'll use the Augmented Dickey-Fuller (ADF) test to check whether the closing-price series is stationary.

The null hypothesis of the ADF test is that the time series is non-stationary (it has a unit root). So, if the p-value of the test is below the significance level (0.05), we reject the null hypothesis and conclude that the time series is stationary.
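For reference, with `adfuller`'s default `regression='c'` (a constant and no trend term), the test fits the regression below and checks whether $\gamma = 0$. The number of lagged differences $p$ is selected by AIC under the default `autolag='AIC'`; this is the lag count (5) reported in the output further down.

$$
\Delta y_t = \alpha + \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad H_0:\ \gamma = 0 \ \text{(unit root)}, \qquad H_1:\ \gamma < 0
$$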

In our case, if the p-value is above 0.05, we fail to reject the null hypothesis, so we'll need to difference the series and find the order of differencing.

In [16]:
# Check if the closing-price series is stationary (pass the 1-D Close column explicitly)
result = adfuller(df_stock['Close'])
print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')
result

ADF Statistic: -2.781051502805296
p-value: 0.061031817544298185


(-2.781051502805296,
 0.061031817544298185,
 5,
 2430,
 {'1%': -3.4330439182185093,
  '5%': -2.862730143690387,
  '10%': -2.5674035621263696},
 27916.73108860164)
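`adfuller` returns a tuple of (ADF statistic, p-value, number of lags used, number of observations used, critical values, best information-criterion value when `autolag` is set). The sketch below unpacks the output above by name and compares the statistic with each critical value; the numbers are copied from that output, and the variable names are only illustrative.

```python
# Values copied from the adfuller output above
adf_stat, p_value, used_lag, n_obs, crit_values, ic_best = (
    -2.781051502805296,
    0.061031817544298185,
    5,
    2430,
    {'1%': -3.4330439182185093, '5%': -2.862730143690387, '10%': -2.5674035621263696},
    27916.73108860164,
)

alpha = 0.05
is_stationary = p_value < alpha
print(f"Stationary at the {alpha:.0%} level: {is_stationary}")

# The unit-root null is rejected at a level when the statistic is below its critical value
for level, crit in crit_values.items():
    print(f"{level:>4}: reject unit root = {adf_stat < crit}")
```

Here the statistic (about -2.78) lies between the 5% (-2.86) and 10% (-2.57) critical values, which matches the p-value of about 0.061: the unit-root hypothesis cannot be rejected at the 5% level, so the closing prices need to be differenced before fitting ARIMA.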