
# time-series-project

1. Install the Yahoo Finance library (`yfinance`)
2. Retrieve stock price data
3. Choose any ticker symbol and use either the “Open” or “Close” price for this project<sup>[1](#myfootnote1)</sup> <sup>[2](#myfootnote2)</sup>
4. Analyse time series<sup>[3](#myfootnote3)</sup>
5. Check for seasonality
6. Train time series model
7. Find the (P, D, Q) hyperparameter values that give the smallest RMSE when building an ARIMA model<sup>[4](#myfootnote4)</sup>
8. Predict the price for the next time period (possibly the next two timestamps)
9. Save timestamp, prediction and actual price to a database

❗ BONUS: Deploy your model to Streamlit.  
a. The app should pull fresh data from the web through yfinance  
b. The user should be able to define a forecasting window (how far ahead they want the forecast)  

❗❗ Extra Super Hard BONUS!! Allow the user to choose the ticker symbol to look up and get predictions for.   
HINT: you'll need to refit the model. This may be hard without determining P, D, and Q again.

<a name="myfootnote1">1</a>: The program should only run between 8am and 4pm ET  
<a name="myfootnote2">2</a>: The data period and interval should be 1 year and 1 hour respectively  
<a name="myfootnote3">3</a>: Choose the last 50 records as your test data for the purpose of saving time  
<a name="myfootnote4">4</a>: To save time, restrict the hyperparameter search over (P, D, Q) to these ranges: 1 ≦ P ≦ 3, 0 ≦ D ≦ 2, 0 ≦ Q ≦ 2
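
Footnote 1 restricts when the program may run. The check could be sketched as below; this is only an illustration, and the helper name `within_run_window` is an assumption, not part of the project. It uses the standard-library `zoneinfo` module and treats both ends of the 8am to 4pm ET window as inclusive. It does not account for weekends or market holidays.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")
WINDOW_START = time(8, 0)   # 8am ET, from footnote 1
WINDOW_END = time(16, 0)    # 4pm ET, from footnote 1


def within_run_window(now=None):
    """Return True if `now` falls between 8am and 4pm Eastern Time (inclusive).

    `within_run_window` is a hypothetical helper name. `now` should be a
    timezone-aware datetime; it defaults to the current time.
    """
    if now is None:
        now = datetime.now(tz=EASTERN)
    local = now.astimezone(EASTERN)  # convert any aware datetime to ET
    return WINDOW_START <= local.time() <= WINDOW_END


if __name__ == "__main__":
    if within_run_window():
        print("Inside the 8am-4pm ET window: run the pipeline.")
    else:
        print("Outside the 8am-4pm ET window: skip this run.")
```

A scheduler or the notebook's first cell could call this and exit early when it returns `False`.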


In [1]:
# Base Imports

import matplotlib.pyplot as plt

%matplotlib inline
%config InlineBackend.figure_formats='retina'

plt.style.use("ggplot")

import numpy as np
import pandas as pd

import warnings

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tools.sm_exceptions import InterpolationWarning

warnings.simplefilter("ignore", InterpolationWarning)

from sklearn.preprocessing import power_transform
from sklearn import set_config

set_config(transform_output="pandas")


# The ADF test checks for a unit root in the series, which helps determine whether the series is stationary. Its hypotheses are:
# Null Hypothesis: The series has a unit root (non-stationary).
# Alternate Hypothesis: The series has no unit root.
# If the null hypothesis cannot be rejected, the test has found no evidence against a unit root, so the series is treated as non-stationary.
def adf_test(timeseries, autolag="AIC"):
    from statsmodels.tsa.stattools import adfuller

    # "AIC" (Akaike Information Criterion):
    # This option selects the lag length that minimizes the AIC.
    # The AIC is a measure that balances the goodness of fit of a statistical model with its complexity.
    # Lower AIC values generally indicate a better model.
    # "BIC" (Bayesian Information Criterion):
    # Similar to AIC, this option chooses the lag length that minimizes the BIC.
    # The BIC also balances model fit and complexity, but it tends to penalize complex models more heavily than the AIC.
    # "t-stat" (t-statistic):
    # This method starts with the maximum lag length (maxlag) and iteratively reduces it.
    # It stops when the t-statistic of the last lag becomes statistically significant (typically at a 5% significance level).
    # This approach focuses on the statistical significance of the individual lags.
    dftest = adfuller(timeseries.dropna(), autolag=autolag)
    dfoutput = pd.Series(
        dftest[0:4],
        index=[
            "Test Statistic",
            "p-value",
            "#Lags Used",
            "Number of Observations Used",
        ],
    )
    for key, value in dftest[4].items():
        dfoutput["Critical Value (%s)" % key] = value

    dfoutput[autolag] = dftest[5]

    dfoutput.name = "Results of Dickey-Fuller Test"

    if (dftest[1] < 0.05) and (dftest[4]["5%"] > dftest[0]):
        print(
            "\u001b[32mConclusion: The data is stationary (Reject the null hypothesis)\u001b[0m"
        )
    else:
        print(
            "\x1b[31mConclusion: The data is not stationary (Fail to reject the null hypothesis)\x1b[0m"
        )
    return dfoutput


# KPSS is another stationarity test. Its null and alternate hypotheses are the reverse of the ADF test's.
# Null Hypothesis: The process is stationary around a constant (regression="c") or around a linear trend (regression="ct").
# Alternate Hypothesis: The series has a unit root (series is not stationary).
def kpss_test(timeseries, regression="c"):
    from statsmodels.tsa.stattools import kpss

    kpsstest = kpss(timeseries.dropna(), regression=regression, nlags="auto")
    kpssoutput = pd.Series(
        kpsstest[0:3], index=["Test Statistic", "p-value", "Lags Used"]
    )
    for key, value in kpsstest[3].items():
        kpssoutput["Critical Value (%s)" % key] = value

    kpssoutput.name = "Results of KPSS Test"

    # KPSS rejects the null when the test statistic exceeds the critical value
    if (kpsstest[1] < 0.05) and (kpsstest[0] > kpsstest[3]["5%"]):
        print(
            "\x1b[31mConclusion: The data is not stationary (Reject the null hypothesis)\x1b[0m"
        )

    else:
        print(
            "\u001b[32mConclusion: The data is stationary (Fail to reject the null hypothesis)\u001b[0m"
        )

    return kpssoutput


def add_stl_plot(fig, res, legend):
    """Add 3 plots from a second STL fit"""
    axs = fig.get_axes()
    comps = ["trend", "seasonal", "resid"]
    for ax, comp in zip(axs[1:], comps):
        series = getattr(res, comp)
        if comp == "resid":
            ax.plot(series, marker="o", linestyle="none")
        else:
            ax.plot(series)
            if comp == "trend":
                ax.legend(legend, frameon=False)
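
Because the ADF and KPSS tests have opposite null hypotheses, their verdicts are often read together. The sketch below shows one commonly used four-way reading. The function name `interpret_stationarity` and the returned labels are illustrative assumptions, not part of statsmodels, and the inputs are simply the two p-values.

```python
def interpret_stationarity(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Combine ADF and KPSS p-values into a single verdict.

    Hypothetical helper for illustration. ADF's null is "unit root";
    KPSS's null is "stationary", so rejecting each means opposite things.
    """
    adf_rejects = adf_pvalue < alpha    # evidence against a unit root
    kpss_rejects = kpss_pvalue < alpha  # evidence against stationarity

    if adf_rejects and not kpss_rejects:
        return "stationary"
    if not adf_rejects and kpss_rejects:
        return "non-stationary"
    if not adf_rejects and not kpss_rejects:
        # ADF cannot rule out a unit root, KPSS cannot rule out stationarity
        return "trend-stationary (consider detrending)"
    # Both tests reject their null hypotheses
    return "difference-stationary (consider differencing)"


# Example p-values chosen only to exercise each branch
for adf_p, kpss_p in [(0.01, 0.10), (0.40, 0.01), (0.40, 0.10), (0.01, 0.01)]:
    print(adf_p, kpss_p, "->", interpret_stationarity(adf_p, kpss_p))
```

The p-values in the loop are made up for the demonstration; in the notebook they would come from the `"p-value"` entries returned by `adf_test` and `kpss_test`.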


In [2]:
# # Install yahoo finance library (yfinance)
# # %conda install yfinance

# import yfinance as yf
# import os

# # Define the cache directory path
# cache_dir = os.path.expanduser("~/.yfinance_cache")

# # Create the directory if it doesn't exist
# if not os.path.exists(cache_dir):
#     os.makedirs(cache_dir)

# # Set the cache location
# yf.set_tz_cache_location(cache_dir)

# # # Now you can use yfinance functions
# # dat = yf.Ticker("MSFT")
# # dat.info
# # dat.calendar
# # dat.analyst_price_targets
# # dat.quarterly_income_stmt
# # dat.history(period='1mo')
# # dat.option_chain(dat.options[0]).calls

In [12]:
import yfinance as yf
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
import pytz

def generate_trading_hours(start_date, end_date, frequency,
                           start_time="09:30", end_time="16:00",
                           holidays=USFederalHolidayCalendar(),
                           timezone='America/New_York'):
    """
    Generates a DatetimeIndex of trading hours for non-holiday business days,
    with a specified frequency and custom trading hours.

    Args:
        start_date (str or datetime): Start date for the DatetimeIndex.
        end_date (str or datetime): End date for the DatetimeIndex.
        frequency (str): Pandas frequency string (e.g., '5min', '30min', 'h').
        start_time (str, optional): Start time of the trading day (HH:MM).
            Defaults to '09:30'.
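        end_time (str, optional): End time of the trading day (HH:MM).
            Defaults to '16:00'.
        holidays (HolidayCalendar, optional): A HolidayCalendar instance
            defining the holidays. Defaults to US federal holidays, which only
            approximate the exchange calendar (for example, NYSE closes on
            Good Friday, which is not a federal holiday).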
        timezone (str, optional): Timezone for the DatetimeIndex.
            Defaults to 'America/New_York'.

    Returns:
        DatetimeIndex: A DatetimeIndex with the trading hours.
    """
    # 1. Define CustomBusinessDay with holidays
    trading_days = CustomBusinessDay(calendar=holidays)

    # 2. Generate business days
    dates = pd.date_range(start=start_date, end=end_date, freq=trading_days)

    # 3. Create time ranges for each day, localized to the specified timezone
    all_times = []
    for day in dates:
        start = pd.Timestamp(day.strftime('%Y-%m-%d') + ' ' + start_time, tz=timezone)
        end = pd.Timestamp(day.strftime('%Y-%m-%d') + ' ' + end_time, tz=timezone)
        day_times = pd.date_range(start=start, end=end, freq=frequency)  # start and end are already tz-aware
        all_times.extend(day_times)

    return pd.DatetimeIndex(all_times)


if __name__ == '__main__':
    # 1. Define the ticker and date range
    ticker = 'AAPL'  # Example: Apple Inc.
    end_date = pd.to_datetime('2025-04-03')
    start_date = (end_date - pd.DateOffset(years=1))
    end_date = end_date.strftime('%Y-%m-%d')

    # 2. Download data from yfinance (with no interval given, this returns daily bars)
    data = yf.download(ticker, start=start_date, end=end_date)
    data = data.tz_localize('America/New_York')  # The daily index is tz-naive; localize it

    # 3. Generate the 1-hour trading hour index, with timezone
    hourly_trading_hours = generate_trading_hours(start_date, end_date, 'h', timezone='America/New_York')  # Use 'h'

    # 4. Resample in two steps: Daily first, then hourly
    # 4.1 Resample to daily, forward fill to get the last price of the day
    if data.index.tz is not None:
        daily_data = data.tz_convert('America/New_York').resample('D').ffill()
    else:
        daily_data = data.resample('D').ffill()
        daily_data.index = daily_data.index.tz_localize('America/New_York')

    # 4.2 Create a daily DatetimeIndex with the same date range as hourly_trading_hours
    daily_index_for_reindex = pd.to_datetime(hourly_trading_hours.date).unique()
    daily_index_for_reindex = daily_index_for_reindex.tz_localize('America/New_York') #make sure timezone is correct

    # 4.3 Reindex daily_data to the daily index
    intermediate_resampled_data = daily_data.reindex(daily_index_for_reindex)

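    # 4.4 Reindex to the hourly index. reindex() matches labels exactly, and the
    #     daily rows are stamped at midnight, so no hourly label finds a match:
    #     every row is NaN and ffill() has nothing to carry forward.
    resampled_data = intermediate_resampled_data.reindex(hourly_trading_hours).ffill()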

    # 5. Print the resampled data
    print("Resampled Data (first 20 rows):")
    print(resampled_data.head(20))
    print("Resampled Data Index (first 20 values):")
    print(resampled_data.index[:20])

    # Print for debugging
    print("Yfinance Data (last 5 rows):\n", data.tail().to_string())
    print("Yfinance Data Index (last 5 values):", data.index[-5:])
    print("Daily data (last 5 rows):\n", daily_data.tail().to_string())
    print("Daily data index (last 5 values):", daily_data.index[-5:])
    print("Intermediate Resampled Data (last 5 rows):\n", intermediate_resampled_data.tail().to_string())
    print("Intermediate Resampled Data Index (last 5 values):\n", intermediate_resampled_data.index[-5:])
    print("Trading Hours Index (last 5 values):", hourly_trading_hours[-5:])
    print("Daily index for reindex (last 5 values)", daily_index_for_reindex[-5:])


[*********************100%***********************]  1 of 1 completed

Resampled Data (first 20 rows):
Price                     Close High  Low Open Volume
Ticker                     AAPL AAPL AAPL AAPL   AAPL
2024-04-03 09:30:00-04:00   NaN  NaN  NaN  NaN    NaN
2024-04-03 10:30:00-04:00   NaN  NaN  NaN  NaN    NaN
2024-04-03 11:30:00-04:00   NaN  NaN  NaN  NaN    NaN
2024-04-03 12:30:00-04:00   NaN  NaN  NaN  NaN    NaN
2024-04-03 13:30:00-04:00   NaN  NaN  NaN  NaN    NaN
2024-04-03 14:30:00-04:00   NaN  NaN  NaN  NaN    NaN
2024-04-03 15:30:00-04:00   NaN  NaN  NaN  NaN    NaN
2024-04-04 09:30:00-04:00   NaN  NaN  NaN  NaN    NaN
2024-04-04 10:30:00-04:00   NaN  NaN  NaN  NaN    NaN
2024-04-04 11:30:00-04:00   NaN  NaN  NaN  NaN    NaN
2024-04-04 12:30:00-04:00   NaN  NaN  NaN  NaN    NaN
2024-04-04 13:30:00-04:00   NaN  NaN  NaN  NaN    NaN
2024-04-04 14:30:00-04:00   NaN  NaN  NaN  NaN    NaN
2024-04-04 15:30:00-04:00   NaN  NaN  NaN  NaN    NaN
2024-04-05 09:30:00-04:00   NaN  NaN  NaN  NaN    NaN
2024-04-05 10:30:00-04:00   NaN  NaN  NaN  NaN    




In [13]:
data

Price,Close,High,Low,Open,Volume
Ticker,AAPL,AAPL,AAPL,AAPL,AAPL
Date,,,,,
2024-04-03 00:00:00-04:00,168.852707,169.877865,167.787743,167.996748,47691700
2024-04-04 00:00:00-04:00,168.026611,171.112033,168.026611,169.489689,53704400
2024-04-05 00:00:00-04:00,168.783035,169.589226,168.155991,168.792983,42055200
2024-04-08 00:00:00-04:00,167.658356,168.404831,167.449351,168.235632,37425500
2024-04-09 00:00:00-04:00,168.872604,169.280681,167.558816,167.907162,42451200
...,...,...,...,...,...
2025-03-27 00:00:00-04:00,223.850006,224.990005,220.559998,221.389999,37094800
2025-03-28 00:00:00-04:00,217.899994,223.809998,217.679993,221.669998,39818600
2025-03-31 00:00:00-04:00,222.130005,225.619995,216.229996,217.009995,65299300
2025-04-01 00:00:00-04:00,223.190002,223.679993,218.899994,219.809998,36412700


In [5]:
daily_data

Price,Close,High,Low,Open,Volume
Ticker,AAPL,AAPL,AAPL,AAPL,AAPL
Date,,,,,
2024-04-03 00:00:00-04:00,168.852707,169.877865,167.787743,167.996748,47691700
2024-04-04 00:00:00-04:00,168.026611,171.112033,168.026611,169.489689,53704400
2024-04-05 00:00:00-04:00,168.783035,169.589226,168.155991,168.792983,42055200
2024-04-06 00:00:00-04:00,168.783035,169.589226,168.155991,168.792983,42055200
2024-04-07 00:00:00-04:00,168.783035,169.589226,168.155991,168.792983,42055200
...,...,...,...,...,...
2025-03-29 00:00:00-04:00,217.899994,223.809998,217.679993,221.669998,39818600
2025-03-30 00:00:00-04:00,217.899994,223.809998,217.679993,221.669998,39818600
2025-03-31 00:00:00-04:00,222.130005,225.619995,216.229996,217.009995,65299300
2025-04-01 00:00:00-04:00,223.190002,223.679993,218.899994,219.809998,36412700


In [6]:
intermediate_resampled_data

Price,Close,High,Low,Open,Volume
Ticker,AAPL,AAPL,AAPL,AAPL,AAPL
2024-04-03 00:00:00-04:00,168.852707,169.877865,167.787743,167.996748,47691700.0
2024-04-04 00:00:00-04:00,168.026611,171.112033,168.026611,169.489689,53704400.0
2024-04-05 00:00:00-04:00,168.783035,169.589226,168.155991,168.792983,42055200.0
2024-04-08 00:00:00-04:00,167.658356,168.404831,167.449351,168.235632,37425500.0
2024-04-09 00:00:00-04:00,168.872604,169.280681,167.558816,167.907162,42451200.0
...,...,...,...,...,...
2025-03-28 00:00:00-04:00,217.899994,223.809998,217.679993,221.669998,39818600.0
2025-03-31 00:00:00-04:00,222.130005,225.619995,216.229996,217.009995,65299300.0
2025-04-01 00:00:00-04:00,223.190002,223.679993,218.899994,219.809998,36412700.0
2025-04-02 00:00:00-04:00,223.889999,225.190002,221.020004,221.320007,35905900.0


In [7]:
resampled_data

Price,Close,High,Low,Open,Volume
Ticker,AAPL,AAPL,AAPL,AAPL,AAPL
2024-04-03 09:00:00-04:00,,,,,
2024-04-03 10:00:00-04:00,,,,,
2024-04-03 11:00:00-04:00,,,,,
2024-04-03 12:00:00-04:00,,,,,
2024-04-03 13:00:00-04:00,,,,,
...,...,...,...,...,...
2025-04-03 11:00:00-04:00,,,,,
2025-04-03 12:00:00-04:00,,,,,
2025-04-03 13:00:00-04:00,,,,,
2025-04-03 14:00:00-04:00,,,,,
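
The all-NaN hourly frame above follows from how `reindex` works: it matches labels exactly, and a daily row stamped at midnight never equals a 09:30 or 10:30 label. The small sketch below reproduces the effect on made-up numbers and contrasts it with `reindex(..., method="ffill")`, which fills from the source index instead. The prices and variable names are illustrative only.

```python
import pandas as pd

tz = "America/New_York"

# Two daily rows stamped at midnight, like the localized yfinance frame
# (prices are made-up illustration values).
daily = pd.Series(
    [100.0, 101.0],
    index=pd.DatetimeIndex(["2024-04-03", "2024-04-04"], tz=tz),
    name="Close",
)

# Hourly trading-session labels covering the same two days
hourly = pd.DatetimeIndex(
    ["2024-04-03 09:30", "2024-04-03 10:30", "2024-04-04 09:30"], tz=tz
)

# Exact-label matching: no hourly label equals a midnight label, so the
# result is all NaN and the trailing ffill() has nothing to propagate.
exact = daily.reindex(hourly).ffill()

# method="ffill" looks back into the source index: each hourly label takes
# the most recent daily row at or before it.
filled = daily.reindex(hourly, method="ffill")

print(exact)
print(filled)
```

Note that filling hourly rows from a daily close assigns a value that is only known at the end of the session to earlier hours of the same day. Requesting hourly bars directly (the `interval="1h"` argument that appears commented out in the download call further down) avoids both problems.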


# Choose stock

In [8]:
# Retrieve stock price data
# Choose any ticker symbol and you can choose “Open” or “Close” price for this project

# Candidate tickers to build models for eventually:
# tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'JNJ', 'XOM', 'V', 'WMT', 'NVDA', 'PG',
#             'LLY', 'CVX', 'MA', 'HD', 'PFE', 'ABBV', 'MRK', 'KO', 'PEP', 'AVGO', 'ORCL',
#             'TMO', 'AZN', 'CSCO', 'DHR', 'MCD', 'ABT', 'TMUS', 'ACN', 'NEE', 'VZ', 'TTE',
#             'LIN', 'DIS', 'PM', 'BMY', 'CMCSA', 'SCHW', 'UPS', 'TXN', 'RTX', 'COP']
# Start with 'AAPL'


# 1. Define the ticker and date range
tickers = ["AAPL"]
end_date = '2025-04-03' # set this to today()
start_date = (pd.to_datetime(end_date) - pd.DateOffset(years=1)).strftime('%Y-%m-%d')  # 1 year before

# 2. Download data from yfinance
yf_dl = yf.download(
    tickers,
    start=start_date,
    end=end_date,
    actions=False,
    threads=True,
    ignore_tz=None,
    group_by="column",
    auto_adjust=True,
    back_adjust=False,
    repair=False,
    keepna=False,
    progress=True,
    # period="1y",
    # interval="1h",
    prepost=False,
    proxy=None,
    rounding=False,
    timeout=10,
    session=None,
    multi_level_index=False,
)

# 3. Generate the 1-hour trading hour index
hourly_trading_hours = generate_trading_hours(start_date, end_date, 'h')

# 4. Align the yfinance data to the trading-hour index
#   -  reindex() matches index labels exactly. The daily rows downloaded above
#      are stamped at midnight and are tz-naive, so none of them match the
#      tz-aware trading-hour labels and every reindexed row is NaN.
#   -  resample('h') then expands the index to every hour of the day, and
#      ffill() has no valid value to carry forward, so the NaNs remain.
yf_dl = yf_dl.reindex(hourly_trading_hours).resample('h').ffill()

yf_dl.head(21)

[*********************100%***********************]  1 of 1 completed


,Close,High,Low,Open,Volume
2024-04-03 09:00:00-04:00,,,,,
2024-04-03 10:00:00-04:00,,,,,
2024-04-03 11:00:00-04:00,,,,,
2024-04-03 12:00:00-04:00,,,,,
2024-04-03 13:00:00-04:00,,,,,
2024-04-03 14:00:00-04:00,,,,,
2024-04-03 15:00:00-04:00,,,,,
2024-04-03 16:00:00-04:00,,,,,
2024-04-03 17:00:00-04:00,,,,,
2024-04-03 18:00:00-04:00,,,,,


In [9]:
# Analyse time series

# from pandas.tseries.offsets import Minute
# fifteen_min = Minute(n=15)

# yf_dl = yf_dl.resample(pd.tseries.offsets.BusinessHour).mean() # resample by business hour
# yf_dl.index.freq = pd.tseries.offsets.BusinessHour
# yf_dl.head(24)

In [10]:
timeseries = yf_dl["Close"]
timeseries

2024-04-03 09:00:00-04:00   NaN
2024-04-03 10:00:00-04:00   NaN
2024-04-03 11:00:00-04:00   NaN
2024-04-03 12:00:00-04:00   NaN
2024-04-03 13:00:00-04:00   NaN
                             ..
2025-04-03 11:00:00-04:00   NaN
2025-04-03 12:00:00-04:00   NaN
2025-04-03 13:00:00-04:00   NaN
2025-04-03 14:00:00-04:00   NaN
2025-04-03 15:00:00-04:00   NaN
Freq: h, Name: Close, Length: 8767, dtype: float64

In [11]:
from scipy.signal import periodogram, find_peaks

# Compute power spectrum
frequencies, power = periodogram(timeseries.dropna())
# Find peaks in the power spectrum
peaks, _ = find_peaks(power, height=0.1 * max(power))  # Tune height threshold
# Convert frequency to period
periods = 1 / frequencies[peaks]

periods

ValueError: max() iterable argument is empty
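
The `ValueError` above arises because `timeseries` is entirely NaN, so `dropna()` leaves nothing for the periodogram and `max(power)` receives an empty array. A guarded version of the same peak search could look like the sketch below. The helper name `dominant_periods` and the synthetic sine series are assumptions for illustration; only `periodogram` and `find_peaks` come from the cell above.

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks, periodogram


def dominant_periods(series, rel_height=0.1):
    """Return candidate periods (in samples) from periodogram peaks.

    Hypothetical helper: returns an empty array instead of raising when
    the input has too few non-NaN values to analyse.
    """
    values = pd.Series(series).dropna().to_numpy()
    if values.size < 3:
        return np.array([])
    frequencies, power = periodogram(values)
    # Keep only peaks at least `rel_height` times the strongest one
    peaks, _ = find_peaks(power, height=rel_height * power.max())
    # find_peaks never returns index 0, so the zero frequency is excluded
    return 1 / frequencies[peaks]


# Synthetic series with a known 24-sample cycle (illustration only)
t = np.arange(240)
synthetic = np.sin(2 * np.pi * t / 24)
periods = dominant_periods(synthetic)
print(periods)

# An all-NaN series, like the one above, yields no periods instead of an error
empty_result = dominant_periods(pd.Series([np.nan] * 10))
print(empty_result)
```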

In [None]:
adf_test(timeseries).round(6)

In [None]:
kpss_test(timeseries)

In [None]:
adf_lags_used = adf_test(timeseries).iloc[2]  # lags used by the ADF test
period = int(adf_lags_used) + 1  # add lag 0 to complete the period

# Fall back to a period of 35 when the lag count is below 7
if period < 7:
    period = 35

In [None]:
from statsmodels.tsa.seasonal import STL

seasonal = period + ((period % 2) == 0)  # Ensure odd
# The *_jump arguments must be positive integers
seasonal_jump = max(1, int(np.ceil(0.15 * (period + 1))))
low_pass_jump = max(1, int(np.ceil(0.15 * (period + 1))))
trend_jump = max(1, int(np.ceil(0.15 * 1.5 * (period + 1))))



mod = STL(
    timeseries.dropna(),  # STL cannot handle missing values
    period=period,
    seasonal=seasonal,
    seasonal_deg=1, 
    trend_deg=1, 
    low_pass_deg=1, 
    robust=True,
    seasonal_jump=seasonal_jump,
    trend_jump=trend_jump,
    low_pass_jump=low_pass_jump,
)

In [None]:
# Import seasonal_decompose
from statsmodels.tsa.seasonal import seasonal_decompose

# Check for seasonality
decomposition = seasonal_decompose(
    timeseries.dropna(), 
    model="additive", 
    filt=None, 
    period=period, 
    two_sided=False, 
    extrapolate_trend="freq"
)

seasonals = decomposition.seasonal
trends = decomposition.trend

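# Residual = observed - trend - seasonal, computed on the non-missing values
no_seasonal_trend = timeseries.dropna() - trends - seasonals

# The residual is centred on zero, so negative values are kept
# (clipping at 0 would discard them)
timeseries_cleaned = no_seasonal_trend.dropna()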

timeseries_cleaned.plot()
# yf_dl.plot(y="Close")
# yf_dl.plot(y="Close_no_seasonal_trend")
# decomposition


In [None]:
adf_test(timeseries=timeseries_cleaned).round(6)

In [None]:
kpss_test(timeseries=timeseries_cleaned)

In [None]:
# Train time series model

from statsmodels.tsa.arima.model import ARIMA
from sklearn.model_selection import train_test_split

# Hold out the last 50 records as test data, as the project brief specifies
train, test = train_test_split(
    timeseries_cleaned,
    test_size=50,
    shuffle=False,
)
display(train.tail(period))
print(train.shape)
display(test.head(period))
print(test.shape)


In [None]:
from statsmodels.tsa.stattools import acf, pacf


pacf_values = pacf(
    train, 
    nlags=None, 
    method="burg"
    )

acf_values = acf(
    train,
    adjusted=False, 
    nlags=None,
    qstat=False,
    fft=True,
    bartlett_confint=True,
    missing="none",
)

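# Approximate 95% significance bound for the sample ACF/PACF of white noise
threshold = 1.96 / np.sqrt(len(train))


def count_leading_significant(values, threshold):
    """Count consecutive lags, starting at lag 1, whose |value| exceeds the bound."""
    count = 0
    for val in values[1:]:  # skip lag 0, which is always 1
        if np.abs(val) > threshold:
            count += 1
        else:
            break
    return count


# Set p and q as we would graphically, then clamp them to the project ranges
# (1 <= p <= 3 and 0 <= q <= 2)
p = int(np.clip(count_leading_significant(pacf_values, threshold), 1, 3))
q = int(np.clip(count_leading_significant(acf_values, threshold), 0, 2))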

In [None]:
# Find the best hyperparameter values of (P, D, Q) with the smallest RMSE for building an ARIMA model
# Please set the hyperparameter values of (P, D, Q) within these ranges also for the purpose of saving time:
# For “P”: 1 ≦ P ≦ 3 For “D”: 0 ≦ D ≦ 2 For “Q”: 0 ≦ Q ≦ 2

from sklearn.metrics import root_mean_squared_error
from statsmodels.tsa.arima.model import ARIMA

model_0 = ARIMA(
    train,
    exog=None,
    order=(p, 0, q),
    seasonal_order=(0, 0, 0, 0),
    trend="n",
    enforce_stationarity=True,
    enforce_invertibility=True,
    concentrate_scale=False,
    trend_offset=1,
    dates=None,
    freq=None,
    missing="none",
    validate_specification=True,
)

model_1 = ARIMA(
    train,
    exog=None,
    order=(p, 1, q),
    seasonal_order=(0, 0, 0, 0),
    trend="n",
    enforce_stationarity=True,
    enforce_invertibility=True,
    concentrate_scale=False,
    trend_offset=1,
    dates=None,
    freq=None,
    missing="none",
    validate_specification=True,
)

model_2 = ARIMA(
    train,
    exog=None,
    order=(p, 2, q),
    seasonal_order=(0, 0, 0, 0),
    trend="n",
    enforce_stationarity=True,
    enforce_invertibility=True,
    concentrate_scale=False,
    trend_offset=1,
    dates=None,
    freq=None,
    missing="none",
    validate_specification=True,
)

fitted_model_0 = model_0.fit()
fitted_model_1 = model_1.fit()
fitted_model_2 = model_2.fit()

In [None]:
# Predict price for the next time period (Maybe two Timestamps)
stock_forecast_0 = fitted_model_0.forecast(steps=len(test))
stock_forecast_1 = fitted_model_1.forecast(steps=len(test))
stock_forecast_2 = fitted_model_2.forecast(steps=len(test))

# test_results = pd.DataFrame(columns=["test", "test_forecast"])
# test_results["test"] = test
# test_results["test_forecast"] = stock_forecast

# Generate forecasts for the test period
# stock_forecast = fitted_model.forecast(steps=len(test))

# Create DataFrame with test and forecast values
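test_results = pd.DataFrame(
    {
        "test": test,
        # Use the full forecast path, not just its first value;
        # to_numpy() sidesteps any index mismatch with the test set
        "test_forecast_0": stock_forecast_0.to_numpy(),
        "test_forecast_1": stock_forecast_1.to_numpy(),
        "test_forecast_2": stock_forecast_2.to_numpy(),
    },
    index=test.index,
)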

In [None]:
# Plot actual vs predicted values
fig, ax = plt.subplots(figsize=(12, 6))
test_results["test"].plot(label="Actual", ax=ax)
test_results["test_forecast_0"].plot(label="Forecast_0", ax=ax, style="--", color="blue")
test_results["test_forecast_1"].plot(label="Forecast_1", ax=ax, style="--", color="green")
test_results["test_forecast_2"].plot(label="Forecast_2", ax=ax, style="--", color="yellow")

plt.title("ARIMA Forecast vs Actual Values")
plt.legend()
plt.show()

# test_results.plot();

# Results

In [None]:
print("The d = 0 root mean squared error is: ", root_mean_squared_error(test_results["test"], test_results["test_forecast_0"]))
print("The d = 1 root mean squared error is: ", root_mean_squared_error(test_results["test"], test_results["test_forecast_1"]))
print("The d = 2 root mean squared error is: ", root_mean_squared_error(test_results["test"], test_results["test_forecast_2"]))
print(f"The p is {p}")
print(f"The q is {q}")

In [None]:
# Save timestamp, prediction and actual price to a database
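
One way this last step might be done is with the standard-library `sqlite3` module and `DataFrame.to_sql`. The sketch below is only an illustration: the helper `save_predictions`, the table name `predictions`, the column names, and the numbers are all assumptions, and an in-memory database stands in for a real file.

```python
import sqlite3

import pandas as pd


def save_predictions(conn, results, table="predictions"):
    """Append timestamp, prediction and actual price rows to a SQLite table.

    Hypothetical helper: `results` is a DataFrame indexed by timestamp with
    "prediction" and "actual" columns.
    """
    frame = results.rename_axis("timestamp").reset_index()
    # Store timestamps as ISO-8601 text so the UTC offset is preserved
    frame["timestamp"] = frame["timestamp"].map(lambda ts: ts.isoformat())
    frame.to_sql(table, conn, if_exists="append", index=False)
    return len(frame)


# Made-up illustration values, not real prices or model output
index = pd.DatetimeIndex(
    ["2025-04-02 14:30", "2025-04-02 15:30"], tz="America/New_York"
)
results = pd.DataFrame(
    {"prediction": [223.5, 223.9], "actual": [223.7, 223.8]}, index=index
)

conn = sqlite3.connect(":memory:")  # replace with a file path to persist
rows_written = save_predictions(conn, results)
stored = pd.read_sql("SELECT timestamp, prediction, actual FROM predictions", conn)
print(stored)
```

Using `if_exists="append"` lets each run add its newest forecasts to the same table, which suits an app that pulls fresh data repeatedly.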