<a href="https://www.kaggle.com/code/allaboutdatascience/different-forecasting-methods-on-stock-price-data?scriptVersionId=197836674" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# <center><strong>Welcome to Data Gyani</strong></center>

<center>Please subscribe to my channel: https://www.youtube.com/@DataGyani-in</center>

### <center>YouTube Video</center>

<center><iframe width="560" height="315" src="https://www.youtube.com/watch?v=BvMPAHRV48c" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></center>


## Different Forecasting Methods using Facebook's Prophet Model

### What are we learning?

<table border="1" cellpadding="10">
  <thead>
    <tr>
      <th>Forecasting Method</th>
      <th>How It Works</th>
      <th>Best For</th>
      <th>Shortcomings</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Simple Forecasting</td>
      <td>
        Model is trained once using historical data and predicts multiple future steps without retraining.
      </td>
      <td>
        Short forecasting horizons; stationary data.
      </td>
      <td>
        Accuracy degrades as the horizon grows; the model is never updated with new observations.
      </td>
    </tr>
    <tr>
      <td>Recursive Forecasting</td>
      <td>
        Model predicts one time step ahead, and the prediction is fed back to predict subsequent steps in a recursive loop.
      </td>
      <td>
        Slow-moving trends; consistent relationships between data points over time.
      </td>
      <td>
        Error accumulation due to compounding prediction errors; poor performance over long horizons.
      </td>
    </tr>
    <tr>
      <td>Direct-Recursive Hybrid Forecasting</td>
      <td>
        A mix of direct forecasting for the first few steps and recursive forecasting for remaining steps.
      </td>
      <td>
        Medium to long-term forecasting, balancing trend capture and extension of forecasts.
      </td>
      <td>
        Reduces but does not eliminate error accumulation; increased model complexity.
      </td>
    </tr>
    <tr>
      <td>Rolling Window Forecasting</td>
      <td>
        Model is trained on a fixed-size window of recent data; window shifts as new data becomes available.
      </td>
      <td>
        Non-stationary data; situations with concept drift; when recent data is more relevant.
      </td>
      <td>
        High computational cost due to frequent retraining; window size tuning is critical.
      </td>
    </tr>
  </tbody>
</table>


In [1]:
!pip install yfinance
!pip install prophet


In [2]:
import yfinance as yf
import pandas as pd
import numpy as np
import datetime
from prophet import Prophet
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
import plotly.graph_objs as go
from plotly.subplots import make_subplots
import warnings

warnings.filterwarnings("ignore")


In [3]:
# Define the stock symbol, start date, and end date
stock_symbol = 'AAPL'  # Apple share prices, used here only as an example
start_date = '2010-01-01'
end_date = datetime.datetime.now().date()

# Fetch the stock data
stock_data = yf.download(stock_symbol, start=start_date, end=end_date)

# Display the data
stock_data.head()

[*********************100%***********************]  1 of 1 completed


| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2010-01-04 | 7.6225 | 7.660714 | 7.585 | 7.643214 | 6.454505 | 493729600 |
| 2010-01-05 | 7.664286 | 7.699643 | 7.616071 | 7.656429 | 6.465665 | 601904800 |
| 2010-01-06 | 7.656429 | 7.686786 | 7.526786 | 7.534643 | 6.362819 | 552160000 |
| 2010-01-07 | 7.5625 | 7.571429 | 7.466071 | 7.520714 | 6.351055 | 477131200 |
| 2010-01-08 | 7.510714 | 7.571429 | 7.466429 | 7.570714 | 6.393281 | 447610800 |


## Simple Prophet Model

#### 1. Process data for Prophet model

In [4]:

stock_data = stock_data.reset_index()  # Reset index to move Date to a column
prophet_data = stock_data[['Date', 'Close']].rename(columns={'Date': 'ds', 'Close': 'y'})  # Format for Prophet
prophet_data.tail()

|  | ds | y |
|---|---|---|
| 3699 | 2024-09-16 | 216.320007 |
| 3700 | 2024-09-17 | 216.789993 |
| 3701 | 2024-09-18 | 220.690002 |
| 3702 | 2024-09-19 | 228.869995 |
| 3703 | 2024-09-20 | 228.199997 |


#### 2. Split data into train and validation

In [5]:

validation_size = 30  # Validation period size - Choose as per your requirement
train_data = prophet_data[:-validation_size]  # Training data
validation_data = prophet_data[-validation_size:]  # Validation data

# Print the shape of each split
print("Shape of train_data:", train_data.shape)
print("Shape of validation_data:", validation_data.shape)

Shape of train_data: (3674, 2)
Shape of validation_data: (30, 2)


#### 3. Create Prophet model and train on training set

In [6]:
model = Prophet()
model.fit(train_data)

21:36:06 - cmdstanpy - INFO - Chain [1] start processing
21:36:08 - cmdstanpy - INFO - Chain [1] done processing


<prophet.forecaster.Prophet at 0x7ccab0e48af0>

#### 4. Check model fit on training data

In [7]:
# Predict on the training dates to inspect the in-sample fit
train_forecast = model.predict(train_data[['ds']])

# Evaluation metrics
rmse = round(np.sqrt(mean_squared_error(train_data['y'], train_forecast['yhat'])),2)
mae = round(mean_absolute_error(train_data['y'], train_forecast['yhat']),2)
# `mape` holds the accuracy score (1 - MAPE) as a percentage, not MAPE itself
mape = round((1- np.mean(np.abs((train_data['y'] - train_forecast['yhat']) / train_data['y'])))* 100,2)
# MDA: share of rows where actual and fitted values move in the same direction.
# The first diff is NaN and NaN == NaN is False, so the first row always counts as a miss.
mda = round((np.mean((np.sign(train_data['y'].diff()) == np.sign(train_forecast['yhat'].diff())).astype(int)) * 100),2)

# Print evaluation metrics
print(f"RMSE of Training: {rmse}")
print(f"MAE of Training: {mae}")
print(f"Accuracy(1-MAPE) of Training: {mape:.2f}%")
print(f"MDA of Training: {mda:.2f}%")

# Create a Plotly figure to visualize the model fit on training data
fig = make_subplots()

# Add actual data trace (training data)
fig.add_trace(go.Scatter(
    x=train_data['ds'], 
    y=train_data['y'], 
    mode='lines', 
    name='Actual (Training)',
    line=dict(color='red')
))

# Add forecasted data trace (training fitment)
fig.add_trace(go.Scatter(
    x=train_forecast['ds'], 
    y=train_forecast['yhat'], 
    mode='lines', 
    name='Forecast (Training Fit)',
    line=dict(color='blue', dash='dash')
))

# Update layout
fig.update_layout(
    title="Prophet Model Fit on Training Data",
    xaxis_title="Date",
    yaxis_title="Close Price",
    legend=dict(x=0.01, y=0.99),
    hovermode="x unified"
)

# Show the plot
fig.show()


RMSE of Training: 7.72
MAE of Training: 4.79
Accuracy(1-MAPE) of Training: 90.86%
MDA of Training: 52.31%
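
The four scores printed above can also be computed by one small helper, which makes the formulas explicit. This is a minimal sketch rather than the notebook's own code: the name `forecast_metrics` is hypothetical, and unlike the cell above it compares directions over the n - 1 consecutive changes only, so the MDA it returns can differ slightly from the printed value.

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """Return RMSE, MAE, accuracy (1 - MAPE, in %) and MDA (in %)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    # Assumes y_true contains no zeros (true for share prices)
    accuracy = float((1 - np.mean(np.abs(err / y_true))) * 100)
    # Direction of each day-over-day change; np.diff yields n - 1 values, no NaN
    same_direction = np.sign(np.diff(y_true)) == np.sign(np.diff(y_pred))
    mda = float(np.mean(same_direction) * 100)

    return {"rmse": rmse, "mae": mae, "accuracy": accuracy, "mda": mda}

# Toy numbers, not taken from the stock data
scores = forecast_metrics([100, 110, 105, 120], [90, 100, 110, 115])
print(scores)
```

In the notebook it could be called as `forecast_metrics(train_data['y'], train_forecast['yhat'])`.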


#### 5. Forecast for validation period

In [8]:
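# make_future_dataframe uses daily frequency by default, so 60 calendar days
# comfortably cover the 30 trading days of the validation period
future = model.make_future_dataframe(periods=60)
forecast_1 = model.predict(future)

# Last `validation_size` rows of the forecast (calendar days, not yet aligned to trading days)
forecast_validation_1 = forecast_1[['ds', 'yhat']].iloc[-validation_size:]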


In [9]:
# Align forecast with actual trading days in validation set
forecast_filtered_1 = forecast_1[forecast_1['ds'].isin(validation_data['ds'])]

# Merge with the actual validation data for comparison
validation_data_1 = validation_data.merge(forecast_filtered_1, on='ds', how='left')

# Drop any rows with NaN values
validation_data_1.dropna(inplace=True)
validation_data_1[["ds", "y", "yhat"]].tail(10)


|  | ds | y | yhat |
|---|---|---|---|
| 20 | 2024-09-09 | 220.910004 | 196.642779 |
| 21 | 2024-09-10 | 220.110001 | 196.410198 |
| 22 | 2024-09-11 | 222.660004 | 196.260898 |
| 23 | 2024-09-12 | 222.770004 | 196.038568 |
| 24 | 2024-09-13 | 222.5 | 195.836933 |
| 25 | 2024-09-16 | 216.320007 | 195.616907 |
| 26 | 2024-09-17 | 216.789993 | 195.391729 |
| 27 | 2024-09-18 | 220.690002 | 195.263089 |
| 28 | 2024-09-19 | 228.869995 | 195.074076 |
| 29 | 2024-09-20 | 228.199997 | 194.917607 |
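
The alignment step matters because the forecast has one row per calendar day while the actual prices exist only for trading days. The toy sketch below shows the effect with pandas alone; the frames are hypothetical, the `yhat` value is a placeholder, and an inner merge is used here as an alternative to a left merge followed by `dropna`.

```python
import pandas as pd

# Toy actuals: one row per trading day (the weekend of 14-15 Sept is absent)
actual = pd.DataFrame({
    "ds": pd.to_datetime(["2024-09-12", "2024-09-13", "2024-09-16", "2024-09-17"]),
    "y": [222.77, 222.50, 216.32, 216.79],
})

# Toy forecast: one row per calendar day, weekends included
forecast = pd.DataFrame({"ds": pd.date_range("2024-09-12", "2024-09-17", freq="D")})
forecast["yhat"] = 196.0  # placeholder value, not a real model output

# Keep only the dates present in both frames
aligned = actual.merge(forecast, on="ds", how="inner")
print(len(forecast), len(aligned))
```

Only the dates that appear in both frames survive the merge, so the metrics are computed on trading days alone.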


In [10]:
# Evaluation metrics
rmse = round(np.sqrt(mean_squared_error(validation_data_1['y'], validation_data_1['yhat'])), 2)
mae = round(mean_absolute_error(validation_data_1['y'], validation_data_1['yhat']), 2)
mape = round((1 - np.mean(np.abs((validation_data_1['y'] - validation_data_1['yhat']) / validation_data_1['y'])))*100, 2)

# Mean Directional Accuracy (MDA)
mda = round(np.mean((np.sign(validation_data_1['y'].diff()) == np.sign(validation_data_1['yhat'].diff())).astype(int)) * 100, 2)

# Print evaluation metrics
print(f"RMSE of Validation: {rmse}")
print(f"MAE of Validation: {mae}")
print(f"Accuracy(1-MAPE) of Validation: {mape:.2f}%")
print(f"MDA of Validation: {mda:.2f}%")

# Plot the actual vs forecasted values for the validation period
def plot_forecast_vs_actual(validation_data_1):
    fig = go.Figure()

    # Add the actual closing prices to the plot
    fig.add_trace(go.Scatter(x=validation_data_1['ds'], 
                             y=validation_data_1['y'], 
                             mode='lines', 
                             name='Actual',
                             line=dict(color='red')))

    # Add the forecasted values (yhat) to the plot
    fig.add_trace(go.Scatter(x=validation_data_1['ds'], 
                             y=validation_data_1['yhat'], 
                             mode='lines', 
                             name='Forecast',
                             line=dict(color='blue', dash='dash')))

    # Set the layout of the plot
    fig.update_layout(title='Forecast vs Actuals for Validation Period',
                      xaxis_title='Date',
                      yaxis_title='Close Price',
                      legend_title='Legend')

    # Display the plot
    fig.show()

# Call the function to plot the forecast vs actual values
plot_forecast_vs_actual(validation_data_1)

RMSE of Validation: 27.33
MAE of Validation: 27.09
Accuracy(1-MAPE) of Validation: 87.90%
MDA of Validation: 50.00%


## Simple Recursive Method for forecasting using Prophet
This is a simple and commonly used forecasting technique: the model makes a one-step-ahead forecast, and each forecasted value is then used as an input for the next prediction.
#### Let's Dive in!!
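
To make the feedback loop concrete, here is a toy sketch with a model that really does take the previous value as its input. It assumes a simple lag-1 linear model fitted by least squares with NumPy; `fit_lag1` and `recursive_forecast` are hypothetical names and are not part of Prophet or of this notebook.

```python
import numpy as np

def fit_lag1(y):
    """Least-squares fit of y[t] ≈ slope * y[t-1] + intercept."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([y[:-1], np.ones(len(y) - 1)])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef[0], coef[1]

def recursive_forecast(y, steps):
    """Forecast `steps` values, feeding each prediction back as the next input."""
    slope, intercept = fit_lag1(y)
    last = float(np.asarray(y, dtype=float)[-1])
    forecasts = []
    for _ in range(steps):
        last = slope * last + intercept  # the prediction becomes the next input
        forecasts.append(last)
    return np.array(forecasts)

# Toy series that rises by 1 each step
history = np.arange(10.0)
print(recursive_forecast(history, 3))
```

Because every step builds on the previous prediction, any error in an early step is carried into all later ones.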

In [11]:
# Train the Prophet model
def train_prophet(train_df):
    recursive_model = Prophet()
    return recursive_model.fit(train_df)

# Make predictions with the recursive technique
def make_predictions(recursive_model, df, forecast_period):
    # Future dates on business days. freq='B' skips weekends but not market
    # holidays, so a few dates may have no matching trading day.
    future = pd.date_range(start=df['ds'].max(), periods=forecast_period + 1, freq='B')[1:]
    future = pd.DataFrame(future, columns=['ds'])
    predictions = pd.DataFrame()  # Initialize as an empty dataframe

    for i in range(forecast_period):
        forecast = recursive_model.predict(future.head(1))  # Only forecast the next point
        predictions = pd.concat([predictions, forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]])

        # Append the forecast to df as if it were an observation.
        # Note: the model is not refit and Prophet predicts from 'ds' alone,
        # so these appended values do not influence later forecasts.
        next_point = pd.DataFrame({
            'ds': [future['ds'].iloc[0]],
            'y': [forecast['yhat'].iloc[0]]
        })
        df = pd.concat([df, next_point]).reset_index(drop=True)

        # Recreate the future dataframe with the next business day
        future = pd.date_range(start=df['ds'].max(), periods=2, freq='B')[1:]
        future = pd.DataFrame(future, columns=['ds'])

    return predictions
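
One detail of `freq='B'` is worth checking: it skips weekends but knows nothing about market holidays. The sketch below assumes the training data ends on 2024-08-08 (inferred from the validation dates shown in this notebook) and uses pandas' `USFederalHolidayCalendar` as a rough stand-in for the exchange calendar. That is an approximation, since exchange holidays do not match federal holidays exactly.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

last_train_day = pd.Timestamp("2024-08-08")  # assumed end of the training data
horizon = 30

# Plain business days: Monday to Friday, holidays included
plain = pd.date_range(start=last_train_day, periods=horizon + 1, freq="B")[1:]

# Business days that also skip US federal holidays (approximation of trading days)
holiday_aware = CustomBusinessDay(calendar=USFederalHolidayCalendar())
aware = pd.date_range(start=last_train_day, periods=horizon + 1, freq=holiday_aware)[1:]

labor_day = pd.Timestamp("2024-09-02")
print("plain range contains Labor Day:", labor_day in plain, "| ends", plain[-1].date())
print("aware range contains Labor Day:", labor_day in aware, "| ends", aware[-1].date())
```

Under these assumptions the plain range includes Labor Day and therefore ends one trading day early, which is consistent with the recursive forecast being compared on 29 rather than 30 validation rows.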


In [12]:
# Train Prophet model on training data
recursive_model = train_prophet(train_data)

21:36:11 - cmdstanpy - INFO - Chain [1] start processing
21:36:13 - cmdstanpy - INFO - Chain [1] done processing


In [14]:
# Make predictions for the validation data using the recursive technique
forecast_period = validation_size  # Using the validation size for the forecast period
forecast_2 = make_predictions(recursive_model, train_data, forecast_period)

In [15]:
# Align forecast with actual trading days in the validation set
forecast_filtered_2 = forecast_2[forecast_2['ds'].isin(validation_data['ds'])]

# Merge forecast with the actual validation data for comparison
validation_data_2 = validation_data.merge(forecast_filtered_2, on='ds', how='left')

# Drop any rows with NaN values, if any
validation_data_2.dropna(inplace=True)

validation_data_2.tail(10)

|  | ds | y | yhat | yhat_lower | yhat_upper |
|---|---|---|---|---|---|
| 19 | 2024-09-06 | 220.820007 | 196.803022 | 187.313187 | 206.96103 |
| 20 | 2024-09-09 | 220.910004 | 196.642779 | 186.307123 | 206.502907 |
| 21 | 2024-09-10 | 220.110001 | 196.410198 | 187.014246 | 206.487152 |
| 22 | 2024-09-11 | 222.660004 | 196.260898 | 186.299034 | 206.52327 |
| 23 | 2024-09-12 | 222.770004 | 196.038568 | 185.305597 | 206.140149 |
| 24 | 2024-09-13 | 222.5 | 195.836933 | 185.642436 | 205.27848 |
| 25 | 2024-09-16 | 216.320007 | 195.616907 | 186.449961 | 205.48157 |
| 26 | 2024-09-17 | 216.789993 | 195.391729 | 184.879418 | 205.224492 |
| 27 | 2024-09-18 | 220.690002 | 195.263089 | 185.876874 | 205.388563 |
| 28 | 2024-09-19 | 228.869995 | 195.074076 | 184.719998 | 204.809473 |


In [16]:

# Evaluation metrics for validation set
rmse_2 = round(np.sqrt(mean_squared_error(validation_data_2['y'], validation_data_2['yhat'])), 2)
mae_2 = round(mean_absolute_error(validation_data_2['y'], validation_data_2['yhat']), 2)
mape_2 = round((1 - np.mean(np.abs((validation_data_2['y'] - validation_data_2['yhat']) / validation_data_2['y'])))*100, 2)
mda_2 = round(np.mean((np.sign(validation_data_2['y'].diff()) == np.sign(validation_data_2['yhat'].diff())).astype(int)) * 100, 2)

# Print evaluation metrics
print(f"RMSE of Validation: {rmse_2}")
print(f"MAE of Validation: {mae_2}")
print(f"Accuracy(1-MAPE) of Validation: {mape_2:.2f}%")
print(f"MDA of Validation: {mda_2:.2f}%")

# Plot the actual vs forecasted values for the validation period
def plot_forecast_vs_actual(validation_data_2):
    fig = go.Figure()

    # Add the actual closing prices to the plot
    fig.add_trace(go.Scatter(x=validation_data_2['ds'], 
                             y=validation_data_2['y'], 
                             mode='lines', 
                             name='Actual',
                             line=dict(color='red')))

    # Add the forecasted values (yhat) to the plot
    fig.add_trace(go.Scatter(x=validation_data_2['ds'], 
                             y=validation_data_2['yhat'], 
                             mode='lines', 
                             name='Forecast',
                             line=dict(color='blue', dash='dash')))

    # Set the layout of the plot
    fig.update_layout(title='Forecast vs Actuals for Validation Period',
                      xaxis_title='Date',
                      yaxis_title='Close Price',
                      legend_title='Legend')

    # Display the plot
    fig.show()

# Call the function to plot the forecast vs actual values
plot_forecast_vs_actual(validation_data_2)


RMSE of Validation: 27.11
MAE of Validation: 26.87
Accuracy(1-MAPE) of Validation: 87.99%
MDA of Validation: 48.28%


## Direct-Recursive Hybrid

In [17]:

# Train the Prophet model
def train_prophet(train_df):
    recursive_direct_model = Prophet()
    return recursive_direct_model.fit(train_df)

# Train a linear regression on the row index (a straight-line trend); this is the "direct" half of the hybrid
def train_direct_model(train_df):
    X = np.arange(len(train_df)).reshape(-1, 1)  # Create an index feature
    y = train_df['y'].values
    direct_model = LinearRegression()
    direct_model.fit(X, y)
    return direct_model

# Make predictions with recursive technique combined with a direct fine-tuning model
def make_hybrid_predictions(recursive_direct_model, direct_model, df, forecast_period):
    future = pd.date_range(start=df['ds'].max(), periods=forecast_period + 1, freq='B')[1:]
    future = pd.DataFrame(future, columns=['ds'])
    predictions = pd.DataFrame()  # Initialize as an empty dataframe

    for i in range(forecast_period):
        forecast = recursive_direct_model.predict(future.head(1))  # Only forecast the next point
        next_point = pd.DataFrame({
            'ds': [future['ds'].iloc[0]],
            'y': [forecast['yhat'].iloc[0]]
        })
        
        # Add fine-tuning using the direct model.
        # len(df) is already the index of the next row because df grows by one row per iteration.
        X_future = np.array([[len(df)]])
        fine_tuned_value = direct_model.predict(X_future)[0]
        next_point['y'] = (next_point['y'] + fine_tuned_value) / 2  # Average both predictions

        # Note: only Prophet's own yhat is stored here. The averaged value goes into df,
        # which the fitted model never reads, so the output matches the recursive method.
        predictions = pd.concat([predictions, forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]])
        df = pd.concat([df, next_point]).reset_index(drop=True)

        # Recreate the future dataframe with business days (trading days)
        future = pd.date_range(start=df['ds'].max(), periods=2, freq='B')[1:]
        future = pd.DataFrame(future, columns=['ds'])

    return predictions
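
For comparison with the overview table's description (direct forecasts for the first few steps, recursive forecasts for the rest), here is a toy sketch of that scheme. It is an illustration under stated assumptions, not the notebook's method: it uses simple lag-based linear models fitted with NumPy instead of Prophet, and `fit_lag_model`, `hybrid_forecast` and `direct_steps` are hypothetical names.

```python
import numpy as np

def fit_lag_model(y, horizon):
    """Least-squares fit of y[t + horizon] ≈ slope * y[t] + intercept."""
    X = np.column_stack([y[:-horizon], np.ones(len(y) - horizon)])
    coef, *_ = np.linalg.lstsq(X, y[horizon:], rcond=None)
    return coef[0], coef[1]

def hybrid_forecast(y, steps, direct_steps):
    """Direct forecasts for the first `direct_steps`, recursive ones afterwards."""
    y = np.asarray(y, dtype=float)
    forecasts = []

    # Direct part: one dedicated model per horizon, each fed the last real value
    for h in range(1, min(direct_steps, steps) + 1):
        slope, intercept = fit_lag_model(y, h)
        forecasts.append(slope * y[-1] + intercept)

    # Recursive part: a one-step model fed with its own previous output
    slope, intercept = fit_lag_model(y, 1)
    last = forecasts[-1] if forecasts else y[-1]
    for _ in range(steps - len(forecasts)):
        last = slope * last + intercept
        forecasts.append(last)

    return np.array(forecasts)

# Toy series that rises by 1 each step
print(hybrid_forecast(np.arange(20.0), steps=5, direct_steps=2))
```

The direct steps never see a predicted value, so only the later, recursive steps can compound errors.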


In [18]:

# Train Prophet model on training data
recursive_model = train_prophet(train_data)

# Train Linear Regression (Direct Model) on training data
direct_model = train_direct_model(train_data)

# Predict on training data for model fitment
train_forecast_3 = recursive_model.predict(train_data[['ds']])



21:36:24 - cmdstanpy - INFO - Chain [1] start processing
21:36:25 - cmdstanpy - INFO - Chain [1] done processing


In [20]:
# Make predictions for the validation data using the hybrid technique
forecast_period = validation_size  # Using same code as above
forecast_3 = make_hybrid_predictions(recursive_model, direct_model, train_data, forecast_period)

# Align forecast with actual trading days in the validation set
forecast_filtered_3 = forecast_3[forecast_3['ds'].isin(validation_data['ds'])]

# Merge forecast with the actual validation data for comparison
validation_data_3 = validation_data.merge(forecast_filtered_3, on='ds', how='left')

# Drop any rows with NaN values, if any
validation_data_3.dropna(inplace=True)

validation_data_3.tail(10)


|  | ds | y | yhat | yhat_lower | yhat_upper |
|---|---|---|---|---|---|
| 19 | 2024-09-06 | 220.820007 | 196.803022 | 186.837983 | 206.174211 |
| 20 | 2024-09-09 | 220.910004 | 196.642779 | 186.918807 | 207.103208 |
| 21 | 2024-09-10 | 220.110001 | 196.410198 | 186.560991 | 205.910803 |
| 22 | 2024-09-11 | 222.660004 | 196.260898 | 186.280079 | 205.616506 |
| 23 | 2024-09-12 | 222.770004 | 196.038568 | 186.648401 | 205.461134 |
| 24 | 2024-09-13 | 222.5 | 195.836933 | 185.987235 | 206.38847 |
| 25 | 2024-09-16 | 216.320007 | 195.616907 | 186.305886 | 204.986933 |
| 26 | 2024-09-17 | 216.789993 | 195.391729 | 185.441691 | 205.368353 |
| 27 | 2024-09-18 | 220.690002 | 195.263089 | 185.223739 | 205.5618 |
| 28 | 2024-09-19 | 228.869995 | 195.074076 | 185.529833 | 205.359513 |


In [21]:

# Evaluation metrics for validation set
rmse_3 = round(np.sqrt(mean_squared_error(validation_data_3['y'], validation_data_3['yhat'])), 2)
mae_3 = round(mean_absolute_error(validation_data_3['y'], validation_data_3['yhat']), 2)
mape_3 = round((1 - np.mean(np.abs((validation_data_3['y'] - validation_data_3['yhat']) / validation_data_3['y'])))*100, 2)
mda_3 = round(np.mean((np.sign(validation_data_3['y'].diff()) == np.sign(validation_data_3['yhat'].diff())).astype(int)) * 100, 2)

# Print evaluation metrics
print(f"RMSE of Validation: {rmse_3}")
print(f"MAE of Validation: {mae_3}")
print(f"Accuracy(1-MAPE) of Validation: {mape_3:.2f}%")
print(f"MDA of Validation: {mda_3:.2f}%")

# Plot the actual vs forecasted values for the validation period
def plot_forecast_vs_actual(validation_data_3):
    fig = go.Figure()

    # Add the actual closing prices to the plot
    fig.add_trace(go.Scatter(x=validation_data_3['ds'], 
                             y=validation_data_3['y'], 
                             mode='lines', 
                             name='Actual',
                             line=dict(color='red')))

    # Add the forecasted values (yhat) to the plot
    fig.add_trace(go.Scatter(x=validation_data_3['ds'], 
                             y=validation_data_3['yhat'], 
                             mode='lines', 
                             name='Forecast',
                             line=dict(color='blue', dash='dash')))

    # Set the layout of the plot
    fig.update_layout(title='Forecast vs Actuals for Validation Period',
                      xaxis_title='Date',
                      yaxis_title='Close Price',
                      legend_title='Legend')

    # Display the plot
    fig.show()

# Call the function to plot the forecast vs actual values
plot_forecast_vs_actual(validation_data_3)


RMSE of Validation: 27.11
MAE of Validation: 26.87
Accuracy(1-MAPE) of Validation: 87.99%
MDA of Validation: 48.28%


## Rolling Window Method

In [22]:

# Define window size: number of most recent rows (trading days) used for each fit
window_size = 1440  # Tune this for your data


# Train Prophet model on the rolling window data
def train_prophet(train_df):
    model = Prophet()
    return model.fit(train_df)

# Function for rolling window prediction
def rolling_window_forecast(data, window_size, validation_size):
    rolling_predictions = pd.DataFrame()
    
    for i in range(validation_size):
        # Define rolling window range
        train_end = len(data) - validation_size + i
        train_start = train_end - window_size
        train_data = data.iloc[train_start:train_end]
        
        # Train the model on the current window
        model = train_prophet(train_data)
        
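        # Forecast one step ahead: the first date after the window.
        # The window itself holds actual prices up to the previous trading day.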
        future = pd.DataFrame({'ds': [data['ds'].iloc[train_end]]})
        forecast = model.predict(future)
        
        # Store predictions
        rolling_predictions = pd.concat([rolling_predictions, forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]])
        
        print(f"Rolling Window {i+1}/{validation_size} | Train Start: {train_start}, Train End: {train_end}")
        
    return rolling_predictions
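
The window arithmetic in the loop can be checked on its own, without fitting any model. The helper below is a sketch with a hypothetical name, `rolling_window_bounds`; it reproduces the `train_start` and `train_end` positions and adds a guard for the case where the window is longer than the available history, which the loop above does not check.

```python
def rolling_window_bounds(n_rows, window_size, validation_size):
    """Return (train_start, train_end) row positions for each rolling window.

    train_end is exclusive for training and is also the row being forecast.
    """
    bounds = []
    for i in range(validation_size):
        train_end = n_rows - validation_size + i
        train_start = train_end - window_size
        if train_start < 0:
            # A negative start would silently select the wrong rows with .iloc
            raise ValueError("window_size is larger than the available history")
        bounds.append((train_start, train_end))
    return bounds

# 3704 rows, as in the data used here; window of 1440 rows; 30 validation rows
bounds = rolling_window_bounds(n_rows=3704, window_size=1440, validation_size=30)
print(bounds[0], bounds[-1])  # (2234, 3674) (2263, 3703)
```

Each window has the same length and moves forward by one row per forecast.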



In [23]:
# Apply rolling window forecasting
rolling_predictions = rolling_window_forecast(prophet_data, window_size, validation_size)

# Combine rolling predictions with validation data
validation_data = prophet_data[-validation_size:].reset_index(drop=True)
validation_data_4 = validation_data.merge(rolling_predictions, on='ds', how='left')

# Drop NaN values if any
validation_data_4.dropna(inplace=True)
validation_data_4.tail()

21:36:37 - cmdstanpy - INFO - Chain [1] start processing
21:36:38 - cmdstanpy - INFO - Chain [1] done processing
21:36:38 - cmdstanpy - INFO - Chain [1] start processing


Rolling Window 1/30 | Train Start: 2234, Train End: 3674


...


Rolling Window 30/30 | Train Start: 2263, Train End: 3703


|  | ds | y | yhat | yhat_lower | yhat_upper |
|---|---|---|---|---|---|
| 25 | 2024-09-16 | 216.320007 | 208.584136 | 198.15165 | 219.181967 |
| 26 | 2024-09-17 | 216.789993 | 207.863348 | 197.824294 | 218.354912 |
| 27 | 2024-09-18 | 220.690002 | 207.590663 | 196.89117 | 217.156272 |
| 28 | 2024-09-19 | 228.869995 | 207.495277 | 197.109874 | 218.063079 |
| 29 | 2024-09-20 | 228.199997 | 207.692123 | 198.30377 | 217.413248 |


In [24]:


# Evaluation metrics for validation set
rmse_4 = round(np.sqrt(mean_squared_error(validation_data_4['y'], validation_data_4['yhat'])), 2)
mae_4 = round(mean_absolute_error(validation_data_4['y'], validation_data_4['yhat']), 2)
mape_4 = round((1 - np.mean(np.abs((validation_data_4['y'] - validation_data_4['yhat']) / validation_data_4['y'])))*100, 2)
mda_4 = round(np.mean((np.sign(validation_data_4['y'].diff()) == np.sign(validation_data_4['yhat'].diff())).astype(int)) * 100, 2)

# Print evaluation metrics
print(f"RMSE of Validation: {rmse_4}")
print(f"MAE of Validation: {mae_4}")
print(f"Accuracy(1-MAPE) of Validation: {mape_4:.2f}%")
print(f"MDA of Validation: {mda_4:.2f}%")

# Plot the actual vs forecasted values for the validation period
def plot_forecast_vs_actual(validation_data_4):
    fig = go.Figure()

    # Add the actual closing prices to the plot
    fig.add_trace(go.Scatter(x=validation_data_4['ds'], 
                             y=validation_data_4['y'], 
                             mode='lines', 
                             name='Actual',
                             line=dict(color='red')))

    # Add the forecasted values (yhat) to the plot
    fig.add_trace(go.Scatter(x=validation_data_4['ds'], 
                             y=validation_data_4['yhat'], 
                             mode='lines', 
                             name='Forecast',
                             line=dict(color='blue', dash='dash')))

    # Set the layout of the plot
    fig.update_layout(title='Forecast vs Actuals for Validation Period',
                      xaxis_title='Date',
                      yaxis_title='Close Price',
                      legend_title='Legend')

    # Display the plot
    fig.show()

# Call the function to plot the forecast vs actual values
plot_forecast_vs_actual(validation_data_4)


RMSE of Validation: 12.78
MAE of Validation: 12.33
Accuracy(1-MAPE) of Validation: 94.50%
MDA of Validation: 46.67%


#### Evaluation Metrics from Simple Model

In [25]:

# RMSE of Validation: 26.77
# MAE of Validation: 26.29
# Accuracy(1-MAPE) of Validation: 88.17%
# MDA of Validation: 56.67%

#### Evaluation Metrics from Recursive Model

In [26]:

# RMSE of Validation: 26.83
# MAE of Validation: 26.33
# Accuracy(1-MAPE) of Validation: 88.16%
# MDA of Validation: 55.17%

#### Evaluation Metrics from Direct-Recursive Model

In [27]:

# RMSE of Validation: 26.83
# MAE of Validation: 26.33
# Accuracy(1-MAPE) of Validation: 88.16%
# MDA of Validation: 55.17%


#### Evaluation Metrics from Rolling Window Method

In [28]:
# RMSE of Validation: 11.83
# MAE of Validation: 11.16
# Accuracy(1-MAPE) of Validation: 94.99%
# MDA of Validation: 53.33% 
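
The four sets of scores noted above are easier to compare side by side. The sketch below simply copies those numbers into a pandas DataFrame and sorts by RMSE; the variable names are illustrative. Bear in mind that the rolling-window forecasts are one-step-ahead predictions refit on actual prices each day, so they are not a like-for-like comparison with the other methods, which forecast the whole validation period from a single fit.

```python
import pandas as pd

# Scores copied from the notes above
results = pd.DataFrame(
    {
        "RMSE": [26.77, 26.83, 26.83, 11.83],
        "MAE": [26.29, 26.33, 26.33, 11.16],
        "Accuracy (1-MAPE) %": [88.17, 88.16, 88.16, 94.99],
        "MDA %": [56.67, 55.17, 55.17, 53.33],
    },
    index=["Simple", "Recursive", "Direct-Recursive", "Rolling Window"],
)

# Lower RMSE first
ranked = results.sort_values("RMSE")
print(ranked)
```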

- Time series data and basic modeling techniques: https://medium.com/@datagyani/how-to-analyze-time-series-data-dbb1567ffc0d
- GitHub link (Different Forecasting Methods on Stock Price Data): https://github.com/Ashu2360/datagyani/blob/0a1bc07d574c214f42384b43cdb1b8a9cb0dbe24/stock_price_forecast_prophet.ipynb

Let me know in the comments if you're interested in a video on extensive testing techniques for forecasting models and how to generate future forecasts using various inference methods.

#### Topic of upcoming videos 

<strong>How can Prophet's output be improved with hyperparameter tuning and regressors extracted from the series itself? How can regressors be tuned like hyperparameters for forecasting?</strong>

<center>**************************** <em>It will definitely boost your model performance drastically.</em> ****************************</center>


## <center><strong>Don't forget to like, subscribe and share</strong></center>