# <center><strong>Welcome to Data Gyani</strong></center>

## Advanced Forecasting using Facebook's Prophet

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Forecasting Comparison Table</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            margin: 0;
            padding: 20px;
            background-color: #f4f4f9;
        }
        h1 {
            text-align: center;
            color: #333;
        }
        table {
            width: 80%;
            margin: 0 auto;
            border-collapse: collapse;
            background-color: #fff;
            box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
        }
        th, td {
            padding: 12px 15px;
            text-align: center;
            border: 1px solid #ccc;
        }
        th {
            background-color: #4CAF50;
            color: white;
        }
        tr:nth-child(even) {
            background-color: #f2f2f2;
        }
        tr:hover {
            background-color: #ddd;
        }
        td {
            color: #333;
        }
    </style>
</head>
<body>

<h1>Comparison: Recursive vs Rolling (Sliding) Window Forecasting</h1>

<table>
    <thead>
        <tr>
            <th>Aspect</th>
            <th>Recursive Forecasting</th>
            <th>Rolling (Sliding) Window Forecasting</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Model Training</td>
            <td>Trained once on the entire dataset</td>
            <td>Retrained at each step with the most recent data (window)</td>
        </tr>
        <tr>
            <td>Error Propagation</td>
            <td>High – errors can accumulate with each recursive step</td>
            <td>Low – uses actual observations for each new prediction</td>
        </tr>
        <tr>
            <td>Computational Cost</td>
            <td>Low – only trained once</td>
            <td>High – retraining required at each step</td>
        </tr>
        <tr>
            <td>Adaptability</td>
            <td>Does not adapt to changing trends in data</td>
            <td>Adapts to recent trends and seasonality</td>
        </tr>
        <tr>
            <td>Forecast Horizon</td>
            <td>Suitable for short horizons</td>
            <td>Suitable for both short and long horizons</td>
        </tr>
        <tr>
            <td>Complexity</td>
            <td>Simpler to implement</td>
            <td>More complex due to frequent retraining</td>
        </tr>
        <tr>
            <td>Window Size</td>
            <td>Not applicable</td>
            <td>Critical to set the right window size</td>
        </tr>
    </tbody>
</table>

</body>
</html>
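
The difference in error propagation between the two approaches can be seen with a small sketch. It does not use Prophet: the "model" is a hypothetical stand-in (`window_mean_forecast`, the mean of the last few points), and the series is a made-up linear trend. In the recursive loop each prediction is fed back as if it were data. In the rolling loop the actual observation is appended after each step.

```python
import numpy as np

def window_mean_forecast(history, window):
    """Toy model (assumption, not Prophet): next value = mean of the last `window` points."""
    return float(np.mean(history[-window:]))

def recursive_forecast(train, steps, window):
    """Multi-step forecast where each prediction is fed back in as an input."""
    history = list(train)
    preds = []
    for _ in range(steps):
        yhat = window_mean_forecast(history, window)
        preds.append(yhat)
        history.append(yhat)      # the prediction becomes part of the history
    return np.array(preds)

def rolling_forecast(train, actuals, window):
    """One-step-ahead forecasts where the true observation is appended after each step."""
    history = list(train)
    preds = []
    for y_true in actuals:
        preds.append(window_mean_forecast(history, window))
        history.append(y_true)    # the actual value becomes part of the history
    return np.array(preds)

series = np.arange(1.0, 21.0)     # made-up upward trend: 1, 2, ..., 20
train, test = series[:15], series[15:]

rec = recursive_forecast(train, steps=len(test), window=3)
rol = rolling_forecast(train, test, window=3)

mae_recursive = float(np.mean(np.abs(test - rec)))
mae_rolling = float(np.mean(np.abs(test - rol)))
print(f"Recursive MAE: {mae_recursive:.2f}")
print(f"Rolling MAE:   {mae_rolling:.2f}")
```

On this toy trend the recursive error grows with every step, while the rolling error stays constant because each forecast starts from real data.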


In [1]:
# !pip install yfinance
# !pip install prophet


In [1]:
import datetime
import logging
import os
import warnings

import numpy as np
import pandas as pd
import plotly.graph_objects as go
import yfinance as yf
from plotly.subplots import make_subplots
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, mean_squared_error
from sklearn.model_selection import ParameterGrid
from statsmodels.stats.outliers_influence import variance_inflation_factor

np.random.seed(42)

# Turn off 'cmdstanpy' and 'prophet' log output
logging.getLogger("cmdstanpy").disabled = True
logging.getLogger("prophet").disabled = True

warnings.filterwarnings("ignore")


In [2]:
# Define the stock symbol, start date, and end date
stock_symbol = 'AAPL'  # Apple share prices, used only as an example
start_date = '2010-01-01'
end_date = datetime.datetime.now().date()

# Fetch the stock data
stock_data = yf.download(stock_symbol, start=start_date, end=end_date)
stock_data = stock_data.reset_index()
# Display the data
stock_data.head()

[*********************100%***********************]  1 of 1 completed


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2010-01-04,7.6225,7.660714,7.585,7.643214,6.454505,493729600
1,2010-01-05,7.664286,7.699643,7.616071,7.656429,6.465666,601904800
2,2010-01-06,7.656429,7.686786,7.526786,7.534643,6.362819,552160000
3,2010-01-07,7.5625,7.571429,7.466071,7.520714,6.351058,477131200
4,2010-01-08,7.510714,7.571429,7.466429,7.570714,6.393282,447610800


In [3]:
stock_data.tail()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
3699,2024-09-16,216.539993,217.220001,213.919998,216.320007,216.320007,59357400
3700,2024-09-17,215.75,216.899994,214.5,216.789993,216.789993,45519300
3701,2024-09-18,217.550003,222.710007,217.539993,220.690002,220.690002,59894900
3702,2024-09-19,224.990005,229.820007,224.630005,228.869995,228.869995,66781300
3703,2024-09-20,229.970001,233.089996,227.619995,228.199997,228.199997,287134033


#### 2. Split data into training and validation sets

In [4]:
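close = ['Close']  # Columns to forecast

forecast_period = 15    # Number of future periods to forecast
validation_period = 15  # Number of most recent observations held out for validation
window_size = 30        # Rolling window length, in observations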
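
To make the roles of the validation period and the window size concrete, here is a minimal sketch of which rows a sliding window would cover at each validation step. The helper `rolling_windows` and the sizes used in the example are hypothetical and only illustrate the indexing. They are not part of the notebook's pipeline.

```python
def rolling_windows(n_train, n_valid, window_size):
    """Yield (window_start, window_end, target_index) for each validation step.

    window_start/window_end are row positions (end exclusive) of the data used
    for fitting; target_index is the row being predicted.
    """
    for step in range(n_valid):
        end = n_train + step                 # history grows by one actual observation per step
        start = max(0, end - window_size)    # keep only the most recent window_size rows
        yield start, end, end

# Illustrative sizes (assumptions): 100 training rows, 3 validation rows, window of 30
windows = list(rolling_windows(n_train=100, n_valid=3, window_size=30))
for start, end, target in windows:
    print(f"fit on rows [{start}, {end}) -> predict row {target}")
```

Each step drops the oldest row and adds the newest one, so the model always sees the same number of recent observations.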

In [5]:
# Load and preprocess data
# Drops rows with missing values, then builds a dictionary that maps each variable name
# to a DataFrame in the format Prophet expects ('ds' for dates, 'y' for values).
def load_and_preprocess_data(df, vars_list):
    df = df.dropna()
    return {var: df[['Date', var]].rename(columns={'Date': 'ds', var: 'y'}).dropna() for var in vars_list}


# Replace outliers (|z-score| > 2.5) with the mean of the previous 5 observations
def remove_outliers(df):
    # Work on a copy with a positional index so the caller's DataFrame is not
    # modified and the iloc slices below line up with the row labels
    df = df.copy().reset_index(drop=True)
    z_score = (df['y'] - df['y'].mean()) / df['y'].std()
    outlier_index = df.index[np.abs(z_score) > 2.5]
    if len(outlier_index) > 0:
        print(f"Replaced {len(outlier_index)} outliers.")
    else:
        print("No outliers found.")
    for index in outlier_index:
        if index == 0:
            continue  # no earlier observations to average; leave the first row as-is
        df.loc[index, 'y'] = df['y'].iloc[max(0, index - 5):index].mean()
    return df
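
As a quick illustration of the z-score rule, the sketch below applies the same idea to a small made-up series with one obvious spike. The function name `replace_outliers`, its parameters, and the data are assumptions for this example only.

```python
import numpy as np
import pandas as pd

def replace_outliers(df, z_thresh=2.5, lookback=5):
    """Replace points with |z-score| > z_thresh by the mean of the previous `lookback` points."""
    out = df.copy().reset_index(drop=True)   # leave the input untouched
    z = (out['y'] - out['y'].mean()) / out['y'].std()
    positions = np.flatnonzero(np.abs(z.to_numpy()) > z_thresh)
    for pos in positions:
        if pos == 0:
            continue                          # nothing earlier to average over
        out.loc[pos, 'y'] = out['y'].iloc[max(0, pos - lookback):pos].mean()
    return out, len(positions)

# Made-up data: values around 10-12 with a single spike of 100 at row 6
demo = pd.DataFrame({
    'ds': pd.date_range('2024-01-01', periods=12, freq='D'),
    'y': [10.0, 11.0, 10.0, 12.0, 11.0, 10.0, 100.0, 11.0, 10.0, 12.0, 11.0, 10.0],
})

cleaned, n_flagged = replace_outliers(demo)
print(f"Flagged {n_flagged} point(s); row 6 is now {cleaned.loc[6, 'y']:.1f}")
```

The spike is replaced by the average of the five values before it. Note that a single global z-score suits roughly stationary data. On a strongly trending price series it may flag the most recent values simply because they are far from the long-run mean.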


In [6]:

# Split data into training and validation sets
def split_data(df, validation_size):
    train_size = len(df) - validation_size
    train = df.iloc[:train_size].copy()
    valid = df.iloc[train_size:].copy()
    return train, valid

def calculate_vif(df):
    vif_data = pd.DataFrame()
    vif_data["feature"] = df.columns
    vif_data["VIF"] = [variance_inflation_factor(df.values, i) for i in range(len(df.columns))]
    return vif_data


In [7]:
# Plot Final Forecast plot with extended historical data
def plot_final_forecast(train_df, forecast, var):
    fig = go.Figure()

    # Historical Data
    fig.add_trace(go.Scatter(x=train_df['ds'], y=train_df['y'], mode='lines', name='Historical Data'))

    # Determine the start of forecast period by finding the first date after the historical data
    last_hist_date = train_df['ds'].iloc[-1]
    forecast_subset = forecast[forecast['ds'] > last_hist_date]

    # Forecast data (named forecast_dates to avoid shadowing the global forecast_period)
    forecast_dates = forecast_subset['ds']
    forecast_yhat = forecast_subset['yhat']
    forecast_yhat_lower = forecast_subset['yhat_lower']
    forecast_yhat_upper = forecast_subset['yhat_upper']

    fig.add_trace(go.Scatter(x=forecast_dates, y=forecast_yhat, mode='lines', name='Forecast'))
    fig.add_trace(go.Scatter(x=forecast_dates, y=forecast_yhat_lower, mode='lines', line=dict(color='gray'), showlegend=False))
    fig.add_trace(go.Scatter(x=forecast_dates, y=forecast_yhat_upper, fill='tonexty', mode='lines', line=dict(color='gray'), name='Uncertainty Interval'))

    fig.update_layout(title=f"Final Forecast as Extension of Historical Data for {var}", xaxis_title='Date', yaxis_title='Value')
    fig.show()


In [8]:
# Save outputs
def save_outputs(var, forecast, val_forecast):
    validation_folder = 'Validation'
    os.makedirs(validation_folder, exist_ok=True)
    forecast_folder = 'final_forecast'
    os.makedirs(forecast_folder, exist_ok=True)
    forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].to_csv(os.path.join(forecast_folder, f'{var}_final_forecast.csv'), index=False)
    val_forecast.to_csv(os.path.join(validation_folder, f'{var}_validation_forecast.csv'), index=False)


In [16]:

# Train Prophet model with specified parameters
def train_prophet(train_df, **params):
    """
    Train a Prophet model using the provided training dataframe and parameters.
    """
    model = Prophet(**params)
    return model.fit(train_df)

# Forecast future periods with a sliding window
def make_rolling_predictions(train_df, forecast_period, window_size, **params):
    """
    Forecast `forecast_period` business days ahead, one step at a time.
    At each step a new model is fit on the most recent `window_size` points.
    No actual observations exist for future dates, so each predicted value is
    appended to the data and later steps are trained partly on earlier predictions.
    """
    predictions = pd.DataFrame()

    for i in range(forecast_period):
        # Use only the most recent window_size data points for training
        rolling_train_df = train_df.iloc[max(0, len(train_df) - window_size):].copy()

        # Fit the model on the rolling window data
        model = train_prophet(rolling_train_df, **params)

        # Create future dataframe for prediction (next business day)
        future = model.make_future_dataframe(periods=1, freq='B', include_history=False)
        forecast = model.predict(future)

        # Store the forecasted results
        predictions = pd.concat([predictions, forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]])

        # Append the predicted point to the data so the window slides forward
        next_point = pd.DataFrame({
            'ds': [future['ds'].iloc[0]],
            'y': [forecast['yhat'].iloc[0]]
        })
        train_df = pd.concat([train_df, next_point]).reset_index(drop=True)

    return predictions

# Validate the model using rolling window forecasting
def validate_model_rolling(train_df, valid_df, window_size, **params):
    """
    Validate the model using a rolling (sliding) window approach and calculate error metrics.
    Returns the aligned actuals and predictions, RMSE, accuracy ((1 - MAPE) * 100), and MDA.
    """
    predictions = pd.DataFrame()

    for i in range(len(valid_df)):
        # Use a rolling window from training set
        rolling_train_df = train_df.iloc[max(0, len(train_df) - window_size):].copy()

        # Train model on the rolling window data
        model = train_prophet(rolling_train_df, **params)

        # Predict the next validation date
        future = pd.DataFrame({'ds': [valid_df['ds'].iloc[i]]})
        forecast = model.predict(future)

        # Store the forecasted results
        predictions = pd.concat([predictions, forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]])

        # Slide the window by adding the true observation to training data
        next_point = pd.DataFrame({
            'ds': [valid_df['ds'].iloc[i]],
            'y': [valid_df['y'].iloc[i]]
        })
        train_df = pd.concat([train_df, next_point]).reset_index(drop=True)

    # Align and calculate metrics
    predictions = predictions[['ds', 'yhat']].set_index('ds')
    valid_df = valid_df[['ds', 'y']].set_index('ds')

    val_pred = valid_df[['y']].join(predictions[['yhat']], how='inner')
    val_pred = val_pred.reset_index()

    # Compute metrics
    rmse = round(np.sqrt(mean_squared_error(val_pred['y'], val_pred['yhat'])), 3)
    # Reported as accuracy, i.e. (1 - MAPE) expressed as a percentage
    accuracy = round((1 - mean_absolute_percentage_error(val_pred['y'], val_pred['yhat'])) * 100, 2)
    # Mean directional accuracy: share of steps where actual and predicted values move
    # in the same direction. The first diff is NaN on both sides and NaN == NaN is False,
    # so the first row always counts as a miss.
    mda = round(np.mean(np.sign(val_pred['y'].diff()) == np.sign(val_pred['yhat'].diff())) * 100, 2)

    return val_pred, rmse, accuracy, mda

# Find optimal hyperparameters and window size using sliding window validation
def find_optimal_params(df, validation_period, min_window_size, max_window_size):
    """
    Grid search over Prophet hyperparameters and window sizes (in steps of 10)
    using rolling window validation. The combination with the lowest validation RMSE wins.
    """
    best_rmse = float('inf')
    best_params = {}
    best_window_size = min_window_size

    # Prophet hyperparameters to search. The grid is empty here, so Prophet runs with
    # its defaults and only the window size is tuned. Add entries such as
    # 'changepoint_prior_scale': [0.01, 0.1, 0.5] to search over them.
    param_grid = {}

    grid = ParameterGrid(param_grid)

    train_df, valid_df = split_data(df, validation_period)

    for window_size in range(min_window_size, max_window_size + 1, 10):
        for params in grid:
            _, rmse, _, _ = validate_model_rolling(train_df, valid_df, window_size, **params)

            # Keep the combination with the lowest RMSE
            if rmse < best_rmse:
                best_rmse = rmse
                best_params = params
                best_window_size = window_size

    print(f"Best Parameters: {best_params}, Best Window Size: {best_window_size}")
    return best_params, best_window_size

# Forecast using rolling window approach with optimal window size
def forecast_complete_data_rolling(data, vars_list, validation_period, forecast_period, min_window_size, max_window_size):
    """
    Train, validate, and forecast for multiple variables using a rolling window approach in Prophet.
    """
    results = {}
    for var in vars_list:
        df = data[var]
        df = df.dropna()

        if len(df) < min_window_size + 1:
            print(f"Not enough data to train model for {var}.")
            continue

        best_params, best_window_size = find_optimal_params(df, validation_period, min_window_size, max_window_size)

        train_df, valid_df = split_data(df, validation_period)
        val_forecast, rmse, accuracy, mda = validate_model_rolling(train_df, valid_df, best_window_size, **best_params)

        # Forecast beyond the end of the data using both training and validation rows
        combined_df = pd.concat([train_df, valid_df])
        forecast = make_rolling_predictions(combined_df, forecast_period, best_window_size, **best_params)

        results[var] = {
            'Forecast': forecast,
            'Validation Forecast': val_forecast,
            'RMSE': rmse,
            'MAPE': accuracy,  # stored as accuracy, i.e. (1 - MAPE) * 100
            'MDA': mda
        }
    return results

# Load and preprocess stock data
datasets = load_and_preprocess_data(stock_data[['Date'] + close], close)

# Perform rolling forecast
results_stock_price = forecast_complete_data_rolling(datasets, close, validation_period, forecast_period, min_window_size=30, max_window_size=60)

# Print metrics for each variable
for var in close:
    print('--------------------------------------')
    print(f'VAR: {var}')
    print(f"RMSE: {results_stock_price[var]['RMSE']}")
    print(f"Accuracy (1 - MAPE): {results_stock_price[var]['MAPE']}")
    print(f"MDA: {results_stock_price[var]['MDA']}")


Best Parameters: {}, Best Window Size: 30
--------------------------------------
VAR: Close
RMSE: 4.612
Accuracy (1 - MAPE): 98.3
MDA: 40.0
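
To show how the three reported metrics behave, here is a small NumPy-only sketch on made-up numbers. The function names and the data are assumptions for illustration. One deliberate difference from the cell above: this version of MDA compares only the n-1 steps that have a previous value, rather than counting the first row as a miss.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the series."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def accuracy_from_mape(y_true, y_pred):
    """Accuracy as (1 - MAPE) * 100, matching the 'Accuracy (1 - MAPE)' line above."""
    mape = np.mean(np.abs((y_true - y_pred) / y_true))
    return float((1 - mape) * 100)

def mean_directional_accuracy(y_true, y_pred):
    """Percentage of steps where actual and predicted values move in the same direction."""
    true_dir = np.sign(np.diff(y_true))
    pred_dir = np.sign(np.diff(y_pred))
    return float(np.mean(true_dir == pred_dir) * 100)

# Made-up actuals and predictions
y_true = np.array([100.0, 102.0, 101.0, 105.0])
y_pred = np.array([101.0, 103.0, 104.0, 106.0])

demo_rmse = rmse(y_true, y_pred)
demo_accuracy = accuracy_from_mape(y_true, y_pred)
demo_mda = mean_directional_accuracy(y_true, y_pred)
print(f"RMSE: {demo_rmse:.3f}")
print(f"Accuracy (1 - MAPE): {demo_accuracy:.2f}")
print(f"MDA: {demo_mda:.2f}")
```

The example also shows why a high accuracy figure and a low MDA can coexist: predictions can stay within a percent or two of the actual price while still getting the direction of the day-to-day move wrong.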


In [17]:
results_stock_price['Close']['Validation Forecast']

Unnamed: 0,ds,y,yhat
0,2024-08-30,229.0,230.980178
1,2024-09-03,222.770004,230.891904
2,2024-09-04,220.850006,227.534214
3,2024-09-05,222.380005,222.885568
4,2024-09-06,220.820007,222.94433
5,2024-09-09,220.910004,217.462353
6,2024-09-10,220.110001,218.232288
7,2024-09-11,222.660004,218.265919
8,2024-09-12,222.770004,220.73806
9,2024-09-13,222.5,222.935891


In [18]:
results_stock_price['Close']['Forecast']

Unnamed: 0,ds,yhat,yhat_lower,yhat_upper
0,2024-09-23,221.640224,218.839029,224.36068
0,2024-09-24,222.860632,220.301307,225.354632
0,2024-09-25,223.53087,220.996219,226.247969
0,2024-09-26,226.744008,224.351337,229.073928
0,2024-09-27,227.0995,224.507827,229.401032
0,2024-09-30,224.968754,222.535028,227.6452
0,2024-10-01,225.677504,223.147019,227.984788
0,2024-10-02,226.743367,224.172875,229.134199
0,2024-10-03,229.731816,227.297644,232.010336
0,2024-10-04,230.232484,227.994425,232.226742


#### <strong>What can you look forward to in the upcoming videos?</strong> 

- Advanced constrained forecasting techniques for more accurate demand predictions
- Expanded content driven by LLM/Generative AI
- A hands-on experiment video on using AI/ML-based forecasting to optimize equity portfolio returns


This is going to be huge, so stay tuned!


## <center><strong>Don't Forget to Like, Subscribe, and Share</strong></center>

### <center>YouTube Video</center>

<center><iframe width="560" height="315" src="https://www.youtube.com/embed/1HQ09xgsUYw" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></center>
