# <center><strong> Welcome to Data Gyani 

## Advanced Forecasting using Facebook's Prophet

### What are we learning?

<strong>How to optimize Prophet model output using Hyperparameters and regressors extracted from the series? How can we use regressors like hyperparameter for forecasting? </strong>

In [1]:
# !pip install yfinance
# !pip install prophet


In [1]:
import yfinance as yf
import pandas as pd
import numpy as np
import datetime
import os
from prophet import Prophet
from sklearn.metrics import mean_squared_error
import plotly.graph_objs as go
from sklearn.metrics import mean_squared_error, mean_absolute_percentage_error
from sklearn.model_selection import ParameterGrid
from statsmodels.stats.outliers_influence import variance_inflation_factor
import logging
np.random.seed(42)
logging.getLogger("cmdstanpy").disabled = True #  turn 'cmdstanpy' logs off
# Configure logging to suppress Prophet warnings
#logging.getLogger('cmdstanpy').setLevel(logging.CRITICAL)
logging.getLogger('prophet').disabled = True #  turn 'prophet' logs off
import warnings

warnings.filterwarnings("ignore")


#### <strong>Areas where it can be used </strong>
<html>
  <body>
    <ul>
      <li><strong>Demand forecasting:</strong> Predict future sales or product demand to optimize inventory management and production planning.</li>
      <li><strong>Stock price prediction:</strong> Forecast stock market trends and price movements for better investment strategies.</li>
      <li><strong>Commodities Forecasting: </strong> Predicting the price and consumption trends of commodities, helping in better decision-making for trading and supply chain management.</li>
      <li><strong>Resource allocation:</strong> Plan staffing or material requirements based on predicted demand or workload.</li>
      <li><strong>Weather forecasting:</strong> Predict temperature, rainfall, or other weather patterns for agricultural or business planning.</li>
      <li><strong>Website traffic forecasting:</strong> Predict website or app usage for capacity planning and infrastructure management.</li>
      <li><strong>Energy consumption forecasting:</strong> Estimate future energy demand for utilities to optimize supply and minimize costs.</li>
      <li><strong>Supply chain optimization:</strong> Forecast supply needs to streamline procurement and avoid stock shortages or excess.</li>
      <li><strong>Healthcare demand forecasting:</strong> Predict patient inflow to manage staffing, beds, and resources in hospitals.</li>
      <li><strong>Tourism forecasting:</strong> Anticipate tourist visits and demand for services in different regions or seasons.</li>
      <li><strong>Finance and budgeting:</strong> Forecast financial metrics like revenue, expenses, or cash flow for more accurate financial planning.</li>
      <li><strong>Marketing:</strong> Predict future trends in customer behavior, helping to tailor marketing campaigns and promotions.</li>
    </ul>
  </body>
</html>


In [3]:
# Define the stock symbol, start date, and end date
stock_symbol = 'AAPL'  # Using Apple share prices Just for example
start_date = '2010-01-01'
end_date = datetime.datetime.now().date()

# Fetch the stock data
stock_data = yf.download(stock_symbol, start=start_date, end=end_date)
stock_data = stock_data.reset_index()
# Display the data
stock_data.head()

[*********************100%***********************]  1 of 1 completed


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2010-01-04,7.6225,7.660714,7.585,7.643214,6.454505,493729600
1,2010-01-05,7.664286,7.699643,7.616071,7.656429,6.465664,601904800
2,2010-01-06,7.656429,7.686786,7.526786,7.534643,6.36282,552160000
3,2010-01-07,7.5625,7.571429,7.466071,7.520714,6.351055,477131200
4,2010-01-08,7.510714,7.571429,7.466429,7.570714,6.393282,447610800


In [4]:
stock_data.tail()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
3699,2024-09-16,216.539993,217.220001,213.919998,216.320007,216.320007,59357400
3700,2024-09-17,215.75,216.899994,214.5,216.789993,216.789993,45519300
3701,2024-09-18,217.550003,222.710007,217.539993,220.690002,220.690002,59894900
3702,2024-09-19,224.990005,229.820007,224.630005,228.869995,228.869995,66781300
3703,2024-09-20,229.970001,233.089996,227.619995,228.199997,228.199997,287134033


#### Declare global variables

In [5]:
close=['Close']

forecast_period = 15  # Number of future periods to forecast
validation_period= 15


In [6]:
# Load and preprocess data
#The function removes rows with any missing values. It then iterates through the specified variables,
#creating a dictionary where each key-value pair corresponds to a DataFrame for a specific variable formatted for Prophet.

def load_and_preprocess_data(df, vars_list):
    df = df.dropna()
    return {var: df[['Date', var]].rename(columns={'Date': 'ds', var: 'y'}).dropna() for var in vars_list}


# Remove outliers and replace with mean of last 5 months
def remove_outliers(df):
    df['z_score'] = (df['y'] - df['y'].mean()) / df['y'].std()
    outliers = df[np.abs(df['z_score']) > 2.5]
    if len(outliers) > 0:
        print(f"Removed {len(outliers)} outliers.")
    else:
        print("No outliers found.")
    for index in outliers.index:
        df.loc[index, 'y'] = df['y'].iloc[max(0, index-5):index].mean()
    df = df.drop(columns=['z_score'])
    return df


In [7]:

# Split data into training and validation sets
def split_data(df, validation_size):
    train_size = len(df) - validation_size
    train = df.iloc[:train_size].copy()
    valid = df.iloc[train_size:].copy()
    return train, valid

def calculate_vif(df):
    vif_data = pd.DataFrame()
    vif_data["feature"] = df.columns
    vif_data["VIF"] = [variance_inflation_factor(df.values, i) for i in range(len(df.columns))]
    return vif_data


In [8]:
# Plot Final Forecast plot with extended historical data
def plot_final_forecast(train_df, forecast, var):
    fig = go.Figure()

    # Historical Data
    fig.add_trace(go.Scatter(x=train_df['ds'], y=train_df['y'], mode='lines', name='Historical Data'))

    # Determine the start of forecast period by finding the first date after the historical data
    last_hist_date = train_df['ds'].iloc[-1]
    forecast_subset = forecast[forecast['ds'] > last_hist_date]

    # Forecast Data 
    forecast_period = forecast_subset['ds']
    forecast_yhat = forecast_subset['yhat']
    forecast_yhat_lower = forecast_subset['yhat_lower']
    forecast_yhat_upper = forecast_subset['yhat_upper']

    fig.add_trace(go.Scatter(x=forecast_period, y=forecast_yhat, mode='lines', name='Forecast'))
    fig.add_trace(go.Scatter(x=forecast_period, y=forecast_yhat_lower, mode='lines', line=dict(color='gray'), showlegend=False))
    fig.add_trace(go.Scatter(x=forecast_period, y=forecast_yhat_upper, fill='tonexty', mode='lines', line=dict(color='gray'), name='Confidence Interval'))

    fig.update_layout(title=f"Final Forecast as Extension of Historical Data for {var}", xaxis_title='Date', yaxis_title='Value')
    fig.show()


In [9]:
# Save outputs
def save_outputs(var, forecast, val_forecast):
    validation_folder = 'Validation'
    os.makedirs(validation_folder, exist_ok=True)
    forecast_folder = 'final_forecast'
    os.makedirs(forecast_folder, exist_ok=True)
    forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].to_csv(os.path.join(forecast_folder, f'{var}_final_forecast.csv'), index=False)
    val_forecast.to_csv(os.path.join(validation_folder, f'{var}_validation_forecast.csv'), index=False)


### Model Type: 1- Simple Model without hyperparameter tuning

In [11]:
# Train Prophet model without hyperparameter tuning. I am not passing anything in parameter's grid
def train_prophet(train_df, **params):
    """
    Train a Prophet model using the provided training dataframe and parameters.
    """
    model = Prophet(**params)
    return model.fit(train_df)

# Make recursive predictions with the model
def make_predictions(model, df, forecast_period):
    """
    Use the trained model to make recursive forecasts for the specified period.
    """
    # Create future dataframe for prediction
    future = model.make_future_dataframe(periods=1, freq='B', include_history=False)
    future = pd.DataFrame(future, columns=['ds'])
    
    # DataFrame to store all predictions
    predictions = pd.DataFrame()

    # Recursive forecasting loop
    for i in range(forecast_period):
        # Generate forecast
        forecast = model.predict(future)
        predictions = pd.concat([predictions, forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]])

        # Update the input with the predicted value for the next iteration
        next_point = pd.DataFrame({
            'ds': [future['ds'].iloc[0]],
            'y': [forecast['yhat'].iloc[0]]
        })
        
        # Add the next predicted point to the original dataframe
        df = pd.concat([df, next_point]).reset_index(drop=True)
        
        # Update future for the next period
        future = model.make_future_dataframe(periods=i+2, freq='B', include_history=False)
        future = future.tail(1)
        future = pd.DataFrame(future, columns=['ds'])

    return predictions

# Validate the model and calculate metrics
def validate_model(model, df, valid_df, validation_period):
    """
    Validate the model on the validation set and compute error metrics.
    """
    # Create future dataframe for validation predictions
    future = model.make_future_dataframe(periods=1, freq='B', include_history=False)
    future = pd.DataFrame(future, columns=['ds'])
    
    # DataFrame to store predictions
    predictions = pd.DataFrame()

    # Recursive forecasting loop for validation
    for i in range(validation_period):
        # Generate forecast
        forecast = model.predict(future)
        predictions = pd.concat([predictions, forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]])

        # Update the input with the predicted value for the next iteration
        next_point = pd.DataFrame({
            'ds': [future['ds'].iloc[0]],
            'y': [forecast['yhat'].iloc[0]]
        })
        
        # Add the next predicted point to the original dataframe
        df = pd.concat([df, next_point]).reset_index(drop=True)
        
        # Update future for the next period
        future = model.make_future_dataframe(periods=i+2, freq='B', include_history=False)
        future = future.tail(1)

    # Align and calculate metrics
    predictions = predictions[['ds', 'yhat']].set_index('ds')
    valid_df = valid_df[['ds', 'y']].set_index('ds')
    
    # Join validation and predictions for comparison
    val_pred = valid_df[['y']].join(predictions[['yhat']], how='inner')
    val_pred = val_pred.reset_index()

    # Compute metrics
    rmse = round(np.sqrt(mean_squared_error(val_pred['y'], val_pred['yhat'])), 3)
    mape = round((1 - mean_absolute_percentage_error(val_pred['y'], val_pred['yhat'])) * 100, 2)
    mda = round(np.mean(np.sign(val_pred['y'].diff()) == np.sign(val_pred['yhat'].diff())) * 100, 2)

    return val_pred, rmse, mape, mda

# Find optimal hyperparameters for Prophet model
def find_optimal_params(df, validation_period):
    """
    Perform grid search to find the best hyperparameters for the Prophet model.
    """
    best_rmse = float('inf')
    best_params = None
    
    # Keeping it empty for first case
    param_grid = {
  
    }
    
    grid = ParameterGrid(param_grid)

    # Grid search for best parameters
    for params in grid:
        temp_df = df.copy()
        train_df, valid_df = split_data(temp_df, validation_period)

        model = train_prophet(train_df, **params)
        _, rmse, _, _ = validate_model(model, train_df, valid_df, validation_period)
        
        # Update the best parameters if a better RMSE is found
        if rmse < best_rmse:
            best_rmse = rmse
            best_params = params

    print(f"Best Parameters: {best_params}")
    return best_params

# Forecast with the optimized Prophet model
def forecast_complete_data(data, vars_list, validation_period, forecast_period):
    """
    Train, validate, and forecast for multiple variables using Prophet model.
    """
    results = {}
    for var in vars_list:
        df = data[var]
        df = df.dropna()

        if len(df) < 2:
            print(f"Not enough data to train model for {var}.")
            continue

        # Find best hyperparameters
        best_params = find_optimal_params(df, validation_period)

        # Split data into training and validation sets
        train_df, valid_df = split_data(df, validation_period)
        
        # Train model on the training set
        model = train_prophet(train_df, **best_params)
        
        # Validate model
        val_forecast, rmse, mape, mda = validate_model(model, train_df, valid_df, validation_period)
        
        # Combine training and validation sets
        combined_df = pd.concat([train_df, valid_df])

        # Refit model on the complete dataset
        final_model = train_prophet(combined_df, **best_params)

        # Make predictions for the forecast period
        forecast = make_predictions(final_model, combined_df, forecast_period)

        # Save results for each variable
        results[var] = {
            'Forecast': forecast,
            'Validation Forecast': val_forecast,
            'RMSE': rmse,
            'MAPE': mape,
            'MDA': mda
        }
    return results

# Load and preprocess the data
datasets = load_and_preprocess_data(stock_data[['Date'] + close], close)

# Perform forecasting on the stock prices
results_stock_price = forecast_complete_data(datasets, close, validation_period, forecast_period)

# Print out metrics for each variable
for var in close:
    print('--------------------------------------')
    print('VAR: ', var)
    print("RMSE : ", results_stock_price[var]['RMSE'])
    print("Accuracy (1 - MAPE) : ", results_stock_price[var]['MAPE'])
    print("MDA : ", results_stock_price[var]['MDA'])


Best Parameters: {}
--------------------------------------
VAR:  Close
RMSE :  23.856
Accuracy (1 - MAPE) :  89.38
MDA :  42.86


In [12]:
results_stock_price[var]['Validation Forecast']

Unnamed: 0,ds,y,yhat
0,2024-08-30,229.0,200.283949
1,2024-09-03,222.770004,199.794185
2,2024-09-04,220.850006,199.59638
3,2024-09-05,222.380005,199.322646
4,2024-09-06,220.820007,199.049602
5,2024-09-09,220.910004,198.628055
6,2024-09-10,220.110001,198.311318
7,2024-09-11,222.660004,198.099192
8,2024-09-12,222.770004,197.83006
9,2024-09-13,222.5,197.579827


In [13]:
results_stock_price[var]['Forecast']

Unnamed: 0,ds,yhat,yhat_lower,yhat_upper
0,2024-09-23,199.292224,189.202802,208.759168
0,2024-09-24,199.077435,188.14161,208.959973
0,2024-09-25,198.976953,188.598493,209.483061
0,2024-09-26,198.833419,188.082738,209.318216
0,2024-09-27,198.733114,187.79687,208.726675
0,2024-09-30,198.776272,188.118519,209.017116
0,2024-10-01,198.664335,188.45689,209.151363
0,2024-10-02,198.664797,188.212074,209.430153
0,2024-10-03,198.619202,188.471592,208.808625
0,2024-10-04,198.61285,188.256548,208.223029


## Model Type: 2 - Model with Hyperparameter Tuning

In [19]:
# Train Prophet model with hyperparameter tuning
def train_prophet(train_df, add_monthly=True, add_quarterly=True, **params):
    """
    Train a Prophet model using the provided training dataframe and parameters.
    """
    model = Prophet(**params)
    # Add monthly and quarterly seasonality if specified
    if add_monthly:
        model.add_seasonality(name='monthly', period=30.5, fourier_order=5)
    if add_quarterly:
        model.add_seasonality(name='quarterly', period=91.25, fourier_order=5)

    return model.fit(train_df)




# Make recursive predictions with the model
def make_predictions(model, df, forecast_period):
    """
    Use the trained model to make recursive forecasts for the specified period.
    """
    # Create future dataframe for prediction
    future = model.make_future_dataframe(periods=1, freq='B', include_history=False)
    future = pd.DataFrame(future, columns=['ds'])
    
    # DataFrame to store all predictions
    predictions = pd.DataFrame()

    # Recursive forecasting loop
    for i in range(forecast_period):
        # Generate forecast
        forecast = model.predict(future)
        predictions = pd.concat([predictions, forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]])

        # Update the input with the predicted value for the next iteration
        next_point = pd.DataFrame({
            'ds': [future['ds'].iloc[0]],
            'y': [forecast['yhat'].iloc[0]]
        })
        
        # Add the next predicted point to the original dataframe
        df = pd.concat([df, next_point]).reset_index(drop=True)
        
        # Update future for the next period
        future = model.make_future_dataframe(periods=i+2, freq='B', include_history=False)
        future = future.tail(1)
        future = pd.DataFrame(future, columns=['ds'])

    return predictions

# Validate the model and calculate metrics
def validate_model(model, df, valid_df, validation_period):
    """
    Validate the model on the validation set and compute error metrics.
    """
    # Create future dataframe for validation predictions
    future = model.make_future_dataframe(periods=1, freq='B', include_history=False)
    future = pd.DataFrame(future, columns=['ds'])
    
    # DataFrame to store predictions
    predictions = pd.DataFrame()

    # Recursive forecasting loop for validation
    for i in range(validation_period):
        # Generate forecast
        forecast = model.predict(future)
        predictions = pd.concat([predictions, forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]])

        # Update the input with the predicted value for the next iteration
        next_point = pd.DataFrame({
            'ds': [future['ds'].iloc[0]],
            'y': [forecast['yhat'].iloc[0]]
        })
        
        # Add the next predicted point to the original dataframe
        df = pd.concat([df, next_point]).reset_index(drop=True)
        
        # Update future for the next period
        future = model.make_future_dataframe(periods=i+2, freq='B', include_history=False)
        future = future.tail(1)

    # Align and calculate metrics
    predictions = predictions[['ds', 'yhat']].set_index('ds')
    valid_df = valid_df[['ds', 'y']].set_index('ds')
    
    # Join validation and predictions for comparison
    val_pred = valid_df[['y']].join(predictions[['yhat']], how='inner')
    val_pred = val_pred.reset_index()

    # Compute metrics
    rmse = round(np.sqrt(mean_squared_error(val_pred['y'], val_pred['yhat'])), 3)
    mape = round((1 - mean_absolute_percentage_error(val_pred['y'], val_pred['yhat'])) * 100, 2)
    mda = round(np.mean(np.sign(val_pred['y'].diff()) == np.sign(val_pred['yhat'].diff())) * 100, 2)

    return val_pred, rmse, mape, mda

# Find optimal hyperparameters for Prophet model
def find_optimal_params(df, validation_period):
    """
    Perform grid search to find the best hyperparameters for the Prophet model.
    """
    best_rmse = float('inf')
    best_params = None
    
    # Keeping it empty for first case
    param_grid = {
        'changepoint_prior_scale': [0.01, 0.05, 0.1],
        'seasonality_prior_scale': [1.0, 10.0, 15.0],
        'seasonality_mode':  ['additive', 'multiplicative'],
        'yearly_seasonality': [True, False],
        'daily_seasonality': [True, False],
        'add_monthly': [True, False],
        'add_quarterly': [True, False],
    }
    
    grid = ParameterGrid(param_grid)

    # Grid search for best parameters
    for params in grid:
        temp_df = df.copy()
        train_df, valid_df = split_data(temp_df, validation_period)

        model = train_prophet(train_df, **params)
        _, rmse, _, _ = validate_model(model, train_df, valid_df, validation_period)
        
        # Update the best parameters if a better RMSE is found
        if rmse < best_rmse:
            best_rmse = rmse
            best_params = params

    print(f"Best Parameters: {best_params}")
    return best_params

# Forecast with the optimized Prophet model
def forecast_complete_data(data, vars_list, validation_period, forecast_period):
    """
    Train, validate, and forecast for multiple variables using Prophet model.
    """
    results = {}
    for var in vars_list:
        df = data[var]
        df = df.dropna()

        if len(df) < 2:
            print(f"Not enough data to train model for {var}.")
            continue

        # Find best hyperparameters
        best_params = find_optimal_params(df, validation_period)

        # Split data into training and validation sets
        train_df, valid_df = split_data(df, validation_period)
        
        # Train model on the training set
        model = train_prophet(train_df, **best_params)
        
        # Validate model
        val_forecast, rmse, mape, mda = validate_model(model, train_df, valid_df, validation_period)
        
        # Combine training and validation sets
        combined_df = pd.concat([train_df, valid_df])

        # Refit model on the complete dataset
        final_model = train_prophet(combined_df, **best_params)

        # Make predictions for the forecast period
        forecast = make_predictions(final_model, combined_df, forecast_period)

        # Save results for each variable
        results[var] = {
            'Forecast': forecast,
            'Validation Forecast': val_forecast,
            'RMSE': rmse,
            'MAPE': mape,
            'MDA': mda
        }
    return results

# Load and preprocess the data
datasets = load_and_preprocess_data(stock_data[['Date'] + close], close)

# Perform forecasting on the stock prices
results_stock_price = forecast_complete_data(datasets, close, validation_period, forecast_period)

# Print out metrics for each variable
for var in close:
    print('--------------------------------------')
    print('VAR: ', var)
    print("RMSE : ", results_stock_price[var]['RMSE'])
    print("Accuracy (1 - MAPE) : ", results_stock_price[var]['MAPE'])
    print("MDA : ", results_stock_price[var]['MDA'])


Best Parameters: {'add_monthly': True, 'add_quarterly': True, 'changepoint_prior_scale': 0.01, 'daily_seasonality': False, 'seasonality_mode': 'multiplicative', 'seasonality_prior_scale': 15.0, 'yearly_seasonality': True}
--------------------------------------
VAR:  Close
RMSE :  21.475
Accuracy (1 - MAPE) :  90.59
MDA :  42.86


In [20]:
results_stock_price[var]['Validation Forecast']

Unnamed: 0,ds,y,yhat
0,2024-08-30,229.0,209.592242
1,2024-09-03,222.770004,207.396907
2,2024-09-04,220.850006,206.427474
3,2024-09-05,222.380005,205.016081
4,2024-09-06,220.820007,203.733533
5,2024-09-09,220.910004,201.888904
6,2024-09-10,220.110001,200.626975
7,2024-09-11,222.660004,199.656972
8,2024-09-12,222.770004,198.630559
9,2024-09-13,222.5,198.037659


In [21]:
results_stock_price[var]['Forecast']

Unnamed: 0,ds,yhat,yhat_lower,yhat_upper
0,2024-09-23,197.893343,187.939882,207.049196
0,2024-09-24,196.21756,186.313078,206.191193
0,2024-09-25,195.250067,185.928283,205.409557
0,2024-09-26,194.455919,185.244691,204.740165
0,2024-09-27,194.069728,184.159861,204.409425
0,2024-09-30,192.521873,182.194582,202.290973
0,2024-10-01,191.568051,181.715456,200.304876
0,2024-10-02,191.502178,181.551177,201.339447
0,2024-10-03,191.617502,182.141623,202.180334
0,2024-10-04,191.960353,181.997194,202.033461


### Model Type: 3- Model with Hyperparameter and Automated Regressor Tuning

In [22]:


# Create rolling and expanding features
def create_rolling_expanding_features(df, window=3):
    df[f'rolling_mean_{window}'] = df['y'].rolling(window=window).mean().shift(1)
    df[f'rolling_std_{window}'] = df['y'].rolling(window=window).std().shift(1)
    df[f'expanding_std'] = df['y'].expanding().std().shift(1)

    return df.dropna()

# Apply differencing to make the series stationary
def apply_differencing(df, order=1):
    df['y_diff'] = df['y'].diff(periods=order)
    return df.dropna()



# Train Prophet model with hyperparameter tuning
def train_prophet(train_df, add_monthly=True, add_quarterly=True, add_rolling_mean=True, add_rolling_std=True, add_expanding_std=True, **params):
    """
    Train a Prophet model using the provided training dataframe and parameters.
    """
    model = Prophet(**params)

    if add_rolling_mean:
        model.add_regressor('rolling_mean_3')
    if add_rolling_std:
        model.add_regressor('rolling_std_3')   
    if add_expanding_std:
        model.add_regressor('expanding_std')
    if add_monthly:
        model.add_seasonality(name='monthly', period=30.5, fourier_order=5)
    if add_quarterly:
        model.add_seasonality(name='quarterly', period=91.25, fourier_order=5)

    return model.fit(train_df)

# Make predictions while considering the stock market calendar (exclude weekends and holidays)
def make_predictions(model, df, forecast_period):
     # Generate future dates considering business days only 
    future = model.make_future_dataframe(periods=1, freq='B', include_history=False)
    future = pd.DataFrame(future, columns=['ds'])

    predictions = pd.DataFrame()

    for i in range(forecast_period):
        future['rolling_mean_3'] = df['y'].rolling(3).mean().iloc[-1]
        future['rolling_std_3'] = df['y'].rolling(3).std().iloc[-1]
        future['expanding_std'] = df['y'].expanding().std().iloc[-1]

        forecast = model.predict(future)
        predictions = pd.concat([predictions, forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]])
        

        next_point = pd.DataFrame({
            'ds': [future['ds'].iloc[0]],
            'y': [forecast['yhat'].iloc[0]]
        })
        df = pd.concat([df, next_point]).reset_index(drop=True)
    
        future = model.make_future_dataframe(periods=i+2, freq='B', include_history=False)
        future = future.tail(1) 

    return predictions

# Validate model and calculate metrics
def validate_model(model, df, valid_df, validation_period):
    future = model.make_future_dataframe(periods=1, freq='B', include_history=False)
    future = pd.DataFrame(future, columns=['ds'])
    predictions = pd.DataFrame()

    for i in range(validation_period):
        future['rolling_mean_3'] = df['y'].rolling(3).mean().iloc[-1]
        future['rolling_std_3'] = df['y'].rolling(3).std().iloc[-1]
        future['expanding_std'] = df['y'].expanding().std().iloc[-1]

        forecast = model.predict(future)
        predictions = pd.concat([predictions, forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]])

        next_point = pd.DataFrame({
            'ds': [future['ds'].iloc[0]],
            'y': [forecast['yhat'].iloc[0]]
        })
        df = pd.concat([df, next_point]).reset_index(drop=True)
        future = model.make_future_dataframe(periods=i+2, freq='B', include_history=False)
        future = future.tail(1) 

    # Ensure predictions and validation data have matching dates
    predictions = predictions[['ds', 'yhat']].set_index('ds')
    valid_df = valid_df[['ds', 'y']].set_index('ds')

    # Align the indices of the two dataframes
    val_pred = valid_df[['y']].join(predictions[['yhat']], how='inner')
    val_pred= val_pred.reset_index()
    
    # Compute metrics
    rmse = round(np.sqrt(mean_squared_error(val_pred['y'], val_pred['yhat'])), 3)
    mape = round((1 - mean_absolute_percentage_error(val_pred['y'], val_pred['yhat'])) * 100, 2)
    mda = round(np.mean(np.sign(val_pred['y'].diff()) == np.sign(val_pred['yhat'].diff())) * 100, 2)

    return val_pred, rmse, mape, mda

# Find optimal parameters with differencing and feature engineering
def find_optimal_params(df, validation_period):
    best_rmse = float('inf')
    param_grid = {
        'changepoint_prior_scale': [0.01, 0.05, 0.1],
        'seasonality_prior_scale': [1.0, 10.0, 15.0],
        'seasonality_mode':  ['additive', 'multiplicative'],
        'yearly_seasonality': [True, False],
        'add_monthly': [True, False],
        'add_quarterly': [True, False],
        'daily_seasonality':[True, False],
        'add_rolling_mean': [True, False],
        'add_rolling_std':[True, False],
        'add_expanding_std':[True, False]

    }
    grid = ParameterGrid(param_grid)

    for params in grid:
        train_df, valid_df = split_data(df, validation_period)
        train_df = apply_differencing(train_df)

        model = train_prophet(train_df, **params)
        _, rmse, _, _ = validate_model(model, train_df, valid_df, validation_period)
        if rmse < best_rmse:
            best_rmse = rmse
            best_params = params

    print(f"Best Parameters: {best_params}")  # Print best parameters
    return best_params


# Save outputs
def save_outputs(var, forecast, val_forecast):
    validation_folder = 'Validation'
    os.makedirs(validation_folder, exist_ok=True)
    forecast_folder = 'final_forecast'
    os.makedirs(forecast_folder, exist_ok=True)
    forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].to_csv(os.path.join(forecast_folder, f'{var}_final_forecast.csv'), index=False)
    val_forecast.to_csv(os.path.join(validation_folder, f'{var}_validation_forecast.csv'), index=False)

# Forecast complete data with enhanced model
def forecast_complete_data(data, vars_list, validation_period, forecast_period):
    results = {}
    for var in vars_list:
        df = data[var]

        df = create_rolling_expanding_features(df)
        

        df = df.dropna()

        if len(df) < 2:
            print(f"Not enough data to train model for {var}.")
            continue

        best_params = find_optimal_params(df, validation_period)

        train_df, valid_df = split_data(df, validation_period)
        train_df = apply_differencing(train_df)
        valid_df = apply_differencing(valid_df)

        features_df = train_df.drop(columns=['ds', 'y'])
        vif_data = calculate_vif(features_df)
        print(vif_data)

        # Train model on training data
        model = train_prophet(train_df, **best_params)
        val_forecast, rmse, mape,mda = validate_model(model, train_df, valid_df, validation_period)

        # Combine training and validation data
        combined_df = pd.concat([train_df, valid_df])

        # Re-fit model on the complete dataset
        final_model = train_prophet(combined_df, **best_params)

        # Make predictions with the re-fitted model
        forecast = make_predictions(final_model, combined_df, forecast_period)

        # Plot final forecast as an extension of historical data
        plot_final_forecast(combined_df, forecast, var)

        # Save outputs
        # save_outputs(var, forecast, val_forecast)

        results[var] = {
            'Forecast': forecast,
            'Validation Forecast': val_forecast,
            'RMSE': rmse,
            'MAPE': mape,
            'MDA': mda
        }
    return results

# Load and preprocess data
datasets = load_and_preprocess_data(stock_data[['Date'] + close], close)

# Process and forecast
results_stock_price = forecast_complete_data(datasets, close, validation_period, forecast_period)
for var in close:
    print('--------------------------------------')
    print('VAR: ', var)
    print("RMSE : ", results_stock_price[var]['RMSE'])
    print("Acurracy(1- MAPE) : ", results_stock_price[var]['MAPE'])
    print("MDA : ", results_stock_price[var]['MDA'])


Best Parameters: {'add_expanding_std': True, 'add_monthly': False, 'add_quarterly': True, 'add_rolling_mean': True, 'add_rolling_std': True, 'changepoint_prior_scale': 0.1, 'daily_seasonality': True, 'seasonality_mode': 'additive', 'seasonality_prior_scale': 10.0, 'yearly_seasonality': True}
          feature        VIF
0  rolling_mean_3  37.701747
1   rolling_std_3   2.868501
2   expanding_std  34.897069
3          y_diff   1.003137


--------------------------------------
VAR:  Close
RMSE :  4.685
Acurracy(1- MAPE) :  98.1
MDA :  38.46


In [23]:
results_stock_price[var]['Validation Forecast']

Unnamed: 0,ds,y,yhat
0,2024-09-03,222.770004,227.700064
1,2024-09-04,220.850006,226.906599
2,2024-09-05,222.380005,226.438869
3,2024-09-06,220.820007,226.018833
4,2024-09-09,220.910004,225.44801
5,2024-09-10,220.110001,224.970693
6,2024-09-11,222.660004,224.485441
7,2024-09-12,222.770004,223.852008
8,2024-09-13,222.5,223.34625
9,2024-09-16,216.320007,223.103469


In [24]:
results_stock_price[var]['Forecast']

Unnamed: 0,ds,yhat,yhat_lower,yhat_upper
0,2024-09-23,225.6916,223.170816,228.210411
0,2024-09-24,227.265108,225.03029,229.865041
0,2024-09-25,226.74199,224.324326,229.204965
0,2024-09-26,226.130448,223.526961,228.596089
0,2024-09-27,226.254133,223.848569,228.502697
0,2024-09-30,226.115368,223.529598,228.631731
0,2024-10-01,225.959361,223.548703,228.304105
0,2024-10-02,225.958022,223.485063,228.34034
0,2024-10-03,225.777428,223.431099,228.152673
0,2024-10-04,225.687912,223.335671,228.199514


## <center><strong> Don't Forget to Like, Subscribe and share 