# Time Series Forecasting
## Final Project for Advanced Machine Learning Course

**Authors:**
- Orian Daniel
- Gal Chohat

This notebook implements a time series forecasting solution using Facebook Prophet. It predicts store revenue with special treatment for calendar events and holidays, particularly Christmas day, on which revenue follows a markedly different pattern from the rest of the year.

Key features:
- Multi-store revenue forecasting
- Special handling for Christmas and other holidays
- Test-split validation
- Visualization of predictions
- Submission preparation

## 1. Environment Setup and Library Imports
The following cell imports all required libraries and sets up the folders for storing visualization outputs.

In [17]:
import pandas as pd
from prophet import Prophet
import matplotlib.pyplot as plt
import os
import numpy as np
from sklearn.metrics import mean_squared_error

# Create folders for plots
os.makedirs("prophet_best_plots", exist_ok=True)
os.makedirs("prophet_plots_test_split", exist_ok=True)

## 2. Data Loading and Preprocessing
In this section, we load the training data, calendar events, and submission template. Then we preprocess the data by:
- Converting date columns to datetime format
- Extracting store IDs from submission template
- Merging calendar events with training data

In [18]:
# Load data
train_df = pd.read_csv("train.csv")
calendar_df = pd.read_csv("calendar_events.csv")
submission_df = pd.read_csv("forecast_submission.csv")

# Preprocess
train_df['date'] = pd.to_datetime(train_df['date']).dt.normalize()
calendar_df['date'] = pd.to_datetime(calendar_df['date']).dt.normalize()
submission_df['store_id'] = submission_df['id'].apply(lambda x: int(x.split("_")[0]))
submission_df['date'] = pd.to_datetime(submission_df['id'].apply(lambda x: x.split("_")[1])).dt.normalize()
train_df = pd.merge(train_df, calendar_df, how='left', on='date')
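
The `id` parsing above can also be done in one vectorized pass. The following is a minimal sketch on a hypothetical two-row stand-in for the submission template; it assumes the `<store>_<YYYYMMDD>` id format seen in the outputs later in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for forecast_submission.csv (only the 'id' column)
demo = pd.DataFrame({"id": ["0_20151225", "3_20151226"]})

# Split once on "_" and expand into two columns instead of calling apply twice
parts = demo["id"].str.split("_", n=1, expand=True)
demo["store_id"] = parts[0].astype(int)
demo["date"] = pd.to_datetime(parts[1], format="%Y%m%d")

print(demo)
```

Passing an explicit `format` to `pd.to_datetime` avoids ambiguity if an id ever contains a date that could be parsed in more than one way.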

### 2.1 Feature Engineering for Special Events
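Christmas has a significant impact on store revenues: on December 25, revenue drops to nearly zero. We create a binary indicator (`is_christmas`) and a regressor column (`christmas_multiplier`, set to -0.9 on Christmas day and 0 otherwise) to capture this pattern. Because Prophet fits its own coefficient for the regressor, the -0.9 acts as a scaled indicator rather than a fixed multiplier.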

In [19]:
# Create Christmas indicator and regressor
train_df['is_christmas'] = (train_df['date'].dt.month == 12) & (train_df['date'].dt.day == 25)
train_df['is_christmas'] = train_df['is_christmas'].astype(int)
submission_df['is_christmas'] = (submission_df['date'].dt.month == 12) & (submission_df['date'].dt.day == 25)
submission_df['is_christmas'] = submission_df['is_christmas'].astype(int)

# Create Christmas multiplier (strong negative effect)
train_df['christmas_multiplier'] = np.where(train_df['is_christmas'] == 1, -0.9, 0)
submission_df['christmas_multiplier'] = np.where(submission_df['is_christmas'] == 1, -0.9, 0)

# Holidays dataframe for Prophet
holidays_df = calendar_df[['date', 'event']].dropna().rename(columns={'date': 'ds', 'event': 'holiday'})
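
To make the three derived objects concrete, here is a small sketch on made-up data. The three-day date range and the event name `"Christmas"` are assumptions for illustration; the real values come from `calendar_events.csv`.

```python
import numpy as np
import pandas as pd

# Three consecutive days around Christmas (illustrative only)
dates = pd.DataFrame({"date": pd.date_range("2015-12-24", periods=3)})

# Same logic as the cell above: flag December 25, then derive the regressor
dates["is_christmas"] = (
    (dates["date"].dt.month == 12) & (dates["date"].dt.day == 25)
).astype(int)
dates["christmas_multiplier"] = np.where(dates["is_christmas"] == 1, -0.9, 0.0)

# Hypothetical calendar with one named event and two ordinary days
calendar = pd.DataFrame({"date": dates["date"], "event": [np.nan, "Christmas", np.nan]})

# Prophet expects holidays as a frame with 'ds' and 'holiday' columns
holidays = (
    calendar[["date", "event"]]
    .dropna()
    .rename(columns={"date": "ds", "event": "holiday"})
)

print(dates)
print(holidays)
```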

## 3. Helper Functions
These utility functions help us calculate the impact of Christmas on revenue and create visualizations for our forecasts.

In [20]:
def calculate_christmas_impact(store_data):
    """Calculate the actual impact of Christmas on revenue for each store"""
    christmas_days = store_data[store_data['is_christmas'] == 1]['y']
    non_christmas_days = store_data[store_data['is_christmas'] == 0]['y']
    
    if len(christmas_days) > 0 and len(non_christmas_days) > 0:
        christmas_avg = christmas_days.mean()
        normal_avg = non_christmas_days.mean()
        impact_ratio = (christmas_avg / normal_avg) - 1
        return impact_ratio
    return 0
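
As a worked example of the ratio, the sketch below repeats the function on four made-up rows (the revenue figures are hypothetical). Normal days average 1000 and the Christmas row is 10, so the ratio is 10 / 1000 - 1 = -0.99.

```python
import pandas as pd


def calculate_christmas_impact(store_data):
    """Relative difference between mean Christmas revenue and mean revenue on other days."""
    christmas_days = store_data[store_data["is_christmas"] == 1]["y"]
    non_christmas_days = store_data[store_data["is_christmas"] == 0]["y"]

    if len(christmas_days) > 0 and len(non_christmas_days) > 0:
        return (christmas_days.mean() / non_christmas_days.mean()) - 1
    return 0


# Hypothetical store history: three ordinary days and one Christmas day
demo = pd.DataFrame({"y": [1000.0, 1200.0, 800.0, 10.0], "is_christmas": [0, 0, 0, 1]})
impact = calculate_christmas_impact(demo)
print(f"{impact:.3f}")

# With no Christmas rows the function falls back to 0
no_christmas = demo[demo["is_christmas"] == 0]
print(calculate_christmas_impact(no_christmas))
```

A value close to -1 therefore means that Christmas revenue is close to zero relative to an average day.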

In [21]:
def plot_forecast(store_id, history_df, forecast_df, save_path):
    """Enhanced plotting function with Christmas day highlighting"""
    plt.figure(figsize=(15, 6))
    plt.plot(history_df['ds'], history_df['y'], label='Historical Data', color='blue', linewidth=1.5)
    plt.plot(forecast_df['ds'], forecast_df['yhat'], label='Forecast', color='red', linewidth=2, marker='s')
    
    # Add confidence intervals if available
    if 'yhat_lower' in forecast_df.columns and 'yhat_upper' in forecast_df.columns:
        plt.fill_between(forecast_df['ds'], forecast_df['yhat_lower'], forecast_df['yhat_upper'], 
                         color='red', alpha=0.3, label='95% Confidence Interval')
    
    # Highlight Christmas days
    christmas_dates = forecast_df[(forecast_df['ds'].dt.month == 12) & (forecast_df['ds'].dt.day == 25)]['ds']
    for i, date in enumerate(christmas_dates):
        plt.axvline(x=date, color='green', linestyle=':', alpha=0.7, 
                   label='Christmas' if i == 0 else "")
    
    # Mark forecast start
    start_date = forecast_df['ds'].min()
    plt.axvline(x=start_date, color='gray', linestyle='--', alpha=0.6)
    plt.text(start_date, plt.ylim()[1]*0.95, 'Forecast Start', rotation=90, color='gray')
    
    plt.title(f"Prophet Forecast – Store {store_id}", fontsize=14, fontweight='bold')
    plt.xlabel("Date")
    plt.ylabel("Revenue")
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.savefig(save_path, dpi=300, bbox_inches='tight')
    plt.close()

## 4. Main Forecasting Process
In this section, we train a Prophet model for each individual store and generate forecasts. Store 0, which represents the sum of all individual stores, is skipped in this loop and handled separately in Section 4.1.

In [22]:
print("🚀 Starting Prophet forecasting with Christmas day handling...")

forecast_records = []
individual_store_forecasts = {}  # For calculating store 0

for store_id in train_df['store_id'].unique():
    store_data = train_df[train_df['store_id'] == store_id][['date', 'revenue', 'is_christmas', 'christmas_multiplier']].rename(
        columns={'date': 'ds', 'revenue': 'y'})
    
    if len(store_data) < 30:
        print(f"⚠️ Skipping Store {store_id}: insufficient data ({len(store_data)} records)")
        continue

    # Skip store 0 for individual processing
    if store_id == 0:
        continue

    # Calculate Christmas impact
    christmas_impact = calculate_christmas_impact(store_data)
    print(f"📊 Store {store_id}: Christmas impact = {christmas_impact:.3f}")

    # Create and configure Prophet model
    model = Prophet(
        holidays=holidays_df,
        yearly_seasonality=True,
        weekly_seasonality=True,
        daily_seasonality=False,
        seasonality_mode='multiplicative',
        changepoint_prior_scale=0.2,
        holidays_prior_scale=20,
        seasonality_prior_scale=10,
        changepoint_range=0.8,
        interval_width=0.95,
        uncertainty_samples=1000,
        growth='linear'
    )
    model.add_seasonality(name='monthly', period=30.5, fourier_order=5)
    model.add_regressor('christmas_multiplier', mode='additive')
    
    # Fit model
    model.fit(store_data)

    # Prepare future data for prediction
    future_df = submission_df[submission_df['store_id'] == store_id][['date', 'is_christmas', 'christmas_multiplier']].rename(
        columns={'date': 'ds'})
    future_df = future_df.sort_values('ds')
    
    # Make predictions
    forecast = model.predict(future_df)
    
    # Post-process Christmas predictions to ensure they stay low (0-100 range)
    christmas_mask = future_df['is_christmas'] == 1
    if christmas_mask.any():
        future_df_reset = future_df.reset_index(drop=True)
        christmas_mask_reset = future_df_reset['is_christmas'] == 1
        
        max_christmas_value = 100
        christmas_indices = forecast.index[christmas_mask_reset]
        
        for idx in christmas_indices:
            current_pred = forecast.loc[idx, 'yhat']
            if current_pred > max_christmas_value:
                # Reduce to a value between 0-100
                new_pred = np.random.uniform(0, max_christmas_value)
                forecast.loc[idx, 'yhat'] = new_pred
                print(f"🎄 Store {store_id}: Adjusted Christmas prediction from {current_pred:.2f} to {new_pred:.2f}")
        
        # Show final Christmas predictions
        christmas_preds = forecast.loc[christmas_mask_reset, 'yhat'].values
        print(f"🎄 Store {store_id}: Final Christmas predictions: {christmas_preds}")

    forecast['store_id'] = store_id
    forecast_records.append(forecast[['ds', 'store_id', 'yhat']])
    
    # Store for calculating store 0
    individual_store_forecasts[store_id] = forecast[['ds', 'yhat']].copy()
    
    # Create plot
    plot_forecast(store_id, store_data, forecast, f"prophet_best_plots/store_{store_id}_forecast.png")

🚀 Starting Prophet forecasting with Christmas day handling...
📊 Store 1: Christmas impact = -1.000
🎄 Store 1: Adjusted Christmas prediction from 8569.34 to 28.70
🎄 Store 1: Final Christmas predictions: [28.70094427]
📊 Store 2: Christmas impact = -0.999
🎄 Store 2: Adjusted Christmas prediction from 25486.19 to 17.51
🎄 Store 2: Final Christmas predictions: [17.50881503]
📊 Store 3: Christmas impact = -1.000
🎄 Store 3: Adjusted Christmas prediction from 6869.37 to 88.82
🎄 Store 3: Final Christmas predictions: [88.81863736]
📊 Store 4: Christmas impact = -1.000
🎄 Store 4: Adjusted Christmas prediction from 2181.29 to 83.06
🎄 Store 4: Final Christmas predictions: [83.0592273]
📊 Store 5: Christmas impact = -1.000
🎄 Store 5: Adjusted Christmas prediction from 5818.78 to 53.30
🎄 Store 5: Final Christmas predictions: [53.30369271]
📊 Store 6: Christmas impact = -0.999
🎄 Store 6: Adjusted Christmas prediction from 7907.44 to 69.72
🎄 Store 6: Final Christmas predictions: [69.72064347]
📊 Store 7: Christmas impact = -0.999
🎄 Store 7: Adjusted Christmas prediction from 6396.92 to 45.18
🎄 Store 7: Final Christmas predictions: [45.18187297]
📊 Store 8: Christmas impact = -1.000
🎄 Store 8: Adjusted Christmas prediction from 6087.48 to 44.90
🎄 Store 8: Final Christmas predictions: [44.89759428]
📊 Store 9: Christmas impact = -1.000
🎄 Store 9: Adjusted Christmas prediction from 3611.63 to 39.50
🎄 Store 9: Final Christmas predictions: [39.50373738]
📊 Store 10: Christmas impact = -1.000
🎄 Store 10: Adjusted Christmas prediction from 2365.47 to 51.66
🎄 Store 10: Final Christmas predictions: [51.66026053]
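
The adjusted values above come from `np.random.uniform`, so they change on every run unless a seed is set. As an alternative (not what this notebook does), the sketch below caps Christmas predictions deterministically with `clip`. The 8569.34 figure is Store 1's raw prediction from the log above; the other two rows are made up.

```python
import pandas as pd

# Hypothetical three-day forecast; only the Christmas row should be touched
forecast = pd.DataFrame({
    "ds": pd.to_datetime(["2015-12-24", "2015-12-25", "2015-12-26"]),
    "yhat": [5200.0, 8569.34, 4100.0],
})

MAX_CHRISTMAS_VALUE = 100.0
is_christmas = (forecast["ds"].dt.month == 12) & (forecast["ds"].dt.day == 25)

# Clamp Christmas rows into [0, MAX_CHRISTMAS_VALUE]; other rows stay unchanged
forecast.loc[is_christmas, "yhat"] = forecast.loc[is_christmas, "yhat"].clip(
    lower=0.0, upper=MAX_CHRISTMAS_VALUE
)

print(forecast)
```

Unlike the random draw, this variant is reproducible and also lifts negative Christmas predictions to zero. Whether a fixed cap or a random value scores better is a modeling choice this sketch does not settle.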


### 4.1 Handling Store 0
Store 0 represents the sum of all individual stores' revenues. We calculate its forecast by summing the forecasts of all other stores.

In [23]:
# ---------- HANDLE STORE 0 (SUM OF ALL INDIVIDUAL STORES) ----------
if 0 in train_df['store_id'].unique():
    print("\n🔄 Processing Store 0 (sum of all individual stores)...")
    
    store_0_dates = submission_df[submission_df['store_id'] == 0][['date', 'is_christmas']].rename(columns={'date': 'ds'})
    store_0_forecast = store_0_dates.copy()
    store_0_forecast['yhat'] = 0.0  # float column, so float sums can be assigned without a dtype warning
    
    # Calculate sum of all individual store predictions
    for date_row in store_0_forecast.itertuples():
        date_val = date_row.ds
        total_prediction = 0
        
        for store_id, store_forecast in individual_store_forecasts.items():
            matching_rows = store_forecast[store_forecast['ds'] == date_val]
            if not matching_rows.empty:
                total_prediction += matching_rows['yhat'].iloc[0]
        
        store_0_forecast.loc[date_row.Index, 'yhat'] = total_prediction
    
    store_0_forecast['store_id'] = 0
    forecast_records.append(store_0_forecast[['ds', 'store_id', 'yhat']])
    
    # Show Christmas predictions for store 0
    christmas_store_0 = store_0_forecast[(store_0_forecast['ds'].dt.month == 12) & (store_0_forecast['ds'].dt.day == 25)]
    if not christmas_store_0.empty:
        print(f"🎄 Store 0 Christmas predictions: {christmas_store_0['yhat'].values}")
    
    # Plot store 0 if training data exists
    store_0_data = train_df[train_df['store_id'] == 0][['date', 'revenue']].rename(columns={'date': 'ds', 'revenue': 'y'})
    if not store_0_data.empty:
        plot_forecast(0, store_0_data, store_0_forecast, f"prophet_best_plots/store_0_forecast.png")
    
    print(f"✅ Store 0 calculated as sum of {len(individual_store_forecasts)} individual stores")


🔄 Processing Store 0 (sum of all individual stores)...
🎄 Store 0 Christmas predictions: [522.3554253]


✅ Store 0 calculated as sum of 10 individual stores
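
The nested loop above can also be expressed as a single `groupby`. This is a sketch under the assumption that every individual store forecast covers the same dates as Store 0's submission rows; the two stores and their values below are illustrative, with the Christmas figures taken from the log in Section 4.

```python
import pandas as pd

days = pd.to_datetime(["2015-12-25", "2015-12-26"])

# Hypothetical per-store forecasts keyed by store_id, shaped like individual_store_forecasts
individual_store_forecasts = {
    1: pd.DataFrame({"ds": days, "yhat": [28.70, 5000.0]}),
    2: pd.DataFrame({"ds": days, "yhat": [17.51, 9000.0]}),
}

# Stack all stores and sum predictions per date
stacked = pd.concat(list(individual_store_forecasts.values()), ignore_index=True)
store_0 = stacked.groupby("ds", as_index=False)["yhat"].sum()
store_0["store_id"] = 0

print(store_0)
```

Note that this variant returns every date present in the individual forecasts, whereas the loop above only fills the dates listed for Store 0 in the submission template.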


## 5. Prepare and Save Submission File
We merge our forecasts with the submission template, handle any potential column naming issues, and save the final submission file.

In [24]:
# ---------- MERGE AND PREPARE FINAL SUBMISSION ----------
prophet_df = pd.concat(forecast_records).rename(columns={'ds': 'date', 'yhat': 'prediction'})
prophet_df['date'] = pd.to_datetime(prophet_df['date'])
submission_df['date'] = pd.to_datetime(submission_df['date'])
submission_df['store_id'] = submission_df['store_id'].astype(int)
prophet_df['store_id'] = prophet_df['store_id'].astype(int)

final_submission = pd.merge(submission_df, prophet_df, on=['store_id', 'date'], how='left')

# Handle merge column naming
if 'prediction_x' in final_submission.columns and 'prediction_y' in final_submission.columns:
    final_submission = final_submission.drop(columns=['prediction_x'])
    final_submission = final_submission.rename(columns={'prediction_y': 'prediction'})
elif 'prediction' not in final_submission.columns:
    raise ValueError("❌ Merge failed — 'prediction' column not found.")

final_submission['prediction'] = final_submission['prediction'].fillna(0).round(2)

# Display Christmas predictions summary
christmas_predictions = final_submission[final_submission['is_christmas'] == 1]
if not christmas_predictions.empty:
    print(f"\n🎄 CHRISTMAS DAY PREDICTIONS SUMMARY:")
    print(f"   Total Christmas predictions: {len(christmas_predictions)}")
    print(f"   Average Christmas prediction: {christmas_predictions['prediction'].mean():.2f}")
    print(f"   Min Christmas prediction: {christmas_predictions['prediction'].min():.2f}")
    print(f"   Max Christmas prediction: {christmas_predictions['prediction'].max():.2f}")
    print("\n   Sample Christmas predictions:")
    print(christmas_predictions[['id', 'prediction']].head(10))

# Save main submission
final_submission[['id', 'prediction']].to_csv("forecast_submission_prophet.csv", index=False)
print("✅ Forecast saved as 'forecast_submission_prophet.csv'")
print("📊 Forecast plots saved in the 'prophet_best_plots' folder.")


🎄 CHRISTMAS DAY PREDICTIONS SUMMARY:
   Total Christmas predictions: 11
   Average Christmas prediction: 94.97
   Min Christmas prediction: 17.51
   Max Christmas prediction: 522.36

   Sample Christmas predictions:
             id  prediction
85   0_20151225      522.36
177  1_20151225       28.70
269  2_20151225       17.51
361  3_20151225       88.82
453  4_20151225       83.06
545  5_20151225       53.30
637  6_20151225       69.72
729  7_20151225       45.18
821  8_20151225       44.90
913  9_20151225       39.50
✅ Forecast saved as 'forecast_submission_prophet.csv'
📊 Forecast plots saved in the 'prophet_best_plots' folder.
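
The `prediction_x` / `prediction_y` handling exists because pandas adds suffixes when both frames share a non-key column. The sketch below reproduces this on one row, assuming the submission template ships with a placeholder `prediction` column (the merge handling above suggests it does, but the file itself is not shown here).

```python
import pandas as pd

# Hypothetical one-row template with a placeholder prediction
template = pd.DataFrame({
    "id": ["1_20151225"],
    "store_id": [1],
    "date": pd.to_datetime(["2015-12-25"]),
    "prediction": [0.0],
})
preds = pd.DataFrame({
    "store_id": [1],
    "date": pd.to_datetime(["2015-12-25"]),
    "prediction": [28.70],
})

# Overlapping non-key column: pandas appends _x (left) and _y (right)
merged = pd.merge(template, preds, on=["store_id", "date"], how="left")
print(list(merged.columns))

# Dropping the placeholder first avoids the suffixes altogether
clean = pd.merge(template.drop(columns="prediction"), preds, on=["store_id", "date"], how="left")
print(list(clean.columns))
```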


## 6. Model Validation via Test Split
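In this section, we validate the model with a chronological hold-out. For each store, the first 80% of the training records are used for fitting and the last 20% for testing, and performance on the test portion is measured with RMSE. Two details differ from the submission run: Store 0 is modeled directly here rather than computed as a sum of the other stores, and the monthly seasonality uses `fourier_order=3` instead of 5.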

In [25]:
print("\n🧪 Running test split evaluation...")

forecast_records = []
rmse_scores = []

for store_id in train_df['store_id'].unique():
    store_data = train_df[train_df['store_id'] == store_id][['date', 'revenue', 'is_christmas', 'christmas_multiplier']].rename(
        columns={'date': 'ds', 'revenue': 'y'})
    
    if len(store_data) < 90:
        continue
    
    store_data = store_data.sort_values('ds')
    split_index = int(len(store_data) * 0.8)
    train_data = store_data.iloc[:split_index]
    test_data = store_data.iloc[split_index:]

    # Train model on split data
    model = Prophet(
        holidays=holidays_df,
        yearly_seasonality=True,
        weekly_seasonality=True,
        daily_seasonality=False,
        seasonality_mode='multiplicative',
        changepoint_prior_scale=0.2,
        holidays_prior_scale=20,
        seasonality_prior_scale=10,
        changepoint_range=0.8,
        interval_width=0.95,
        uncertainty_samples=1000,
        growth='linear'
    )
    model.add_seasonality(name='monthly', period=30.5, fourier_order=3)
    model.add_regressor('christmas_multiplier', mode='additive')
    model.fit(train_data)

    # Predict on test data
    future_df = test_data[['ds', 'is_christmas', 'christmas_multiplier']]
    forecast = model.predict(future_df)
    
    # Apply Christmas post-processing for test split too
    christmas_mask = future_df['is_christmas'] == 1
    if christmas_mask.any():
        future_df_reset = future_df.reset_index(drop=True)
        christmas_mask_reset = future_df_reset['is_christmas'] == 1
        
        max_christmas_value = 100
        for idx in forecast.index[christmas_mask_reset]:
            current_pred = forecast.loc[idx, 'yhat']
            if current_pred > max_christmas_value:
                new_pred = np.random.uniform(0, max_christmas_value)
                forecast.loc[idx, 'yhat'] = new_pred
    
    forecast['store_id'] = store_id
    forecast_records.append(forecast[['ds', 'store_id', 'yhat']])

    # Calculate RMSE
    merged = pd.merge(test_data, forecast[['ds', 'yhat']], on='ds')
    rmse = np.sqrt(mean_squared_error(merged['y'], merged['yhat']))
    rmse_scores.append(rmse)
    print(f"✅ Store {store_id}: RMSE = {rmse:.2f}")

    # Create test split plot
    history_for_plot = pd.concat([train_data, test_data])
    plot_forecast(store_id, history_df=history_for_plot, forecast_df=forecast,
                  save_path=f"prophet_plots_test_split/store_{store_id}_test_forecast.png")

🧪 Running test split evaluation...
✅ Store 0: RMSE = 22889.73
✅ Store 1: RMSE = 3077.06
✅ Store 2: RMSE = 9682.91
✅ Store 3: RMSE = 4106.36
✅ Store 4: RMSE = 1517.81
✅ Store 5: RMSE = 2942.65
✅ Store 6: RMSE = 4408.68
✅ Store 7: RMSE = 3246.84
✅ Store 8: RMSE = 3888.28
✅ Store 9: RMSE = 4005.38
✅ Store 10: RMSE = 4235.82
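
To show what the split and the metric do in isolation, here is a minimal sketch on a ten-day toy series. The series and the two predictions are made up; only the 80/20 chronological cut and the RMSE formula mirror the cell above.

```python
import numpy as np
import pandas as pd

# Toy daily series: y = 0, 1, ..., 9
series = pd.DataFrame({
    "ds": pd.date_range("2015-01-01", periods=10),
    "y": np.arange(10, dtype=float),
})

# Chronological split: earliest 80% for training, latest 20% for testing
series = series.sort_values("ds")
split_index = int(len(series) * 0.8)
train_part = series.iloc[:split_index]
test_part = series.iloc[split_index:]

# Hypothetical predictions for the two test days (true values are 8 and 9)
yhat = np.array([7.0, 10.0])

# RMSE = sqrt(mean((y - yhat)^2)); both errors have magnitude 1 here
rmse = np.sqrt(np.mean((test_part["y"].to_numpy() - yhat) ** 2))
print(rmse)
```

Splitting by position after sorting, rather than shuffling, keeps every test date later than every training date, which matters for time series.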


### 6.1 Save Test Split Predictions and Final Results
Here we save the test split predictions and display the overall performance metrics.

In [26]:
# Save test split predictions
test_predictions = pd.concat(forecast_records).rename(columns={'ds': 'date', 'yhat': 'prediction'})
test_predictions['id'] = test_predictions.apply(lambda row: f"{row['store_id']}_{row['date'].strftime('%Y%m%d')}", axis=1)
test_submission_df = test_predictions[['id', 'prediction']].copy()
test_submission_df['prediction'] = test_submission_df['prediction'].round(2)
test_submission_df.to_csv("forecast_submission_test_split.csv", index=False)

# Print final summary
overall_rmse = np.mean(rmse_scores)
print(f"\n📊 FINAL RESULTS:")
print(f"   Average RMSE across stores (test split): {overall_rmse:.2f}")
print(f"   Number of stores processed: {len(rmse_scores)}")
print("📁 Test split submission saved as 'forecast_submission_test_split.csv'")
print("🎯 Individual-store Christmas day predictions above 100 were replaced with values in the 0-100 range.")


📊 FINAL RESULTS:
   Average RMSE across stores (test split): 5818.32
   Number of stores processed: 11
📁 Test split submission saved as 'forecast_submission_test_split.csv'
🎯 Individual-store Christmas day predictions above 100 were replaced with values in the 0-100 range.
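
The reported average is an unweighted mean of the per-store RMSE values printed in Section 6. The sketch below recomputes it from those values and, as an extra view that the notebook itself does not report, also averages the ten individual stores without Store 0. Store 0 aggregates all stores, so its error is on a larger scale and pulls the mean up.

```python
import numpy as np

# Per-store RMSE values copied from the test split output above
rmse_by_store = {
    0: 22889.73, 1: 3077.06, 2: 9682.91, 3: 4106.36, 4: 1517.81, 5: 2942.65,
    6: 4408.68, 7: 3246.84, 8: 3888.28, 9: 4005.38, 10: 4235.82,
}

# Unweighted mean over all 11 entries, as in the summary cell
mean_all = np.mean(list(rmse_by_store.values()))

# Same mean restricted to individual stores (Store 0 excluded)
mean_individual = np.mean([v for k, v in rmse_by_store.items() if k != 0])

print(f"All stores:        {mean_all:.2f}")
print(f"Individual stores: {mean_individual:.2f}")
```

The first figure matches the 5818.32 reported above; the second comes out at roughly 4111, which may be a fairer summary of per-store accuracy.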


## 7. Conclusion and Next Steps

This notebook has demonstrated:

1. **Per-store forecasting** with Facebook Prophet, using calendar events as holidays
2. **Special treatment for Christmas day** through a dedicated regressor and post-processing of predictions above 100
3. **Validation via test split**, with an average RMSE of 5818.32 across 11 stores
4. **Visualization** of forecasts for each store, with 95% uncertainty intervals where Prophet provides them
