# **Research Project (Honours)**

###### *By Mahlatsi Malise Mashilo (202215639)*

### **Forecasting Car Sales in South Africa Using Google Search Data with Post-hoc Explainable AI**
##### *Results Analysis Notebook*

## 1. Libraries

In [1]:
import os
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

sns.set_style("darkgrid")
warnings.filterwarnings("ignore")

## 2. Importing Final Datasets

In [19]:
def load_forecast_results(base_dir="forecast_results", summary_folder="results_summary"):
    """
    Load all Excel result files and their sheets from the given directory.
    
    Returns:
    --------
    results_dict : dict
        {
            "sales_comparison_tables": {
                "Summary": <DataFrame>,
                "RMSE": <DataFrame>,
                ...
            },
            "sales_model_results": <DataFrame>,
            "volume_comparison_tables": {
                "Summary": <DataFrame>,
                ...
            },
            "volume_model_results": <DataFrame>
        }
    """
    summary_path = os.path.join(base_dir, summary_folder)
    results_dict = {}

    # Sanity check
    if not os.path.exists(summary_path):
        raise FileNotFoundError(f"Path not found: {summary_path}")

    # Loop through all Excel files
    for file_name in os.listdir(summary_path):
        if file_name.endswith(".xlsx"):
            file_path = os.path.join(summary_path, file_name)
            file_key = file_name.replace(".xlsx", "")

            # Detect whether it's a comparison table (multiple sheets) or a single sheet
            try:
                xls = pd.ExcelFile(file_path)
                sheet_names = xls.sheet_names

                if len(sheet_names) > 1:
                    # Multiple sheets → store in nested dict
                    results_dict[file_key] = {
                        sheet: pd.read_excel(xls, sheet_name=sheet, skiprows=2)
                        for sheet in sheet_names
                    }
                else:
                    # Single sheet file
                    results_dict[file_key] = pd.read_excel(file_path)

                print(f"✅ Loaded '{file_name}' with {len(sheet_names)} sheet(s).")

            except Exception as e:
                print(f"⚠️ Failed to load {file_name}: {e}")

    print("\n📂 Loaded files:")
    for k, v in results_dict.items():
        if isinstance(v, dict):
            print(f"  - {k}: {len(v)} sheets ({', '.join(v.keys())})")
        else:
            print(f"  - {k}: 1 sheet (single DataFrame)")

    return results_dict


# ==== Example usage ====
results = load_forecast_results()

# Access examples:
sales_summary = results["sales_comparison_tables"]["Summary"]
volume_summary = results["volume_comparison_tables"]["Summary"]
sales_raw = results["sales_model_results"]
volume_raw = results["volume_model_results"]

print("\n🧾 Example shapes:")
print("Sales Summary:", sales_summary.shape)
print("Volume Summary:", volume_summary.shape)
print("Volume Raw Results:", volume_raw.shape)
print("Sales Raw Results:", sales_raw.shape)


✅ Loaded 'sales_comparison_tables.xlsx' with 7 sheet(s).
✅ Loaded 'sales_model_results.xlsx' with 1 sheet(s).
✅ Loaded 'volume_comparison_tables.xlsx' with 7 sheet(s).
✅ Loaded 'volume_model_results.xlsx' with 1 sheet(s).

📂 Loaded files:
  - sales_comparison_tables: 7 sheets (Summary, RMSE, MAE, MAPE, R2, AIC, BIC)
  - sales_model_results: 1 sheet (single DataFrame)
  - volume_comparison_tables: 7 sheets (Summary, RMSE, MAE, MAPE, R2, AIC, BIC)
  - volume_model_results: 1 sheet (single DataFrame)

🧾 Example shapes:
Sales Summary: (6, 3)
Volume Summary: (6, 3)
Volume Raw Results: (24, 7)
Sales Raw Results: (24, 7)
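
The loader returns a mixed structure: a nested dict of DataFrames for multi-sheet workbooks and a bare DataFrame for single-sheet ones. One way to inspect everything at once is to flatten that structure into a single overview table. The following is a minimal sketch, not part of the original notebook: `describe_results` is a hypothetical helper, and the toy DataFrames only stand in for the real sheets.

```python
import pandas as pd


def describe_results(results_dict):
    """Return one row per loaded sheet with its shape (hypothetical helper)."""
    rows = []
    for file_key, value in results_dict.items():
        # Multi-sheet workbooks are stored as {sheet_name: DataFrame};
        # wrap single DataFrames so both cases share one code path.
        sheets = value if isinstance(value, dict) else {"(single sheet)": value}
        for sheet_name, df in sheets.items():
            rows.append(
                {
                    "file": file_key,
                    "sheet": sheet_name,
                    "n_rows": df.shape[0],
                    "n_cols": df.shape[1],
                }
            )
    return pd.DataFrame(rows)


# Toy stand-in for the dict returned by load_forecast_results (assumed layout)
demo_results = {
    "sales_comparison_tables": {
        "Summary": pd.DataFrame({"a": [1, 2]}),
        "RMSE": pd.DataFrame({"a": [1], "b": [2]}),
    },
    "sales_model_results": pd.DataFrame({"x": [1, 2, 3]}),
}

overview = describe_results(demo_results)
print(overview)
```

In the notebook itself, `describe_results(results)` would presumably give one row per sheet across all four workbooks.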


In [18]:
volume_summary

| | Metric | Avg Exog Improvement (%) | Avg EEMD Improvement (%) |
|---|---|---|---|
| 0 | RMSE | 0.46 | -12.04 |
| 1 | MAE | -0.44 | -11.47 |
| 2 | MAPE | 0.87 | -8.94 |
| 3 | R2 | -6.62 | -421.99 |
| 4 | AIC | 11.97 | 4.81 |
| 5 | BIC | 23.8 | 16.73 |
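
To read the summary at a glance, the sign of each improvement column can be turned into a simple flag. This is a minimal sketch under one loudly stated assumption: a positive percentage means the variant did better than the baseline on that metric. The notebook does not state the sign convention, so this should be checked against the code that produced the tables. The values are copied from the output above.

```python
import pandas as pd

# Values copied from the volume_summary output shown above
summary = pd.DataFrame(
    {
        "Metric": ["RMSE", "MAE", "MAPE", "R2", "AIC", "BIC"],
        "Avg Exog Improvement (%)": [0.46, -0.44, 0.87, -6.62, 11.97, 23.8],
        "Avg EEMD Improvement (%)": [-12.04, -11.47, -8.94, -421.99, 4.81, 16.73],
    }
)

# Select the improvement columns by their shared suffix
improvement_cols = [c for c in summary.columns if c.endswith("Improvement (%)")]

# ASSUMPTION: positive percentage == improvement over the baseline
flags = summary.set_index("Metric")[improvement_cols] > 0

# Number of metrics with a positive average change, per variant
improved_counts = flags.sum()

print(flags)
print(improved_counts)
```

Under that assumption, four of the six metrics are positive for the exogenous variant and two of six for the EEMD variant.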
