# Walmart Sales Forecasting Pipeline

This notebook runs the complete sales forecasting pipeline, including:
1. Data preprocessing
2. LSTM model training
3. XGBoost model training
4. Ensemble predictions
5. Power BI data export

Before running this notebook, make sure the four required datasets (`train.csv`, `test.csv`, `features.csv`, and `stores.csv`) are in the `data/` directory.
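
Each stage below is run by calling the `main()` function of its module, and later stages read files written by earlier ones. As a rough sketch of the same ordering outside a notebook, the helper below runs a list of stages in sequence and lets any exception stop the run. `run_stages` is a hypothetical name used only for illustration; it is not part of the project's scripts.

```python
import time


def run_stages(stages):
    """Run (name, callable) pairs in order and return [(name, seconds)].

    An exception raised by any stage propagates to the caller, so later
    stages never run against incomplete outputs.
    """
    timings = []
    for name, stage in stages:
        start = time.perf_counter()
        stage()
        timings.append((name, time.perf_counter() - start))
    return timings


# Possible usage with the modules imported in the next cell:
# run_stages([
#     ("preprocess", data_preprocessing.main),
#     ("lstm", train_lstm.main),
#     ("xgboost", train_xgboost.main),
#     ("ensemble", ensemble_results.main),
#     ("powerbi", export_powerbi.main),
# ])
```

Passing the stages as plain callables keeps the helper independent of what each `main()` does internally.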

In [None]:
import os
import sys
import pandas as pd
import numpy as np
from IPython.display import display, HTML

# Add the scripts directory to the import path.
# The path is relative, so run this notebook from the project root.
sys.path.append('scripts')

# Import our modules
import data_preprocessing
import train_lstm
import train_xgboost
import ensemble_results
import export_powerbi

## 1. Check Required Files

In [None]:
required_files = ['train.csv', 'test.csv', 'features.csv', 'stores.csv']
missing_files = []

for file in required_files:
    if not os.path.exists(f'data/{file}'):
        missing_files.append(file)

if missing_files:
    print("❌ Missing required files:")
    for file in missing_files:
        print(f"  - {file}")
    print("\nPlease add these files to the 'data/' directory before proceeding.")
else:
    print("✅ All required files present!")
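
The check above only prints a warning, so the remaining cells will still run (and fail later) if a file is missing. A stricter variant, sketched below, raises an error instead. The helper names `find_missing` and `require_files` are hypothetical and chosen for this example; the file list is the one used in the cell above.

```python
from pathlib import Path

REQUIRED_FILES = ["train.csv", "test.csv", "features.csv", "stores.csv"]


def find_missing(data_dir="data", required=REQUIRED_FILES):
    """Return the required file names that are absent from data_dir."""
    base = Path(data_dir)
    return [name for name in required if not (base / name).is_file()]


def require_files(data_dir="data", required=REQUIRED_FILES):
    """Raise FileNotFoundError if any required file is missing."""
    missing = find_missing(data_dir, required)
    if missing:
        raise FileNotFoundError(
            f"Missing from '{data_dir}/': {', '.join(missing)}"
        )
```

Calling `require_files()` at the top of the notebook would stop "Run All" at the first cell rather than partway through preprocessing.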

## 2. Data Preprocessing

In [None]:
print("Starting data preprocessing...")
data_preprocessing.main()
print("\nPreprocessing completed!")

# Display sample of processed data
clean_train = pd.read_csv('data/clean_train_data.csv')
display(HTML(clean_train.head().to_html()))

## 3. Train LSTM Model

In [None]:
print("Training LSTM model...")
train_lstm.main()

# Display LSTM metrics
lstm_metrics = pd.read_csv('data/lstm_metrics.csv')
display(HTML(lstm_metrics.to_html()))

## 4. Train XGBoost Model

In [None]:
print("Training XGBoost model...")
train_xgboost.main()

# Display XGBoost metrics and feature importance
xgb_metrics = pd.read_csv('data/xgboost_metrics.csv')
feature_importance = pd.read_csv('data/feature_importance.csv')

print("\nXGBoost Metrics:")
display(HTML(xgb_metrics.to_html()))

print("\nTop 10 Important Features:")
display(HTML(feature_importance.head(10).to_html()))

## 5. Create Ensemble Predictions

In [None]:
print("Creating ensemble predictions...")
ensemble_results.main()

# Display ensemble metrics
ensemble_metrics = pd.read_csv('data/ensemble_metrics.csv')
display(HTML(ensemble_metrics.to_html()))
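
This notebook does not show how `ensemble_results.main()` combines the two models. One common approach, shown here purely as an illustration and not as the project's actual method, is a weighted average of the LSTM and XGBoost predictions. The function name `blend_predictions` and the `lstm_weight` parameter are assumptions made for this sketch.

```python
def blend_predictions(lstm_preds, xgb_preds, lstm_weight=0.5):
    """Return the weighted average of two aligned prediction sequences.

    lstm_weight is the share given to the LSTM predictions; the XGBoost
    predictions receive the remainder (1 - lstm_weight).
    """
    if len(lstm_preds) != len(xgb_preds):
        raise ValueError("prediction sequences must have the same length")
    if not 0.0 <= lstm_weight <= 1.0:
        raise ValueError("lstm_weight must be between 0 and 1")
    xgb_weight = 1.0 - lstm_weight
    return [lstm_weight * a + xgb_weight * b
            for a, b in zip(lstm_preds, xgb_preds)]


# Equal weights: each blended value is the midpoint of the two predictions.
print(blend_predictions([100.0, 200.0], [110.0, 180.0]))  # [105.0, 190.0]
```

In practice the weight would typically be chosen on a validation set rather than fixed at 0.5.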

## 6. Export Data for Power BI

In [None]:
print("Exporting data for Power BI...")
export_powerbi.main()

# List exported files in a stable, alphabetical order
powerbi_files = sorted(os.listdir('powerbi'))
print("\nExported files for Power BI:")
for file in powerbi_files:
    print(f"  - {file}")

## Pipeline Complete!

The sales forecasting pipeline has finished running. You can now:
1. Check the model metrics in the `data/` directory
2. Review the predictions in `final_forecast.csv`
3. Use the exported files in the `powerbi/` directory to create your dashboard
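
To review the metrics from all three models in one place, the sketch below reads the metrics files written by the earlier cells (`lstm_metrics.csv`, `xgboost_metrics.csv`, and `ensemble_metrics.csv`). It makes no assumption about their columns and simply returns each file's rows; `load_metrics` is a hypothetical helper name.

```python
import csv
from pathlib import Path

METRIC_FILES = {
    "LSTM": "lstm_metrics.csv",
    "XGBoost": "xgboost_metrics.csv",
    "Ensemble": "ensemble_metrics.csv",
}


def load_metrics(data_dir="data", metric_files=METRIC_FILES):
    """Return {model_name: list of row dicts} for each metrics file found."""
    results = {}
    for model, filename in metric_files.items():
        path = Path(data_dir) / filename
        if not path.is_file():
            continue  # skip stages that have not been run yet
        with path.open(newline="") as handle:
            results[model] = list(csv.DictReader(handle))
    return results


for model, rows in load_metrics().items():
    print(model, rows)
```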

For any issues or questions, please refer to the `README.md` file.