# Advanced Walmart Forecasting Models Documentation

## Overview

The `AdvancedWalmartForecastingModels` class implements five time series forecasting models for the Walmart Sales Forecasting Challenge. It combines deep learning, statistical, and probabilistic approaches to predict weekly sales across multiple stores and departments.

## System Architecture

### Data Pipeline
- **Input**: Historical sales data with features (temperature, fuel price, CPI, unemployment, holidays, markdowns)
- **Processing**: Time-based train/validation split, feature scaling, sequence creation
- **Output**: Weekly sales forecasts, with uncertainty estimates from the SARIMAX and Gaussian Process models

### Model Portfolio
The system implements five distinct forecasting approaches:

1. **Temporal Fusion Transformer (TFT) - Advanced**
2. **Ensemble Deep Learning Model**
3. **Neural ODE Model**
4. **State Space Model (SARIMAX)**
5. **Gaussian Process Model**

---

## Model Descriptions

### 1. Temporal Fusion Transformer (TFT) - Advanced

#### What it is
An attention-based neural network that combines transformer-style attention with time-series-specific components (variable selection, gating, recurrent encoding) for interpretable multi-horizon forecasting.

#### How it's built
```python
# Key components:
# - Variable Selection Network (VSN): learns feature importance dynamically
# - Gated Residual Networks (GRN): provide non-linear processing with skip connections
# - Multi-Head Attention: captures temporal dependencies
# - Encoder-Decoder Architecture: processes historical context and generates forecasts
```

#### Architecture Flow
1. **Variable Selection**: The VSN computes softmax weights over the input features at each time step and rescales the features by them
2. **Temporal Encoding**: An LSTM encoder processes the weighted historical sequence
3. **Attention Mechanism**: Multi-head self-attention over the encoder outputs captures longer-range temporal patterns
4. **Feature Processing**: A GRN applies gated non-linear processing with a residual connection to the attention output
5. **Prediction**: Global average pooling followed by dense layers generates the final forecast
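
The variable-selection step can be illustrated without a deep learning framework. The sketch below is a simplified stand-in, not the network itself: it replaces the learned scoring layers with fixed, hypothetical scores, turns them into weights with a softmax, and rescales each feature by its weight. The function names and example values are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_variables(features, scores):
    """Scale each feature value by its softmax weight (one time step)."""
    weights = softmax(scores)
    selected = [f * w for f, w in zip(features, weights)]
    return selected, weights


# Hypothetical scaled feature values for one time step:
# temperature, fuel price, holiday flag
features = [0.5, -1.2, 1.0]
# Hypothetical scores; in a trained network these come from dense layers
scores = [0.1, 0.1, 2.0]

selected, weights = select_variables(features, scores)
print([round(w, 3) for w in weights])
```

Features with larger scores keep more of their magnitude, which is what makes the weights usable as a rough importance signal.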

#### Why it works for Walmart
- **Variable Selection**: Automatically identifies which features (temperature, holidays, etc.) matter most for each prediction
- **Attention Mechanisms**: Can focus on relevant historical periods (e.g., same holiday last year)
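- **Multi-horizon**: The architecture extends to forecasting several weeks ahead; the implementation in this document predicts a single step (the next week)
- **Interpretability**: Attention weights and variable-selection weights indicate which time periods and features drive predictions, although the implementation in this document does not yet extract them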

#### Strengths
- Strong published results on multi-horizon forecasting benchmarks
- Built-in interpretability through attention weights and variable selection
- Handles multiple time series simultaneously
- Robust to missing data and irregular patterns

#### Weaknesses
- Computationally expensive (requires significant GPU resources)
- Many hyperparameters to tune
- Requires substantial training data
- Can overfit on small datasets

---

### 2. Ensemble Deep Learning Model

#### What it is
A single hybrid neural network with four parallel branches (LSTM, GRU, CNN, attention) whose outputs are concatenated and trained jointly, so that each branch can capture a different kind of pattern in the data.

#### How it's built
```python
# Four parallel branches:
# 1. LSTM branch: sequential pattern learning
# 2. GRU branch: alternative recurrent processing
# 3. CNN branch: local pattern detection
# 4. Attention branch: global dependency modeling

# Final architecture:
# combined features → dense layers → batch normalization → prediction
```

#### Architecture Details
- **LSTM Branch**: Two-layer LSTM (64→32 units) for sequence modeling
- **GRU Branch**: Two-layer GRU (64→32 units) as LSTM alternative
- **CNN Branch**: 1D convolutions for local pattern detection
- **Attention Branch**: Multi-head self-attention for global patterns
- **Fusion**: Concatenate all branches → 64→32→1 dense layers

#### Why it works for Walmart
- **Diverse Pattern Capture**: Each branch specializes in different temporal patterns
- **Robustness**: Combining branches reduces reliance on any single architecture, which can improve generalization
- **Complementary Strengths**: LSTM for long sequences, CNN for local patterns, attention for global dependencies
- **Automatic Feature Learning**: No manual feature engineering required

#### Strengths
- Robust performance across different data patterns
- Combines strengths of multiple architectures
- Good generalization due to ensemble effect
- Automatic feature learning

#### Weaknesses
- High computational complexity
- Many parameters to train
- Requires careful tuning of each branch
- Can be unstable during training

---

### 3. Neural ODE Model

#### What it is
A neural network that models the continuous-time dynamics of the system using Ordinary Differential Equations, treating time series prediction as solving a continuous dynamical system.

#### How it's built
```python
# Core concept: dx/dt = f(x, t)
# Implementation:
# 1. LSTM for initial temporal processing
# 2. Multiple ODE residual blocks simulating continuous dynamics
# 3. Euler integration: x_{t+1} = x_t + step_size * f(x_t, t)
# 4. GRU for final temporal aggregation
```

#### Mathematical Foundation
- **ODE Formulation**: Models the rate of change of sales as a function of current state and time
- **Residual Blocks**: Each block represents one integration step
- **Continuous Dynamics**: Captures smooth transitions between time points
- **Regularization**: L1/L2 regularization helps keep the learned dynamics stable
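
To make the Euler update concrete, the sketch below integrates a simple hand-written ODE with fixed steps. The dynamics function, its constants, and the function names are illustrative assumptions; in the model described here, `f` would be a learned network block rather than a formula.

```python
def euler_integrate(f, x0, t0, step_size, num_steps):
    """Apply x_{t+1} = x_t + step_size * f(x_t, t) repeatedly."""
    x, t = x0, t0
    trajectory = [x]
    for _ in range(num_steps):
        x = x + step_size * f(x, t)
        t = t + step_size
        trajectory.append(x)
    return trajectory


def relax_to_baseline(x, t, rate=0.5, baseline=100.0):
    """Hypothetical dynamics: sales drift back toward a baseline level."""
    return -rate * (x - baseline)


# Start above the baseline and take ten steps of size 0.1
path = euler_integrate(
    relax_to_baseline, x0=150.0, t0=0.0, step_size=0.1, num_steps=10
)
print(round(path[-1], 2))
```

Smaller step sizes track the continuous solution more closely at the cost of more steps, which is one reason this kind of model is sensitive to the integration step size.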

#### Why it works for Walmart
- **Smooth Transitions**: Sales changes are often gradual, matching ODE assumptions
- **Physical Intuition**: Models sales as a continuous process influenced by economic factors
- **Irregular Sampling**: A continuous-time formulation can, in principle, accommodate missing or unevenly spaced time points
- **Parameter Efficiency**: Fewer parameters than traditional RNNs for similar capacity

#### Strengths
- Theoretically grounded in continuous dynamics
- Memory efficient
- Handles irregular time series naturally
- Can model complex system dynamics

#### Weaknesses
- Difficult to interpret
- Sensitive to integration step size
- Can be unstable during training
- Requires careful regularization

---

### 4. State Space Model (SARIMAX)

#### What it is
A classical econometric model that describes a time series through autoregressive, differencing, moving-average, and seasonal terms, while incorporating external variables (temperature, fuel prices, etc.) as regressors.

#### How it's built
```python
# SARIMAX(p, d, q)(P, D, Q, s) structure:
# - p, d, q: non-seasonal autoregressive, differencing, moving-average orders
# - P, D, Q, s: seasonal components with period s
# - External variables: temperature, fuel price, CPI, unemployment, holidays
```

#### Mathematical Components
- **Autoregressive (AR)**: Current value depends on previous values
- **Integrated (I)**: Handles non-stationary series through differencing
- **Moving Average (MA)**: Models error terms from previous periods
- **Seasonal**: Captures repeating patterns with period s; for weekly data, a yearly cycle corresponds to s = 52
- **Exogenous**: Incorporates external economic factors
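
The "Integrated" and "Seasonal" components both rely on differencing. The sketch below shows the idea on a small synthetic series; the helper name, the series, and the short period of 4 (chosen only to keep the example readable) are assumptions, not values used by the model.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical series: a linear trend plus a pattern that repeats every 4 steps
pattern = [10, 0, -5, 20]
series = [2 * t + pattern[t % 4] for t in range(12)]

seasonal_diff = difference(series, lag=4)  # removes the repeating pattern
fully_diff = difference(seasonal_diff, 1)  # removes what is left of the trend

print(seasonal_diff)
print(fully_diff)
```

After seasonal differencing only the constant contribution of the trend remains, and one more ordinary difference removes it, leaving a series an ARMA model can treat as stationary.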

#### Why it works for Walmart
- **Economic Interpretability**: Clear relationship between economic factors and sales
- **Seasonal Patterns**: Explicitly models yearly seasonality in the weekly series
- **External Factors**: Directly incorporates weather, economic conditions
- **Proven Track Record**: Extensively used in retail forecasting

#### Strengths
- Highly interpretable coefficients
- Well-established statistical theory
- Handles seasonality explicitly
- Uncertainty quantification through confidence intervals
- Fast training and prediction

#### Weaknesses
- Assumes linear relationships
- Assumes the series is stationary after differencing
- Limited ability to capture complex non-linear patterns
- Sensitive to outliers
- Manual order selection required

---

### 5. Gaussian Process Model

#### What it is
A non-parametric Bayesian approach that models the distribution over functions, providing both predictions and uncertainty estimates.

#### How it's built
```python
# Key components:
# - Kernel function: Matérn kernel + white noise kernel
# - Prior: Gaussian process prior over functions
# - Posterior: analytical posterior given observations
# - Prediction: mean and variance at new points
```

#### Mathematical Foundation
- **Kernel**: Defines similarity between time points
- **Matérn Kernel**: Flexible kernel for modeling smooth functions
- **White Noise**: Models observation noise
- **Bayesian Inference**: Updates beliefs based on observed data
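
As an illustration of how the kernel defines similarity between time points, the sketch below builds a covariance matrix from a Matérn kernel plus white noise. The smoothness value (ν = 3/2), the length scale, the noise level, and the function names are assumptions chosen for the example; the document does not state which settings the model uses.

```python
import math


def matern32(x1, x2, length_scale=1.0, variance=1.0):
    """Matern kernel with smoothness nu = 3/2 for scalar inputs."""
    r = abs(x1 - x2) / length_scale
    return variance * (1.0 + math.sqrt(3.0) * r) * math.exp(-math.sqrt(3.0) * r)


def covariance_matrix(xs, noise_variance=0.1, **kernel_args):
    """Kernel matrix plus white noise on the diagonal."""
    n = len(xs)
    K = [[matern32(xs[i], xs[j], **kernel_args) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += noise_variance
    return K


weeks = [0.0, 1.0, 2.0, 5.0]  # hypothetical time points
K = covariance_matrix(weeks, noise_variance=0.1, length_scale=2.0)
for row in K:
    print([round(v, 3) for v in row])
```

Entries shrink as the gap between time points grows, so nearby weeks inform each other more than distant ones; the noise term only affects the diagonal.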

#### Why it works for Walmart
- **Uncertainty Quantification**: Provides confidence intervals for business decisions
- **Non-parametric**: No fixed functional form; assumptions about smoothness enter through the kernel
- **Flexible**: Can model complex non-linear relationships
- **Small Data**: Works well with limited training data

#### Strengths
- Provides uncertainty estimates
- Flexible, with no fixed functional form imposed
- Works well with small datasets
- Principled Bayesian approach
- Can incorporate prior knowledge through kernel design

#### Weaknesses
- Computationally expensive (O(n³) scaling)
- Limited to relatively small datasets
- Kernel selection is crucial
- Poor extrapolation beyond training range
- Sensitive to hyperparameter choices

---

## Model Comparison

### Performance Characteristics

| Model | Complexity | Interpretability | Uncertainty | Scalability | Training Speed |
|-------|------------|------------------|-------------|-------------|----------------|
| TFT Advanced | Very High | Medium | No | High | Slow |
| Ensemble DL | High | Low | No | High | Slow |
| Neural ODE | Medium | Low | No | Medium | Medium |
| SARIMAX | Low | Very High | Yes | Medium | Fast |
| Gaussian Process | Medium | Medium | Yes | Low | Medium |

### Use Case Recommendations

#### When to use TFT Advanced
- **Best for**: High-stakes forecasting where accuracy is paramount
- **Requirements**: Large datasets, GPU resources, interpretability needs
- **Example**: Strategic planning, inventory optimization

#### When to use Ensemble Deep Learning
- **Best for**: Robust performance across diverse data patterns
- **Requirements**: Sufficient training data, computational resources
- **Example**: Automated forecasting systems, multiple product lines

#### When to use Neural ODE
- **Best for**: Systems with smooth, continuous dynamics
- **Requirements**: Familiarity with differential equations; most useful when time series are irregularly sampled
- **Example**: Economic modeling, supply chain dynamics

#### When to use SARIMAX
- **Best for**: Baseline models, interpretable results, limited data
- **Requirements**: Domain expertise for model specification
- **Example**: Financial reporting, regulatory compliance

#### When to use Gaussian Process
- **Best for**: Uncertainty-critical decisions, small datasets
- **Requirements**: A dataset small enough for O(n³) training; a need for confidence intervals
- **Example**: Risk assessment, A/B testing, new product launches

---

## Implementation Details

### Data Preprocessing
```python
# Common preprocessing steps:
# 1. Time-based train/validation split
# 2. Feature scaling using StandardScaler
# 3. Sequence creation for neural models
# 4. Missing value handling
# 5. Aggregation by store/date for efficiency
```
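
The sequence-creation step listed above can be sketched in plain Python. The helper below pairs each window of `seq_len` consecutive weeks with the following week's sales; the function name and the sample data are hypothetical and only illustrate the windowing pattern.

```python
def make_sequences(feature_rows, targets, seq_len):
    """Pair each window of seq_len feature rows with the next target value."""
    X, y = [], []
    for i in range(seq_len, len(targets)):
        X.append(feature_rows[i - seq_len:i])
        y.append(targets[i])
    return X, y


# Hypothetical weekly data for one store: [temperature, is_holiday] and sales
feature_rows = [[60.0, 0], [62.0, 0], [58.0, 1], [55.0, 0], [57.0, 0], [61.0, 0]]
sales = [100.0, 110.0, 150.0, 105.0, 108.0, 112.0]

X, y = make_sequences(feature_rows, sales, seq_len=3)
print(len(X), y)
```

A series of n weeks yields n - `seq_len` samples, so a short validation period combined with a long window leaves only a few samples per store.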

### Feature Engineering
- **Temporal Features**: Date-based features, lag variables
- **Economic Indicators**: Temperature, fuel price, CPI, unemployment
- **Business Features**: Store size, holiday indicators, markdowns
- **Engineered Features**: Holiday weights, total markdowns

### Validation Strategy
- **Time-based split**: Last 8 weeks for validation
- **Walk-forward validation**: Simulates real-world deployment
- **Metrics**: WMAE (Weighted Mean Absolute Error) matching competition
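
A minimal sketch of the WMAE metric follows. The competition weights holiday weeks by 5 and all other weeks by 1; the code in this document carries the weight in a `Holiday_Weight` column whose values are not shown, so the default of 5 below, the function name, and the sample numbers are assumptions.

```python
def wmae(actual, predicted, is_holiday, holiday_weight=5.0):
    """Weighted mean absolute error with heavier weight on holiday weeks."""
    weights = [holiday_weight if h else 1.0 for h in is_holiday]
    weighted_errors = [
        w * abs(a - p) for w, a, p in zip(weights, actual, predicted)
    ]
    return sum(weighted_errors) / sum(weights)


# Hypothetical values for three weeks, the last one a holiday week
actual = [100.0, 200.0, 150.0]
predicted = [110.0, 190.0, 170.0]
is_holiday = [False, False, True]

score = wmae(actual, predicted, is_holiday)
print(round(score, 2))
```

Because the holiday week carries five times the weight, its error dominates the score even though it is only one of three observations.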

### Training Configuration
- **Early Stopping**: Prevents overfitting
- **Learning Rate Scheduling**: Adaptive learning rates
- **Regularization**: Dropout, L1/L2 regularization
- **Batch Processing**: Efficient memory usage

---

## Business Impact

### Forecasting Accuracy
- **TFT**: Highest expected accuracy given enough training data, suitable for strategic decisions
- **Ensemble**: Consistent performance across different scenarios
- **Neural ODE**: Good for smooth demand patterns
- **SARIMAX**: Reliable baseline with clear interpretation
- **Gaussian Process**: Best when uncertainty quantification is critical

### Computational Requirements
- **Training Time**: SARIMAX < GP < Neural ODE < Ensemble < TFT
- **Memory Usage**: SARIMAX < GP < Neural ODE < TFT < Ensemble
- **Inference Speed**: Inference is fast enough for weekly business use across models, though Gaussian Process prediction cost grows with training-set size

### Operational Considerations
- **Model Maintenance**: Simpler models require less maintenance
- **Interpretability**: Critical for stakeholder buy-in and regulatory compliance
- **Uncertainty**: Essential for inventory planning and risk management
- **Scalability**: Important for enterprise deployment across thousands of stores

---

## Future Enhancements

### Model Improvements
1. **Hierarchical Forecasting**: Coherent forecasts across store/department hierarchy
2. **Causal Inference**: Incorporating causal relationships between variables
3. **Transfer Learning**: Leveraging knowledge across similar stores
4. **Online Learning**: Continuous model updates with new data

### Technical Enhancements
1. **Model Ensembling**: Combining predictions from multiple models
2. **Hyperparameter Optimization**: Automated tuning using Bayesian optimization
3. **Distributed Training**: Scaling to larger datasets and model complexity
4. **Model Monitoring**: Detecting model degradation and concept drift

### Business Applications
1. **Inventory Optimization**: Using forecasts for stock level decisions
2. **Pricing Strategy**: Dynamic pricing based on demand forecasts
3. **Resource Allocation**: Staff scheduling and supply chain planning
4. **Risk Management**: Scenario planning and stress testing

---

## Conclusion

This advanced forecasting system provides a comprehensive toolkit for retail demand prediction. Each model offers unique strengths:

- **TFT** for high accuracy with some interpretability
- **Ensemble** for robustness across scenarios
- **Neural ODE** for modeling continuous dynamics
- **SARIMAX** for statistical rigor and interpretability
- **Gaussian Process** for uncertainty quantification

The choice of model depends on specific business requirements, computational constraints, and the level of interpretability needed. In practice, combining multiple models often provides the best results, leveraging the strengths of each approach while mitigating individual weaknesses.
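
One simple way to combine models is a weighted average in which models with lower validation error receive more weight. The sketch below is only one possible scheme; the function names, error values, and forecasts are hypothetical and are not results from this system.

```python
def inverse_error_weights(validation_errors):
    """Give each model a weight proportional to 1 / its validation error."""
    inverse = {name: 1.0 / err for name, err in validation_errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def blend(predictions, weights):
    """Weighted average of per-model forecasts for the same weeks."""
    names = list(predictions)
    horizon = len(predictions[names[0]])
    return [
        sum(weights[name] * predictions[name][t] for name in names)
        for t in range(horizon)
    ]


# Hypothetical validation errors and forecasts; not results from this system
validation_errors = {"SARIMAX": 2000.0, "TFT_Advanced": 1000.0}
predictions = {
    "SARIMAX": [90.0, 120.0],
    "TFT_Advanced": [120.0, 150.0],
}

weights = inverse_error_weights(validation_errors)
blended = blend(predictions, weights)
print([round(v, 1) for v in blended])
```

The weights should be estimated on data that none of the models saw during training, otherwise the blend tends to favor whichever model overfits most.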

In [2]:
import pandas as pd
import numpy as np
import time
import warnings

warnings.filterwarnings("ignore")

# Core libraries
from sklearn.preprocessing import StandardScaler

# Deep learning
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (
    LSTM,
    Dense,
    Dropout,
    GRU,
    Input,
    MultiHeadAttention,
    LayerNormalization,
    GlobalAveragePooling1D,
    Conv1D,
    MaxPooling1D,
    Concatenate,
    BatchNormalization,
    Add,
)
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.regularizers import l1_l2

# Statistical models
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Advanced time series
try:
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.utils.data import DataLoader, TensorDataset

    TORCH_AVAILABLE = True
except ImportError:
    print("PyTorch not available. Install with: pip install torch")
    TORCH_AVAILABLE = False


class AdvancedWalmartForecastingModels:
    """Advanced forecasting models for Walmart competition including hierarchical and causal approaches"""

    def __init__(self, data):
        self.data = data
        self.models = {}
        self.results = {}
        self.hierarchical_structure = None
        self.causal_graph = None
        self.feature_columns = []
        self.train_data = None
        self.val_data = None

    def prepare_data(self, validation_weeks=8):
        """Prepare data for time series modeling with train/validation split"""
        print("=== PREPARING DATA FOR MODELING ===")

        # Clean and sort data
        self.data_clean = self.data.dropna(subset=["Weekly_Sales"])
        self.data_clean = self.data_clean.sort_values(["Store", "Dept", "Date"])

        # Time-based split
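        unique_dates = sorted(self.data_clean["Date"].unique())
        # Guard against an empty training set or an out-of-range index
        if not 0 < validation_weeks < len(unique_dates):
            raise ValueError(
                f"validation_weeks must be between 1 and {len(unique_dates) - 1}"
            )
        split_date = unique_dates[-validation_weeks]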

        self.train_data = self.data_clean[self.data_clean["Date"] < split_date].copy()
        self.val_data = self.data_clean[self.data_clean["Date"] >= split_date].copy()

        # Select features (excluding target and identifier columns)
        exclude_cols = ["Weekly_Sales", "Store", "Dept", "Date"]
        self.feature_columns = [
            col
            for col in self.data_clean.columns
            if col not in exclude_cols and not col.endswith("_scaled")
        ]

        print("Data split completed:")
        print(f"  - Training data: {self.train_data.shape}")
        print(f"  - Validation data: {self.val_data.shape}")
        print(f"  - Feature columns: {self.feature_columns}")
        print(
            f"  - Date range: {self.data_clean['Date'].min()} to {self.data_clean['Date'].max()}"
        )
        print(f"  - Split date: {split_date}")

        return self.train_data, self.val_data

    def temporal_fusion_transformer_advanced(self, sequence_length=5, epochs=50):
        """Advanced Temporal Fusion Transformer with full architecture"""
        print("=== TRAINING ADVANCED TEMPORAL FUSION TRANSFORMER ===")
        start_time = time.time()

        try:
            # Variable Selection Network
            class VariableSelectionNetwork(tf.keras.layers.Layer):
                def __init__(self, num_features, hidden_size, dropout_rate=0.1):
                    super(VariableSelectionNetwork, self).__init__()
                    self.num_features = num_features
                    self.hidden_size = hidden_size

                    self.linear1 = Dense(hidden_size, activation="relu")
                    self.linear2 = Dense(num_features, activation="softmax")
                    self.dropout = Dropout(dropout_rate)

                def call(self, inputs, training=None):
                    # inputs: [batch_size, time_steps, num_features]
                    batch_size = tf.shape(inputs)[0]
                    time_steps = tf.shape(inputs)[1]

                    # Flatten time dimension for processing
                    flattened = tf.reshape(inputs, [-1, self.num_features])

                    # Variable selection
                    x = self.linear1(flattened)
                    x = self.dropout(x, training=training)
                    weights = self.linear2(x)

                    # Reshape back
                    weights = tf.reshape(
                        weights, [batch_size, time_steps, self.num_features]
                    )

                    # Apply variable selection
                    selected = inputs * weights
                    return selected, weights

            # Gated Residual Network
            class GatedResidualNetwork(tf.keras.layers.Layer):
                def __init__(self, hidden_size, dropout_rate=0.1):
                    super(GatedResidualNetwork, self).__init__()
                    self.hidden_size = hidden_size

                    self.linear1 = Dense(hidden_size, activation="relu")
                    self.linear2 = Dense(hidden_size)
                    self.gate = Dense(hidden_size, activation="sigmoid")
                    self.dropout = Dropout(dropout_rate)
                    self.layer_norm = LayerNormalization()

                def call(self, inputs, training=None):
                    x = self.linear1(inputs)
                    x = self.dropout(x, training=training)
                    x = self.linear2(x)

                    gate = self.gate(inputs)

                    # Gated residual connection
                    output = gate * x + (1 - gate) * inputs
                    output = self.layer_norm(output)

                    return output

            # Prepare data for TFT
            agg_train = (
                self.train_data.groupby(["Store", "Date"])
                .agg(
                    {
                        "Weekly_Sales": "sum",
                        "Temperature": "mean",
                        "Fuel_Price": "mean",
                        "CPI": "mean",
                        "Unemployment": "mean",
                        "IsHoliday": "max",
                        "Total_MarkDown": "sum",
                        "Holiday_Weight": "max",
                        "Size": "first",
                    }
                )
                .reset_index()
            )

            agg_val = (
                self.val_data.groupby(["Store", "Date"])
                .agg(
                    {
                        "Weekly_Sales": "sum",
                        "Temperature": "mean",
                        "Fuel_Price": "mean",
                        "CPI": "mean",
                        "Unemployment": "mean",
                        "IsHoliday": "max",
                        "Total_MarkDown": "sum",
                        "Holiday_Weight": "max",
                        "Size": "first",
                    }
                )
                .reset_index()
            )

            # Features
            continuous_features = [
                "Temperature",
                "Fuel_Price",
                "CPI",
                "Unemployment",
                "Total_MarkDown",
                "Size",
            ]
            categorical_features = ["IsHoliday"]

            # Create sequences
            def create_tft_sequences(data, seq_len):
                X_cont, X_cat, y, weights = [], [], [], []

                for store in data["Store"].unique():
                    store_data = data[data["Store"] == store].sort_values("Date")

                    if len(store_data) < seq_len + 1:
                        continue

                    for i in range(seq_len, len(store_data)):
                        X_cont.append(
                            store_data[continuous_features].iloc[i - seq_len : i].values
                        )
                        X_cat.append(
                            store_data[categorical_features]
                            .iloc[i - seq_len : i]
                            .values
                        )
                        y.append(store_data["Weekly_Sales"].iloc[i])
                        weights.append(store_data["Holiday_Weight"].iloc[i])

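                # Cast inputs to float32 so boolean flags such as IsHoliday feed cleanly into Keras
                return (
                    np.array(X_cont, dtype="float32"),
                    np.array(X_cat, dtype="float32"),
                    np.array(y),
                    np.array(weights),
                )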

            X_cont_train, X_cat_train, y_train, train_weights = create_tft_sequences(
                agg_train, sequence_length
            )
            X_cont_val, X_cat_val, y_val, val_weights = create_tft_sequences(
                agg_val, sequence_length
            )

            if len(X_cont_train) == 0 or len(X_cont_val) == 0:
                print("Insufficient data for TFT")
                return None, None

            # Scale continuous features
            scaler_cont = StandardScaler()
            scaler_y = StandardScaler()

            X_cont_train_scaled = scaler_cont.fit_transform(
                X_cont_train.reshape(-1, X_cont_train.shape[-1])
            ).reshape(X_cont_train.shape)

            X_cont_val_scaled = scaler_cont.transform(
                X_cont_val.reshape(-1, X_cont_val.shape[-1])
            ).reshape(X_cont_val.shape)

            y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).flatten()

            # Build TFT model
            hidden_size = 64
            num_cont_features = len(continuous_features)
            num_cat_features = len(categorical_features)

            # Inputs
            cont_inputs = Input(
                shape=(sequence_length, num_cont_features), name="continuous"
            )
            cat_inputs = Input(
                shape=(sequence_length, num_cat_features), name="categorical"
            )

            # Embed categorical features
            cat_embedded = Dense(8, activation="relu")(cat_inputs)

            # Combine continuous and categorical
            combined = Concatenate(axis=-1)([cont_inputs, cat_embedded])

            # Variable Selection Network
            vsn = VariableSelectionNetwork(num_cont_features + 8, hidden_size)
            selected_features, variable_weights = vsn(combined)

            # Encoder-Decoder with attention
            # Encoder
            encoder_lstm = LSTM(hidden_size, return_sequences=True, return_state=True)
            encoder_outputs, state_h, state_c = encoder_lstm(selected_features)

            # Self-attention
            attention = MultiHeadAttention(num_heads=4, key_dim=hidden_size // 4)
            attention_output = attention(encoder_outputs, encoder_outputs)

            # Gated Residual Network
            grn = GatedResidualNetwork(hidden_size)
            grn_output = grn(attention_output)

            # Global pooling and output
            pooled = GlobalAveragePooling1D()(grn_output)

            # Final prediction layers
            dense1 = Dense(hidden_size, activation="relu")(pooled)
            dropout1 = Dropout(0.3)(dense1)
            dense2 = Dense(hidden_size // 2, activation="relu")(dropout1)
            dropout2 = Dropout(0.2)(dense2)
            output = Dense(1)(dropout2)

            # Create model
            model = Model(inputs=[cont_inputs, cat_inputs], outputs=output)

            model.compile(
                optimizer=Adam(learning_rate=0.001), loss="mse", metrics=["mae"]
            )

            # Train model
            callbacks = [
                EarlyStopping(
                    monitor="val_loss", patience=10, restore_best_weights=True
                ),
                ReduceLROnPlateau(
                    monitor="val_loss", factor=0.5, patience=5, min_lr=1e-6
                ),
            ]

            history = model.fit(
                [X_cont_train_scaled, X_cat_train],
                y_train_scaled,
                validation_data=(
                    [X_cont_val_scaled, X_cat_val],
                    scaler_y.transform(y_val.reshape(-1, 1)).flatten(),
                ),
                epochs=epochs,
                batch_size=32,
                callbacks=callbacks,
                verbose=0,
            )

            # Make predictions
            y_pred_scaled = model.predict([X_cont_val_scaled, X_cat_val], verbose=0)
            y_pred = scaler_y.inverse_transform(y_pred_scaled).flatten()

            training_time = time.time() - start_time

            self.models["TFT_Advanced"] = model
            self.results["TFT_Advanced"] = {
                "predictions": y_pred,
                "actual": y_val,
                "weights": val_weights,
                "training_time": training_time,
                "model_type": "Neural Network",
                "history": history,
                "variable_weights": None,  # Would extract from model
                "attention_weights": None,  # Would extract from model
            }

            print(f"Advanced TFT model trained in {training_time:.2f} seconds")
            return model, y_pred

        except Exception as e:
            print(f"Advanced TFT training failed: {e}")
            return None, None

    def ensemble_deep_learning_model(self, sequence_length=5, epochs=40):
        """Ensemble of multiple deep learning architectures (LSTM, GRU, CNN, and attention branches)"""
        print("=== TRAINING ENSEMBLE DEEP LEARNING MODEL ===")
        start_time = time.time()

        try:
            # Prepare sequences
            features = [
                "Temperature",
                "Fuel_Price",
                "CPI",
                "Unemployment",
                "IsHoliday",
                "Total_MarkDown",
            ]

            def create_sequences(data, seq_len, features):
                X, y, weights = [], [], []

                # Aggregate by store and date for efficiency
                agg_data = (
                    data.groupby(["Store", "Date"])
                    .agg(
                        {
                            "Weekly_Sales": "sum",
                            **{
                                feat: "mean" if feat != "IsHoliday" else "max"
                                for feat in features
                            },
                            "Holiday_Weight": "max",
                        }
                    )
                    .reset_index()
                )

                for store in agg_data["Store"].unique():
                    store_data = agg_data[agg_data["Store"] == store].sort_values(
                        "Date"
                    )

                    if len(store_data) < seq_len + 1:
                        continue

                    for i in range(seq_len, len(store_data)):
                        X.append(store_data[features].iloc[i - seq_len : i].values)
                        y.append(store_data["Weekly_Sales"].iloc[i])
                        weights.append(store_data["Holiday_Weight"].iloc[i])

                return np.array(X), np.array(y), np.array(weights)

            X_train, y_train, train_weights = create_sequences(
                self.train_data, sequence_length, features
            )
            X_val, y_val, val_weights = create_sequences(
                self.val_data, sequence_length, features
            )

            if len(X_train) == 0 or len(X_val) == 0:
                print("Insufficient data for ensemble model")
                return None, None

            # Scale data
            scaler_X = StandardScaler()
            scaler_y = StandardScaler()

            X_train_scaled = scaler_X.fit_transform(
                X_train.reshape(-1, X_train.shape[-1])
            ).reshape(X_train.shape)
            X_val_scaled = scaler_X.transform(
                X_val.reshape(-1, X_val.shape[-1])
            ).reshape(X_val.shape)
            y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).flatten()

            # Define multiple architectures
            def create_lstm_branch(inputs):
                x = LSTM(64, return_sequences=True)(inputs)
                x = Dropout(0.3)(x)
                x = LSTM(32, return_sequences=False)(x)
                x = Dropout(0.2)(x)
                return Dense(16, activation="relu")(x)

            def create_gru_branch(inputs):
                x = GRU(64, return_sequences=True)(inputs)
                x = Dropout(0.3)(x)
                x = GRU(32, return_sequences=False)(x)
                x = Dropout(0.2)(x)
                return Dense(16, activation="relu")(x)

            def create_cnn_branch(inputs):
                # Layer choices are adjusted so that short sequences remain valid
                if sequence_length <= 3:
                    # For very short sequences, use single conv layer without pooling
                    x = Conv1D(
                        filters=32, kernel_size=2, activation="relu", padding="same"
                    )(inputs)
                    x = GlobalAveragePooling1D()(x)
                    return Dense(16, activation="relu")(x)
                else:
                    # For longer sequences, use the original approach but with padding
                    x = Conv1D(
                        filters=64, kernel_size=3, activation="relu", padding="same"
                    )(inputs)

                    # Only apply pooling if sequence length allows it
                    if sequence_length > 4:
                        x = MaxPooling1D(pool_size=2)(x)

                    # Adjust kernel size for second conv layer based on remaining sequence length
                    remaining_length = (
                        sequence_length // 2 if sequence_length > 4 else sequence_length
                    )
                    kernel_size = min(3, remaining_length)

                    if kernel_size >= 2:
                        x = Conv1D(
                            filters=32,
                            kernel_size=kernel_size,
                            activation="relu",
                            padding="same",
                        )(x)

                    x = GlobalAveragePooling1D()(x)
                    return Dense(16, activation="relu")(x)

            def create_attention_branch(inputs):
                # Derive key_dim from the number of features, clamped to the range [8, 32]
                key_dim = min(32, max(8, len(features) // 2))
                x = MultiHeadAttention(num_heads=4, key_dim=key_dim)(inputs, inputs)
                x = LayerNormalization()(x)
                x = GlobalAveragePooling1D()(x)
                return Dense(16, activation="relu")(x)

            # Input layer
            inputs = Input(shape=(sequence_length, len(features)))

            # Create branches
            lstm_branch = create_lstm_branch(inputs)
            gru_branch = create_gru_branch(inputs)
            cnn_branch = create_cnn_branch(inputs)
            attention_branch = create_attention_branch(inputs)

            # Combine branches
            combined = Concatenate()(
                [lstm_branch, gru_branch, cnn_branch, attention_branch]
            )

            # Final layers
            x = Dense(64, activation="relu")(combined)
            x = BatchNormalization()(x)
            x = Dropout(0.4)(x)
            x = Dense(32, activation="relu")(x)
            x = Dropout(0.3)(x)
            outputs = Dense(1)(x)

            # Create and compile model
            model = Model(inputs=inputs, outputs=outputs)
            model.compile(
                optimizer=Adam(learning_rate=0.001), loss="mse", metrics=["mae"]
            )

            print(f"Model input shape: {inputs.shape}")
            print(f"Sequence length: {sequence_length}, Features: {len(features)}")

            # Train model
            callbacks = [
                EarlyStopping(
                    monitor="val_loss", patience=15, restore_best_weights=True
                ),
                ReduceLROnPlateau(
                    monitor="val_loss", factor=0.5, patience=7, min_lr=1e-6
                ),
            ]

            history = model.fit(
                X_train_scaled,
                y_train_scaled,
                validation_data=(
                    X_val_scaled,
                    scaler_y.transform(y_val.reshape(-1, 1)).flatten(),
                ),
                epochs=epochs,
                batch_size=64,
                callbacks=callbacks,
                verbose=0,
            )

            # Make predictions
            y_pred_scaled = model.predict(X_val_scaled, verbose=0)
            y_pred = scaler_y.inverse_transform(y_pred_scaled).flatten()

            training_time = time.time() - start_time

            self.models["EnsembleDeep"] = model
            self.results["EnsembleDeep"] = {
                "predictions": y_pred,
                "actual": y_val,
                "weights": val_weights,
                "training_time": training_time,
                "model_type": "Neural Network",
                "history": history,
                "architecture": "LSTM+GRU+CNN+Attention",
            }

            print(
                f"Ensemble Deep Learning model trained in {training_time:.2f} seconds"
            )
            return model, y_pred

        except Exception as e:
            print(f"Ensemble Deep Learning training failed: {e}")
            return None, None

    def neural_ode_model(self, sequence_length=5, epochs=30):
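        """Neural ODE-style model: LSTM encoder, residual Euler-step blocks, GRU aggregation.

        Uses the same features, sequence creation, and scaling as the TFT and
        ensemble deep learning models so that results are comparable.
        """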
        print("=== TRAINING NEURAL ODE MODEL (CORRECTED) ===")
        start_time = time.time()

        try:
            # Use same feature set as other models for consistency
            features = [
                "Temperature",
                "Fuel_Price",
                "CPI",
                "Unemployment",
                "IsHoliday",
                "Total_MarkDown",  # Added this feature like other models
            ]

            # Use the SAME sequence creation logic as ensemble model
            def create_sequences(data, seq_len, features):
                X, y, weights = [], [], []

                # Aggregate by store and date for efficiency (SAME AS ENSEMBLE)
                agg_data = (
                    data.groupby(["Store", "Date"])
                    .agg(
                        {
                            "Weekly_Sales": "sum",
                            **{
                                feat: "mean" if feat != "IsHoliday" else "max"
                                for feat in features
                            },
                            "Holiday_Weight": "max",
                        }
                    )
                    .reset_index()
                )

                for store in agg_data["Store"].unique():
                    store_data = agg_data[agg_data["Store"] == store].sort_values(
                        "Date"
                    )

                    if len(store_data) < seq_len + 1:
                        continue

                    for i in range(seq_len, len(store_data)):
                        X.append(store_data[features].iloc[i - seq_len : i].values)
                        y.append(store_data["Weekly_Sales"].iloc[i])
                        weights.append(store_data["Holiday_Weight"].iloc[i])

                return np.array(X), np.array(y), np.array(weights)

            # Create training and validation sequences (SAME AS OTHER MODELS)
            X_train, y_train, train_weights = create_sequences(
                self.train_data, sequence_length, features
            )
            X_val, y_val, val_weights = create_sequences(
                self.val_data, sequence_length, features
            )

            print(f"Training sequences: {len(X_train)}")
            print(f"Validation sequences: {len(X_val)}")

            if len(X_train) == 0 or len(X_val) == 0:
                print("Insufficient data for Neural ODE")
                return None, None

            # Scale data (SAME AS OTHER MODELS)
            scaler_X = StandardScaler()
            scaler_y = StandardScaler()

            X_train_scaled = scaler_X.fit_transform(
                X_train.reshape(-1, X_train.shape[-1])
            ).reshape(X_train.shape)
            X_val_scaled = scaler_X.transform(
                X_val.reshape(-1, X_val.shape[-1])
            ).reshape(X_val.shape)
            y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).flatten()

            # Neural ODE Architecture - IMPROVED
            inputs = Input(shape=(sequence_length, len(features)))

            # Hidden width shared by the LSTM encoder and the ODE residual blocks
            hidden_dim = 64

            # Process temporal sequences first
            x = LSTM(hidden_dim, return_sequences=True, return_state=False)(inputs)
            x = LayerNormalization()(x)

            # Neural ODE blocks - simulating continuous dynamics
            def ode_residual_block(x, step_size=0.1):
                """
                Simulates one step of ODE integration using residual connections
                dx/dt = f(x, t) approximated as x_{t+1} = x_t + step_size * f(x_t, t)
                """
                residual = x  # x_t

                # f(x_t, t) - the derivative function
                dx = Dense(
                    hidden_dim,
                    activation="tanh",
                    kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4),
                )(x)
                dx = Dropout(0.1)(dx)  # Regularization
                dx = Dense(
                    hidden_dim,
                    activation="tanh",
                    kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4),
                )(dx)

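                # Euler-style update. Note: the step_size argument is not applied here,
                # so the block computes x_{t+1} = x_t + f(x_t), i.e. an effective step size of 1.
                # Scaling dx by step_size (fixed or learnable) is a possible extension.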
                x_new = Add()([residual, dx])

                # Normalize to prevent exploding gradients
                return LayerNormalization()(x_new)

            # Apply multiple ODE integration steps
            num_ode_steps = 6  # Simulate 6 time steps of continuous dynamics
            for step in range(num_ode_steps):
                x = ode_residual_block(x, step_size=0.1)

            # Additional processing for better temporal modeling
            x = GRU(32, return_sequences=False)(x)  # Final temporal aggregation
            x = Dropout(0.3)(x)

            # Output layers
            x = Dense(64, activation="relu")(x)
            x = BatchNormalization()(x)
            x = Dropout(0.4)(x)
            x = Dense(32, activation="relu")(x)
            x = Dropout(0.3)(x)
            outputs = Dense(1)(x)

            # Create and compile model
            model = Model(inputs=inputs, outputs=outputs)
            model.compile(
                optimizer=Adam(learning_rate=0.001), loss="mse", metrics=["mae"]
            )

            print(f"Model input shape: {inputs.shape}")
            print(f"Sequence length: {sequence_length}, Features: {len(features)}")

            # Train model with SAME callbacks as other models
            callbacks = [
                EarlyStopping(
                    monitor="val_loss", patience=15, restore_best_weights=True
                ),
                ReduceLROnPlateau(
                    monitor="val_loss", factor=0.5, patience=7, min_lr=1e-6
                ),
            ]

            history = model.fit(
                X_train_scaled,
                y_train_scaled,
                validation_data=(
                    X_val_scaled,
                    scaler_y.transform(y_val.reshape(-1, 1)).flatten(),
                ),
                epochs=epochs,
                batch_size=64,  # Same batch size as ensemble model
                callbacks=callbacks,
                verbose=0,
            )

            # Make predictions
            y_pred_scaled = model.predict(X_val_scaled, verbose=0)
            y_pred = scaler_y.inverse_transform(y_pred_scaled).flatten()

            training_time = time.time() - start_time

            # Store results in SAME format as other models
            self.models["NeuralODE"] = model
            self.results["NeuralODE"] = {
                "predictions": y_pred,
                "actual": y_val,
                "weights": val_weights,
                "training_time": training_time,
                "model_type": "Neural Network",
                "history": history,
                "architecture": "LSTM+ODE_Blocks+GRU",  # Added architecture info
            }

            print(f"Neural ODE model trained in {training_time:.2f} seconds")
            print(f"Predictions shape: {y_pred.shape}, Actual shape: {y_val.shape}")
            return model, y_pred

        except Exception as e:
            print(f"Neural ODE training failed: {e}")
            import traceback

            print(f"Detailed error: {traceback.format_exc()}")
            return None, None

    def state_space_model(self):
        """State space model (SARIMAX) fitted on weekly sales summed across all stores."""
        print("=== TRAINING STATE SPACE MODEL (SARIMAX) - COMPLETE FIXED ===")
        start_time = time.time()

        try:
            # Prepare time series data with proper data type handling
            ts_data = (
                self.train_data.groupby("Date")
                .agg(
                    {
                        "Weekly_Sales": "sum",
                        "Temperature": "mean",
                        "Fuel_Price": "mean",
                        "CPI": "mean",
                        "Unemployment": "mean",
                        "IsHoliday": "max",
                    }
                )
                .reset_index()
            )

            # Debug: Check for data issues
            print(f"Training data shape before processing: {ts_data.shape}")
            print(f"Date range: {ts_data['Date'].min()} to {ts_data['Date'].max()}")

            # Check for missing values
            missing_counts = ts_data.isnull().sum()
            if missing_counts.any():
                print("Missing values found:")
                print(missing_counts[missing_counts > 0])

            # Ensure Date is datetime
            ts_data["Date"] = pd.to_datetime(ts_data["Date"])
            ts_data.set_index("Date", inplace=True)
            ts_data = ts_data.sort_index()

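            # Build a regular weekly index with resample rather than asfreq:
            # resample bins rows into calendar weeks and aggregates them, whereas
            # asfreq reindexes to exact week-end dates and leaves NaN rows
            # wherever the source dates do not fall on those dates.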
            print("Using resample instead of asfreq...")
            ts_data = ts_data.resample("W").agg(
                {
                    "Weekly_Sales": "sum",
                    "Temperature": "mean",
                    "Fuel_Price": "mean",
                    "CPI": "mean",
                    "Unemployment": "mean",
                    "IsHoliday": "max",
                }
            )

            print(f"Shape after resample: {ts_data.shape}")
            print(f"NaN values after resample: {ts_data.isnull().sum().sum()}")

            # Handle missing values BEFORE data type conversion
            if ts_data.isnull().any().any():
                print("Filling missing values...")
                # fillna(method=...) is deprecated in recent pandas; use ffill()/bfill()
                ts_data = ts_data.ffill().bfill()

                # If still NaN, fill with column medians
                for col in ts_data.columns:
                    if ts_data[col].isnull().any():
                        median_val = ts_data[col].median()
                        ts_data[col] = ts_data[col].fillna(median_val)
                        print(f"Filled {col} with median: {median_val}")

            print(f"Shape after filling: {ts_data.shape}")

            # Convert columns to float64; errors='coerce' is used only as a
            # fallback for columns that are not already numeric
            exog_vars = [
                "Temperature",
                "Fuel_Price",
                "CPI",
                "Unemployment",
                "IsHoliday",
            ]

            print("Converting data types safely...")
            for col in ["Weekly_Sales"] + exog_vars:
                print(f"Converting {col}: {ts_data[col].dtype} -> float64")

                if col == "IsHoliday":
                    # Handle boolean column specially
                    if ts_data[col].dtype == "bool":
                        ts_data[col] = ts_data[col].astype(int).astype("float64")
                    elif ts_data[col].dtype == "object":
                        # Convert True/False strings to 1/0
                        ts_data[col] = ts_data[col].map(
                            {"True": 1, "False": 0, True: 1, False: 0}
                        )
                        ts_data[col] = ts_data[col].astype("float64")
                    else:
                        ts_data[col] = ts_data[col].astype("float64")
                else:
                    # For numeric columns, ensure they're float
                    if pd.api.types.is_numeric_dtype(ts_data[col]):
                        ts_data[col] = ts_data[col].astype("float64")
                    else:
                        # Only use coerce as last resort and fill immediately
                        ts_data[col] = pd.to_numeric(ts_data[col], errors="coerce")
                        nan_count = ts_data[col].isnull().sum()

                        if nan_count > 0:
                            print(
                                f"  WARNING: {nan_count} values converted to NaN in {col}"
                            )
                            median_val = ts_data[col].median()
                            ts_data[col] = ts_data[col].fillna(median_val)
                            print(f"  Filled with median: {median_val}")

                        ts_data[col] = ts_data[col].astype("float64")

            # Final check - should have NO NaNs and same shape
            print(f"Final training shape: {ts_data.shape}")
            print(f"Final training NaN count: {ts_data.isnull().sum().sum()}")
            print(f"Training data types: {ts_data.dtypes}")

            if ts_data.empty:
                print("ERROR: Training data is empty!")
                return None, None

            if ts_data.isnull().any().any():
                print("ERROR: Still have NaN values in training data!")
                print(ts_data.isnull().sum())
                return None, None

            # Verify we have enough data
            if len(ts_data) < 10:
                print("ERROR: Insufficient training data for SARIMAX model")
                return None, None

            # Fit SARIMAX model with error handling
            try:
                model = SARIMAX(
                    ts_data["Weekly_Sales"].values,  # Convert to numpy array explicitly
                    exog=ts_data[exog_vars].values,  # Convert to numpy array explicitly
                    order=(1, 1, 1),  # ARIMA order
                    seasonal_order=(1, 1, 1, 52),  # Seasonal order: 52-week (annual) period for weekly data
                    enforce_stationarity=False,
                    enforce_invertibility=False,
                )

                print("Fitting SARIMAX model...")
                fitted_model = model.fit(disp=False, maxiter=100)

            except Exception as model_error:
                print(
                    f"SARIMAX fitting failed with seasonal order, trying simpler model: {model_error}"
                )
                # Try simpler model without seasonal component
                model = SARIMAX(
                    ts_data["Weekly_Sales"].values,
                    exog=ts_data[exog_vars].values,
                    order=(1, 1, 1),  # ARIMA order only
                    enforce_stationarity=False,
                    enforce_invertibility=False,
                )
                fitted_model = model.fit(disp=False, maxiter=100)

            # Prepare validation data with same processing
            val_ts_data = (
                self.val_data.groupby("Date")
                .agg(
                    {
                        "Weekly_Sales": "sum",
                        "Temperature": "mean",
                        "Fuel_Price": "mean",
                        "CPI": "mean",
                        "Unemployment": "mean",
                        "IsHoliday": "max",
                    }
                )
                .reset_index()
            )

            print(f"Validation data shape before processing: {val_ts_data.shape}")

            # Process validation data - USE SAME SAFE APPROACH AS TRAINING
            val_ts_data["Date"] = pd.to_datetime(val_ts_data["Date"])
            val_ts_data.set_index("Date", inplace=True)
            val_ts_data = val_ts_data.sort_index()

            # *** USE RESAMPLE FOR VALIDATION TOO ***
            print("Using resample for validation data...")
            val_ts_data = val_ts_data.resample("W").agg(
                {
                    "Weekly_Sales": "sum",
                    "Temperature": "mean",
                    "Fuel_Price": "mean",
                    "CPI": "mean",
                    "Unemployment": "mean",
                    "IsHoliday": "max",
                }
            )

            print(f"Validation shape after resample: {val_ts_data.shape}")
            print(
                f"Validation NaN values after resample: {val_ts_data.isnull().sum().sum()}"
            )

            # Handle missing values in validation data BEFORE type conversion
            if val_ts_data.isnull().any().any():
                print("Filling validation missing values...")
                # fillna(method=...) is deprecated in recent pandas; use ffill()/bfill()
                val_ts_data = val_ts_data.ffill().bfill()

                # If still NaN, fill with column medians
                for col in val_ts_data.columns:
                    if val_ts_data[col].isnull().any():
                        median_val = val_ts_data[col].median()
                        val_ts_data[col] = val_ts_data[col].fillna(median_val)
                        print(f"Filled validation {col} with median: {median_val}")

            # Convert validation columns to float64; errors='coerce' is used only
            # as a fallback for columns that are not already numeric
            print("Converting validation data types safely...")
            for col in ["Weekly_Sales"] + exog_vars:
                print(
                    f"Converting validation {col}: {val_ts_data[col].dtype} -> float64"
                )

                if col == "IsHoliday":
                    # Handle boolean column specially
                    if val_ts_data[col].dtype == "bool":
                        val_ts_data[col] = (
                            val_ts_data[col].astype(int).astype("float64")
                        )
                    elif val_ts_data[col].dtype == "object":
                        # Convert True/False strings to 1/0
                        val_ts_data[col] = val_ts_data[col].map(
                            {"True": 1, "False": 0, True: 1, False: 0}
                        )
                        val_ts_data[col] = val_ts_data[col].astype("float64")
                    else:
                        val_ts_data[col] = val_ts_data[col].astype("float64")
                else:
                    # For numeric columns, ensure they're float
                    if pd.api.types.is_numeric_dtype(val_ts_data[col]):
                        val_ts_data[col] = val_ts_data[col].astype("float64")
                    else:
                        # Only use coerce as last resort and fill immediately
                        val_ts_data[col] = pd.to_numeric(
                            val_ts_data[col], errors="coerce"
                        )
                        nan_count = val_ts_data[col].isnull().sum()

                        if nan_count > 0:
                            print(
                                f"  WARNING: {nan_count} validation values converted to NaN in {col}"
                            )
                            median_val = val_ts_data[col].median()
                            val_ts_data[col] = val_ts_data[col].fillna(median_val)
                            print(f"  Filled with median: {median_val}")

                        val_ts_data[col] = val_ts_data[col].astype("float64")

            # Final validation check
            print(f"Final validation shape: {val_ts_data.shape}")
            print(f"Final validation NaN count: {val_ts_data.isnull().sum().sum()}")

            if val_ts_data.empty:
                print("ERROR: Validation data is empty!")
                return None, None

            if val_ts_data.isnull().any().any():
                print("ERROR: Still have NaN values in validation data!")
                print(val_ts_data.isnull().sum())
                return None, None

            # Make predictions
            forecast_steps = len(val_ts_data)
            if forecast_steps == 0:
                print("ERROR: No validation data available")
                return None, None

            print(f"Forecasting {forecast_steps} steps...")

            # Ensure exogenous variables are numeric arrays
            exog_forecast = val_ts_data[exog_vars].values.astype("float64")

            # Check for any issues with exogenous data
            if np.isnan(exog_forecast).any():
                print("ERROR: NaN values in exogenous forecast data")
                return None, None

            forecast = fitted_model.forecast(steps=forecast_steps, exog=exog_forecast)

            training_time = time.time() - start_time

            # Store results
            self.models["SARIMAX"] = fitted_model
            self.results["SARIMAX"] = {
                "predictions": (
                    forecast.values if hasattr(forecast, "values") else forecast
                ),
                "actual": val_ts_data["Weekly_Sales"].values,
                "weights": np.ones(len(forecast)),
                "training_time": training_time,
                "model_type": "Statistical",
                "model_summary": str(
                    fitted_model.summary()
                ),  # Convert to string to avoid issues
            }

            print(f"SARIMAX model trained successfully in {training_time:.2f} seconds")
            print(f"Forecast shape: {forecast.shape}")
            print(f"Actual shape: {val_ts_data['Weekly_Sales'].values.shape}")

            return fitted_model, (
                forecast.values if hasattr(forecast, "values") else forecast
            )

        except Exception as e:
            print(f"SARIMAX training failed: {e}")
            import traceback

            print(f"Detailed error: {traceback.format_exc()}")
            return None, None

    def gaussian_process_model(self):
        """Gaussian Process for time series forecasting (simplified using sklearn)"""
        print("=== TRAINING GAUSSIAN PROCESS MODEL ===")
        start_time = time.time()

        try:
            from sklearn.gaussian_process import GaussianProcessRegressor
            from sklearn.gaussian_process.kernels import Matern, WhiteKernel

            # Prepare data
            features = ["Temperature", "Fuel_Price", "CPI", "Unemployment", "IsHoliday"]

            # Aggregate by date
            train_agg = (
                self.train_data.groupby("Date")
                .agg(
                    {
                        "Weekly_Sales": "sum",
                        **{
                            feat: "mean" if feat != "IsHoliday" else "max"
                            for feat in features
                        },
                    }
                )
                .reset_index()
            )

            val_agg = (
                self.val_data.groupby("Date")
                .agg(
                    {
                        "Weekly_Sales": "sum",
                        **{
                            feat: "mean" if feat != "IsHoliday" else "max"
                            for feat in features
                        },
                    }
                )
                .reset_index()
            )

            # Prepare features and target
            X_train = train_agg[features].values
            y_train = train_agg["Weekly_Sales"].values
            X_val = val_agg[features].values
            y_val = val_agg["Weekly_Sales"].values

            # Scale features
            scaler_X = StandardScaler()
            scaler_y = StandardScaler()

            X_train_scaled = scaler_X.fit_transform(X_train)
            X_val_scaled = scaler_X.transform(X_val)
            y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).flatten()

            # Define kernel
            kernel = Matern(length_scale=1.0, nu=2.5) + WhiteKernel(noise_level=0.1)

            # Create and fit GP model
            model = GaussianProcessRegressor(
                kernel=kernel,
                alpha=1e-6,
                normalize_y=False,
                n_restarts_optimizer=3,
                random_state=42,
            )

            # Fit model (random subsample of at most 200 points for computational efficiency)
            if len(X_train_scaled) > 200:
                rng = np.random.default_rng(42)  # seeded so the subsample is reproducible
                indices = rng.choice(len(X_train_scaled), 200, replace=False)
                X_train_sub = X_train_scaled[indices]
                y_train_sub = y_train_scaled[indices]
            else:
                X_train_sub = X_train_scaled
                y_train_sub = y_train_scaled

            model.fit(X_train_sub, y_train_sub)

            # Make predictions with uncertainty
            y_pred_scaled, y_std_scaled = model.predict(X_val_scaled, return_std=True)

            # Denormalize
            y_pred = scaler_y.inverse_transform(y_pred_scaled.reshape(-1, 1)).flatten()
            y_std = y_std_scaled * scaler_y.scale_[0]

            training_time = time.time() - start_time

            self.models["GaussianProcess"] = model
            self.results["GaussianProcess"] = {
                "predictions": y_pred,
                "actual": y_val,
                "uncertainty": y_std,
                "weights": np.ones(len(y_pred)),
                "training_time": training_time,
                "model_type": "Probabilistic",
            }

            print(f"Gaussian Process model trained in {training_time:.2f} seconds")
            return model, y_pred

        except Exception as e:
            print(f"Gaussian Process training failed: {e}")
            return None, None


# Usage example and test functions
if __name__ == "__main__":

    from src.data_loader import WalmartDataLoader
    from src.data_processing import WalmartComprehensiveEDA
    from src.feature_engineering import WalmartFeatureEngineering

    data_loader = WalmartDataLoader()
    data_loader.load_data()
    train_data = data_loader.train_data
    test_data = data_loader.test_data
    features_data = data_loader.features_data
    stores_data = data_loader.stores_data

    # Assuming you have data from WalmartDataLoader
    eda = WalmartComprehensiveEDA(train_data, test_data, features_data, stores_data)
    merged_data = eda.merge_datasets()

    feature_eng = WalmartFeatureEngineering(merged_data)
    processed_data = feature_eng.create_walmart_features()
    processed_data = feature_eng.handle_missing_values()
    print("Feature Engineering class ready!")

    # Example usage:
    advanced_models = AdvancedWalmartForecastingModels(processed_data)
    train_data, val_data = advanced_models.prepare_data()

    # Train various models
    tft_model, tft_pred = advanced_models.temporal_fusion_transformer_advanced()
    ensemble_model, ensemble_pred = advanced_models.ensemble_deep_learning_model()
    ode_model, ode_pred = advanced_models.neural_ode_model()
    sarimax_model, sarimax_pred = advanced_models.state_space_model()
    gp_model, gp_pred = advanced_models.gaussian_process_model()

All datasets loaded successfully!
MERGING DATASETS
Merged training data shape: (421570, 16)
Date range: 2010-02-05 00:00:00 to 2012-10-26 00:00:00
Number of stores: 45
Number of departments: 81
=== WALMART-SPECIFIC FEATURE ENGINEERING ===
Feature engineering completed. New shape: (421570, 67)
Added 62 new features
=== HANDLING MISSING VALUES ===
Missing values handled. Remaining NaN: 0
Feature Engineering class ready!
=== PREPARING DATA FOR MODELING ===
Data split completed:
  - Training data: (397841, 67)
  - Validation data: (23729, 67)
  - Feature columns: ['IsHoliday', 'Temperature', 'Fuel_Price', 'MarkDown1', 'MarkDown2', 'MarkDown3', 'MarkDown4', 'MarkDown5', 'CPI', 'Unemployment', 'Type', 'Size', 'Holiday_Weight', 'Pre_Holiday', 'Post_Holiday', 'Year', 'Month', 'Week', 'Quarter', 'Month_sin', 'Month_cos', 'Week_sin', 'Week_cos', 'Sales_lag_1', 'Sales_lag_2', 'Sales_lag_4', 'Sales_lag_8', 'Sales_lag_12', 'Sales_lag_52', 'Sales_rolling_mean_4', 'Sales_rolling_std_4', 'Sales_rollin

In [3]:
advanced_models.results.keys()

dict_keys(['TFT_Advanced', 'EnsembleDeep', 'NeuralODE', 'SARIMAX', 'GaussianProcess'])
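
Each entry in `advanced_models.results` stores `predictions`, `actual`, and `weights`, which is enough to compute a weighted mean absolute error (WMAE) per model. The sketch below is one possible way to do that, not the class's own evaluation code. The helper names `weighted_mae` and `summarize_results` are hypothetical, and the toy dictionary stands in for the real results with made-up numbers. It assumes the weights follow the usual convention of counting holiday weeks more heavily.

```python
def weighted_mae(actual, predictions, weights):
    """Weighted mean absolute error: sum(w * |y - y_hat|) / sum(w)."""
    actual, predictions, weights = list(actual), list(predictions), list(weights)
    if not (len(actual) == len(predictions) == len(weights)):
        raise ValueError("actual, predictions and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_errors = sum(
        w * abs(a - p) for a, p, w in zip(actual, predictions, weights)
    )
    return float(weighted_errors / total_weight)


def summarize_results(results):
    """Return {model_name: WMAE} for a dict shaped like `advanced_models.results`."""
    return {
        name: weighted_mae(entry["actual"], entry["predictions"], entry["weights"])
        for name, entry in results.items()
    }


# Toy stand-in for `advanced_models.results` (illustrative values only).
toy_results = {
    "ModelA": {
        "actual": [100.0, 200.0, 300.0],
        "predictions": [110.0, 190.0, 330.0],
        "weights": [1.0, 1.0, 5.0],  # last week treated as a holiday week
    },
}
scores = summarize_results(toy_results)
print(scores)
```

With real results the call would be `summarize_results(advanced_models.results)`. The helpers are written in plain Python, so they accept lists as well as NumPy arrays. Scores are only directly comparable between models evaluated at the same aggregation level: the sequence models above are validated per store and week, while the SARIMAX and Gaussian Process models are validated on totals across all stores with unit weights.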

In [6]:
advanced_models.results["TFT_Advanced"]["predictions"]

array([1468960.8 , 1568785.5 , 1401252.8 , 1536595.4 , 1534727.9 ,
       1527413.6 ,  531471.8 ,  523890.3 ,  523479.12, 2097619.5 ,
       2079817.2 , 2051390.6 ,  525402.25,  518914.25,  519263.7 ,
       1550326.  , 1528853.6 , 1508155.6 ,  667022.9 ,  670484.3 ,
        658965.06,  796802.8 ,  817215.6 ,  801260.7 ,  633969.1 ,
        635364.  ,  631166.3 , 1730265.6 , 1784494.6 , 1750054.4 ,
       1538699.  , 1536218.  , 1530461.5 , 1029654.25, 1018852.3 ,
       1020372.  , 2046897.2 , 2028255.2 , 2015602.  , 1664766.  ,
       1680090.1 , 1683354.9 ,  967529.7 ,  939231.4 ,  932945.44,
        580742.3 ,  582540.44,  580036.25,  981530.1 ,  992248.2 ,
       1002800.75, 1032213.56, 1013025.06,  979147.25, 1596465.6 ,
       1588624.  , 1566017.2 , 1669508.1 , 1718824.2 , 1728527.2 ,
        841552.4 ,  846652.56,  836320.2 , 1011496.25,  988235.8 ,
        977094.6 , 1329710.6 , 1328042.9 , 1330569.1 , 1556516.8 ,
       1539797.  , 1536859.8 ,  836382.25,  821295.6 ,  811896

In [7]:
advanced_models.results["TFT_Advanced"]["actual"]

array([1573072.81, 1508068.77, 1493659.74, 1900745.13, 1847990.41,
       1834458.35,  410804.39,  424513.08,  405432.7 , 2133026.07,
       2097266.85, 2149594.46,  325345.41,  313358.15,  319550.77,
       1459396.84, 1436883.99, 1431426.34,  503463.93,  516424.83,
        495543.28,  927511.99,  900309.75,  891671.44,  558464.8 ,
        542009.46,  549731.49, 1713889.11, 1734834.82, 1744349.05,
       1311965.09, 1232073.18, 1200729.45,  934917.47,  960945.43,
        974697.6 , 1999079.44, 2018010.15, 2035189.66, 1639585.61,
       1590274.72, 1704357.62,  551799.63,  555652.77,  558473.6 ,
        491817.19,  577198.97,  475770.14,  919878.34,  957356.84,
        943465.29, 1074079.  , 1048706.75, 1127516.25, 1352809.5 ,
       1321102.35, 1322117.96, 2162951.36, 1999363.49, 2031650.55,
        653043.44,  641368.14,  675202.87, 1004039.84,  978027.95,
       1094422.69, 1412925.25, 1363155.77, 1347454.59, 1416301.17,
       1255414.84, 1307182.29,  697317.41,  685531.85,  688940