# Walmart M5 Forecasting Solution - Approach 3 & Results

## 1. Objective
To forecast 28 days of future sales for each Walmart item using a Sequence-to-Sequence (Seq2Seq) model with GRU layers. This approach captures temporal dependencies and generates the full forecast in a single forward pass.

---

## 2. Data Loading & Preprocessing

- Source:
  - `sales_train_evaluation.csv`

- Process:
  - Kept only the day columns from `d_351` onwards (`d_cols[350:]`) to reduce the memory footprint
  - Transposed so each row = day, each column = item
  - Scaled to [0, 1] with `MinMaxScaler`, fitted on the full matrix; each item (column) is scaled by its own minimum and maximum
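
To make the scaling step concrete, the sketch below reproduces column-wise min-max arithmetic in plain NumPy on a made-up 4-day, 2-item matrix. The helper names `minmax_fit`, `minmax_transform`, and `minmax_inverse` are hypothetical and exist only for this illustration; the notebook itself uses scikit-learn's `MinMaxScaler`, which with default settings applies the same per-column formula.

```python
import numpy as np

def minmax_fit(matrix):
    """Return per-column minimum and range (days are rows, items are columns)."""
    col_min = matrix.min(axis=0)
    col_range = matrix.max(axis=0) - col_min
    # A constant column has zero range; use 1 to avoid dividing by zero.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return col_min, col_range

def minmax_transform(matrix, col_min, col_range):
    """Map every column onto [0, 1] using its own statistics."""
    return (matrix - col_min) / col_range

def minmax_inverse(scaled, col_min, col_range):
    """Undo the scaling, as is needed before writing forecasts in sales units."""
    return scaled * col_range + col_min

# Toy data (illustrative only): 4 days, 2 items with very different volumes.
toy_sales = np.array([[0, 10], [2, 30], [4, 20], [1, 50]], dtype=np.float32)
col_min, col_range = minmax_fit(toy_sales)
scaled = minmax_transform(toy_sales, col_min, col_range)
restored = minmax_inverse(scaled, col_min, col_range)
print(scaled)
```

Because every item has its own minimum and maximum, a slow seller and a fast seller both span the full [0, 1] range after scaling, and the same fitted statistics are required to convert predictions back to unit sales.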

---

## 3. Feature Engineering

- Inputs: 14 most recent days of sales data per item
- Outputs: next 28 days of sales to be predicted
- Created sliding windows for all available history to form `(X, y)` pairs:
  - `X.shape = (samples, 14, num_items)`
  - `y.shape = (samples, 28, num_items)`
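
The windowing can be checked on a small made-up matrix. The sketch below wraps the same loop bounds the notebook uses in a hypothetical helper, `make_windows`, and runs it on 50 days of 3 toy items so the resulting shapes are easy to verify by hand.

```python
import numpy as np

def make_windows(series, time_steps, forecast_steps):
    """Slice a (days, items) matrix into encoder inputs and decoder targets."""
    X, y = [], []
    for i in range(time_steps, len(series) - forecast_steps):
        X.append(series[i - time_steps:i])       # the 14 days before day i
        y.append(series[i:i + forecast_steps])   # the 28 days starting at day i
    return np.array(X, dtype=np.float32), np.array(y, dtype=np.float32)

# Toy data (illustrative only): 50 days, 3 items.
toy = np.arange(50 * 3, dtype=np.float32).reshape(50, 3)
X, y = make_windows(toy, time_steps=14, forecast_steps=28)
print(X.shape, y.shape)  # (8, 14, 3) (8, 28, 3)
```

Each target window starts on the day immediately after its input window ends. Note that the loop's upper bound leaves out the one window whose target ends on the final day; using `len(series) - forecast_steps + 1` would include it.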

---

## 4. Model Architecture

Implemented a basic encoder-decoder GRU-based Seq2Seq network:

| Component       | Details                                                                  |
|-----------------|--------------------------------------------------------------------------|
| Encoder Input   | Shape: (14, num_items)                                                   |
| Encoder GRU     | GRU(64 units), output: (64,)                                             |
| Repeat Vector   | RepeatVector(28), output: (28, 64)                                       |
| Decoder GRU     | GRU(64 units, return_sequences=True), output: (28, 64)                   |
| TimeDistributed | Dense(num_items) applied to each of the 28 days, output: (28, num_items) |

- Loss: Mean Squared Error (MSE)
- Optimizer: Adam
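
The size of this network can be estimated by hand. The sketch below is a rough count under two assumptions that are not stated in the summary: that the GRU layers use Keras' default `reset_after=True` configuration (two bias vectors per gate), and that `num_items` is 30490, the number of series the submission cell slices to. The helper functions are hypothetical.

```python
def gru_params(input_dim, units):
    """Weights of a GRU with reset_after=True: 3 gates, each with
    input weights, recurrent weights, and two bias vectors."""
    return 3 * (input_dim * units + units * units + 2 * units)

def dense_params(input_dim, output_dim):
    """Weights plus biases of a fully connected layer."""
    return input_dim * output_dim + output_dim

num_items = 30490  # assumption: number of item-level series
units = 64

encoder = gru_params(num_items, units)   # reads (14, num_items)
decoder = gru_params(units, units)       # reads the repeated (28, 64) vector
head = dense_params(units, num_items)    # shared across all 28 time steps
total = encoder + decoder + head
print(encoder, decoder, head, total)  # 5866752 24960 1981850 7873562
```

`RepeatVector` has no weights, and `TimeDistributed` reuses one `Dense` layer for every step, so it is counted once. Under these assumptions nearly all parameters sit in the encoder's input weights and the output layer, because both scale with `num_items`.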

---

## 5. Training Strategy

- Trained for 10 epochs
- Batch size: 32
- No teacher forcing or autoregressive decoding: the decoder receives the same repeated encoder output at every step
- All 28 days predicted in one forward pass
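
As a sanity check on the training setup, the number of batches per epoch can be worked out from the window arithmetic. The sketch assumes `sales_train_evaluation.csv` contains 1941 day columns (`d_1` to `d_1941`); that figure is not stated in this write-up.

```python
import math

total_days = 1941    # assumption: d_1 .. d_1941 in sales_train_evaluation.csv
start_day = 350      # the first 350 day columns are dropped
time_steps = 14
forecast_steps = 28
batch_size = 32

days_kept = total_days - start_day
# Same bounds as the notebook loop: range(time_steps, days_kept - forecast_steps)
samples = len(range(time_steps, days_kept - forecast_steps))
steps_per_epoch = math.ceil(samples / batch_size)
print(days_kept, samples, steps_per_epoch)  # 1591 1549 49
```

The result of 49 steps agrees with the `49/49` progress counter in the training log, which suggests the assumed day count matches the data actually used.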

---

## 6. Forecasting & Submission

- Final forecast generated using the most recent 14 days of actual sales
- Inverse transformed predictions
- Transposed to match Kaggle submission format
- Duplicated rows for validation and evaluation sets

---

## 7. Kaggle Submission Scores

| Model       | Private score (WRMSSE) | Public score (WRMSSE) |
|-------------|------------------------|-----------------------|
| Seq2Seq GRU | 0.99343                | 1.07762               |

**Conclusion:** The simple Seq2Seq GRU model provides a baseline for multi-step deep learning forecasting, but underperforms GRU and LightGBM approaches from earlier experiments. Future improvements could include teacher forcing, attention, or autoregressive decoding.



## Seq2Seq GRU using Transposed Sales Matrix

```python
# M5 Forecasting - Seq2Seq GRU using Transposed Sales Matrix

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Model
from keras.layers import Input, GRU, Dense, RepeatVector, TimeDistributed
import time

# --- CONFIGURATION --- #
data_path = "/kaggle/input/m5-forecasting-accuracy/"
time_steps = 14       # encoder input length in days
forecast_steps = 28   # forecast horizon in days
start_day = 350       # number of leading day columns to drop

# --- LOAD AND TRANSPOSE SALES DATA --- #
df = pd.read_csv(data_path + "sales_train_evaluation.csv")
d_cols = [col for col in df.columns if col.startswith('d_')]
df = df[['id'] + d_cols[start_day:]]  # d_cols[350] is 'd_351'
df = df.set_index('id').T  # days as rows, items as columns

# --- CAST TO FLOAT32 TO SAVE MEMORY --- #
df = df.astype(np.float32)

# --- NORMALIZE (each item column is scaled independently) --- #
scaler = MinMaxScaler()
df_scaled = scaler.fit_transform(df)
```



## Data Preparation & Model Definition

```python
# --- PREPARE SEQ2SEQ DATA (INPUT/OUTPUT PAIRS) --- #
X, y = [], []
for i in range(time_steps, len(df_scaled) - forecast_steps):
    X.append(df_scaled[i - time_steps:i])
    y.append(df_scaled[i:i + forecast_steps])
X = np.array(X, dtype=np.float32)
y = np.array(y, dtype=np.float32)

# --- DEFINE SIMPLE GRU SEQ2SEQ MODEL --- #
input_dim = X.shape[2]
output_dim = y.shape[2]

encoder_input = Input(shape=(time_steps, input_dim))
encoder = GRU(64, return_sequences=False)(encoder_input)
encoder_repeat = RepeatVector(forecast_steps)(encoder)

decoder = GRU(64, return_sequences=True)(encoder_repeat)
decoder_output = TimeDistributed(Dense(output_dim))(decoder)

model = Model(encoder_input, decoder_output)
model.compile(optimizer='adam', loss='mse')
model.summary()
```



## Model Training & Forecast

```python
# --- TRAIN MODEL --- #
model.fit(X, y, epochs=10, batch_size=32, verbose=1)

# --- FORECAST --- #
latest = df_scaled[-time_steps:].reshape(1, time_steps, input_dim)
pred_scaled = model.predict(latest).reshape(forecast_steps, input_dim)
pred_inverse = scaler.inverse_transform(pred_scaled)
```

```
Epoch 1/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 9s 74ms/step - loss: 0.0193
Epoch 2/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 4s 76ms/step - loss: 0.0150
Epoch 3/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 4s 75ms/step - loss: 0.0144
Epoch 4/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 4s 75ms/step - loss: 0.0141
Epoch 5/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 4s 83ms/step - loss: 0.0138
Epoch 6/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 4s 74ms/step - loss: 0.0137
Epoch 7/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 4s 78ms/step - loss: 0.0135
Epoch 8/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 4s 77ms/step - loss: 0.0134
Epoch 9/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 4s 78ms/step - loss: 0.0135
Epoch 10/10
49/49 ━━━━━━━━━━━━━━━━━━━━ 4s 76ms/step - loss: 0.0134
```

## Submission

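```python
# --- SUBMISSION --- #
sample_sub = pd.read_csv(data_path + "sample_submission.csv")
ids = sample_sub['id'].values[:30490]  # first half of the sample file

# pred_inverse is (28 days, items); transpose so each row is one item
submission = pd.DataFrame(pred_inverse.T, columns=[f"F{i+1}" for i in range(forecast_steps)])
submission.insert(0, 'id', ids)

# Duplicate the forecast so both the validation and evaluation rows are filled
submission = pd.concat([submission, submission], ignore_index=True)
submission['id'] = sample_sub['id']  # overwrite with the full id list
submission = submission[['id'] + [f"F{i+1}" for i in range(forecast_steps)]]

# File name and message match the cell output shown below
filename = f"simple_seq2seq_submission_{int(time.time())}.csv"
submission.to_csv(filename, index=False)
print(f"Simple Seq2Seq submission saved as {filename}")
```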

Simple Seq2Seq submission saved as simple_seq2seq_submission_1744991332.csv
