# M5 Forecasting - Accuracy
#### Estimate the unit sales of Walmart retail goods
How much camping gear will one store sell each month in a year? To the uninitiated, calculating sales at this level may seem as difficult as predicting the weather. Both types of forecasting rely on science and historical data. While a wrong weather forecast may result in you carrying around an umbrella on a sunny day, inaccurate business forecasts could result in actual or opportunity losses. In this competition, in addition to traditional forecasting methods, you’re also challenged to use machine learning to improve forecast accuracy.

Use hierarchical sales data from Walmart, the world’s largest company by revenue, to forecast daily sales for the next 28 days. The data covers stores in three US states (California, Texas, and Wisconsin) and includes item-level, department, product category, and store details. In addition, it has explanatory variables such as price, promotions, day of the week, and special events. Together, this robust dataset can be used to improve forecasting accuracy.

If successful, your work will continue to advance the theory and practice of forecasting. The methods used can be applied in various business areas, such as setting up appropriate inventory or service levels. Through its business support and training, the Makridakis Open Forecasting Center (MOFC) will help distribute the tools and knowledge so others can achieve more accurate and better calibrated forecasts, reduce waste, and appreciate uncertainty and its risk implications.

The original dataset and information can be found on Kaggle here:
https://www.kaggle.com/competitions/m5-forecasting-accuracy

In [1]:
import pandas as pd
import numpy as np

# Paths to the competition input files on Kaggle
calendar = r'/kaggle/input/m5-forecasting-accuracy/calendar.csv'
sample = r'/kaggle/input/m5-forecasting-accuracy/sample_submission.csv'
sellData = r'/kaggle/input/m5-forecasting-accuracy/sell_prices.csv'
trainVal = r'/kaggle/input/m5-forecasting-accuracy/sales_train_validation.csv'
trainEval = r'/kaggle/input/m5-forecasting-accuracy/sales_train_evaluation.csv'

This competition uses the Weighted Root Mean Squared Scaled Error (WRMSSE).
Extensive details about the metric, scaling, and weighting can be found in the M5 Participants Guide.
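
To make the metric more concrete, here is a minimal sketch of RMSSE for a single series and of the weighted sum that combines per-series scores. It is an illustration, not the official scoring code: the helper names `rmsse` and `wrmsse` are hypothetical, the weights are assumed to be supplied already normalized, and trimming the leading zeros before computing the scale reflects my reading of the Participants Guide.

```python
import numpy as np

def rmsse(train, actual, forecast):
    """RMSSE for one series (hypothetical helper, not the official scorer)."""
    train = np.asarray(train, dtype=float)
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)

    # Assumption: the scale uses only the history from the first non-zero sale on.
    nonzero = np.flatnonzero(train)
    if nonzero.size == 0:
        return float("nan")
    active = train[nonzero[0]:]
    if active.size < 2:
        return float("nan")

    # Scale: mean squared error of the one-step naive (previous day) forecast.
    scale = np.mean(np.diff(active) ** 2)
    if scale == 0:
        return float("nan")

    mse = np.mean((actual - forecast) ** 2)
    return float(np.sqrt(mse / scale))

def wrmsse(scores, weights):
    """Weighted sum of per-series RMSSE values; weights are assumed to sum to 1."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * scores))

# Toy example with made-up numbers
history = [0, 0, 1, 3, 2]
score = rmsse(history, actual=[2, 2], forecast=[1, 3])
```

Because the error is divided by the in-sample naive error, a score below 1 means the forecast beats a "same as yesterday" baseline on that series.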

## Evaluation Data
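The train evaluation data (`sales_train_evaluation.csv`) has 30490 rows × 1947 columns: one row per item and store combination, six identifier columns, and daily unit sales for d_1 through d_1941. It is the training data for the final forecast, which is scored in the evaluation phase of the competition.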

In [2]:
train=pd.read_csv(trainEval)
train

## Validation Data
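The train validation data (`sales_train_validation.csv`) has 30490 rows × 1919 columns: the same series and identifier columns, with daily unit sales for d_1 through d_1913. Forecasts made from it cover d_1914 - d_1941, so it is used to validate that models are sound and performing as expected before the final evaluation.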

In [3]:
# Despite the variable name, this file holds training history (d_1 to d_1913)
test=pd.read_csv(trainVal)
test
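
Both sales files use a wide layout: identifier columns followed by one `d_<n>` column per day. Many models are easier to fit on a long table with one row per series and day, so here is a minimal sketch of that reshaping. It runs on a small made-up frame rather than the real files, and the helpers `split_columns` and `to_long` are hypothetical names introduced for illustration.

```python
import re
import pandas as pd

def split_columns(df):
    """Split a wide M5-style frame into identifier columns and day columns."""
    day_cols = [c for c in df.columns if re.fullmatch(r"d_\d+", c)]
    id_cols = [c for c in df.columns if c not in day_cols]
    return id_cols, day_cols

def to_long(df):
    """Reshape to one row per (series, day), with an integer day index."""
    id_cols, day_cols = split_columns(df)
    long_df = df.melt(id_vars=id_cols, value_vars=day_cols,
                      var_name="d", value_name="sales")
    long_df["day"] = long_df["d"].str[2:].astype(int)  # "d_17" -> 17
    return long_df

# Toy stand-in for the real data: two series, three days (values are made up)
toy = pd.DataFrame({
    "id": ["ITEM_A_STORE_1_evaluation", "ITEM_B_STORE_1_evaluation"],
    "item_id": ["ITEM_A", "ITEM_B"],
    "dept_id": ["DEPT_1", "DEPT_1"],
    "cat_id": ["CAT", "CAT"],
    "store_id": ["STORE_1", "STORE_1"],
    "state_id": ["CA", "CA"],
    "d_1": [0, 2],
    "d_2": [1, 0],
    "d_3": [3, 1],
})

toy_id_cols, toy_day_cols = split_columns(toy)
long_toy = to_long(toy)
```

On the full data the long table has tens of millions of rows, so downcasting dtypes or reshaping only a recent window of days may be needed to stay within memory.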

In [4]:
# Weekly sell prices per store and item
sell=pd.read_csv(sellData)
sell

## Make Final Prediction
#### Submission File
Each row contains an id that is a concatenation of an item_id and a store_id, which is either validation (corresponding to the Public leaderboard), or evaluation (corresponding to the Private leaderboard). You are predicting 28 forecast days (F1-F28) of items sold for each row. For the validation rows, this corresponds to d_1914 - d_1941, and for the evaluation rows, this corresponds to d_1942 - d_1969. (Note: a month before the competition close, the ground truth for the validation rows will be provided.)

The files must have a header and should look like the following:

In [5]:
# Load into a new name so the path in `sample` is not overwritten on re-run
submission=pd.read_csv(sample)
submission
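
As a rough sketch of how a file in this layout could be assembled from model output, the snippet below builds a frame with an `id` column followed by `F1` to `F28`. The function `build_submission`, the example ids, and the all-zero predictions are placeholders for illustration, not part of the competition materials.

```python
import numpy as np
import pandas as pd

F_COLS = [f"F{i}" for i in range(1, 29)]  # F1 ... F28

def build_submission(ids, preds):
    """Combine row ids and a (n_rows, 28) prediction array into submission layout."""
    preds = np.asarray(preds, dtype=float)
    if preds.shape != (len(ids), len(F_COLS)):
        raise ValueError(f"expected shape ({len(ids)}, {len(F_COLS)}), got {preds.shape}")
    out = pd.DataFrame(preds, columns=F_COLS)
    out.insert(0, "id", list(ids))
    return out

# Placeholder ids and predictions (made up); real ids come from the sample file
demo_ids = ["ITEM_A_STORE_1_validation", "ITEM_A_STORE_1_evaluation"]
demo = build_submission(demo_ids, np.zeros((2, 28)))
```

Writing the result with `demo.to_csv("submission.csv", index=False)` keeps the header row and omits the pandas index, which matches the requirement that the file has a header.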

## References
1. https://www.kaggle.com/code/minhajulhoque/deep-learning-rnn-for-m5-forecasting