In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

%matplotlib inline

## Import Data

As usual, we will first import our data. We also include a figure from the M5 Competition handbook that shows how the data is categorized:

<figure>
    <img src="Supplementals/M5_Competition_Categories.png" style="width:90%">
    <figcaption><b>Fig.1: M5 Data Categorical Breakdown</b></figcaption>
</figure>

### Objective:
<i>The objective of the M5 forecasting competition is to advance the theory and practice of forecasting by identifying the method(s) that provide the most accurate point forecasts for each of the 42,840 time series of the competition. In addition, to elicit information to estimate the uncertainty distribution of the realized values of these series as precisely as possible. 
To that end, the participants of M5 are asked to <b>provide 28 days ahead point forecasts (PFs)</b> for all the series of the competition, as well as the corresponding median and 50%, 67%, 95%, and 99% prediction intervals (PIs).</i>
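
The prediction intervals listed above translate directly into quantile levels: a central PI with coverage $c$ runs from the $(1-c)/2$ quantile to the $1-(1-c)/2$ quantile. A minimal NumPy sketch, assuming we already have a collection of simulated 28-day sample paths for one series (the Poisson samples below are hypothetical placeholder data, not M5 data), computes all nine levels at once:

```python
import numpy as np

# Central prediction-interval coverages requested by the competition.
coverages = [0.50, 0.67, 0.95, 0.99]

def quantile_levels(coverages):
    """Return the sorted quantile levels implied by the median plus central PIs."""
    levels = {0.5}  # the median
    for c in coverages:
        lower = round((1 - c) / 2, 3)      # e.g. 95% PI -> 0.025
        upper = round(1 - (1 - c) / 2, 3)  # e.g. 95% PI -> 0.975
        levels.update((lower, upper))
    return sorted(levels)

levels = quantile_levels(coverages)

# Hypothetical placeholder: 1000 simulated 28-day sales paths for a single series.
rng = np.random.default_rng(0)
samples = rng.poisson(lam=3.0, size=(1000, 28))

# One row per quantile level, one column per forecast day: shape (9, 28).
quantile_forecasts = np.quantile(samples, levels, axis=0)
```

The resulting levels are 0.005, 0.025, 0.165, 0.25, 0.5, 0.75, 0.835, 0.975 and 0.995; how the sample paths are generated is left open here.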


In [2]:
#sales_train_val = pd.read_csv('sales_train_validation.csv')
calendar = pd.read_csv('calendar.csv')
sales_train_eval = pd.read_csv('sales_train_evaluation.csv')
sample_submission = pd.read_csv('sample_submission.csv')
sell_prices = pd.read_csv('sell_prices.csv')

In [3]:
# We can take a look at the desired format by examining the first 10 entries from the sample submission
sample_submission.head(10)

Unnamed: 0,id,F1,F2,F3,F4,F5,F6,F7,F8,F9,...,F19,F20,F21,F22,F23,F24,F25,F26,F27,F28
0,HOBBIES_1_001_CA_1_validation,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,HOBBIES_1_002_CA_1_validation,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,HOBBIES_1_003_CA_1_validation,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,HOBBIES_1_004_CA_1_validation,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,HOBBIES_1_005_CA_1_validation,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,HOBBIES_1_006_CA_1_validation,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,HOBBIES_1_007_CA_1_validation,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,HOBBIES_1_008_CA_1_validation,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,HOBBIES_1_009_CA_1_validation,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,HOBBIES_1_010_CA_1_validation,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [4]:
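# I'll first modify the sell_prices dataframe so that it explicitly lists the Year and Week of the year encoded in
# the wm_yr_wk column, which has the form 1YYWW (e.g. 11325 -> year 13, week 25). Note that these are Walmart's own
# year/week labels: in calendar.csv, week 11101 starts on 2011-01-29, so they do not line up with calendar weeks.

def year_extract(wk_id):
    # Drop the two week digits, strip the leading 1, and convert to a four-digit year: 11325 -> 2013
    return wk_id // 100 - 100 + 2000

def week_extract(wk_id):
    # The last two digits are the week number: 11325 -> 25
    return wk_id % 100

# Both functions use plain integer arithmetic, so they work element-wise on the whole column without .apply
sell_prices['Year'] = year_extract(sell_prices['wm_yr_wk'])
sell_prices['Week'] = week_extract(sell_prices['wm_yr_wk'])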

# Check out the new sell_prices dataframe
sell_prices.head(10)

| | store_id | item_id | wm_yr_wk | sell_price | Year | Week |
|---|---|---|---|---|---|---|
| 0 | CA_1 | HOBBIES_1_001 | 11325 | 9.58 | 2013 | 25 |
| 1 | CA_1 | HOBBIES_1_001 | 11326 | 9.58 | 2013 | 26 |
| 2 | CA_1 | HOBBIES_1_001 | 11327 | 8.26 | 2013 | 27 |
| 3 | CA_1 | HOBBIES_1_001 | 11328 | 8.26 | 2013 | 28 |
| 4 | CA_1 | HOBBIES_1_001 | 11329 | 8.26 | 2013 | 29 |
| 5 | CA_1 | HOBBIES_1_001 | 11330 | 8.26 | 2013 | 30 |
| 6 | CA_1 | HOBBIES_1_001 | 11331 | 8.26 | 2013 | 31 |
| 7 | CA_1 | HOBBIES_1_001 | 11332 | 8.26 | 2013 | 32 |
| 8 | CA_1 | HOBBIES_1_001 | 11333 | 8.26 | 2013 | 33 |
| 9 | CA_1 | HOBBIES_1_001 | 11334 | 8.26 | 2013 | 34 |
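
Sell prices are recorded per week while sales are recorded per day, so a daily price has to be looked up through `wm_yr_wk`. A minimal sketch, assuming the `calendar` dataframe has the `d` and `wm_yr_wk` columns found in `calendar.csv`; the helper name `daily_prices` and the toy frames are illustrative only:

```python
import pandas as pd

def daily_prices(sell_prices, calendar):
    """Attach each week's sell price to every day of that week, per store and item."""
    days = calendar[['d', 'wm_yr_wk']]
    prices = sell_prices[['store_id', 'item_id', 'wm_yr_wk', 'sell_price']]
    # Inner join: days falling in weeks with no recorded price for an item are dropped.
    return days.merge(prices, on='wm_yr_wk', how='inner')

# Toy example: three weeks of calendar, but prices recorded for only two of them.
toy_calendar = pd.DataFrame({
    'd': [f'd_{k}' for k in range(1, 22)],
    'wm_yr_wk': [11326] * 7 + [11327] * 7 + [11328] * 7,
})
toy_prices = pd.DataFrame({
    'store_id': ['CA_1', 'CA_1'],
    'item_id': ['HOBBIES_1_001', 'HOBBIES_1_001'],
    'wm_yr_wk': [11326, 11327],
    'sell_price': [9.58, 8.26],
})
result = daily_prices(toy_prices, toy_calendar)
```

Whether to drop unpriced days (as here) or keep them with a left join and fill the gap is a modelling choice.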


# Models

The model architecture will influence how we structure the training data. There are, of course, numerous ways to do this, and we will propose a few here. As a barebones template, each model takes some collection of values as input and outputs a prediction of that item's sales for each of the next 28 days.

One of the first types of model structures that comes to mind is a function $model: \mathbb{R}^{n+1} \to \mathbb{R}^{28}$ that takes as input the $n+1$ previous days of sale values and outputs a prediction for the next 28 days:

\begin{equation}
    pred(t+1), pred(t+2), ..., pred(t+28) = model(obs(t), obs(t-1),...,obs(t-n)),
\end{equation}

where $pred(t+j)$ denotes the prediction for day $t+j$, i.e. $j$ days after the day $t$ from which the model makes its prediction, and $obs(t-j)$ denotes the sales observed $j$ days before day $t$. Such a model assumes that past sales are predictive of future sales, which we argue is a fair assumption for this model and for most of the models we will consider. The model above explicitly uses only the prior $n+1$ days, implicitly assigning zero weight to all earlier days. Which past days to include as inputs is itself a design choice that can be tuned to optimize the model.
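
Training such a model requires turning each item's sales history into (input, target) pairs. A minimal sketch of a sliding-window construction in NumPy, assuming a single daily sales series; the function name `make_windows` is our own:

```python
import numpy as np

def make_windows(sales, n, horizon=28):
    """Build (input, target) pairs from one item's daily sales series.

    Each input row is [obs(t), obs(t-1), ..., obs(t-n)] (n + 1 values, newest
    first, matching the equation above) and each target row is
    [obs(t+1), ..., obs(t+horizon)].
    """
    sales = np.asarray(sales, dtype=float)
    window = n + 1
    n_samples = len(sales) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this window and horizon")
    X = np.stack([sales[s:s + window][::-1] for s in range(n_samples)])
    Y = np.stack([sales[s + window:s + window + horizon] for s in range(n_samples)])
    return X, Y

# Toy series 0, 1, ..., 39 so the windows are easy to read.
X, Y = make_windows(np.arange(40), n=4)
```

Sliding the window forward one day at a time yields overlapping samples; a larger stride would reduce redundancy at the cost of fewer training pairs.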

### Encoding Categorical Variables

So far this is a reasonable start, but it is also useful to encode categorical variables, such as any events taking place over the next 4 weeks or the store and category the item belongs to. Suppose that the collection of all possible events has size (cardinality) $k$. We can then assign each unique event a distinct integer from $1$ to $k$, reserving $0$ for 'no event', so that every event label is an element $g \in \mathbb{Z}_{k+1} = \{0, 1, \ldots, k\}$. We can then encode this in our model by specifying the event value for a given period of time, such as a week.

If we assign a single value to each of the 4 weeks in the 28-day horizon, then we can define $model: \mathbb{Z}_{k+1}^4 \times \mathbb{R}^{n+1} \to \mathbb{R}^{28}$, where $n+1$ is still the number of past sales observations and we have introduced the notation

\begin{equation}
    \mathbb{Z}^{j}_{m} := \underbrace{\mathbb{Z}_{m} \times ... \times \mathbb{Z}_{m}}_{\text{j times}}.
\end{equation}
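
A minimal sketch of collapsing 28 daily event codes into one code per week. The rule for weeks containing several events is not fixed by the text above; here we assume the first event of the week wins. Because integer codes impose an artificial ordering (event 2 is not "more" than event 1), a one-hot encoding is also shown as a common alternative. The function names are hypothetical:

```python
import numpy as np

def weekly_event_codes(daily_codes, days_per_week=7):
    """Collapse daily event codes (0 = no event) into one code per week.

    Assumption: if a week contains several events, the first one is kept.
    """
    daily = np.asarray(daily_codes, dtype=int)
    if daily.size % days_per_week != 0:
        raise ValueError("number of days must be a multiple of days_per_week")
    weeks = daily.reshape(-1, days_per_week)
    codes = np.zeros(weeks.shape[0], dtype=int)
    for w, week in enumerate(weeks):
        events_this_week = week[week != 0]
        if events_this_week.size > 0:
            codes[w] = events_this_week[0]
    return codes

def one_hot(codes, k):
    """One-hot encode codes in Z_{k+1} = {0, ..., k} as rows of length k + 1."""
    return np.eye(k + 1, dtype=int)[np.asarray(codes)]

# Hypothetical 28-day horizon with events on days 4, 17 and 18 (1-indexed).
daily = np.zeros(28, dtype=int)
daily[3] = 1
daily[16] = 2
daily[17] = 5
weekly = weekly_event_codes(daily)  # one element of Z_{k+1} per week
encoded = one_hot(weekly, k=30)     # shape (4, 31)
```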

### Loss Function

A standard way to train a model is to formalize a measure of its accuracy. Such measures are typically encoded in quantities known as loss functions, which we aim to minimize; they assign a penalty to predicted values that deviate from the observed values.

One common loss function is the <b> squared-error loss</b> function or <b> L2-Loss </b> function. It is given by

\begin{equation}
    L2(y,\hat{y}) = \sum_{i=1}^{28}(y_i - \hat{y}_i)^2,
\end{equation}

where $y_i$ denotes the observation for the $i$-th day and $\hat{y}_i$ denotes the prediction for the $i$-th day.
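
The L2 loss above translates directly into NumPy. A minimal sketch (the function name `l2_loss` is our own):

```python
import numpy as np

def l2_loss(y, y_hat):
    """Squared-error (L2) loss summed over the forecast horizon."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    if y.shape != y_hat.shape:
        raise ValueError("y and y_hat must have the same shape")
    return float(np.sum((y - y_hat) ** 2))

observed = np.ones(28)
predicted = np.zeros(28)
loss = l2_loss(observed, predicted)  # each day contributes (1 - 0)^2
```

If the model is trained with TensorFlow, `tf.keras.losses.MeanSquaredError` averages over the last axis instead of summing, so for a 28-day output it equals this loss divided by 28; both have the same minimizer.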

## Model 1: Neural Network

There are various neural network architectures that we can consider. One of the most standard and familiar is the feed-forward neural network, in which each layer's nodes connect only to the nodes of adjacent layers and information flows in one direction, from input to output. The architecture I propose is shown below:

<figure>
    <img src = "Supplementals/NeuralNetworkStructure.png" width = "1000" height = "600">
    <figcaption><b>Fig.2: Standard Feed-Forward Neural Network Architecture with Listed Inputs</b></figcaption>
</figure>
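
To make the feed-forward structure concrete, here is a minimal NumPy sketch of a forward pass: each hidden layer applies an affine map followed by a ReLU, and the output layer is linear with 28 units, one per prediction day. The layer sizes, the He-style random initialization, and the helper names are assumptions for illustration, not the exact network in Fig. 2; in practice the same structure would be built from `tf.keras` `Dense` layers.

```python
import numpy as np

def relu(x):
    """Rectified linear unit, applied element-wise."""
    return np.maximum(x, 0.0)

def init_mlp(layer_sizes, seed=0):
    """Random weights and zero biases for consecutive layer sizes (He-style scaling)."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((W, b))
    return params

def forward(params, x):
    """Feed-forward pass: hidden layers use ReLU, the output layer is linear."""
    h = np.asarray(x, dtype=float)
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return h @ W + b

input_dim = 72  # hypothetical number of input features
params = init_mlp([input_dim, 64, 64, 28])
predictions = forward(params, np.ones((5, input_dim)))  # 5 samples -> shape (5, 28)
```

Since sales cannot be negative, one could also apply a ReLU at the output layer; the linear output here is simply the most basic choice.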

### Parameters: 

Suppose that we only consider data from $n$ days in the past. We refer to the day $j$ days after the day of measurement as prediction day $j$ (so $j = 1, \ldots, 28$), and to the day $i$ days before it as prior day $i$. We then define the following parameters, which enter our input layer:

<ul>
    <li> <b>S</b>: Store ID (0-9) </li>
    <li> <b>Item</b>: Item ID </li>
    <li> <b>fe$_j$</b>: The Future Event value that will occur on prediction day $j$</li>
    <li> <b>e$_i$</b>: The Event value that occurred on prior day $i$</li>
    <li> <b>sp$_i$</b>: SNAP value on prior day $i$ (0 or 1), indicating whether SNAP purchases were allowed at the item's store that day</li>
    <li> <b>yw$_i$</b>: Year-Week Number that occurred on prior day $i$ </li>
    <li> <b>wd$_i$</b>: Week-Day Value that occurred on prior day $i$ (1-7) </li>
    <li> <b>P$_i$</b>: Sell Price that occurred on prior day $i$ </li>
    <li> <b>d$_i$</b>: Number of Sales that occurred on prior day $i$ </li>
</ul>
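
One way to feed these parameters to the input layer is to concatenate them into a single vector: 2 scalars (S, Item), 28 future-event values, and 6 per-day features for each of the $n$ prior days, for $2 + 28 + 6n$ inputs in total. A minimal sketch, assuming every feature has already been converted to a number (for example, the item ID mapped to an integer); the function name `build_input` is hypothetical:

```python
import numpy as np

def build_input(store, item, future_events, events, snap, year_week, weekday, price, sales):
    """Concatenate S, Item, fe_1..fe_28 and the n-day histories e, sp, yw, wd, P, d."""
    per_day = [np.asarray(a, dtype=float) for a in (events, snap, year_week, weekday, price, sales)]
    n = per_day[0].size
    if any(a.size != n for a in per_day):
        raise ValueError("all per-day features must have the same length n")
    fe = np.asarray(future_events, dtype=float)
    if fe.size != 28:
        raise ValueError("expected 28 future event values")
    return np.concatenate([[store, item], fe, *per_day])

# Hypothetical sample with n = 7 prior days: 2 + 28 + 6 * 7 = 72 inputs.
x = build_input(
    store=3, item=17,
    future_events=np.zeros(28),
    events=np.zeros(7), snap=np.ones(7), year_week=np.full(7, 11325),
    weekday=np.arange(1, 8), price=np.full(7, 8.26), sales=np.arange(7),
)
```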




In [32]:
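# We'll first transform the event labels to categorical labels by assigning a unique integer to every unique event.
# Code 0 is reserved for "no event" (NaN); real events get 1, 2, ... in order of first appearance in the calendar.
# Dropping NaN explicitly avoids relying on NaN being the first unique value.
events = calendar['event_name_1'].dropna().unique()
event_map = {event: i for i, event in enumerate(events, start=1)}
event_map[np.nan] = 0

event_map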

{'Chanukah End': 26,
 'Christmas': 25,
 'Cinco De Mayo': 10,
 'ColumbusDay': 20,
 'Easter': 30,
 'Eid al-Fitr': 18,
 'EidAlAdha': 22,
 "Father's day": 15,
 'Halloween': 21,
 'IndependenceDay': 16,
 'LaborDay': 19,
 'LentStart': 4,
 'LentWeek2': 5,
 'MartinLutherKingDay': 29,
 'MemorialDay': 12,
 "Mother's day": 11,
 'NBAFinalsEnd': 14,
 'NBAFinalsStart': 13,
 'NewYear': 27,
 'OrthodoxChristmas': 28,
 'OrthodoxEaster': 8,
 'Pesach End': 9,
 'PresidentsDay': 3,
 'Purim End': 7,
 'Ramadan starts': 17,
 'StPatricksDay': 6,
 'SuperBowl': 1,
 'Thanksgiving': 24,
 'ValentinesDay': 2,
 'VeteransDay': 23,
 nan: 0}

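
With `event_map` in hand, the calendar's event column can be turned into the integer codes used for $e_i$ and $fe_j$. A minimal sketch with a small hand-written excerpt of the map (the helper `encode_events` is hypothetical): mapping through the dictionary and then filling missing values with 0 sends both NaN ('no event') and any name absent from the map to code 0.

```python
import numpy as np
import pandas as pd

def encode_events(event_names, event_map):
    """Map event names to integer codes; NaN and unknown names become 0 ('no event')."""
    codes = pd.Series(event_names).map(event_map)
    return codes.fillna(0).astype(int)

# Small excerpt of the map built above, for illustration.
toy_map = {'SuperBowl': 1, 'ValentinesDay': 2, np.nan: 0}
codes = encode_events(['SuperBowl', np.nan, 'ValentinesDay', 'NotAnEvent'], toy_map)
```

On the real data this could be applied as `calendar['event_code'] = encode_events(calendar['event_name_1'], event_map)`, where `event_code` is a hypothetical column name. Silently mapping unknown names to 0 is a design choice; raising an error instead would surface typos in the map.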