### <center style="background-color:Gainsboro; width:70%;">Store Sales: Day of the Week model</center>

In the [Store Sales - Time Series Forecasting](https://www.kaggle.com/c/store-sales-time-series-forecasting) Getting Started competition we are tasked with building a model that predicts the unit sales for thousands of items sold at different Corporación Favorita stores. Specifically, we are asked to predict the sales for the 16 days from 2017-08-16 to 2017-08-31. In this short notebook we shall build a very simple model: for each store number and each product family, the prediction is an average of the recent sales. Given that the `sales` depend strongly on the day of the week, we shall also create a '*day of the week*' feature and average separately for each weekday.

For our training data we shall use only the last three weeks of the data we have been given (from 2017-07-26 onwards), thus having three examples of each weekday to average over.
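
As a quick sanity check of the window described above, the following minimal sketch counts how often each weekday occurs. It assumes that the training data ends on 2017-08-15, the day before the first prediction date; the names `window` and `weekday_counts` are introduced here purely for illustration.

```python
import pandas as pd

# Assumption: the training data ends on 2017-08-15, the day before
# the first prediction date (2017-08-16).
window = pd.date_range("2017-07-26", "2017-08-15", freq="D")

# count how many times each weekday occurs in the window
weekday_counts = pd.Series(window.day_name()).value_counts()

print(len(window))      # number of days in the window
print(weekday_counts)   # occurrences of each weekday
```

If the assumption holds, the window spans exactly 21 days and each weekday occurs three times.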

In [None]:
import numpy  as np
import pandas as pd

# read in the data
train = pd.read_csv("../input/store-sales-time-series-forecasting/train.csv",parse_dates=['date'])
test  = pd.read_csv("../input/store-sales-time-series-forecasting/test.csv",parse_dates=['date'])

# create a 'day of the week' feature
train['day_of_the_week'] = train['date'].dt.day_name()
test['day_of_the_week']  = test['date'].dt.day_name()

# select the very last three weeks of the training data
train_three_weeks = train.query("date >= '2017-07-26' ")

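def exp_mean_ln(df):
    # average in log1p space, then transform back; this suits the
    # competition metric (RMSLE), which compares log1p(sales) values
    return np.expm1(np.mean(np.log1p(df['sales'])))

# calculate the average for each (store, family, weekday) combination
train_average = train_three_weeks.groupby(['store_nbr','family','day_of_the_week'])[['sales']].apply(exp_mean_ln).to_dict()
# look up the matching average for every row of the test data
test['sales'] = test.set_index(['store_nbr','family','day_of_the_week']).index.map(train_average.get)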

# create and write out the submission.csv file
submission = pd.DataFrame({'id': test.id, 'sales': test.sales})
submission.to_csv('submission.csv', index=False)
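
The averaging above is done in `log1p` space rather than directly on the sales. The sketch below illustrates why that can help when the score is a root mean squared logarithmic error: the log-based average is pulled less by a single large value. The `toy_sales` numbers are invented for illustration, and `log_mean` and `rmsle` are helper names introduced here, not part of the notebook.

```python
import numpy as np

def log_mean(values):
    # same calculation as exp_mean_ln, applied to a plain array
    return np.expm1(np.mean(np.log1p(values)))

def rmsle(actual, predicted):
    # root mean squared logarithmic error of a constant prediction
    return np.sqrt(np.mean((np.log1p(actual) - np.log1p(predicted)) ** 2))

# made-up sales for one weekday over three weeks, with one large value
toy_sales = np.array([0.0, 10.0, 100.0])

arithmetic = toy_sales.mean()
log_based  = log_mean(toy_sales)

print(arithmetic, rmsle(toy_sales, arithmetic))
print(log_based,  rmsle(toy_sales, log_based))
```

For these toy values the log-based average is much smaller than the arithmetic mean, and it gives the lower error of the two constant predictions.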

### <center style="background-color:Gainsboro; width:60%;">Visualize the results</center>

In [None]:
from datetime import datetime

import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('fivethirtyeight')
plt.rcParams.update({'font.size': 16})

# take a look at just one of the stores
store_number = 1
store_n_train_df = train_three_weeks.query('store_nbr == @store_number')
store_n_test_df  = test.query('store_nbr == @store_number')

fig, ax = plt.subplots(figsize=(20, 5))
sns.lineplot(data=store_n_train_df, x="date", y="sales", hue="family", linewidth = 1.5)
sns.lineplot(data=store_n_test_df,  x="date", y="sales", hue="family", linewidth = 1.5)
plt.text(datetime.strptime("2017-08-04", '%Y-%m-%d'), 4100, "training data")
plt.text(datetime.strptime("2017-08-24", '%Y-%m-%d'), 4100, "predictions")
plt.legend([],[], frameon=False);

### <center style="background-color:Gainsboro; width:60%;">Related notebooks</center>
* [Store Sales: Naive one-day model](https://www.kaggle.com/carlmcbrideellis/store-sales-naive-one-day-model)
* [Store Sales: Using the average of the last 16 days](https://www.kaggle.com/carlmcbrideellis/store-sales-using-the-average-of-the-last-16-days)

### <center style="background-color:Gainsboro; width:60%;">Recommended reading</center>
* [Rob J. Hyndman and George Athanasopoulos "*Forecasting: Principles and Practice*", (3rd Edition)](https://otexts.com/fpp3/)
* [Fotios Petropoulos, *et al.*, "*Forecasting: Theory and Practice*", arXiv:2012.03854 (2020)](https://arxiv.org/pdf/2012.03854.pdf)