<div style="background-color: lightblue; color: black; padding: 20px; font-weight: bold; font-size: 20px;">Baseline model</div><br>
<div style="background-color: lightblue; color: black; padding: 10px; font-weight: bold; font-size: 15px;">The baseline model is a simple approach that predicts weekly sales from the sales of previous years. For each store, department and calendar week, the prediction is the mean of the weekly sales observed in the training period (February 2010 to early January 2012). The predictions are then evaluated by RMSE on a validation period and a test period in 2012.</div>

In [16]:
import pandas as pd
import numpy as np
from pandas.tseries.offsets import DateOffset
from sklearn.metrics import mean_squared_error    

In [17]:
"""
Calculate the baseline model for predicting weekly sales
for different store locations and departments,
based on a training period between 2010-02-01 and 2012-01-05.
The validation period is between 2012-01-06 and 2012-06-08.
The test period is between 2012-06-09 and 2012-10-31.
 
 """

# Load the latest version of the cleaned, combined data
df = pd.read_pickle('data/data_combined_clean_2.pkl')
df.tail()
# Add the ISO calendar week (CW) of each date
df['CW'] = df['Date'].dt.isocalendar()['week']
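
The `CW` column holds ISO week numbers, which restart with the ISO year rather than the calendar year. The sketch below uses a few hand-picked dates (not taken from the dataset) to show how `dt.isocalendar()` behaves around year boundaries; the frame name `iso` is only illustrative.

```python
import pandas as pd

# Hand-picked Fridays around year boundaries (illustrative, not from the dataset).
dates = pd.to_datetime(["2010-01-01", "2010-12-31", "2011-01-07", "2012-01-06"])
iso = pd.DataFrame({"Date": dates})

# isocalendar() returns a frame with the columns 'year', 'week' and 'day'.
cal = iso["Date"].dt.isocalendar()
iso["CW"] = cal["week"]
iso["ISO_year"] = cal["year"]

# 2010-01-01 belongs to week 53 of ISO year 2009, so the week number alone
# does not identify the calendar year a date falls in.
print(iso)
```

Because the baseline groups by `CW` only, weeks from different years with the same ISO number are averaged together, even though they may cover slightly different calendar dates.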

In [18]:
# Define the train, validation and test datasets (split by date)
train_data = df[df['Date'] < "2012-01-06"].reset_index(drop=True)
val_data = df[(df['Date'] >= "2012-01-06") & (df['Date'] <= "2012-06-08")].reset_index(drop=True)
test_data = df[df['Date'] > "2012-06-08"].reset_index(drop=True)
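
A quick sanity check for a date-based split like the one above is that every row lands in exactly one partition and that the partitions do not overlap in time. The following is a minimal sketch on a made-up frame of weekly Friday dates; the name `toy` and the sales values are assumptions for illustration only.

```python
import pandas as pd

# Made-up weekly data on Fridays; the values carry no meaning.
toy = pd.DataFrame({"Date": pd.date_range("2011-12-02", "2012-07-27", freq="W-FRI")})
toy["Weekly_Sales"] = range(len(toy))

# Same cut-off dates as in the split above.
train = toy[toy["Date"] < "2012-01-06"]
val = toy[(toy["Date"] >= "2012-01-06") & (toy["Date"] <= "2012-06-08")]
test = toy[toy["Date"] > "2012-06-08"]

# Every row should fall into exactly one partition.
assert len(train) + len(val) + len(test) == len(toy)
# The partitions should be strictly ordered in time.
assert train["Date"].max() < val["Date"].min()
assert val["Date"].max() < test["Date"].min()
print(len(train), len(val), len(test))
```

The same three assertions could be run against `train_data`, `val_data` and `test_data` to confirm the real split.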

# Calculate the mean Weekly_Sales for each Store, Dept and calendar week (CW)
# in the training data; this mean is the baseline prediction
train_mean_sales = train_data.groupby(['Store', 'Dept', 'CW'])['Weekly_Sales'].mean().reset_index()
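
To make the grouping step concrete, here is a small sketch with invented numbers: one store and department, two calendar weeks, each observed in two training years. The names `toy_train` and `toy_mean` are hypothetical.

```python
import pandas as pd

# Invented training rows: each calendar week appears once per training year.
toy_train = pd.DataFrame({
    "Store": [1, 1, 1, 1],
    "Dept": [1, 1, 1, 1],
    "CW": [5, 5, 6, 6],
    "Weekly_Sales": [100.0, 140.0, 200.0, 220.0],
})

# Average over the years for each (Store, Dept, CW) combination.
toy_mean = (
    toy_train.groupby(["Store", "Dept", "CW"])["Weekly_Sales"]
    .mean()
    .reset_index()
)
print(toy_mean)
```

Week 5 is predicted as the mean of 100 and 140, and week 6 as the mean of 200 and 220, so the baseline captures yearly seasonality but no trend.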

In [19]:
# Merge the training means into the test and validation data.
# The '_test_data' suffix marks the actual sales (in the validation frame as well);
# the '_train_data_mean' suffix marks the baseline prediction.
test_data_with_mean = pd.merge(test_data, train_mean_sales, on=['Store', 'Dept', 'CW'], how='left', suffixes=('_test_data', '_train_data_mean'))
test_data_with_mean.head()
val_data_with_mean = pd.merge(val_data, train_mean_sales, on=['Store', 'Dept', 'CW'], how='left', suffixes=('_test_data', '_train_data_mean'))
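
With `how='left'`, any Store/Dept/CW combination that never occurs in the training data gets a missing prediction, and `mean_squared_error` does not accept missing values. Whether this happens in the real data is not shown here; the sketch below only illustrates, on invented frames, how the `indicator` option of `pd.merge` can count such rows. Dropping them before scoring is one possible choice and is an assumption, not necessarily what the notebook does.

```python
import pandas as pd

# Invented training means and evaluation rows; Dept 2 has no training history.
toy_means = pd.DataFrame({
    "Store": [1, 1], "Dept": [1, 1], "CW": [23, 24],
    "Weekly_Sales": [120.0, 210.0],
})
toy_eval = pd.DataFrame({
    "Store": [1, 1, 1], "Dept": [1, 1, 2], "CW": [23, 24, 23],
    "Weekly_Sales": [130.0, 190.0, 50.0],
})

merged = pd.merge(
    toy_eval, toy_means,
    on=["Store", "Dept", "CW"], how="left",
    suffixes=("_test_data", "_train_data_mean"),
    indicator=True,  # adds a '_merge' column: 'both' or 'left_only'
)

# Rows without a matching training mean have a missing prediction.
n_unmatched = int((merged["_merge"] == "left_only").sum())
print(f"Rows without a baseline prediction: {n_unmatched}")

# One option (an assumption): score only the rows that have a prediction.
scored = merged.dropna(subset=["Weekly_Sales_train_data_mean"])
```

Reporting `n_unmatched` alongside the RMSE makes it clear how many rows the score actually covers.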

In [20]:
# Calculate the RMSE on the validation data
rmse = np.sqrt(mean_squared_error(val_data_with_mean['Weekly_Sales_test_data'], val_data_with_mean['Weekly_Sales_train_data_mean']))
print(f"Root Mean Squared Error(RMSE): {rmse.round(0)}")

Root Mean Squared Error(RMSE): 4172.0


In [21]:
# Calculate the RMSE on the test data
rmse = np.sqrt(mean_squared_error(test_data_with_mean['Weekly_Sales_test_data'], test_data_with_mean['Weekly_Sales_train_data_mean']))
print(f"Root Mean Squared Error(RMSE): {rmse.round(0)}")

Root Mean Squared Error(RMSE): 3940.0
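
The RMSE reported above is the square root of the mean squared difference between actual and predicted sales over all rows. As a rough sketch with invented numbers, the same quantity can be computed by hand and also broken down per store; the frame `toy` and the column `sq_err` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Invented actuals and baseline predictions for two stores.
toy = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "Weekly_Sales_test_data": [130.0, 190.0, 50.0, 70.0],
    "Weekly_Sales_train_data_mean": [120.0, 210.0, 80.0, 70.0],
})

# Squared error per row.
toy["sq_err"] = (
    toy["Weekly_Sales_test_data"] - toy["Weekly_Sales_train_data_mean"]
) ** 2

# Overall RMSE: square root of the mean squared error over all rows.
rmse_overall = np.sqrt(toy["sq_err"].mean())

# RMSE per store: the same formula applied within each group.
rmse_by_store = np.sqrt(toy.groupby("Store")["sq_err"].mean())

print(f"Overall RMSE: {rmse_overall:.2f}")
print(rmse_by_store.round(2))
```

A per-store (or per-department) breakdown like this can show whether a few large units dominate the overall error, which a single aggregate figure hides.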
