## M5 Public LB
The true labels of the public LB have been released as announced [here](https://www.kaggle.com/c/m5-forecasting-accuracy/discussion/155399).

This gives us a unique opportunity to explore the additional data in depth and understand what went right and what went wrong. It could also help improve the model predictions for the final private test data.

**P.S.** Don't forget to hover over the graphs to get point-specific details.


In [1]:
## importing packages
import numpy as np
import pandas as pd

from bokeh.layouts import column, row
from bokeh.models import Panel, Tabs, LinearAxis, Range1d, BoxAnnotation, LabelSet, Span
from bokeh.models.tools import HoverTool
from bokeh.palettes import Category20, Spectral3, Spectral4, Spectral8
from bokeh.plotting import ColumnDataSource, figure, output_notebook, show
from bokeh.transform import dodge

from math import pi
from typing import Union
from tqdm.notebook import tqdm

output_notebook()

LB_DATES = list(pd.date_range(start = "2016-04-25", end = "2016-05-22").strftime("%Y-%m-%d"))
LB_WEEKDAYS = pd.to_datetime(LB_DATES).to_series().dt.day_name()


## Submission File
For the purpose of this notebook, I will use the submission from [kneroma](https://www.kaggle.com/kneroma)'s kernel, https://www.kaggle.com/kneroma/m5-first-public-notebook-under-0-50, but feel free to use your own submission file instead. All you need to do is replace *df_submission* before running the notebook.
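
Before swapping in your own file, it can help to confirm that it has the layout the rest of the notebook expects. The helper below is only a sketch: `check_submission_format` is a hypothetical name, and it assumes the standard M5 layout of an `id` column plus `F1` to `F28`, with the public LB rows marked by `validation` in the id.

```python
import pandas as pd


def check_submission_format(df: pd.DataFrame, horizon: int = 28) -> list:
    """Return a list of layout problems; an empty list means none were found.

    Hypothetical helper, assuming the standard M5 submission layout.
    """
    problems = []
    expected = ["id"] + [f"F{i}" for i in range(1, horizon + 1)]

    # columns the evaluator needs but the file does not provide
    missing = [c for c in expected if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")

    # stray columns, e.g. an index column written by DataFrame.to_csv
    extra = [c for c in df.columns if c not in expected]
    if extra:
        problems.append(f"unexpected columns: {extra}")

    # the public LB period is stored in the rows whose id contains "validation"
    if "id" in df.columns and not df["id"].astype(str).str.contains("validation").any():
        problems.append("no rows with 'validation' in id")

    return problems
```

An empty list means the file can be used as *df_submission* directly; a stray `Unnamed: 0` column, for example, would be reported as unexpected.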


In [39]:
sub_model2 = pd.read_csv('submission_to_0.47.csv')
sub_model3 = pd.read_csv('submission_to_0.47_param3.csv')
sub_model3_fe = pd.read_csv('submission_weighted_param_3_7_14_21_28.csv')

In [73]:
df_submission = pd.read_csv('submission_self_run.csv')
# eval_set = df_submission[30490:].copy()
# valid_set = df_submission[30490:].copy()
# valid_set["id"] = valid_set["id"].str.replace("evaluation$", "validation", regex=True)
# df_submission = pd.concat([valid_set,eval_set],axis=0,ignore_index=True)

In [50]:
combined = pd.DataFrame()
combined['id'] = sub_model2['id']
cols = [f'F{i}' for i in range(1, 29)]
for col in cols:
    combined[col] = (sub_model2[col] + sub_model3[col]) / 2
df_submission = combined
df_submission.head()

Unnamed: 0,id,F1,F2,F3,F4,F5,F6,F7,F8,F9,...,F19,F20,F21,F22,F23,F24,F25,F26,F27,F28
0,FOODS_1_001_CA_1_validation,0.820314,0.802266,0.800925,0.845782,1.02826,1.024485,1.082191,0.925428,0.869167,...,1.041883,1.274529,1.132313,0.907766,0.791097,0.787247,0.827984,1.000772,1.19352,1.153473
1,FOODS_1_001_CA_2_validation,0.931526,0.929918,0.869254,1.141765,1.269044,1.110007,1.296057,0.90379,0.956518,...,1.260898,1.571852,1.519076,1.01201,1.013187,0.983792,1.053411,1.270174,1.610439,1.402252
2,FOODS_1_001_CA_3_validation,1.059968,0.983349,0.83919,0.84192,0.942459,1.075376,1.107389,1.028322,1.089524,...,0.998834,1.640254,1.722733,1.026528,0.950526,0.840668,0.830471,0.929206,1.320485,1.320658
3,FOODS_1_001_CA_4_validation,0.418446,0.370121,0.372825,0.380642,0.452901,0.400971,0.469814,0.41477,0.410258,...,0.463769,0.485351,0.449296,0.369747,0.352477,0.361242,0.380714,0.434519,0.449651,0.464405
4,FOODS_1_001_TX_1_validation,0.209947,0.208023,0.212368,0.220885,0.207161,0.176158,0.209236,0.552172,0.481739,...,0.428016,0.40585,0.405092,0.281968,0.276604,0.293723,0.292655,0.335101,0.36276,0.370952


In [75]:
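# drop the stray index column if the CSV was saved with one; do nothing otherwise
df_submission.drop(columns='Unnamed: 0', errors='ignore', inplace=True)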

## Evaluation
Thanks to [sakami](https://www.kaggle.com/sakami) for providing a neat class for the evaluation metric [here](https://www.kaggle.com/c/m5-forecasting-accuracy/discussion/133834).

You can find details on how to calculate your true public LB score and rank here: https://www.kaggle.com/rohanrao/m5-how-to-get-your-public-lb-score-rank.
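
To make the metric concrete, here is a minimal sketch of the scaled error for a single series and of the weighted sum within one aggregation level. The function names `rmsse` and `weighted_score` are hypothetical and the numbers are toy values. The evaluator class in the next cell applies the same idea to each of the 12 aggregation levels, and this notebook then averages the 12 level scores.

```python
import numpy as np


def rmsse(train, actual, forecast) -> float:
    """Root mean squared scaled error of one series (toy sketch)."""
    train = np.asarray(train, dtype=float)
    # ignore the leading zeros before the first sale, as the evaluator class does
    train = train[np.argmax(train != 0):]
    # scale: mean squared day-to-day change of the training series
    scale = np.mean(np.diff(train) ** 2)
    errors = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2) / scale))


def weighted_score(scores, weights) -> float:
    """Weighted sum of per-series scores, with weights normalised to sum to 1."""
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(np.asarray(scores, dtype=float) * weights / weights.sum()))


# toy series: two leading zeros, then sales alternating between 2 and 4
print(rmsse([0, 0, 2, 4, 2, 4], actual=[3, 3], forecast=[1, 5]))  # 1.0
print(weighted_score([1.0, 0.5], weights=[3, 1]))                 # 0.875
```

In the toy call the forecast error (mean squared error 4) is exactly as large as the mean squared day-to-day change in the training data (also 4), so the scaled error is 1.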


In [4]:
## evaluation metric
## edited from https://www.kaggle.com/c/m5-forecasting-accuracy/discussion/133834
class WRMSSEEvaluator(object):

    def __init__(self, train_df: pd.DataFrame, valid_df: pd.DataFrame, calendar: pd.DataFrame, prices: pd.DataFrame):
        train_y = train_df.loc[:, train_df.columns.str.startswith('d_')]
        train_target_columns = train_y.columns.tolist()
        weight_columns = train_y.iloc[:, -28:].columns.tolist()

        train_df['all_id'] = 0  # for lv1 aggregation

        id_columns = train_df.loc[:, ~train_df.columns.str.startswith('d_')].columns.tolist()
        valid_target_columns = valid_df.loc[:, valid_df.columns.str.startswith('d_')].columns.tolist()

        if not all([c in valid_df.columns for c in id_columns]):
            valid_df = pd.concat([train_df[id_columns], valid_df], axis=1, sort=False)

        self.train_df = train_df
        self.valid_df = valid_df
        self.calendar = calendar
        self.prices = prices

        self.weight_columns = weight_columns
        self.id_columns = id_columns
        self.valid_target_columns = valid_target_columns

        weight_df = self.get_weight_df()

        self.group_ids = (
            'all_id',
            'cat_id',
            'state_id',
            'dept_id',
            'store_id',
            'item_id',
            ['state_id', 'cat_id'],
            ['state_id', 'dept_id'],
            ['store_id', 'cat_id'],
            ['store_id', 'dept_id'],
            ['item_id', 'state_id'],
            ['item_id', 'store_id']
        )

        for i, group_id in enumerate(tqdm(self.group_ids)):
            train_y = train_df.groupby(group_id)[train_target_columns].sum()
            scale = []
            for _, row in train_y.iterrows():
                series = row.values[np.argmax(row.values != 0):]
                scale.append(((series[1:] - series[:-1]) ** 2).mean())
            setattr(self, f'lv{i + 1}_scale', np.array(scale))
            setattr(self, f'lv{i + 1}_train_df', train_y)
            setattr(self, f'lv{i + 1}_valid_df', valid_df.groupby(group_id)[valid_target_columns].sum())

            lv_weight = weight_df.groupby(group_id)[weight_columns].sum().sum(axis=1)
            setattr(self, f'lv{i + 1}_weight', lv_weight / lv_weight.sum())

    def get_weight_df(self) -> pd.DataFrame:
        day_to_week = self.calendar.set_index('d')['wm_yr_wk'].to_dict()
        weight_df = self.train_df[['item_id', 'store_id'] + self.weight_columns].set_index(['item_id', 'store_id'])
        weight_df = weight_df.stack().reset_index().rename(columns={'level_2': 'd', 0: 'value'})
        weight_df['wm_yr_wk'] = weight_df['d'].map(day_to_week)

        weight_df = weight_df.merge(self.prices, how='left', on=['item_id', 'store_id', 'wm_yr_wk'])
        weight_df['value'] = weight_df['value'] * weight_df['sell_price']
        weight_df = weight_df.set_index(['item_id', 'store_id', 'd']).unstack(level=2)['value']
        weight_df = weight_df.loc[zip(self.train_df.item_id, self.train_df.store_id), :].reset_index(drop=True)
        weight_df = pd.concat([self.train_df[self.id_columns], weight_df], axis=1, sort=False)
        return weight_df

    def rmsse(self, valid_preds: pd.DataFrame, lv: int) -> pd.Series:
        valid_y = getattr(self, f'lv{lv}_valid_df')
        score = ((valid_y - valid_preds) ** 2).mean(axis=1)
        scale = getattr(self, f'lv{lv}_scale')
        return (score / scale).map(np.sqrt)

    def score(self, valid_preds: Union[pd.DataFrame, np.ndarray]):
        assert self.valid_df[self.valid_target_columns].shape == valid_preds.shape

        if isinstance(valid_preds, np.ndarray):
            valid_preds = pd.DataFrame(valid_preds, columns=self.valid_target_columns)

        valid_preds = pd.concat([self.valid_df[self.id_columns], valid_preds], axis=1, sort=False)

        group_ids = []
        all_scores = []

        for i, group_id in enumerate(self.group_ids):
            lv_scores = self.rmsse(valid_preds.groupby(group_id)[self.valid_target_columns].sum(), i + 1)
            weight = getattr(self, f'lv{i + 1}_weight')
            lv_scores = pd.concat([weight, lv_scores], axis=1, sort=False).prod(axis=1)
            group_ids.append(group_id)
            all_scores.append(lv_scores.sum())

        return group_ids, all_scores
    
    def get_scores(self, valid_preds: Union[pd.DataFrame, np.ndarray], lv: int):
        assert self.valid_df[self.valid_target_columns].shape == valid_preds.shape

        if isinstance(valid_preds, np.ndarray):
            valid_preds = pd.DataFrame(valid_preds, columns=self.valid_target_columns)

        valid_preds = pd.concat([self.valid_df[self.id_columns], valid_preds], axis=1, sort=False)
        
        # levels are 1-indexed and follow the order of self.group_ids
        assert 1 <= lv <= len(self.group_ids), f"lv must be between 1 and {len(self.group_ids)}"
        group_id = self.group_ids[lv - 1]

        valid_df = valid_preds.groupby(group_id)[self.valid_target_columns].sum()
        valid_y = getattr(self, f"lv{lv}_valid_df")
        scale = getattr(self, f"lv{lv}_scale")
        weight = getattr(self, f"lv{lv}_weight")
        valid_df["score"] = (((valid_y - valid_df) ** 2).mean(axis = 1) / scale).map(np.sqrt)
        # prefix the prediction columns so they do not clash with the actuals added below
        valid_df.columns = ["pred_" + str(c) for c in self.valid_target_columns] + ["score"]
        valid_df = pd.concat([valid_df, valid_y], axis = 1)
        valid_df["score_weighted"] = valid_df.score * weight
        valid_df["score_percentage"] = valid_df.score_weighted / valid_df.score_weighted.sum()

        return valid_df.reset_index()


## Preparing data
Reading the datasets and preparing the evaluator class.
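
The split into training and validation columns can be sketched on a toy frame as follows. `split_last_days` is a hypothetical helper, not part of the original notebook. It takes explicit copies so that the evaluator can add its `all_id` column to the training frame without modifying a view of the full data.

```python
import pandas as pd


def split_last_days(sales: pd.DataFrame, horizon: int = 28):
    """Split a wide sales frame into (train, valid) on its last `horizon` columns."""
    # id columns and all earlier days stay in the training part
    train = sales.iloc[:, :-horizon].copy()
    # the last `horizon` day columns are the labels to score against
    valid = sales.iloc[:, -horizon:].copy()
    return train, valid


toy_sales = pd.DataFrame({
    "id": ["A", "B"],
    "d_1": [0, 1], "d_2": [2, 0], "d_3": [1, 1], "d_4": [3, 2],
})
toy_train, toy_valid = split_last_days(toy_sales, horizon=2)
print(list(toy_train.columns))  # ['id', 'd_1', 'd_2']
print(list(toy_valid.columns))  # ['d_3', 'd_4']
```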


In [5]:
## reading data
df_train_full = pd.read_csv("sales_train_evaluation/sales_train_evaluation.csv")
df_calendar = pd.read_csv("calendar.csv")
df_prices = pd.read_csv("sell_prices.csv")
df_sample_submission = pd.read_csv("sample_submission.csv")
df_sample_submission["order"] = range(df_sample_submission.shape[0])

# take copies: the evaluator adds an 'all_id' column to the training frame
df_train = df_train_full.iloc[:, :-28].copy()
df_valid = df_train_full.iloc[:, -28:].copy()

evaluator = WRMSSEEvaluator(df_train, df_valid, df_calendar, df_prices)






## Public LB Verification
Verifying the public LB calculation: this submission scores 0.48874 on the public LB, and we should get the same score offline.
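
Scoring offline requires the predictions in the same shape as the evaluator's validation frame: validation rows only, in sample-submission order, with day-named columns. The sketch below shows one way to do that. `to_evaluator_frame` is a hypothetical name, and the defaults assume that `F1` corresponds to `d_1914`, the first day of the public LB period.

```python
import pandas as pd


def to_evaluator_frame(submission: pd.DataFrame, sample_submission: pd.DataFrame,
                       first_day: int = 1914, horizon: int = 28) -> pd.DataFrame:
    """Return the validation rows of a submission, ordered and renamed for scoring."""
    # remember the row order of the sample submission
    order = sample_submission[["id"]].copy()
    order["order"] = range(len(order))

    # keep only the public LB (validation) rows and align them to that order
    valid = submission[submission["id"].str.contains("validation")]
    valid = (valid.merge(order, on="id")
                  .sort_values("order")
                  .reset_index(drop=True))

    # F1..F{horizon} become d_{first_day}..d_{first_day + horizon - 1}
    f_cols = [f"F{i}" for i in range(1, horizon + 1)]
    d_cols = [f"d_{first_day + i}" for i in range(horizon)]
    return valid[f_cols].rename(columns=dict(zip(f_cols, d_cols)))
```

The result can be passed to `evaluator.score`, and averaging the returned level scores gives the offline public LB score.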


### Dynamic params optimization
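
The search in this section tries every pair of blend weights and keeps the pair with the lowest score. As a minimal sketch on toy data, with a plain RMSE standing in for the WRMSSE evaluator and `grid_search_blend` as a hypothetical name:

```python
import itertools

import numpy as np


def grid_search_blend(pred1, pred2, score_fn, weights1, weights2):
    """Try every (w1, w2) pair and return the pair with the lowest score."""
    best_score, best_weights = float("inf"), None
    for w1, w2 in itertools.product(weights1, weights2):
        score = score_fn(w1 * pred1 + w2 * pred2)
        if score < best_score:  # lower is better
            best_score, best_weights = score, (w1, w2)
    return best_weights, best_score


# toy data: one model is always 1 too high, the other always 1 too low
truth = np.array([1.0, 2.0, 3.0])
too_high, too_low = truth + 1, truth - 1


def toy_rmse(pred):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


grid = [0.25, 0.5, 0.75]
print(grid_search_blend(too_high, too_low, toy_rmse, grid, grid))  # ((0.5, 0.5), 0.0)
```

Note that weights tuned this way are fitted to the public LB labels themselves, so the resulting score is optimistic and may not carry over to the private test period.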

In [18]:
score_max = 1000  # best (lowest) score found so far; the metric is minimized
params = {'param1': -1, 'param2': -1}

df1 = pd.read_csv('submission_witch_to_0.47_2.csv')
df2 = pd.read_csv('submission_witch_to_0.47_3.csv')

combined = pd.DataFrame()
combined['id'] = df1['id']
cols = [f'F{i}' for i in range(1, 29)]
weights1 = [round(i,2) for i in np.arange(0.2,0.4,0.01)]
weights2 = [round(i,2) for i in np.arange(0.6,0.8,0.01)]

for weight1 in weights1:
    for weight2 in weights2:
        # build the blended submission for this pair of weights
        for col in cols:
            combined[col] = weight1 * df1[col] + weight2 * df2[col]
        # evaluate the blend offline against the released public LB labels
        preds_valid = combined[combined.id.str.contains("validation")]
        preds_valid = preds_valid.merge(df_sample_submission[["id", "order"]], on = "id").sort_values("order").drop(["id", "order"], axis = 1).reset_index(drop = True)
        # F1..F28 correspond to days d_1914..d_1941
        preds_valid.rename(columns = {f"F{i}": f"d_{1913 + i}" for i in range(1, 29)}, inplace = True)

        groups, scores = evaluator.score(preds_valid)

        score_public_lb = np.mean(scores)

        print(f"\nPublic LB Score: {round(score_public_lb, 5)}")
        # keep the weights that give the lowest score so far
        if round(score_public_lb, 5) < score_max:
            score_max = round(score_public_lb, 5)
            params['param1'] = weight1
            params['param2'] = weight2

Score for group all_id: 1.53369
Score for group cat_id: 1.47431
Score for group state_id: 1.42485
Score for group dept_id: 1.43936
Score for group store_id: 1.37043
Score for group item_id: 0.9352
Score for group ['state_id', 'cat_id']: 1.3572
Score for group ['state_id', 'dept_id']: 1.31312
Score for group ['store_id', 'cat_id']: 1.26908
Score for group ['store_id', 'dept_id']: 1.20055
Score for group ['item_id', 'state_id']: 0.88088
Score for group ['item_id', 'store_id']: 0.84304

Public LB Score: 1.25348
Score for group all_id: 1.45947
Score for group cat_id: 1.40336
Score for group state_id: 1.3578
Score for group dept_id: 1.37081
Score for group store_id: 1.30779
Score for group item_id: 0.9217
Score for group ['state_id', 'cat_id']: 1.2949
Score for group ['state_id', 'dept_id']: 1.25445
Score for group ['store_id', 'cat_id']: 1.21399
Score for group ['store_id', 'dept_id']: 1.15152
Score for group ['item_id', 'state_id']: 0.87393
Score for group ['item_id', 'store_id']: 0.84026

Score for group all_id: 0.37446
Score for group cat_id: 0.38599
Score for group state_id: 0.41153
Score for group dept_id: 0.42342
Score for group store_id: 0.46201
Score for group item_id: 0.8062
Score for group ['state_id', 'cat_id']: 0.44378
Score for group ['state_id', 'dept_id']: 0.4925
Score for group ['store_id', 'cat_id']: 0.50816
Score for group ['store_id', 'dept_id']: 0.57658
Score for group ['item_id', 'state_id']: 0.81664
Score for group ['item_id', 'store_id']: 0.82054

Public LB Score: 0.54348
Score for group all_id: 0.31154
Score for group cat_id: 0.33259
Score for group state_id: 0.36389
Score for group dept_id: 0.38367
Score for group store_id: 0.4283
Score for group item_id: 0.80521
Score for group ['state_id', 'cat_id']: 0.40581
Score for group ['state_id', 'dept_id']: 0.46634
Score for group ['store_id', 'cat_id']: 0.48387
Score for group ['store_id', 'dept_id']: 0.5624
Score for group ['item_id', 'state_id']: 0.81626
Score for group ['item_id', 'store_id']: 0.8207

Score for group all_id: 0.65163
Score for group cat_id: 0.63738
Score for group state_id: 0.64045
Score for group dept_id: 0.64178
Score for group store_id: 0.65064
Score for group item_id: 0.81894
Score for group ['state_id', 'cat_id']: 0.63955
Score for group ['state_id', 'dept_id']: 0.6527
Score for group ['store_id', 'cat_id']: 0.65311
Score for group ['store_id', 'dept_id']: 0.67671
Score for group ['item_id', 'state_id']: 0.82256
Score for group ['item_id', 'store_id']: 0.82162

Public LB Score: 0.69226
Score for group all_id: 0.58027
Score for group cat_id: 0.57124
Score for group state_id: 0.57965
Score for group dept_id: 0.58162
Score for group store_id: 0.59796
Score for group item_id: 0.8144
Score for group ['state_id', 'cat_id']: 0.58617
Score for group ['state_id', 'dept_id']: 0.60677
Score for group ['store_id', 'cat_id']: 0.6112
Score for group ['store_id', 'dept_id']: 0.64573
Score for group ['item_id', 'state_id']: 0.8204
Score for group ['item_id', 'store_id']: 0.8210

Score for group all_id: 0.94238
Score for group cat_id: 0.91075
Score for group state_id: 0.89457
Score for group dept_id: 0.89801
Score for group store_id: 0.87895
Score for group item_id: 0.84548
Score for group ['state_id', 'cat_id']: 0.86803
Score for group ['state_id', 'dept_id']: 0.85723
Score for group ['store_id', 'cat_id']: 0.84246
Score for group ['store_id', 'dept_id']: 0.82874
Score for group ['item_id', 'state_id']: 0.8355
Score for group ['item_id', 'store_id']: 0.82578

Public LB Score: 0.86899
Score for group all_id: 0.86914
Score for group cat_id: 0.84147
Score for group state_id: 0.82988
Score for group dept_id: 0.83234
Score for group store_id: 0.82002
Score for group item_id: 0.83758
Score for group ['state_id', 'cat_id']: 0.80926
Score for group ['state_id', 'dept_id']: 0.80373
Score for group ['store_id', 'cat_id']: 0.7927
Score for group ['store_id', 'dept_id']: 0.7874
Score for group ['item_id', 'state_id']: 0.83161
Score for group ['item_id', 'store_id']: 0.824

Score for group all_id: 1.23716
Score for group cat_id: 1.1909
Score for group state_id: 1.15769
Score for group dept_id: 1.16606
Score for group store_id: 1.12152
Score for group item_id: 0.88466
Score for group ['state_id', 'cat_id']: 1.10945
Score for group ['state_id', 'dept_id']: 1.08052
Score for group ['store_id', 'cat_id']: 1.05105
Score for group ['store_id', 'dept_id']: 1.00775
Score for group ['item_id', 'state_id']: 0.85505
Score for group ['item_id', 'store_id']: 0.83293

Public LB Score: 1.0579
Score for group all_id: 1.16325
Score for group cat_id: 1.12048
Score for group state_id: 1.09141
Score for group dept_id: 1.09841
Score for group store_id: 1.06009
Score for group item_id: 0.87375
Score for group ['state_id', 'cat_id']: 1.04835
Score for group ['state_id', 'dept_id']: 1.02359
Score for group ['store_id', 'cat_id']: 0.99777
Score for group ['store_id', 'dept_id']: 0.96137
Score for group ['item_id', 'state_id']: 0.84956
Score for group ['item_id', 'store_id']: 0.83

Score for group all_id: 0.18516
Score for group cat_id: 0.23595
Score for group state_id: 0.28186
Score for group dept_id: 0.34229
Score for group store_id: 0.38033
Score for group item_id: 0.80728
Score for group ['state_id', 'cat_id']: 0.3459
Score for group ['state_id', 'dept_id']: 0.44206
Score for group ['store_id', 'cat_id']: 0.44958
Score for group ['store_id', 'dept_id']: 0.54906
Score for group ['item_id', 'state_id']: 0.81766
Score for group ['item_id', 'store_id']: 0.82246

Public LB Score: 0.47163
Score for group all_id: 0.18935
Score for group cat_id: 0.23932
Score for group state_id: 0.28649
Score for group dept_id: 0.36082
Score for group store_id: 0.38402
Score for group item_id: 0.80979
Score for group ['state_id', 'cat_id']: 0.34987
Score for group ['state_id', 'dept_id']: 0.45259
Score for group ['store_id', 'cat_id']: 0.45192
Score for group ['store_id', 'dept_id']: 0.55452
Score for group ['item_id', 'state_id']: 0.81906
Score for group ['item_id', 'store_id']: 0.8

Score for group all_id: 0.37384
Score for group cat_id: 0.38539
Score for group state_id: 0.41119
Score for group dept_id: 0.4226
Score for group store_id: 0.46212
Score for group item_id: 0.80583
Score for group ['state_id', 'cat_id']: 0.44349
Score for group ['state_id', 'dept_id']: 0.49203
Score for group ['store_id', 'cat_id']: 0.50836
Score for group ['store_id', 'dept_id']: 0.57669
Score for group ['item_id', 'state_id']: 0.81642
Score for group ['item_id', 'store_id']: 0.82048

Public LB Score: 0.5432
Score for group all_id: 0.31076
Score for group cat_id: 0.332
Score for group state_id: 0.36345
Score for group dept_id: 0.38286
Score for group store_id: 0.42845
Score for group item_id: 0.80485
Score for group ['state_id', 'cat_id']: 0.40551
Score for group ['state_id', 'dept_id']: 0.46593
Score for group ['store_id', 'cat_id']: 0.48414
Score for group ['store_id', 'dept_id']: 0.56255
Score for group ['item_id', 'state_id']: 0.81605
Score for group ['item_id', 'store_id']: 0.8206

Score for group all_id: 0.65136
Score for group cat_id: 0.63687
Score for group state_id: 0.64036
Score for group dept_id: 0.64117
Score for group store_id: 0.65067
Score for group item_id: 0.81858
Score for group ['state_id', 'cat_id']: 0.63933
Score for group ['state_id', 'dept_id']: 0.65228
Score for group ['store_id', 'cat_id']: 0.6531
Score for group ['store_id', 'dept_id']: 0.67663
Score for group ['item_id', 'state_id']: 0.82234
Score for group ['item_id', 'store_id']: 0.82157

Public LB Score: 0.69202
Score for group all_id: 0.57994
Score for group cat_id: 0.5707
Score for group state_id: 0.57953
Score for group dept_id: 0.58096
Score for group store_id: 0.59799
Score for group item_id: 0.81405
Score for group ['state_id', 'cat_id']: 0.58593
Score for group ['state_id', 'dept_id']: 0.60632
Score for group ['store_id', 'cat_id']: 0.61122
Score for group ['store_id', 'dept_id']: 0.6457
Score for group ['item_id', 'state_id']: 0.82018
Score for group ['item_id', 'store_id']: 0.821

Score for group all_id: 0.94224
Score for group cat_id: 0.91032
Score for group state_id: 0.89458
Score for group dept_id: 0.89756
Score for group store_id: 0.87901
Score for group item_id: 0.84515
Score for group ['state_id', 'cat_id']: 0.86786
Score for group ['state_id', 'dept_id']: 0.85693
Score for group ['store_id', 'cat_id']: 0.84241
Score for group ['store_id', 'dept_id']: 0.82857
Score for group ['item_id', 'state_id']: 0.83529
Score for group ['item_id', 'store_id']: 0.82573

Public LB Score: 0.8688
Score for group all_id: 0.86898
Score for group cat_id: 0.84103
Score for group state_id: 0.82987
Score for group dept_id: 0.83186
Score for group store_id: 0.82008
Score for group item_id: 0.83725
Score for group ['state_id', 'cat_id']: 0.80908
Score for group ['state_id', 'dept_id']: 0.80341
Score for group ['store_id', 'cat_id']: 0.79265
Score for group ['store_id', 'dept_id']: 0.78724
Score for group ['item_id', 'state_id']: 0.8314
Score for group ['item_id', 'store_id']: 0.82

Score for group all_id: 0.32748
Score for group cat_id: 0.35562
Score for group state_id: 0.38208
Score for group dept_id: 0.47718
Score for group store_id: 0.44778
Score for group item_id: 0.82205
Score for group ['state_id', 'cat_id']: 0.42592
Score for group ['state_id', 'dept_id']: 0.5268
Score for group ['store_id', 'cat_id']: 0.49941
Score for group ['store_id', 'dept_id']: 0.59956
Score for group ['item_id', 'state_id']: 0.82575
Score for group ['item_id', 'store_id']: 0.82741

Public LB Score: 0.54309
Score for group all_id: 0.39185
Score for group cat_id: 0.41319
Score for group state_id: 0.43079
Score for group dept_id: 0.52708
Score for group store_id: 0.4833
Score for group item_id: 0.82777
Score for group ['state_id', 'cat_id']: 0.46652
Score for group ['state_id', 'dept_id']: 0.562
Score for group ['store_id', 'cat_id']: 0.52696
Score for group ['store_id', 'dept_id']: 0.62298
Score for group ['item_id', 'state_id']: 0.82886
Score for group ['item_id', 'store_id']: 0.8291

Score for group all_id: 0.18378
Score for group cat_id: 0.23546
Score for group state_id: 0.28111
Score for group dept_id: 0.34192
Score for group store_id: 0.3805
Score for group item_id: 0.807
Score for group ['state_id', 'cat_id']: 0.34568
Score for group ['state_id', 'dept_id']: 0.44198
Score for group ['store_id', 'cat_id']: 0.44998
Score for group ['store_id', 'dept_id']: 0.54931
Score for group ['item_id', 'state_id']: 0.81749
Score for group ['item_id', 'store_id']: 0.82242

Public LB Score: 0.47139
Score for group all_id: 0.18795
Score for group cat_id: 0.23872
Score for group state_id: 0.28573
Score for group dept_id: 0.36061
Score for group store_id: 0.38412
Score for group item_id: 0.80952
Score for group ['state_id', 'cat_id']: 0.34962
Score for group ['state_id', 'dept_id']: 0.45257
Score for group ['store_id', 'cat_id']: 0.45227
Score for group ['store_id', 'dept_id']: 0.55475
Score for group ['item_id', 'state_id']: 0.81891
Score for group ['item_id', 'store_id']: 0.823

Score for group all_id: 0.37329
Score for group cat_id: 0.38488
Score for group state_id: 0.41093
Score for group dept_id: 0.42189
Score for group store_id: 0.46232
Score for group item_id: 0.80553
Score for group ['state_id', 'cat_id']: 0.44328
Score for group ['state_id', 'dept_id']: 0.49167
Score for group ['store_id', 'cat_id']: 0.50866
Score for group ['store_id', 'dept_id']: 0.57691
Score for group ['item_id', 'state_id']: 0.81624
Score for group ['item_id', 'store_id']: 0.82045

Public LB Score: 0.543
Score for group all_id: 0.31007
Score for group cat_id: 0.33152
Score for group state_id: 0.36309
Score for group dept_id: 0.38217
Score for group store_id: 0.4287
Score for group item_id: 0.80455
Score for group ['state_id', 'cat_id']: 0.4053
Score for group ['state_id', 'dept_id']: 0.46562
Score for group ['store_id', 'cat_id']: 0.48452
Score for group ['store_id', 'dept_id']: 0.5628
Score for group ['item_id', 'state_id']: 0.81588
Score for group ['item_id', 'store_id']: 0.82065

Score for group all_id: 0.65112
Score for group cat_id: 0.6364
Score for group state_id: 0.64031
Score for group dept_id: 0.64063
Score for group store_id: 0.65076
Score for group item_id: 0.81829
Score for group ['state_id', 'cat_id']: 0.63916
Score for group ['state_id', 'dept_id']: 0.65193
Score for group ['store_id', 'cat_id']: 0.65315
Score for group ['store_id', 'dept_id']: 0.67663
Score for group ['item_id', 'state_id']: 0.82217
Score for group ['item_id', 'store_id']: 0.82154

Public LB Score: 0.69184
Score for group all_id: 0.57966
Score for group cat_id: 0.57022
Score for group state_id: 0.57945
Score for group dept_id: 0.58038
Score for group store_id: 0.5981
Score for group item_id: 0.81375
Score for group ['state_id', 'cat_id']: 0.58576
Score for group ['state_id', 'dept_id']: 0.60596
Score for group ['store_id', 'cat_id']: 0.61132
Score for group ['store_id', 'dept_id']: 0.64577
Score for group ['item_id', 'state_id']: 0.82001
Score for group ['item_id', 'store_id']: 0.82

Score for group all_id: 0.59924
Score for group cat_id: 0.60437
Score for group state_id: 0.59942
Score for group dept_id: 0.69202
Score for group store_id: 0.6186
Score for group item_id: 0.84931
Score for group ['state_id', 'cat_id']: 0.61286
Score for group ['state_id', 'dept_id']: 0.68725
Score for group ['store_id', 'cat_id']: 0.63582
Score for group ['store_id', 'dept_id']: 0.71284
Score for group ['item_id', 'state_id']: 0.84052
Score for group ['item_id', 'store_id']: 0.83526

Public LB Score: 0.69063
Score for group all_id: 0.67091
Score for group cat_id: 0.67144
Score for group state_id: 0.66024
Score for group dept_id: 0.75012
Score for group store_id: 0.67055
Score for group item_id: 0.85792
Score for group ['state_id', 'cat_id']: 0.66696
Score for group ['state_id', 'dept_id']: 0.73358
Score for group ['store_id', 'cat_id']: 0.67869
Score for group ['store_id', 'dept_id']: 0.7479
Score for group ['item_id', 'state_id']: 0.8452
Score for group ['item_id', 'store_id']: 0.837

Score for group all_id: 0.32663
Score for group cat_id: 0.35517
Score for group state_id: 0.38154
Score for group dept_id: 0.47742
Score for group store_id: 0.44773
Score for group item_id: 0.82188
Score for group ['state_id', 'cat_id']: 0.42569
Score for group ['state_id', 'dept_id']: 0.52701
Score for group ['store_id', 'cat_id']: 0.49957
Score for group ['store_id', 'dept_id']: 0.59979
Score for group ['item_id', 'state_id']: 0.82565
Score for group ['item_id', 'store_id']: 0.8274

Public LB Score: 0.54296
Score for group all_id: 0.39111
Score for group cat_id: 0.41281
Score for group state_id: 0.4303
Score for group dept_id: 0.52737
Score for group store_id: 0.48317
Score for group item_id: 0.82763
Score for group ['state_id', 'cat_id']: 0.4663
Score for group ['state_id', 'dept_id']: 0.56224
Score for group ['store_id', 'cat_id']: 0.52703
Score for group ['store_id', 'dept_id']: 0.6232
Score for group ['item_id', 'state_id']: 0.82876
Score for group ['item_id', 'store_id']: 0.8291

Score for group all_id: 0.18255
Score for group cat_id: 0.23513
Score for group state_id: 0.28048
Score for group dept_id: 0.34169
Score for group store_id: 0.38078
Score for group item_id: 0.80678
Score for group ['state_id', 'cat_id']: 0.34556
Score for group ['state_id', 'dept_id']: 0.442
Score for group ['store_id', 'cat_id']: 0.4505
Score for group ['store_id', 'dept_id']: 0.54966
Score for group ['item_id', 'state_id']: 0.81737
Score for group ['item_id', 'store_id']: 0.82242

Public LB Score: 0.47124
Score for group all_id: 0.18669
Score for group cat_id: 0.23826
Score for group state_id: 0.28507
Score for group dept_id: 0.36053
Score for group store_id: 0.38433
Score for group item_id: 0.80931
Score for group ['state_id', 'cat_id']: 0.34948
Score for group ['state_id', 'dept_id']: 0.45266
Score for group ['store_id', 'cat_id']: 0.45272
Score for group ['store_id', 'dept_id']: 0.55509
Score for group ['item_id', 'state_id']: 0.81879
Score for group ['item_id', 'store_id']: 0.823

Score for group all_id: 0.37282
Score for group cat_id: 0.38444
Score for group state_id: 0.41075
Score for group dept_id: 0.42129
Score for group store_id: 0.46263
Score for group item_id: 0.80528
Score for group ['state_id', 'cat_id']: 0.44315
Score for group ['state_id', 'dept_id']: 0.4914
Score for group ['store_id', 'cat_id']: 0.50906
Score for group ['store_id', 'dept_id']: 0.57723
Score for group ['item_id', 'state_id']: 0.81611
Score for group ['item_id', 'store_id']: 0.82044

Public LB Score: 0.54288
Score for group all_id: 0.30947
Score for group cat_id: 0.33113
Score for group state_id: 0.36282
Score for group dept_id: 0.3816
Score for group store_id: 0.42907
Score for group item_id: 0.80432
Score for group ['state_id', 'cat_id']: 0.40518
Score for group ['state_id', 'dept_id']: 0.46542
Score for group ['store_id', 'cat_id']: 0.485
Score for group ['store_id', 'dept_id']: 0.56316
Score for group ['item_id', 'state_id']: 0.81575
Score for group ['item_id', 'store_id']: 0.8206

Score for group all_id: 0.88872
Score for group cat_id: 0.87721
Score for group state_id: 0.84944
Score for group dept_id: 0.93091
Score for group store_id: 0.83795
Score for group item_id: 0.8877
Score for group ['state_id', 'cat_id']: 0.83795
Score for group ['state_id', 'dept_id']: 0.88207
Score for group ['store_id', 'cat_id']: 0.81969
Score for group ['store_id', 'dept_id']: 0.86434
Score for group ['item_id', 'state_id']: 0.86141
Score for group ['item_id', 'store_id']: 0.84587

Public LB Score: 0.86527
Score for group all_id: 0.57942
Score for group cat_id: 0.56979
Score for group state_id: 0.57943
Score for group dept_id: 0.57987
Score for group store_id: 0.59827
Score for group item_id: 0.81351
Score for group ['state_id', 'cat_id']: 0.58564
Score for group ['state_id', 'dept_id']: 0.60566
Score for group ['store_id', 'cat_id']: 0.6115
Score for group ['store_id', 'dept_id']: 0.64592
Score for group ['item_id', 'state_id']: 0.81988
Score for group ['item_id', 'store_id']: 0.82

Score for group all_id: 0.59874
Score for group cat_id: 0.60421
Score for group state_id: 0.59906
Score for group dept_id: 0.6924
Score for group store_id: 0.61835
Score for group item_id: 0.84927
Score for group ['state_id', 'cat_id']: 0.61272
Score for group ['state_id', 'dept_id']: 0.68758
Score for group ['store_id', 'cat_id']: 0.63576
Score for group ['store_id', 'dept_id']: 0.71307
Score for group ['item_id', 'state_id']: 0.84048
Score for group ['item_id', 'store_id']: 0.83529

Public LB Score: 0.69058
Score for group all_id: 0.67045
Score for group cat_id: 0.67132
Score for group state_id: 0.6599
Score for group dept_id: 0.75048
Score for group store_id: 0.67028
Score for group item_id: 0.85791
Score for group ['state_id', 'cat_id']: 0.66683
Score for group ['state_id', 'dept_id']: 0.7339
Score for group ['store_id', 'cat_id']: 0.6786
Score for group ['store_id', 'dept_id']: 0.74813
Score for group ['item_id', 'state_id']: 0.84517
Score for group ['item_id', 'store_id']: 0.8376

Score for group all_id: 0.32587
Score for group cat_id: 0.3548
Score for group state_id: 0.38108
Score for group dept_id: 0.47774
Score for group store_id: 0.44776
Score for group item_id: 0.82177
Score for group ['state_id', 'cat_id']: 0.42555
Score for group ['state_id', 'dept_id']: 0.52732
Score for group ['store_id', 'cat_id']: 0.49983
Score for group ['store_id', 'dept_id']: 0.60012
Score for group ['item_id', 'state_id']: 0.82559
Score for group ['item_id', 'store_id']: 0.82742

Public LB Score: 0.54291
Score for group all_id: 0.39045
Score for group cat_id: 0.41251
Score for group state_id: 0.42987
Score for group dept_id: 0.52775
Score for group store_id: 0.48312
Score for group item_id: 0.82754
Score for group ['state_id', 'cat_id']: 0.46615
Score for group ['state_id', 'dept_id']: 0.56258
Score for group ['store_id', 'cat_id']: 0.5272
Score for group ['store_id', 'dept_id']: 0.6235
Score for group ['item_id', 'state_id']: 0.82871
Score for group ['item_id', 'store_id']: 0.829

Score for group all_id: 0.18147
Score for group cat_id: 0.23496
Score for group state_id: 0.27996
Score for group dept_id: 0.3416
Score for group store_id: 0.38118
Score for group item_id: 0.80662
Score for group ['state_id', 'cat_id']: 0.34557
Score for group ['state_id', 'dept_id']: 0.44215
Score for group ['store_id', 'cat_id']: 0.45112
Score for group ['store_id', 'dept_id']: 0.55012
Score for group ['item_id', 'state_id']: 0.81729
Score for group ['item_id', 'store_id']: 0.82245

Public LB Score: 0.47121
Score for group all_id: 0.18557
Score for group cat_id: 0.23796
Score for group state_id: 0.28452
Score for group dept_id: 0.36058
Score for group store_id: 0.38465
Score for group item_id: 0.80917
Score for group ['state_id', 'cat_id']: 0.34945
Score for group ['state_id', 'dept_id']: 0.45287
Score for group ['store_id', 'cat_id']: 0.45329
Score for group ['store_id', 'dept_id']: 0.55552
Score for group ['item_id', 'state_id']: 0.81872
Score for group ['item_id', 'store_id']: 0.8

Score for group all_id: 0.37243
Score for group cat_id: 0.38409
Score for group state_id: 0.41064
Score for group dept_id: 0.42081
Score for group store_id: 0.46304
Score for group item_id: 0.8051
Score for group ['state_id', 'cat_id']: 0.4431
Score for group ['state_id', 'dept_id']: 0.49123
Score for group ['store_id', 'cat_id']: 0.50956
Score for group ['store_id', 'dept_id']: 0.57764
Score for group ['item_id', 'state_id']: 0.81602
Score for group ['item_id', 'store_id']: 0.82048

Public LB Score: 0.54284
Score for group all_id: 0.30896
Score for group cat_id: 0.33084
Score for group state_id: 0.36264
Score for group dept_id: 0.38116
Score for group store_id: 0.42955
Score for group item_id: 0.80414
Score for group ['state_id', 'cat_id']: 0.40516
Score for group ['state_id', 'dept_id']: 0.46532
Score for group ['store_id', 'cat_id']: 0.48559
Score for group ['store_id', 'dept_id']: 0.56361
Score for group ['item_id', 'state_id']: 0.81566
Score for group ['item_id', 'store_id']: 0.82

Score for group all_id: 0.88836
Score for group cat_id: 0.87719
Score for group state_id: 0.84915
Score for group dept_id: 0.9312
Score for group store_id: 0.83767
Score for group item_id: 0.8878
Score for group ['state_id', 'cat_id']: 0.83788
Score for group ['state_id', 'dept_id']: 0.88236
Score for group ['store_id', 'cat_id']: 0.81959
Score for group ['store_id', 'dept_id']: 0.86456
Score for group ['item_id', 'state_id']: 0.86144
Score for group ['item_id', 'store_id']: 0.84593

Public LB Score: 0.86526
Score for group all_id: 0.96173
Score for group cat_id: 0.9468
Score for group state_id: 0.91384
Score for group dept_id: 0.99312
Score for group store_id: 0.89618
Score for group item_id: 0.89898
Score for group ['state_id', 'cat_id']: 0.89693
Score for group ['state_id', 'dept_id']: 0.9342
Score for group ['store_id', 'cat_id']: 0.86959
Score for group ['store_id', 'dept_id']: 0.90624
Score for group ['item_id', 'state_id']: 0.86755
Score for group ['item_id', 'store_id']: 0.8489

Score for group all_id: 0.59829
Score for group cat_id: 0.60411
Score for group state_id: 0.59874
Score for group dept_id: 0.69284
Score for group store_id: 0.61817
Score for group item_id: 0.84929
Score for group ['state_id', 'cat_id']: 0.61264
Score for group ['state_id', 'dept_id']: 0.68798
Score for group ['store_id', 'cat_id']: 0.63578
Score for group ['store_id', 'dept_id']: 0.71338
Score for group ['item_id', 'state_id']: 0.84049
Score for group ['item_id', 'store_id']: 0.83534

Public LB Score: 0.69059
Score for group all_id: 0.67003
Score for group cat_id: 0.67125
Score for group state_id: 0.6596
Score for group dept_id: 0.7509
Score for group store_id: 0.67006
Score for group item_id: 0.85796
Score for group ['state_id', 'cat_id']: 0.66675
Score for group ['state_id', 'dept_id']: 0.73428
Score for group ['store_id', 'cat_id']: 0.67858
Score for group ['store_id', 'dept_id']: 0.74842
Score for group ['item_id', 'state_id']: 0.84518
Score for group ['item_id', 'store_id']: 0.83

Score for group all_id: 0.3252
Score for group cat_id: 0.35453
Score for group state_id: 0.38069
Score for group dept_id: 0.47815
Score for group store_id: 0.44789
Score for group item_id: 0.82172
Score for group ['state_id', 'cat_id']: 0.4255
Score for group ['state_id', 'dept_id']: 0.52772
Score for group ['store_id', 'cat_id']: 0.50019
Score for group ['store_id', 'dept_id']: 0.60054
Score for group ['item_id', 'state_id']: 0.82557
Score for group ['item_id', 'store_id']: 0.82748

Public LB Score: 0.54293
Score for group all_id: 0.38986
Score for group cat_id: 0.41229
Score for group state_id: 0.42951
Score for group dept_id: 0.5282
Score for group store_id: 0.48316
Score for group item_id: 0.82751
Score for group ['state_id', 'cat_id']: 0.46608
Score for group ['state_id', 'dept_id']: 0.56299
Score for group ['store_id', 'cat_id']: 0.52745
Score for group ['store_id', 'dept_id']: 0.62389
Score for group ['item_id', 'state_id']: 0.8287
Score for group ['item_id', 'store_id']: 0.8291

Score for group all_id: 0.18053
Score for group cat_id: 0.23493
Score for group state_id: 0.27955
Score for group dept_id: 0.34166
Score for group store_id: 0.3817
Score for group item_id: 0.80652
Score for group ['state_id', 'cat_id']: 0.34569
Score for group ['state_id', 'dept_id']: 0.44241
Score for group ['store_id', 'cat_id']: 0.45186
Score for group ['store_id', 'dept_id']: 0.55067
Score for group ['item_id', 'state_id']: 0.81726
Score for group ['item_id', 'store_id']: 0.8225

Public LB Score: 0.47127
Score for group all_id: 0.1846
Score for group cat_id: 0.23781
Score for group state_id: 0.28408
Score for group dept_id: 0.36076
Score for group store_id: 0.38508
Score for group item_id: 0.80908
Score for group ['state_id', 'cat_id']: 0.34954
Score for group ['state_id', 'dept_id']: 0.45319
Score for group ['store_id', 'cat_id']: 0.45397
Score for group ['store_id', 'dept_id']: 0.55606
Score for group ['item_id', 'state_id']: 0.81869
Score for group ['item_id', 'store_id']: 0.823

Score for group all_id: 1.1827
Score for group cat_id: 1.15735
Score for group state_id: 1.11049
Score for group dept_id: 1.18373
Score for group store_id: 1.07627
Score for group item_id: 0.93608
Score for group ['state_id', 'cat_id']: 1.07784
Score for group ['state_id', 'dept_id']: 1.09547
Score for group ['store_id', 'cat_id']: 1.02531
Score for group ['store_id', 'dept_id']: 1.0379
Score for group ['item_id', 'state_id']: 0.8879
Score for group ['item_id', 'store_id']: 0.85914

Public LB Score: 1.05251
Score for group all_id: 1.25668
Score for group cat_id: 1.22793
Score for group state_id: 1.17677
Score for group dept_id: 1.24864
Score for group store_id: 1.13751
Score for group item_id: 0.94947
Score for group ['state_id', 'cat_id']: 1.13908
Score for group ['state_id', 'dept_id']: 1.1507
Score for group ['store_id', 'cat_id']: 1.07868
Score for group ['store_id', 'dept_id']: 1.08349
Score for group ['item_id', 'state_id']: 0.89528
Score for group ['item_id', 'store_id']: 0.8628

In [21]:
params

{'param1': 0.33, 'param2': 0.67}

In [71]:
## evaluating submission from public kernel M5 First Public Notebook Under 0.50
## from https://www.kaggle.com/kneroma/m5-first-public-notebook-under-0-50
preds_valid = df_submission[df_submission.id.str.contains("validation")]
preds_valid = (
    preds_valid
    .merge(df_sample_submission[["id", "order"]], on = "id")
    .sort_values("order")
    .drop(["id", "order"], axis = 1)
    .reset_index(drop = True)
)

## map forecast columns F1..F28 to day columns d_1914..d_1941
preds_valid.rename(columns = {f"F{i}": f"d_{1913 + i}" for i in range(1, 29)}, inplace = True)

groups, scores = evaluator.score(preds_valid)

score_public_lb = np.mean(scores)

for group, score in zip(groups, scores):
    print(f"Score for group {group}: {round(score, 5)}")

print(f"\nPublic LB Score: {round(score_public_lb, 5)}")

Score for group all_id: 0.39495
Score for group cat_id: 0.41311
Score for group state_id: 0.4313
Score for group dept_id: 0.45867
Score for group store_id: 0.48321
Score for group item_id: 0.7795
Score for group ['state_id', 'cat_id']: 0.47042
Score for group ['state_id', 'dept_id']: 0.51845
Score for group ['store_id', 'cat_id']: 0.53186
Score for group ['store_id', 'dept_id']: 0.59193
Score for group ['item_id', 'state_id']: 0.79972
Score for group ['item_id', 'store_id']: 0.81025

Public LB Score: 0.55695
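
As a quick sanity check, the overall score is simply the unweighted mean of the twelve level scores, which is what `np.mean(scores)` computes in the cell above. A minimal sketch, assuming only the rounded values printed in that output (the name `level_scores` is illustrative and not part of the notebook):

```python
import numpy as np

# Level scores copied from the printed output above, in the same order
# (all, cat, state, dept, store, item, then the six two-key levels).
level_scores = [
    0.39495, 0.41311, 0.4313, 0.45867, 0.48321, 0.7795,
    0.47042, 0.51845, 0.53186, 0.59193, 0.79972, 0.81025,
]

# Every aggregation level contributes equally to the final score.
overall = np.mean(level_scores)
print(f"Mean of {len(level_scores)} level scores: {overall:.5f}")
```

The result agrees with the printed Public LB Score of 0.55695, so each level carries the same weight regardless of how many series it contains.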


## 0. Global
Let's look at the overall metrics and errors.


In [None]:
levels = ["all", "category", "state", "department", "store", "item", "state_category", "state_department",
          "store_category", "store_department", "state_item", "store_item"]

df_levels = pd.DataFrame({
    "level": levels,
    "score": scores
})

source_1 = ColumnDataSource(data = dict(
    level = df_levels.level.values,
    score = df_levels.score.values
))

tooltips = [
    ("Level", "@level"),
    ("Score", "@score")
]

v1 = figure(plot_width = 650, plot_height = 400, y_range = df_levels.level.values, tooltips = tooltips, title = "Scores by all aggregation levels")
v1.hbar(y = "level", right = "score", source = source_1, height = 0.75, alpha = 0.6, legend_label = "Public LB Score")

mean = Span(location = np.mean(df_levels.score.values), dimension = "height", line_color = "grey", line_dash = "dashed", line_width = 1.5)
v1.add_layout(mean)

v1.xaxis.axis_label = "WRMSSE Score"
v1.yaxis.axis_label = "Aggregation Level"

v1.legend.location = "bottom_right"


df_levels.sort_values("score", inplace = True)

source_2 = ColumnDataSource(data = dict(
    level = df_levels.level.values,
    score = df_levels.score.values
))

v2 = figure(plot_width = 650, plot_height = 400, y_range = df_levels.level.values, tooltips = tooltips, title = "Scores by all aggregation levels (sorted by score)")
v2.hbar(y = "level", right = "score", source = source_2, height = 0.75, alpha = 0.6, legend_label = "Public LB Score")

mean = Span(location = np.mean(df_levels.score.values), dimension = "height", line_color = "grey", line_dash = "dashed", line_width = 1.5)
v2.add_layout(mean)

v2.xaxis.axis_label = "WRMSSE Score"
v2.yaxis.axis_label = "Aggregation Level"

v2.legend.location = "bottom_right"


df_items = df_levels[df_levels.level.str.contains("item")]

source_3 = ColumnDataSource(data = dict(
    level = df_items.level.values,
    score = df_items.score.values
))

v3 = figure(plot_width = 330, plot_height = 200, y_range = df_items.level.values, x_range = Range1d(0, 1), tooltips = tooltips, title = "Scores by item levels")
v3.hbar(y = "level", right = "score", source = source_3, height = 0.75, color = "mediumseagreen", alpha = 0.6)

mean = Span(location = np.mean(df_items.score.values), dimension = "height", line_color = "grey", line_dash = "dashed", line_width = 1.5)
v3.add_layout(mean)

v3.xaxis.axis_label = "WRMSSE Score"
v3.yaxis.axis_label = "Aggregation Level"


df_stores = df_levels[df_levels.level.str.contains("store")]

source_4 = ColumnDataSource(data = dict(
    level = df_stores.level.values,
    score = df_stores.score.values
))

v4 = figure(plot_width = 330, plot_height = 200, y_range = df_stores.level.values, x_range = Range1d(0, 1), tooltips = tooltips, title = "Scores by store levels")
v4.hbar(y = "level", right = "score", source = source_4, height = 0.75, color = "mediumseagreen", alpha = 0.6)

mean = Span(location = np.mean(df_stores.score.values), dimension = "height", line_color = "grey", line_dash = "dashed", line_width = 1.5)
v4.add_layout(mean)

v4.xaxis.axis_label = "WRMSSE Score"
v4.yaxis.axis_label = "Aggregation Level"


df_departments = df_levels[df_levels.level.str.contains("dep")]

source_5 = ColumnDataSource(data = dict(
    level = df_departments.level.values,
    score = df_departments.score.values
))

v5 = figure(plot_width = 330, plot_height = 200, y_range = df_departments.level.values, x_range = Range1d(0, 1), tooltips = tooltips, title = "Scores by department levels")
v5.hbar(y = "level", right = "score", source = source_5, height = 0.75, color = "mediumseagreen", alpha = 0.6)

mean = Span(location = np.mean(df_departments.score.values), dimension = "height", line_color = "grey", line_dash = "dashed", line_width = 1.5)
v5.add_layout(mean)

v5.xaxis.axis_label = "WRMSSE Score"
v5.yaxis.axis_label = "Aggregation Level"


df_states = df_levels[df_levels.level.str.contains("state")]

source_6 = ColumnDataSource(data = dict(
    level = df_states.level.values,
    score = df_states.score.values
))

v6 = figure(plot_width = 330, plot_height = 200, y_range = df_states.level.values, x_range = Range1d(0, 1), tooltips = tooltips, title = "Scores by state levels")
v6.hbar(y = "level", right = "score", source = source_6, height = 0.75, color = "mediumseagreen", alpha = 0.6)

mean = Span(location = np.mean(df_states.score.values), dimension = "height", line_color = "grey", line_dash = "dashed", line_width = 1.5)
v6.add_layout(mean)

v6.xaxis.axis_label = "WRMSSE Score"
v6.yaxis.axis_label = "Aggregation Level"


df_categories = df_levels[df_levels.level.str.contains("cat")]

source_7 = ColumnDataSource(data = dict(
    level = df_categories.level.values,
    score = df_categories.score.values
))

v7 = figure(plot_width = 330, plot_height = 200, y_range = df_categories.level.values, x_range = Range1d(0, 1), tooltips = tooltips, title = "Scores by category levels")
v7.hbar(y = "level", right = "score", source = source_7, height = 0.75, color = "mediumseagreen", alpha = 0.6)

mean = Span(location = np.mean(df_categories.score.values), dimension = "height", line_color = "grey", line_dash = "dashed", line_width = 1.5)
v7.add_layout(mean)

v7.xaxis.axis_label = "WRMSSE Score"
v7.yaxis.axis_label = "Aggregation Level"


show(column(v1, v2, row(v3, v4), row(v5, v6), v7))


It's clear that the item-level aggregations are the hardest to predict. This is intuitive, since item-level series are highly volatile and sensitive to changes in inventory, demand and consumption.

The more rolled up an aggregation level is, the lower (better) its score. This is intuitive as well, since rolling up tends to cancel out the positive and negative errors made at the granular levels. That is why a single-key level scores better than the two-key combinations built from it, and why the fully rolled-up ***all*** level scores best.
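
The cancellation effect can be seen on a toy example. This is a minimal sketch with made-up numbers, and it uses plain RMSE rather than the competition's scaled and weighted metric, so it only illustrates the direction of the effect:

```python
import numpy as np

# Made-up daily values for two items over four days (illustrative only).
actual = np.array([[3.0, 5.0, 4.0, 6.0],
                   [7.0, 2.0, 5.0, 4.0]])

# Forecast errors that mostly point in opposite directions for the two items.
errors = np.array([[ 1.0, -1.0,  1.0, -1.0],
                   [-1.0,  1.0, -1.0,  0.0]])
predicted = actual + errors


def rmse(y_true, y_pred):
    """Root mean squared error of one series."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Error of each item on its own.
item_rmse = [rmse(a, p) for a, p in zip(actual, predicted)]

# Error after rolling both items up into a single series.
total_rmse = rmse(actual.sum(axis = 0), predicted.sum(axis = 0))

print("item-level RMSE:", [round(x, 3) for x in item_rmse])
print("rolled-up RMSE: ", round(total_rmse, 3))
```

Here the rolled-up series has a smaller error than either item, even though its values are larger, because most of the opposite-signed errors offset each other when summed.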

## 1. Store-Item
Let's look at the ten best- and ten worst-scoring store-item combinations, then plot the actual against the predicted values for the ten worst.
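
The selection pattern used in the next cell (and repeated in the later sections) can be tried on a small frame first. This is a minimal sketch: the toy values are made up, only the `score_weighted` column name is taken from the notebook, and `n` is 2 here where the notebook uses 10.

```python
import pandas as pd

# Toy stand-in for the frame returned by evaluator.get_scores (values are made up).
df = pd.DataFrame({
    "store_item_id": ["A-1", "A-2", "B-1", "B-2", "C-1", "C-2"],
    "score_weighted": [0.40, 0.05, 0.90, 0.10, 0.70, 0.20],
})

n = 2  # the notebook uses 10

# Lowest scores first: the best-predicted rows.
best = (df.sort_values("score_weighted").head(n)
          .rename(columns = {"score_weighted": "score_best"}))

# Highest scores, re-sorted ascending so the worst row ends up last.
worst = (df.sort_values("score_weighted", ascending = False).head(n)
           .sort_values("score_weighted")
           .rename(columns = {"score_weighted": "score_worst"}))

# After concatenation each row has a value in exactly one score column.
best_worst = pd.concat([best, worst])
print(best_worst)
```

Because the renamed columns do not overlap, `score_best` is NaN for the worst rows and `score_worst` is NaN for the best rows. That is what lets the chart draw two separate `hbar` renderers from one data source, each with its own colour and hover tool.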

In [None]:
df_store_item = evaluator.get_scores(preds_valid, 12)
df_store_item["store_item_id"] = df_store_item.store_id + "-" + df_store_item.item_id

df_store_item_best = df_store_item.sort_values("score_weighted").head(10).rename(columns = {"score_weighted": "score_best"})
df_store_item_worst = df_store_item.sort_values("score_weighted", ascending = False).head(10).sort_values("score_weighted").rename(columns = {"score_weighted": "score_worst"})

df_store_item_best_worst = pd.concat([df_store_item_best, df_store_item_worst])

source_1 = ColumnDataSource(data = dict(
    store_item_id = df_store_item_best_worst.store_item_id.values,
    score_best = df_store_item_best_worst.score_best.values,
    score_worst = df_store_item_best_worst.score_worst.values
))

tooltips_1 = [
    ("Store-Item", "@store_item_id"),
    ("Score", "@score_best{0.0000}")
]

tooltips_2 = [
    ("Store-Item", "@store_item_id"),
    ("Score", "@score_worst{0.0000}")
]

v1 = figure(plot_width = 700, plot_height = 400, y_range = df_store_item_best_worst.store_item_id.values, title = "Best and Worst Store-Item")
v11 = v1.hbar("store_item_id", right = "score_best", source = source_1, height = 0.75, alpha = 0.6, color = "green")
v12 = v1.hbar("store_item_id", right = "score_worst", source = source_1, height = 0.75, alpha = 0.6, color = "red")

v1.add_tools(HoverTool(renderers = [v11], tooltips = tooltips_1))
v1.add_tools(HoverTool(renderers = [v12], tooltips = tooltips_2))

v1.xaxis.axis_label = "WRMSSE Score"
v1.yaxis.axis_label = "Store-Item"


def get_store_item_plot(df, store_item_id):
    """
    Plots the actual and predicted values of store-item
    """
    actual_dates = ["d_" + str(x) for x in range(1914, 1942)]
    predicted_dates = ["pred_d_" + str(x) for x in range(1914, 1942)]
    
    source = ColumnDataSource(data = dict(
        date_number = actual_dates,
        date = LB_DATES,
        weekday = LB_WEEKDAYS,
        actual = df.loc[df.store_item_id == store_item_id, actual_dates].values[0],
        predicted = df.loc[df.store_item_id == store_item_id, predicted_dates].values[0]
    ))
    
    tooltips = [
        ("Date", "@date"),
        ("Weekday", "@weekday"),
        ("Actual", "@actual{0}"),
        ("Predicted", "@predicted{0.0}")
    ]
    
    v = figure(plot_width = 700, plot_height = 400, x_range = actual_dates, tooltips = tooltips, title = f"store-item = {store_item_id}")
    v.line("date_number", "actual", source = source, color = "steelblue", alpha = 0.6, width = 3, legend_label = "Actual")
    v.line("date_number", "predicted", source = source, color = "coral", alpha = 0.6, width = 3, legend_label = "Predicted")

    v.xaxis.major_label_orientation = pi / 4
    
    return v

## actual vs predicted plots for the ten worst store-items, worst first
worst_ids = df_store_item_worst.store_item_id.values[::-1]
worst_plots = [get_store_item_plot(df_store_item, store_item_id) for store_item_id in worst_ids]


show(column(v1, *worst_plots))


## 2. State-Item
Let's look at the ten best- and ten worst-scoring state-item combinations, then plot the actual against the predicted values for the ten worst.


In [None]:
df_state_item = evaluator.get_scores(preds_valid, 11)
df_state_item["state_item_id"] = df_state_item.state_id + "-" + df_state_item.item_id

df_state_item_best = df_state_item.sort_values("score_weighted").head(10).rename(columns = {"score_weighted": "score_best"})
df_state_item_worst = df_state_item.sort_values("score_weighted", ascending = False).head(10).sort_values("score_weighted").rename(columns = {"score_weighted": "score_worst"})

df_state_item_best_worst = pd.concat([df_state_item_best, df_state_item_worst])

source_1 = ColumnDataSource(data = dict(
    state_item_id = df_state_item_best_worst.state_item_id.values,
    score_best = df_state_item_best_worst.score_best.values,
    score_worst = df_state_item_best_worst.score_worst.values
))

tooltips_1 = [
    ("State-Item", "@state_item_id"),
    ("Score", "@score_best{0.0000}")
]

tooltips_2 = [
    ("State-Item", "@state_item_id"),
    ("Score", "@score_worst{0.0000}")
]

v1 = figure(plot_width = 700, plot_height = 400, y_range = df_state_item_best_worst.state_item_id.values, title = "Best and Worst State-Item")
v11 = v1.hbar("state_item_id", right = "score_best", source = source_1, height = 0.75, alpha = 0.6, color = "green")
v12 = v1.hbar("state_item_id", right = "score_worst", source = source_1, height = 0.75, alpha = 0.6, color = "red")

v1.add_tools(HoverTool(renderers = [v11], tooltips = tooltips_1))
v1.add_tools(HoverTool(renderers = [v12], tooltips = tooltips_2))

v1.xaxis.axis_label = "WRMSSE Score"
v1.yaxis.axis_label = "State-Item"


def get_state_item_plot(df, state_item_id):
    """
    Plots the actual and predicted values of state-item
    """
    actual_dates = ["d_" + str(x) for x in range(1914, 1942)]
    predicted_dates = ["pred_d_" + str(x) for x in range(1914, 1942)]
    
    source = ColumnDataSource(data = dict(
        date_number = actual_dates,
        date = LB_DATES,
        weekday = LB_WEEKDAYS,
        actual = df.loc[df.state_item_id == state_item_id, actual_dates].values[0],
        predicted = df.loc[df.state_item_id == state_item_id, predicted_dates].values[0]
    ))
    
    tooltips = [
        ("Date", "@date"),
        ("Weekday", "@weekday"),
        ("Actual", "@actual{0}"),
        ("Predicted", "@predicted{0.0}")
    ]
    
    v = figure(plot_width = 700, plot_height = 400, x_range = actual_dates, tooltips = tooltips, title = f"state-item = {state_item_id}")
    v.line("date_number", "actual", source = source, color = "steelblue", alpha = 0.6, width = 3, legend_label = "Actual")
    v.line("date_number", "predicted", source = source, color = "coral", alpha = 0.6, width = 3, legend_label = "Predicted")

    v.xaxis.major_label_orientation = pi / 4

    return v

## actual vs predicted plots for the ten worst state-items, worst first
worst_ids = df_state_item_worst.state_item_id.values[::-1]
worst_plots = [get_state_item_plot(df_state_item, state_item_id) for state_item_id in worst_ids]


show(column(v1, *worst_plots))


## 3. Item
Let's look at the ten best- and ten worst-scoring items, then plot the actual against the predicted values for the ten worst.


In [None]:
df_item = evaluator.get_scores(preds_valid, 6)

df_item_best = df_item.sort_values("score_weighted").head(10).rename(columns = {"score_weighted": "score_best"})
df_item_worst = df_item.sort_values("score_weighted", ascending = False).head(10).sort_values("score_weighted").rename(columns = {"score_weighted": "score_worst"})

df_item_best_worst = pd.concat([df_item_best, df_item_worst])

source_1 = ColumnDataSource(data = dict(
    item_id = df_item_best_worst.item_id.values,
    score_best = df_item_best_worst.score_best.values,
    score_worst = df_item_best_worst.score_worst.values
))

tooltips_1 = [
    ("Item", "@item_id"),
    ("Score", "@score_best{0.0000}")
]

tooltips_2 = [
    ("Item", "@item_id"),
    ("Score", "@score_worst{0.0000}")
]

v1 = figure(plot_width = 700, plot_height = 400, y_range = df_item_best_worst.item_id.values, title = "Best and Worst Items")
v11 = v1.hbar("item_id", right = "score_best", source = source_1, height = 0.75, alpha = 0.6, color = "green")
v12 = v1.hbar("item_id", right = "score_worst", source = source_1, height = 0.75, alpha = 0.6, color = "red")

v1.add_tools(HoverTool(renderers = [v11], tooltips = tooltips_1))
v1.add_tools(HoverTool(renderers = [v12], tooltips = tooltips_2))

v1.xaxis.axis_label = "WRMSSE Score"
v1.yaxis.axis_label = "Item"


def get_item_plot(df, item_id):
    """
    Plots the actual and predicted values of item_id
    """
    actual_dates = ["d_" + str(x) for x in range(1914, 1942)]
    predicted_dates = ["pred_d_" + str(x) for x in range(1914, 1942)]
    
    source = ColumnDataSource(data = dict(
        date_number = actual_dates,
        date = LB_DATES,
        weekday = LB_WEEKDAYS,
        actual = df.loc[df.item_id == item_id, actual_dates].values[0],
        predicted = df.loc[df.item_id == item_id, predicted_dates].values[0]
    ))
    
    tooltips = [
        ("Date", "@date"),
        ("Weekday", "@weekday"),
        ("Actual", "@actual{0}"),
        ("Predicted", "@predicted{0.0}")
    ]
    
    v = figure(plot_width = 700, plot_height = 400, x_range = actual_dates, tooltips = tooltips, title = f"item_id = {item_id}")
    v.line("date_number", "actual", source = source, color = "steelblue", alpha = 0.6, width = 3, legend_label = "Actual")
    v.line("date_number", "predicted", source = source, color = "coral", alpha = 0.6, width = 3, legend_label = "Predicted")

    v.xaxis.major_label_orientation = pi / 4

    return v

## actual vs predicted plots for the ten worst items, worst first
worst_ids = df_item_worst.item_id.values[::-1]
worst_plots = [get_item_plot(df_item, item_id) for item_id in worst_ids]


show(column(v1, *worst_plots))


## 4. Store-Department
Let's look at the ten best- and ten worst-scoring store-department combinations, then plot the actual against the predicted values for the ten worst.

In [None]:
df_store_dept = evaluator.get_scores(preds_valid, 10)
df_store_dept["store_dept_id"] = df_store_dept.store_id + "-" + df_store_dept.dept_id

df_store_dept_best = df_store_dept.sort_values("score_weighted").head(10).rename(columns = {"score_weighted": "score_best"})
df_store_dept_worst = df_store_dept.sort_values("score_weighted", ascending = False).head(10).sort_values("score_weighted").rename(columns = {"score_weighted": "score_worst"})

df_store_dept_best_worst = pd.concat([df_store_dept_best, df_store_dept_worst])

source_1 = ColumnDataSource(data = dict(
    store_dept_id = df_store_dept_best_worst.store_dept_id.values,
    score_best = df_store_dept_best_worst.score_best.values,
    score_worst = df_store_dept_best_worst.score_worst.values
))

tooltips_1 = [
    ("Store-Department", "@store_dept_id"),
    ("Score", "@score_best{0.0000}")
]

tooltips_2 = [
    ("Store-Department", "@store_dept_id"),
    ("Score", "@score_worst{0.0000}")
]

v1 = figure(plot_width = 700, plot_height = 400, y_range = df_store_dept_best_worst.store_dept_id.values, title = "Best and Worst Store-Department")
v11 = v1.hbar("store_dept_id", right = "score_best", source = source_1, height = 0.75, alpha = 0.6, color = "green")
v12 = v1.hbar("store_dept_id", right = "score_worst", source = source_1, height = 0.75, alpha = 0.6, color = "red")

v1.add_tools(HoverTool(renderers = [v11], tooltips = tooltips_1))
v1.add_tools(HoverTool(renderers = [v12], tooltips = tooltips_2))

v1.xaxis.axis_label = "WRMSSE Score"
v1.yaxis.axis_label = "Store-Department"


def get_store_dept_plot(df, store_dept_id):
    """
    Plots the actual and predicted values of store-dept
    """
    actual_dates = ["d_" + str(x) for x in range(1914, 1942)]
    predicted_dates = ["pred_d_" + str(x) for x in range(1914, 1942)]
    
    source = ColumnDataSource(data = dict(
        date_number = actual_dates,
        date = LB_DATES,
        weekday = LB_WEEKDAYS,
        actual = df.loc[df.store_dept_id == store_dept_id, actual_dates].values[0],
        predicted = df.loc[df.store_dept_id == store_dept_id, predicted_dates].values[0]
    ))
    
    tooltips = [
        ("Date", "@date"),
        ("Weekday", "@weekday"),
        ("Actual", "@actual{0}"),
        ("Predicted", "@predicted{0.0}")
    ]
    
    v = figure(plot_width = 700, plot_height = 400, x_range = actual_dates, tooltips = tooltips, title = f"store-dept = {store_dept_id}")
    v.line("date_number", "actual", source = source, color = "steelblue", alpha = 0.6, width = 3, legend_label = "Actual")
    v.line("date_number", "predicted", source = source, color = "coral", alpha = 0.6, width = 3, legend_label = "Predicted")

    v.xaxis.major_label_orientation = pi / 4

    return v

## actual vs predicted plots for the ten worst store-departments, worst first
worst_ids = df_store_dept_worst.store_dept_id.values[::-1]
worst_plots = [get_store_dept_plot(df_store_dept, store_dept_id) for store_dept_id in worst_ids]


show(column(v1, *worst_plots))


## 5. Store-Category
Let's look at the ten best- and ten worst-scoring store-category combinations, then plot the actual against the predicted values for the ten worst.

In [None]:
df_store_cat = evaluator.get_scores(preds_valid, 9)
df_store_cat["store_cat_id"] = df_store_cat.store_id + "-" + df_store_cat.cat_id

df_store_cat_best = df_store_cat.sort_values("score_weighted").head(10).rename(columns = {"score_weighted": "score_best"})
df_store_cat_worst = df_store_cat.sort_values("score_weighted", ascending = False).head(10).sort_values("score_weighted").rename(columns = {"score_weighted": "score_worst"})

df_store_cat_best_worst = pd.concat([df_store_cat_best, df_store_cat_worst])

source_1 = ColumnDataSource(data = dict(
    store_cat_id = df_store_cat_best_worst.store_cat_id.values,
    score_best = df_store_cat_best_worst.score_best.values,
    score_worst = df_store_cat_best_worst.score_worst.values
))

tooltips_1 = [
    ("Store-Category", "@store_cat_id"),
    ("Score", "@score_best{0.0000}")
]

tooltips_2 = [
    ("Store-Category", "@store_cat_id"),
    ("Score", "@score_worst{0.0000}")
]

v1 = figure(plot_width = 700, plot_height = 400, y_range = df_store_cat_best_worst.store_cat_id.values, title = "Best and Worst Store-Category")
v11 = v1.hbar("store_cat_id", right = "score_best", source = source_1, height = 0.75, alpha = 0.6, color = "green")
v12 = v1.hbar("store_cat_id", right = "score_worst", source = source_1, height = 0.75, alpha = 0.6, color = "red")

v1.add_tools(HoverTool(renderers = [v11], tooltips = tooltips_1))
v1.add_tools(HoverTool(renderers = [v12], tooltips = tooltips_2))

v1.xaxis.axis_label = "WRMSSE Score"
v1.yaxis.axis_label = "Store-Category"


def get_store_cat_plot(df, store_cat_id):
    """
    Plots the actual and predicted values of store-cat
    """
    actual_dates = ["d_" + str(x) for x in range(1914, 1942)]
    predicted_dates = ["pred_d_" + str(x) for x in range(1914, 1942)]
    
    source = ColumnDataSource(data = dict(
        date_number = actual_dates,
        date = LB_DATES,
        weekday = LB_WEEKDAYS,
        actual = df.loc[df.store_cat_id == store_cat_id, actual_dates].values[0],
        predicted = df.loc[df.store_cat_id == store_cat_id, predicted_dates].values[0]
    ))
    
    tooltips = [
        ("Date", "@date"),
        ("Weekday", "@weekday"),
        ("Actual", "@actual{0}"),
        ("Predicted", "@predicted{0.0}")
    ]
    
    v = figure(plot_width = 700, plot_height = 400, x_range = actual_dates, tooltips = tooltips, title = f"store-cat = {store_cat_id}")
    v.line("date_number", "actual", source = source, color = "steelblue", alpha = 0.6, width = 3, legend_label = "Actual")
    v.line("date_number", "predicted", source = source, color = "coral", alpha = 0.6, width = 3, legend_label = "Predicted")

    v.xaxis.major_label_orientation = pi / 4

    return v

## actual vs predicted plots for the ten worst store-categories, worst first
worst_ids = df_store_cat_worst.store_cat_id.values[::-1]
worst_plots = [get_store_cat_plot(df_store_cat, store_cat_id) for store_cat_id in worst_ids]


show(column(v1, *worst_plots))

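The bar charts in these sections are labelled with the WRMSSE score. As a reminder of what each bar is built from, here is a minimal sketch of RMSSE for a single series. It assumes that `score_weighted` is the series' RMSSE multiplied by its sales-based weight, as in the M5 metric definition. The `rmsse` function, the toy numbers and the weight are illustrative and are not taken from the `evaluator` used in this notebook.

```python
import numpy as np


def rmsse(train, actual, predicted):
    """RMSSE for one series: forecast MSE scaled by the MSE of the
    one-step naive forecast over the training history."""
    train = np.asarray(train, dtype=float)
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)

    # M5 starts the history at the first non-zero sale
    nonzero = np.flatnonzero(train)
    if nonzero.size:
        train = train[nonzero[0]:]

    # Mean squared day-to-day change; zero or undefined for constant or
    # single-point histories, which this sketch does not guard against
    scale = np.mean(np.diff(train) ** 2)
    return float(np.sqrt(np.mean((actual - predicted) ** 2) / scale))


# Toy numbers, not M5 data
train = [0, 0, 2, 4, 2, 4]
score = rmsse(train, actual=[3, 3], predicted=[2, 4])
weighted = 0.25 * score  # 0.25 is a made-up weight
```

Because the error is scaled by the series' own historical volatility, a score well below 1 means the forecast beats a naive last-value forecast on that series, which is worth keeping in mind when comparing bars across very different aggregation levels.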

## 6. State-Department
Let's look at the 10 best and 10 worst state-department combinations by weighted score.

In [None]:
df_state_dept = evaluator.get_scores(preds_valid, 8)
df_state_dept["state_dept_id"] = df_state_dept.state_id + "-" + df_state_dept.dept_id

df_state_dept_best = df_state_dept.sort_values("score_weighted").head(10).rename(columns = {"score_weighted": "score_best"})
df_state_dept_worst = df_state_dept.sort_values("score_weighted", ascending = False).head(10).sort_values("score_weighted").rename(columns = {"score_weighted": "score_worst"})

df_state_dept_best_worst = pd.concat([df_state_dept_best, df_state_dept_worst])

source_1 = ColumnDataSource(data = dict(
    state_dept_id = df_state_dept_best_worst.state_dept_id.values,
    score_best = df_state_dept_best_worst.score_best.values,
    score_worst = df_state_dept_best_worst.score_worst.values
))

tooltips_1 = [
    ("state-dept", "@state_dept_id"),
    ("Score", "@score_best{0.0000}")
]

tooltips_2 = [
    ("state-dept", "@state_dept_id"),
    ("Score", "@score_worst{0.0000}")
]

v1 = figure(plot_width = 700, plot_height = 400, y_range = df_state_dept_best_worst.state_dept_id.values, title = "Best and Worst state-dept")
v11 = v1.hbar("state_dept_id", right = "score_best", source = source_1, height = 0.75, alpha = 0.6, color = "green")
v12 = v1.hbar("state_dept_id", right = "score_worst", source = source_1, height = 0.75, alpha = 0.6, color = "red")

v1.add_tools(HoverTool(renderers = [v11], tooltips = tooltips_1))
v1.add_tools(HoverTool(renderers = [v12], tooltips = tooltips_2))

v1.xaxis.axis_label = "WRMSSE Score"
v1.yaxis.axis_label = "State-Department"


def get_state_dept_plot(df, state_dept_id):
    """
    Plots the actual and predicted values of state-dept
    """
    actual_dates = ["d_" + str(x) for x in range(1914, 1942)]
    predicted_dates = ["pred_d_" + str(x) for x in range(1914, 1942)]
    
    source = ColumnDataSource(data = dict(
        date_number = actual_dates,
        date = LB_DATES,
        weekday = LB_WEEKDAYS,
        actual = df.loc[df.state_dept_id == state_dept_id, actual_dates].values[0],
        predicted = df.loc[df.state_dept_id == state_dept_id, predicted_dates].values[0]
    ))
    
    tooltips = [
        ("Date", "@date"),
        ("Weekday", "@weekday"),
        ("Actual", "@actual{0}"),
        ("Predicted", "@predicted{0.0}")
    ]
    
    v = figure(plot_width = 700, plot_height = 400, x_range = actual_dates, tooltips = tooltips, title = f"state-dept = {state_dept_id}")
    v.line("date_number", "actual", source = source, color = "steelblue", alpha = 0.6, width = 3, legend_label = "Actual")
    v.line("date_number", "predicted", source = source, color = "coral", alpha = 0.6, width = 3, legend_label = "Predicted")

    v.xaxis.major_label_orientation = pi / 4

    return v

# Time-series plots for the 10 worst state-dept combinations, worst first
worst_state_dept_ids = df_state_dept_worst.state_dept_id.values[::-1][:10]
worst_state_dept_plots = [get_state_dept_plot(df_state_dept, x) for x in worst_state_dept_ids]


show(column(v1, *worst_state_dept_plots))

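Every section builds its bar chart from the same best/worst frame. The sketch below replays that pattern on a made-up four-row frame to show why a single `ColumnDataSource` can feed both the green and the red bars: after the rename and `pd.concat`, each row has a value in exactly one of `score_best` and `score_worst`, and the other is `NaN`. The store names and scores are invented for illustration.

```python
import pandas as pd

scores = pd.DataFrame({
    "store_id": ["S1", "S2", "S3", "S4"],
    "score_weighted": [0.40, 0.10, 0.30, 0.20],  # made-up values
})

# Lowest scores first: these rows keep only a "score_best" column
best = (scores.sort_values("score_weighted").head(2)
        .rename(columns={"score_weighted": "score_best"}))

# Highest scores, re-sorted ascending so the worst row ends up last
worst = (scores.sort_values("score_weighted", ascending=False).head(2)
         .sort_values("score_weighted")
         .rename(columns={"score_weighted": "score_worst"}))

# Rows S2, S4 carry score_best only; rows S3, S1 carry score_worst only
best_worst = pd.concat([best, worst])
print(best_worst)
```

The row order of the concatenated frame is ascending by score, which is why `.values[-1]` picks the worst entry in the plotting calls.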

## 7. Store
Let's look at all 10 stores.

In [None]:
df_store = evaluator.get_scores(preds_valid, 5)

df_store_best = df_store.sort_values("score_weighted").head(5).rename(columns = {"score_weighted": "score_best"})
df_store_worst = df_store.sort_values("score_weighted", ascending = False).head(5).sort_values("score_weighted").rename(columns = {"score_weighted": "score_worst"})

df_store_best_worst = pd.concat([df_store_best, df_store_worst])

source_1 = ColumnDataSource(data = dict(
    store_id = df_store_best_worst.store_id.values,
    score_best = df_store_best_worst.score_best.values,
    score_worst = df_store_best_worst.score_worst.values
))

tooltips_1 = [
    ("store", "@store_id"),
    ("Score", "@score_best{0.0000}")
]

tooltips_2 = [
    ("store", "@store_id"),
    ("Score", "@score_worst{0.0000}")
]

v1 = figure(plot_width = 700, plot_height = 400, y_range = df_store_best_worst.store_id.values, title = "Best and Worst stores")
v11 = v1.hbar("store_id", right = "score_best", source = source_1, height = 0.75, alpha = 0.6, color = "green")
v12 = v1.hbar("store_id", right = "score_worst", source = source_1, height = 0.75, alpha = 0.6, color = "red")

v1.add_tools(HoverTool(renderers = [v11], tooltips = tooltips_1))
v1.add_tools(HoverTool(renderers = [v12], tooltips = tooltips_2))

v1.xaxis.axis_label = "WRMSSE Score"
v1.yaxis.axis_label = "Store"


def get_store_plot(df, store_id):
    """
    Plots the actual and predicted values of store_id
    """
    actual_dates = ["d_" + str(x) for x in range(1914, 1942)]
    predicted_dates = ["pred_d_" + str(x) for x in range(1914, 1942)]
    
    source = ColumnDataSource(data = dict(
        date_number = actual_dates,
        date = LB_DATES,
        weekday = LB_WEEKDAYS,
        actual = df.loc[df.store_id == store_id, actual_dates].values[0],
        predicted = df.loc[df.store_id == store_id, predicted_dates].values[0]
    ))
    
    tooltips = [
        ("Date", "@date"),
        ("Weekday", "@weekday"),
        ("Actual", "@actual{0}"),
        ("Predicted", "@predicted{0.0}")
    ]
    
    v = figure(plot_width = 700, plot_height = 400, x_range = actual_dates, tooltips = tooltips, title = f"store_id = {store_id}")
    v.line("date_number", "actual", source = source, color = "steelblue", alpha = 0.6, width = 3, legend_label = "Actual")
    v.line("date_number", "predicted", source = source, color = "coral", alpha = 0.6, width = 3, legend_label = "Predicted")

    v.xaxis.major_label_orientation = pi / 4

    return v

# Time-series plots for all 10 stores, from worst to best score
store_ids = df_store_best_worst.store_id.values[::-1][:10]
store_plots = [get_store_plot(df_store, x) for x in store_ids]


show(column(v1, *store_plots))

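The `get_*_plot` functions in these sections all share the same data step: pick one row of a wide frame and read its `d_*` and `pred_d_*` columns. The sketch below isolates that step as a hypothetical helper, `extract_series`, which returns a long table that is easier to analyze numerically (for example, residuals per day). The helper name, its parameters and the toy frame are assumptions for illustration and are not part of the notebook.

```python
import pandas as pd


def extract_series(df, key_col, key, first_day=1914, last_day=1941):
    """Return one row of a wide score frame as a long actual/predicted table."""
    days = range(first_day, last_day + 1)
    actual_cols = [f"d_{d}" for d in days]
    predicted_cols = [f"pred_d_{d}" for d in days]

    rows = df.loc[df[key_col] == key]
    if rows.empty:
        raise KeyError(f"{key!r} not found in column {key_col!r}")
    row = rows.iloc[0]

    return pd.DataFrame({
        "date_number": actual_cols,
        "actual": row[actual_cols].to_numpy(dtype=float),
        "predicted": row[predicted_cols].to_numpy(dtype=float),
    })


# Toy frame with a 3-day horizon instead of d_1914..d_1941
toy = pd.DataFrame({
    "store_id": ["CA_1", "TX_1"],
    "d_1": [10, 5], "d_2": [12, 6], "d_3": [11, 7],
    "pred_d_1": [9.5, 5.5], "pred_d_2": [12.5, 6.0], "pred_d_3": [10.0, 8.0],
})
long_df = extract_series(toy, "store_id", "TX_1", first_day=1, last_day=3)

# Positive error means the model over-forecast that day
long_df["error"] = long_df["predicted"] - long_df["actual"]
```

Unlike the `.values[0]` lookup, this version raises a `KeyError` with the offending key when no row matches, which makes a mistyped id easier to spot.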

## 8. Department
Let's look at all 7 departments.

In [None]:
df_dept = evaluator.get_scores(preds_valid, 4)

df_dept_best = df_dept.sort_values("score_weighted").head(3).rename(columns = {"score_weighted": "score_best"})
df_dept_worst = df_dept.sort_values("score_weighted", ascending = False).head(4).sort_values("score_weighted").rename(columns = {"score_weighted": "score_worst"})

df_dept_best_worst = pd.concat([df_dept_best, df_dept_worst])

source_1 = ColumnDataSource(data = dict(
    dept_id = df_dept_best_worst.dept_id.values,
    score_best = df_dept_best_worst.score_best.values,
    score_worst = df_dept_best_worst.score_worst.values
))

tooltips_1 = [
    ("dept", "@dept_id"),
    ("Score", "@score_best{0.0000}")
]

tooltips_2 = [
    ("dept", "@dept_id"),
    ("Score", "@score_worst{0.0000}")
]

v1 = figure(plot_width = 700, plot_height = 400, y_range = df_dept_best_worst.dept_id.values, title = "Best and Worst depts")
v11 = v1.hbar("dept_id", right = "score_best", source = source_1, height = 0.75, alpha = 0.6, color = "green")
v12 = v1.hbar("dept_id", right = "score_worst", source = source_1, height = 0.75, alpha = 0.6, color = "red")

v1.add_tools(HoverTool(renderers = [v11], tooltips = tooltips_1))
v1.add_tools(HoverTool(renderers = [v12], tooltips = tooltips_2))

v1.xaxis.axis_label = "WRMSSE Score"
v1.yaxis.axis_label = "Department"


def get_dept_plot(df, dept_id):
    """
    Plots the actual and predicted values of dept_id
    """
    actual_dates = ["d_" + str(x) for x in range(1914, 1942)]
    predicted_dates = ["pred_d_" + str(x) for x in range(1914, 1942)]
    
    source = ColumnDataSource(data = dict(
        date_number = actual_dates,
        date = LB_DATES,
        weekday = LB_WEEKDAYS,
        actual = df.loc[df.dept_id == dept_id, actual_dates].values[0],
        predicted = df.loc[df.dept_id == dept_id, predicted_dates].values[0]
    ))

    tooltips = [
        ("Date", "@date"),
        ("Weekday", "@weekday"),
        ("Actual", "@actual{0}"),
        ("Predicted", "@predicted{0.0}")
    ]

    v = figure(plot_width = 700, plot_height = 400, x_range = actual_dates, tooltips = tooltips, title = f"dept_id = {dept_id}")
    v.line("date_number", "actual", source = source, color = "steelblue", alpha = 0.6, width = 3, legend_label = "Actual")
    v.line("date_number", "predicted", source = source, color = "coral", alpha = 0.6, width = 3, legend_label = "Predicted")

    v.xaxis.major_label_orientation = pi / 4

    return v

# Time-series plots for all 7 departments, from worst to best score
dept_ids = df_dept_best_worst.dept_id.values[::-1][:7]
dept_plots = [get_dept_plot(df_dept, x) for x in dept_ids]


show(column(v1, *dept_plots))


## 9. State-Category
Let's look at all 9 state-category combinations.

In [None]:
df_state_cat = evaluator.get_scores(preds_valid, 7)
df_state_cat["state_cat_id"] = df_state_cat.state_id + "-" + df_state_cat.cat_id

df_state_cat_best = df_state_cat.sort_values("score_weighted").head(4).rename(columns = {"score_weighted": "score_best"})
df_state_cat_worst = df_state_cat.sort_values("score_weighted", ascending = False).head(5).sort_values("score_weighted").rename(columns = {"score_weighted": "score_worst"})

df_state_cat_best_worst = pd.concat([df_state_cat_best, df_state_cat_worst])

source_1 = ColumnDataSource(data = dict(
    state_cat_id = df_state_cat_best_worst.state_cat_id.values,
    score_best = df_state_cat_best_worst.score_best.values,
    score_worst = df_state_cat_best_worst.score_worst.values
))

tooltips_1 = [
    ("state-cat", "@state_cat_id"),
    ("Score", "@score_best{0.0000}")
]

tooltips_2 = [
    ("state-cat", "@state_cat_id"),
    ("Score", "@score_worst{0.0000}")
]

v1 = figure(plot_width = 700, plot_height = 400, y_range = df_state_cat_best_worst.state_cat_id.values, title = "Best and Worst state-cat")
v11 = v1.hbar("state_cat_id", right = "score_best", source = source_1, height = 0.75, alpha = 0.6, color = "green")
v12 = v1.hbar("state_cat_id", right = "score_worst", source = source_1, height = 0.75, alpha = 0.6, color = "red")

v1.add_tools(HoverTool(renderers = [v11], tooltips = tooltips_1))
v1.add_tools(HoverTool(renderers = [v12], tooltips = tooltips_2))

v1.xaxis.axis_label = "WRMSSE Score"
v1.yaxis.axis_label = "State-Category"


def get_state_cat_plot(df, state_cat_id):
    """
    Plots the actual and predicted values of state-cat
    """
    actual_dates = ["d_" + str(x) for x in range(1914, 1942)]
    predicted_dates = ["pred_d_" + str(x) for x in range(1914, 1942)]
    
    source = ColumnDataSource(data = dict(
        date_number = actual_dates,
        date = LB_DATES,
        weekday = LB_WEEKDAYS,
        actual = df.loc[df.state_cat_id == state_cat_id, actual_dates].values[0],
        predicted = df.loc[df.state_cat_id == state_cat_id, predicted_dates].values[0]
    ))
    
    tooltips = [
        ("Date", "@date"),
        ("Weekday", "@weekday"),
        ("Actual", "@actual{0}"),
        ("Predicted", "@predicted{0.0}")
    ]
    
    v = figure(plot_width = 700, plot_height = 400, x_range = actual_dates, tooltips = tooltips, title = f"state-cat = {state_cat_id}")
    v.line("date_number", "actual", source = source, color = "steelblue", alpha = 0.6, width = 3, legend_label = "Actual")
    v.line("date_number", "predicted", source = source, color = "coral", alpha = 0.6, width = 3, legend_label = "Predicted")

    v.xaxis.major_label_orientation = pi / 4

    return v

# Time-series plots for all 9 state-cat combinations, from worst to best score
state_cat_ids = df_state_cat_best_worst.state_cat_id.values[::-1][:9]
state_cat_plots = [get_state_cat_plot(df_state_cat, x) for x in state_cat_ids]


show(column(v1, *state_cat_plots))


## 10. State
Let's look at all 3 states.

In [None]:
df_state = evaluator.get_scores(preds_valid, 3)

df_state_best = df_state.sort_values("score_weighted").head(1).rename(columns = {"score_weighted": "score_best"})
df_state_worst = df_state.sort_values("score_weighted", ascending = False).head(2).sort_values("score_weighted").rename(columns = {"score_weighted": "score_worst"})

df_state_best_worst = pd.concat([df_state_best, df_state_worst])

source_1 = ColumnDataSource(data = dict(
    state_id = df_state_best_worst.state_id.values,
    score_best = df_state_best_worst.score_best.values,
    score_worst = df_state_best_worst.score_worst.values
))

tooltips_1 = [
    ("state", "@state_id"),
    ("Score", "@score_best{0.0000}")
]

tooltips_2 = [
    ("state", "@state_id"),
    ("Score", "@score_worst{0.0000}")
]

v1 = figure(plot_width = 700, plot_height = 400, y_range = df_state_best_worst.state_id.values, title = "Best and Worst states")
v11 = v1.hbar("state_id", right = "score_best", source = source_1, height = 0.75, alpha = 0.6, color = "green")
v12 = v1.hbar("state_id", right = "score_worst", source = source_1, height = 0.75, alpha = 0.6, color = "red")

v1.add_tools(HoverTool(renderers = [v11], tooltips = tooltips_1))
v1.add_tools(HoverTool(renderers = [v12], tooltips = tooltips_2))

v1.xaxis.axis_label = "WRMSSE Score"
v1.yaxis.axis_label = "State"


def get_state_plot(df, state_id):
    """
    Plots the actual and predicted values of state_id
    """
    actual_dates = ["d_" + str(x) for x in range(1914, 1942)]
    predicted_dates = ["pred_d_" + str(x) for x in range(1914, 1942)]
    
    source = ColumnDataSource(data = dict(
        date_number = actual_dates,
        date = LB_DATES,
        weekday = LB_WEEKDAYS,
        actual = df.loc[df.state_id == state_id, actual_dates].values[0],
        predicted = df.loc[df.state_id == state_id, predicted_dates].values[0]
    ))

    tooltips = [
        ("Date", "@date"),
        ("Weekday", "@weekday"),
        ("Actual", "@actual{0}"),
        ("Predicted", "@predicted{0.0}")
    ]
    
    v = figure(plot_width = 700, plot_height = 400, x_range = actual_dates, tooltips = tooltips, title = f"state_id = {state_id}")
    v.line("date_number", "actual", source = source, color = "steelblue", alpha = 0.6, width = 3, legend_label = "Actual")
    v.line("date_number", "predicted", source = source, color = "coral", alpha = 0.6, width = 3, legend_label = "Predicted")

    v.xaxis.major_label_orientation = pi / 4

    return v

v2 = get_state_plot(df_state, df_state_best_worst.state_id.values[-1])
v3 = get_state_plot(df_state, df_state_best_worst.state_id.values[-2])
v4 = get_state_plot(df_state, df_state_best_worst.state_id.values[-3])


show(column(v1, v2, v3, v4))


## 11. Category
Let's look at all 3 categories.

In [None]:
df_cat = evaluator.get_scores(preds_valid, 2)

df_cat_best = df_cat.sort_values("score_weighted").head(1).rename(columns = {"score_weighted": "score_best"})
df_cat_worst = df_cat.sort_values("score_weighted", ascending = False).head(2).sort_values("score_weighted").rename(columns = {"score_weighted": "score_worst"})

df_cat_best_worst = pd.concat([df_cat_best, df_cat_worst])

source_1 = ColumnDataSource(data = dict(
    cat_id = df_cat_best_worst.cat_id.values,
    score_best = df_cat_best_worst.score_best.values,
    score_worst = df_cat_best_worst.score_worst.values
))

tooltips_1 = [
    ("cat", "@cat_id"),
    ("Score", "@score_best{0.0000}")
]

tooltips_2 = [
    ("cat", "@cat_id"),
    ("Score", "@score_worst{0.0000}")
]

v1 = figure(plot_width = 700, plot_height = 400, y_range = df_cat_best_worst.cat_id.values, title = "Best and Worst cats")
v11 = v1.hbar("cat_id", right = "score_best", source = source_1, height = 0.75, alpha = 0.6, color = "green")
v12 = v1.hbar("cat_id", right = "score_worst", source = source_1, height = 0.75, alpha = 0.6, color = "red")

v1.add_tools(HoverTool(renderers = [v11], tooltips = tooltips_1))
v1.add_tools(HoverTool(renderers = [v12], tooltips = tooltips_2))

v1.xaxis.axis_label = "WRMSSE Score"
v1.yaxis.axis_label = "Category"


def get_cat_plot(df, cat_id):
    """
    Plots the actual and predicted values of cat_id
    """
    actual_dates = ["d_" + str(x) for x in range(1914, 1942)]
    predicted_dates = ["pred_d_" + str(x) for x in range(1914, 1942)]
    
    source = ColumnDataSource(data = dict(
        date_number = actual_dates,
        date = LB_DATES,
        weekday = LB_WEEKDAYS,
        actual = df.loc[df.cat_id == cat_id, actual_dates].values[0],
        predicted = df.loc[df.cat_id == cat_id, predicted_dates].values[0]
    ))
    
    tooltips = [
        ("Date", "@date"),
        ("Weekday", "@weekday"),
        ("Actual", "@actual{0}"),
        ("Predicted", "@predicted{0.0}")
    ]

    v = figure(plot_width = 700, plot_height = 400, x_range = actual_dates, tooltips = tooltips, title = f"cat_id = {cat_id}")
    v.line("date_number", "actual", source = source, color = "steelblue", alpha = 0.6, width = 3, legend_label = "Actual")
    v.line("date_number", "predicted", source = source, color = "coral", alpha = 0.6, width = 3, legend_label = "Predicted")

    v.xaxis.major_label_orientation = pi / 4

    return v

v2 = get_cat_plot(df_cat, df_cat_best_worst.cat_id.values[-1])
v3 = get_cat_plot(df_cat, df_cat_best_worst.cat_id.values[-2])
v4 = get_cat_plot(df_cat, df_cat_best_worst.cat_id.values[-3])


show(column(v1, v2, v3, v4))


## 12. All
Let's look at all levels aggregated together.

In [None]:
df = evaluator.get_scores(preds_valid, 1)

source_1 = ColumnDataSource(data = dict(
    level = ["All"],
    score = df.score_weighted.values
))

tooltips_1 = [
    ("Level", "all"),
    ("Score", "@score{0.0000}")
]

v1 = figure(plot_width = 700, plot_height = 200, y_range = ["All"], tooltips = tooltips_1, title = "All levels")
v1.hbar("level", right = "score", source = source_1, height = 0.5, alpha = 0.6, color = "red")

v1.xaxis.axis_label = "WRMSSE Score"

actual_dates = ["d_" + str(x) for x in range(1914, 1942)]
predicted_dates = ["pred_d_" + str(x) for x in range(1914, 1942)]

source_2 = ColumnDataSource(data = dict(
    date_number = actual_dates,
    date = LB_DATES,
    weekday = LB_WEEKDAYS,
    actual = df[actual_dates].values[0],
    predicted = df[predicted_dates].values[0]
))
    
tooltips = [
    ("Date", "@date"),
    ("Weekday", "@weekday"),
    ("Actual", "@actual{0}"),
    ("Predicted", "@predicted{0.0}")
]
    
    
v2 = figure(plot_width = 700, plot_height = 400, x_range = actual_dates, tooltips = tooltips, title = "All levels")
v2.line("date_number", "actual", source = source_2, color = "steelblue", alpha = 0.6, width = 3, legend_label = "Actual")
v2.line("date_number", "predicted", source = source_2, color = "coral", alpha = 0.6, width = 3, legend_label = "Predicted")

v2.xaxis.major_label_orientation = pi / 4

show(column(v1, v2))

## Anatomy to Action
Summarizing and plotting these graphs is only the first step. The goal is to extract insights from them and convert those insights into code and actions that improve the model's performance on the private test data, rather than merely overfitting the public LB.

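As one possible example of turning a plot into an action, the sketch below measures how far predictions are off per weekday, which is the kind of pattern the hover tooltips make visible. The function name `weekday_ratio` and all numbers are invented for illustration. Any factor estimated this way rests on only 28 public LB days, so it should be treated as a diagnostic first and a correction only with caution.

```python
import pandas as pd


def weekday_ratio(actual, predicted, dates):
    """Ratio of total actual to total predicted sales per weekday.
    Values above 1 mean the model under-forecast that weekday."""
    frame = pd.DataFrame({
        "weekday": pd.to_datetime(dates).day_name(),
        "actual": actual,
        "predicted": predicted,
    })
    totals = frame.groupby("weekday")[["actual", "predicted"]].sum()
    return totals["actual"] / totals["predicted"]


# Toy two-week example starting on Monday 2016-04-25; numbers are invented
dates = pd.date_range("2016-04-25", periods=14).strftime("%Y-%m-%d")
actual = [100, 100, 100, 100, 120, 150, 160] * 2
predicted = [100, 100, 100, 100, 120, 125, 200] * 2

ratio = weekday_ratio(actual, predicted, dates)
print(ratio.round(2))
```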
Good Luck!
