**The validation set consists of the last 28 days of the training data, i.e. a single holdout of the last 28 days for each series.**
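
A minimal sketch of that holdout on a toy frame, assuming a long-format table with an `id` column and an integer day column `d`, like the grid used later in this notebook. The frame `toy`, its values, and the small `END_TRAIN` / `P_HORIZON` constants are illustrative only. It also shows a strict variant in which the validation rows are excluded from training.

```python
import pandas as pd

END_TRAIN = 10   # toy value; the notebook uses 1941
P_HORIZON = 3    # toy value; the notebook uses 28

# Two series observed on days 1..10 (hypothetical data).
toy = pd.DataFrame({
    "id": ["A"] * 10 + ["B"] * 10,
    "d": list(range(1, 11)) * 2,
    "sales": range(20),
})

# The last P_HORIZON days of each series form the validation set.
valid_mask = toy["d"] > (END_TRAIN - P_HORIZON)

# Variant 1: validation rows stay inside the training set
# (this mirrors the masks in the training cell of this notebook).
train_mask_overlap = toy["d"] <= END_TRAIN

# Variant 2: strict holdout, validation rows are excluded from training.
train_mask_strict = toy["d"] <= (END_TRAIN - P_HORIZON)

print(toy[valid_mask].groupby("id")["d"].agg(["min", "max", "count"]))
```

With variant 1 the validation score is optimistic because the model has already seen those rows; variant 2 gives an honest estimate at the cost of not training on the most recent days.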

Other validation techniques:

https://otexts.com/fpp2/accuracy.html

https://www.kaggle.com/ragnar123/simple-lgbm-groupkfold-cv

https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9
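
One alternative to a single holdout is to evaluate on several forecast origins, each validated on the days that immediately follow its training window. A minimal sketch, assuming integer day indices starting at 1; the helper name `rolling_origin_folds` is hypothetical and not taken from the linked material.

```python
def rolling_origin_folds(last_day, horizon, n_folds):
    """Return (train_end, valid_start, valid_end) tuples, newest fold last.

    Each fold trains on days [1, train_end] and validates on the next
    `horizon` days, so validation data is always later than training data.
    """
    folds = []
    for k in range(n_folds, 0, -1):
        valid_end = last_day - (k - 1) * horizon
        valid_start = valid_end - horizon + 1
        train_end = valid_start - 1
        folds.append((train_end, valid_start, valid_end))
    return folds


folds = rolling_origin_folds(last_day=1941, horizon=28, n_folds=3)
for train_end, valid_start, valid_end in folds:
    print(f"train: 1..{train_end}  valid: {valid_start}..{valid_end}")
```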



In [None]:
!pip install lightgbm==2.3.1

Collecting lightgbm==2.3.1
  Downloading https://files.pythonhosted.org/packages/0b/9d/ddcb2f43aca194987f1a99e27edf41cf9bc39ea750c3371c2a62698c509a/lightgbm-2.3.1-py2.py3-none-manylinux1_x86_64.whl (1.2MB)
Installing collected packages: lightgbm
  Found existing installation: lightgbm 2.2.3
    Uninstalling lightgbm-2.2.3:
      Successfully uninstalled lightgbm-2.2.3
Successfully installed lightgbm-2.3.1


In [None]:
from google.colab import drive
drive.mount('/content/gdrive')
%cd /content/gdrive/My Drive/M5-Evaluation

Mounted at /content/gdrive
/content/gdrive/.shortcut-targets-by-id/1IRMYDLHp5HGU8lqmV_ly3a9orQWGxVGB/M5-Evaluation


In [None]:
# General imports
import numpy as np
import pandas as pd
import os, sys, gc, time, warnings, pickle, psutil, random

# custom imports
from multiprocessing import Pool        # Multiprocess Runs

warnings.filterwarnings('ignore')

In [None]:
########################### Helpers
#################################################################################
## Seeder
# :seed to make all processes deterministic     # type: int
def seed_everything(seed=0):
    random.seed(seed)
    np.random.seed(seed)

## Multiprocess Runs
def df_parallelize_run(func, t_split):
    num_cores = np.min([N_CORES,len(t_split)])
    pool = Pool(num_cores)
    df = pd.concat(pool.map(func, t_split), axis=1)
    pool.close()
    pool.join()
    return df

In [None]:
########################### Helper to load data by store ID
#################################################################################
# Read data
def get_data_by_store(store):
    
  # Read and concat basic features
  df = pd.concat([pd.read_pickle(BASE),
                  pd.read_pickle(PRICE).iloc[:,2:],
                  pd.read_pickle(CALENDAR).iloc[:,2:]],
                  axis=1)
  
  # Leave only relevant store
  df = df[df['store_id']==store]

  # With memory limits we have to read 
  # lags and mean encoding features
  # separately and drop items that we don't need.
  # As our Features Grids are aligned 
  # we can use index to keep only necessary rows
  # Alignment is good for us as concat uses less memory than merge.
  df2 = pd.read_pickle(MEAN_ENC)[mean_features]
  df2 = df2[df2.index.isin(df.index)]
  
  df3 = pd.read_pickle(LAGS).iloc[:,3:]
  df3 = df3[df3.index.isin(df.index)]
  
  df = pd.concat([df, df2], axis=1)
  del df2 # to not reach memory limit 
  
  df = pd.concat([df, df3], axis=1)
  del df3 # to not reach memory limit 
  
  # Create features list
  features = [col for col in list(df) if col not in remove_features]
  df = df[['id','d',TARGET]+features]
  
  # Skipping first n rows
  df = df[df['d']>=START_TRAIN].reset_index(drop=True)
  
  return df, features

# Recombine Test set after training
def get_base_test():
  base_test = pd.DataFrame()

  for store_id in STORES_IDS:
      temp_df = pd.read_pickle('test_'+store_id+'.pkl')
      temp_df['store_id'] = store_id
      base_test = pd.concat([base_test, temp_df]).reset_index(drop=True)
  
  return base_test


########################### Helper to make dynamic rolling lags
#################################################################################
def make_lag(LAG_DAY):
  lag_df = base_test[['id','d',TARGET]]
  col_name = 'sales_lag_'+str(LAG_DAY)
  lag_df[col_name] = lag_df.groupby(['id'])[TARGET].transform(lambda x: x.shift(LAG_DAY)).astype(np.float16)
  return lag_df[[col_name]]


def make_lag_roll(LAG_DAY):
  shift_day = LAG_DAY[0]
  roll_wind = LAG_DAY[1]
  lag_df = base_test[['id','d',TARGET]]
  col_name = 'rolling_mean_tmp_'+str(shift_day)+'_'+str(roll_wind)
  lag_df[col_name] = lag_df.groupby(['id'])[TARGET].transform(lambda x: x.shift(shift_day).rolling(roll_wind).mean())
  return lag_df[[col_name]]
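
The two helpers above rely on a per-series `shift` followed by a `rolling` mean, so a feature for a given day only uses sales from at least `shift_day` days earlier. A small sketch on hypothetical toy data (the frame `toy` and its values are made up for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "id": ["A"] * 6 + ["B"] * 6,
    "d": list(range(1, 7)) * 2,
    "sales": [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60],
})

shift_day, roll_wind = 1, 3

# Shift within each series first, then average the shifted values,
# so the feature never includes the current day's sales.
toy["rolling_mean_tmp_1_3"] = (
    toy.groupby("id")["sales"]
       .transform(lambda x: x.shift(shift_day).rolling(roll_wind).mean())
)

print(toy)
```

The first `shift_day + roll_wind - 1` rows of every series are NaN, and the group-wise transform keeps values from one `id` from leaking into another.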

In [None]:
########################### Model params
#################################################################################
import lightgbm as lgb
lgb_params = {
                    'boosting_type': 'gbdt',
                    'objective': 'tweedie',
                    'tweedie_variance_power': 1.1,
                    'metric': 'rmse',
                    'subsample': 0.5,
                    'subsample_freq': 1,
                    'learning_rate': 0.03,
                    'num_leaves': 2**11-1,
                    'min_data_in_leaf': 2**12-1,
                    'feature_fraction': 0.5,
                    'max_bin': 100,
                    'n_estimators': 1400,
                    'boost_from_average': False,
                    'verbose': -1,
                } 

# Let's take a closer look at the params

## 'boosting_type': 'gbdt'
# we have 'goss' option for faster training
# but it normally leads to underfit.
# Also there is good 'dart' mode
# but it takes forever to train
# and model performance depends 
# a lot on random factor 
# https://www.kaggle.com/c/home-credit-default-risk/discussion/60921

## 'objective': 'tweedie'
# Tweedie Gradient Boosting for Extremely
# Unbalanced Zero-inflated Data
# https://arxiv.org/pdf/1811.10192.pdf
# and many more articles about Tweedie
#
# Strange (for me) but Tweedie is close in results
# to my own ugly loss.
# My advice here - make OWN LOSS function
# https://www.kaggle.com/c/m5-forecasting-accuracy/discussion/140564
# https://www.kaggle.com/c/m5-forecasting-accuracy/discussion/143070
# I think many of you already using it (after poisson kernel appeared) 
# (kagglers are very good with "params" testing and tuning).
# Try to figure out why Tweedie works.
# probably it will show you new features options
# or data transformation (Target transformation?).

## 'tweedie_variance_power': 1.1
# default = 1.5
# set this closer to 2 to shift towards a Gamma distribution
# set this closer to 1 to shift towards a Poisson distribution
# my CV shows 1.1 is optimal 
# but you can make your own choice

## 'metric': 'rmse'
# Doesn't mean anything to us
# as competition metric is different
# and we don't use early stopping here.
# So rmse serves just for general 
# model performance overview.
# Also we use "fake" validation set
# (as it makes part of the training set)
# so even general rmse score doesn't mean anything))
# https://www.kaggle.com/c/m5-forecasting-accuracy/discussion/133834

## 'subsample': 0.5
# Serves to fight with overfit
# this will randomly select part of data without resampling
# Chosen by CV (my CV can be wrong!)
# Next kernel will be about CV

##'subsample_freq': 1
# frequency for bagging
# default value - seems ok

## 'learning_rate': 0.03
# Chosen by CV
# Smaller - longer training
# but there is a chance to get stuck
# in a "local minimum"
# Bigger - faster training
# but there is a chance to
# not find the "global minimum"

## 'num_leaves': 2**11-1
## 'min_data_in_leaf': 2**12-1
# Force model to use more features
# We need it to reduce "recursive"
# error impact.
# Also it leads to overfit
# that's why we use small 
# 'max_bin': 100

## l1, l2 regularizations
# https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c
# Good tiny explanation
# l2 can work with bigger num_leaves
# but my CV doesn't show boost
                    
## 'n_estimators': 1400
# CV shows that there should be
# different values for each state/store.
# Current value was chosen 
# for general purpose.
# As we don't use any early stopping
# be careful not to overfit the Public LB.

##'feature_fraction': 0.5
# LightGBM will randomly select 
# part of features on each iteration (tree).
# We have maaaany features
# and many of them are "duplicates"
# and many just "noise"
# good values here - 0.5-0.7 (by CV)

## 'boost_from_average': False
# There is some "problem"
# to code boost_from_average for 
# custom loss
# 'True' makes training faster
# BUT use it carefully
# https://github.com/microsoft/LightGBM/issues/1514
# not our case but good to know cons
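
For readers who want to experiment with the objective or write their own loss, here is a sketch of the Tweedie negative log-likelihood with a log link for 1 < p < 2, in plain NumPy. It illustrates the formula only and is not LightGBM's source code; the function names `tweedie_loss` and `tweedie_grad` are hypothetical.

```python
import numpy as np

def tweedie_loss(y, raw_score, p=1.1):
    """Tweedie negative log-likelihood (up to a constant) for 1 < p < 2.

    `raw_score` is the model output on the log scale, so the predicted
    mean is exp(raw_score).
    """
    a = y * np.exp((1 - p) * raw_score) / (1 - p)
    b = np.exp((2 - p) * raw_score) / (2 - p)
    return -a + b

def tweedie_grad(y, raw_score, p=1.1):
    """Derivative of tweedie_loss with respect to raw_score."""
    return -y * np.exp((1 - p) * raw_score) + np.exp((2 - p) * raw_score)

y = np.array([0.0, 0.0, 1.0, 3.0])
s = np.log(np.array([0.5, 0.5, 1.0, 3.0]))

# The gradient vanishes where the predicted mean equals the target.
print(tweedie_grad(y, s))
```

For zero targets the loss only penalizes the size of the predicted mean, which is one way to see why this objective copes with many zero-sales days.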

In [None]:
########################### Vars
#################################################################################
VER = 1                          # Our model version
SEED = 42                        # We want all things
seed_everything(SEED)            # to be as deterministic 
lgb_params['seed'] = SEED        # as possible
N_CORES = psutil.cpu_count()     # Available CPU cores


#LIMITS and const
TARGET      = 'sales'            # Our target
START_TRAIN = 0                  # We can skip some rows (Nans/faster training)
END_TRAIN   = 1941               # End day of our train set
P_HORIZON   = 28                 # Prediction horizon
USE_AUX     = False               # Use or not pretrained models

#FEATURES to remove
## These features lead to overfit
## or values not present in test set
remove_features = ['id','state_id','store_id',
                   'date','wm_yr_wk','d',TARGET]
mean_features   = ['enc_cat_id_mean','enc_cat_id_std',
                   'enc_dept_id_mean','enc_dept_id_std',
                   'enc_item_id_mean','enc_item_id_std'] 

#PATHS for Features
ORIGINAL = './input/m5-forecasting-accuracy/'
BASE     = './output/KY/grid_part_1.pkl'
PRICE    = './output/KY/grid_part_2.pkl'
CALENDAR = './output/KY/grid_part_3.pkl'
LAGS     = './output/KY/lags_df_28.pkl'
MEAN_ENC = './output/KY/mean_encoding_df.pkl'

# AUX(pretrained) Models paths
AUX_MODELS = './input/m5-aux-models/'


#STORES ids
STORES_IDS = pd.read_csv(ORIGINAL+'sales_train_validation.csv')['store_id']
STORES_IDS = list(STORES_IDS.unique())


#SPLITS for lags creation
SHIFT_DAY  = 28
N_LAGS     = 15
LAGS_SPLIT = [col for col in range(SHIFT_DAY,SHIFT_DAY+N_LAGS)]
ROLS_SPLIT = []
for i in [1,7,14]:
    for j in [7,14,30,60]:
        ROLS_SPLIT.append([i,j])

In [None]:
# Get grid for current store
grid_df, features_columns = get_data_by_store('CA_1')
MODEL_FEATURES = features_columns
grid_df.head(10)

Unnamed: 0,id,d,sales,item_id,dept_id,cat_id,release,sell_price,price_max,price_min,price_std,price_mean,price_norm,price_nunique,item_nunique,price_momentum,price_momentum_m,price_momentum_y,event_name_1,event_type_1,event_name_2,event_type_2,snap_CA,snap_TX,snap_WI,tm_d,tm_w,tm_m,tm_y,tm_wm,tm_dw,tm_w_end,enc_cat_id_mean,enc_cat_id_std,enc_dept_id_mean,enc_dept_id_std,enc_item_id_mean,enc_item_id_std,sales_lag_28,sales_lag_29,sales_lag_30,sales_lag_31,sales_lag_32,sales_lag_33,sales_lag_34,sales_lag_35,sales_lag_36,sales_lag_37,sales_lag_38,sales_lag_39,sales_lag_40,sales_lag_41,sales_lag_42,rolling_mean_7,rolling_std_7,rolling_mean_14,rolling_std_14,rolling_mean_30,rolling_std_30,rolling_mean_60,rolling_std_60,rolling_mean_180,rolling_std_180,rolling_mean_tmp_1_7,rolling_mean_tmp_1_14,rolling_mean_tmp_1_30,rolling_mean_tmp_1_60,rolling_mean_tmp_7_7,rolling_mean_tmp_7_14,rolling_mean_tmp_7_30,rolling_mean_tmp_7_60,rolling_mean_tmp_14_7,rolling_mean_tmp_14_14,rolling_mean_tmp_14_30,rolling_mean_tmp_14_60
0,HOBBIES_1_008_CA_1_evaluation,1,12.0,HOBBIES_1_008,HOBBIES_1,HOBBIES,0,0.459961,0.5,0.419922,0.01976,0.476318,0.919922,4.0,16,,0.96875,0.949219,,,,,0,0,0,29,4,1,0,5,5,1,0.708984,2.259766,0.865234,2.544922,4.695312,7.183594,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,HOBBIES_1_009_CA_1_evaluation,1,2.0,HOBBIES_1_009,HOBBIES_1,HOBBIES,0,1.55957,1.769531,1.55957,0.032745,1.764648,0.881348,2.0,9,,0.885742,0.896484,,,,,0,0,0,29,4,1,0,5,5,1,0.708984,2.259766,0.865234,2.544922,0.850098,1.754883,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,HOBBIES_1_010_CA_1_evaluation,1,0.0,HOBBIES_1_010,HOBBIES_1,HOBBIES,0,3.169922,3.169922,2.970703,0.046356,2.980469,1.0,2.0,20,,1.064453,1.043945,,,,,0,0,0,29,4,1,0,5,5,1,0.708984,2.259766,0.865234,2.544922,0.611328,0.863281,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,HOBBIES_1_012_CA_1_evaluation,1,0.0,HOBBIES_1_012,HOBBIES_1,HOBBIES,0,5.980469,6.519531,5.980469,0.115967,6.46875,0.916992,3.0,71,,0.921875,0.958984,,,,,0,0,0,29,4,1,0,5,5,1,0.708984,2.259766,0.865234,2.544922,0.384766,0.692871,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,HOBBIES_1_015_CA_1_evaluation,1,4.0,HOBBIES_1_015,HOBBIES_1,HOBBIES,0,0.700195,0.720215,0.680176,0.011337,0.706543,0.972168,3.0,16,,0.990234,1.001953,,,,,0,0,0,29,4,1,0,5,5,1,0.708984,2.259766,0.865234,2.544922,4.441406,6.703125,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,HOBBIES_1_016_CA_1_evaluation,1,5.0,HOBBIES_1_016,HOBBIES_1,HOBBIES,0,0.700195,0.720215,0.680176,0.011337,0.706543,0.972168,3.0,16,,0.990234,1.001953,,,,,0,0,0,29,4,1,0,5,5,1,0.708984,2.259766,0.865234,2.544922,3.166016,5.328125,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,HOBBIES_1_022_CA_1_evaluation,1,2.0,HOBBIES_1_022,HOBBIES_1,HOBBIES,0,6.859375,7.179688,6.859375,0.08313,7.144531,0.955566,3.0,8,,0.95752,0.984863,,,,,0,0,0,29,4,1,0,5,5,1,0.708984,2.259766,0.865234,2.544922,0.384766,0.692871,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,HOBBIES_1_023_CA_1_evaluation,1,2.0,HOBBIES_1_023,HOBBIES_1,HOBBIES,0,3.439453,3.439453,2.970703,0.126221,3.404297,1.0,2.0,7,,1.011719,1.0,,,,,0,0,0,29,4,1,0,5,5,1,0.708984,2.259766,0.865234,2.544922,1.082031,1.695312,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,HOBBIES_1_028_CA_1_evaluation,1,0.0,HOBBIES_1_028,HOBBIES_1,HOBBIES,0,6.671875,7.980469,6.671875,0.446777,7.667969,0.835938,4.0,7,,0.86084,0.958984,,,,,0,0,0,29,4,1,0,5,5,1,0.708984,2.259766,0.865234,2.544922,0.427002,0.708008,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
9,HOBBIES_1_029_CA_1_evaluation,1,2.0,HOBBIES_1_029,HOBBIES_1,HOBBIES,0,7.441406,8.976562,7.441406,0.740723,8.335938,0.828613,3.0,8,,0.888184,1.0,,,,,0,0,0,29,4,1,0,5,5,1,0.708984,2.259766,0.865234,2.544922,0.952148,1.28418,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [None]:
########################### Aux Models
# If we want to use pretrained models we can.
# Here are some logs to compare against
#Train CA_1
#[100]	valid_0's rmse: 2.02289
#[200]	valid_0's rmse: 2.0017
#[300]	valid_0's rmse: 1.99239
#[400]	valid_0's rmse: 1.98471
#[500]	valid_0's rmse: 1.97923
#[600]	valid_0's rmse: 1.97284
#[700]	valid_0's rmse: 1.96763
#[800]	valid_0's rmse: 1.9624
#[900]	valid_0's rmse: 1.95673
#[1000]	valid_0's rmse: 1.95201
#[1100]	valid_0's rmse: 1.9476
#[1200]	valid_0's rmse: 1.9434
#[1300]	valid_0's rmse: 1.9392
#[1400]	valid_0's rmse: 1.93446

#Train CA_2
#[100]	valid_0's rmse: 1.88949
#[200]	valid_0's rmse: 1.84767
#[300]	valid_0's rmse: 1.83653
#[400]	valid_0's rmse: 1.82909
#[500]	valid_0's rmse: 1.82265
#[600]	valid_0's rmse: 1.81725
#[700]	valid_0's rmse: 1.81252
#[800]	valid_0's rmse: 1.80736
#[900]	valid_0's rmse: 1.80242
#[1000]	valid_0's rmse: 1.79821
#[1100]	valid_0's rmse: 1.794
#[1200]	valid_0's rmse: 1.78973
#[1300]	valid_0's rmse: 1.78552
#[1400]	valid_0's rmse: 1.78158

In [None]:
########################### Train Models
#################################################################################
if not USE_AUX: 
  for store_id in STORES_IDS:
    print('Train', store_id)
    
    # Get grid for current store
    grid_df, features_columns = get_data_by_store(store_id)
    
    # Masks for 
    # Train (All data less than or equal to 1941 i.e. [1, 1941])
    # Validation (Last 28 days - not real validation set i.e. (1913, 1941])
    # Test (All data after day 1841 (END_TRAIN-100): the 28 forecast days
    #       plus a gap of history for recursive features i.e. (1841, 1969])
    train_mask = grid_df['d']<=END_TRAIN
    valid_mask = train_mask&(grid_df['d']>(END_TRAIN-P_HORIZON))
    preds_mask = grid_df['d']>(END_TRAIN-100)
    
    # Apply masks and save lgb dataset as bin
    # to reduce memory spikes during dtype conversions
    # https://github.com/Microsoft/LightGBM/issues/1032
    # "To avoid any conversions, you should always use np.float32"
    # or save to bin before start training
    # https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/discussion/53773
    train_data = lgb.Dataset(grid_df[train_mask][features_columns], 
                      label=grid_df[train_mask][TARGET])
    train_data.save_binary('train_data.bin')
    train_data = lgb.Dataset('train_data.bin')
    
    valid_data = lgb.Dataset(grid_df[valid_mask][features_columns], 
                      label=grid_df[valid_mask][TARGET])
    
    # Saving part of the dataset for later predictions
    # Removing features that we need to calculate recursively 
    grid_df = grid_df[preds_mask].reset_index(drop=True)
    keep_cols = [col for col in list(grid_df) if '_tmp_' not in col]
    grid_df = grid_df[keep_cols]
    grid_df.to_pickle('test_'+store_id+'.pkl')
    del grid_df
    
    # Launch seeder again to make lgb training 100% deterministic
    # with each "code line" np.random "evolves" 
    # so we need (may want) to "reset" it
    seed_everything(SEED)
    estimator = lgb.train(lgb_params,
                          train_data,
                          valid_sets = [valid_data],
                          verbose_eval = 100)
    
    # Save model - it's not real '.bin' but a pickle file
    # estimator = lgb.Booster(model_file='model.txt')
    # can only predict with the best iteration (or the saving iteration)
    # pickle.dump gives us more flexibility
    # like estimator.predict(TEST, num_iteration=100)
    # num_iteration - number of iteration want to predict with, 
    # NULL or <= 0 means use best iteration
    model_name = 'lgb_model_'+store_id+'_v'+str(VER)+'.bin'
    with open(model_name, 'wb') as f:
        pickle.dump(estimator, f)

    # Remove temporary files and objects 
    # to free some hdd space and ram memory
    !rm train_data.bin
    del train_data, valid_data, estimator
    gc.collect()
    
    # "Keep" models features for predictions
    MODEL_FEATURES = features_columns

Train CA_1
[100]	valid_0's rmse: 2.01468
[200]	valid_0's rmse: 1.98217
[300]	valid_0's rmse: 1.97222
[400]	valid_0's rmse: 1.96496
[500]	valid_0's rmse: 1.95892
[600]	valid_0's rmse: 1.95333
[700]	valid_0's rmse: 1.94761
[800]	valid_0's rmse: 1.94271
[900]	valid_0's rmse: 1.93721
[1000]	valid_0's rmse: 1.93308
[1100]	valid_0's rmse: 1.92835
[1200]	valid_0's rmse: 1.92378
[1300]	valid_0's rmse: 1.91907
[1400]	valid_0's rmse: 1.91481
Train CA_2
[100]	valid_0's rmse: 1.94592
[200]	valid_0's rmse: 1.88849
[300]	valid_0's rmse: 1.87446
[400]	valid_0's rmse: 1.86574
[500]	valid_0's rmse: 1.85981
[600]	valid_0's rmse: 1.85406
[700]	valid_0's rmse: 1.84825
[800]	valid_0's rmse: 1.84245
[900]	valid_0's rmse: 1.83703
[1000]	valid_0's rmse: 1.83222
[1100]	valid_0's rmse: 1.82787
[1200]	valid_0's rmse: 1.823
[1300]	valid_0's rmse: 1.81871
[1400]	valid_0's rmse: 1.81398
Train CA_3
[100]	valid_0's rmse: 2.39039
[200]	valid_0's rmse: 2.34239
[300]	valid_0's rmse: 2.32603
[400]	valid_0's rmse: 2.31642


https://www.kaggle.com/c/m5-forecasting-accuracy/discussion/141515

* Iterative feature engineering seems to be the key in this competition.  

* Shifted sales and rolling mean on those shifted sales are very important features. 

* But when calculating shifted sales for lags under 28, you'll find some NA values in the prediction part. The idea is to keep those features and fill in the missing values row by row with your own predictions.

**Question) I'm wondering how you didn't face overfitting? Adding a lag_1 feature increased my local score a lot, which resulted in 0.6+ LB score - obviously overfitting**

**Ans 1)** Hello, if you check top public kernels they don't use lag_1. If you use recent demand values, for example lag_1 the model will give high importance to that feature. Then if you are using predictions to make other predictions from period t1 to t28, each time you predict you will increase the error of the prediction. In the public notebooks they use rolling windows and don't use close lags. You can still overfit heavily. I believe that this technique should only be applied if you have a really good cv strategy, loss function and evaluation metric, then you can measure if the technique is good or overfits. 

**Ans 2)** This was an example to show the idea behind iterative prediction. I don't use lags under 7 in my model. If you use recent demand values, the model will predict almost the same values for the next 28 days. And lag_1 seems to be the worst one to be added.
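
The recursive idea discussed above can be sketched without any real model: each predicted day is appended to the history so that the next day's lag features can be computed from it. The "model" here is a stand-in (the mean of the last 7 values), chosen only to keep the example self-contained; `toy_model` and `recursive_forecast` are hypothetical names.

```python
import numpy as np

def toy_model(history):
    """Stand-in for a trained estimator: mean of the last 7 values."""
    return float(np.mean(history[-7:]))

def recursive_forecast(history, horizon):
    """Predict `horizon` steps, feeding each prediction back as input."""
    values = list(history)
    preds = []
    for _ in range(horizon):
        next_value = toy_model(values)   # features use known + predicted days
        preds.append(next_value)
        values.append(next_value)        # prediction becomes history for the next step
    return preds

history = [3, 0, 2, 5, 1, 4, 6, 2, 0, 3]
preds = recursive_forecast(history, horizon=28)
print(np.round(preds[:5], 3))
```

Because every step reuses earlier predictions, an error made on an early day is carried into all later days, which is the risk raised in both answers above.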



In [None]:
########################### Predict
#################################################################################
# Create Dummy DataFrame to store predictions
all_preds = pd.DataFrame()

# Join back the Test dataset with 
# a small part of the training data 
# to make recursive features
base_test = get_base_test()

# Timer to measure predictions time 
main_time = time.time()

# Loop over each prediction day
# As rolling lags are the most time-consuming
# we will calculate them once for the whole day
for PREDICT_DAY in range(1,29):    
    print('Predict | Day:', PREDICT_DAY)
    start_time = time.time()

    # Make temporary grid to calculate rolling lags
    grid_df = base_test.copy()
    grid_df = pd.concat([grid_df, df_parallelize_run(make_lag_roll, ROLS_SPLIT)], axis=1)
        
    for store_id in STORES_IDS:
        
        # Read all our models and make predictions
        # for each day/store pairs
        model_path = 'lgb_model_'+store_id+'_v'+str(VER)+'.bin' 
        if USE_AUX:
            model_path = AUX_MODELS + model_path

        with open(model_path, 'rb') as f:
            estimator = pickle.load(f)

        day_mask = base_test['d']==(END_TRAIN+PREDICT_DAY)
        store_mask = base_test['store_id']==store_id
        mask = (day_mask)&(store_mask)
        
        base_test.loc[mask, TARGET] = estimator.predict(grid_df[mask][MODEL_FEATURES])
    
    # Make good column naming and add 
    # to all_preds DataFrame
    temp_df = base_test[day_mask][['id',TARGET]]
    temp_df.columns = ['id','F'+str(PREDICT_DAY)]
    if 'id' in list(all_preds):
        all_preds = all_preds.merge(temp_df, on=['id'], how='left')
    else:
        all_preds = temp_df.copy()
        
    print('#'*10, ' %0.2f min round |' % ((time.time() - start_time) / 60),
                  ' %0.2f min total |' % ((time.time() - main_time) / 60),
                  ' %0.2f day sales |' % (temp_df['F'+str(PREDICT_DAY)].sum()))
    del temp_df
    
all_preds = all_preds.reset_index(drop=True)
all_preds

Predict | Day: 1
##########  3.08 min round |  3.08 min total |  39886.16 day sales |
Predict | Day: 2
##########  2.81 min round |  5.90 min total |  37211.32 day sales |
Predict | Day: 3
##########  2.87 min round |  8.77 min total |  37144.22 day sales |
Predict | Day: 4
##########  2.74 min round |  11.51 min total |  37086.93 day sales |
Predict | Day: 5
##########  2.72 min round |  14.24 min total |  42119.04 day sales |
Predict | Day: 6
##########  2.70 min round |  16.94 min total |  50305.81 day sales |
Predict | Day: 7
##########  2.70 min round |  19.64 min total |  51210.81 day sales |
Predict | Day: 8
##########  2.73 min round |  22.37 min total |  45091.40 day sales |
Predict | Day: 9
##########  2.75 min round |  25.13 min total |  39173.95 day sales |
Predict | Day: 10
##########  2.77 min round |  27.90 min total |  44194.20 day sales |
Predict | Day: 11
##########  2.72 min round |  30.62 min total |  45357.45 day sales |
Predict | Day: 12
##########  2.77 min round

Unnamed: 0,id,F1,F2,F3,F4,F5,F6,F7,F8,F9,F10,F11,F12,F13,F14,F15,F16,F17,F18,F19,F20,F21,F22,F23,F24,F25,F26,F27,F28
0,HOBBIES_1_001_CA_1_evaluation,0.849718,0.766912,0.748945,0.813540,0.984850,1.272751,1.094678,1.097630,0.818244,0.913100,0.848337,1.068129,1.309270,1.186695,0.911021,0.860743,0.810315,0.839017,0.979872,1.351622,1.152457,0.944095,0.850070,0.872288,0.927590,1.143401,1.319041,1.108332
1,HOBBIES_1_002_CA_1_evaluation,0.228174,0.194656,0.200360,0.214169,0.237317,0.272192,0.280836,0.247342,0.200689,0.251530,0.216942,0.258313,0.342075,0.297990,0.224338,0.245843,0.214052,0.230149,0.253136,0.337708,0.392192,0.245855,0.247194,0.254134,0.271694,0.294546,0.373396,0.429278
2,HOBBIES_1_003_CA_1_evaluation,0.541504,0.475263,0.508078,0.533174,0.687919,0.923820,0.830342,0.550683,0.525573,0.529625,0.495401,0.779132,0.862535,0.747308,0.507045,0.547849,0.539542,0.509137,0.700348,0.879990,0.805982,0.507383,0.479823,0.496301,0.527945,0.709358,0.829932,0.808339
3,HOBBIES_1_004_CA_1_evaluation,1.541005,1.270471,1.294067,1.364068,1.922668,2.466031,2.871018,2.106472,1.421426,1.297285,1.401627,2.108221,2.847047,2.927931,1.552260,1.546171,1.335854,1.373311,1.871098,2.416235,2.930809,1.587068,1.371584,1.399922,1.418863,1.946459,2.526661,2.642849
4,HOBBIES_1_005_CA_1_evaluation,1.041648,0.922786,0.852681,0.931059,1.077178,1.350164,1.403953,1.241066,0.995059,1.190673,1.061835,1.370120,1.668253,1.723640,1.071921,1.086633,0.998949,1.041960,1.299930,1.540042,1.506391,1.011189,0.922414,0.988061,1.038545,1.297870,1.578138,1.359324
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
30485,FOODS_3_823_WI_3_evaluation,0.466215,0.465679,0.437403,0.432958,0.543110,0.552366,0.635405,0.501099,0.516669,0.479204,0.682886,0.787413,0.766937,0.883945,0.737072,0.596781,0.619113,0.680866,0.609067,0.791860,0.928372,0.602856,0.637558,0.583942,0.475903,0.532390,0.573909,0.708841
30486,FOODS_3_824_WI_3_evaluation,0.273066,0.266707,0.215658,0.196562,0.215162,0.287363,0.300698,0.249392,0.250467,0.250110,0.383151,0.361797,0.369360,0.437544,0.408875,0.355632,0.363115,0.395374,0.269893,0.406288,0.443581,0.352459,0.420123,0.385973,0.283890,0.252263,0.299552,0.341406
30487,FOODS_3_825_WI_3_evaluation,0.644329,0.478499,0.486395,0.479846,0.508039,0.648407,0.739915,0.689965,0.549346,0.693051,1.043529,1.078145,0.869566,1.363269,1.166346,0.882813,1.038217,1.007224,0.803633,1.226270,1.440649,1.097728,1.068747,1.132445,0.788369,0.710622,0.843203,1.003375
30488,FOODS_3_826_WI_3_evaluation,1.069656,1.049936,1.032709,1.030447,1.089109,1.274393,1.199792,1.287380,1.084836,1.077667,1.332568,1.508025,1.361980,1.633923,1.284973,1.197610,1.208442,1.145599,1.120248,1.550011,1.634884,1.276400,1.653819,1.405048,1.209627,1.223637,1.432732,1.384436


In [None]:
########################### Export
#################################################################################
# Reading competition sample submission and
# merging our predictions
# As we have predictions only for "_evaluation" data (END_TRAIN = 1941)
# we need to do fillna() for "_validation" items
submission = pd.read_csv(ORIGINAL+'sample_submission.csv')[['id']]
submission = submission.merge(all_preds, on=['id'], how='left').fillna(0)
submission.to_csv('submission_v'+str(VER)+'.csv', index=False)
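
A toy illustration of the left merge plus `fillna(0)` used above: ids that appear in the sample submission but not in the predictions end up as all-zero rows. The ids and values below are made up.

```python
import pandas as pd

sample = pd.DataFrame({"id": ["X_validation", "X_evaluation"]})
toy_preds = pd.DataFrame({"id": ["X_evaluation"], "F1": [1.5], "F2": [2.0]})

# Left merge keeps every submission id; unmatched rows get NaN, then 0.
toy_submission = sample.merge(toy_preds, on=["id"], how="left").fillna(0)
print(toy_submission)
```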

In [None]:
# Summary

# Of course here is no magic at all.
# No "Novel" features and no brilliant ideas.
# We just carefully joined all
# our previous fe work and created a model.

# Also!
# In my opinion this strategy is a "dead end".
# It overfits the LB a lot and with 1 final submission
# you cannot afford the risk.


# Improvement should come from:
# Loss function
# Data representation
# Stable CV
# Good features reduction strategy
# Predictions stabilization with NN
# Trend prediction
# Real zero sales detection/classification


# Good kernels references 
## (the order is random and the list is not complete):
# https://www.kaggle.com/ragnar123/simple-lgbm-groupkfold-cv
# https://www.kaggle.com/jpmiller/grouping-items-by-stockout-pattern
# https://www.kaggle.com/headsortails/back-to-predict-the-future-interactive-m5-eda
# https://www.kaggle.com/sibmike/m5-out-of-stock-feature
# https://www.kaggle.com/mayer79/m5-forecast-attack-of-the-data-table
# https://www.kaggle.com/yassinealouini/seq2seq
# https://www.kaggle.com/kailex/m5-forecaster-v2
# https://www.kaggle.com/aerdem4/m5-lofo-importance-on-gpu-via-rapids-xgboost


# Features were created in these kernels:
## 
# Mean encodings and PCA options
# https://www.kaggle.com/kyakovlev/m5-custom-features
##
# Lags and rolling lags
# https://www.kaggle.com/kyakovlev/m5-lags-features
##
# Base Grid and base features (calendar/price/etc)
# https://www.kaggle.com/kyakovlev/m5-simple-fe


# Personal request
# Please don't upvote any ensemble and copypaste kernels
## The worst case is an ensemble without any analysis.
## The best choice - just ignore it.
## I would like to see more kernels with interesting and original approaches.
## Don't feed copypasters with upvotes.

## It doesn't mean that you should not fork and improve others' kernels
## but I would like to see params and code tuning based on some CV and analysis
## and not only on LB probing.
## Small changes could be shared in comments and authors can improve their kernel.

## Feel free to criticize this kernel as my knowledge is very limited
## and I can be wrong in code and descriptions. 
## Thank you.