# Introduction

In this competition, the fifth iteration of the M-competitions, we will use hierarchical sales data from Walmart, the world’s largest company by revenue, to forecast daily sales for the next 28 days.

The data covers stores in three US states:

- California,
- Texas,
- Wisconsin

and includes item-level details:

- department,
- product categories,
- store details.

In addition, it has explanatory variables such as:

- price,
- promotions,
- day of the week,
- special events.

Together, this robust dataset can be used to improve forecasting accuracy.

## Datasets


**calendar.csv** - Contains information about the dates on which the products are sold.

**sales_train_validation.csv** - Contains the historical daily unit sales data per product and store [d_1 - d_1913]. This notebook loads the extended **sales_train_evaluation.csv**, which covers [d_1 - d_1941].

**sell_prices.csv** - Contains information about the price of the products sold per store and date.


## Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


## Loading and defining our data

In [None]:
sell_prices = pd.read_csv('/kaggle/input/m5-forecasting-accuracy/sell_prices.csv')
sell_prices.head()

In [None]:
sell_prices['id'] = sell_prices['item_id'] + '_' + sell_prices['store_id']
sell_prices['sell_price'].describe()

In [None]:
sales = pd.read_csv('/kaggle/input/m5-forecasting-accuracy/sales_train_evaluation.csv')
sales.head()

In [None]:
sales.info()

In [None]:
calendar = pd.read_csv('/kaggle/input/m5-forecasting-accuracy/calendar.csv')
calendar.head()

In [None]:
calendar.info()

In [None]:
sales_training = sales.iloc[:,6:]
sales_training.head()

### Now let's take a look at our target

In [None]:
sample_submission = pd.read_csv('/kaggle/input/m5-forecasting-accuracy/sample_submission.csv')
sample_submission

As expected, we will have to prepare a 56-day forecast. The first 28 days serve for validation; the following 28 days remain unknown and will determine our final score in the overall ranking.
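
To make the layout concrete, here is a minimal sketch of how the submission rows and columns are organised. It assumes the `_validation` / `_evaluation` id suffixes and the `F1`..`F28` column names of the sample submission file; the helper `submission_ids` and the two item ids are illustrative only.

```python
# Hypothetical helper illustrating the submission layout; the ids passed in below are examples.
def submission_ids(series_ids):
    """Return the row ids in submission order: validation rows first, then evaluation rows."""
    validation = [f"{sid}_validation" for sid in series_ids]
    evaluation = [f"{sid}_evaluation" for sid in series_ids]
    return validation + evaluation

HORIZON = 28  # days per forecast window
columns = ["id"] + [f"F{day}" for day in range(1, HORIZON + 1)]

ids = submission_ids(["FOODS_1_001_CA_1", "HOBBIES_1_001_TX_2"])
print(len(ids), len(columns))
```

Each series therefore appears twice in the file, once per 28-day window, which is where the 56 forecast days come from.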

# Exploration and visualization
We will start with a simple exploration.
Let's plot some of our time series to see what we are going to be dealing with.

In [None]:
rows = [0, 42, 1024, 10024]
fig, axes = plt.subplots(nrows=len(rows), ncols=1, figsize=(10,15))
for ax, row in zip(axes, rows):
    sales_training.iloc[row].plot(ax=ax)

As we can see, the data changes rapidly from day to day.
Interestingly enough, as promised in the competition description, there is occasional **intermittency**: the series sometimes stops changing and just stays at zero.

We will look into this later. For now, let's take one more look, this time at the average monthly sales (a 30-day rolling mean).

In [None]:
rows = [0, 42, 1024, 10024]
fig, axes = plt.subplots(nrows=len(rows), ncols=1, figsize=(10,15))
for ax, row in zip(axes, rows):
    sales_training.iloc[row].rolling(30).mean().plot(ax=ax)

# Preprocessing and data engineering

So far, no strong trends set those series apart. But to make sure that we can efficiently train on all time series simultaneously, we can make them all stationary by differencing, and then normalize them.

We will do that by **differencing** each series until the **Augmented Dickey-Fuller test** rejects the presence of a unit root at the 5% significance level (a p-value of at most 0.05).

So that the series still represent our data, we store the differencing order and the scale of each series, use the transformed series for forecasting, and then integrate and rescale the forecasts to get the final result.
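
As a toy illustration of that round trip, the sketch below differences a short made-up series once, scales it by the standard deviation of the differences, and then restores the original values. The function names `difference_and_scale` and `restore` are hypothetical and only stand in for the idea; they are not part of this notebook's pipeline.

```python
import numpy as np

def difference_and_scale(series):
    """Difference once, then divide by the standard deviation of the differences."""
    series = np.asarray(series, dtype=float)
    diffs = np.diff(series)
    scale = diffs.std()
    if scale == 0:  # constant series: avoid dividing by zero
        scale = 1.0
    return diffs / scale, scale, series[0]

def restore(normalized, scale, first_value):
    """Invert difference_and_scale: rescale, integrate, and add back the first value."""
    levels = first_value + np.cumsum(normalized * scale)
    return np.concatenate(([first_value], levels))

sales = np.array([3, 0, 2, 5, 5, 1, 0, 4])  # made-up daily unit sales
normalized, scale, first_value = difference_and_scale(sales)
recovered = restore(normalized, scale, first_value)
print(np.allclose(recovered, sales))
```

The only things that must be kept aside are the scale and the first value (the integration constant); with those, the transformation loses no information beyond floating-point rounding.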

In [None]:
from statsmodels.tsa.stattools import adfuller

In [None]:
D = []

try:
    # Reuse the differencing orders from an earlier run, if they are attached as a dataset
    stationarity_differences = pd.read_csv('/kaggle/input/stationary-differences/stationarity_differences.csv').iloc[:,0]
except FileNotFoundError:
    for index, row in sales_training.iterrows():
        d = 0
        p_val = adfuller(row, autolag='AIC')[1]
        # Difference until the ADF test rejects a unit root at the 5% level
        while p_val > 0.05:
            d += 1
            row = row.diff()[1:]
            p_val = adfuller(row, autolag='AIC')[1]
        D.append(d)
    # Keep the freshly computed result in memory and save a copy for later runs
    stationarity_differences = pd.Series(D)
    stationarity_differences.to_csv('./stationarity_differences.csv', index=False)

In [None]:
stationarity_differences.value_counts()

In [None]:
sales_training.iloc[12791].plot()

Well, we definitely should address that intermittency at some point. But other than that, it looks pretty good.
Let's difference our data a single time (we will ignore this single anomaly; its non-stationarity will hardly do us any harm), and then scale the series into their final form.

In [None]:
stationary_train_sales = np.diff(sales_training.values, axis=1)

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
scaler = StandardScaler(with_mean=False)
scaler.fit(stationary_train_sales.T)
X_train = scaler.transform(stationary_train_sales.T).T
scales = scaler.scale_

In [None]:
calendar

In [None]:
sales_normalized = calendar[['wm_yr_wk','d']].iloc[:1941]
sales_normalized = pd.DataFrame(X_train, columns=sales_normalized['d'][1:])
sales_normalized.insert(0, 'id', sales['item_id'] + '_' + sales['store_id'])
sales_normalized

In [None]:
rows = [0, 42, 1024, 10024]
fig, axes = plt.subplots(nrows=len(rows), ncols=1, figsize=(10,15))
for ax, row in zip(axes, rows):
    sales_normalized.iloc[row, 1:].plot(ax=ax)

**And there we have it!**
Every row is now stationary and scaled to a common point of reference!

But before we go any further, let's check what happens if we try to bring it back.

In [None]:
sales_normalized

In [None]:
rows = [42, 10024]
fig, axes = plt.subplots(nrows=len(rows), ncols=1, figsize=(10,15))
for ax, row in zip(axes, rows):
    integrated_series = np.cumsum(sales_normalized.iloc[row, 1:]*scales[row])
    c = sales_training.iloc[row, 0]
    integrated_series = pd.Series(integrated_series + c).shift(1)
    integrated_series[:100].plot(ax=ax, style='r--', legend=True, label='re-integrated')
    sales_training.iloc[row][:100].plot(ax=ax, legend=True, label='original')
    total_numerical_error = np.abs(np.array(pd.Series(integrated_series)[1:].to_numpy() - sales_training.iloc[row,1:-1].to_numpy())).sum()
    ax.set_title('Total numerical error: {:.2f}'.format(total_numerical_error))

Great! The series can be brought back to its original form with almost no numerical loss.


Now we just have to address the intermittency and we are good to go! Let's start by analysing item-store relations in our sell_prices data: maybe when the products were out of the store, their price was 0 or missing.

In [None]:
# histplot replaces the deprecated distplot(kde=False)
ax = sns.histplot(sell_prices['id'].value_counts())
ax.set_xlabel('number of weeks the product was priced on')

Interesting, it seems that item-store pairs differ in the number of weeks they were priced on. That might be responsible for our intermittency.

**Unfortunately**, after some exploration and research into the sell_prices data, we had to conclude that it was not very helpful for figuring out intermittency. Apparently, sometimes the price for a product was just missing in the database, and sometimes it was there even though not a single unit was sold in weeks. We had to figure out a different way to recognize this intermittency.

Let's try making a substitute assumption:

**if the number of items sold stayed at 0 for at least 2 weeks, we treat this period as intermittent**. That way, we will be able to very quickly evaluate which days not to consider for training. This does mean ignoring some products that were on the shelves but rarely bought. We can't really do everything here given the time we have, and that would fall more into the 'rare event analysis' category, so the best model for that would probably be a standard exponential distribution anyway.
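
A minimal sketch of this rule for a single series, assuming a made-up sales array: every day that belongs to a run of at least 14 consecutive zero-sales days is marked as off the shelf. The function `off_shelf_mask` is a hypothetical, loop-based illustration of the rule, not the vectorized code used in this notebook.

```python
import numpy as np

def off_shelf_mask(sales, min_run=14):
    """Mark every day that belongs to a run of at least `min_run` consecutive zero-sales days."""
    sales = np.asarray(sales)
    mask = np.zeros(sales.shape[0], dtype=bool)
    run_start = None
    for day, sold in enumerate(sales):
        if sold == 0:
            if run_start is None:
                run_start = day  # a new run of zeros begins here
        else:
            if run_start is not None and day - run_start >= min_run:
                mask[run_start:day] = True
            run_start = None
    # Close a run that reaches the end of the series
    if run_start is not None and sales.shape[0] - run_start >= min_run:
        mask[run_start:] = True
    return mask

# Two days of sales, 14 zero days, one sale, 5 zero days, one sale
sales = np.array([2, 1] + [0] * 14 + [3] + [0] * 5 + [1])
mask = off_shelf_mask(sales)
print(mask.sum())
```

In this example only the 14-day gap is flagged; the shorter 5-day gap is treated as ordinary slow sales.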

In [None]:
# Rolling over columns via a transpose, since rolling(axis=1) is deprecated in recent pandas versions
sales_two_week_sum = sales_training.T.rolling(14).sum().T

In [None]:
# The first 13 days have no full 14-day window yet, so they reuse the value of the first complete window
for col in range(13):
    sales_two_week_sum.iloc[:, col] = sales_two_week_sum.iloc[:, 13]
    
window_is_empty = sales_two_week_sum == 0
# A zero sum on day t means days t-13..t all had no sales, so mark every day of that window
is_off_the_shelf = window_is_empty.copy()
for lag in range(1, 14):
    is_off_the_shelf |= window_is_empty.shift(-lag, axis=1, fill_value=False)
is_on_the_shelf = ~is_off_the_shelf
# True/False to 1/0
# is_on_the_shelf = is_on_the_shelf.astype('int')

In [None]:
is_on_the_shelf

Great! Let's see how our data works when we add this on-the-shelf information

In [None]:
rows = [0, 42, 1024, 10024]
fig, axes = plt.subplots(nrows=len(rows), ncols=1, figsize=(10,15))
for ax, row in zip(axes, rows):
    shelf = pd.DataFrame(is_on_the_shelf.iloc[row])
    shelf.columns = ['is_on_the_shelf']
    shelf['sold'] = sales_training.iloc[row]
    shelf = shelf.reset_index()
    shelf.drop('index', inplace=True, axis=1)
    shelf[shelf['is_on_the_shelf'] == True]['sold'].plot(legend=True, label='on shelf', ax=ax)
    shelf[shelf['is_on_the_shelf'] == False]['sold'].plot(style='o', legend=True, label='not on shelf', ax=ax)

# Feature engineering

Now that we have our data cleaned up, we can finally go to the next stage of our project: feature engineering!

So, the plan is this: since the data looks pretty tidy already, we will simply encode all the categorical features, match them to the data and prepare a pipeline that turns the series into 'train_x' and 'train_y'.

Let's pick features one by one, examine them and choose a suitable encoding.
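
For low-cardinality categories such as these, one-hot encoding is a natural candidate. As a small illustration of what it produces, here is a NumPy-only sketch; the helper `one_hot` is hypothetical and the four labels are made up.

```python
import numpy as np

def one_hot(labels):
    """Return the sorted categories and a (n_samples, n_categories) 0/1 matrix."""
    categories, codes = np.unique(labels, return_inverse=True)
    encoded = np.zeros((len(labels), len(categories)), dtype=int)
    # Put a single 1 in each row, in the column of that row's category
    encoded[np.arange(len(labels)), codes] = 1
    return categories, encoded

categories, encoded = one_hot(["CA", "TX", "WI", "CA"])
print(categories)
print(encoded)
```

Each row contains exactly one 1, so the model sees the category without any artificial ordering between states, departments or product categories.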

In [None]:
sales['dept_id'].value_counts()

In [None]:
sales['cat_id'].value_counts()

In [None]:
sales['state_id'].value_counts()

In [None]:
from sklearn.preprocessing import OneHotEncoder

In [None]:
encoder = OneHotEncoder()
dept_encoded = encoder.fit_transform(sales['dept_id'].values.reshape(-1,1))
cat_encoded = encoder.fit_transform(sales['cat_id'].values.reshape(-1,1))
state_encoded = encoder.fit_transform(sales['state_id'].values.reshape(-1,1))

The *sell_prices* table was in quite a different format, so we had to do some merging and pivoting to match the format of the rest of our data. All that needed a lot of RAM and time, so it was done in a separate kernel that can be found [here](https://www.kaggle.com/patrykradon/train-price-parser/notebook).

Another problem was that about 20% of the data was missing there. That's not ideal, especially since, as we have mentioned, products were usually actually sold despite their missing price.
But since it is that way and we have already taken care of products being off the shelf, let's just make another assumption:

**If someone did not care to note the price down, it probably did not change**, so all that we had to do was to impute the null values with the last price from before they went missing. If the beginning was missing, we would fill it with the first price that appeared afterwards.

Since this other notebook was for parsing anyway, we applied pandas `ffill()` followed by `bfill()` before saving the result. Of course we had to difference it once, because we have to be consistent about differencing when it comes to time series.
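
The imputation and differencing steps can be sketched on a toy price series as follows. The values are made up, and this is only an illustration of the assumption described above, not the code of the parsing kernel.

```python
import numpy as np
import pandas as pd

# Toy daily price series for one item; NaN marks days with no recorded price
prices = pd.Series([np.nan, np.nan, 2.5, np.nan, 2.5, 3.0, np.nan])

# Carry the last known price forward, then fill the leading gap from the first known price
filled = prices.ffill().bfill()

# Difference once so that prices are treated the same way as the differenced sales
price_changes = filled.diff().iloc[1:]

print(filled.tolist())
print(price_changes.tolist())
```

After the fill, the only non-zero entry in `price_changes` is the day on which the price actually changed.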

Now all we have to do is load the parsed data from there.

In [None]:
train_prices = pd.read_csv('/kaggle/input/train-price-parser/train_prices.csv')
# sort IDs to match the order of IDs in the sales table
train_prices['id'] = train_prices['id'].astype("category")
correct_order = sales['item_id'] + '_' + sales['store_id']
# set_categories returns a new categorical (the inplace argument was removed in pandas 2.0)
train_prices['id'] = train_prices['id'].cat.set_categories(correct_order)
train_prices = train_prices.sort_values(["id"]).reset_index(drop=True)

In [None]:
print('train_prices - observed {:.2f}% missing values'.format(train_prices.isnull().sum(axis=1).mean()/1970 * 100))

In [None]:
from sklearn.impute import SimpleImputer

In [None]:
imputer = SimpleImputer(strategy='constant',fill_value='no_event')
imputed_calendar_primary = imputer.fit_transform(calendar['event_name_1'].to_numpy().reshape(-1,1))
imputed_calendar_secondary = imputer.fit_transform(calendar['event_name_2'].to_numpy().reshape(-1,1))

In [None]:
imputed_calendar = np.hstack((imputed_calendar_primary,imputed_calendar_secondary))

In [None]:
# a quick note - the events are for some reason already 'differenced', by which I mean that holidays lasting a few days
# are denoted only by the beginning and end of the holiday
encoder = OneHotEncoder()
calendar_encoded = encoder.fit_transform(imputed_calendar)

# the column meaning that no event happens is doubled, so we throw one out
calendar_encoded = calendar_encoded[:,:-1]

In [None]:
# never forget to difference every time series in the same way!
is_on_the_shelf_diff = is_on_the_shelf.diff(axis=1).iloc[:,1:]
is_on_the_shelf_diff = is_on_the_shelf_diff.astype('int')

## Train/test generator

In total, with the information we have gathered, we can plan out what our algorithm will try to do:

* t - any point in time where our observations take place
* p - number of days that we will try to predict
* k - number of days before the forecast window that our algorithm uses as input

## Intermittency

Now, we do not have reliable data about products being on the shelf or not, so we will have to use our own estimate from a previous section. Normally, it would be up to the client to establish whether the products will be on the shelves during the forecast period, but since this is a competition we will make another assumption:

**if at the end of the known period the product was off the shelf, we predict that it will be off the shelf for the next p days**
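
In code, this assumption amounts to a simple gate on the model output. The sketch below uses a hypothetical helper `apply_shelf_rule` and a made-up five-day forecast to show the idea.

```python
import numpy as np

def apply_shelf_rule(forecast, on_shelf_last_day):
    """Zero out the whole forecast if the product was off the shelf on the last known day."""
    forecast = np.asarray(forecast, dtype=float)
    if not on_shelf_last_day:
        return np.zeros_like(forecast)
    return forecast

# Forecast horizon shortened to 5 days for the example
model_output = np.array([1.2, 0.8, 2.1, 0.0, 1.5])
on = apply_shelf_rule(model_output, on_shelf_last_day=True)
off = apply_shelf_rule(model_output, on_shelf_last_day=False)
print(on, off)
```

The rule is deliberately coarse: it cannot predict a product returning to the shelf inside the forecast window.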

## Algorithm

(In purple, we have coloured the <span style="color:purple">facts</span>, that is, the things we know about the values we try to predict.)

**algorithm** <- sales_normalized(t : t+k), sales_normalized(t-365+k : t-365+k+p), <span style="color:purple">is_on_the_shelf(t+k : t+k+p), calendar_encoded(t+k : t+k+p), dept_encoded(t+k : t+k+p), cat_encoded(t+k : t+k+p), state_encoded(t+k : t+k+p)</span> 


**algorithm** -> sales_training(t+k : t+k+p) * <span style="color:purple">is_on_the_shelf(t+k : t+k+p)</span>
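
The index arithmetic in the two lines above can be sketched as follows, assuming a plain array as a stand-in for one row of daily values. The helper `window_indices` and the chosen `t`, `k` and `p` are illustrative, not the values hard-coded in the generator.

```python
import numpy as np

def window_indices(t, k, p, year=365):
    """Return (start, stop) pairs for the recent input, last year's window and the target."""
    recent = (t, t + k)                               # k days observed right before the forecast
    last_year = (t - year + k, t - year + k + p)      # the same p days, one year earlier
    target = (t + k, t + k + p)                       # the p days to predict
    return recent, last_year, target

series = np.arange(1000)  # stand-in for one row of daily values
recent, last_year, target = window_indices(t=500, k=120, p=28)
x_recent = series[slice(*recent)]
x_last_year = series[slice(*last_year)]
y = series[slice(*target)]
print(recent, last_year, target)
```

The facts (shelf status, calendar events and the static categories) are sliced with the same `target` range, since they describe the days being predicted.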


Keeping all that in mind, let's build a generator that will produce the input data.

In [None]:
class M5_SeriesGenerator:
    def __init__(self):
        self.day_zero = 1941
        self.max_rows = 30490
        self.rows_remaining = np.arange(self.max_rows)
        
    def reset(self):
        self.rows_remaining = np.arange(self.max_rows)
        
    def next_batch(self, in_points=30, out_points=3, batch_size=10):
        X_batch = []
        X_past_batch = []
        scale_batch = []
        c_batch = []
        facts_batch = []
        y_batch = []
        
        for _ in range(batch_size):
            if self.rows_remaining.shape[0] == 0:
                return False, (None, None)
        
            # Draw a position in rows_remaining and map it to the actual row id, so no row repeats within an epoch
            pos = np.random.randint(self.rows_remaining.shape[0])
            row = self.rows_remaining[pos]
            self.rows_remaining = np.delete(self.rows_remaining, pos)
            X_train_start = self.day_zero-366-in_points
            X_prev_year_start = self.day_zero-2*365

            # Skip series that are off the shelf on the last day of the input window
            while is_on_the_shelf.iloc[row, X_train_start+in_points] == False:
                if self.rows_remaining.shape[0] == 0:
                    return False, (None, None)
                pos = np.random.randint(self.rows_remaining.shape[0])
                row = self.rows_remaining[pos]
                self.rows_remaining = np.delete(self.rows_remaining, pos)

            Xsales_train = sales_normalized.iloc[row, X_train_start+1:X_train_start+in_points+1].values.astype(np.float32)
            Xsales_prev_year = sales_normalized.iloc[row, X_prev_year_start:X_prev_year_start+out_points].values.astype(np.float32)

            Y_train_start = X_train_start+in_points
            Yprices_train = train_prices.iloc[row, Y_train_start+2:Y_train_start+out_points+2].values.astype(np.float32)
            Yevents_train = calendar_encoded[Y_train_start+1:Y_train_start+out_points+1, :].toarray().astype(int)
            Ydept_train = np.tile(dept_encoded[row].toarray().astype(int),(out_points,1))
            Ycat_train = np.tile(cat_encoded[row].toarray().astype(int),(out_points,1))
            Ystate_train = np.tile(state_encoded[row].toarray().astype(int),(out_points,1))
            Ysales_train = sales_training.iloc[row, Y_train_start+1:Y_train_start+out_points+1].values.astype(int).flatten()
            
            Yfacts = np.hstack((Yprices_train.reshape(-1, 1), Yevents_train, Ydept_train, Ycat_train, Ystate_train))
            integral_constant = sales_training.iloc[row, X_train_start+in_points]
            scale = scales[row]
            
            X_batch.append(Xsales_train.reshape(-1, 1))
            X_past_batch.append(Xsales_prev_year.reshape(-1, 1))
            scale_batch.append(scale)
            c_batch.append(integral_constant)
            facts_batch.append(Yfacts)
            y_batch.append(Ysales_train)
        return True, ((np.asarray(X_batch), np.concatenate((np.asarray(X_past_batch), np.asarray(facts_batch)), axis=2), np.asarray(scale_batch), np.asarray(c_batch)), np.asarray(y_batch))

In [None]:
def eval_series_data_gen(in_points = 120, out_points=28, end_of_data=1913, max_row=30490):    
    row = 0
    while row < max_row:
        X_batch = []
        X_past_batch = []
        scale_batch = []
        c_batch = []
        facts_batch = []
        y_batch = []
        X_train_start = end_of_data-1-in_points
        X_prev_year_start = end_of_data-365
        
        if is_on_the_shelf.iloc[row, X_train_start+in_points] == False:
            row += 1
            yield False, None
        else:
            Xsales_train = sales_normalized.iloc[row, X_train_start+1:X_train_start+in_points+1].values.astype(np.float32)
            Xsales_prev_year = sales_normalized.iloc[row, X_prev_year_start:X_prev_year_start+out_points].values.astype(np.float32)

            Y_train_start = X_train_start+in_points
            Yprices_train = train_prices.iloc[row, Y_train_start+2:Y_train_start+out_points+2].values.astype(np.float32)
            Yevents_train = calendar_encoded[Y_train_start+1:Y_train_start+out_points+1, :].toarray().astype(int)
            Ydept_train = np.tile(dept_encoded[row].toarray().astype(int),(out_points,1))
            Ycat_train = np.tile(cat_encoded[row].toarray().astype(int),(out_points,1))
            Ystate_train = np.tile(state_encoded[row].toarray().astype(int),(out_points,1))
            Ysales_train = sales_training.iloc[row, Y_train_start+1:Y_train_start+out_points+1].values.astype(int).flatten()

            Yfacts = np.hstack((Yprices_train.reshape(-1, 1), Yevents_train, Ydept_train, Ycat_train, Ystate_train))
            integral_constant = sales_training.iloc[row, X_train_start+in_points]
            scale = scales[row]

            X_batch.append(Xsales_train.reshape(-1, 1))
            X_past_batch.append(Xsales_prev_year.reshape(-1, 1))
            scale_batch.append(scale)
            c_batch.append(integral_constant)
            facts_batch.append(Yfacts)
            y_batch.append(Ysales_train)
            row += 1
            yield True, (np.asarray(X_batch), np.concatenate((np.asarray(X_past_batch), np.asarray(facts_batch)), axis=2), np.asarray(scale_batch), np.asarray(c_batch))

# Model selection

Now that we have our inputs and desired outputs, it is time to build a model that will learn to connect these two.

For this project we will use **RNN layers with GRU memory units** for the time series inputs, along with auxiliary inputs for our facts. All that will go through one **final dense layer**, hopefully giving us a model complex enough to successfully forecast the following 56 days.

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import BatchNormalization

In [None]:
class M5_Net(keras.Model):
    def __init__(self, input_timesteps, output_timesteps, batch_size=1):
        super(M5_Net, self).__init__()
        self.input_timesteps = input_timesteps
        self.output_timesteps = output_timesteps
        self.batch_size = batch_size

        self.gru1 = tf.keras.layers.GRU(32, return_sequences=True)
        self.gru1a = tf.keras.layers.GRU(64, return_sequences=True)
        self.gru2 = tf.keras.layers.GRU(64, return_sequences=True)
        self.gru2a = tf.keras.layers.GRU(32, return_sequences=True)
        self.gru_out = tf.keras.layers.GRU(1, return_sequences=True)
        self.dense1 = keras.layers.Dense(self.output_timesteps, activation="selu", kernel_initializer="lecun_normal")
        # Create the normalization layers once, so their weights are tracked and reused between calls
        self.bn1 = BatchNormalization()
        self.bn1a = BatchNormalization()
        self.bn_dense = BatchNormalization()
        self.bn2 = BatchNormalization()
        self.bn2a = BatchNormalization()
        self.bn_out = BatchNormalization()
        
    def call(self, input_data, training=None):
        series_data, historical_data, scale, integral_constant = input_data
        
        # Encode the recent history and map it onto one value per forecast day
        x = self.bn1(self.gru1(series_data), training=training)
        x = self.bn1a(self.gru1a(x), training=training)
        x = tf.reshape(x, [self.batch_size, -1])
        x = self.bn_dense(self.dense1(x), training=training)
        x = tf.reshape(x, [self.batch_size, -1, 1])
        # Attach the known facts, the integration constant and the scale to every forecast day
        x = tf.concat([x,
                       historical_data,
                       np.expand_dims(np.tile(integral_constant, (self.output_timesteps,1)).T, axis=2),
                       np.expand_dims(np.tile(scale, (self.output_timesteps,1)).T, axis=2)
                      ], axis=2)
        x = self.bn2(self.gru2(x), training=training)
        x = self.bn2a(self.gru2a(x), training=training)
        x = self.bn_out(self.gru_out(x), training=training)
        x = tf.reshape(x, [self.batch_size, -1])
        
        # Undo the preprocessing: rescale the differences, integrate them, add back the last known sales value
        sales_pred = tf.transpose(tf.math.multiply(tf.transpose(x), y=scale))
        sales_pred = tf.math.cumsum(sales_pred, axis=1)
        sales_pred += np.tile(integral_constant, (self.output_timesteps,1)).T
        return sales_pred

# Training and tuning the model

It's time to make a training loop, put on some metrics, initialize our parameters and run our model through a couple of epochs.
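
The metric tracked during training is a root mean squared error computed after clipping negative forecasts to zero, since unit sales cannot be negative. A NumPy-only sketch of that metric, with a hypothetical helper `clipped_rmse` and made-up numbers:

```python
import numpy as np

def clipped_rmse(y_true, y_pred):
    """RMSE after clipping negative predictions to zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), 0, None)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y_true = [0, 2, 1, 3]
y_pred = [-1.0, 2.0, 3.0, 3.0]  # the negative forecast is clipped to 0 before scoring
score = clipped_rmse(y_true, y_pred)
print(score)
```

Note that the gradient step itself uses the unclipped mean squared error; the clipped value is only a more honest readout of how the final forecast would score.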

In [None]:
from math import sqrt

IN_POINTS = 120
OUT_POINTS = 28
BATCH_SIZE = 16
model = M5_Net(input_timesteps=IN_POINTS, output_timesteps=OUT_POINTS, batch_size=BATCH_SIZE)

loss_object = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

def loss(model, x, y, training):
    y_ = model(x, training=training)

    return loss_object(y_true=y, y_pred=y_)

def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = loss(model, inputs, targets, training=True)
    return loss_value, tape.gradient(loss_value, model.trainable_variables)


In [None]:
# model.load_weights('/kaggle/input/checkpoints2/croc_model2.ckpt')

In [None]:

M5_series_gen = M5_SeriesGenerator()
batch_sequence = [4, 4, 4, 4]
training = []
VAL_SIZE = 1000
VAL_END = 1913  # number of days visible to the model during validation
validation = []
# Ground truth for the validation window: the OUT_POINTS days right after VAL_END
y_val = sales_training.iloc[:VAL_SIZE, VAL_END:VAL_END+OUT_POINTS].to_numpy()
for epoch in range(len(batch_sequence)):
    BATCH_SIZE = batch_sequence[epoch]
    model.batch_size = BATCH_SIZE
    epoch_loss = []
    while True:
        more_data_available, (X_train, y_train) = M5_series_gen.next_batch(in_points=IN_POINTS, out_points=OUT_POINTS, batch_size=BATCH_SIZE)
        if not more_data_available:
            break
            
        loss_value, grads = grad(model, X_train, y_train)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        # Track the RMSE of the forecast after clipping negative sales to zero
        clipped_loss = loss_object(y_true=y_train, y_pred=tf.clip_by_value(model(X_train, training=True), clip_value_min=0, clip_value_max=np.inf))
        epoch_loss.append(sqrt(clipped_loss))
    training.append(np.array(epoch_loss).mean())
    epoch_val = []
    model.batch_size = 1
    for on, X_val in eval_series_data_gen(in_points=IN_POINTS, out_points=OUT_POINTS, end_of_data=VAL_END, max_row=VAL_SIZE):
        if on:
            val = tf.clip_by_value(model(X_val, training=True), clip_value_min=0, clip_value_max=np.inf).numpy().squeeze()
            epoch_val.append(val)
        else:
            # Series that are off the shelf are forecast as all zeros
            epoch_val.append(np.zeros(OUT_POINTS))
    model.batch_size = BATCH_SIZE
    # Validation RMSE against the held-out days (previously the mean prediction was stored here)
    validation.append(sqrt(np.mean((np.array(epoch_val) - y_val) ** 2)))
    print(f'Epoch {epoch} training RMSE: {training[-1]}, Epoch {epoch} validation RMSE: {validation[-1]}')
    model.save_weights('./croc_model{}.ckpt'.format(epoch))
    M5_series_gen.reset()

In [None]:
pd.Series(training).plot(legend=True, label='training')
pd.Series(validation).plot(legend=True, label='validation')

# Visualizing the outcome

Finally, we can visualize our outcome by drawing it along with the ground truth.

In [None]:
N = 16
model.batch_size = N
_, (X_train, y_train) = M5_series_gen.next_batch(in_points=IN_POINTS, out_points=OUT_POINTS, batch_size=N)
rows = np.arange(N)
fig, axes = plt.subplots(nrows=len(rows), ncols=1, figsize=(10,N*3))
y_ = tf.clip_by_value(model(X_train, training=True), clip_value_min=0, clip_value_max=np.inf)
for ax, row in zip(axes, rows):
    pd.Series(y_train[row]).plot(legend=True, label='ground truth',ax=ax)
    pd.Series(y_[row]).plot(legend=True, label='forecast',ax=ax)


To sum up, it seems that the model sometimes recognizes a specific behavior and tries to mimic it, and sometimes it just treats the outcome as too random and outputs an average.

# Final results

We can now try to predict our validation and evaluation data, submit it to the website to rank us and visualize the outcome. First, let's see how we did: the result will be seen as the score for this notebook.

In [None]:
IN_POINTS = 120
OUT_POINTS = 28
model.batch_size = 1

validation = []
evaluation = []
# Forecast the validation window (the 28 days after d_1913); off-the-shelf series get all zeros
for on, X in eval_series_data_gen(in_points=IN_POINTS, out_points=OUT_POINTS, end_of_data=1913):
    if on:
        val = tf.clip_by_value(model(X, training=True), clip_value_min=0, clip_value_max=np.inf).numpy().squeeze()
        validation.append(val)
    else:
        validation.append(np.zeros(OUT_POINTS))
        
# Forecast the evaluation window (the 28 days after d_1941)
for on, X in eval_series_data_gen(in_points=IN_POINTS, out_points=OUT_POINTS, end_of_data=1941):
    if on:
        ev = tf.clip_by_value(model(X, training=True), clip_value_min=0, clip_value_max=np.inf).numpy().squeeze()
        evaluation.append(ev)
    else:
        evaluation.append(np.zeros(OUT_POINTS))

# Validation rows come first in the sample submission, followed by the evaluation rows
sample_submission.iloc[:30490, 1:] = np.array(validation)
sample_submission.iloc[30490:, 1:] = np.array(evaluation)
        
sample_submission.to_csv('./final_prediction.csv', index=False)
    

# Summary

The project turned out to be very educational in many different ways. We know that training it specifically for the validation days was a bit of a cheat, and we wish we had had enough time to make it a general tool for such tasks, but given what we had, it turned out very satisfying.