# Carts Prediction
This notebook predicts which aids (article IDs) a user will add to the cart. It takes its input from the "Carts Model" notebook, where the carts model is fitted, and from two "parallel" notebooks that produce w2vec features for carts: one for the cross-validation set and one half of the test set, and the other for the remaining half of the test set.

The model could not be fitted and used for prediction in the same notebook because of Kaggle platform limits. Kaggle notebooks with a GPU have less memory available, and it was hard to fit all the required data into the 13 GB of RAM they provide, so I moved prediction to a separate notebook without GPU support but with 30 GB of RAM.
## Imports and definitions

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import gc
from humanize import naturalsize
from lightgbm.sklearn import LGBMRanker
import joblib

# functions and classes common for several notebooks of current project
import otto_common

In [2]:
# This function was used to test new features before adding them to the pipeline.
# Now it only deletes the day_of_week column, which is used to construct some features.
def prepare_df(df):
    del df['day_of_week']
    return df

## Load the model and make predictions

In [3]:
# Load the model.
model = joblib.load('/kaggle/input/otto-model-carts/lgb.pkl')

In [4]:
# Load the test data chunk by chunk and make predictions for each chunk separately.
# Paths to the two halves of the test data.
file_path_0 = '/kaggle/input/otto-carts-w2vec/test_features_with_w2v_cart_part_0.parquet'
file_path_1 = '/kaggle/input/otto-carts-w2vec-part1/test_features_with_w2v_cart_part_1.parquet'

for i in range(2):
    print('Start predicting '+ str(i))
    j_max = 3
    for j in range(j_max):
        # Load and prepare the data.
        print('start loading')
        if i == 0:
            df_test = pd.read_parquet(file_path_0)
        else:
            df_test = pd.read_parquet(file_path_1)
        df_test = otto_common.divide_df_by_column(df_test, j_max, j, 'session')
        df_test = prepare_df(df_test)
        gc.collect()
        print('data prepared')
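        # Feature columns: everything after the first two columns
        # (expected to be 'session' and 'cart_predictions').
        x_cols = list(df_test.columns[2:])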
        # Prediction itself.
        df_test['gbdt_prediction'] = model.predict(df_test[x_cols])
        print('Predictiion_made '+ str(i) + '__' + str(j))
        # Remove the features and combine predictions into a single dataframe.
        df_test = df_test[['session','cart_predictions','gbdt_prediction']]
        gc.collect()
        if i == 0 and j == 0:
            df_test_all = df_test
        else:
            df_test_all = pd.concat([df_test_all, df_test])
        del df_test
        gc.collect()
    print('Predictions made '+ str(i))

Start predicting 0
start loading
data prepared
Predictiion_made 0__0
start loading
data prepared
Predictiion_made 0__1
start loading
data prepared
Predictiion_made 0__2
Predictions made 0
Start predicting 1
start loading
data prepared
Predictiion_made 1__0
start loading
data prepared
Predictiion_made 1__1
start loading
data prepared
Predictiion_made 1__2
Predictions made 1
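
The helper `otto_common.divide_df_by_column` is not shown in this notebook. The loop above relies on it to return one of `j_max` pieces of the dataframe, presumably so that all rows of a session end up in the same piece. Below is a minimal sketch of how such a split could work. The function name `divide_df_by_column_sketch`, its body, and the toy data are assumptions made for illustration, not the actual implementation.

```python
import numpy as np
import pandas as pd


def divide_df_by_column_sketch(df, n_parts, part_idx, column):
    """Hypothetical stand-in for otto_common.divide_df_by_column.

    Splits the unique values of `column` into `n_parts` groups and returns
    the rows whose value falls into group number `part_idx`.
    """
    unique_values = np.sort(df[column].unique())
    chunk_values = np.array_split(unique_values, n_parts)[part_idx]
    # Copy so that later in-place changes (such as deleting a column)
    # do not touch the original dataframe.
    return df[df[column].isin(chunk_values)].copy()


# Toy data: seven candidate rows spread over five sessions.
toy = pd.DataFrame({
    'session': [1, 1, 2, 3, 3, 4, 5],
    'cart_predictions': [10, 11, 12, 13, 14, 15, 16],
})

parts = [divide_df_by_column_sketch(toy, 3, j, 'session') for j in range(3)]
for j, part in enumerate(parts):
    print(j, sorted(part['session'].unique()))
```

Splitting on the session value rather than on row position keeps every candidate of a session in one piece, which matters later when the top candidates are selected per session.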


In [5]:
# Select top 20 candidates and format the prediction as required by organizers.
df_test_all = otto_common.select_top_20_and_format(df_test_all, 'cart_predictions','gbdt_prediction')
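
`otto_common.select_top_20_and_format` is also defined outside this notebook. As a rough illustration of the step, the sketch below keeps the highest-scored candidates of each session and joins them into one space-separated string per session. The function name `select_top_k_and_format_sketch`, the `k` and `event_type` parameters, and the output columns `session_type` and `labels` are assumptions modeled on the competition submission format; the real helper may differ.

```python
import pandas as pd


def select_top_k_and_format_sketch(df, candidate_col, score_col,
                                   k=20, event_type='carts'):
    """Hypothetical stand-in for otto_common.select_top_20_and_format."""
    # Order candidates inside each session from highest to lowest score.
    ranked = df.sort_values(['session', score_col], ascending=[True, False])
    # Keep at most k best candidates per session.
    top_k = ranked.groupby('session', sort=False).head(k)
    # Join the surviving candidates of a session into one string.
    labels = (
        top_k.groupby('session', sort=False)[candidate_col]
        .apply(lambda aids: ' '.join(str(a) for a in aids))
        .rename('labels')
        .reset_index()
    )
    labels['session_type'] = labels['session'].astype(str) + '_' + event_type
    return labels[['session_type', 'labels']]


# Toy data with distinct scores; k=2 keeps the example small.
toy_scores = pd.DataFrame({
    'session': [1, 1, 1, 2, 2],
    'cart_predictions': [100, 200, 300, 400, 500],
    'gbdt_prediction': [0.1, 0.9, 0.5, 0.3, 0.7],
})

result = select_top_k_and_format_sketch(
    toy_scores, 'cart_predictions', 'gbdt_prediction', k=2)
print(result)
```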

In [6]:
# Save the formatted predictions to a parquet file.
df_test_all.to_parquet('gbdt_predictions.parquet')