<center><h2>Jane Street Market Prediction | fast inference by xgb with treelite | katsu1110 </h2></center><hr>

First of all, sorry to those who saw my previous kernel with very similar content. It was published unintentionally while I was experimenting, so I deleted it.

Here, the concept is the same: using [Treelite](https://treelite.readthedocs.io/en/latest/index.html) for faster inference with a GBDT model.

![](https://treelite.readthedocs.io/en/latest/_static/benchmark_plot.svg)

Treelite has been used in industry and even on Kaggle when the inference time of a GBDT model matters for deployment. In my naive experiment, I can confirm that **using treelite boosts my XGB's inference speed 2-3x** (the exact speedup varies from run to run, but treelite is consistently faster).

Such acceleration may be helpful for, say, model ensembles because the inference time in this competition is quite limited.
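As a rough illustration of why per-model speed matters for ensembles, here is a minimal sketch that averages the probabilities of several models and then applies a trading threshold. The helper `ensemble_action` and the equal-weight average are assumptions for illustration, not part of this notebook; each "model" is just any callable mapping a 2D feature array to probabilities (for example, a treelite predictor wrapped in a lambda).

```python
import numpy as np
from typing import Callable, Sequence


def ensemble_action(models: Sequence[Callable[[np.ndarray], np.ndarray]],
                    x: np.ndarray,
                    threshold: float = 0.5) -> np.ndarray:
    """Hypothetical helper: average the models' probabilities, then threshold.

    The total inference time is roughly the sum of each model's inference
    time, which is why a faster per-model predictor leaves room for more models.
    """
    probs = np.mean([np.asarray(m(x), dtype=float) for m in models], axis=0)
    return (probs > threshold).astype(int)


# usage with two toy "models" that ignore their input
x = np.zeros((3, 2))
m1 = lambda a: np.array([0.2, 0.6, 0.9])
m2 = lambda a: np.array([0.4, 0.6, 0.3])
print(ensemble_action([m1, m2], x))  # [0 1 1]
```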

This notebook loads the feather data from [another notebook of mine](https://www.kaggle.com/code1110/janestreet-save-as-feather?scriptVersionId=47635784).

This notebook treats the task as a binary classification problem.


# Install treelite

In [None]:
!pip --quiet install ../input/treelite/treelite-0.93-py3-none-manylinux2010_x86_64.whl

In [None]:
!pip --quiet install ../input/treelite/treelite_runtime-0.93-py3-none-manylinux2010_x86_64.whl

In [None]:
import numpy as np
import pandas as pd

import os, sys
import gc
import math
import random
import pathlib
from tqdm import tqdm
from typing import List, NoReturn, Union, Tuple, Optional, Text, Generic, Callable, Dict
from sklearn.preprocessing import MinMaxScaler, StandardScaler, QuantileTransformer
from sklearn.decomposition import PCA
from sklearn import linear_model
import operator
import xgboost as xgb
import lightgbm as lgb

# treelite
import treelite
import treelite_runtime 

# visualize
import matplotlib.pyplot as plt
import matplotlib.style as style
import seaborn as sns
from matplotlib_venn import venn2
from matplotlib import pyplot
from matplotlib.ticker import ScalarFormatter
sns.set_context("talk")
style.use('fivethirtyeight')
pd.options.display.max_columns = None

import warnings
warnings.filterwarnings('ignore')

# Config

In [None]:
SEED = 2021 # Happy new year!
# INPUT_DIR = '../input/jane-street-market-prediction/'
START_DATE = 85 # train only on rows with date > START_DATE
INPUT_DIR = '../input/janestreet-save-as-feather/'
TRADING_THRESHOLD = 0.50 # 0 ~ 1: The smaller, the more aggressive
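To make "the smaller, the more aggressive" concrete, here is a small sketch that counts how many rows would be traded at a few thresholds. The probabilities are made-up numbers for illustration, not outputs of the model in this notebook.

```python
import numpy as np

# made-up predicted probabilities for five rows (illustration only)
probs = np.array([0.35, 0.48, 0.52, 0.61, 0.90])

for threshold in (0.40, 0.50, 0.60):
    # action = 1 (trade) where the predicted probability exceeds the threshold
    actions = (probs > threshold).astype(int)
    print(threshold, actions.sum())
# 0.4 4
# 0.5 3
# 0.6 2
```

Lowering the threshold can only add trades, never remove them, so the number of actions is non-increasing as the threshold grows.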

# Load data

In [None]:
os.listdir(INPUT_DIR)

In [None]:
%%time

def load_data(input_dir=INPUT_DIR):
    train = pd.read_feather(pathlib.Path(input_dir + 'train.feather'))
    features = pd.read_feather(pathlib.Path(input_dir + 'features.feather'))
    example_test = pd.read_feather(pathlib.Path(input_dir + 'example_test.feather'))
    ss = pd.read_feather(pathlib.Path(input_dir + 'example_sample_submission.feather'))
    return train, features, example_test, ss

train, features, example_test, ss = load_data(INPUT_DIR)

In [None]:
print(train.shape)
train.head()

In [None]:
del features, example_test, ss
gc.collect()

In [None]:
# reduce train
train = train.query(f'date > {START_DATE}')

# Model fitting
For now, let's use a simple XGBoost model, which is also what the Numerai Tournament example uses.

In [None]:
# remove rows with weight = 0 to save memory
original_size = train.shape[0]
train = train.query('weight > 0').reset_index(drop=True)

print('Train size reduced from {:,} to {:,}.'.format(original_size, train.shape[0]))

In [None]:
# feats
feats = train.columns[train.columns.str.startswith('feature')].values.tolist()

print('{} features used'.format(len(feats)))

In [None]:
# target
train['action'] = train['resp'] * train['weight']


In [None]:
%%time

# same hyperparameters from https://www.kaggle.com/hamditarek/market-prediction-xgboost-with-gpu-fit-in-1min?scriptVersionId=48127254
params = {
    'colsample_bytree': 0.72,                 
    'learning_rate': 0.08,
    'max_depth': 7,
    'subsample': 0.8,
    'seed': SEED,
    'n_estimators': 480,
#     'tree_method': 'gpu_hist' # Let's use GPU for a faster experiment
}
params["objective"] = 'binary:logistic'
params["eval_metric"] = 'logloss'
train['action'] = 1 * (train['action'] > 0) # binary classification
# model = xgb.XGBClassifier(**params)
# model.fit(train[feats], train['action'], verbose=100)
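A note on the label: because rows with `weight == 0` were dropped earlier, `resp * weight > 0` holds exactly when `resp > 0`, so the binary target is simply whether the return was positive. A tiny numpy check with made-up values:

```python
import numpy as np

# made-up returns and strictly positive weights, mimicking the filtered train set
resp = np.array([-0.02, 0.00, 0.01, 0.05])
weight = np.array([0.5, 1.2, 3.0, 0.1])

# label as built in this notebook: 1 if resp * weight > 0 else 0
action = (resp * weight > 0).astype(int)
print(action)  # [0 0 1 1]

# with weight > 0, the label depends only on the sign of resp
assert np.array_equal(action, (resp > 0).astype(int))
```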

In [None]:
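# fit
dtrain = xgb.DMatrix(train[feats].values, label=train['action'].values)
# note: xgb.train ignores params['n_estimators']; the number of trees is num_boost_round (100 here)
bst = xgb.train(params, dtrain, num_boost_round=100, evals=[(dtrain, 'train')])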

# Compile with Treelite
Simply follow the tutorial: https://treelite.readthedocs.io/en/latest/tutorials/first.html
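Roughly speaking, treelite speeds up inference by translating every tree of the ensemble into nested `if`/`else` statements in C and compiling them into a shared library, instead of walking a tree data structure at prediction time. The toy sketch below mimics that idea in pure Python: it turns one hand-written decision tree (in a hypothetical nested-dict format, not treelite's or XGBoost's) into the source code of a function and compiles it with `exec`. It only illustrates the concept; treelite's actual code generator is far more elaborate.

```python
from typing import Any, Callable, Dict, List

# Hypothetical tree format: internal nodes go to "yes" if x[feature] < threshold,
# otherwise to "no"; leaves carry a value. Missing-value handling is omitted.
Tree = Dict[str, Any]


def tree_to_source(node: Tree, depth: int = 1) -> List[str]:
    """Emit nested if/else source lines for one tree."""
    pad = "    " * depth
    if "leaf" in node:
        return [f"{pad}return {node['leaf']!r}"]
    lines = [f"{pad}if x[{node['feature']}] < {node['threshold']!r}:"]
    lines += tree_to_source(node["yes"], depth + 1)
    lines.append(f"{pad}else:")
    lines += tree_to_source(node["no"], depth + 1)
    return lines


def compile_tree(tree: Tree) -> Callable[[List[float]], float]:
    """Turn a tree into a plain Python function via generated source code."""
    src = "\n".join(["def predict(x):"] + tree_to_source(tree))
    namespace: Dict[str, Any] = {}
    exec(src, namespace)
    return namespace["predict"]


def interpret_tree(node: Tree, x: List[float]) -> float:
    """Reference version: walk the tree structure at prediction time."""
    while "leaf" not in node:
        node = node["yes"] if x[node["feature"]] < node["threshold"] else node["no"]
    return node["leaf"]


toy_tree = {
    "feature": 0, "threshold": 0.5,
    "yes": {"leaf": -1.0},
    "no": {"feature": 1, "threshold": 0.2,
           "yes": {"leaf": 0.3}, "no": {"leaf": 1.2}},
}
predict = compile_tree(toy_tree)
print(predict([0.7, 0.9]), interpret_tree(toy_tree, [0.7, 0.9]))  # 1.2 1.2
```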

In [None]:
# pass to treelite
model = treelite.Model.from_xgboost(bst)

In [None]:
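# generate shared library
# parallel_comp=32 splits the generated C code into 32 source files so they can be compiled in parallel
toolchain = 'gcc'
model.export_lib(toolchain=toolchain, libpath='./mymodel.so',
                 params={'parallel_comp': 32}, verbose=True)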

In [None]:
# predictor from treelite
predictor = treelite_runtime.Predictor('./mymodel.so', verbose=True)

# Speed Test
I use dummy data to see how much faster inference with treelite can be.

In [None]:
# dummy data
np.random.seed(SEED)
N = 10000
dummy_data = np.random.rand(N, len(feats))

In [None]:
%%time

# normal xgb
predicted_normal = bst.predict(xgb.DMatrix(dummy_data))

In [None]:
%%time

# treelite
batch = treelite_runtime.Batch.from_npy2d(dummy_data)
predicted_treelite = predictor.predict(batch)

In [None]:
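# elementwise `==` on floats is brittle; compare within a tolerance instead
print(np.allclose(predicted_normal, predicted_treelite))
print('max abs diff:', np.abs(predicted_normal - predicted_treelite).max())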

So inference looks at least 2x (maybe 3x) faster, with the same predictions.
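A single `%%time` run is noisy, which fits the observation that the speedup varies from run to run. A slightly fairer comparison repeats each prediction several times and compares median wall-clock times. The helpers below are a minimal sketch using only `time.perf_counter` and `statistics.median` from the standard library; `median_runtime` and `speedup` are hypothetical names. In this notebook the two callables could be `lambda: bst.predict(xgb.DMatrix(dummy_data))` and `lambda: predictor.predict(treelite_runtime.Batch.from_npy2d(dummy_data))`.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 7, warmup: int = 1) -> float:
    """Median wall-clock seconds of fn() over several runs, after warm-up calls."""
    for _ in range(warmup):
        fn()  # exclude one-off first-call overhead from the measurement
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline: Callable[[], object], candidate: Callable[[], object],
            repeats: int = 7) -> float:
    """Ratio of median runtimes; > 1 means `candidate` is faster than `baseline`."""
    return median_runtime(baseline, repeats) / median_runtime(candidate, repeats)


# usage with stand-in workloads (swap in the xgb / treelite calls above)
slow = lambda: sum(i * i for i in range(200_000))
fast = lambda: sum(i * i for i in range(20_000))
print(f"speedup: {speedup(slow, fast):.1f}x")
```

The median is used rather than the mean so that an occasional slow run (for example, caused by other processes) does not dominate the comparison.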

# Submit

In [None]:
import janestreet
env = janestreet.make_env() # initialize the environment
iter_test = env.iter_test() # an iterator which loops over the test set

In [None]:
for (test_df, pred_df) in tqdm(iter_test):
    if test_df['weight'].item() > 0:
        # inference with treelite
        batch = treelite_runtime.Batch.from_npy2d(test_df[feats].values)
        pred_df.action = (predictor.predict(batch) > TRADING_THRESHOLD).astype('int')
    else:
        pred_df.action = 0
    env.predict(pred_df)