# Transpiling Models with m2cgen

This notebook is for the [Jane Street Market Prediction][1] competition, which presents the unusual challenge of making predictions one row of the test set at a time.

Here is an idea that might help: *transpile* the model to C/C++, *compile* that into a shared library, then *load* the library and *execute* the native code from Python.

It is easier than it sounds! I will use [m2cgen][2]. From their documentation:

<blockquote>
    <b>m2cgen</b> (Model 2 Code Generator) - is a lightweight library which provides an easy way to transpile trained statistical models into a native code (Python, C, Java, Go, JavaScript, Visual Basic, C#, PowerShell, R, PHP, Dart, Haskell, Ruby, F#).
</blockquote>

According to the [m2cgen README][2], they support all of these models:


### Classification
[AdaGradClassifier](https://duckduckgo.com/?q=AdaGradClassifier),
[CDClassifier](https://duckduckgo.com/?q=CDClassifier),
[DecisionTreeClassifier](https://duckduckgo.com/?q=DecisionTreeClassifier),
[ExtraTreeClassifier](https://duckduckgo.com/?q=ExtraTreeClassifier),
[ExtraTreesClassifier](https://duckduckgo.com/?q=ExtraTreesClassifier),
[FistaClassifier](https://duckduckgo.com/?q=FistaClassifier),
[KernelSVC](https://duckduckgo.com/?q=KernelSVC),
[LGBMClassifier](https://duckduckgo.com/?q=LGBMClassifier),
[LinearSVC](https://duckduckgo.com/?q=LinearSVC),
[LogisticRegression](https://duckduckgo.com/?q=LogisticRegression),
[LogisticRegressionCV](https://duckduckgo.com/?q=LogisticRegressionCV),
[NuSVC](https://duckduckgo.com/?q=NuSVC),
[PassiveAggressiveClassifier](https://duckduckgo.com/?q=PassiveAggressiveClassifier),
[Perceptron](https://duckduckgo.com/?q=Perceptron),
[RandomForestClassifier](https://duckduckgo.com/?q=RandomForestClassifier),
[RidgeClassifier](https://duckduckgo.com/?q=RidgeClassifier),
[RidgeClassifierCV](https://duckduckgo.com/?q=RidgeClassifierCV),
[SAGAClassifier](https://duckduckgo.com/?q=SAGAClassifier),
[SAGClassifier](https://duckduckgo.com/?q=SAGClassifier),
[SDCAClassifier](https://duckduckgo.com/?q=SDCAClassifier),
[SGDClassifier](https://duckduckgo.com/?q=SGDClassifier),
[SVC](https://duckduckgo.com/?q=SVC),
[XGBClassifier](https://duckduckgo.com/?q=XGBClassifier)
and
[XGBRFClassifier](https://duckduckgo.com/?q=XGBRFClassifier).

### Regression
[ARDRegression](https://duckduckgo.com/?q=ARDRegression),
[AdaGradRegressor](https://duckduckgo.com/?q=AdaGradRegressor),
[BayesianRidge](https://duckduckgo.com/?q=BayesianRidge),
[CDRegressor](https://duckduckgo.com/?q=CDRegressor),
[DecisionTreeRegressor](https://duckduckgo.com/?q=DecisionTreeRegressor),
[ElasticNet](https://duckduckgo.com/?q=ElasticNet),
[ElasticNetCV](https://duckduckgo.com/?q=ElasticNetCV),
[ExtraTreeRegressor](https://duckduckgo.com/?q=ExtraTreeRegressor),
[ExtraTreesRegressor](https://duckduckgo.com/?q=ExtraTreesRegressor),
[FistaRegressor](https://duckduckgo.com/?q=FistaRegressor),
[GammaRegressor](https://duckduckgo.com/?q=GammaRegressor),
[HuberRegressor](https://duckduckgo.com/?q=HuberRegressor),
[LGBMRegressor](https://duckduckgo.com/?q=LGBMRegressor),
[Lars](https://duckduckgo.com/?q=Lars),
[LarsCV](https://duckduckgo.com/?q=LarsCV),
[Lasso](https://duckduckgo.com/?q=Lasso),
[LassoCV](https://duckduckgo.com/?q=LassoCV),
[LassoLars](https://duckduckgo.com/?q=LassoLars),
[LassoLarsCV](https://duckduckgo.com/?q=LassoLarsCV),
[LassoLarsIC](https://duckduckgo.com/?q=LassoLarsIC),
[LinearRegression](https://duckduckgo.com/?q=LinearRegression),
[LinearSVR](https://duckduckgo.com/?q=LinearSVR),
[NuSVR](https://duckduckgo.com/?q=NuSVR),
[OrthogonalMatchingPursuit](https://duckduckgo.com/?q=OrthogonalMatchingPursuit),
[OrthogonalMatchingPursuitCV](https://duckduckgo.com/?q=OrthogonalMatchingPursuitCV),
[PassiveAggressiveRegressor](https://duckduckgo.com/?q=PassiveAggressiveRegressor),
[PoissonRegressor](https://duckduckgo.com/?q=PoissonRegressor),
[RANSACRegressor](https://duckduckgo.com/?q=RANSACRegressor),
[RandomForestRegressor](https://duckduckgo.com/?q=RandomForestRegressor),
[Ridge](https://duckduckgo.com/?q=Ridge),
[RidgeCV](https://duckduckgo.com/?q=RidgeCV),
[SAGARegressor](https://duckduckgo.com/?q=SAGARegressor),
[SAGRegressor](https://duckduckgo.com/?q=SAGRegressor),
[SDCARegressor](https://duckduckgo.com/?q=SDCARegressor),
[SGDRegressor](https://duckduckgo.com/?q=SGDRegressor),
[SVR](https://duckduckgo.com/?q=SVR),
[TheilSenRegressor](https://duckduckgo.com/?q=TheilSenRegressor),
[TweedieRegressor](https://duckduckgo.com/?q=TweedieRegressor),
[XGBRFRegressor](https://duckduckgo.com/?q=XGBRFRegressor),
and
[XGBRegressor](https://duckduckgo.com/?q=XGBRegressor).


[2]: https://github.com/BayesWitnesses/m2cgen
[1]: https://www.kaggle.com/c/jane-street-market-prediction


```python
!pip install m2cgen --quiet --quiet
```

```python
import numpy as np
import pandas as pd
import m2cgen as m2c
import lightgbm as lgb
import xgboost as xgb
import ctypes
import io
from numpy.ctypeslib import ndpointer
```

```python
PATH = '../input/jane-street-market-prediction'
NA_REPLACEMENT = -9999.0
N_TREES = 500
N_TEST_ROWS_TO_SIMULATE = None  # None means all
N_JOBS = 4
```

Using just a subset of the data to speed up training.

```python
train = pd.read_csv(f'{PATH}/train.csv', nrows=300000)
train.shape
```

```python
train = train[train['weight'] != 0]
train.shape
```

```python
exclude = [
    'date', 'weight', 'resp_1', 'resp_2', 'resp_3', 'resp_4', 'resp', 'ts_id'
]
inputs = train.drop(exclude, axis=1)
inputs.shape
```

```python
features = inputs.columns
len(features)
```

- `m2cgen` does not seem to handle NaN input well, so we replace NaN with a sentinel value, `NA_REPLACEMENT`.
- `m2cgen` creates a model that uses the `double` data type, which is 64 bits. We *could* train on float32, but I also use the `x` array for making predictions later, and float64 keeps things simple.
- The compiled model expects a pointer to a single row of input (`double*`), so the input array must be in C (row-major) order, where the values of each row sit next to each other in memory.
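The last two points can be checked before calling into native code. The following is a minimal NumPy-only sketch. `prepare_input` is a hypothetical helper that mirrors the notebook's `fillna(...).astype('float64').values.copy('C')` chain, and the small Fortran-ordered array only demonstrates what a non-contiguous row looks like.

```python
import numpy as np

NA_REPLACEMENT = -9999.0  # same sentinel value as the notebook


def prepare_input(values):
    """Hypothetical helper: float64, C-ordered copy of `values` with NaNs replaced."""
    arr = np.array(values, dtype=np.float64, order='C')  # always a fresh C-ordered copy
    arr[np.isnan(arr)] = NA_REPLACEMENT
    return arr


# A Fortran-ordered (column-major) array: each *column* is contiguous,
# so a single row is not one contiguous block of doubles.
f_ordered = np.asfortranarray([[1.0, np.nan, 3.0],
                               [4.0, 5.0, np.nan]])
print(f_ordered[0].flags['C_CONTIGUOUS'])  # False: row 0 is strided

x = prepare_input(f_ordered)
print(x[0].flags['C_CONTIGUOUS'])  # True: row 0 is one contiguous block
print(np.isnan(x).any())           # False
```

A `double*` pointing at the start of a strided row would make the C code read the wrong values, so ensuring C order up front is cheaper than debugging odd predictions later.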

```python
x = inputs.fillna(NA_REPLACEMENT).astype('float64').values.copy('C')
x.shape
```

```python
y = (train.eval('weight * resp') > 0).astype('int')
y.shape
```

```python
%%time
reg = lgb.LGBMRegressor(n_estimators=N_TREES, n_jobs=N_JOBS)
reg.fit(x, y)
```

```python
training_predictions = pd.Series(reg.predict(x))
training_predictions.head()
```

```python
%%time
code = m2c.export_to_c(reg)
len(code)
```

```python
with open('model.c', 'w') as f:
    f.write(code)
```

```python
!wc model.c
```

We just generated over 60,000 lines of source code in ~10 seconds, or about 6,000 lines per *second*. :)

Notice that although LightGBM bins feature values into histograms for training, the model itself does not refer to the bins: its split thresholds are raw feature values.
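To make that concrete, here is a hand-written Python function in the general shape of transpiled tree code. It is *not* real `m2cgen` output, and the feature indices, thresholds and leaf values are made up. It also suggests why NaN is a problem. Every ordered comparison with NaN is false, in C as in Python, so a NaN always ends up in the same branch. That happens regardless of where LightGBM learned to send missing values.

```python
import math


def tiny_tree(row):
    """Made-up single regression tree; thresholds are raw feature values, not bins."""
    if row[3] < 0.25:        # raw threshold on feature 3
        if row[7] < -1.5:    # raw threshold on feature 7
            return 0.12
        return -0.03
    return 0.08


sample_row = [0.0] * 10
sample_row[3], sample_row[7] = 0.1, -2.0
print(tiny_tree(sample_row))  # 0.12

row_nan = list(sample_row)
row_nan[3] = math.nan
print(math.nan < 0.25, math.nan >= 0.25)  # False False
print(tiny_tree(row_nan))  # 0.08: NaN fails `< 0.25`, so it takes the right-hand branch
```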

```python
!head model.c
```

Now compile the model into a shared library.

*This code was first seen in the excellent notebook [fast scoring using C (42 usec)][1] by [sekrier][2] (a Python notebook using data from [Santa's Workshop Tour 2019][3]).*

[1]: https://www.kaggle.com/sekrier/fast-scoring-using-c-42-usec
[2]: https://www.kaggle.com/sekrier
[3]: https://www.kaggle.com/c/santa-workshop-tour-2019

```python
!gcc -Ofast -shared -o lgb_score.so -fPIC model.c
!ls -l lgb_score.so
```
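The same compile step can be driven from Python with `subprocess`, which is handy outside a notebook. The sketch below uses two hypothetical helpers, `build_gcc_command` and `compile_model`. One caveat: `-Ofast` turns on `-ffast-math`, which among other things lets GCC assume that no value is NaN. That is one more reason to replace NaNs before scoring, and `-O3` is the more conservative choice.

```python
import shutil
import subprocess


def build_gcc_command(source, output, opt_level='-O3'):
    """Return the gcc argument list that builds `source` into a shared library."""
    return ['gcc', opt_level, '-shared', '-fPIC', '-o', output, source]


def compile_model(source='model.c', output='lgb_score.so', opt_level='-O3'):
    """Compile the transpiled model; raises if gcc is missing or compilation fails."""
    if shutil.which('gcc') is None:
        raise RuntimeError('gcc not found on PATH')
    subprocess.run(build_gcc_command(source, output, opt_level), check=True)
    return output


print(build_gcc_command('model.c', 'lgb_score.so', '-Ofast'))
# ['gcc', '-Ofast', '-shared', '-fPIC', '-o', 'lgb_score.so', 'model.c']
```

Returning the argument list separately makes it easy to log or inspect the exact command before running it.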

We can load the library right away into our running notebook:

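```python
lib = ctypes.CDLL('./lgb_score.so')
score = lib.score
# Define the types of the output and arguments of this function:
# double score(double *input), where input must be a C-contiguous float64 row.
score.restype = ctypes.c_double
score.argtypes = [ndpointer(ctypes.c_double, flags='C_CONTIGUOUS')]
```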
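`ndpointer` can also check an array's memory layout. The sketch below needs only NumPy, not the compiled library. Passing `flags='C_CONTIGUOUS'` makes the argument converter reject a strided view, such as a single column, with a `TypeError`. Without it, the pointer is passed through and C silently reads neighbouring memory.

```python
import ctypes

import numpy as np
from numpy.ctypeslib import ndpointer

loose = ndpointer(ctypes.c_double)                         # dtype check only
strict = ndpointer(ctypes.c_double, flags='C_CONTIGUOUS')  # dtype + layout check

x = np.arange(12, dtype=np.float64).reshape(3, 4)  # C-ordered by default
row = x[0]        # contiguous: four adjacent doubles
column = x[:, 0]  # strided view: every 4th double

strict.from_param(row)    # accepted
loose.from_param(column)  # accepted, but C would read x[0, 0], x[0, 1], ... not the column

try:
    strict.from_param(column)
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```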

```python
compiled_predictions = pd.Series([score(row) for row in x])
compiled_predictions.head()
```

Looks like they match! But test them all:

```python
(training_predictions - compiled_predictions).describe()
```

There are some tiny discrepancies, probably to do with how LightGBM uses threading and aggregates results.
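Rather than eyeballing `describe()`, a tolerance check makes the comparison explicit. This is a minimal sketch: `compare_predictions` is a hypothetical helper, and the default `atol` is an assumption to tune. Another plausible source of tiny differences, besides threading, is `-Ofast`: through `-ffast-math` it allows GCC to reorder floating-point additions.

```python
import numpy as np


def compare_predictions(reference, candidate, atol=1e-9):
    """Return (max absolute difference, whether every value agrees within atol)."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    max_diff = float(np.max(np.abs(reference - candidate)))
    return max_diff, bool(np.allclose(reference, candidate, rtol=0.0, atol=atol))


max_diff, ok = compare_predictions([0.5, 0.25, 0.75], [0.5, 0.25 + 1e-12, 0.75])
print(max_diff, ok)
```

In this notebook it would be called as `compare_predictions(training_predictions, compiled_predictions)`; pandas Series convert cleanly through `np.asarray`.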

Now slice one row of data to test timing:

```python
one_row = x[[0], :]
one_row.shape
```

# Timing Test

`reg` is a traditional LGBMRegressor and `score` is our compiled model function.

```python
%timeit reg.predict(one_row)
```

```python
%timeit score(one_row)
```

Per prediction, we went from over 200 microseconds down to ~15: more than 10x faster. :)

Beware that this is optimistic.
We ran a prediction for the same row over and over again.
CPUs have excellent [branch prediction algorithms][1] that can actually *learn* which pathways through our compiled code will be taken in the future.
(Learning patterns from data collected in the past to improve performance in the future... hmm.)

The real test comes later: is it actually faster when creating a submission?

[1]: https://en.wikipedia.org/wiki/Branch_predictor
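One way to get a less flattering, and probably more realistic, number is to time calls over many *different* rows rather than one row repeated. The helper below is a stdlib-only sketch with a hypothetical name, `mean_call_microseconds`; in this notebook `fn` would be `score` or `reg.predict`.

```python
import time


def mean_call_microseconds(fn, rows, repeat=3):
    """Call fn on every row, `repeat` times; return the best mean time per call in microseconds."""
    rows = list(rows)
    if not rows:
        raise ValueError('rows must not be empty')
    best = float('inf')
    for _ in range(repeat):
        start = time.perf_counter()
        for row in rows:
            fn(row)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(rows))  # keep the fastest pass, like timeit
    return best * 1e6
```

For example, `mean_call_microseconds(score, x[:10000])` could be compared with `mean_call_microseconds(score, [one_row] * 10000)`. A large gap would suggest the repeated-row timing was flattered by branch prediction and caching.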

# Reusable Version

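```python
class CompiledModel:
    """Thin wrapper around the compiled `double score(double *input)` function."""

    def __init__(self, library):
        self.lib = ctypes.CDLL(library)
        score = self.lib.score
        score.restype = ctypes.c_double
        score.argtypes = [ndpointer(ctypes.c_double, flags='C_CONTIGUOUS')]
        self.score = score

    def predict(self, x):
        # x holds exactly one row of float64 features; returns a Python float
        return self.score(x)


model = CompiledModel('./lgb_score.so')
```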

# Simulate

Read the example test set into memory, then use pandas to iterate over it one row at a time, the slow way!

```python
with open(f'{PATH}/example_test.csv', 'rb') as f:
    test_csv = f.read()

def simulator(nrows=N_TEST_ROWS_TO_SIMULATE):
    for chunk in pd.read_csv(io.BytesIO(test_csv), chunksize=1, nrows=nrows):
        yield chunk, pd.DataFrame({'action': [0]})
```

Yep, ~13 seconds for 1,000 rows is **s l o w**.

```python
%%time
len(list(simulator(nrows=1000)))
```

Allocate one array up front, copy each row's features into it, and reuse it for every iteration.

```python
input_row = np.zeros((1, len(features)), dtype=np.float64, order='C')
```
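The same buffer-reuse idea can be written without pandas. This is a minimal NumPy sketch: `fill_row` is a hypothetical helper, and it assumes each incoming row already holds float64 values in the notebook's feature order. Whether it beats `test_df[features].fillna(NA_REPLACEMENT)` has not been measured here.

```python
import numpy as np

NA_REPLACEMENT = -9999.0  # same sentinel as the notebook


def fill_row(buffer, values):
    """Copy one row of features into `buffer` in place and replace NaNs.

    `values` may have shape (n,) or (1, n); `buffer` has shape (1, n).
    """
    buffer[...] = np.asarray(values, dtype=np.float64).reshape(buffer.shape)
    np.copyto(buffer, NA_REPLACEMENT, where=np.isnan(buffer))
    return buffer


input_row = np.zeros((1, 4), dtype=np.float64, order='C')
address = input_row.ctypes.data  # memory address of the buffer's data

fill_row(input_row, [1.0, float('nan'), 3.0, 4.0])
fill_row(input_row, np.array([[5.0, 6.0, np.nan, 8.0]]))
print(input_row.ctypes.data == address)  # True: the same memory is reused
```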

# Test 1: Bare Loop

```python
%%time
for (test_df, sample_prediction_df) in simulator():
    pass
```

# Test 2: Feature Prep Only

Setting up the feature array adds only a little time.

```python
%%time
for (test_df, sample_prediction_df) in simulator():
    input_row[:] = test_df[features].fillna(NA_REPLACEMENT)
    #env.predict(sample_prediction_df)
```

# Test 3: Feature Prep and LGB

Using LightGBM itself, all four cores are busy: ~18 minutes of CPU time between them, but only 4-5 minutes of wall-clock time.

```python
%%time
for (test_df, sample_prediction_df) in simulator():
    input_row[:] = test_df[features].fillna(NA_REPLACEMENT)
    sample_prediction_df.action = (reg.predict(input_row) > 0.5) * 1
    #env.predict(sample_prediction_df)
```

# Test 4: Feature Prep and Compiled Model

Our compiled model, running on one thread, ***is*** faster: it adds only 10-15 seconds to "Test 2: Feature Prep Only".

```python
%%time
for (test_df, sample_prediction_df) in simulator():
    input_row[:] = test_df[features].fillna(NA_REPLACEMENT)
    sample_prediction_df.action = (model.predict(input_row) > 0.5) * 1
    #env.predict(sample_prediction_df)
```

# Submit

Try the real submission API just to check the timing against what we saw above.

```python
import janestreet
env = janestreet.make_env()  # initialize the environment
iter_test = env.iter_test()  # an iterator which loops over the test set
```

```python
%%time
for (test_df, sample_prediction_df) in iter_test:
    input_row[:] = test_df[features].fillna(NA_REPLACEMENT)
    pred = model.predict(input_row)
    sample_prediction_df.action = (pred > 0.5) * 1
    env.predict(sample_prediction_df)
```

# Conclusions

`m2cgen` looks really good!

A compiled 500-tree LightGBM model is very fast; nearly all of the time is spent in the slow Python iteration loop.

Some compiled models could benefit from compiler optimisations. For example, if a condition like `(input[42] < 1.23456)` appears many times, the compiler could evaluate it once, keep the result in a register or on the stack, and reuse it.
(Compiler optimisations can be impressively complex, making it a hard puzzle to work out how the compiler got from the source code to the machine code!)
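Whether that applies to a given model can be estimated by counting repeated conditions in the generated source. This is a rough sketch. It assumes conditions in `model.c` look like `input[42] < 1.23456`, possibly wrapped in parentheses, so check your own file with `head` first. `count_conditions` is a hypothetical helper, and the sample string is made up.

```python
import re
from collections import Counter

# Matches e.g. "input[42] < 1.23456" or "(input[42]) >= (-0.5)"; the exact format is an assumption.
CONDITION = re.compile(
    r'input\[(\d+)\]\)?\s*(<=|>=|<|>|==)\s*\(?\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)'
)


def count_conditions(source):
    """Count how often each (feature index, operator, threshold) test appears."""
    return Counter(CONDITION.findall(source))


sample = '''
if ((input[42]) < (1.23456)) { var0 = 0.1; } else { var0 = -0.2; }
if ((input[42]) < (1.23456)) { var1 = 0.3; } else { var1 = 0.05; }
if ((input[7]) >= (-0.5)) { var2 = 0.0; } else { var2 = 0.4; }
'''
counts = count_conditions(sample)
print(counts.most_common(1))  # [(('42', '<', '1.23456'), 2)]
```

On the real file, `count_conditions(open('model.c').read()).most_common(10)` would list the most repeated tests.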

Settings matter: deeper trees take more time per tree, because each prediction has to make more feature comparisons on its way from root to leaf.
LightGBM's defaults do not limit tree depth (`max_depth=-1`), although `num_leaves=31` caps the size of each tree.
And `m2cgen` is not limited to LightGBM: it also works with XGBoost and the other `sklearn`-style models listed above.

Transpiling is an easy step to add at the end of a training notebook: save the `.so` library as an extra output.
It is also flexible: if you save the model as text or a pickle, you can "transpile" it to a language like C later.
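A sketch of the "transpile later" route: pickle the fitted model at the end of training, then load it and export C code in a separate step. `m2c.export_to_c` is the call used earlier in this notebook. `transpile_pickled_model` and its `export` parameter are hypothetical; injecting the exporter only keeps the helper testable without `m2cgen` installed. Only unpickle files you trust.

```python
import pickle


def transpile_pickled_model(pickle_path, c_path, export=None):
    """Load a pickled model, write its transpiled C source to c_path, return its length."""
    if export is None:
        import m2cgen as m2c  # imported lazily so the helper can be tested without it
        export = m2c.export_to_c
    with open(pickle_path, 'rb') as f:
        model = pickle.load(f)
    code = export(model)
    with open(c_path, 'w') as f:
        f.write(code)
    return len(code)

# In the training notebook, after reg.fit(x, y):
#     with open('model.pkl', 'wb') as f:
#         pickle.dump(reg, f)
# Later, possibly in another notebook:
#     transpile_pickled_model('model.pkl', 'model.c')
```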

### Alternate Uses

Even if you don't want this added complexity in your submission code, it might be useful for faster predictions during permutation testing: shuffle one column of validation data at a time, make predictions, and track the validation metric to see how important each feature is.
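A minimal sketch of that permutation test. It assumes a `predict` function that maps a 2-D float64 array to one prediction per row, and a metric where higher is better. `permutation_importance_scores` is a hypothetical name; scikit-learn also provides `sklearn.inspection.permutation_importance` for estimator objects.

```python
import numpy as np


def permutation_importance_scores(predict, x_val, y_val, metric, seed=0):
    """Return the drop in metric when each column is shuffled (larger drop = more important)."""
    rng = np.random.default_rng(seed)
    baseline = metric(y_val, predict(x_val))
    drops = np.empty(x_val.shape[1])
    for j in range(x_val.shape[1]):
        shuffled = x_val.copy(order='C')  # keep C order for a compiled model
        shuffled[:, j] = rng.permutation(shuffled[:, j])
        drops[j] = baseline - metric(y_val, predict(shuffled))
    return drops
```

With the compiled model, `predict` could be `lambda a: np.array([score(row) for row in a])`.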

As mentioned in the [Hacker News thread][1], it also allows super-lightweight, dependency-free deployment of real systems.


### Alternatives

LightGBM has a [convert_model][3] task in its command-line interface that can convert a model to C++ ("cpp").
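A sketch of that route, driven from Python. `reg.booster_.save_model('model.txt')` writes LightGBM's text model. The `task=convert_model`, `input_model`, `convert_model_language` and `convert_model` parameters come from the linked parameters page. The sketch assumes the `lightgbm` command-line binary is installed, which the pip package may not provide. `convert_command` and `run_convert` are hypothetical helpers.

```python
import shutil
import subprocess


def convert_command(model_txt='model.txt', output_cpp='model.cpp'):
    """Build the LightGBM CLI call that converts a text model into C++ code."""
    return [
        'lightgbm',
        'task=convert_model',
        f'input_model={model_txt}',
        'convert_model_language=cpp',
        f'convert_model={output_cpp}',
    ]


def run_convert(model_txt='model.txt', output_cpp='model.cpp'):
    """Run the conversion; raises if the CLI is missing or the conversion fails."""
    if shutil.which('lightgbm') is None:
        raise RuntimeError('lightgbm CLI not found on PATH')
    subprocess.run(convert_command(model_txt, output_cpp), check=True)
    return output_cpp


print(' '.join(convert_command()))
# lightgbm task=convert_model input_model=model.txt convert_model_language=cpp convert_model=model.cpp
```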

Someone listed other options from that same [Hacker News thread][1]:

> <em><b>jononor</b> on Mar 5, 2019</em>
> 
> Other similar projects:
> 
> https://github.com/nok/sklearn-porter Supports many scikit-learn models to Java/C/JavaScript/Go/Ruby, at least since 2016.
> 
> https://github.com/konstantint/SKompiler transpiles to Excel/SQL
> 
> https://github.com/jonnor/emlearn To C only, focus on microcontrollers/embedded devices. Includes feature extraction tools also. Disclaimer: I wrote it. 

**Update:** This [recent notebook][5] shows another good alternative for XGBoost models, using [Treelite][4].


[1]: https://news.ycombinator.com/item?id=19307740
[2]: http://wiki.c2.com/?PrematureOptimization
[3]: https://lightgbm.readthedocs.io/en/latest/Parameters.html#convert-parameters
[4]: https://treelite.readthedocs.io/en/latest
[5]: https://www.kaggle.com/code1110/janestreet-faster-inference-by-xgb-with-treelite
