KPHP ML implementation: a fast tiny xgboost/catboost prediction kernel by mkornaukhov · Pull Request #983 · VKCOM/kphp

mkornaukhov · 2024-04-18T14:22:12Z

About .kml files and kphp_ml in general

KML means "KPHP ML", since it was invented for KPHP and VK.com.
KML unites xgboost and catboost (prediction only, not learning).
KML models are stored in files with .kml extension.

KML is several times faster than native xgboost and runs at almost the same speed as native catboost.

A final structure integrated into KPHP consists of the following:

custom xgboost implementation
custom catboost implementation
.kml files reader
buffers and kml models storage related to master-worker specifics
api to be called from PHP code

To use ML from PHP code, call any function from kphp_ml_interface.h (KPHP only).
In plain PHP, there are no polyfills, and they are not planned to be implemented.

About "ml_experiments" private vkcom repo

The code in the kphp_ml namespace is a final, production solution.

During development, we tested many different implementations (for both xgboost and catboost) in order to find an optimal one; they are located in the ml_experiments repository.

All in all, ml_experiments repo contains:

lots of C++ implementations of algorithms that behave exactly like xgboost/catboost
tooling for testing and benchmarking them
ML models to be tested and benchmarked (some of them are from real production)
python scripts for learning and converting models
a converter to .kml

Note that some files exist both in KPHP and in ml_experiments.
They are almost identical, apart from include paths and input types (array vs unordered_map).
In future development, they should be kept in sync.

Application-specific information in kml

When a learned model is exported to xgboost .model file or catboost .cbm file, it does not contain enough information to be evaluated.
Some information exists only at the moment of learning and thus must also be saved along with xgboost/catboost exported models.

For example, a prediction might need calibration (*MULT+BIAS or log) AFTER xgboost calculation.

For example, input [1234 => 0.98] (feature_id #1234) must be remapped before passing to xgboost, because this feature was #42 during training, while a valid input uses #1234. Hence, [1234 => 42] exists in the reindex map.

For example, some models were trained without zero values, and zeroes in input must be excluded.

Ideally, an input should always contain correct indexes and shouldn't contain zeroes in the last case, but in practice in VK.com, inputs are collected universally and later applied to some model. That's why one and the same input is remapped by model1 in one way and by model2 in its own way.

In conclusion, training scripts must export not only xgboost/catboost models but also a .json file with additional properties, used for converting to .kml and for evaluation. See KmlPropertiesInJsonFile in ml_experiments.

.kml files, on the contrary, already contain all additional information inside, because exporting to kml requires all that stuff.
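
The three examples above (reindexing, dropping zeroes, calibration) can be sketched in C++ as follows. This is only an illustration: ModelProperties, prepare_input and calibrate are hypothetical names, not the actual kphp_ml API, and only the linear *MULT+BIAS calibration is shown.

```cpp
#include <cassert>
#include <cmath>
#include <unordered_map>

// Hypothetical per-model properties; field names are illustrative only.
struct ModelProperties {
  std::unordered_map<int, int> reindex_map; // input feature_id -> feature_id used during training
  bool skip_zeroes = false;                 // the model was trained without zero values
  double calibration_mult = 1.0;            // applied AFTER the raw prediction
  double calibration_bias = 0.0;
};

// Remap input keys to training-time indexes, dropping features that are
// absent from the reindex map and (optionally) zero values.
std::unordered_map<int, double> prepare_input(const ModelProperties &props,
                                              const std::unordered_map<int, double> &input) {
  std::unordered_map<int, double> prepared;
  for (const auto &[feature_id, value] : input) {
    if (props.skip_zeroes && value == 0.0) {
      continue;
    }
    auto it = props.reindex_map.find(feature_id);
    if (it == props.reindex_map.end()) {
      continue; // this model does not know the feature
    }
    prepared[it->second] = value;
  }
  return prepared;
}

// Linear calibration of a raw prediction: raw * MULT + BIAS.
double calibrate(const ModelProperties &props, double raw) {
  assert(std::isfinite(raw));
  return raw * props.calibration_mult + props.calibration_bias;
}
```

With reindex_map = {1234 => 42}, the input [1234 => 0.98] becomes [42 => 0.98] before it reaches the trees, and the same universal input can be prepared differently for another model simply by using that model's properties.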

InputKind

Ideally, backend code must collect input that should be passed to a model directly.
For example, if a model was trained with features #1...#100, an input could look like [ 70 => 1.0, 23 => 7.42, ... ].

But in practice, and for historical reasons, the vkcom backend collects input in a different way, so it can't be passed directly and needs some transformation. The available kinds of input and their transformations are described by enum InputKind:

ht_remap_str_keys_to_fvalue - [ 'user_city_99' => 1.0, 'user_topic_weights_17' => 7.42, ... ], uses reindex_map
ht_remap_int_keys_to_fvalue - [ 12934 => 1.0, 8923 => 7.42, ... ], uses reindex_map
ht_direct_int_keys_to_fvalue - [ 70 => 1, 23 => 7.42, ... ], no key reindexing, passed directly
vectors_fvalue_and_catstr - [ 1.23, 4.56, ... ] and [ "red", "small" ]: floats and categorical values in separate vectors, passed directly
vectors_fvalue_and_catstr_multi - the same, but the model is a catboost multiclass classifier (returns an array of predictions, not one)
ht_remap_str_keys_to_fvalue_or_catnum - [ 'emb_7' => 19.98, ..., 'user_os' => 2, ... ]: everything in one hashtable, where categorical features are also numbers
ht_remap_str_keys_to_fvalue_or_catnum_multi - the same, but a multiclass classifier (returns an array of predictions, not one)
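
As a rough C++ sketch, the enum could look like the following. The enumerator names come from the list above; their order, their underlying values and the two helper functions are assumptions made for illustration and may not match the real kphp_ml sources.

```cpp
#include <cassert>

// Enumerator names are from the list above; order and values are assumed.
enum class InputKind {
  ht_remap_str_keys_to_fvalue,
  ht_remap_int_keys_to_fvalue,
  ht_direct_int_keys_to_fvalue,
  vectors_fvalue_and_catstr,
  vectors_fvalue_and_catstr_multi,
  ht_remap_str_keys_to_fvalue_or_catnum,
  ht_remap_str_keys_to_fvalue_or_catnum_multi,
};

// Hypothetical helper: true when the model returns an array of predictions.
constexpr bool returns_multiple_predictions(InputKind kind) {
  return kind == InputKind::vectors_fvalue_and_catstr_multi ||
         kind == InputKind::ht_remap_str_keys_to_fvalue_or_catnum_multi;
}

// Hypothetical helper: true when the input arrives as a single hashtable
// rather than as separate float and categorical vectors.
constexpr bool input_is_hashtable(InputKind kind) {
  return kind != InputKind::vectors_fvalue_and_catstr &&
         kind != InputKind::vectors_fvalue_and_catstr_multi;
}
```

Helpers of this kind let calling code choose between the hashtable and vector code paths, and between a single float and an array result, without repeating the list of kinds.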

KML inference speed compared to xgboost/catboost

Benchmarks show that the final KML predictor runs 3–10 times faster than native xgboost.

This is explained by several reasons and optimizations:

compressed size of a tree node (8 bytes only)
coordinates remapping
better cache locality
input vectorization and avoiding ifs in code
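
To give a feel for the first and last points, here is a minimal sketch of an 8-byte tree node and a traversal over a dense feature vector. The node layout (TreeNode and its fields) is an assumption invented for this example, not the real .kml node format. The left/right choice is computed arithmetically instead of with an if; the loop still tests for a leaf.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed layout, for illustration only.
struct TreeNode {
  float value;         // split threshold for inner nodes, prediction for leaves
  uint16_t feature;    // index into the dense feature vector (unused in leaves)
  uint16_t left_child; // index of the left child; the right child is left_child + 1;
                       // 0 marks a leaf, since the root (index 0) is never a child
};
static_assert(sizeof(TreeNode) == 8, "a node is expected to take 8 bytes");

// Walk one tree stored as a flat array, starting from the root at index 0.
float predict_one_tree(const std::vector<TreeNode> &nodes, const std::vector<float> &features) {
  assert(!nodes.empty());
  uint32_t idx = 0;
  while (nodes[idx].left_child != 0) {
    const TreeNode &node = nodes[idx];
    // Go left when feature < threshold, right otherwise: the comparison
    // result (0 or 1) is added to the left child index.
    idx = node.left_child + static_cast<uint32_t>(!(features[node.feature] < node.value));
  }
  return nodes[idx].value;
}
```

Storing siblings next to each other keeps both candidates for the next step in adjacent memory, which is one plausible way to get the cache locality mentioned above.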

Keep in mind that KPHP workers are single-threaded, which is why the comparison is against xgboost running on a single thread, without a GPU.

.kml files are much more lightweight than xgboost .model files, since nodes are compressed and all learning info is omitted. They can be loaded into memory very quickly, almost like reading plain POD bytes.

When it comes to catboost, the KML implementation runs at almost the same speed as the native one. But .kml files containing catboost models are also smaller than the original .cbm files.

KPHP-specific implementation restrictions

After PHP code is compiled to a server binary, it's launched as a pre-fork server.

The master process loads all .kml files from the folder provided as a command-line option. Note that the storage of models (and the data of every model itself) is read-only, which is why it's not copied to every process and why we are allowed to use std containers there.

After the fork, when a PHP script is executed by a worker, it runs a prediction by providing an input (a PHP array).

KPHP internals must be very careful about using std containers inside workers, since they allocate on the heap, which is generally bad because of signal handling. That's why KML evaluation doesn't use the heap at all; when it needs memory for calculations, it uses a pre-allocated mutable_buffer. That mutable buffer is allocated once at every worker process start-up, and its size is max(calculate_mutable_buffer_size(i)) over all loaded models. Hence, it can fit any model calculation.
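
The buffer sizing rule can be sketched as below. LoadedModel, WorkerScratch and the size formula are hypothetical stand-ins; the source names only calculate_mutable_buffer_size and mutable_buffer, and the real code may organize this differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a loaded model; only what the sketch needs.
struct LoadedModel {
  std::size_t num_features;
  // Scratch bytes one prediction of this model needs (assumed formula).
  std::size_t calculate_mutable_buffer_size() const { return num_features * sizeof(float); }
};

// The largest scratch requirement among all loaded models.
std::size_t max_mutable_buffer_size(const std::vector<LoadedModel> &models) {
  std::size_t max_size = 0;
  for (const LoadedModel &model : models) {
    max_size = std::max(max_size, model.calculate_mutable_buffer_size());
  }
  return max_size;
}

// Allocated once at worker start-up and reused by every prediction.
struct WorkerScratch {
  std::unique_ptr<unsigned char[]> buffer;
  std::size_t size = 0;

  // The only place where memory is allocated.
  void init(const std::vector<LoadedModel> &models) {
    size = max_mutable_buffer_size(models);
    buffer = std::make_unique<unsigned char[]>(size);
  }

  // Hands out the same pre-allocated memory for any model; no allocation here.
  unsigned char *acquire(const LoadedModel &model) {
    assert(model.calculate_mutable_buffer_size() <= size);
    return buffer.get();
  }
};
```

Because the buffer is sized for the most demanding model, every prediction in the worker can reuse it, and no heap allocation happens while a script is running.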

A disappointing fact is that a KPHP array is quite slow compared to std::unordered_map, which is why a native C++ implementation is faster than a KPHP one when an algorithm needs to iterate over input hashtables.

Looking backward: a brief history of ML in VK.com

Historically, ML infrastructure in production was quite weird: ML models were tons of .php files with autogenerated PHP code of decision trees, like

function some_long_model_name_xxx_score_tree_3(array $x) {
  if ($x[1657] < 0.00926230289) {
    if ($x[1703] < 0.00839830097) {
      if ($x[1657] < 0.00389328809) {
        if ($x[1656] < 0.00147941126) {
          return -0.216939136;}
        return -0.215985224;}
  ...
}

Hundreds of .php files, with hundreds of functions within each, with many lines of if/else chains accessing input hashtables, sometimes transformed into vectors.

That autogenerated code was placed in a separate repository, compiled with KPHP -M lib, and linked into the vkcom binary upon final compilation. The number of models was so large that they took about 600 MB of the 1.5 GB production binary. Inference, nevertheless, was quite fast, especially when hashtables were transformed to vectors in advance.

Time passed, and we decided to rewrite ML infrastructure from scratch. The goal was to

Get rid of code-generated PHP code entirely.
Greatly speed up current production.
Support catboost and categorical features.

Obviously, there were two possible directions:

Import native xgboost and catboost libraries into KPHP runtime and write some transformers from PHP input to native calls; store .model and .cbm files which can be loaded and executed.
Write a custom ML prediction kernel that works exactly like native xgboost/catboost, but (if possible) much faster and much more lightweight; implement some .kml file format storing ML models.

As one may guess, we ultimately took the second path.

Looking forward: possible future enhancements

For now, the provided solution is more than enough and solves all the problems we currently face.
In the future, the following points might be considered as areas of investigation.

Support embedding and text features in catboost.
Support onnx kernel for neural networks (also a custom implementation, of course).
Use something more effective than std::unordered_map for reindex maps.
Implement a thread pool in KPHP and parallelize inputs; it's safe, since they are read only.

Previously, we've added a prediction kernel for xgboost and catboost, KML (see #983). It wasn't supported in runtime-light till the current pull request. It includes:

move the main inference logic into runtime-common dir
get rid of exceptions
gather required globals (mutable buffer, loaded model information) into context
add php_info() function to write non-warning logs in runtime-common
make KML functions and types depend on allocator

Co-authored-by: Alexander Polyakov <al.polyakov@vk.team>

mkornaukhov self-assigned this Apr 18, 2024

mkornaukhov added the enhancement (New feature or request) and runtime (Feature related to runtime) labels Apr 18, 2024

mkornaukhov force-pushed the kphp_ml_implementation branch 2 times, most recently from ef08ea8 to 31d77a9 Compare April 22, 2024 14:53

tolk-vm changed the title ~~Support inference of KML models~~ KPHP ML implementation: a fast embedded tiny xgboost/catboost prediction kernel Apr 23, 2024

tolk-vm changed the title ~~KPHP ML implementation: a fast embedded tiny xgboost/catboost prediction kernel~~ KPHP ML implementation: a fast tiny xgboost/catboost prediction kernel Apr 23, 2024

tolk-vm previously approved these changes Apr 23, 2024

mkornaukhov dismissed tolk-vm’s stale review via 2e2fbd4 April 25, 2024 16:14

mkornaukhov force-pushed the kphp_ml_implementation branch 10 times, most recently from a1e5da5 to 3a58809 Compare May 2, 2024 12:21

tolk-vm reviewed May 2, 2024

Comment thread runtime/kphp_ml/kphp_ml_interface.cpp Outdated

Comment thread runtime/kphp_ml/kphp_ml_interface.cpp Outdated

DrDet reviewed May 3, 2024

Comment thread runtime/kphp_ml/kphp_ml_init.cpp Outdated

Comment thread builtin-functions/_functions.txt Outdated

mkornaukhov added this to the next milestone May 3, 2024

mkornaukhov closed this May 6, 2024

mkornaukhov force-pushed the kphp_ml_implementation branch from 3a58809 to 6f2f88d Compare May 6, 2024 08:22

mkornaukhov reopened this May 6, 2024

mkornaukhov and others added 5 commits May 6, 2024 11:31

Copy sources from private ml_experiments repo

bf4db4a

Update sources from ml_experiments, now they are compilable

7643e59

Integrate kml into kphp runtime and create php wrappers

cc059d8

Add tests

799949c

Add detailed documentation to kphp_ml.h (the same as PR description)

9b37564

mkornaukhov force-pushed the kphp_ml_implementation branch from b9def45 to 9b37564 Compare May 6, 2024 09:00

tolk-vm approved these changes May 6, 2024

mkornaukhov merged commit 8a04ac2 into master May 6, 2024

mkornaukhov deleted the kphp_ml_implementation branch May 6, 2024 14:25

mkornaukhov mentioned this pull request May 8, 2024

Support of ML runtime using KML-format #956

Closed

mkornaukhov mentioned this pull request Sep 19, 2025

[k2] add KML support #1411

Merged
