KPHP ML implementation: a fast tiny xgboost/catboost prediction kernel #983
Merged
tolk-vm previously approved these changes (Apr 23, 2024)
tolk-vm reviewed (May 2, 2024)
DrDet reviewed (May 3, 2024)
tolk-vm approved these changes (May 6, 2024)
Merged
apolyakov pushed a commit that referenced this pull request (Oct 15, 2025):
Previously, we added a prediction kernel for xgboost and catboost, KML (see #983). It wasn't supported in runtime-light until the current pull request. It includes:
* move the main inference logic into the runtime-common dir
* get rid of exceptions
* gather the required globals (mutable buffer, loaded model information) into a context
* add a php_info() function to write non-warning logs in runtime-common
* make KML functions and types depend on an allocator

Co-authored-by: Alexander Polyakov <al.polyakov@vk.team>
About .kml files and kphp_ml in general
KML means "KPHP ML", since it was invented for KPHP and VK.com.
KML unites xgboost and catboost (prediction only, not learning).
KML models are stored in files with .kml extension.
KML is several times faster than native xgboost and about as fast as native catboost.
The final structure integrated into KPHP consists of the following:
To use ML from PHP code, call any function from kphp_ml_interface.h (KPHP only). In plain PHP, there are no polyfills, and none are planned.
About "ml_experiments" private vkcom repo
The code in the kphp_ml namespace is the final, production solution. During development, we tested many implementations (for both xgboost and catboost) in order to find an optimal one; they are located in the ml_experiments repository. All in all, the ml_experiments repo contains:
Note that some files exist both in KPHP and in ml_experiments. They are almost identical, except for include paths and input types (array vs unordered_map). In future development, they should be kept in sync.
Application-specific information in kml
When a trained model is exported to an xgboost .model file or a catboost .cbm file, it does not contain enough information to be evaluated.
Some information exists only at training time and thus must also be saved along with the exported xgboost/catboost models.
For example:
* A prediction might need calibration (*MULT+BIAS or log) AFTER the xgboost calculation.
* An input [1234 => 0.98] (feature_id #1234) must be remapped before being passed to xgboost, because this feature had index #42 during training, while the incoming input uses #1234. Hence, [1234 => 42] exists in the reindex map.
* Some models were trained without zero values, so zeroes in the input must be excluded.
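The reindexing and zero-skipping steps above can be sketched in C++ as follows. This is a minimal illustration, not the kphp_ml code: ReindexSettings and remap_input are hypothetical names, the input is modeled as (feature_id, value) pairs instead of a KPHP array, and missing features are marked with NaN, which is how xgboost encodes missing values by default. With reindex_map = {1234 => 42}, the input [1234 => 0.98] ends up at position 42 of the dense vector.

```cpp
#include <cassert>
#include <limits>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical per-model settings (illustrative names, not the kphp_ml API).
struct ReindexSettings {
  std::unordered_map<int, int> reindex_map;  // input feature_id -> training index
  int num_features = 0;                      // length of the model's dense input
  bool skip_zeroes = false;                  // model was trained without zero values
};

// Converts sparse input (feature_id => value) into the dense vector a tree
// ensemble expects; features absent from the input stay NaN ("missing").
std::vector<float> remap_input(const std::vector<std::pair<int, float>>& input,
                               const ReindexSettings& s) {
  std::vector<float> dense(s.num_features, std::numeric_limits<float>::quiet_NaN());
  for (const auto& [feature_id, value] : input) {
    if (s.skip_zeroes && value == 0.0f) {
      continue;  // zeroes never appeared while training this model
    }
    auto it = s.reindex_map.find(feature_id);
    if (it == s.reindex_map.end()) {
      continue;  // this model does not use the feature
    }
    assert(it->second >= 0 && it->second < s.num_features);
    dense[it->second] = value;
  }
  return dense;
}
```

Because the same collected input is shared by many models, each model keeps its own ReindexSettings and produces its own dense vector from it.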
Ideally, an input should always contain correct indexes and shouldn't contain zeroes in the last case, but in practice at VK.com, inputs are collected universally and later applied to some model. That's why the same input is remapped one way by model1 and another way by model2.
As a result, training scripts must export not only xgboost/catboost models but also a .json file with additional properties, used for converting to .kml and for evaluating. See KmlPropertiesInJsonFile in ml_experiments.
.kml files, on the contrary, already contain all this additional information inside, because exporting to .kml requires it.
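A sketch of how such application-specific properties might travel with a model and be applied after the trees have produced a raw score. ExtraModelProps, Calibration and apply_calibration are hypothetical names; the actual fields are defined by KmlPropertiesInJsonFile in ml_experiments. *MULT+BIAS is modeled as raw * mult + bias, and "log" is read as the natural logarithm of the raw score, which is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <unordered_map>

// Hypothetical calibration modes; "log" is ASSUMED to mean the natural logarithm.
enum class Calibration { none, mult_bias, log };

// Hypothetical bundle of information missing from a raw xgboost .model or
// catboost .cbm file. Field names are illustrative only.
struct ExtraModelProps {
  std::unordered_map<int, int> reindex_map;  // input feature_id -> training index
  bool skip_zeroes = false;                  // model was trained without zero values
  Calibration calibration = Calibration::none;
  double mult = 1.0;                         // used by Calibration::mult_bias
  double bias = 0.0;                         // used by Calibration::mult_bias
};

// Post-processing applied AFTER the tree ensemble has produced its raw score.
double apply_calibration(double raw_score, const ExtraModelProps& props) {
  switch (props.calibration) {
    case Calibration::mult_bias:
      return raw_score * props.mult + props.bias;
    case Calibration::log:
      assert(raw_score > 0.0 && "log calibration needs a positive score");
      return std::log(raw_score);
    case Calibration::none:
      break;
  }
  return raw_score;
}
```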
InputKind
Ideally, backend code should collect input that can be passed to a model directly.
For example, if a model was trained with features #1...#100, an input could look like [ 70 => 1.0, 23 => 7.42, ... ].
But in practice, for historical reasons, the vkcom backend collects input in a different way, and it can't be passed directly; it needs some transformations. The available kinds of input and their transformations are listed in enum InputKind:
* [ 'user_city_99' => 1.0, 'user_topic_weights_17' => 7.42, ...], uses reindex_map
* [ 12934 => 1.0, 8923 => 7.42, ... ], uses reindex_map
* [ 70 => 1, 23 => 7.42, ... ], no key reindexing, passed directly
* [ 1.23, 4.56, ... ] and [ "red", "small" ]: floats and categorical features separately, passed directly
* [ 'emb_7' => 19.98, ..., 'user_os' => 2, ... ]: also in one hashtable, but categorical features are numbers too

KML inference speed compared to xgboost/catboost
Benchmarks show that the final KML predictor is 3–10 times faster than native xgboost.
This is due to several reasons and optimizations:
ifs in code
Remember that KPHP workers are single-threaded; that's why KML is compared with xgboost running on a single thread, without a GPU.
.kml files are much more lightweight than xgboost .model files, since nodes are compressed and all training info is omitted. They can be loaded into memory very quickly, almost like reading POD bytes.
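One way to get this "almost like reading POD bytes" property is to store every tree as a flat array of trivially copyable nodes, so loading is a single memcpy and evaluation is an index-chasing loop. The Node layout below is hypothetical and is not the real .kml format; it also ignores xgboost's learned default direction for missing values (a NaN input always goes right here).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical compact node: 12 bytes, no pointers, no padding.
struct Node {
  int32_t feature;  // feature index to test; -1 marks a leaf
  float value;      // split threshold, or the leaf output when feature == -1
  int32_t left;     // child taken when x[feature] < value; right child is left + 1
};
static_assert(std::is_trivially_copyable_v<Node>, "nodes must be POD-like");

// "Loading" is a plain byte copy: no parsing and no per-node allocation.
std::vector<Node> load_nodes(const unsigned char* bytes, std::size_t size) {
  assert(size % sizeof(Node) == 0);
  std::vector<Node> nodes(size / sizeof(Node));
  if (size > 0) {
    std::memcpy(nodes.data(), bytes, size);
  }
  return nodes;
}

// Walks one tree from the root (node 0) down to a leaf and returns its output.
float eval_tree(const std::vector<Node>& nodes, const float* x) {
  std::size_t i = 0;
  while (nodes[i].feature >= 0) {
    const Node& n = nodes[i];
    i = x[n.feature] < n.value ? n.left : n.left + 1;
  }
  return nodes[i].value;
}
```

An ensemble's raw score is then the sum of eval_tree over all of its trees.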
As for catboost, the KML implementation is almost as fast as native catboost. But .kml files containing catboost models are also smaller than the original .cbm files.
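CatBoost builds oblivious (symmetric) trees by default: every level of a tree uses the same split, so the leaf index is just a bitmask of depth comparisons, with no per-node branching. A minimal sketch of evaluating one such tree follows; ObliviousTree and eval_oblivious are hypothetical names, and the bit order and the ">" comparison are assumptions that a real implementation must match to the exported model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout of one oblivious tree (not catboost's or .kml's format).
struct ObliviousTree {
  std::vector<int> split_features;   // feature index checked at each level
  std::vector<float> split_borders;  // border compared at each level
  std::vector<double> leaf_values;   // 2^depth leaf outputs
};

// Level d contributes bit d of the leaf index: 1 if x[feature] > border.
double eval_oblivious(const ObliviousTree& t, const float* x) {
  const std::size_t depth = t.split_features.size();
  assert(t.split_borders.size() == depth);
  assert(t.leaf_values.size() == (std::size_t{1} << depth));
  std::size_t leaf = 0;
  for (std::size_t d = 0; d < depth; ++d) {
    const bool bit = x[t.split_features[d]] > t.split_borders[d];
    leaf |= static_cast<std::size_t>(bit) << d;
  }
  return t.leaf_values[leaf];
}
```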
KPHP-specific implementation restrictions
After PHP code is compiled to a server binary, it's launched as a pre-fork server.
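The restrictions described in this section lead to a simple memory layout: model data loaded by the master is read-only and shared by all workers after fork, while each worker owns one scratch buffer of size max(calculate_mutable_buffer_size(i)), allocated once at start-up. A minimal sketch under assumptions: LoadedModel, MutableBuffer and the formula inside calculate_mutable_buffer_size are hypothetical (only the function name and the max over all models come from the text).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical read-only model description, filled in by the master before fork.
struct LoadedModel {
  std::size_t num_features;  // dense input length
  std::size_t num_trees;     // number of per-tree outputs that may be kept
};

// Hypothetical scratch requirement of one model: a float per feature plus a
// double per tree. The real calculate_mutable_buffer_size() may differ.
std::size_t calculate_mutable_buffer_size(const LoadedModel& m) {
  return m.num_features * sizeof(float) + m.num_trees * sizeof(double);
}

// The largest requirement over all models: one buffer of this size fits any model.
std::size_t max_mutable_buffer_size(const std::vector<LoadedModel>& models) {
  std::size_t result = 0;
  for (const LoadedModel& m : models) {
    result = std::max(result, calculate_mutable_buffer_size(m));
  }
  return result;
}

// Allocated once per worker at start-up; predictions then reuse it without
// touching the heap.
class MutableBuffer {
 public:
  explicit MutableBuffer(std::size_t size) : bytes_(size) {}

  char* get(std::size_t needed) {
    assert(needed <= bytes_.size());
    return bytes_.data();
  }

 private:
  std::vector<char> bytes_;
};
```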
The master process loads all .kml files from the folder provided as a command-line option. Note that the storage of models (and the data of every model itself) is read-only; that's why it's not copied to every process, and we are allowed to use std containers there.
After fork, when a PHP script is executed by a worker, it runs predictions, providing an input (a PHP array).

KPHP internals should be very careful about using std containers inside workers, since they allocate on the heap, which is generally bad because of signal handling. That's why KML evaluation doesn't use the heap at all: when it needs memory for calculations, it uses a pre-allocated mutable_buffer. That mutable buffer is allocated once at every worker process start-up, and its size is max(calculate_mutable_buffer_size(i)). Hence, it can fit any model's calculation.
A disappointing fact is that KPHP array is quite slow compared to std::unordered_map; that's why a native C++ implementation is faster than a KPHP one when an algorithm needs to iterate over input hashtables.

Looking backward: a brief history of ML in VK.com
Historically, the ML infrastructure in production was quite weird: ML models were tons of .php files with autogenerated PHP code of decision trees, like
Hundreds of .php files, with hundreds of functions within each, with lots of if else if else lines accessing input hashtables, sometimes transformed into vectors.
That autogenerated code was placed in a separate repository, compiled with KPHP -M lib, and linked into the vkcom binary upon final compilation. There were so many models that they took about 600 MB of the 1.5 GB production binary. Inference, nevertheless, was quite fast, especially when hashtables were transformed into vectors in advance.
Time passed, and we decided to rewrite the ML infrastructure from scratch. The goal was to
Obviously, there were two possible directions:
As one may guess, we finally took the second way.
Looking forward: possible future enhancements
For now, the provided solution is more than enough and solves all the problems we face today.
In the future, the following points might be worth investigating:
std::unordered_map for reindex maps.