Calculate pandas categorical hash once. #12
Merged
Build the pandas_categorical index hash when loading the model, instead of on every predict() call. Doing it on predict() significantly slows down prediction, especially for larger categorical feature lists, and wastes CPU cycles, since the hash can be calculated just once.
After change:
Number of categorical features: 3
Biggest category length: 11945
Predict for 10.000 times: 0.3876569999847561 seconds
Before change:
Number of categorical features: 3
Biggest category length: 11945
Predict for 10.000 times: 15.50964699999895 seconds
Benchmark code:
Added a large categorical feature in the test model with almost 12k entries.
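For readers unfamiliar with the pattern, the following is a minimal Python sketch of the idea behind this change, not the code from this PR and not the benchmark used above. The class `CategoricalEncoder`, the function `encode_rebuilding_each_call`, and the use of -1 for unknown categories are all hypothetical names and assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Sequence


def encode_rebuilding_each_call(
    pandas_categorical: Sequence[Sequence[Hashable]],
    feature_idx: int,
    value: Hashable,
) -> int:
    """Slow variant: rebuilds the value -> index map on every call."""
    index = {v: i for i, v in enumerate(pandas_categorical[feature_idx])}
    # Assumption for this sketch: unknown categories map to -1.
    return index.get(value, -1)


class CategoricalEncoder:
    """Fast variant: builds the value -> index maps once, at load time."""

    def __init__(self, pandas_categorical: Sequence[Sequence[Hashable]]) -> None:
        # One dict per categorical feature, computed a single time.
        self._index: List[Dict[Hashable, int]] = [
            {v: i for i, v in enumerate(categories)}
            for categories in pandas_categorical
        ]

    def encode(self, feature_idx: int, value: Hashable) -> int:
        # Each lookup is now an average O(1) dict access.
        return self._index[feature_idx].get(value, -1)


# Hypothetical category lists, standing in for a model's pandas_categorical.
categories = [["red", "green", "blue"], [f"id_{n}" for n in range(12000)]]
encoder = CategoricalEncoder(categories)
```

Both variants return the same codes; the difference is that the slow one does work proportional to the category list length on every call, which is consistent with the gap growing for larger categorical features.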