- Fixed loading of the `epsilon` dataset into memory
- Fixed multiclass learning on GPU for >255 classes
- Improved error handling
- Some other minor fixes
- Fixed Python compatibility issue in dataset downloading
- Support saving models in ONNX format (only for models without categorical features).
- Added a new dataset to `catboost.datasets()`: `epsilon`, a large dense dataset for binary classification.
- Speedup of Python
- Fixed creation of
- Class weights are now taken into account by
In previous versions the weights were ignored.
- `random-strength` for pairwise training (`YetiRankPairwise`) is not supported anymore.
- Simultaneous use of `MultiClass` and `MultiClassOneVsAll` metrics is now deprecated.
- The `cv` method is now supported on GPU.
- String labels for classes are supported in Python.
In multiclassification the string class names are inferred from the data.
In binary classification, to use string labels you should employ the `class_names` parameter and specify which class is negative (0) and which is positive (1).
You can also use `class_names` in multiclassification mode to pass all possible class names to the fit function.
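The label-encoding rule described above can be sketched in plain Python; the function below is a hypothetical illustration of the semantics, not part of the catboost API:

```python
# Sketch of binary string-label encoding when a class_names-style list is
# supplied: class_names[0] is the negative class (0), class_names[1] the
# positive class (1). Illustrative only, not CatBoost internals.
def encode_binary_labels(labels, class_names):
    mapping = {name: index for index, name in enumerate(class_names)}
    return [mapping[label] for label in labels]

print(encode_binary_labels(["cat", "dog", "cat"], class_names=["cat", "dog"]))
# → [0, 1, 0]
```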
- Borders can now be saved and reused.
To save the feature quantization information obtained during training data preprocessing into a text file, use the corresponding cli option.
To reuse the saved borders for training, use the matching cli option.
This functionality is now supported on both CPU and GPU (it was GPU-only in previous versions).
The file format for the borders is described here.
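Since the linked format description is not reproduced here, the sketch below assumes a simple tab-separated layout of one feature index and one border value per line; treat it as an illustration of the save/reuse idea, not the authoritative format:

```python
import tempfile

# Assumed borders-file layout: one "<feature_index>\t<border_value>" pair
# per line. Illustrative sketch only, not the authoritative CatBoost format.
def save_borders(path, borders_by_feature):
    with open(path, "w") as f:
        for feature_index, borders in sorted(borders_by_feature.items()):
            for border in borders:
                f.write(f"{feature_index}\t{border}\n")

def load_borders(path):
    borders_by_feature = {}
    with open(path) as f:
        for line in f:
            index, border = line.split("\t")
            borders_by_feature.setdefault(int(index), []).append(float(border))
    return borders_by_feature

# Round-trip demo
with tempfile.NamedTemporaryFile(mode="w", suffix=".tsv", delete=False) as tmp:
    path = tmp.name
save_borders(path, {0: [0.5, 1.5], 2: [3.0]})
restored = load_borders(path)
```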
- CLI option `--eval-file` is now supported on GPU.
- Fixed some cases in binary classification where training could diverge
- A great speedup of the Python applier (10x)
- Reduced memory consumption of the Python `cv` function (by a factor of the fold count)
Benchmarks and tutorials:
- Added speed benchmarks for CPU and GPU on a variety of different datasets.
- Added benchmarks of different ranking modes. In this tutorial we compare different ranking modes in CatBoost, XGBoost and LightGBM.
- Added tutorial for applying model in Java.
- Added benchmarks of SHAP values calculation for CatBoost, XGBoost and LightGBM.
The benchmarks also contain an explanation of the complexity of this calculation in each of the libraries.
We also made a list of stability improvements and stricter checks of input data and parameters.
- Pure GPU implementation of NDCG metric
- Enabled LQ loss function
- Fixed NDCG metric on CPU
- Added `model_sum` mode to the command line interface
- Added SHAP values benchmark (#566)
- Enabled passing a test pool to caret training (#544)
- Fixed a bug in exporting the model as python code (#556)
- Fixed label mapper for multiclassification custom labels (#523)
- Fixed hash type of categorical features (#558)
- Fixed handling of cross-validation fold count options in python package (#568)
- Accelerated formula evaluation by ~15%
- Improved model application interface
- Improved compilation time for building GPU version
- Better handling of stray commas in list arguments
- Added a benchmark that employs the Rossmann Store Sales dataset to compare the quality of GBDT packages
- Added references to Catboost papers in R-package CITATION file (issue #488)
- Fixed a build issue in compilation for GPU
- Fixed a bug in model applicator
- Fixed model conversion (issue #533)
- Returned pre 0.11 behaviour for
- Made `.dist-info/RECORD` valid in the python wheel (issue #534)
- Changed default border count for float feature binarization to 254 on CPU to achieve better quality
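To illustrate what the border count means, here is a minimal sketch of binning a float feature value against a sorted border list (254 borders yield 255 bins, which fit in a single byte); this is a simplification, not the actual quantization code:

```python
import bisect

# Bin index = number of borders the value exceeds (or equals, in this
# simplified convention). With 254 borders, indices fall in 0..254.
def bin_index(value, borders):
    return bisect.bisect_right(borders, value)

borders = [0.0, 1.0, 2.0]
print([bin_index(v, borders) for v in [-0.5, 0.5, 1.0, 5.0]])
# → [0, 1, 2, 3]
```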
- Fixed random seed to
- Supported models with more than 254 feature borders or one hot values when doing predictions
- Added model summation support in python: use `catboost.sum_models()` to sum models with provided weights.
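Conceptually, the summed model's raw prediction is the weighted sum of the component models' raw predictions. A minimal plain-Python sketch of that arithmetic (the helper below is hypothetical; only `catboost.sum_models()` is the real API):

```python
# Weighted sum of per-model raw predictions: for each sample i, the combined
# raw score is sum over models of weight * that model's raw score for i.
def sum_predictions(raw_predictions_per_model, weights):
    n_samples = len(raw_predictions_per_model[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, raw_predictions_per_model))
        for i in range(n_samples)
    ]

print(sum_predictions([[1.0, 2.0], [3.0, 4.0]], weights=[0.5, 0.5]))
# → [2.0, 3.0]
```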
- Added json model tutorial json_model_tutorial.ipynb
- In python 3 some functions returned dictionaries with keys of type `bytes`, particularly eval_metrics and get_best_score. These are fixed to have keys of type `str`.
- New metric `NumErrors:greater_than=value`
- New metric and objective `L_q:q=value`
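A hedged sketch of what these two metric names suggest (the exact CatBoost definitions, e.g. sample weighting or averaging, may differ):

```python
# NumErrors:greater_than=value, read as: count predictions whose absolute
# error exceeds the threshold. L_q:q=value, read as: sum of |error|^q.
# Illustrative definitions, not taken from CatBoost source.
def num_errors(targets, approxes, greater_than):
    return sum(abs(t - a) > greater_than for t, a in zip(targets, approxes))

def lq_loss(targets, approxes, q):
    return sum(abs(t - a) ** q for t, a in zip(targets, approxes))

print(num_errors([0.0, 1.0, 2.0], [0.2, 1.0, 3.5], greater_than=1.0))  # → 1
print(lq_loss([0.0, 1.0], [1.0, 3.0], q=2.0))  # → 5.0
```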
- `model.score(X, y)` can now work with a Pool and labels from the Pool
- Added EvalResult output after GPU CatBoost training
- Supported the prediction type option on GPU
- Added the evals_result_ property to the model in the python wrapper so users can access metric values
- Supported string labels for GPU training in cmdline mode
- Many improvements in the JNI wrapper
- Updated the NDCG metric: sped it up and added NDCG with exponentiation in the numerator as a new NDCG mode
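The two NDCG numerator modes mentioned above can be sketched as follows, assuming linear gain uses the relevance itself and exponential gain uses 2^rel - 1 (a common convention; the exact CatBoost formula is not shown here):

```python
import math

# DCG with a selectable gain in the numerator: linear (r) or exponential
# (2^r - 1); the denominator is the usual log2(position + 1) discount.
def dcg(relevances, exponential=False):
    gain = (lambda r: 2.0 ** r - 1.0) if exponential else (lambda r: r)
    return sum(gain(r) / math.log2(i + 2) for i, r in enumerate(relevances))

def ndcg(relevances, exponential=False):
    ideal = dcg(sorted(relevances, reverse=True), exponential)
    return dcg(relevances, exponential) / ideal if ideal > 0 else 1.0

print(round(ndcg([3, 2, 0, 1], exponential=True), 4))
```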
- CatBoost no longer drops unused features from the model after training
- The training finish time and CatBoost build info are now written to model metadata
- Fixed automatic pair generation for the GPU PairLogitPairwise target