@dbakshee dbakshee released this Dec 30, 2018 · 177 commits to master since this release

Changes:

  • Support saving models in ONNX format (only for models without categorical features).
  • Added a new dataset to catboost.datasets(): epsilon, a large dense dataset for binary classification.
  • Sped up Python cv on GPU.
  • Fixed creation of Pool from pandas.DataFrame with pandas.Categorical columns.

@dbakshee dbakshee released this Dec 26, 2018 · 232 commits to master since this release

Release 0.12.0

Breaking changes:

  • Class weights are now taken into account by eval_metrics(), get_feature_importance(), and get_object_importance().
    In previous versions the weights were ignored.
  • The random-strength parameter is no longer supported for pairwise training (PairLogitPairwise, QueryCrossEntropy, YetiRankPairwise).
  • Simultaneous use of MultiClass and MultiClassOneVsAll metrics is now deprecated.
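
As a rough illustration of how class weights enter metric computation, here is a hypothetical weighted error rate in plain Python (not CatBoost's internal code; the function name and metric choice are for illustration only):

```python
def weighted_error_rate(y_true, y_pred, class_weights):
    """Error rate where each object contributes the weight of its true class."""
    num = sum(class_weights[t] for t, p in zip(y_true, y_pred) if t != p)
    den = sum(class_weights[t] for t in y_true)
    return num / den

# Two errors, both on class 1, which carries weight 3.0:
print(weighted_error_rate([0, 1, 1, 0], [0, 0, 0, 0], {0: 1.0, 1: 3.0}))  # 0.75
```

With unit weights the same call reduces to the plain error rate (0.5 here), which is what older versions effectively computed.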

New functionality:

  • cv method is now supported on GPU.
  • String labels for classes are supported in Python.
    In multiclassification the string class names are inferred from the data.
    In binary classification, to use string labels, pass the class_names parameter and specify which class is negative (0) and which is positive (1).
    You can also use class_names in multiclassification mode to pass all possible class names to the fit function.
  • Borders can now be saved and reused.
    To save the feature quantization information obtained during preprocessing of the training data into a text file, use the CLI option --output-borders-file.
    To train with previously saved borders, use the CLI option --input-borders-file.
    This functionality is now supported on CPU and GPU (it was GPU-only in previous versions).
    File format for the borders is described here.
  • CLI option --eval-file is now supported on GPU.
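
The label mapping described above for string classes can be sketched in plain Python (a simplified illustration; encode_labels is a hypothetical helper, and CatBoost's actual inference rules may differ):

```python
def encode_labels(labels, class_names=None):
    """Map string class labels to integer ids.

    With class_names given (as in binary classification), the position in the
    list defines the id: class_names[0] -> 0 (negative), class_names[1] -> 1.
    Without it, ids are inferred from the distinct labels in the data, as in
    multiclassification (sorted order is used here for determinism).
    """
    if class_names is None:
        class_names = sorted(set(labels))
    index = {name: i for i, name in enumerate(class_names)}
    return [index[label] for label in labels]

print(encode_labels(["cat", "dog", "cat"], class_names=["dog", "cat"]))  # [1, 0, 1]
```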

Quality improvements:

  • Fixed some cases in binary classification where training could diverge.

Optimizations:

  • A great speedup of the Python applier (10x)
  • Reduced memory consumption of the Python cv function (by a factor of the fold count)

Benchmarks and tutorials:

  • Added speed benchmarks for CPU and GPU on a variety of different datasets.
  • Added a tutorial benchmarking different ranking modes in CatBoost, XGBoost and LightGBM.
  • Added tutorial for applying model in Java.
  • Added benchmarks of SHAP values calculation for CatBoost, XGBoost and LightGBM.
    The benchmarks also explain the computational complexity of this calculation in each library.

We also made a number of stability improvements and added stricter checks of input data and parameters.

And we are so grateful to our community members @canorbal and @neer201 for their contribution to this release. Thank you.

@kizill kizill released this Dec 10, 2018 · 522 commits to master since this release

Changes:

  • Pure GPU implementation of NDCG metric
  • Enabled LQ loss function
  • Fixed NDCG metric on CPU
  • Added model_sum mode to command line interface
  • Added SHAP values benchmark (#566)
  • Fixed random_strength for Plain boosting (#448)
  • Enabled passing a test pool to caret training (#544)
  • Fixed a bug in exporting the model as python code (#556)
  • Fixed label mapper for multiclassification custom labels (#523)
  • Fixed hash type of categorical features (#558)
  • Fixed handling of cross-validation fold count options in python package (#568)

@kizill kizill released this Nov 13, 2018

Release 0.11.1

Changes:

  • Accelerated formula evaluation by ~15%
  • Improved model application interface
  • Improved compilation time for building GPU version
  • Better handling of stray commas in list arguments
  • Added a benchmark that uses the Rossmann Store Sales dataset to compare the quality of GBDT packages
  • Added references to CatBoost papers in the R package CITATION file (issue #488)
  • Fixed a build issue in compilation for GPU
  • Fixed a bug in model applicator
  • Fixed model conversion (issue #533)
  • Restored pre-0.11 behaviour of best_score_ and evals_result_ (issue #539)
  • Made .dist-info/RECORD valid in the Python wheel (issue #534)

@kizill kizill released this Nov 7, 2018

Changes:

  • Changed default border count for float feature binarization to 254 on CPU to achieve better quality
  • Fixed random seed to 0 by default
  • Support models with more than 254 feature borders or one-hot values when doing predictions
  • Added model summation support in python: use catboost.sum_models() to sum models with provided weights.
  • Added a JSON model tutorial: json_model_tutorial.ipynb
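
The semantics of weighted model summation can be sketched in plain Python: the summed model's raw prediction equals the weighted sum of the individual models' raw predictions (sum_model_scores is a hypothetical helper for illustration, not part of the catboost API):

```python
def sum_model_scores(model_scores, weights=None):
    """Combine raw per-object scores of several models into one score list,
    mimicking weighted model summation: the combined raw prediction is the
    weighted sum of the individual raw predictions."""
    if weights is None:
        weights = [1.0] * len(model_scores)
    n = len(model_scores[0])
    return [sum(w * scores[i] for w, scores in zip(weights, model_scores))
            for i in range(n)]

# Averaging two models over two objects:
print(sum_model_scores([[1.0, 2.0], [3.0, 6.0]], weights=[0.5, 0.5]))  # [2.0, 4.0]
```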

@kizill kizill released this Oct 26, 2018 · 1081 commits to master since this release

Breaking changes:

In Python 3, some functions (notably eval_metrics and get_best_score) returned dictionaries with keys of type bytes. These are fixed to have keys of type str.
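
For code that still runs against older versions, a small helper can normalize such dictionaries by decoding bytes keys to str (a generic workaround sketch, not part of the catboost package):

```python
def str_keys(d):
    """Recursively decode bytes dictionary keys to str (Python 3)."""
    return {
        (k.decode() if isinstance(k, bytes) else k):
            (str_keys(v) if isinstance(v, dict) else v)
        for k, v in d.items()
    }

print(str_keys({b'learn': {b'Logloss': [0.6, 0.5]}}))
# {'learn': {'Logloss': [0.6, 0.5]}}
```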

Changes:

  • New metric NumErrors:greater_than=value
  • New metric and objective L_q:q=value
  • model.score(X, y) can now work with a Pool and labels from the Pool
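
One plausible reading of the two new metrics, sketched in plain Python (hedged: the exact definitions may differ from CatBoost's; see the documentation):

```python
def num_errors(targets, approxes, greater_than):
    """NumErrors:greater_than=value -- count predictions whose absolute
    deviation from the target exceeds the threshold."""
    return sum(1 for t, a in zip(targets, approxes) if abs(t - a) > greater_than)

def lq_loss(targets, approxes, q):
    """L_q:q=value -- mean of |target - approx| ** q (q >= 1)."""
    return sum(abs(t - a) ** q for t, a in zip(targets, approxes)) / len(targets)

print(num_errors([1.0, 2.0, 3.0], [1.0, 2.5, 5.0], greater_than=1.0))  # 1
print(lq_loss([0.0, 0.0], [1.0, 2.0], q=2.0))  # 2.5
```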

@kizill kizill released this Oct 11, 2018 · 1376 commits to master since this release

Changes:

  • Added EvalResult output after GPU CatBoost training
  • Supported prediction type option on GPU
  • Added get_evals_result() method and evals_result_ property to the model in the Python wrapper, so users can access metric values
  • Supported string labels for GPU training in cmdline mode
  • Many improvements in the JNI wrapper
  • Updated NDCG metric: sped it up and added a new mode with exponentiation in the numerator
  • CatBoost no longer drops unused features from the model after training
  • Training finish time and CatBoost build info are now written to model metadata
  • Fixed automatic pairs generation for the GPU PairLogitPairwise target
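
The two NDCG gain modes mentioned above can be sketched in plain Python (a simplified illustration using the standard DCG formula; CatBoost's implementation details may differ):

```python
import math

def dcg(relevances, exponential_gain=False):
    """Discounted cumulative gain over a ranked list of relevance labels.

    The classic mode uses the relevance itself as the gain; the exponential
    mode uses 2**rel - 1 in the numerator."""
    total = 0.0
    for pos, rel in enumerate(relevances, start=1):
        gain = (2.0 ** rel - 1.0) if exponential_gain else rel
        total += gain / math.log2(pos + 1)
    return total

def ndcg(relevances, exponential_gain=False):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), exponential_gain)
    return dcg(relevances, exponential_gain) / ideal if ideal > 0 else 1.0

print(ndcg([3, 2, 1]))  # 1.0 -- already ideally ordered
print(ndcg([1, 2, 3], exponential_gain=True))
```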

@kizill kizill released this Sep 20, 2018 · 1724 commits to master since this release

Main changes:

  • Fixed Python 3 support in catboost.FeaturesData
  • 40% speedup of QuerySoftMax CPU training