Bump xgboost from 1.6.2 to 1.7.2 #562

dependabot · 2022-12-09T02:00:39Z

Bumps xgboost from 1.6.2 to 1.7.2.

Release notes

1.7.2 Patch Release

v1.7.2 (2022 Dec 8)

This is a patch release for bug fixes.

Work with newer thrust and libcudacxx (#8432)

Support null value in CUDA array interface namespace. (#8486)

Use getsockname instead of SO_DOMAIN on AIX. (#8437)

[pyspark] Make QDM optional based on a cuDF check (#8471)

[pyspark] sort qid for SparkRanker. (#8497)

[dask] Properly await async method client.wait_for_workers. (#8558)

[R] Fix CRAN test notes. (#8428)

[doc] Fix outdated document [skip ci]. (#8527)

[CI] Fix github action mismatched glibcxx. (#8551)
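The Dask item in the list above concerns an async method that was not properly awaited. As a generic illustration only (this is not the XGBoost or Dask code), the sketch below uses plain `asyncio` and a hypothetical `wait_for_workers` coroutine to show why a missing `await` matters: calling the coroutine function creates a coroutine object, but its body never runs.

```python
import asyncio

ready = []  # records whether the hypothetical wait actually ran


async def wait_for_workers(n_workers):
    """Hypothetical stand-in for an async client method; not the Dask API."""
    await asyncio.sleep(0)  # pretend to wait on the cluster
    ready.append(n_workers)
    return n_workers


async def buggy():
    coro = wait_for_workers(2)  # missing await: the body is never executed
    ran = len(ready) > 0
    coro.close()  # close explicitly to avoid a "never awaited" warning
    return ran


async def fixed():
    await wait_for_workers(2)  # the coroutine body runs to completion
    return len(ready) > 0


buggy_ran = asyncio.run(buggy())
fixed_ran = asyncio.run(fixed())
print(buggy_ran, fixed_ran)
```

In the buggy variant, training could start before any worker is known to be available, which is the kind of race a proper `await` rules out.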

Artifacts

You can verify the downloaded packages by running this on your Unix shell:
echo "<hash> <artifact>" | shasum -a 256 --check
15be5a96e86c3c539112a2052a5be585ab9831119cd6bc3db7048f7e3d356bac  xgboost_r_gpu_linux_1.7.2.tar.gz
0dd38b08f04ab15298ec21c4c43b17c667d313eada09b5a4ac0d35f8d9ba15d7  xgboost_r_gpu_win64_1.7.2.tar.gz
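Where `shasum` is not available, the same check can be approximated with Python's standard library. This is a minimal sketch, not part of the XGBoost release tooling; the helper names `sha256_of_file` and `verify` are made up for illustration, and the demo runs on a throwaway file rather than a real artifact.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True when the file's digest matches the published hash."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demo on a throwaway file; a real check would pass the downloaded
# artifact path and the hash published with the release.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload")
    demo_path = tmp.name

demo_hash = hashlib.sha256(b"example payload").hexdigest()
ok = verify(demo_path, demo_hash)
bad = verify(demo_path, "0" * 64)
os.remove(demo_path)
print(ok, bad)
```

For a real download, call `verify("xgboost_r_gpu_linux_1.7.2.tar.gz", "<hash>")` with the hash listed above for that file.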
1.7.1 Patch Release

v1.7.1 (2022 November 3)

This is a patch release to incorporate the following hotfix:

Add back xgboost.rabit for backwards compatibility (#8411)

Release 1.7.0 stable

Note. The source distribution of Python XGBoost 1.7.0 was defective (#8415). Since PyPI does not allow us to replace existing artifacts, we released 1.7.0.post0 version to upload the new source distribution. Everything in 1.7.0.post0 is identical to 1.7.0 otherwise.

v1.7.0 (2022 Oct 20)

We are excited to announce the feature-packed XGBoost 1.7 release. The release note will walk through some of the major new features first, then summarize other improvements and language-binding-specific changes.

PySpark

XGBoost 1.7 features initial support for PySpark integration. The new interface is adapted from the existing PySpark XGBoost interface developed by Databricks, with additional features like QuantileDMatrix and the rapidsai plugin (GPU pipeline) support. The new Spark XGBoost Python estimators not only benefit from PySpark ML facilities for powerful distributed computing but also enjoy the rest of the Python ecosystem. Users can define a custom objective, callbacks, and metrics in Python and use them with this interface on distributed clusters. The support is labeled as experimental, with more features to come in future releases. For a brief introduction, please visit the tutorial on XGBoost's documentation page. (#8355, #8344, #8335, #8284, #8271, #8283, #8250, #8231, #8219, #8245, #8217, #8200, #8173, #8172, #8145, #8117, #8131, #8088, #8082, #8085, #8066, #8068, #8067, #8020, #8385)

Due to its initial support status, the new interface has some limitations; categorical features and multi-output models are not yet supported.

Development of categorical data support

This release makes further progress on the experimental support for categorical features. In 1.7, XGBoost can handle missing values in categorical features and adds a new parameter, max_cat_threshold, which limits the number of categories that can be used in the split evaluation. The parameter takes effect when the partitioning algorithm is used and helps prevent over-fitting. Also, the sklearn interface now accepts the feature_types parameter, so input types other than dataframes can be used for categorical features. (#8280, #7821, #8285, #8080, #7948, #7858, #7853, #8212, #7957, #7937, #7934)

... (truncated)

Changelog

Sourced from xgboost's changelog.

XGBoost Change Log

This file records the changes in xgboost library in reverse chronological order.

v1.7.0 (2022 Oct 20)

Experimental support for federated learning and new communication collective

An exciting addition to XGBoost is the experimental federated learning support. Federated learning is implemented with a gRPC federated server that aggregates allreduce calls, and federated clients that train on local data and use existing tree methods (approx, hist, gpu_hist). Currently, only horizontal federated learning is supported (samples are split across participants, and each participant has all the features and labels). Future plans include vertical federated learning (features split across participants) and stronger privacy guarantees with homomorphic encryption and differential privacy. See Demo with NVFlare integration for example usage with nvflare.
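To make the horizontal setting concrete, here is a toy sketch in plain Python. It is not the gRPC server or the XGBoost collective API. It assumes, for illustration only, that the quantity being aggregated is a per-bin gradient histogram, and it models the server-side allreduce as an element-wise sum. All names (`local_histogram`, `allreduce_sum`, `site_a`, `site_b`) are hypothetical.

```python
def local_histogram(bin_ids, grads, n_bins):
    """Sum gradients per bin for one participant's rows."""
    hist = [0.0] * n_bins
    for b, g in zip(bin_ids, grads):
        hist[b] += g
    return hist


def allreduce_sum(histograms):
    """Stand-in for the server-side aggregation: element-wise sum."""
    return [sum(vals) for vals in zip(*histograms)]


# Two participants hold the same feature but disjoint samples
# (a horizontal split); the raw rows never leave each site.
site_a = {"bins": [0, 1, 1, 2], "grads": [1.0, -2.0, 0.5, 3.0]}
site_b = {"bins": [2, 0, 1], "grads": [-1.0, 4.0, 1.5]}
n_bins = 3

global_hist = allreduce_sum([
    local_histogram(site_a["bins"], site_a["grads"], n_bins),
    local_histogram(site_b["bins"], site_b["grads"], n_bins),
])

# Reference: the histogram one would get if the data were pooled.
pooled_hist = local_histogram(
    site_a["bins"] + site_b["bins"],
    site_a["grads"] + site_b["grads"],
    n_bins,
)
print(global_hist, pooled_hist)
```

In this toy case the aggregated histogram matches the pooled one, so sum-based aggregation alone loses nothing relative to pooling the samples.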

As part of this work, XGBoost 1.7 has replaced the old rabit module with the new collective module as the network communication interface, with added support for runtime backend selection. In previous versions, the backend was defined at compile time and could not be changed once built. In this release, users can choose between rabit and federated. (#8029, #8351, #8350, #8342, #8340, #8325, #8279, #8181, #8027, #7958, #7831, #7879, #8257, #8316, #8242, #8057, #8203, #8038, #7965, #7930, #7911)

The feature is available in the public PyPI binary package for testing.

Quantile DMatrix

Before 1.7, XGBoost had an internal data structure called DeviceQuantileDMatrix (and its distributed version). We have now extended its support to CPU and renamed it to QuantileDMatrix. This data structure is used for optimizing memory usage for the hist and gpu_hist tree methods. The new feature helps reduce CPU memory usage significantly, especially for dense data. The new QuantileDMatrix can be initialized from both CPU and GPU data, and regardless of where the data comes from, the constructed instance can be used by both the CPU and GPU algorithms for training and prediction (with some conversion overhead if the device of the data and the training algorithm don't match). Also, a new parameter ref is added to QuantileDMatrix, which can be used to construct validation/test datasets. Lastly, it is used by default in the scikit-learn interface when a supported tree method is specified by users. (#7889, #7923, #8136, #8215, #8284, #8268, #8220, #8346, #8327, #8130, #8116, #8103, #8094, #8086, #7898, #8060, #8019, #8045, #7901, #7912, #7922)
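The idea behind a quantized matrix can be sketched in a few lines of plain Python. This is a simplified illustration, not XGBoost's actual sketching algorithm or the QuantileDMatrix API: `quantile_cuts` and `to_bins` are hypothetical helpers, the cut points are taken at evenly spaced ranks, and reusing the training cuts for the validation data is only an assumed analogue of what the ref parameter is for.

```python
from bisect import bisect_right


def quantile_cuts(values, max_bin):
    """Pick up to max_bin - 1 cut points at evenly spaced ranks."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, max_bin):
        cut = ordered[min(n - 1, (i * n) // max_bin)]
        if not cuts or cut > cuts[-1]:  # keep cut points strictly increasing
            cuts.append(cut)
    return cuts


def to_bins(values, cuts):
    """Replace each value with the index of the bin it falls into."""
    return [bisect_right(cuts, v) for v in values]


train = [0.1, 0.4, 0.35, 0.8, 0.9, 0.05, 0.6, 0.7]
valid = [0.2, 0.95, 0.5]

cuts = quantile_cuts(train, max_bin=4)
train_bins = to_bins(train, cuts)
# Validation data is binned with the *training* cuts so that both
# datasets share one set of bin boundaries.
valid_bins = to_bins(valid, cuts)
print(cuts, train_bins, valid_bins)
```

Storing small bin indices instead of raw floating-point values is where the memory saving in this sketch comes from.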

Mean absolute error

The mean absolute error is a new member of the collection of objectives in XGBoost. It is noteworthy because MAE has a zero Hessian, which is unusual for XGBoost, as XGBoost relies on Newton optimization. Without valid Hessian values, the convergence speed can be slow. As part of the support for MAE, we added line searches to the XGBoost training algorithm to overcome the difficulty of training without valid Hessian values. In the future, we will extend the line search to other objectives where it is appropriate for faster convergence. (#8343, #8107, #7812, #8380)
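The difficulty can be seen in a toy example. The sketch below is an assumption-laden illustration and not XGBoost's implementation: it computes the MAE gradient and Hessian for a few points, notes that a Newton step would divide by a zero Hessian sum, and then uses the median of the residuals as a substitute leaf value, since the median minimizes the sum of absolute errors for a constant shift. Whether this matches the line search used in XGBoost is not confirmed by the release note.

```python
from statistics import median


def mae_grad_hess(pred, y):
    """First and second derivatives of |y - pred| with respect to pred."""
    diff = pred - y
    grad = (diff > 0) - (diff < 0)  # sign(pred - y)
    return float(grad), 0.0  # second derivative is zero almost everywhere


def total_abs_error(preds, ys):
    """Sum of absolute errors over all samples."""
    return sum(abs(p - y) for p, y in zip(preds, ys))


ys = [1.0, 2.0, 10.0]
preds = [0.0, 0.0, 0.0]

grads, hessians = zip(*(mae_grad_hess(p, y) for p, y in zip(preds, ys)))
# A Newton step would be -sum(grads) / sum(hessians): division by zero here.

# Assumed substitute: set the leaf value to the median residual.
residuals = [y - p for p, y in zip(preds, ys)]
leaf_value = median(residuals)
updated = [p + leaf_value for p in preds]

before = total_abs_error(preds, ys)
after = total_abs_error(updated, ys)
print(leaf_value, before, after)
```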

XGBoost on Browser

With the help of the pyodide project, you can now run XGBoost on browsers. (#7954, #8369)

Experimental IPv6 Support for Dask

With the growing adoption of the new internet protocol, XGBoost joined the club. In the latest release, the Dask interface can be used on IPv6 clusters; see XGBoost's Dask tutorial for details. (#8225, #8234)

Optimizations

We have new optimizations for both the hist and gpu_hist tree methods to make XGBoost's training even more efficient.

Hist: Hist now supports optional by-column histogram build, which is automatically configured based on various conditions of input data. This helps the XGBoost CPU hist algorithm to scale better with different shapes of training datasets. (#8233, #8259) Also, the build histogram kernel can now better utilize CPU registers. (#8218)

GPU Hist: GPU hist performance is significantly improved for wide datasets. GPU hist now supports batched node build, which reduces kernel latency and increases throughput. The improvement is particularly significant when growing deep trees with the default depthwise policy. (#7919, #8073, #8051, #8118, #7867, #7964, #8026)
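As a rough picture of what a by-column histogram build means for the CPU hist method, the sketch below fills the same gradient histograms with two loop orders. It illustrates traversal order only: the function names are hypothetical, the data is already binned, and nothing here reflects XGBoost's actual kernels or the conditions under which it picks one order over the other.

```python
def build_by_row(bins, grads, n_bins):
    """Outer loop over rows: each row updates every feature's histogram."""
    n_features = len(bins[0])
    hist = [[0.0] * n_bins for _ in range(n_features)]
    for row, g in zip(bins, grads):
        for f, b in enumerate(row):
            hist[f][b] += g
    return hist


def build_by_column(bins, grads, n_bins):
    """Outer loop over features: one histogram is filled at a time."""
    n_features = len(bins[0])
    hist = []
    for f in range(n_features):
        col_hist = [0.0] * n_bins
        for row, g in zip(bins, grads):
            col_hist[row[f]] += g
        hist.append(col_hist)
    return hist


# Four rows, two features, each value already mapped to a bin index.
bins = [[0, 1], [1, 1], [0, 0], [1, 0]]
grads = [0.5, -1.0, 2.0, 0.25]

row_hist = build_by_row(bins, grads, n_bins=2)
col_hist = build_by_column(bins, grads, n_bins=2)
print(row_hist, col_hist)
```

Both orders produce identical histograms; they differ only in memory access pattern, which is why the better choice can depend on the shape of the dataset.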

Breaking Changes

... (truncated)

Commits

62ed8b5 Bump release version to 1.7.2. (#8569)
a980e10 Properly await async method client.wait_for_workers (#8558) (#8567)
59c54e3 [pyspark] Make QDM optional based on cuDF check (#8471) (#8556)
60a8c8e [pyspark] sort qid for SparkRanker (#8497) (#8555)
58bc225 [backport] [CI] Fix github action mismatched glibcxx. (#8551) (#8552)
850b531 [backport] [doc] Fix outdated document [skip ci] (#8527) (#8553)
67b657d SO_DOMAIN do not support on IBM i, using getsockname instead (#8437) (#8500)
db14e3f Support null value in CUDA array interface. (#8486) (#8499)
9372370 Work with newer thrust and libcudacxx (#8432)
1136a7e Fix CRAN note on cleanup. (#8447)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [xgboost](https://github.com/dmlc/xgboost) from 1.6.2 to 1.7.2.
- [Release notes](https://github.com/dmlc/xgboost/releases)
- [Changelog](https://github.com/dmlc/xgboost/blob/master/NEWS.md)
- [Commits](dmlc/xgboost@v1.6.2...v1.7.2)

---
updated-dependencies:
- dependency-name: xgboost
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot · 2023-01-09T02:01:32Z

Superseded by #567.

dependabot bot added the dependencies Pull requests that update a dependency file label Dec 9, 2022

dependabot bot mentioned this pull request Dec 9, 2022

Bump xgboost from 1.6.2 to 1.7.1 #558

Closed

dependabot bot closed this Jan 9, 2023

dependabot bot deleted the dependabot/pip/xgboost-1.7.2 branch January 9, 2023 02:01
