[dask] Add DaskXGBRanker #6576

Merged
merged 6 commits into from Jan 8, 2021

@trivialfis trivialfis commented Jan 6, 2021

Initial support for distributed learning to rank (LTR) using dask.

  • Support `qid` in libxgboost.
  • Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]` to avoid duplicated code.
  • Define `DaskXGBRanker`.

The dask ranker doesn't support the group structure; instead, it takes a query ID (`qid`) for each row and converts it to a group ptr internally.
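
To illustrate what converting query IDs to a group pointer means, here is a minimal pure-Python sketch. It is not the code from this PR (the real conversion lives inside libxgboost), and the helper name `qid_to_group_ptr` is hypothetical. It assumes the query IDs are already sorted, so that rows of the same query are contiguous.

```python
from typing import List, Sequence


def qid_to_group_ptr(qid: Sequence[int]) -> List[int]:
    """Convert sorted per-row query IDs into CSR-style group boundaries.

    Hypothetical helper for illustration only. Rows of group k occupy
    the half-open range [group_ptr[k], group_ptr[k + 1]).
    """
    group_ptr = [0]
    for i in range(1, len(qid)):
        if qid[i] < qid[i - 1]:
            raise ValueError("qid must be sorted in non-decreasing order")
        if qid[i] != qid[i - 1]:
            # A new query starts at row i.
            group_ptr.append(i)
    if len(qid) > 0:
        # Close the last group.
        group_ptr.append(len(qid))
    return group_ptr


qid = [0, 0, 0, 1, 1, 4, 4, 4, 4]
group_ptr = qid_to_group_ptr(qid)
print(group_ptr)  # [0, 3, 5, 9]

# The classic "group" input is just the difference of adjacent pointers.
group_sizes = [end - start for start, end in zip(group_ptr, group_ptr[1:])]
print(group_sizes)  # [3, 2, 4]
```

The per-row form is convenient for a distributed setting because a row keeps its query ID when the data is partitioned, whereas a list of group sizes only makes sense relative to the whole, ordered dataset.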

@trivialfis

Question to myself: Right now the qid is required to be sorted for input. Maybe we need to perform the sorting ourselves?
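
For a single in-memory partition, the sorting in question could look like the sketch below. It is only an illustration under that assumption: the helper name `sort_by_qid` is hypothetical, it is not part of xgboost, and it says nothing about how a sort across dask partitions would be done.

```python
from typing import List, Sequence, Tuple


def sort_by_qid(
    qid: Sequence[int], X: Sequence[Sequence[float]], y: Sequence[float]
) -> Tuple[List[int], List[Sequence[float]], List[float]]:
    """Reorder rows so that equal query IDs become contiguous.

    Hypothetical helper for illustration only. Python's sorted() is
    stable, so rows within a query keep their original relative order.
    """
    order = sorted(range(len(qid)), key=lambda i: qid[i])
    return (
        [qid[i] for i in order],
        [X[i] for i in order],
        [y[i] for i in order],
    )


qid = [2, 1, 2, 1]
X = [[0.1], [0.2], [0.3], [0.4]]
y = [1, 0, 0, 1]

sorted_qid, sorted_X, sorted_y = sort_by_qid(qid, X, y)
print(sorted_qid)  # [1, 1, 2, 2]
print(sorted_X)    # [[0.2], [0.4], [0.1], [0.3]]
print(sorted_y)    # [0, 1, 1, 0]
```

Features and labels have to be permuted with the same index order as the query IDs; sorting `qid` alone would silently misalign the rows.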

@trivialfis trivialfis changed the title [WIP] [dask] Add ranker [dask] Add DaskXGBRanker Jan 7, 2021
@trivialfis trivialfis requested a review from hcho3 January 7, 2021 13:06
@trivialfis trivialfis marked this pull request as ready for review January 7, 2021 13:23

codecov-io commented Jan 7, 2021

Codecov Report

Merging #6576 (fb50247) into master (96d3d32) will increase coverage by 0.25%.
The diff coverage is 85.93%.

@@            Coverage Diff             @@
##           master    #6576      +/-   ##
==========================================
+ Coverage   80.23%   80.49%   +0.25%     
==========================================
  Files          13       13              
  Lines        3613     3665      +52     
==========================================
+ Hits         2899     2950      +51     
- Misses        714      715       +1     
| Impacted Files | Coverage Δ | |
|---|---|---|
| python-package/xgboost/training.py | 95.32% <ø> (ø) | |
| python-package/xgboost/sklearn.py | 89.77% <84.44%> (+0.42%) | ⬆️ |
| python-package/xgboost/dask.py | 81.52% <85.13%> (+0.13%) | ⬆️ |
| python-package/xgboost/core.py | 81.44% <100.00%> (+0.15%) | ⬆️ |
| python-package/xgboost/tracker.py | 95.11% <0.00%> (+1.12%) | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 96d3d32...6c52c79.

trivialfis commented Jan 8, 2021

I need to disable support for group weight for now. Using qid is meant to avoid having too much data manipulation code in Python, but per-group weight is kind of unavoidable.

@trivialfis trivialfis force-pushed the dask-ranker branch 2 times, most recently from 54d0795 to 0474666 Compare January 8, 2021 02:26
@trivialfis trivialfis merged commit 80065d5 into dmlc:master Jan 8, 2021
@trivialfis trivialfis deleted the dask-ranker branch January 8, 2021 10:35
@trivialfis trivialfis mentioned this pull request Jan 12, 2021