Re-implement PR-AUC. #7297

Merged
merged 5 commits into from Oct 26, 2021

@trivialfis (Member) commented Oct 8, 2021

The new implementation supports binary and multi-class classification, as well as learning to rank with binary relevance. It also handles empty and irregular datasets by returning NaN instead of throwing an exception. Lastly, the GPU implementation now has the same functionality as the CPU one instead of being ranking-only.

  • Perf

n_samples = 1e7
runs = 16
n_classes = 8 (multi-class only)

| Device | Metric  | Task   | Master             | AUCPR              |
|--------|---------|--------|--------------------|--------------------|
| CPU    | ROC-AUC | Binary | 8.385259628295898  | 8.928396701812744  |
| CPU    | PR-AUC  | Binary | 9.372312784194946  | 9.454226732254028  |
| GPU    | ROC-AUC | Binary | 0.7207620143890381 | 0.711883544921875  |
| GPU    | PR-AUC  | Binary | NA                 | 0.8523569107055664 |
| CPU    | ROC-AUC | Multi  | 64.72866940498352  | 66.78449654579163  |
| CPU    | PR-AUC  | Multi  | NA                 | 67.58437657356262  |
| GPU    | ROC-AUC | Multi  | 8.162241697311401  | 8.218487024307251  |
| GPU    | PR-AUC  | Multi  | NA                 | 8.903043985366821  |

  • Related

Close #6561.
Close #6272.
Close #6551.
Close #6692.

@trivialfis trivialfis changed the title from "[breaking] Re-implement PR-AUC." to "Re-implement PR-AUC." on Oct 9, 2021
@trivialfis
Member Author

LTR support is added. No breaking change.

@codecov-commenter

codecov-commenter commented Oct 9, 2021

Codecov Report

Merging #7297 (202fe96) into master (69d3b1b) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #7297   +/-   ##
=======================================
  Coverage   83.62%   83.62%           
=======================================
  Files          13       13           
  Lines        3884     3884           
=======================================
  Hits         3248     3248           
  Misses        636      636           

@trivialfis trivialfis marked this pull request as ready for review October 9, 2021 19:51
Member

@RAMitchell RAMitchell left a comment

Am I correct that you are trying to extract the scan and reduce phase of the AUC computation, but pass lambda functions to achieve the different variations in AUC (ROC-AUC vs PR-AUC, multiclass and ranking)?

This PR probably needs some performance testing on CPU and GPU.

if (!cache) {
  cache.reset(new DeviceAUCCache);
}
cache->Init(predts, is_multi, device);
Member

Any reason not to use its constructor?

Member Author

`Init` can handle a changed input matrix, whereas a constructor runs only once when the cache is created.
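To illustrate the point about `Init`, here is a small sketch of a cache that is allocated once and re-initialized on every call. `AUCCacheSketch`, `Prepare`, and the `n_resizes` counter are hypothetical names for this example; the real `DeviceAUCCache` has different members and takes more arguments.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the cache discussed in the review.
struct AUCCacheSketch {
  std::vector<std::size_t> sorted_idx;  // buffer sized to the current input
  std::size_t n_resizes{0};             // how often the buffer was rebuilt

  // Safe to call on every evaluation: it only rebuilds the buffer when the
  // shape of the input has changed since the previous call.
  void Init(std::vector<float> const& predts) {
    if (sorted_idx.size() != predts.size()) {
      sorted_idx.resize(predts.size());
      ++n_resizes;
    }
  }
};

// Mirrors the reviewed pattern: construct lazily, then always call Init.
void Prepare(std::unique_ptr<AUCCacheSketch>* cache,
             std::vector<float> const& predts) {
  if (!*cache) {
    cache->reset(new AUCCacheSketch);
  }
  (*cache)->Init(predts);
}
```

With initialization in the constructor only, a cache created for one input would keep stale buffer sizes when a differently shaped input arrives later; the separate `Init` call keeps the same object usable across inputs.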

@trivialfis
Member Author

> Am I correct that you are trying to extract the scan and reduce phase of the AUC computation, but pass lambda functions to achieve the different variations in AUC (ROC-AUC vs PR-AUC, multiclass and ranking)?

Yes.

> This PR probably needs some performance testing on CPU and GPU.

Will run some simple benchmarks.

@trivialfis
Member Author

@RAMitchell I attached some benchmark results to the PR description.

Member

@RAMitchell RAMitchell left a comment

Benchmarks look good, no regression for existing code.

I think I'm happy to go ahead with this, but more testing is always helpful, if you can figure out how to do it for PR-AUC.

* Support binary/multi-class classification, ranking.
* Add documents.
* Handle missing data.
@trivialfis
Member Author

Pushed a commit that prevents integer overflow inside CUB for ROC-AUC.

@trivialfis trivialfis added this to 1.6 in 2.0 Roadmap Oct 21, 2021
@trivialfis trivialfis merged commit d434942 into dmlc:master Oct 26, 2021
@trivialfis trivialfis deleted the aucpr branch October 26, 2021 05:07
@trivialfis trivialfis moved this from 1.6 TO DO to 1.6 Done in 2.0 Roadmap Oct 26, 2021