
[REVIEW] Test FIL probabilities with absolute error thresholds in python #3582

Merged

Conversation

@levsnv (Contributor) commented Mar 4, 2021

Probabilities are bounded to [0.0, 1.0], and we generally care more about large probabilities, which are O(1/n_classes).
The largest relative probability errors are usually caused by a small ground-truth probability (e.g. 1e-3) rather than by a large absolute error.
Hence, relative probability error is not the best metric; absolute probability error is more relevant.
Absolute probability error is also more stable, since relative errors have a long tail. When training or inferring on many rows, the chance of encountering a ground-truth probability around 1e-3 or 1e-4 grows, and in some cases there is no reasonable and reliable relative threshold. Finally, as the number of predicted probabilities (clipped values) per input row grows, so does the long tail of relative probability errors, because it is undersampled less. This makes comparisons unfair between binary classification and regression, and between multiclass and binary classification.

The changes below are based on absolute errors collected under --run_unit, --run_quality and --run_stress; the new thresholds are violated at most a couple of times per million samples, and in most cases never.
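
For illustration, a minimal sketch of the kind of check this implies is below. This is not the actual cuML test code: the helper name, the reference probabilities, and the 1e-3 tolerance are placeholders, not the thresholds added in this PR.

```python
# Sketch only: compare FIL probabilities to a reference model with an
# absolute threshold, ignoring relative error entirely.
import numpy as np


def check_proba_absolute(fil_proba, ref_proba, atol=1e-3):
    """Assert every predicted probability is within `atol` of the reference."""
    fil_proba = np.asarray(fil_proba, dtype=np.float64)
    ref_proba = np.asarray(ref_proba, dtype=np.float64)
    # rtol=0 disables the relative component, so only |fil - ref| <= atol matters.
    np.testing.assert_allclose(fil_proba, ref_proba, rtol=0, atol=atol)


# Hypothetical usage with precomputed probability arrays:
# check_proba_absolute(fil_model.predict_proba(X), ref_model.predict_proba(X))
```

With a relative check, a reference probability of 1e-3 and a prediction of 2e-3 would count as a 100% error even though the absolute error is only 1e-3; an absolute threshold treats that case the same as an equally sized error near a large probability.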

@levsnv requested a review from a team as a code owner March 4, 2021 01:19
@github-actions bot added the "Cython / Python" label Mar 4, 2021
@levsnv requested a review from JohnZed March 4, 2021 01:19
@levsnv (Contributor, Author) commented Mar 4, 2021

Below are the absolute error distributions collected in the process
https://gist.github.com/levsnv/9af496ca27778724191d6b7658eddfd7
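
For context, one way such a distribution could be gathered is sketched below. This is an illustration only, under the assumption of precomputed FIL and reference probability arrays of equal shape; the actual collection code is in the gist above and may differ.

```python
# Sketch: summarize the distribution of absolute probability errors between
# FIL output and a reference model.
import numpy as np


def abs_error_quantiles(fil_proba, ref_proba, quantiles=(0.5, 0.99, 0.999, 1.0)):
    """Return selected quantiles of |fil_proba - ref_proba|."""
    errors = np.abs(np.asarray(fil_proba, dtype=np.float64)
                    - np.asarray(ref_proba, dtype=np.float64)).ravel()
    return {q: float(np.quantile(errors, q)) for q in quantiles}


# Hypothetical usage:
# print(abs_error_quantiles(fil_model.predict_proba(X), ref_model.predict_proba(X)))
```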

@levsnv added the "improvement" and "non-breaking" labels Mar 4, 2021
@levsnv requested a review from canonizer March 4, 2021 04:46
@levsnv added the "4 - Waiting on Reviewer" label Mar 4, 2021
@JohnZed (Contributor) left a comment

I think this is a clear win for clarity of the testing guarantees. The old rel thresholds were hard to reason about. Here, we are clarifying that we will test proba to a 3e-7 threshold or better in all cases. So I like it ;) Thanks!

@JohnZed (Contributor) commented Mar 5, 2021

I will wait for @canonizer's comment before merging, though, in case he has additional thoughts.

@levsnv (Contributor, Author) commented Mar 5, 2021

Thanks John! Could you also review #2894? Just don't merge it yet, so that we have proper documentation of where the changes came from.

@levsnv (Contributor, Author) commented Mar 5, 2021

rerun tests
./test/ml: symbol lookup error: ./test/ml: undefined symbol: _ZN5faiss3gpu20StandardGpuResources20setCudaMallocWarningEb

@codecov-io

Codecov Report

Merging #3582 (4e61c31) into branch-0.19 (9fa6e17) will increase coverage by 45.58%.
The diff coverage is n/a.


@@               Coverage Diff                @@
##           branch-0.19    #3582       +/-   ##
================================================
+ Coverage        35.24%   80.83%   +45.58%     
================================================
  Files              378      227      -151     
  Lines            26910    17737     -9173     
================================================
+ Hits              9485    14337     +4852     
+ Misses           17425     3400    -14025     
Flag       Coverage Δ
dask       45.30% <ø> (+0.30%) ⬆️
non-dask   73.09% <ø> (?)

Flags with carried forward coverage won't be shown.

Impacted Files                                 Coverage Δ
python/cuml/dask/solvers/cd.py                 100.00% <0.00%> (ø)
python/cuml/common/numba_utils.py              0.00% <0.00%> (ø)
python/cuml/neighbors/__init__.py              100.00% <0.00%> (ø)
python/cuml/internals/global_settings.py       100.00% <0.00%> (ø)
python/cuml/dask/preprocessing/encoders.py     100.00% <0.00%> (ø)
cuml/dask/solvers/__init__.py
cuml/dask/naive_bayes/naive_bayes.py
cuml/datasets/__init__.py
cuml/experimental/explainer/__init__.py
cuml/dask/linear_model/linear_regression.py
... and 285 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9fa6e17...4e61c31.

@levsnv assigned JohnZed, levsnv and dantegd and unassigned JohnZed Mar 9, 2021
@dantegd (Member) commented Mar 9, 2021

@gpucibot merge

@rapids-bot bot merged commit cd220fc into rapidsai:branch-0.19 Mar 9, 2021