
Add zero_division option to the precision, recall, f1, fbeta. #2198

Merged
merged 40 commits into Lightning-AI:master on May 3, 2024

Conversation

@i-aki-y (Contributor) commented Nov 2, 2023

What does this PR do?

I want to add a zero_division option to the precision, recall, f1, and fbeta metrics, matching their sklearn counterparts.
cf. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html#sklearn-metrics-precision-score

The zero_division option matters when we use samplewise metrics (multidim_average="samplewise") and some samples have no positive targets.

The following example shows that preds1 and preds2 get the same f1-score (0.3333), even though preds2 matches the target perfectly while preds1 does not.
This means we cannot distinguish the two models: the model that correctly predicts no positive samples looks the same as the model that returns many false positives.
We can fix this by setting zero_division=1.
In this example, preds2 then gets f1=1.0 while preds1 stays at f1=0.3333.

import torch
from torchmetrics.functional.classification import f1_score

targets = torch.tensor([
    [0, 0, 0, 0],  # sample1
    [0, 0, 0, 0],  # sample2
    [0, 0, 1, 1],  # sample3
])

preds1 = torch.tensor([
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 1],
])

preds2 = torch.tensor([
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
])

## default behavior
scores1 = f1_score(preds1, targets, task="binary", multidim_average="samplewise")
print(scores1, scores1.mean())
#=> tensor([0., 0., 1.]) tensor(0.3333)

scores2 = f1_score(preds2, targets, task="binary", multidim_average="samplewise")
print(scores2, scores2.mean())
#=> tensor([0., 0., 1.]) tensor(0.3333)

## If zero_division = 1
scores1 = f1_score(preds1, targets, task="binary", multidim_average="samplewise", zero_division=1)
print(scores1, scores1.mean())
#=> tensor([0., 0., 1.]) tensor(0.3333)

scores2 = f1_score(preds2, targets, task="binary", multidim_average="samplewise", zero_division=1)
print(scores2, scores2.mean())
#=> tensor([1., 1., 1.]) tensor(1.)

Note:
The latest sklearn (ver 1.3) has a bug in f1_score when zero_division=1.
scikit-learn/scikit-learn#27577

So, some test cases that compare the results against sklearn will fail until that bug is fixed.
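
For context, here is a minimal sketch (not taken from the PR's test suite; the setup is illustrative) of the kind of cross-check those tests perform, comparing torchmetrics against sklearn for a sample with no positive labels:

import torch
from sklearn.metrics import f1_score as sk_f1_score
from torchmetrics.functional.classification import f1_score

# One sample with no positive labels and a perfect all-negative prediction.
target = torch.tensor([0, 0, 0, 0])
preds = torch.tensor([0, 0, 0, 0])

# With this PR, torchmetrics accepts zero_division and returns 1.0 here.
tm_score = f1_score(preds, target, task="binary", zero_division=1)

# On a fixed sklearn this should also be 1.0; the sklearn 1.3 bug breaks it.
sk_score = sk_f1_score(target.numpy(), preds.numpy(), zero_division=1)

print(tm_score.item(), sk_score)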

Before submitting
  • Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃


📚 Documentation preview 📚: https://torchmetrics--2198.org.readthedocs.build/en/2198/

@SkafteNicki (Member)

Hi @i-aki-y,
I am okay with this change, but I think it would make sense to base it on the StatScores class and then reuse it in Precision, Recall, FScore, etc.

@i-aki-y (Contributor, Author) commented Nov 17, 2023

@SkafteNicki Thank you for your comment.

I tried refactoring StatScores to have zero_division,
and removed the __init__ and *_precision_recall_score_arg_validation methods that I had added to Precision and Recall.

I avoided declaring a zero_division argument in the StatScores constructor because StatScores (and some other subclasses) does not use zero_division.
So the zero_division argument is passed through **kwargs and popped before the super().__init__(**kwargs) call.

For example:

class BinaryStatScores(_AbstractStatScores):
    ...
    def __init__(
        self,
        threshold: float = 0.5,
        multidim_average: Literal["global", "samplewise"] = "global",
        ignore_index: Optional[int] = None,
        validate_args: bool = True,
        **kwargs: Any,
    ) -> None:
        # zero_division is not part of the explicit signature; it is taken out
        # of kwargs (defaulting to 0) before the rest is forwarded upwards.
        zero_division = kwargs.pop("zero_division", 0)
        super(_AbstractStatScores, self).__init__(**kwargs)
        if validate_args:
            _binary_stat_scores_arg_validation(threshold, multidim_average, ignore_index, zero_division)
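
With this pattern, a metric class that forwards its extra keyword arguments to BinaryStatScores can accept zero_division without declaring it itself. A hypothetical usage sketch (assuming the subclass simply forwards **kwargs to the stat-scores base class, as described above):

from torchmetrics.classification import BinaryPrecision

# zero_division travels through **kwargs down to the stat-scores base class,
# where it is popped before Metric.__init__ is called.
metric = BinaryPrecision(multidim_average="samplewise", zero_division=1)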

@Borda (Member) commented Dec 18, 2023

@i-aki-y it seems that the results are different than expected...

@i-aki-y (Contributor, Author) commented Dec 19, 2023

@Borda Thanks

I found some bugs and fixed them.

Since the PR (scikit-learn/scikit-learn#27577) has been merged, I confirmed that the fix passes the related test cases (e.g. pytest tests/unittests/classification) using the dev version sklearn==1.4.dev0 and pytorch==2.0.1 on my local machine.

@Borda (Member) commented Jan 9, 2024

@i-aki-y could you pls have a look at all the failing cases, some with wrong values...
Turning it to draft till the tests are resolved; pls mark it ready when the tests are mostly green :)

@Borda Borda marked this pull request as draft January 9, 2024 12:44
@i-aki-y i-aki-y changed the title from "Add zero_division option to the precision, recall, f1, fbeta." to "[WIP] Add zero_division option to the precision, recall, f1, fbeta." Jan 10, 2024
@i-aki-y (Contributor, Author) commented Jan 10, 2024

@Borda OK, I put [WIP] in the PR title.

As mentioned above, the current sklearn (1.3.2) has a bug that mishandles zero_division.
I think this is causing the CI errors.
Fortunately, the bugfix has been merged recently, so I expect the next sklearn version (1.3.3 or 1.4?) to fix these problems.

@mergify mergify bot removed the has conflicts label Mar 16, 2024
@robmarkcole

Appears this has gone cold? Keen to see support for zero_division elsewhere too, particularly JaccardIndex

@lantiga (Contributor) commented Mar 29, 2024

@SkafteNicki let's revive this

@i-aki-y (Contributor, Author) commented Apr 2, 2024

@robmarkcole The Jaccard index seems to be implemented using _safe_divide in _jaccard_index_reduce, so I think a similar fix is possible.
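
For illustration, a minimal sketch (not the library's actual code; the name and signature are assumptions) of what a safe-divide helper with a zero_division fallback can look like:

import torch
from torch import Tensor

def safe_divide(num: Tensor, denom: Tensor, zero_division: float = 0.0) -> Tensor:
    """Divide elementwise, returning zero_division wherever the denominator is zero."""
    num = num.float()
    denom = denom.float()
    zero_mask = denom == 0
    # Replace zero denominators by 1 so the division is well-defined; those
    # entries are then overwritten with the zero_division fallback.
    result = num / denom.masked_fill(zero_mask, 1.0)
    return result.masked_fill(zero_mask, zero_division)

A reduction such as recall = safe_divide(tp, tp + fn, zero_division=1.0) then yields 1.0 for samples with no positive targets instead of 0.0.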

@SkafteNicki (Member)

@robmarkcole and @i-aki-y I have added support in the Jaccard index now.
I will make sure to get the remaining tests passing so we can finally land this PR today or tomorrow and then have a release afterwards.

@SkafteNicki SkafteNicki added the Priority Critical task/issue label May 2, 2024
@SkafteNicki (Member)

The PR has been failing for the sensitivity_at_specificity metric on Python 3.8 for some time now. The reason is that a bugfix only present in scikit-learn>=1.3.0 was needed for their implementation to match ours, but this PR pins scikit-learn<1.3 for Python 3.8, which is why those tests were failing.
Those tests are now skipped for old sklearn versions because that is the only way we can land this PR.
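
The gate roughly amounts to a version-based skip like the following sketch (the test name is illustrative, not the actual test code):

import pytest
import sklearn
from packaging.version import Version

@pytest.mark.skipif(
    Version(sklearn.__version__) < Version("1.3.0"),
    reason="comparison requires the scikit-learn bugfix shipped in >=1.3.0",
)
def test_sensitivity_at_specificity_matches_sklearn():
    ...  # compare torchmetrics against the sklearn reference here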

@mergify mergify bot added the ready label May 2, 2024
@SkafteNicki SkafteNicki merged commit 335ebe6 into Lightning-AI:master May 3, 2024
70 checks passed
baskrahmer pushed a commit to baskrahmer/torchmetrics that referenced this pull request May 13, 2024
Add zero_division option to the precision, recall, f1, fbeta. (Lightning-AI#2198)

* Add support of zero_division parameter

* fix overlooked

* Fix type error

* Fix type error

* Fix missing comma

* Doc fix wrong math expression

* Fixed StatScores to have zero_division

* fix missing zero_division arg

* fix device mismatch

* use scikit-learn 1.4.0

* fix scikit-learn min ver

* fix for new sklearn version

* fix scikit-learn requirements

* fix incorrect requirements condition

* fix test code to pass in multiple sklearn versions

* changelog

* better docstring

* add jaccardindex

* fix tests

* skip for old sklearn versions

---------

Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>