UndefinedMetricWarnings while running classification/main_train.py on SEN12MS #52

Open

suryagutta opened this issue Mar 21, 2021 · 3 comments

@suryagutta (Collaborator):
Getting the following warnings. We need to investigate whether they affect the results; if they do, we need to fix them.

```
/home/taeil/anaconda3/envs/hptest/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1493: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use zero_division parameter to control this behavior.
  average, "true nor predicted", 'F-score is', len(true_sum)
/home/taeil/anaconda3/envs/hptest/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1493: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use zero_division parameter to control this behavior.
  average, "true nor predicted", 'F-score is', len(true_sum)
/home/taeil/anaconda3/envs/hptest/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1245: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use zero_division parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/home/taeil/anaconda3/envs/hptest/lib/python3.7/site-packages/sklearn/metrics/_classification.py:1245: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples. Use zero_division parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
Validation microPrec: 0.540000 microF1: 0.540000 sampleF1: 0.540000 microF2: 0.540000 sampleF2: 0.540000
```

@suryagutta (Collaborator, Author) commented Mar 21, 2021:

> Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use zero_division parameter to control this behavior. Recall is ill-defined and being set to 0.0 in labels with no true samples. Use zero_division parameter to control this behavior.

This was done intentionally to raise a warning, based on the discussion in scikit-learn/scikit-learn#14876.
The code is in https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/metrics/_classification.py
The corresponding pull request that was merged: scikit-learn/scikit-learn#14900

Summary from the sklearn/metrics/_classification.py code:
When ``true positive + false positive == 0``, precision is undefined. When ``true positive + false negative == 0``, recall is undefined. In such cases, by default the metric will be set to 0, as will f-score, and ``UndefinedMetricWarning`` will be raised. This behavior can be modified with ``zero_division``.
Code:

```python
# Divide, and on zero-division, set scores and/or warn according to
# zero_division:
precision = _prf_divide(tp_sum, pred_sum, 'precision',
                        'predicted', average, warn_for, zero_division)
recall = _prf_divide(tp_sum, true_sum, 'recall',
                     'true', average, warn_for, zero_division)

# warn for f-score only if zero_division is warn, it is in warn_for
# and BOTH prec and rec are ill-defined
if zero_division == "warn" and ("f-score",) == warn_for:
    if (pred_sum[true_sum == 0] == 0).any():
        _warn_prf(
            average, "true nor predicted", 'F-score is', len(true_sum)
        )
```
Basically, the default behavior is to set the metric to zero and show a warning. If we want, we can hide the warning using the zero_division flag. I don't think we need to change the behavior at present: it's only a warning, and if we hide it, we might miss important information in the future.
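
For illustration, a minimal sketch (toy multiclass labels, not from SEN12MS) of when the warning fires and what each zero_division setting does:

```python
from sklearn.metrics import precision_score

# Class 2 is never predicted, so its precision is 0/0 (undefined).
y_true = [0, 1, 2, 2]
y_pred = [0, 0, 1, 1]

# Default zero_division="warn": the undefined term counts as 0.0 and
# an UndefinedMetricWarning is emitted.
print(precision_score(y_true, y_pred, average="macro"))  # ~0.167, warns

# zero_division=0 gives the same score without the warning;
# zero_division=1 counts undefined terms as 1.0 instead.
print(precision_score(y_true, y_pred, average="macro", zero_division=0))  # ~0.167
print(precision_score(y_true, y_pred, average="macro", zero_division=1))  # 0.5
```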

@taeil (Collaborator) commented Mar 21, 2021:

> Basically, the default behavior is to set the metric to zero and show a warning. […]

One idea is to remove the classes that do not have samples; I'm not sure how complicated that would be.
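
A rough sketch of that idea, assuming multilabel indicator arrays as in our setup (the helper name is hypothetical, and this is untested against our metrics.py):

```python
import numpy as np
from sklearn.metrics import f1_score

def f1_ignoring_absent_labels(y_true, y_pred, average="micro"):
    """Drop label columns with no true and no predicted samples before
    scoring, so the F-score is well-defined for every remaining label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    keep = (y_true.sum(axis=0) + y_pred.sum(axis=0)) > 0
    return f1_score(y_true[:, keep], y_pred[:, keep], average=average)
```

Note that dropping columns changes the set of labels being averaged over, so macro-style scores would no longer be comparable across runs with different class coverage.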


@suryagutta (Collaborator, Author):
The code is in https://github.com/Berkeley-Data/SEN12MS/blob/master/classification/metrics.py, which calls the sklearn.metrics functions for the different metrics. Those functions can take a zero_division parameter.
zero_division sets the value to return when there is a zero division: "warn", 0, or 1 (default="warn"). If set to "warn", it acts as 0, but warnings are also raised.
