Add f1_binary and f1_macro #190
Conversation
The CI is failing due to the docker pull rate limit, because it pulls as an anonymous user.
flake8 error:
Thanks, fixed.
Sorry, another docker pull rate limit error. Is the PR ready other than flake8? I can start running integration tests if it is.
I can't see the failed tests, and I'm not 100% sure whether there are any other issues besides the pull rate limit error. I could trigger another dummy commit tomorrow morning to confirm.
@iyerr3 Could you have a look at this change? Thanks.
if preds.size > 0:
    labels = dtrain.get_label()
    pred_labels = margin_to_class_label(preds)
    # this function is used only for AutoPilot and the least frequent label is already encoded as 1.
There's nothing that stops external customers from using this directly, so I wouldn't make this assumption.
Also, I'm not sure what the implication of "the least frequent label is already encoded as 1" is. Can you restate the assumption here to ensure the behavior is clear?
Also, do we need validation for the number of labels to be 2?
Why does this assumption matter? For F1, we just care what's positive/negative.
To make f1_binary work, the only requirement is that the labels are encoded as {0, 1}. Taking the least frequent label as 1 is only a behavior that AutoPilot enforces; a customer could also use the more frequent label as 1.
A question here is: are the labels already encoded as {0, 1}? If not, we need another label encoder to make f1_binary work.
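As a hedged illustration of this point (the label vectors below are made up), scikit-learn's f1_score with average="binary" works as soon as the labels are in {0, 1}; which original class was mapped to 1 is a modelling choice, not a requirement of the metric:

```python
# Illustrative only: f1_binary just needs labels already encoded in {0, 1}.
from sklearn.metrics import f1_score

labels = [0, 1, 1, 0, 1]        # ground-truth labels, already in {0, 1}
pred_labels = [0, 1, 0, 0, 1]   # predicted class labels

# average="binary" scores the class encoded as 1 (pos_label=1 by default)
score = f1_score(labels, pred_labels, average="binary")
print(score)  # TP=2, FP=0, FN=1 -> precision=1.0, recall=2/3, F1=0.8
```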
@huibinshen Note that you get the labels from the margin_to_class_label function, which always encodes them in {0, 1}. So your code is fine. More importantly, XGBoost is a product itself; you shouldn't add comments about AutoPilot's behavior here. Just remove the comment, which is the root cause of the confusion.
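For context, a minimal sketch of what a helper like margin_to_class_label can do (this is an assumption for illustration, not the repository's actual implementation): for binary objectives XGBoost emits one raw margin per row, and thresholding it yields labels that are always in {0, 1}.

```python
import numpy as np

def margin_to_class_label(preds: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: a positive raw margin maps to class 1,
    anything else to class 0, so the output is always in {0, 1}."""
    return (np.asarray(preds) > 0.0).astype(int)

pred_labels = margin_to_class_label(np.array([-1.2, 0.3, 2.0, -0.1]))
print(pred_labels.tolist())  # [0, 1, 1, 0]
```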
Thanks Haifeng, will remove the comment in the next revision.
@iyerr3 Do you have other comments? I hope the explanation above is clear.
Thanks for removing the comment.
My question on validation still stands: if a customer uses f1_binary with a multi-class target, they should get a Customer Error. Is that validation happening somewhere that I missed?
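The validation being asked about could be sketched as follows; CustomerError, the function name, and the message are assumptions for illustration, not the repository's actual code:

```python
import numpy as np

class CustomerError(ValueError):
    """Hypothetical stand-in for the container's customer-facing error."""

def validate_binary_labels(labels):
    # f1_binary is only defined for two classes; more than two is a
    # user misconfiguration, not an internal failure.
    if np.unique(labels).size > 2:
        raise CustomerError("Target is multiclass but average='binary'")

# A three-class target should be rejected for f1_binary
try:
    validate_binary_labels(np.array([0, 1, 2, 1]))
    raised = False
except CustomerError:
    raised = True
print(raised)  # True
```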
Hi @iyerr3 , I added a customer error in the new revision. Let me know if this is not the right way.
@@ -42,6 +44,7 @@
LOGISTIC_REGRESSION_LABEL_RANGE_ERROR = "label must be in [0,1] for logistic regression"
MULTI_CLASS_LABEL_RANGE_ERROR = "label must be in [0, num_class)"
MULTI_CLASS_F1_BINARY_ERROR = "Target is multiclass but average='binary'"
Why do we throw this error?
When a user requests the F1 binary score on a multiclass dataset, f1_score(..., average="binary") will throw this error. Because this is a misconfiguration, it is now thrown as a CustomerError.
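As a hedged sketch of this behavior: scikit-learn's f1_score raises a ValueError when average="binary" meets a multiclass target, which the container can catch and re-raise as a customer-facing error (the CustomerError class below is a hypothetical stand-in):

```python
from sklearn.metrics import f1_score

class CustomerError(Exception):
    """Hypothetical stand-in for the container's customer-facing error."""

y_true = [0, 1, 2, 1]   # three classes
y_pred = [0, 2, 1, 1]

try:
    f1_score(y_true, y_pred, average="binary")
    wrapped = None
except ValueError as err:
    # Surface the misconfiguration as a customer error rather than an
    # internal failure.
    wrapped = CustomerError(str(err))
```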
Changes including customer error look good to me. Integration tests all succeeded.
Change f1 to f1_macro to make it more explicit about what it does. f1_macro should be used for multi-class classification datasets. For binary classification tasks, use f1_binary (with the average option 'binary').
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
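The naming described above maps directly onto scikit-learn's average parameter; a small made-up example (label vectors are illustrative only):

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 1]
y_pred = [0, 2, 2, 2, 1]

# f1_macro: unweighted mean of the per-class F1 scores, suitable for
# multi-class targets. Per-class F1 here: 1.0, 2/3, 0.8.
macro = f1_score(y_true, y_pred, average="macro")
print(macro)

# f1_binary would reject this multi-class target: average="binary" is
# reserved for two-class problems encoded in {0, 1}.
```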