[Breaking] Change default evaluation metric for classification to logloss / mlogloss #6183
Conversation
Force-pushed from f01af21 to e40a9eb.
This pull request does not yet add a warning for performing early stopping with a default metric. I cannot approve it as is. Would you like to try adding the warning?
Let me know if you need help.
@hcho3 Thanks for the super fast feedback. Yes, I think I'll need some help.
@lorentzenchr You should update this line too: xgboost/plugin/example/custom_obj.cc Lines 58 to 60 in 444131a
This should fix the error in the Google C++ test. (We use the Google Test framework, hence the name.) Also, many unit tests for the R package have assertions that check the default evaluation metrics, so they need to be updated as well. For example: xgboost/R-package/tests/testthat/test_basic.R Lines 18 to 21 in 444131a
There are many such occurrences in other R unit tests.
To make it easier, let's just throw a warning whenever no evaluation metric is specified: Lines 1033 to 1036 in dda9e1e
If you think this is a good idea, I'll add a commit to this pull request. EDIT: See my latest commit.
Force-pushed from 6954595 to c67c81c.
Codecov Report

@@            Coverage Diff             @@
##           master    #6183      +/-   ##
==========================================
- Coverage   79.02%   78.93%    -0.10%
==========================================
  Files          12       12
  Lines        3104     3104
==========================================
- Hits         2453     2450        -3
- Misses        651      654        +3
@hcho3 Now it is starting to look good. Thanks a lot for your help!
@@ -1031,6 +1031,18 @@ class LearnerImpl : public LearnerIO {
   std::ostringstream os;
   os << '[' << iter << ']' << std::setiosflags(std::ios::fixed);
   if (metrics_.size() == 0 && tparam_.disable_default_eval_metric <= 0) {
+    auto warn_default_eval_metric = [](const std::string& objective, const std::string& before,
Can we place this warning inside the objective function `DefaultEvalMetric`? Also, are we sure this warning is only emitted once during training?
I don't think moving the warning to `ObjFunction::DefaultEvalMetric()` is a good idea, since then the warning code would appear in multiple places. I'd like to keep it here so that we can easily remove it later.

> Also, are we sure this warning is only emitted once during training?

Yes. The warning is emitted just before the default evaluation metric gets added to the vector `metrics_`. Once the default metric is in `metrics_`, the warning is not emitted again.
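The warn-once behavior described above can be sketched in plain Python. This is a rough analogue of the C++ guard in `LearnerImpl`, not the actual implementation; the function name and warning text are illustrative:

```python
import warnings

def eval_one_iter(metrics, objective, default_metric):
    """Rough analogue of the guard in LearnerImpl: warn only while no
    metric is configured, then register the default metric, so the
    warning can fire at most once per training run."""
    if not metrics:  # mirrors metrics_.size() == 0
        warnings.warn(
            f"Default evaluation metric for {objective} is now "
            f"{default_metric}; set eval_metric explicitly to silence "
            f"this warning.",
            UserWarning,
        )
        metrics.append(default_metric)  # later iterations skip this branch
    return metrics

metrics = []
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for _ in range(5):  # five boosting iterations
        eval_one_iter(metrics, "binary:logistic", "logloss")

print(len(caught), metrics)  # one warning, metric registered once
```

Because the default metric is appended on the first iteration, every later iteration sees a non-empty metric list and skips the warning branch entirely.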
@mayer79 Can you review?
@@ -236,7 +238,7 @@ test_that("early stopping xgb.train works", {
 test_that("early stopping using a specific metric works", {
   set.seed(11)
   expect_output(
-    bst <- xgb.train(param, dtrain, nrounds = 20, watchlist, eta = 0.6,
+    bst <- xgb.train(param[-2], dtrain, nrounds = 20, watchlist, eta = 0.6,
What is this for? @lorentzenchr
param[2] sets eval_metric. I exclude it here because it is already specified as a keyword argument; otherwise it throws a warning and the test fails.
Should I add a comment, or alternatively remove the kwarg?
Got it. Let's keep it there.
@hcho3 and @lorentzenchr: I had a look at the changes. LGTM. I was wondering about the related objective binary:logitraw.
I prefer to keep the default for binary:logitraw. Is AUC not a proper scoring metric?
Unfortunately, not at all. There is no property/functional of the target distribution that is consistently estimated by maximizing AUC (or nobody has found one yet, or I'm not aware of it 😏).
If I am not wrong, binary:logitraw trains the same model as binary:logistic, just without back-transforming the prediction to the probability scale?
Yes, binary:logitraw produces models with logit output. Is logloss a proper scoring rule when output is logit? |
FYI, I found this link that calls AUC a "semi-proper" scoring rule: https://stats.stackexchange.com/questions/339919/what-does-it-mean-that-auc-is-a-semi-proper-scoring-rule
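To make the question above concrete: logloss is well defined on logitraw output once the raw margins are passed through the sigmoid. A minimal sketch in plain Python, with illustrative margins and labels (not taken from any real model):

```python
import math

def sigmoid(z):
    """Map a raw margin (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logloss(y_true, p_pred, eps=1e-15):
    """Standard binary cross-entropy, clipped for numerical safety."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Raw margins of the kind binary:logitraw would emit ...
margins = [2.0, -1.0, 0.5, -3.0]
labels = [1, 0, 1, 0]
# ... back-transform to probabilities, then score with logloss.
probs = [sigmoid(m) for m in margins]
print(round(logloss(labels, probs), 4))  # ≈ 0.2407
```

So the scoring rule itself is unchanged; the only question is whether the back-transform is applied before evaluation.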
@hcho3 Although I find it a bit inconsistent to have different default values of eval_metric depending on the objective, I'm fine with leaving binary:logitraw as is.
@lorentzenchr Sure, we can always come back to it later.
Closes #6070.
Change the default evaluation metric for binary classification to "logloss", and for multi-class classification to "mlogloss".
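A hedged sketch of what the change means for users (the parameter names are the standard xgboost ones, but these are plain dicts for illustration, not run against any particular xgboost version): passing eval_metric explicitly both pins the pre-change behavior and silences the new warning.

```python
# Illustrative training-parameter dicts before and after this change.
params_old_default = {
    "objective": "binary:logistic",
    "eval_metric": "error",    # pre-change default for binary classification
}
params_new_default = {
    "objective": "binary:logistic",
    "eval_metric": "logloss",  # new default after this PR
}
params_multiclass = {
    "objective": "multi:softprob",
    "num_class": 3,
    "eval_metric": "mlogloss",  # new multi-class default after this PR
}

for p in (params_old_default, params_new_default, params_multiclass):
    print(p["objective"], "->", p["eval_metric"])
```

Users who relied on the old "error"/"merror" defaults can keep them simply by setting eval_metric explicitly, which is also what the new warning suggests.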