
[Breaking] Change default evaluation metric for classification to logloss / mlogloss #6183

Merged
8 commits merged into master on Oct 2, 2020

Conversation

lorentzenchr
Contributor

Closes #6070.
Change the default evaluation metric for binary classification to "logloss" and for multi-class classification to "mlogloss".

@hcho3
Collaborator

hcho3 commented Sep 29, 2020

This pull request does not yet add a warning for performing early stopping with a default metric, so I cannot approve it as it is. Would you like to give it a try and add the warning?

@hcho3
Collaborator

hcho3 commented Sep 29, 2020

Let me know if you need help.

@lorentzenchr
Contributor Author

@hcho3 Thanks for the super fast feedback. Yes, I think I'll need some help.
First of all, how can I make CI happy?
My plan was to do that first, and only once CI is green remove the [WIP] tag and add warnings for when the default evaluation metric is not set explicitly.

@hcho3
Collaborator

hcho3 commented Sep 29, 2020

@lorentzenchr You should update this line too:

const char* DefaultEvalMetric() const override {
  return "error";
}

This should fix the error in the Google C++ test. (We use the Google Test framework, hence the name.)
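For illustration, the requested update would presumably look something like this (a sketch only; the surrounding objective class is omitted, and the multi-class counterpart would return "mlogloss" instead):

const char* DefaultEvalMetric() const override {
  return "logloss";  // previously "error"
}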

Also, many unit tests for the R package have assertions that check the default evaluation metric, so they need to be updated as well. For example:

expect_output(
  bst <- xgboost(data = train$data, label = train$label, max_depth = 2,
                 eta = 1, nthread = 2, nrounds = nrounds, objective = "binary:logistic")
  , "train-error")

There are many such occurrences in other R unit tests.

@hcho3
Collaborator

hcho3 commented Sep 30, 2020

@lorentzenchr

To make it easier, let's just throw a warning whenever no eval_metric is explicitly set, regardless of whether early stopping is enabled. This way, we only have to change a single place in the codebase, as follows:

xgboost/src/learner.cc

Lines 1033 to 1036 in dda9e1e

if (metrics_.size() == 0 && tparam_.disable_default_eval_metric <= 0) {
  metrics_.emplace_back(Metric::Create(obj_->DefaultEvalMetric(), &generic_parameters_));
  metrics_.back()->Configure({cfg_.begin(), cfg_.end()});
}

If you think this is a good idea, I'll add a commit to this pull request.

EDIT. See my latest commit.
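For concreteness, a minimal sketch of what such a single-place warning could look like, inserted just before the default metric is appended (the message text here is illustrative, not the wording of the actual commit):

if (metrics_.size() == 0 && tparam_.disable_default_eval_metric <= 0) {
  // Warn once: after this branch runs, metrics_ is no longer empty, so the
  // warning cannot be emitted again during the same training run.
  LOG(WARNING) << "No eval_metric was explicitly set; falling back to the objective's "
                  "default metric. Note that the default for classification objectives "
                  "changes to 'logloss' / 'mlogloss' in this release. Set eval_metric "
                  "explicitly to silence this warning.";
  metrics_.emplace_back(Metric::Create(obj_->DefaultEvalMetric(), &generic_parameters_));
  metrics_.back()->Configure({cfg_.begin(), cfg_.end()});
}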

@hcho3 hcho3 changed the title [WIP] Change DefaultEvalMetric of binary classification from error to logloss [WIP] Change default evaluation metric for classification to logloss / mlogloss Sep 30, 2020
@hcho3 hcho3 changed the title [WIP] Change default evaluation metric for classification to logloss / mlogloss Change default evaluation metric for classification to logloss / mlogloss Sep 30, 2020
@hcho3 hcho3 changed the title Change default evaluation metric for classification to logloss / mlogloss [Breaking] Change default evaluation metric for classification to logloss / mlogloss Sep 30, 2020
@codecov-commenter

codecov-commenter commented Sep 30, 2020

Codecov Report

Merging #6183 into master will decrease coverage by 0.09%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6183      +/-   ##
==========================================
- Coverage   79.02%   78.93%   -0.10%     
==========================================
  Files          12       12              
  Lines        3104     3104              
==========================================
- Hits         2453     2450       -3     
- Misses        651      654       +3     
Impacted Files                       Coverage Δ
python-package/xgboost/tracker.py    93.97% <0.00%> (-1.21%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 444131a...161885d. Read the comment docs.

@lorentzenchr
Contributor Author

@hcho3 Now, it is starting to look good. Thanks a lot for your help!

@@ -1031,6 +1031,18 @@ class LearnerImpl : public LearnerIO {
std::ostringstream os;
os << '[' << iter << ']' << std::setiosflags(std::ios::fixed);
if (metrics_.size() == 0 && tparam_.disable_default_eval_metric <= 0) {
  auto warn_default_eval_metric = [](const std::string& objective, const std::string& before,
Member

Can we place this warning inside the objective function's DefaultEvalMetric()? Also, are we sure this warning is only emitted once during training?

Collaborator
@hcho3 hcho3 Sep 30, 2020

I don't think moving the warning to ObjFunction::DefaultEvalMetric() is a good idea, since then the warning code would appear in multiple places. I'd like to keep it here so that we can easily remove it later.

Also, are we sure this warning is only emitted once during training?

Yes. The warning is emitted just before the default evaluation metric gets added to the vector metrics_. Once the default metric is in metrics_, the warning will not be thrown.

@hcho3
Collaborator

hcho3 commented Sep 30, 2020

@mayer79 Can you review?

@@ -236,7 +238,7 @@ test_that("early stopping xgb.train works", {
test_that("early stopping using a specific metric works", {
set.seed(11)
expect_output(
-  bst <- xgb.train(param, dtrain, nrounds = 20, watchlist, eta = 0.6,
+  bst <- xgb.train(param[-2], dtrain, nrounds = 20, watchlist, eta = 0.6,
Collaborator

What is this for? @lorentzenchr

Contributor Author

param[2] sets eval_metric. I exclude it here because it is already specified as a keyword argument; otherwise a warning is thrown and the test fails.

Should I add a comment, or alternatively remove the keyword argument?

Collaborator

Got it. Let's keep it there.

@mayer79
Contributor

mayer79 commented Oct 1, 2020

@hcho3 and @lorentzenchr: I had a look at the changes; they LGTM. I was wondering about the related objective "binary:logitraw", which uses "auc" as its default. For consistency, this should be "logloss" as well. What do you think?

@hcho3
Collaborator

hcho3 commented Oct 1, 2020

I prefer to keep the default for binary:logitraw. Is AUC not a proper scoring metric?

@lorentzenchr
Contributor Author

lorentzenchr commented Oct 1, 2020

I prefer to keep the default for binary:logitraw. Is AUC not a proper scoring metric?

Unfortunately, not at all. There is no property/functional of the target distribution that is consistently estimated by maximizing AUC (or nobody has found one yet, or I'm not aware of it 😏).
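For context: a scoring rule $S(p, y)$ for a binary outcome is strictly proper if its expected value $\mathbb{E}_{y \sim q}[S(p, y)]$ is uniquely optimized at $p = q$, the true event probability. Log loss has this property: minimizing $-(q \log p + (1 - q) \log(1 - p))$ over $p$ yields $p = q$, so it consistently estimates the conditional event probability. AUC depends only on how the scores rank the observations and is unchanged by any monotone transform of them, which is why it does not identify such a functional.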

@mayer79
Contributor

mayer79 commented Oct 1, 2020

I prefer to keep the default for binary:logitraw. Is AUC not a proper scoring metric?

If I am not mistaken, binary:logitraw trains the same model as binary:logistic, just without back-transforming the predictions to the probability scale?

@hcho3
Collaborator

hcho3 commented Oct 1, 2020

Yes, binary:logitraw produces models with logit output. Is logloss a proper scoring rule when the output is a logit?
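As an illustration of the relationship (a self-contained sketch, not XGBoost code; the numbers are made up): binary:logitraw emits a raw score z, binary:logistic emits sigmoid(z), and logloss is defined on the probability scale, so a raw score would need the sigmoid transform before a logloss-style metric applies.

#include <cmath>
#include <cstdio>

// sigmoid: maps a raw logit score to a probability in (0, 1)
double Sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// binary log loss for a single observation, y in {0, 1}, p in (0, 1)
double LogLoss(double y, double p) {
  return -(y * std::log(p) + (1.0 - y) * std::log(1.0 - p));
}

int main() {
  double raw  = 1.386;          // what binary:logitraw would predict (raw score)
  double prob = Sigmoid(raw);   // what binary:logistic would predict (~0.8)
  std::printf("p = %.3f, logloss(y = 1) = %.3f\n", prob, LogLoss(1.0, prob));
  return 0;
}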

@hcho3
Collaborator

hcho3 commented Oct 1, 2020

FYI, I found this link that calls AUC a "semi-proper" scoring rule: https://stats.stackexchange.com/questions/339919/what-does-it-mean-that-auc-is-a-semi-proper-scoring-rule

@lorentzenchr
Contributor Author

@hcho3 Although I find it a bit inconsistent to have different defaults for eval_metric between binary:logitraw and binary:logistic (and reg:logistic), most use cases are covered by this PR changing the default only for binary:logistic. Shall we move forward?

@hcho3
Collaborator

hcho3 commented Oct 2, 2020

@lorentzenchr Sure, we can always come back to binary:logitraw later.
