Use customized evaluation function but still evaluate on rmse #3598

Closed

Yiyiyimu opened this issue Aug 16, 2018 · 13 comments

Yiyiyimu (Author) commented Aug 16, 2018

Hi,

First of all, thank you so much for making xgboost available, it is great!

The problem is that when I use a custom evaluation function in xgboost, a built-in metric is still printed in front of the custom one. I also tried the sample function in custom_objective.py and the result is the same: an extra rmse appears at the front, so I'm not sure what is wrong.

I'm still new to this, so all I could find is that in training.py, the output of msg = bst_eval_set.decode() already contains the extra rmse. Maybe that is of help.

The code is

import numpy as np
import xgboost as xgb
from sklearn.metrics import precision_recall_curve

def Prec(preds, dtrain):
    # Custom evaluation metric: precision at the first point of the PR curve
    labels = dtrain.get_label()
    preds = 1.0 / (1.0 + np.exp(-preds))  # sigmoid: raw margins -> probabilities
    return 'MaxPrec', precision_recall_curve(labels, preds, pos_label=1)[0][0]

dtrain = xgb.DMatrix(X_train, label=y_green_train)
dtest = xgb.DMatrix(X_test, label=y_green_test)
param = {'max_depth': 2, 'eta': 1, 'silent': 1}
num_round = 5
watchlist = [(dtrain, 'train'), (dtest, 'test')]
bst = xgb.train(param, dtrain, num_round, watchlist, feval=Prec)

But the result is

[0]	train-rmse:0.162775	test-rmse:0.151532	train-MaxPrec:0.046266	test-MaxPrec:0.02681
[1]	train-rmse:0.142818	test-rmse:0.150354	train-MaxPrec:0.046266	test-MaxPrec:0.02681
[2]	train-rmse:0.124537	test-rmse:0.150097	train-MaxPrec:0.046266	test-MaxPrec:0.02681
[3]	train-rmse:0.110327	test-rmse:0.15799	train-MaxPrec:0.046266	test-MaxPrec:0.02681
[4]	train-rmse:0.101278	test-rmse:0.158164	train-MaxPrec:0.046266	test-MaxPrec:0.02681

Besides, I think this is a bug: if there is a ':' in the metric name returned by the customized function (for example 'Max:Prec' rather than 'MaxPrec' in the return above), it reports

D:\Anaconda\lib\site-packages\xgboost\training.py in _train_internal(params, dtrain, num_boost_round, evals, obj, feval, xgb_model, callbacks)
     88                 msg = bst_eval_set.decode()
     89             res = [x.split(':') for x in msg.split()]
---> 90             evaluation_result_list = [(k, float(v)) for k, v in res[1:]]
     91         try:
     92             for cb in callbacks_after_iter:
too many values to unpack (expected 2)

I think the parser assumes a number follows each ':', but since a colon already separates the metric name from its value, an extra colon in the name breaks the split. Maybe this restriction should be noted in the documentation for custom evaluation functions.
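To make the failure concrete, here is a minimal sketch (mine, not from xgboost) reproducing the parse step shown in the traceback above; the message string is illustrative:

msg = '[0]\ttrain-Max:Prec:0.5'  # illustrative eval log line with ':' in the metric name
res = [x.split(':') for x in msg.split()]
# res == [['[0]'], ['train-Max', 'Prec', '0.5']]  -- three fields instead of two
try:
    evaluation_result_list = [(k, float(v)) for k, v in res[1:]]
except ValueError as e:
    print(e)  # too many values to unpack (expected 2)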

Thank you for your help!

Working environment:
Windows 7_64
python 3.6.3
conda 4.5.9
xgboost 0.80

hcho3 (Collaborator) commented Aug 16, 2018

The problem is, I try to use customized evaluation function in xgboost, but there is a built-in function listed at the front of the customized one

If you don't specify the parameter eval_metric, XGBoost will print RMSE by default. (The default metric depends on the task; for classification, the default metric is error.) This is the case even when a custom evaluation function is supplied.

And yes, currently you cannot have a colon inside the name of the custom evaluation metric. We need to document this fact.
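For illustration (a sketch of mine, not part of the reply above): setting eval_metric explicitly replaces the default metric rather than removing it; 'logloss' here is an arbitrary example choice.

# An explicit eval_metric replaces the default rmse but does not
# suppress evaluation altogether; 'logloss' is just an example.
param = {'max_depth': 2, 'eta': 1, 'silent': 1, 'eval_metric': 'logloss'}
bst = xgb.train(param, dtrain, num_round, watchlist, feval=Prec)
# Log lines would then read, e.g.:
# [0]  train-logloss:...  test-logloss:...  train-MaxPrec:...  test-MaxPrec:...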

Yiyiyimu (Author) commented:

@hcho3
Sorry to bother you again, but is there any way to set eval_metric to None? Right now it is still being evaluated with RMSE.

hcho3 (Collaborator) commented Aug 16, 2018

@Yiyiyimu We'd have to modify the C++ codebase to disable the default evaluation metric. In particular, we'd need to modify these lines

xgboost/src/learner.cc

Lines 406 to 408 in 7c82dc9

if (metrics_.size() == 0) {
  metrics_.emplace_back(Metric::Create(obj_->DefaultEvalMetric()));
}

which add a default metric if eval_metric is not specified. Is there a need to specifically disable the default metric?

Yiyiyimu (Author) commented Aug 17, 2018

@hcho3
Yes, I think it would be better to disable the default metric when either eval_metric or feval is specified.

Besides, I didn't find other people running into this problem; is it because of the newest version? And could you tell me what I can do right now?

hcho3 (Collaborator) commented Aug 17, 2018

This behavior is consistent with previous versions. For now, you can comment out the quoted lines.

Yes I think it's better to disable default metric when either eval_metric or feval is specified.

This takes some work, since the C++ code currently has no way of knowing whether feval is specified or not.

hcho3 (Collaborator) commented Aug 17, 2018

@Yiyiyimu A work-around is to add a new option eval_metric=None to explicitly suppress all evaluation metrics, including the default. Let me work on it.

hcho3 (Collaborator) commented Aug 17, 2018

@Yiyiyimu In #3606, I added a parameter disable_default_eval_metric with which you can turn off the default metric.
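A minimal sketch of the intended usage (my illustration; the expected log line is hypothetical):

# With disable_default_eval_metric set, only the custom metric
# should be reported; no rmse column is expected.
param = {'max_depth': 2, 'eta': 1, 'silent': 1,
         'disable_default_eval_metric': 1}
bst = xgb.train(param, dtrain, num_round, watchlist, feval=Prec)
# Expected log per round:
# [0]  train-MaxPrec:...  test-MaxPrec:...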

Yiyiyimu (Author) commented:

@hcho3
Thank you so much for your help! But there is still a problem.

I cloned and installed the newest version, but it seems not to be working. The code and results are listed below.

dtrain = xgb.DMatrix(X_train, label=y_green_train)
dtest = xgb.DMatrix(X_test, label=y_green_test)
param = {'max_depth': 3, 'eta': 0.1, 'silent': 1, 'min_child_weight': 1,
         'disable_default_eval_metric': 1, 'subsample': 0.8,
         'colsample_bytree': 0.6, 'gamma': 0.2, 'alpha': 0, 'lambda': 0.01}
num_round = 5
watchlist = [(dtrain, 'train'), (dtest, 'test')]
bst = xgb.train(param, dtrain, num_round, watchlist, feval=Prec)
-----------------
[0]	train-rmse:0.454391	test-rmse:0.454253	train-MaxPrec:0.046266	test-MaxPrec:0.0271
[1]	train-rmse:0.413867	test-rmse:0.413755	train-MaxPrec:0.046296	test-MaxPrec:0.02681
[2]	train-rmse:0.377606	test-rmse:0.377239	train-MaxPrec:0.046296	test-MaxPrec:0.02681
[3]	train-rmse:0.344661	test-rmse:0.344605	train-MaxPrec:0.046296	test-MaxPrec:0.02681
[4]	train-rmse:0.316434	test-rmse:0.316334	train-MaxPrec:0.046388	test-MaxPrec:0.02681

I still can't find what is wrong. Sorry to bother you again.

hcho3 (Collaborator) commented Aug 20, 2018

@Yiyiyimu Did you re-compile XGBoost? Run make -j again

Yiyiyimu (Author) commented:

@hcho3 Sorry, I'm not sure what make -j and re-compiling mean.
What I did, in the Anaconda prompt, was pip uninstall xgboost followed by python setup.py install, then re-ran the Jupyter notebook, and it didn't work.

hcho3 (Collaborator) commented Aug 20, 2018

@Yiyiyimu I changed the native code, on which the Python package xgboost depends. See https://xgboost.readthedocs.io/en/latest/build.html#building-the-shared-library for compiling the native library libxgboost.so. After re-compiling libxgboost.so, you'll have to re-install the Python package.

Yiyiyimu (Author) commented:

@hcho3 Thank you! I followed the document to compile xgboost, but the result is still the same: it evaluates on rmse. Is there anything else I can show you to help pinpoint the problem?

hcho3 (Collaborator) commented Aug 20, 2018

Looks like your Python is picking up an older version of XGBoost. Try setting PYTHONPATH explicitly:

PYTHONPATH=./python-package python3 [your script]

to make sure that the latest master version is being used.
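One quick way to confirm which installation is actually being imported (plain Python, my sketch):

# If __file__ points at the old site-packages install rather than the
# freshly built source tree, the new parameter will not be available.
import xgboost as xgb
print(xgb.__file__)     # location of the loaded package
print(xgb.__version__)  # version of the loaded package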

lock bot locked as resolved and limited conversation to collaborators on Nov 18, 2018