
re-use of same model automatically continues training even if not desired #6536

Closed
pseudotensor opened this issue Dec 20, 2020 · 18 comments · Fixed by #6562

Comments

@pseudotensor
Contributor

@trivialfis, just by switching to a new nightly build we found that the training-continuation commit broke some things.

125b3c0 is good
ca3da55 is bad

The issue is that previously one had to pass xgb_model explicitly to continue training, but now continuation always happens, ruining best_iteration.

In the original training we see:

[0]	validation_0-mae:1.13277
[1]	validation_0-mae:0.90054
[2]	validation_0-mae:0.81062
[3]	validation_0-mae:0.77450
[4]	validation_0-mae:0.76209
[5]	validation_0-mae:0.75798
[6]	validation_0-mae:0.75608
[7]	validation_0-mae:0.75518
[8]	validation_0-mae:0.75476
[9]	validation_0-mae:0.75456
[10]	validation_0-mae:0.75445
[11]	validation_0-mae:0.75440
[12]	validation_0-mae:0.75440
[13]	validation_0-mae:0.75440
[14]	validation_0-mae:0.75440
[15]	validation_0-mae:0.75440
[16]	validation_0-mae:0.75440
[17]	validation_0-mae:0.75440
[18]	validation_0-mae:0.75440
[19]	validation_0-mae:0.75440
[20]	validation_0-mae:0.75440

Then, after loading the pickled state and fitting again, we see:

[0]     validation_0-mae:0.75438
[1]     validation_0-mae:0.75438
[2]     validation_0-mae:0.75438
[3]     validation_0-mae:0.75438
[4]     validation_0-mae:0.75438
[5]     validation_0-mae:0.75438
[6]     validation_0-mae:0.75438
[7]     validation_0-mae:0.75438
[8]     validation_0-mae:0.75438
[9]     validation_0-mae:0.75438

This can be reproduced by loading the pickled state and fitting, then also re-creating the model from the bare class and comparing:

import pickle

# Load the pickled estimator along with the data and fit kwargs used originally.
model, X, y, sample_weight, kwargsscaled_fit = pickle.load(open("glm_trialin2.pkl", "rb"))
params = model.get_params()

# Refit the loaded model: this unexpectedly continues from the pickled booster.
model.fit(X, y, sample_weight=sample_weight, **kwargsscaled_fit)
print(model.get_booster().best_iteration)
print(model.get_booster().get_dump()[0])
print("Here1")

# Fit a fresh estimator with identical parameters for comparison.
model2 = model.__class__(**params)
model2.fit(X, y, sample_weight=sample_weight, **kwargsscaled_fit)
print(model2.get_booster().best_iteration)
print(model2.get_booster().get_dump()[0])
print("Here2")
[0]     validation_0-mae:0.75438
[1]     validation_0-mae:0.75438
[2]     validation_0-mae:0.75438
[3]     validation_0-mae:0.75438
[4]     validation_0-mae:0.75438
[5]     validation_0-mae:0.75438
[6]     validation_0-mae:0.75438
[7]     validation_0-mae:0.75438
[8]     validation_0-mae:0.75438
[9]     validation_0-mae:0.75438
0
bias:
-0.125237
weight:
1.88663
0.0213428
-0.0363957
0.0181865
0.0448639
4.55768e-06
0.0207689
0.0528597
0.0381599

[0]     validation_0-mae:1.13277
[1]     validation_0-mae:0.90054
[2]     validation_0-mae:0.81062
[3]     validation_0-mae:0.77450
[4]     validation_0-mae:0.76209
[5]     validation_0-mae:0.75798
[6]     validation_0-mae:0.75608
[7]     validation_0-mae:0.75518
[8]     validation_0-mae:0.75476
[9]     validation_0-mae:0.75456
[10]    validation_0-mae:0.75445
[11]    validation_0-mae:0.75440
[12]    validation_0-mae:0.75440
[13]    validation_0-mae:0.75440
[14]    validation_0-mae:0.75440
[15]    validation_0-mae:0.75440
[16]    validation_0-mae:0.75440
[17]    validation_0-mae:0.75440
[18]    validation_0-mae:0.75440
[19]    validation_0-mae:0.75440
[20]    validation_0-mae:0.75440
11
bias:
-0.125222
weight:
1.88636
0.0213169
-0.0363786
0.0181704
0.0449472
9.11537e-06
0.0207794
0.052849
0.0381668

That is, unpickling should just load the model, and model.fit should behave the same as the fresh case; instead it (without us asking) continues from the original fit and reports best_iteration=0. This makes questions like how many iterations are 'best' come out all wrong, since it now says 0.

Previously one only got continued training by explicitly passing xgb_model, but now I think the code always passes the existing model, which is a problem.

We are working around this by re-instantiating the model from the class plus its params, but I don't think this should be required. It is certainly a breaking change if intentional.

glmpklissue.zip

@trivialfis
Member

trivialfis commented Dec 20, 2020

I made the change with bug fixing in mind. It's unusual to load a model just to train it again from a fresh start. Bug fixes are breaking changes, but in a good way, so I didn't make a big announcement. Suggestions are welcome, though.

cc @hcho3 .

@trivialfis
Member

Also, ca3da55 gets training continuation with early stopping working.

@pseudotensor
Contributor Author

The bug fixes in the PR are fine; assuming the user wants to continue training is not.

@trivialfis
Member

trivialfis commented Dec 21, 2020

As it isn't documented or specified in any scikit-learn guideline (at least none I have found; feel free to correct me), I have to make an assumption about what users intend when they call fit with an existing model loaded. It's one of:

  • The user loaded a model but wants to start anew.
  • The user loaded a model and wants to continue training on the loaded model.

I prefer the second one ... I can make a patch to revert the behavior on 1.3, but I still want to keep it on master for the next release.

We are working around this by re-instantiating the model from the class plus its params,

To me that's quite a reasonable thing to do rather than a "workaround". If you want to fit a new model, instantiating a new object makes sense:

params = old_reg.get_params()  # or load the parameters saved via other means
new_reg = xgb.XGBRegressor()
new_reg.set_params(**params)   # set_params expects keyword arguments
new_reg.fit(X, y)

@hcho3
Collaborator

hcho3 commented Dec 21, 2020

@trivialfis ca3da55 is not part of 1.3, so we didn't break 1.3 here.

@trivialfis
Member

@hcho3 Oops, I didn't look into the history. Thanks for the reminder. Also, WDYT?

@pseudotensor
Contributor Author

FYI, I'm definitely in favor of making continued training easier; it's just a surprising change. It would be good to have a bool to pass when one wants to continue training, instead of assuming it. If I'm mistaken and this is what @hcho3 expects from xgboost in the first place, then I can accept that and just alter my usage.

@trivialfis
Member

Got it. Let me put a breaking-change notice on the 1.4 roadmap. Thanks for your feedback. Instead of adding another parameter, I want to deprecate the xgb_model parameter in fit and use load_model exclusively. WDYT?

@hcho3
Collaborator

hcho3 commented Dec 21, 2020

@trivialfis @pseudotensor Here is my take on this matter: in the scikit-learn API, the expectation is that calling fit() will clear an existing model, so I have to push back a bit here. At least for the scikit-learn API, we may want to require an explicit parameter to enable training continuation. (I'm neutral on how xgb.train() behaves.) See https://stackoverflow.com/a/49841366
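
A minimal sketch of that scikit-learn convention (the estimator and data below are illustrative, not from this issue): a second call to fit() discards the first model, while partial_fit() is the explicit opt-in for incremental training.

import numpy as np
from sklearn.linear_model import SGDRegressor

X = np.random.rand(100, 4)
y = np.random.rand(100)

reg = SGDRegressor(max_iter=5)
reg.fit(X, y)          # first fit
reg.fit(X, y)          # second fit: previous model is discarded, training restarts
reg.partial_fit(X, y)  # by contrast, partial_fit continues from the current state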

@trivialfis
Member

Thanks for the reference. Hmm ... training continuation in xgboost doesn't exactly match partial_fit, as we should not do mini-batch training with xgboost natively. warm_start, perhaps?
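
For reference, a hedged sketch of the sklearn warm_start convention mentioned above, using GradientBoostingRegressor purely as an illustration (not XGBoost code):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.random.rand(100, 4)
y = np.random.rand(100)

gbr = GradientBoostingRegressor(n_estimators=50, warm_start=True)
gbr.fit(X, y)                     # builds trees 1..50
gbr.set_params(n_estimators=100)
gbr.fit(X, y)                     # warm_start=True: adds trees 51..100 rather than restarting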

@hcho3
Collaborator

hcho3 commented Dec 21, 2020

@trivialfis Let's keep the xgb_model option? What's the issue with it?

@trivialfis
Member

trivialfis commented Dec 21, 2020

If we make it explicit that the existing model will be disregarded, then I'm fine with it. Before that breaking change, the existing model was always silently ignored.

@trivialfis
Member

To me that's quite surprising.

@hcho3
Collaborator

hcho3 commented Dec 21, 2020

@trivialfis Got it. Let's explicitly document that calling fit() will discard the existing model unless the xgb_model option is given. Clearing the model upon a second invocation of fit() is consistent with how other sklearn models work.
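
A minimal sketch of the behavior agreed on here, assuming the xgb_model parameter is kept (data and variable names are illustrative):

import numpy as np
import xgboost as xgb

X = np.random.rand(100, 4)
y = np.random.rand(100)

reg = xgb.XGBRegressor(n_estimators=10)
reg.fit(X, y)  # trains a model
reg.fit(X, y)  # trains from scratch again; the previous booster is discarded

# Continuation happens only when explicitly requested:
reg2 = xgb.XGBRegressor(n_estimators=10)
reg2.fit(X, y, xgb_model=reg.get_booster())  # continues from reg's trees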

@trivialfis
Member

trivialfis commented Dec 21, 2020

Also, if we keep that parameter we will have two conflicting sources for the input model. So if we were to keep xgb_model, we would need to somehow inform users that the model loaded by load_model doesn't affect the fit method.

@trivialfis
Member

Got it. I will open a PR tomorrow.

@hcho3
Collaborator

hcho3 commented Dec 22, 2020

@pseudotensor Do you agree with my description of how fit() should behave?

@pseudotensor
Contributor Author

Yes, thanks.
