
re-use of same model automatically continues training even if not desired #6536

Closed
pseudotensor opened this issue Dec 20, 2020 · 18 comments · Fixed by #6562

Comments

@pseudotensor
Contributor

@trivialfis, just by switching to a new nightly build we found that the training-continuation commit broke some things.

125b3c0 is good
ca3da55 is bad

The issue is that previously one had to pass xgb_model explicitly to continue training, but now continuation always happens, ruining best_iteration.

In the original training we see:

[0]	validation_0-mae:1.13277
[1]	validation_0-mae:0.90054
[2]	validation_0-mae:0.81062
[3]	validation_0-mae:0.77450
[4]	validation_0-mae:0.76209
[5]	validation_0-mae:0.75798
[6]	validation_0-mae:0.75608
[7]	validation_0-mae:0.75518
[8]	validation_0-mae:0.75476
[9]	validation_0-mae:0.75456
[10]	validation_0-mae:0.75445
[11]	validation_0-mae:0.75440
[12]	validation_0-mae:0.75440
[13]	validation_0-mae:0.75440
[14]	validation_0-mae:0.75440
[15]	validation_0-mae:0.75440
[16]	validation_0-mae:0.75440
[17]	validation_0-mae:0.75440
[18]	validation_0-mae:0.75440
[19]	validation_0-mae:0.75440
[20]	validation_0-mae:0.75440

Then, after loading the pickled state and fitting again, we see:

[0]     validation_0-mae:0.75438
[1]     validation_0-mae:0.75438
[2]     validation_0-mae:0.75438
[3]     validation_0-mae:0.75438
[4]     validation_0-mae:0.75438
[5]     validation_0-mae:0.75438
[6]     validation_0-mae:0.75438
[7]     validation_0-mae:0.75438
[8]     validation_0-mae:0.75438
[9]     validation_0-mae:0.75438

This can be reproduced by loading the pickled state and fitting, then also re-creating the model from the bare class and comparing:

import pickle

# Load the pickled estimator along with the data and fit kwargs used originally.
model, X, y, sample_weight, kwargsscaled_fit = pickle.load(open("glm_trialin2.pkl", "rb"))
params = model.get_params()

# Refit the loaded model: this unexpectedly continues from the pickled booster.
model.fit(X, y, sample_weight=sample_weight, **kwargsscaled_fit)
print(model.get_booster().best_iteration)
print(model.get_booster().get_dump()[0])
print("Here1")

# Fit a fresh estimator with identical parameters for comparison.
model2 = model.__class__(**params)
model2.fit(X, y, sample_weight=sample_weight, **kwargsscaled_fit)
print(model2.get_booster().best_iteration)
print(model2.get_booster().get_dump()[0])
print("Here2")
[0]     validation_0-mae:0.75438
[1]     validation_0-mae:0.75438
[2]     validation_0-mae:0.75438
[3]     validation_0-mae:0.75438
[4]     validation_0-mae:0.75438
[5]     validation_0-mae:0.75438
[6]     validation_0-mae:0.75438
[7]     validation_0-mae:0.75438
[8]     validation_0-mae:0.75438
[9]     validation_0-mae:0.75438
0
bias:
-0.125237
weight:
1.88663
0.0213428
-0.0363957
0.0181865
0.0448639
4.55768e-06
0.0207689
0.0528597
0.0381599

[0]     validation_0-mae:1.13277
[1]     validation_0-mae:0.90054
[2]     validation_0-mae:0.81062
[3]     validation_0-mae:0.77450
[4]     validation_0-mae:0.76209
[5]     validation_0-mae:0.75798
[6]     validation_0-mae:0.75608
[7]     validation_0-mae:0.75518
[8]     validation_0-mae:0.75476
[9]     validation_0-mae:0.75456
[10]    validation_0-mae:0.75445
[11]    validation_0-mae:0.75440
[12]    validation_0-mae:0.75440
[13]    validation_0-mae:0.75440
[14]    validation_0-mae:0.75440
[15]    validation_0-mae:0.75440
[16]    validation_0-mae:0.75440
[17]    validation_0-mae:0.75440
[18]    validation_0-mae:0.75440
[19]    validation_0-mae:0.75440
[20]    validation_0-mae:0.75440
11
bias:
-0.125222
weight:
1.88636
0.0213169
-0.0363786
0.0181704
0.0449472
9.11537e-06
0.0207794
0.052849
0.0381668

That is, unpickling should just load the model, and model.fit should behave the same as the fresh case; instead it (without us asking) continues from the original fit and reports best_iteration=0. This makes questions like how many iterations are 'best' come out all wrong, since it now says 0.

Previously one only got continued training by explicitly passing xgb_model, but now I think the code always passes the existing model, which is a problem.

We are working around this by re-instantiating the model from the class plus its params, but I don't think this should be required. It is certainly a breaking change if intentional.

glmpklissue.zip

@trivialfis
Member

trivialfis commented Dec 20, 2020

I made the change with bug fixing in mind. It's unusual to load a model just to train it again from a fresh start. Bug fixes are breaking changes, but in a good way, so I didn't make a big announcement. Suggestions are welcome, though.

cc @hcho3 .

@trivialfis
Member

Also, ca3da55 gets training continuation with early stopping working.

@pseudotensor
Contributor Author

The bug fixes in the PR are fine; assuming the user wants to continue training is not.

@trivialfis
Member

trivialfis commented Dec 21, 2020

As it isn't documented or specified in any scikit-learn guideline (at least none I have found; feel free to correct me), I have to make an assumption about what users intend when they call fit with an existing model loaded. It's one of:

  • The user loaded a model but wants to start anew.
  • The user loaded a model and wants to continue training on the loaded model.

I prefer the second one ... I can make a patch to revert the behavior on 1.3, but I still want to keep it on master for the next release.

We are working around this by re-instantiating the model from the class plus its params,

To me that's quite a reasonable thing to do rather than a "workaround". If you want to fit a new model, instantiating a new object makes sense:

params = old_reg.get_params()  # or load the parameters saved via other means
new_reg = xgb.XGBRegressor()
new_reg.set_params(**params)   # set_params expects keyword arguments
new_reg.fit(X, y)

@hcho3
Collaborator

hcho3 commented Dec 21, 2020

@trivialfis ca3da55 is not part of 1.3, so we didn't break 1.3 here.

@trivialfis
Member

@hcho3 Oops, I didn't look into the history. Thanks for the reminder. Also, WDYT?

@pseudotensor
Contributor Author

FYI, I'm definitely in favor of making continued training easier; it's just a surprising change. It would be good to have a bool to pass when one wants to continue training, instead of assuming it. If I'm mistaken and this is what @hcho3 expects from xgboost in the first place, then I can accept that and just alter my usage.

@trivialfis
Member

Got it. Let me put a breaking-change notice on the 1.4 roadmap. Thanks for your feedback. Instead of adding another parameter, I want to deprecate the xgb_model parameter in fit and use load_model exclusively. WDYT?

@hcho3
Collaborator

hcho3 commented Dec 21, 2020

@trivialfis @pseudotensor Here is my take on this matter: in the scikit-learn API, the expectation is that calling fit() will clear an existing model, so I have to push back a bit here. At least for the scikit-learn API, we may want to require an explicit parameter to enable training continuation. (I'm neutral on how xgb.train() behaves.) See https://stackoverflow.com/a/49841366
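
A minimal sketch of that scikit-learn convention (the estimator and data below are illustrative, not from this issue): a second call to fit() discards the first model, while partial_fit() is the explicit opt-in for incremental training.

import numpy as np
from sklearn.linear_model import SGDRegressor

X = np.random.rand(100, 4)
y = np.random.rand(100)

reg = SGDRegressor(max_iter=5)
reg.fit(X, y)          # first fit
reg.fit(X, y)          # second fit: previous model is discarded, training restarts
reg.partial_fit(X, y)  # by contrast, partial_fit continues from the current state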

@trivialfis
Member

Thanks for the reference. Hmm ... training continuation in xgboost doesn't exactly match partial_fit, as we should not do mini-batch training with xgboost natively. warm_start, perhaps?
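
For reference, a hedged sketch of the sklearn warm_start convention mentioned above, using GradientBoostingRegressor purely as an illustration (not XGBoost code):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.random.rand(100, 4)
y = np.random.rand(100)

gbr = GradientBoostingRegressor(n_estimators=50, warm_start=True)
gbr.fit(X, y)                     # builds trees 1..50
gbr.set_params(n_estimators=100)
gbr.fit(X, y)                     # warm_start=True: adds trees 51..100 rather than restarting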

@hcho3
Collaborator

hcho3 commented Dec 21, 2020

@trivialfis Let's keep the xgb_model option? What's the issue with it?

@trivialfis
Member

trivialfis commented Dec 21, 2020

If we make it explicit that the existing model will be disregarded, then I'm fine with it. Before that breaking change, the existing model was always silently ignored.

@trivialfis
Member

To me that's quite surprising.

@hcho3
Collaborator

hcho3 commented Dec 21, 2020

@trivialfis Got it. Let's explicitly document that calling fit() will discard the existing model unless the xgb_model option is given. Clearing the model upon a second invocation of fit() is consistent with how other sklearn models work.
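
A minimal sketch of the behavior agreed on here, assuming the xgb_model parameter is kept (data and variable names are illustrative):

import numpy as np
import xgboost as xgb

X = np.random.rand(100, 4)
y = np.random.rand(100)

reg = xgb.XGBRegressor(n_estimators=10)
reg.fit(X, y)  # trains a model
reg.fit(X, y)  # trains from scratch again; the previous booster is discarded

# Continuation happens only when explicitly requested:
reg2 = xgb.XGBRegressor(n_estimators=10)
reg2.fit(X, y, xgb_model=reg.get_booster())  # continues from reg's trees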

@trivialfis
Member

trivialfis commented Dec 21, 2020

Also, if we keep that parameter we will have two conflicting sources for the input model. So if we were to keep xgb_model, we would need to somehow inform users that the model loaded by load_model doesn't affect the fit method.

@trivialfis
Member

Got it. I will open a PR tomorrow.

@hcho3
Collaborator

hcho3 commented Dec 22, 2020

@pseudotensor Do you agree with my description of how fit() should behave?

@pseudotensor
Contributor Author

Yes, thanks.
