
[WIP] Handle degenerate cases that result in NAN's in leaf outputs #5258

Open — hcho3 wants to merge 6 commits into master from hcho3:fix_pred_nan
Conversation

hcho3 (Collaborator) commented Feb 1, 2020

See discussion at https://discuss.xgboost.ai/t/still-getting-unexplained-nans-new-replication-code/1383?u=hcho3.

In certain degenerate cases, both the hessian sum and reg_lambda can be zero. Since the leaf output is computed as -sum_grad / (sum_hess + reg_lambda), the zero denominator leads to NaNs in the leaf outputs.

This PR eliminates the NaNs by returning 0.0 whenever both the hessian sum and reg_lambda are zero.
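For reference, a minimal sketch of the guard described above (the function name and signature here are illustrative, not the exact code touched by this PR):

// Illustrative sketch of the degenerate-case guard, assuming the usual
// leaf-output formula w = -sum_grad / (sum_hess + reg_lambda).
inline double CalcLeafWeightGuarded(double sum_grad, double sum_hess,
                                    double reg_lambda) {
  const double denom = sum_hess + reg_lambda;
  if (denom == 0.0) {
    return 0.0;  // both sum_hess and reg_lambda are zero: avoid 0/0 -> NaN
  }
  return -sum_grad / denom;
}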

It also adds a check in the Learner class to ensure that there is no INF or NaN in the prediction result vector. The check is left disabled for now, since its performance impact is uncertain.
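A rough sketch of the kind of check meant here (the helper name is illustrative; this is not the exact Learner code):

#include <algorithm>
#include <cmath>
#include <vector>

// Returns true only if every prediction is finite (no INF or NaN).
inline bool PredictionsAreFinite(const std::vector<float>& preds) {
  return std::all_of(preds.begin(), preds.end(),
                     [](float v) { return std::isfinite(v); });
}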

Note: I think NaNs in the prediction are a serious bug, so the fix should be part of the 1.0.0 release. I will return to this after the 1.0.0 release.

cc @dan-stromberg

hcho3 mentioned this pull request Feb 1, 2020
trivialfis (Member) left a comment

Why not use kRtEps?

hcho3 (Collaborator, Author) commented Feb 1, 2020

@trivialfis No, since the goal is to prevent a zero denominator. A denominator of 1e-6f is valid and thus not degenerate.

hcho3 (Collaborator, Author) commented Feb 1, 2020

The test_regressor test is throwing a TimeoutError exception at teardown. I will come back to it later.

trivialfis (Member) commented Feb 1, 2020

I mean replacing 0 with an epsilon as the denominator when the degenerate case occurs, instead of returning 0 as the result.

hcho3 (Collaborator, Author) commented Feb 1, 2020

So apply a clipping operation to the denominator, then? I see now that it would be a more reasonable choice.

trivialfis (Member) commented Feb 1, 2020

I'm not sure what "clipping" means in this context. I think the heuristic adopted elsewhere is to replace 0 with a small float, so the value is changed from NaN to something close to inf.
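For concreteness, a sketch of that alternative heuristic, assuming kRtEps is a small positive constant (hcho3's reply above suggests a value around 1e-6f); this is not code from the PR:

#include <algorithm>

// Alternative heuristic: clamp the denominator away from zero instead of
// returning 0. The constant 1e-6 stands in for kRtEps here.
inline double CalcLeafWeightEpsDenom(double sum_grad, double sum_hess,
                                     double reg_lambda) {
  const double denom = std::max(sum_hess + reg_lambda, 1e-6);
  return -sum_grad / denom;
}

Compared with returning 0.0, this keeps the sign of the gradient but can produce very large leaf values when sum_grad is non-zero, which is the "something close to inf" behaviour described above.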

trivialfis (Member) commented Feb 1, 2020

Let me go through it one more time.

trivialfis (Member) commented Feb 1, 2020

We need to make sure that, when the hessian is invalid (after regularization), the split is guaranteed to be invalid.

hcho3 (Collaborator, Author) commented Feb 1, 2020

@trivialfis Actually, in this degenerate case, no split meets the min_child_weight threshold specified. So this line

best.Update(loss_chg, fid, split_pt, d_step == -1, e, c);

is never called, and best probably contains all zeros.

I cannot think of any other scenarios where the hessian would be zero, given this check

if (e.sum_hess >= param_.min_child_weight) {    // candidate child has enough hessian mass
  c.SetSubstract(snode.stats, e);               // complement child = node stats minus candidate child
  if (c.sum_hess >= param_.min_child_weight) {  // complement child also has enough hessian mass

So maybe returning 0 for zero hessian is sensible?
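To spell out the reasoning above as a sketch (the names are illustrative, not the actual split-enumeration code): with a positive min_child_weight, a zero-hessian child never passes the check, so the degenerate 0/0 is only reachable when min_child_weight is 0 and reg_lambda is 0.

// Illustrative condition under which a split candidate is considered at all.
inline bool ChildrenHaveEnoughHess(double left_hess, double right_hess,
                                   double min_child_weight) {
  return left_hess >= min_child_weight && right_hess >= min_child_weight;
}
// With min_child_weight == 0, a child with sum_hess == 0 still passes, and
// with reg_lambda == 0 its leaf weight becomes 0/0, i.e. NaN.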

trivialfis (Member) commented Feb 1, 2020

@hcho3 If it's not clear, then we don't have to fix it right now. This algorithmic change is quite significant (even though it's a one-line change).

hcho3 (Collaborator, Author) commented Feb 1, 2020

@trivialfis Also, the issue only appears when the regularization parameter reg_lambda is set to zero.

For now, can we just have the function return a zero weight whenever sum_hess is zero? I'd like to return to it after the release.

hcho3 (Collaborator, Author) commented Feb 1, 2020

I will direct the OP to set a non-zero reg_lambda. Will come back and make a proper fix later.

hcho3 closed this Feb 1, 2020
hcho3 deleted the hcho3:fix_pred_nan branch Feb 1, 2020
hcho3 restored the hcho3:fix_pred_nan branch Feb 1, 2020
hcho3 reopened this Feb 1, 2020
hcho3 added the status: WIP label Feb 1, 2020
trivialfis (Member) commented Feb 1, 2020

There are a few other places employing this equation; if we fix only one of them, the others will diverge. It might be better if we investigate this further.

hcho3 changed the title from "Handle degenerate cases that result in NAN's in leaf outputs" to "[WIP] Handle degenerate cases that result in NAN's in leaf outputs" Feb 1, 2020