Intuition behind hessian and min_child_weight #2483

Closed
sergun opened this issue Jul 3, 2017 · 7 comments

@sergun

sergun commented Jul 3, 2017

I tried to understand the role of the hessian and min_child_weight, but not everything is clear from the documentation.

  1. Why is it better to restrict the sum of hessians rather than the number of samples?
  2. I tried to go deeper into the hessian of log loss and min_child_weight. The hessian of logloss is the derivative of the sigmoid, so it is positive, takes its maximum at zero, and approaches zero at ±inf. What is the intuition behind restricting such a hessian with min_child_weight? Why is a node whose observations have large hessians (meaning the class-label forecast from the previous iteration is close to zero) better for splitting than a node with high absolute forecasts? (See the sketch after this list.)
  3. What is the role of the hessian in the gain calculation?
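
To make question 2 concrete, here is a minimal sketch (my own illustration, not from the docs) of the logloss hessian as a function of the previous-round margin; min_child_weight compares against the sum of these values over a candidate child:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Raw margins (previous-round predictions) for a hypothetical node.
margin = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
p = sigmoid(margin)

h = p * (1.0 - p)  # logloss hessian: peaks at margin 0, vanishes at +-inf
print(h)           # approx [0.045 0.235 0.25 0.235 0.045]

# min_child_weight bounds the sum of hessians in each child, so
# confidently-predicted observations (large |margin|) barely count.
print(h.sum())     # approx 0.81
```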
@benoitdescamps

@sergun
Author

sergun commented Jul 3, 2017

@benoitdescamps, I read this article and know these formulas, but they don't answer my questions.

@superbobry
Contributor

@sergun While I agree that the PDF does not magically give all the answers, the answer to (3) is there; see eq. 7.
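
For reference, eq. 7 of the paper is the gain used to score a split candidate:

$$
\mathcal{L}_{\text{split}} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma
$$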

@sergun
Author

sergun commented Jul 4, 2017

@superbobry Right, I saw the equation, but the exact role is not clear. We can have two split candidates, both with the same G_left^2, G_right^2, and (G_left+G_right)^2, but differing in H_left and H_right, so one split will be better than the other. It's not clear what the meaning of such normalization by the hessian during split finding is. It is completely clear for the RMSE loss, where H equals the number of observations, but not for logloss, where H is the derivative of the sigmoid.
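
A quick numeric illustration of this point (my own example with made-up statistics, lambda = 1 and gamma = 0): identical gradient sums, different hessian sums, different gains under eq. 7.

```python
LAM, GAMMA = 1.0, 0.0

def gain(g_l, h_l, g_r, h_r):
    """Split gain from eq. 7 of the XGBoost paper."""
    g, h = g_l + g_r, h_l + h_r
    return 0.5 * (g_l**2 / (h_l + LAM)
                  + g_r**2 / (h_r + LAM)
                  - g**2 / (h + LAM)) - GAMMA

# Same G_left and G_right for both candidates; only the hessians differ.
print(gain(-4.0, 10.0, 4.0, 10.0))  # ~1.45: large hessian sums damp the gain
print(gain(-4.0, 2.0, 4.0, 2.0))    # ~5.33: small hessian sums boost it
```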

@sergun sergun closed this as completed Jul 4, 2017
@sergun sergun reopened this Jul 4, 2017
@dmlc dmlc deleted a comment from sergun Jul 4, 2017
@khotilov
Member

khotilov commented Jul 4, 2017

The equation following eq. 9 is a rather intuitive representation of the role of the hessian:

$$
\sum_{i=1}^{n} \frac{1}{2}\, h_i \left( f_t(\mathbf{x}_i) - g_i/h_i \right)^2 + \Omega(f_t) + \mathrm{constant}
$$
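
One way to read this (my illustration, not khotilov's): the hessian acts as a per-instance weight, so the hessian sum that min_child_weight bounds is an "effective number of samples". For RMSE, h_i = 1 and it literally counts samples; for logloss, confidently predicted observations count for almost nothing:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two uncertain points (margin 0) and three confident ones (margin 4).
margin = np.array([0.0, 0.0, 4.0, 4.0, 4.0])
p = sigmoid(margin)

print(len(margin))          # RMSE view: h_i = 1, hessian sum = 5 samples
print((p * (1 - p)).sum())  # logloss view: ~0.55 "effective" samples
```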

@sergun
Author

sergun commented Jul 5, 2017

@khotilov Thanks!
But I'm still not grasping the hessian's role. I understand its meaning for the RMSE loss, where the hessian = 1, so the hessian sum equals the number of samples in a node. But for the hessian of logloss it is not clear why it is useful for the algorithm that its value is large for near-zero predictions from the previous round and small for large predictions. Is it something like: "observations that are already well described by the ensemble (= large absolute values of the prediction) should have smaller influence in a node to be split"? It is also interesting how this logic works in the case of class imbalance, when the uncertainty of the prediction is shifted away from zero.

@benoitdescamps

benoitdescamps commented Jul 5, 2017

In the case of binary classification, you might assume that each observation is i.i.d. Bernoulli B(1-p). Recall that a Bernoulli has mean 1-p and variance p(1-p). In the case of logloss, notice that these are proportional to the gradient and the hessian, respectively.

Have a look at equation (5) in the paper. The weight is the ratio of the summed probabilities of the observations (hence, up to a factor, roughly the average probability of the leaf) to the sum of the variances of the observations (hence, up to a factor, the variance of the probability of the leaf).
This is roughly the Z-score of the leaf.

So basically, when boosting, you are updating the probabilities of the observations with the Z-score of the new partition they belong to.

I do not know if this helps, but if your intuition for RMSE satisfies you, then this should be enough.
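
A minimal sketch of this correspondence (my notation and numbers, not from the thread): with p = sigmoid(margin), the logloss gradient is g = p - y and the hessian is h = p(1 - p), i.e. the Bernoulli variance, so the eq. 5 leaf weight is a mean-over-variance ratio:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical leaf: labels and previous-round margins.
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
margin = np.array([0.2, -1.0, 2.5, 0.1, 0.3])

p = sigmoid(margin)
g = p - y          # logloss gradient: predicted probability minus label
h = p * (1.0 - p)  # logloss hessian: Bernoulli variance of the prediction

LAM = 1.0
w = -g.sum() / (h.sum() + LAM)  # optimal leaf weight, eq. 5 of the paper
print(w)  # a "mean over variance" ratio, i.e. roughly a Z-score
```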

@tqchen tqchen closed this as completed Jul 4, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Oct 24, 2018