Intuition behind hessian and min_child_weight #2483

Closed
sergun opened this issue Jul 3, 2017 · 7 comments

@sergun

sergun commented Jul 3, 2017

I tried to understand the role of the hessian and min_child_weight, but not everything is clear from the documentation.

  1. Why is it better to restrict the sum of hessians rather than the number of samples?
  2. I tried to go deeper into the hessian of log loss and min_child_weight. The hessian of logloss is the derivative of the sigmoid, so it is positive, takes its maximum at zero, and approaches zero at ±inf. What is the intuition behind restricting such a hessian with min_child_weight? Why is a node whose observations have large hessians (meaning the class-label forecast from the previous iteration is close to zero) better for splitting than a node with high absolute forecasts? (See the sketch after this list.)
  3. What is the role of the hessian in the gain calculation?
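
To make question 2 concrete, here is a minimal sketch (my own illustration, not from the docs) of the logloss hessian as a function of the previous-round margin; min_child_weight compares against the sum of these values over a candidate child:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Raw margins (previous-round predictions) for a hypothetical node.
margin = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
p = sigmoid(margin)

h = p * (1.0 - p)  # logloss hessian: peaks at margin 0, vanishes at +-inf
print(h)           # approx [0.045 0.235 0.25 0.235 0.045]

# min_child_weight bounds the sum of hessians in each child, so
# confidently-predicted observations (large |margin|) barely count.
print(h.sum())     # approx 0.81
```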
@benoitdescamps

@sergun
Author

sergun commented Jul 3, 2017

@benoitdescamps, I read this article and know these formulas, but they don't answer my questions.

@superbobry
Contributor

@sergun While I agree that the PDF does not magically give all the answers, the answer to (3) is there; see eq. 7.
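
For reference, eq. 7 of the paper is the gain used to score a split candidate:

$$
\mathcal{L}_{\text{split}} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma
$$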

@sergun
Author

sergun commented Jul 4, 2017

@superbobry Right, I saw the equation, but the exact role is not clear. We can have two split candidates, both with the same G_left^2, G_right^2, and (G_left+G_right)^2, but differing in H_left and H_right, so one split will be better than the other. It's not clear what the meaning of such normalization by the hessian during split finding is. It is completely clear for the RMSE loss, where H equals the number of observations, but not for logloss, where H is the derivative of the sigmoid.
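
A quick numeric illustration of this point (my own example with made-up statistics, lambda = 1 and gamma = 0): identical gradient sums, different hessian sums, different gains under eq. 7.

```python
LAM, GAMMA = 1.0, 0.0

def gain(g_l, h_l, g_r, h_r):
    """Split gain from eq. 7 of the XGBoost paper."""
    g, h = g_l + g_r, h_l + h_r
    return 0.5 * (g_l**2 / (h_l + LAM)
                  + g_r**2 / (h_r + LAM)
                  - g**2 / (h + LAM)) - GAMMA

# Same G_left and G_right for both candidates; only the hessians differ.
print(gain(-4.0, 10.0, 4.0, 10.0))  # ~1.45: large hessian sums damp the gain
print(gain(-4.0, 2.0, 4.0, 2.0))    # ~5.33: small hessian sums boost it
```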

@sergun sergun closed this as completed Jul 4, 2017
@sergun sergun reopened this Jul 4, 2017
@dmlc dmlc deleted a comment from sergun Jul 4, 2017
@khotilov
Member

khotilov commented Jul 4, 2017

The equation following eq. 9 is a rather intuitive representation of the role of the hessian:

$$
\sum_{i=1}^{n} \frac{1}{2}\, h_i \left( f_t(\mathbf{x}_i) - g_i/h_i \right)^2 + \Omega(f_t) + \mathrm{constant}
$$
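
One way to read this (my illustration, not khotilov's): the hessian acts as a per-instance weight, so the hessian sum that min_child_weight bounds is an "effective number of samples". For RMSE, h_i = 1 and it literally counts samples; for logloss, confidently predicted observations count for almost nothing:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two uncertain points (margin 0) and three confident ones (margin 4).
margin = np.array([0.0, 0.0, 4.0, 4.0, 4.0])
p = sigmoid(margin)

print(len(margin))          # RMSE view: h_i = 1, hessian sum = 5 samples
print((p * (1 - p)).sum())  # logloss view: ~0.55 "effective" samples
```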

@sergun
Author

sergun commented Jul 5, 2017

@khotilov Thanks!
But I'm still not grasping the hessian's role. I understand its meaning for the RMSE loss, where the hessian = 1, so the hessian sum equals the number of samples in a node. But for the hessian of logloss it is not clear why it is useful for the algorithm that its value is large for near-zero predictions from the previous round and small for large predictions. Is it something like: "observations that are already well described by the ensemble (= large absolute values of the prediction) should have smaller influence in a node to be split"? It is also interesting how this logic works in the case of class imbalance, when the uncertainty of the prediction is shifted away from zero.

@benoitdescamps

benoitdescamps commented Jul 5, 2017

In the case of binary classification, you might assume that each observation is i.i.d. Bernoulli B(1-p). Recall that a Bernoulli has mean 1-p and variance p(1-p). In the case of logloss, notice that these are proportional to the gradient and the hessian, respectively.

Have a look at equation (5) in the paper. The weight is the ratio of the summed probabilities of the observations (hence, up to a factor, roughly the average probability of the leaf) to the sum of the variances of the observations (hence, up to a factor, the variance of the probability of the leaf).
This is roughly the Z-score of the leaf.

So basically, when boosting, you are updating the probabilities of the observations with the Z-score of the new partition they belong to.

I do not know if this helps, but if your intuition for RMSE satisfies you, then this should be enough.
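
A minimal sketch of this correspondence (my notation and numbers, not from the thread): with p = sigmoid(margin), the logloss gradient is g = p - y and the hessian is h = p(1 - p), i.e. the Bernoulli variance, so the eq. 5 leaf weight is a mean-over-variance ratio:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical leaf: labels and previous-round margins.
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
margin = np.array([0.2, -1.0, 2.5, 0.1, 0.3])

p = sigmoid(margin)
g = p - y          # logloss gradient: predicted probability minus label
h = p * (1.0 - p)  # logloss hessian: Bernoulli variance of the prediction

LAM = 1.0
w = -g.sum() / (h.sum() + LAM)  # optimal leaf weight, eq. 5 of the paper
print(w)  # a "mean over variance" ratio, i.e. roughly a Z-score
```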

@tqchen tqchen closed this as completed Jul 4, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Oct 24, 2018