I saw your presentation on “Reweighting distributions with gradient boosting” and it looked great, so I gave it a go. But now I want to explain it to others, so I have to actually understand how it works 😄
One thing I'm not certain of is how the node splitting is determined, i.e. how the split value of the training data along a feature axis at a node is chosen. You say this is the “symmetrized binned chi^2”, and I'd like to check that my understanding of what that is is correct. I have a notebook that tries to reproduce your plot. It looks similar, but I might have done something wrong nevertheless. Does it look sensible?
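In case it helps, here is a minimal sketch of my (possibly wrong) reading of the “symmetrized binned chi^2”: for a candidate split, put the original and target samples into the two resulting bins and sum (w_orig - w_target)^2 / (w_orig + w_target) over the bins. The data below is made up just for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy "original" and "target" samples along a single feature (made-up data).
orig = rng.normal(0.0, 1.0, size=5000)
targ = rng.normal(0.3, 1.2, size=5000)

def symmetrized_chi2(cut, orig, targ):
    """My reading of the split quality: the two child nodes of a binary split
    at `cut` act as bins, and we sum (w_orig - w_targ)^2 / (w_orig + w_targ)."""
    total = 0.0
    for side in (lambda a: a <= cut, lambda a: a > cut):
        w_o = side(orig).sum()   # count (= unit-weight sum) of original events
        w_t = side(targ).sum()   # count of target events
        total += (w_o - w_t) ** 2 / (w_o + w_t)
    return total

# Scan candidate cut values and pick the one maximizing the chi^2.
cuts = np.linspace(-2, 2, 81)
scores = [symmetrized_chi2(c, orig, targ) for c in cuts]
best = cuts[int(np.argmax(scores))]
```

(Unit weights here; I assume real event weights would just replace the counts with weight sums.)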
I tried to find where this computation is done in the code, but I couldn't find it. I'm not at all familiar with the general scikit-learn code architecture, so I have trouble following the flow of all the Xs and ys. Could you point me to where the chi^2 computation is done?
(And, of course, thanks for the excellent package! 🍻)
I looked through your notebook, and everything is correct there.
> I tried to find where this computation is done in the code, but I couldn't find it.
It's a bit tricky: there is a way to substitute optimizing chi^2 with minimizing MSE (the splits found will be identical).
Given that (a) most packages can optimize MSE out of the box and (b) I don't want to write yet another tree implementation, hep_ml makes use of this trick. In particular, ReweightLossFunction constructs the appropriate X and y parameters for a tree that minimizes MSE.
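To see the trick concretely, here is a small numerical sketch (toy data, not the actual hep_ml code): if each event gets a target y = ±1 marking original vs target sample and its weight as a sample weight, then the weighted squared error of a split (each leaf predicting its weighted mean) differs from the symmetrized chi^2 only by a constant, so minimizing MSE and maximizing chi^2 select the same splits:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Signed labels: +1 for "original" events, -1 for "target" events.
y = rng.choice([1.0, -1.0], size=n)
w = rng.uniform(0.1, 2.0, size=n)   # positive event weights
x = rng.normal(size=n)              # one feature

def chi2_split(mask):
    """Symmetrized chi^2 over the two leaves defined by a boolean mask."""
    total = 0.0
    for leaf in (mask, ~mask):
        w_orig = w[leaf & (y > 0)].sum()
        w_targ = w[leaf & (y < 0)].sum()
        total += (w_orig - w_targ) ** 2 / (w_orig + w_targ)
    return total

def weighted_sse_split(mask):
    """Weighted squared error when each leaf predicts its weighted mean of y."""
    total = 0.0
    for leaf in (mask, ~mask):
        mean = np.average(y[leaf], weights=w[leaf])
        total += (w[leaf] * (y[leaf] - mean) ** 2).sum()
    return total

mask = x > 0.0  # an arbitrary candidate split
# Identity: weighted SSE = (total weight) - chi^2, because y^2 = 1 for every
# event, so minimizing weighted MSE is equivalent to maximizing chi^2.
assert np.isclose(weighted_sse_split(mask), w.sum() - chi2_split(mask))
```

This is why any stock regression tree with sample-weight support can do the job.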