Binned chi^2 definition #32

Closed
alexpearce opened this issue May 4, 2016 · 2 comments

@alexpearce
Contributor

I saw your presentation on “Reweighting distributions with gradient boosting” and it looked great, so I gave it a go. But now I want to explain it to others, so I have to actually understand how it works 😄

One thing I'm not certain about is how the node splitting is determined, i.e. how the split value along a feature axis is chosen for the training data at a node. You say this is the “symmetrized binned chi^2”, and I'd like to check that my understanding of it is correct. I have a notebook that tries to reproduce your plot. It looks similar, but I might have done something wrong nevertheless. Does it look sensible?
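For reference, this is the definition of the symmetrized binned chi^2 as I recall it from the hep_ml reweighter documentation, so treat it as an assumption rather than a quote. The bins are the leaves of the tree, and the w symbols (my notation) are the summed sample weights of the original and target distributions that fall into a bin:

```latex
\chi^2 = \sum_{\text{bin}}
  \frac{\left( w_{\text{bin, original}} - w_{\text{bin, target}} \right)^2}
       {w_{\text{bin, original}} + w_{\text{bin, target}}}
```

Under this reading, a split is good when it makes the sum large, i.e. when it separates regions where the two distributions disagree most.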

I tried to find where this computation is done in the code, but I couldn't find it. I'm not at all familiar with the general scikit-learn code architecture, so it's just that I have trouble following the flow of all the Xs and ys. Could you point me to where the chi^2 computation is done?

(And, of course, thanks for the excellent package! 🍻)

@arogozhnikov
Owner

Hey, Alex.

I looked through your notebook, and everything is correct there.

> I tried to find where this computation is done in the code, but I couldn't find it.

It's a bit tricky: optimizing chi^2 can be replaced by minimizing MSE, and the splits found will be identical.

Given that a) most packages can optimize MSE out of the box and b) I don't want to write one more tree, hep_ml makes use of this trick. In particular, ReweightLossFunction constructs appropriate X, y parameters for the tree to minimize MSE.
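A small numerical check of why the two criteria can pick the same split. This is only a sketch under my own assumptions, not code from hep_ml: it assumes the regression target is +1 for target samples and -1 for original samples, with the sample weights used as regression weights. The helper names `leaf_chi2` and `leaf_sse` and the toy data are hypothetical. With that choice, the weighted squared error of a leaf equals its total weight minus its chi^2 term, so the summed error over both leaves is a constant minus the chi^2 of the split.

```python
import numpy as np


def leaf_chi2(is_target, w):
    """Symmetrized chi^2 contribution of one leaf (one bin)."""
    w_target = w[is_target].sum()
    w_original = w[~is_target].sum()
    return (w_original - w_target) ** 2 / (w_original + w_target)


def leaf_sse(y, w):
    """Weighted sum of squared errors when the leaf predicts its weighted mean."""
    mean = np.average(y, weights=w)
    return np.sum(w * (y - mean) ** 2)


# Toy data (hypothetical): two overlapping 1D samples with positive weights.
rng = np.random.default_rng(0)
n = 200
x = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(0.5, 1.0, n)])
is_target = np.arange(2 * n) >= n   # first half "original", second half "target"
w = rng.uniform(0.5, 1.5, 2 * n)
y = np.where(is_target, 1.0, -1.0)  # assumed regression target: +1 / -1

# Sort along the feature so every split is "first i samples go left".
order = np.argsort(x)
is_target, w, y = is_target[order], w[order], y[order]

splits = range(1, 2 * n)
chi2 = np.array([leaf_chi2(is_target[:i], w[:i]) + leaf_chi2(is_target[i:], w[i:])
                 for i in splits])
sse = np.array([leaf_sse(y[:i], w[:i]) + leaf_sse(y[i:], w[i:])
                for i in splits])

total_weight = w.sum()
best_by_mse = int(np.argmin(sse))    # split a weighted-MSE tree would choose
best_by_chi2 = int(np.argmax(chi2))  # split that maximizes the chi^2

print("best split index by MSE: ", best_by_mse)
print("best split index by chi2:", best_by_chi2)
print("sse + chi2 constant:", np.allclose(sse + chi2, total_weight))
```

Because `sse + chi2` is the same for every candidate split, minimizing one is the same as maximizing the other. How the real loss function builds its targets and weights at each boosting stage may differ from this simplified picture.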

> (And, of course, thanks for the excellent package! 🍻)

Thanks, you're welcome :)

@alexpearce
Contributor Author

Ah neat, that makes sense. Thanks for the explanation!
