-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Gini Importances #154
Conversation
Addresses #127 for FIGS. |
Also, I didn't add support for |
Yeah good catch we can drop this. |
I think we should make all the importances positive (just flip the sign at the end), to stay consistent with sklearn. |
@csinva if we do that, multiplying by -1, the order would flip too, which is not what we want, right? Unless we think that when normalizing, by that negative number, we basically already flipped the order. I can see that actually, Maybe I should normalize by |
BTW sklearn has an additional normalization step in the code, but it is not ultimately used. |
Actually, I think we should try to figure this out first as the negative importance for |
@csinva I did some debugging and I think that the
The default param for How would you like to fix this? Seems to be a problem in the underlying FIGS fitting algo itself, not the impurity or importance calculations. Debugging output:
|
Ah you're right, when that new potentital node is updated at this line, |
Okay I think it should be fixed now |
Perfect! |
@csinva ready to merge! |
@csinva BTW I want to make one more PR to clean up that notebook and fix spelling mistakes, but then we can make a new release if you'd like. |
Hi @csinva how do the new Gini importances look?
I based the calculation off sklearn's code from here and here, though it needed to be made recursive as we do not have arrays of all the nodes and their properties.
There is a demo of the new code in the
FIGS_viz_demo.ipynb
notebook. I am a bit concerned with theNone
impurity in the root node of the second tree:I filled it with 0 for the calculation for now:
Is
None
expected if the tree has just one split?Also, after taking the mean and normalizing most of the importances are negative. I think this is fine, as we just care about the relative order of the features, but wanted to get your opinion as well:
BTW I noticed that we have an unused variable in
plot()
:Is this need for anything, or should we delete it?