feature_importance with more permutations #29
Comments
Good idea! Both the average and the uncertainty would be useful. From B runs one can get the minimum, maximum, first quartile, third quartile, and average importance. Then on the plot we can add some information related to uncertainty.
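As a rough illustration of that summary, the sketch below computes the minimum, first quartile, mean, third quartile, and maximum from B loss values for a single feature. It is plain Python rather than the package's R code. The helper name `summarise_importance` and the sample values are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption.

```python
import statistics

def summarise_importance(losses):
    """Summarise B permuted-loss values for one feature (hypothetical helper)."""
    if len(losses) == 1:
        # With a single run (B = 1) every statistic collapses to that value.
        q1 = q3 = losses[0]
    else:
        # quantiles(n=4) returns the three quartile cut points;
        # the "inclusive" method is an assumed convention.
        q1, _, q3 = statistics.quantiles(losses, n=4, method="inclusive")
    return {
        "min": min(losses),
        "q1": q1,
        "mean": statistics.mean(losses),
        "q3": q3,
        "max": max(losses),
    }

# Made-up loss values standing in for B = 5 runs.
example_losses = [1.0, 2.0, 3.0, 4.0, 5.0]
summary = summarise_importance(example_losses)
```

The mean is the value that would fit the existing output format; the other four statistics are the extra information that a plot could use to show the range.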
Implementing the average is straightforward and would preserve the format of the output object. For the uncertainty, the output format would have to change. Do you have a preference? Some options:
Good points. I would like to keep backward compatibility for existing functions, as there are other packages that may depend on the So, let's assume that there is an additional argument Then the With default B=1 old scripts will work without any change, and for B>1 one may see averages or add additional information about range. |
Hadn't considered attributes. That works, thanks. Some code on this is now in a fork, branch "fi_permutations". The function already has an argument |
I updated the fork and it can now run multiple permutations of feature values as well as subsample the dataset. The current implementation is to subsample, permute the feature values, then repeat the whole procedure B times. Is it OK to send you a pull request with this? |
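The procedure described here (subsample, permute one feature, score, repeat B times) can be sketched as follows. This is a language-agnostic illustration in plain Python, not the code in the fork. The function name `permuted_losses`, its parameters, and the toy model are all hypothetical.

```python
import random
import statistics

def permuted_losses(predict, loss, X, y, feature, B=10, n_sample=None, seed=0):
    """Return B loss values obtained after permuting one feature (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(X)
    losses = []
    for _ in range(B):
        # 1. Subsample rows without replacement (or keep all rows).
        idx = list(range(n)) if n_sample is None else rng.sample(range(n), n_sample)
        rows = [list(X[i]) for i in idx]  # copies, so X itself is not modified
        target = [y[i] for i in idx]
        # 2. Permute the values of the chosen feature within the subsample.
        column = [row[feature] for row in rows]
        rng.shuffle(column)
        for row, value in zip(rows, column):
            row[feature] = value
        # 3. Score the model on the perturbed data.
        losses.append(loss(target, [predict(row) for row in rows]))
    return losses

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Toy setup (assumed for illustration): the model uses only feature 0.
X = [[i, (7 * i) % 5] for i in range(20)]
y = [2 * row[0] for row in X]
predict = lambda row: 2 * row[0]

signal_losses = permuted_losses(predict, mse, X, y, feature=0, B=5, n_sample=10)
noise_losses = permuted_losses(predict, mse, X, y, feature=1, B=5, n_sample=10)
mean_signal = statistics.mean(signal_losses)
mean_noise = statistics.mean(noise_losses)
```

Averaging the B values gives a single number per feature, while the individual values remain available for describing the spread between runs.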
Thanks, it looks great, Argument In the |
Thanks. I added the label and the additional argument. I set the default to |
looks great, thanks! |
Hi ModelOriented.
Very useful collection of tools in this package ecosystem. Thank you.
I came across `ingredients` because of the `feature_importance` function. It works well based on a single permutation, but the variability between runs is sometimes noticeable on small datasets. For example, runs on the Titanic dataset can disagree on the importance ordering of the second- and third-best features.
Would you be interested in including a new argument to set the number of permutations in `feature_importance`? The function could output the average dropout loss over those permutations. Returning averages would be compatible with the existing output format and hence with the rest of the package, for example, plots. I can send a pull request in this direction.