Evaluate PFI on raw features #4266

Open
najeeb-kazmi opened this issue Sep 30, 2019 · 0 comments
Labels
enhancement New feature or request P2 Priority of the issue for triage purpose: Needs to be fixed at some point.

najeeb-kazmi commented Sep 30, 2019

Right now, the PFI (permutation feature importance) implementation permutes the slots of the feature vector and evaluates feature importance on those slots. However, I may want to evaluate PFI on the raw features instead of, or in addition to, the slots of the feature vector.

Consider this scenario: a PCA transform or a hashing transform (e.g. Text Transform with NGram Hash, or Categorical Hash) has been applied to the raw features, and the resulting slots have no names or have unintelligible names. PFI on these slots isn't very explainable. Some sense of importance can still be had if the initial column could be permuted instead.

In addition, PFI operates on the numerical features that were passed to the trainer. When the original features in the dataset are categorical or strings, the trainer sees the output of the one-hot encoding transformer, which produces multiple slots for each input column. Feature importance on each individual slot of the featurized column is not always useful to the user, so there should be an option to evaluate PFI on the initial input columns.
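To make the request concrete, here is a minimal sketch of the idea in plain Python. It does not use the ML.NET API. Every name in it (`featurize`, `raw_column_pfi`, the toy data, the fixed weights standing in for a trained model) is hypothetical and chosen only for illustration. The point it shows is that the raw column is shuffled before featurization, so all one-hot slots derived from that column change together and the result is one importance score per input column rather than one per slot.

```python
import random

# Hypothetical toy setup: raw rows are (color: str, size: float).
CATEGORIES = ["red", "green", "blue"]

def one_hot(value, categories):
    """Expand one categorical value into one slot per category."""
    return [1.0 if value == c else 0.0 for c in categories]

def featurize(rows):
    """Raw columns -> feature vector slots (3 one-hot slots + 1 numeric slot)."""
    return [one_hot(color, CATEGORIES) + [size] for color, size in rows]

def predict(slots, weights):
    """Stand-in for a trained linear model that only ever sees slots."""
    return sum(w * x for w, x in zip(weights, slots))

def mse(rows, labels, weights):
    """Mean squared error of the full pipeline (featurize + predict)."""
    preds = [predict(s, weights) for s in featurize(rows)]
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)

def raw_column_pfi(rows, labels, weights, n_repeats=20, seed=0):
    """Permute each RAW column, re-run the whole pipeline, and report
    the mean increase in error over n_repeats shuffles."""
    rng = random.Random(seed)
    baseline = mse(rows, labels, weights)
    importances = []
    for col in range(len(rows[0])):
        deltas = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only this one raw column replaced.
            permuted = [row[:col] + (v,) + row[col + 1:]
                        for row, v in zip(rows, shuffled)]
            deltas.append(mse(permuted, labels, weights) - baseline)
        importances.append(sum(deltas) / n_repeats)
    return importances

rows = [("red", 1.0), ("green", 2.0), ("blue", 3.0),
        ("red", 4.0), ("green", 5.0), ("blue", 6.0)]
# Assumed model: the label depends on color only; the size slot has weight 0.
weights = [1.0, 2.0, 3.0, 0.0]
labels = [predict(s, weights) for s in featurize(rows)]

importances = raw_column_pfi(rows, labels, weights)
for name, score in zip(["color", "size"], importances):
    print(name, score)
```

In this toy setup the `color` column gets a single positive score even though it occupies three slots, and `size` scores zero because the assumed model ignores it. The same approach would apply to a hashed or PCA-transformed column, since the permutation happens before any transform runs.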

See also #4296 #3895
