Evaluate PFI on raw features #4266

najeeb-kazmi · 2019-09-30T18:43:56Z

Right now, the PFI implementation permutes and evaluates feature importance on the slots of the feature vector. However, I may want to evaluate PFI on the raw features, instead of or in addition to the slots of the feature vector.

Consider this scenario: a PCA transform or a "hash" transform (e.g. Text Transform with NGram Hash, or Categorical Hash) has been applied to the raw features, and the resulting slots have no names or unintelligible names. PFI on these slots isn't very "explainable". Some sense of "importance" can still be had if the initial column could be permuted.

In addition, PFI operates on the numerical features that were passed to the trainer. When the original features in the dataset are categorical or strings, the trainer sees the output of the one-hot encoding transformer, which adds multiple columns for each individual input column. Feature importance on each slot of the featurized column is not always useful to the user, so there should be an option to evaluate PFI on the initial input columns.
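To make the request concrete, here is a minimal sketch of what permuting the initial input columns could look like. It is plain Python rather than ML.NET code, and everything in it is an assumption for illustration: the toy `featurize` step (one-hot encoding of a string column plus a numeric column), the fixed linear `predict` model, and the helper name `raw_column_pfi` are all hypothetical. The idea is that the raw column is shuffled before featurization, so all slots derived from that column change together and a single importance value is reported per input column.

```python
import random

CATEGORIES = ["red", "green", "blue"]   # toy vocabulary for the string column
WEIGHTS = [1.0, 5.0, 9.0, 0.0]          # three one-hot slots + one numeric slot

def featurize(rows):
    """Map raw (color, size) rows to feature vectors: one-hot(color) + [size]."""
    return [[1.0 if color == c else 0.0 for c in CATEGORIES] + [size]
            for color, size in rows]

def predict(features):
    """Stand-in for a trained model: a fixed linear scorer over the slots."""
    return [sum(w * x for w, x in zip(WEIGHTS, f)) for f in features]

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def score(rows, labels):
    """Evaluate the whole pipeline (featurizer + model) on raw rows."""
    return mse(labels, predict(featurize(rows)))

def raw_column_pfi(rows, labels, n_columns, n_repeats=5, seed=0):
    """Mean increase in error when each RAW column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = score(rows, labels)
    importances = []
    for col in range(n_columns):
        deltas = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only the chosen raw column replaced.
            permuted = [row[:col] + (value,) + row[col + 1:]
                        for row, value in zip(rows, shuffled)]
            deltas.append(score(permuted, labels) - baseline)
        importances.append(sum(deltas) / n_repeats)
    return importances

# Toy data: labels come from the model itself, so the baseline error is zero.
rows = [(CATEGORIES[i % 3], float(i)) for i in range(30)]
labels = predict(featurize(rows))
importances = raw_column_pfi(rows, labels, n_columns=2)
print(dict(zip(["color", "size"], importances)))
```

In this toy setup the `color` column gets one importance value even though it occupies three slots, and `size` gets zero importance because its weight is zero. A slot-level PFI would instead report three separate numbers for the one-hot slots.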
