Right now, the PFI implementation permutes and evaluates feature importance on the slots of the feature vector. However, I may want to evaluate PFI on the raw features, instead of or in addition to the slots of the feature vector.
Consider this scenario: a PCA transform or a "hash" transform (e.g. Text Transform with NGram Hash, or Categorical Hash) has been applied to the raw features, and the corresponding slots have no names or unintelligible names. PFI on these slots isn't very "explainable". Some sense of "importance" can still be had if the initial column could be permuted.
In addition, PFI operates on the numerical features that were passed to the trainer. When the original features in the dataset are categorical or strings, the trainer sees the output of the one-hot encoding transformer, which adds multiple columns for each individual input column. Per-slot importance on the featurized column is not always useful to the user, so there should be an option to evaluate PFI on the initial input columns.
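To make the request concrete, here is a minimal sketch of permuting a raw input column before featurization, so that all one-hot slots derived from it change together. It is plain Python, not the ML.NET API: the toy pipeline, the fixed linear "model", and every name in it (`featurize`, `raw_column_pfi`, and so on) are hypothetical and only illustrate the idea.

```python
import random

CATEGORIES = ["red", "green", "blue"]   # hypothetical categorical values
WEIGHTS = [3.0, 0.0, -3.0, 0.0]         # 3 one-hot slots for color + 1 slot for size

def featurize(row):
    """Turn a raw (color, size) row into the slot vector the trainer would see."""
    color, size = row
    return [1.0 if color == c else 0.0 for c in CATEGORIES] + [size]

def predict(features):
    """Stand-in for a trained model: a fixed linear scorer over the slots."""
    return sum(f * w for f, w in zip(features, WEIGHTS))

def mse(rows, targets):
    """Mean squared error of the full pipeline (featurize, then predict)."""
    errors = [(predict(featurize(r)) - t) ** 2 for r, t in zip(rows, targets)]
    return sum(errors) / len(errors)

def raw_column_pfi(rows, targets, column, repeats=20, seed=0):
    """Average increase in MSE when one raw column is shuffled across rows.

    The shuffle happens before featurization, so every slot produced from
    that column is permuted as a unit.
    """
    rng = random.Random(seed)
    baseline = mse(rows, targets)
    increases = []
    for _ in range(repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [
            tuple(shuffled[i] if j == column else v for j, v in enumerate(r))
            for i, r in enumerate(rows)
        ]
        increases.append(mse(permuted, targets) - baseline)
    return sum(increases) / len(increases)

rows = [("red", 1.0), ("green", 2.0), ("blue", 3.0),
        ("red", 4.0), ("green", 5.0), ("blue", 6.0)]
targets = [predict(featurize(r)) for r in rows]  # the toy model fits its own labels exactly

color_importance = raw_column_pfi(rows, targets, column=0)
size_importance = raw_column_pfi(rows, targets, column=1)
print("color:", color_importance, "size:", size_importance)
```

This yields one importance value per input column (`color`, `size`) rather than one per slot. The same approach would apply when the featurizer is a hash or PCA transform whose slots have no meaningful names, since only the raw column is ever touched.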
See also #4296 #3895