Add InformativeFeatures class #58

rebeccabilbro · 2016-10-05T11:21:06Z

We need to add a class that lets the user evaluate which features were most informative for a fitted model. This class will inherit from ScoreVisualizer.

rebeccabilbro · 2016-10-05T11:23:09Z

http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html

rebeccabilbro · 2016-10-05T11:32:40Z

In Scikit-Learn, it looks like most supervised estimators expose a coef_ or feature_importances_ attribute once fitted, which can be used to determine the most important features.
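As a rough sketch of how a visualizer might read either attribute, the snippet below fits one tree-based and one linear estimator on synthetic data and pulls a single value per feature from each. The helper name `get_importances` and the way multiclass coefficients are reduced are assumptions made for illustration, not part of Scikit-Learn or Yellowbrick.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def get_importances(model):
    """Return one value per feature for a fitted estimator.

    Hypothetical helper: prefers feature_importances_ and falls back to coef_.
    """
    if hasattr(model, "feature_importances_"):
        return np.asarray(model.feature_importances_)
    if hasattr(model, "coef_"):
        coef = np.asarray(model.coef_)
        if coef.ndim > 1:
            if coef.shape[0] == 1:
                # Binary classifiers store a single row of coefficients.
                return coef[0]
            # Assumption: reduce multiclass coefficients by averaging
            # their absolute values across classes.
            return np.abs(coef).mean(axis=0)
        return coef
    raise TypeError("model has neither feature_importances_ nor coef_")


# Synthetic data, used only to have something to fit.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
linear = LogisticRegression().fit(X, y)

forest_values = get_importances(forest)  # non-negative, sums to 1
linear_values = get_importances(linear)  # signed coefficients
```

Note that the two attributes are not directly comparable: feature_importances_ is non-negative and normalized, while coef_ is signed and depends on the scale of each feature.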

Examples that use feature_importances_:

Examples that use coef_:

bbengfort · 2016-10-05T13:15:19Z

@rebeccabilbro adding the comments I wrote on this from the paper. Also, I'd say this is a feature, not technical debt?

Generalized linear models compute a predicted dependent variable via the linear combination of an array of coefficients with an array of independent variables. GLMs are fit by modifying the coefficients so as to minimize error, and regularization techniques specify how the model modifies coefficients in relation to each other. As a result, an opportunity presents itself: larger coefficients are "more informative" because they contribute a greater weight to the final prediction in most cases, provided the features are on comparable scales. Additionally, we may say that instance features may also be more or less "informative" depending on the product of the instance feature value with the feature coefficient. This creates two possibilities:

We can compare models based on ranking of coefficients, such that a higher coefficient is "more informative".
We can compare instances based on ranking of feature/coefficient products such that a higher product is "more informative".

In both cases, because the coefficient may be negative (indicating a strong negative correlation), we must rank features by the absolute values of their coefficients. Visualizing a model or multiple models by most informative feature is usually done via a bar chart where the y-axis lists the feature names and the x-axis is the numeric value of the coefficient, so that the x-axis has both a positive and a negative side. The longer the bar, the more informative that feature is.
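A minimal sketch of that ranking and bar chart, assuming the coefficients are already available as a flat array. The feature names, coefficient values, and the helper names `rank_by_magnitude` and `plot_coefficients` are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def rank_by_magnitude(names, coefs):
    """Order features by absolute coefficient, largest first, keeping signs."""
    coefs = np.asarray(coefs, dtype=float)
    order = np.argsort(np.abs(coefs))[::-1]
    return [names[i] for i in order], coefs[order]


def plot_coefficients(names, coefs, ax=None):
    """Horizontal bar chart: feature names on y, signed coefficient on x."""
    ranked_names, ranked_coefs = rank_by_magnitude(names, coefs)
    if ax is None:
        _, ax = plt.subplots()
    positions = np.arange(len(ranked_names))
    ax.barh(positions, ranked_coefs)
    ax.set_yticks(positions)
    ax.set_yticklabels(ranked_names)
    ax.invert_yaxis()  # put the largest magnitude at the top
    ax.axvline(0, color="black", linewidth=0.8)  # separates negative from positive
    ax.set_xlabel("coefficient")
    return ax


# Illustrative values only.
names = ["age", "income", "tenure", "visits"]
coefs = [0.4, -1.7, 0.05, 0.9]

ranked_names, ranked_coefs = rank_by_magnitude(names, coefs)
ax = plot_coefficients(names, coefs)
```

Ranking on the absolute value while plotting the signed value keeps the direction of each effect visible in the chart.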

This method may also be used for instances, but generally there are very many instances relative to the number of models being compared. Instead, a heatmap grid is a better choice to inspect the influence of features on individual instances. Here the grid is constructed such that the x-axis represents individual features and the y-axis represents individual instances. The color of each cell (an instance, feature pair) represents the magnitude of the product of the instance value with the feature's coefficient for a single model. Visual inspection of this diagnostic may reveal a set of instances for which one feature is more predictive than another, or other types of regions of information in the model itself.
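The heatmap grid could be sketched as follows, assuming a single linear model whose coefficients are a flat array with one entry per column of the instance matrix. The data, the helper name `contribution_grid`, and the colormap choice are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def contribution_grid(X, coefs):
    """Magnitude of value * coefficient for every (instance, feature) pair."""
    X = np.asarray(X, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    # Broadcasting multiplies each column of X by its feature's coefficient.
    return np.abs(X * coefs)


# Illustrative values only: three instances (rows), three features (columns).
X = np.array([
    [1.0, 2.0, 0.0],
    [0.5, -1.0, 3.0],
    [2.0, 0.0, -1.0],
])
coefs = np.array([2.0, -0.5, 1.0])

grid = contribution_grid(X, coefs)

fig, ax = plt.subplots()
image = ax.imshow(grid, aspect="auto", cmap="viridis")
ax.set_xlabel("feature")
ax.set_ylabel("instance")
fig.colorbar(image, ax=ax, label="|value * coefficient|")
```

With many instances, sorting the rows (for example by predicted value) before plotting may make such regions easier to spot.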

rebeccabilbro · 2016-10-05T14:42:46Z

@bbengfort is that a way of saying that you want to take this one?

bbengfort · 2016-10-05T14:52:48Z

@rebeccabilbro just saying that I had been thinking about it and had written some notes; if you'd like to take a first crack at it, please feel free! I'm still pushing on Radviz and Parallel coords (plus all the style and architecture stuff).

bbengfort · 2018-03-03T18:48:15Z

Closed by #317

rebeccabilbro added the "level: intermediate" (python coding expertise required), "priority: medium" (can wait until after next release), and "type: technical debt" (work to optimize or generalize code) labels Oct 5, 2016

rebeccabilbro self-assigned this Oct 5, 2016

bbengfort added the "type: feature" (a new visualizer or utility for yb) label and removed the "type: technical debt" (work to optimize or generalize code) label Oct 5, 2016

rebeccabilbro assigned bbengfort Oct 5, 2016

rebeccabilbro removed their assignment Oct 5, 2016

bbengfort added this to the Backlog milestone Oct 6, 2016

rebeccabilbro mentioned this issue Oct 13, 2016

Most informative features #39

Closed

rebeccabilbro added the ready label Oct 13, 2016

rebeccabilbro modified the milestones: Backlog, Version 0.3.2 Oct 13, 2016

bbengfort modified the milestones: Backlog, Version 0.3.2 Oct 13, 2016

bbengfort removed the ready label Oct 13, 2016

bbengfort modified the milestones: PyCon Sprints, Backlog May 11, 2017

ayota mentioned this issue May 23, 2017

WIP top coefficients visualizer for text classifier #164

Closed

rebeccabilbro mentioned this issue Jul 11, 2017

Recursive Feature Elimination #268

Closed

This was referenced Mar 2, 2018

added ImportanceVisualizer for tree-based models #195

Closed

Add FeatureImportanceVisualizer for Treebased-Models #194

Open

FeatureImportances Visualizer #317

Merged

bbengfort closed this as completed Mar 3, 2018
