Add InformativeFeatures class #58

Closed
rebeccabilbro opened this issue Oct 5, 2016 · 6 comments
Labels: level: intermediate (python coding expertise required); priority: medium (can wait until after next release); type: feature (a new visualizer or utility for yb)


@rebeccabilbro
Member

rebeccabilbro commented Oct 5, 2016

Need to add a class to enable the user to evaluate the features that were most informative for a fitted model. This class will inherit from ScoreVisualizer.

@rebeccabilbro rebeccabilbro added level: intermediate python coding expertise required priority: medium can wait until after next release type: technical debt work to optimize or generalize code labels Oct 5, 2016
@rebeccabilbro
Member Author

@rebeccabilbro rebeccabilbro self-assigned this Oct 5, 2016
@bbengfort bbengfort added type: feature a new visualizer or utility for yb and removed type: technical debt work to optimize or generalize code labels Oct 5, 2016
@bbengfort
Member

@rebeccabilbro adding the comments I wrote on this from the paper. Also, I'd say this is a feature, not technical debt?

Generalized linear models compute a predicted dependent variable via the linear combination of an array of coefficients with an array of independent variables. GLMs are fit by modifying the coefficients so as to minimize error, and regularization techniques specify how the model modifies coefficients in relation to each other. As a result, an opportunity presents itself: coefficients with larger magnitudes are, in most cases, "more informative" because they contribute a greater weight to the final prediction (assuming the features are on comparable scales). Additionally, we may say that instance features are more or less "informative" depending on the product of the instance feature value with the feature coefficient. This creates two possibilities:

  1. We can compare models based on ranking of coefficients, such that a higher coefficient is "more informative".
  2. We can compare instances based on ranking of feature/coefficient products such that a higher product is "more informative".
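
As a minimal sketch of the quantities involved, the snippet below computes a linear predictor and the per-feature products for one instance. It assumes an identity link (plain linear regression); the function name, the intercept handling, and the numbers are hypothetical and only illustrate the idea.

```python
def linear_predictor(coefficients, instance, intercept=0.0):
    """Return the prediction and the per-feature coefficient * value products."""
    # Each product is one feature's contribution to this instance's prediction.
    contributions = [coef * value for coef, value in zip(coefficients, instance)]
    return intercept + sum(contributions), contributions


# Made-up coefficients and a single instance, for illustration only.
coefs = [0.5, -2.0, 1.0]
row = [4.0, 1.0, 1.5]

prediction, contributions = linear_predictor(coefs, row, intercept=1.0)
print(prediction)      # intercept plus the sum of the products
print(contributions)   # signed products, one per feature
```

The first possibility ranks the entries of `coefs`; the second ranks the entries of `contributions` for a given instance.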

In both cases, because a coefficient may be negative (indicating a strong negative correlation), we must rank by absolute value: the absolute values of the coefficients in the first case, and of the feature/coefficient products in the second. Visualizing a model or multiple models by most informative feature is usually done with a horizontal bar chart in which the y-axis lists the feature names and the x-axis shows the numeric value of the coefficient, so that the x-axis spans both positive and negative values. The longer the bar, the more informative that feature is.
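
A rough sketch of that ranking and bar chart follows. It is not the eventual yellowbrick API: `rank_by_magnitude` and `plot_informative_features` are hypothetical names, the data is made up, and matplotlib is assumed to be available for the plotting part.

```python
def rank_by_magnitude(feature_names, coefficients):
    """Return (name, coefficient) pairs sorted by |coefficient|, largest first."""
    pairs = zip(feature_names, coefficients)
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


def plot_informative_features(feature_names, coefficients, ax=None):
    """Draw a horizontal bar chart of signed coefficients, ranked by magnitude."""
    # Imported here so the ranking helper stays usable without matplotlib.
    import matplotlib.pyplot as plt

    ranked = rank_by_magnitude(feature_names, coefficients)
    labels = [name for name, _ in ranked]
    values = [coef for _, coef in ranked]
    positions = list(range(len(ranked)))

    if ax is None:
        _, ax = plt.subplots()
    ax.barh(positions, values)
    ax.set_yticks(positions)
    ax.set_yticklabels(labels)
    ax.invert_yaxis()  # most informative feature on top
    ax.axvline(0, color="black", linewidth=0.8)  # separates negative from positive
    ax.set_xlabel("coefficient")
    return ax


# Made-up feature names and coefficients, for illustration only.
names = ["age", "income", "tenure"]
coefs = [0.5, -2.0, 1.0]
ranked = rank_by_magnitude(names, coefs)
print(ranked)
```

With a fitted scikit-learn linear model, the `coefficients` argument would typically come from the estimator's `coef_` attribute.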

This method may also be used for instances, but generally there are very many instances relative to the number of models being compared. A heatmap grid is therefore a better choice for inspecting the influence of features on individual instances. Here the grid is constructed such that the x-axis represents individual features and the y-axis represents individual instances. The color of each cell (an instance/feature pair) represents the magnitude of the product of the instance value with the feature's coefficient for a single model. Visual inspection of this diagnostic may reveal a set of instances for which one feature is more predictive than another, or other types of regions of information in the model itself.
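
The heatmap grid could be sketched roughly as below, again with hypothetical function names and made-up data rather than anything from yellowbrick itself. Rows are instances, columns are features, and each cell holds the absolute value of the feature value times its coefficient.

```python
def product_magnitudes(coefficients, instances):
    """Build the grid of |coefficient * value|: one row per instance, one column per feature."""
    return [
        [abs(coef * value) for coef, value in zip(coefficients, row)]
        for row in instances
    ]


def plot_instance_heatmap(feature_names, coefficients, instances, ax=None):
    """Show the product magnitudes as a heatmap (features on x, instances on y)."""
    # Imported here so the grid helper stays usable without matplotlib.
    import matplotlib.pyplot as plt

    grid = product_magnitudes(coefficients, instances)
    if ax is None:
        _, ax = plt.subplots()
    image = ax.imshow(grid, aspect="auto")
    ax.set_xticks(list(range(len(feature_names))))
    ax.set_xticklabels(feature_names)
    ax.set_xlabel("feature")
    ax.set_ylabel("instance")
    ax.figure.colorbar(image, ax=ax, label="|coefficient * value|")
    return ax


# Made-up coefficients and two instances, for illustration only.
names = ["age", "income", "tenure"]
coefs = [0.5, -2.0, 1.0]
X = [
    [4.0, 1.0, 1.5],
    [0.0, 3.0, 2.0],
]
grid = product_magnitudes(coefs, X)
print(grid)
```

Brighter cells mark the instance/feature pairs that contribute most to the prediction for this one model; comparing models would need one grid per model.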

@rebeccabilbro
Member Author

@bbengfort is that a way of saying that you want to take this one?

@bbengfort
Member

bbengfort commented Oct 5, 2016

@rebeccabilbro just saying that I had been thinking about it and had written some notes; if you'd like to take a first crack at it, please feel free! I'm still pushing on Radviz and Parallel coords (plus all the style and architecture stuff).

@bbengfort
Member

Closed by #317
