Add InformativeFeatures class #58
Comments
@rebeccabilbro adding the comments I wrote on this from the paper. Also, I'd say this is a feature, not technical debt? Generalized linear models compute a predicted dependent variable via the linear combination of an array of coefficients with an array of independent variables. GLMs are fit by modifying the coefficients so as to minimize error, and regularization techniques specify how the model modifies coefficients in relation to each other. As a result, an opportunity presents itself: coefficients with larger magnitude are "more informative" because, in most cases, they contribute a greater weight to the final prediction. Additionally we may say that instance features may also be more or less
In both cases, because the coefficient may be negative (indicating a strong negative correlation), we must rank features by the absolute values of their coefficients. Visualizing a model or multiple models by most informative feature is usually done via a bar chart where the y-axis is the feature names and the x-axis is the numeric value of the coefficient, such that the x-axis has both a positive and a negative side. The longer the bar, the more informative that feature is. This method may also be used for instances, but generally there are very many instances relative to the number of models being compared. Instead, a heatmap grid is a better choice to inspect the influence of features on individual instances. Here the grid is constructed such that the x-axis represents individual features and the y-axis represents individual instances. The color of each cell (an instance, feature pair) represents the magnitude of the product of the instance value with the feature's coefficient for a single model. Visual inspection of this diagnostic may reveal a set of instances for which one feature is more predictive than another, or other types of regions of information in the model itself.
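A rough sketch of the two computations described above, in plain Python: ranking features by the absolute value of their coefficients, and building the instance-by-feature grid of magnitudes. The function names `rank_features` and `contribution_grid`, the feature names, and all the numbers are made up for illustration; none of this is an existing API.

```python
def rank_features(names, coefs):
    """Return (name, coef) pairs sorted by |coef|, largest first.

    The signed coefficient is kept so a bar chart can still show
    the bar on the positive or negative side of the x-axis.
    """
    return sorted(zip(names, coefs), key=lambda pair: abs(pair[1]), reverse=True)


def contribution_grid(X, coefs):
    """Build the heatmap grid: rows are instances, columns are features.

    Each cell holds |x_ij * coef_j|, the magnitude of the product of the
    instance value with the feature's coefficient.
    """
    return [[abs(x * c) for x, c in zip(row, coefs)] for row in X]


# Hypothetical values, not taken from any real fitted model.
names = ["age", "income", "tenure"]
coefs = [0.5, -2.0, 1.25]
X = [
    [1.0, 0.5, 2.0],
    [4.0, 0.0, 1.0],
]

ranked = rank_features(names, coefs)
grid = contribution_grid(X, coefs)
```

With a fitted model, `coefs` would presumably come from the estimator's coefficients; `ranked` could then feed a horizontal bar chart and `grid` a heatmap. Comparing coefficients this way assumes the features are on comparable scales.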
@bbengfort is that a way of saying that you want to take this one?
@rebeccabilbro just saying that I had been thinking about it and had written some notes; if you'd like to take a first crack at it, please feel free! I'm still pushing on Radviz and Parallel coords (plus all the style and architecture stuff).
Closed by #317
Need to add a class to enable the user to evaluate the features that were most informative for a fitted model. This class will inherit from ScoreVisualizer.