Add drop column importance to evaluation code? #309
Comments
Drop-column Importance: example code to implement it is per below, or also in the ref above.
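As a rough illustration of the idea (not the code referred to in this comment, and not what the repository ships), drop-column importance can be sketched in plain Python. Everything here is an assumption made for the example: the toy table, the column names `exploitation`, `automatable`, and `noise`, the outcome labels, and the majority-vote lookup "model". Because the toy table enumerates every combination, accuracy is scored on the same rows the model was fitted on; with sampled data you would normally score on held-out rows instead.

```python
from collections import Counter, defaultdict
from itertools import product


def fit_lookup(rows, features, target):
    """Fit a majority-vote lookup table keyed on the chosen feature values."""
    votes = defaultdict(Counter)
    for row in rows:
        votes[tuple(row[f] for f in features)][row[target]] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


def accuracy(model, rows, features, target):
    """Fraction of rows whose outcome the lookup table predicts correctly."""
    hits = sum(model.get(tuple(r[f] for f in features)) == r[target] for r in rows)
    return hits / len(rows)


def drop_column_importance(rows, features, target):
    """Refit without each column in turn and report the loss in accuracy."""
    baseline = accuracy(fit_lookup(rows, features, target), rows, features, target)
    importance = {}
    for dropped in features:
        kept = [f for f in features if f != dropped]
        model = fit_lookup(rows, kept, target)  # retrained without the column
        importance[dropped] = baseline - accuracy(model, rows, kept, target)
    return importance


# Hypothetical toy table: the outcome ignores the "noise" column entirely.
def toy_outcome(exploitation, automatable):
    if exploitation == "active":
        return "act"
    return "attend" if automatable == "yes" else "track"


toy_features = ["exploitation", "automatable", "noise"]
toy_rows = [
    {"exploitation": e, "automatable": a, "noise": n, "outcome": toy_outcome(e, a)}
    for e, a, n in product(("none", "poc", "active"), ("no", "yes"), ("x", "y"))
]

scores = drop_column_importance(toy_rows, toy_features, "outcome")
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

The key design point is that the model is refitted for every dropped column, which is what separates this from permutation importance and also what makes it more expensive. A column with a score of zero can be removed without changing any outcome in the table.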
Drop Column Importance makes sense as a measurement to me.
Although I don't know how to formalize this, I would assume the rough
calculus for collecting and using more decision points is that a decision
point is worth adding only when its importance exceeds the cost of
collecting, processing, and analyzing the information needed to answer
the question reliably.
I'm pointing this out because this is one reason why it's important to have
a measure of importance.
I've got an idea. Take the tree and break it down into decision points. Each decision point should have ending values, based on the decision. If there are 4 outcomes and 3 potential decisions, group the 2 least action-oriented outcomes under the least troublesome decision, and assign the remaining 2 outcomes to the remaining 2 decisions. Do this for each decision point in the tree. Then try putting the tree (sort of) back together and see whether the outcomes line up with the original. Once you have the 3 decision options isolated, you can test which one is the really important decision and which one(s) can be removed.
FYI, as part of the "SSVC from scratch in code" talk I presented at the EPSS SIG last Friday, I also did the Permutation Importance and Drop-column Importance for the inputs to the decision nodes (though I did not present that part), in addition to the outputs that I did present on. In other words, I did Feature Importance (2 independent ways: Permutation and Drop Column) on 2 data sets:
In addition, it is possible to approach the problem from the other side, i.e., given a set of these parameters and already-assigned decisions/priorities, let the tools build the tree from this training data (with a set of constraints provided).
A talk by Chris Madden @Crashedmind about theparanoids/PrioritizedRiskRemediation mentioned that Drop Column Importance might be preferable to (or at least augment) permutation importance, which we use in src/analyze_csv.py. We should look into that.
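For contrast, here is a minimal sketch of permutation importance, which fits the model once and then shuffles one column at a time. This is not the code in src/analyze_csv.py; the lookup-table model, the toy table, the column names, and the `repeats` and `seed` parameters are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict
from itertools import product


def fit_lookup(rows, features, target):
    """Fit a majority-vote lookup table keyed on the chosen feature values."""
    votes = defaultdict(Counter)
    for row in rows:
        votes[tuple(row[f] for f in features)][row[target]] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


def accuracy(model, rows, features, target):
    """Fraction of rows whose outcome the lookup table predicts correctly."""
    hits = sum(model.get(tuple(r[f] for f in features)) == r[target] for r in rows)
    return hits / len(rows)


def permutation_importance(rows, features, target, repeats=20, seed=0):
    """Shuffle one column at a time and average the resulting accuracy loss."""
    rng = random.Random(seed)
    model = fit_lookup(rows, features, target)  # fitted once, never refitted
    baseline = accuracy(model, rows, features, target)
    importance = {}
    for column in features:
        losses = []
        for _ in range(repeats):
            shuffled = [r[column] for r in rows]
            rng.shuffle(shuffled)
            permuted = [{**r, column: v} for r, v in zip(rows, shuffled)]
            losses.append(baseline - accuracy(model, permuted, features, target))
        importance[column] = sum(losses) / repeats
    return importance


# Hypothetical toy table: the outcome ignores the "noise" column entirely.
def toy_outcome(exploitation, automatable):
    if exploitation == "active":
        return "act"
    return "attend" if automatable == "yes" else "track"


perm_features = ["exploitation", "automatable", "noise"]
perm_rows = [
    {"exploitation": e, "automatable": a, "noise": n, "outcome": toy_outcome(e, a)}
    for e, a, n in product(("none", "poc", "active"), ("no", "yes"), ("x", "y"))
]

perm_scores = permutation_importance(perm_rows, perm_features, "outcome")
for name, score in perm_scores.items():
    print(f"{name}: {score:.3f}")
```

Permutation importance is cheaper because nothing is retrained, but the shuffled rows can contain value combinations the model never saw, which is one reason drop-column importance is sometimes suggested as a cross-check.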