
Add drop column importance to evaluation code? #309

Closed
ahouseholder opened this issue Sep 15, 2023 · 4 comments · Fixed by #327
Assignees
Labels
enhancement New feature or request tools Software Tools
Milestone

Comments

@ahouseholder
Copy link
Contributor

A talk by Chris Madden (@Crashedmind) about theparanoids/PrioritizedRiskRemediation mentioned that drop-column importance might be preferable to (or at least augment) the permutation importance we use in src/analyze_csv.py.

We should look into that.

@Crashedmind
Copy link

Drop-column Importance
"The idea is to get a baseline performance score as with permutation importance but then drop a column entirely, retrain the model, and recompute the performance score. The importance value of a feature is the difference between the baseline and the score from the model missing that feature. This strategy answers the question of how important a feature is to overall model performance even more directly than the permutation importance strategy." and addresses some of the limitations of Permutation Importance per https://explained.ai/rf-importance/#5

Example code implementing this is linked below, and also in the reference above.
https://gist.github.com/erykml/6854134220276b1a50862aa486a44192#file-drop_col_feat_imp-py
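The gist above follows this pattern; here is a minimal self-contained sketch of the same idea, assuming a scikit-learn estimator and a pandas DataFrame. The function name, data, and model choice are illustrative, not the repo's actual analyze_csv.py API:

```python
# Sketch of drop-column importance: compare a baseline cross-validated
# score against the score of a fresh model retrained without each column.
import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def drop_column_importance(model, X, y, cv=5):
    """Importance of a column = baseline CV score minus the CV score
    of a clone of the model retrained with that column dropped."""
    baseline = cross_val_score(model, X, y, cv=cv).mean()
    importances = {}
    for col in X.columns:
        score = cross_val_score(clone(model), X.drop(columns=[col]),
                                y, cv=cv).mean()
        importances[col] = baseline - score
    return pd.Series(importances).sort_values(ascending=False)


if __name__ == "__main__":
    # Synthetic demo data, purely for illustration.
    from sklearn.datasets import make_classification
    Xa, ya = make_classification(n_samples=200, n_features=5, random_state=0)
    X = pd.DataFrame(Xa, columns=[f"f{i}" for i in range(5)])
    print(drop_column_importance(RandomForestClassifier(random_state=0), X, ya))
```

Note this is much slower than permutation importance because it retrains the model once per column; that retraining is exactly what makes it answer the "how much does overall performance depend on this feature" question more directly.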

@j---
Copy link
Collaborator

j--- commented Sep 18, 2023 via email

@laurie-tyz
Copy link
Contributor

I've got an idea. Take the tree and break it down into decision points. Those decision points should have ending values, based on the decision. If there are 4 outcomes and 3 potential decisions, group the 2 least action-oriented outcomes under the least troublesome decision, with the remaining 2 outcomes assigned to the remaining 2 decisions.

Do this for each decision point in the tree. Try putting the tree (sort of) back together and see if the outcomes line up with the original. Once you have the 3 decision options isolated, you can then test to see which one is the really important decision and which one(s) can be removed.

Decision Points.pptx

@Crashedmind
Copy link

Crashedmind commented Sep 18, 2023

FYI: as part of the "SSVC from scratch in code" talk I presented to the EPSS SIG last Friday, I also did Permutation Importance and Drop-column Importance for the inputs to the decision nodes (though I did not present that part), in addition to the outputs that I did present on.

In other words, I did Feature Importance (2 independent ways: Permutation and Drop Column) on 2 data sets:

  1. The decision node outputs: Exploitation (Active/PoC/None), Automatable (Y/N), Technical Impact (Total/Partial)
  2. The decision node inputs: attackComplexity, attackVector, userInteraction, privilegesRequired, confidentialityImpact, integrityImpact, availabilityImpact, metasploit, nuclei, cisa_kev, exploitdb, MissionWellBeing, baseScore, epss
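For reference, permutation importance is available directly in scikit-learn. The sketch below uses feature names mirroring a subset of the inputs listed above, but the data and the target rule are synthetic, invented purely for illustration:

```python
# Permutation importance via sklearn.inspection.permutation_importance:
# shuffle one column at a time and measure the drop in score (no retraining).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Subset of the decision-node inputs, with random values (illustrative only).
features = ["attackComplexity", "attackVector", "userInteraction",
            "privilegesRequired", "cisa_kev", "epss"]
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((300, len(features))), columns=features)
# Toy target: depends mostly on epss, somewhat on cisa_kev.
y = (X["epss"] + 0.5 * X["cisa_kev"] > 0.8).astype(int)

model = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, mean in sorted(zip(features, result.importances_mean),
                         key=lambda t: -t[1]):
    print(f"{name:20s} {mean:+.3f}")
```

Because permutation importance never retrains, it is cheap, but correlated features can share (and thus understate) importance, which is one of the limitations drop-column importance addresses.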

In addition, it is possible to approach the problem from the other side: given a set of these parameters and already-assigned decisions/priorities, let the tools build the tree from this training data (with a set of constraints provided).
The https://en.wikipedia.org/wiki/Chi-square_automatic_interaction_detection (CHAID) algorithm would be where I'd start, because it supports multiway splits; the current Coordinator tree has 2- and 3-way splits (e.g. Exploitation has Active/PoC/None), and many algorithms only support binary splits.
However, this would require a reasonably large dataset to train on.
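As a rough illustration of the "build the tree from training data" direction, the sketch below fits scikit-learn's CART tree (which only does binary splits, so it is a stand-in for CHAID's multiway splits, not CHAID itself) on synthetic decision-node values with an invented priority rule:

```python
# Learn a decision tree from (inputs, assigned priority) pairs.
# CHAID is not in scikit-learn; CART (DecisionTreeClassifier) is used
# here only to illustrate the workflow. All data below is synthetic.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Exploitation": rng.integers(0, 3, n),    # 0=None, 1=PoC, 2=Active
    "Automatable": rng.integers(0, 2, n),     # 0=No, 1=Yes
    "TechnicalImpact": rng.integers(0, 2, n), # 0=Partial, 1=Total
})
# Invented priority rule standing in for real assigned decisions.
priority = (df["Exploitation"] + df["Automatable"]
            + df["TechnicalImpact"]).clip(0, 3)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(df, priority)
print(export_text(tree, feature_names=list(df.columns)))
```

With real labeled data in place of the synthetic rule, the printed tree could then be compared against the hand-built Coordinator tree; as noted above, getting a dataset large enough for this to be meaningful is the hard part.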

[image attachment]

@ahouseholder ahouseholder added this to the SSVC 2023Q4 milestone Sep 27, 2023
@ahouseholder ahouseholder self-assigned this Oct 2, 2023
@ahouseholder ahouseholder added enhancement New feature or request tools Software Tools labels Oct 2, 2023