
Add drop column importance to evaluation code? #309

Closed
ahouseholder opened this issue Sep 15, 2023 · 4 comments · Fixed by #327
Assignees
Labels
enhancement New feature or request tools Software Tools
Milestone

Comments

@ahouseholder
Copy link
Contributor

A talk by Chris Madden (@Crashedmind) about theparanoids/PrioritizedRiskRemediation mentioned that drop-column importance might be preferable to (or at least augment) the permutation importance we use in src/analyze_csv.py.

We should look into that.

@Crashedmind
Copy link

Drop-column Importance
"The idea is to get a baseline performance score as with permutation importance but then drop a column entirely, retrain the model, and recompute the performance score. The importance value of a feature is the difference between the baseline and the score from the model missing that feature. This strategy answers the question of how important a feature is to overall model performance even more directly than the permutation importance strategy." and addresses some of the limitations of Permutation Importance per https://explained.ai/rf-importance/#5

Example code implementing this is linked below, and also in the reference above.
https://gist.github.com/erykml/6854134220276b1a50862aa486a44192#file-drop_col_feat_imp-py
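The gist above follows this pattern; here is a minimal self-contained sketch of the same idea, assuming a scikit-learn estimator and a pandas DataFrame. The function name, data, and model choice are illustrative, not the repo's actual analyze_csv.py API:

```python
# Sketch of drop-column importance: compare a baseline cross-validated
# score against the score of a fresh model retrained without each column.
import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def drop_column_importance(model, X, y, cv=5):
    """Importance of a column = baseline CV score minus the CV score
    of a clone of the model retrained with that column dropped."""
    baseline = cross_val_score(model, X, y, cv=cv).mean()
    importances = {}
    for col in X.columns:
        score = cross_val_score(clone(model), X.drop(columns=[col]),
                                y, cv=cv).mean()
        importances[col] = baseline - score
    return pd.Series(importances).sort_values(ascending=False)


if __name__ == "__main__":
    # Synthetic demo data, purely for illustration.
    from sklearn.datasets import make_classification
    Xa, ya = make_classification(n_samples=200, n_features=5, random_state=0)
    X = pd.DataFrame(Xa, columns=[f"f{i}" for i in range(5)])
    print(drop_column_importance(RandomForestClassifier(random_state=0), X, ya))
```

Note this is much slower than permutation importance because it retrains the model once per column; that retraining is exactly what makes it answer the "how much does overall performance depend on this feature" question more directly.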

@j---
Copy link
Collaborator

j--- commented Sep 18, 2023 via email

@laurie-tyz
Copy link
Contributor

I've got an idea. Take the tree and break it down into decision points. Those decision points should have ending values, based on the decision. If there are 4 outcomes and 3 potential decisions, group the 2 least action-oriented outcomes under the least troublesome decision, with the remaining 2 outcomes assigned to the remaining 2 decisions.

Do this for each decision point in the tree. Try putting the tree (sort of) back together and see if the outcomes line up with the original. Once you have the 3 decision options isolated, you can then test to see which one is the really important decision and which one(s) can be removed.

Decision Points.pptx

@Crashedmind
Copy link

Crashedmind commented Sep 18, 2023

FYI: as part of the "SSVC from scratch in code" talk I presented to the EPSS SIG last Friday, I also did Permutation Importance and Drop-column Importance for the inputs to the decision nodes (though I did not present that part), in addition to the outputs that I did present on.

In other words, I did Feature Importance (2 independent ways: Permutation and Drop Column) on 2 data sets:

  1. The decision node outputs: Exploitation (Active/PoC/None), Automatable (Y/N), Technical Impact (Total/Partial)
  2. The decision node inputs: attackComplexity, attackVector, userInteraction, privilegesRequired, confidentialityImpact, integrityImpact, availabilityImpact, metasploit, nuclei, cisa_kev, exploitdb, MissionWellBeing, baseScore, epss
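For reference, permutation importance is available directly in scikit-learn. The sketch below uses feature names mirroring a subset of the inputs listed above, but the data and the target rule are synthetic, invented purely for illustration:

```python
# Permutation importance via sklearn.inspection.permutation_importance:
# shuffle one column at a time and measure the drop in score (no retraining).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Subset of the decision-node inputs, with random values (illustrative only).
features = ["attackComplexity", "attackVector", "userInteraction",
            "privilegesRequired", "cisa_kev", "epss"]
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((300, len(features))), columns=features)
# Toy target: depends mostly on epss, somewhat on cisa_kev.
y = (X["epss"] + 0.5 * X["cisa_kev"] > 0.8).astype(int)

model = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, mean in sorted(zip(features, result.importances_mean),
                         key=lambda t: -t[1]):
    print(f"{name:20s} {mean:+.3f}")
```

Because permutation importance never retrains, it is cheap, but correlated features can share (and thus understate) importance, which is one of the limitations drop-column importance addresses.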

In addition, it is possible to approach the problem from the other side: given a set of these parameters and already-assigned decisions/priorities, let the tools build the tree from this training data (with a set of constraints provided).
The https://en.wikipedia.org/wiki/Chi-square_automatic_interaction_detection (CHAID) algorithm would be where I'd start, because it supports multiway splits; the current Coordinator tree has 2- and 3-way splits (e.g. Exploitation has Active/PoC/None), and many algorithms only support binary splits.
However, this would require a reasonably large dataset to train on.
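As a rough illustration of the "build the tree from training data" direction, the sketch below fits scikit-learn's CART tree (which only does binary splits, so it is a stand-in for CHAID's multiway splits, not CHAID itself) on synthetic decision-node values with an invented priority rule:

```python
# Learn a decision tree from (inputs, assigned priority) pairs.
# CHAID is not in scikit-learn; CART (DecisionTreeClassifier) is used
# here only to illustrate the workflow. All data below is synthetic.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Exploitation": rng.integers(0, 3, n),    # 0=None, 1=PoC, 2=Active
    "Automatable": rng.integers(0, 2, n),     # 0=No, 1=Yes
    "TechnicalImpact": rng.integers(0, 2, n), # 0=Partial, 1=Total
})
# Invented priority rule standing in for real assigned decisions.
priority = (df["Exploitation"] + df["Automatable"]
            + df["TechnicalImpact"]).clip(0, 3)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(df, priority)
print(export_text(tree, feature_names=list(df.columns)))
```

With real labeled data in place of the synthetic rule, the printed tree could then be compared against the hand-built Coordinator tree; as noted above, getting a dataset large enough for this to be meaningful is the hard part.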

[image attachment]

@ahouseholder ahouseholder added this to the SSVC 2023Q4 milestone Sep 27, 2023
@ahouseholder ahouseholder self-assigned this Oct 2, 2023
@ahouseholder ahouseholder added enhancement New feature or request tools Software Tools labels Oct 2, 2023