Add probability and quantile-based bins as basic decision points #930

@ahouseholder

Description

Describe the solution you'd like

Many risk analysis methodologies produce probabilities or quantiles as their output. We can capture these in some basic decision points, for example:

  • probability bins based on words of estimative probability
  • probability bins based on equal division of the [0,1] interval (e.g., 2 or 5 equal-width bins)
  • probability bins weighted toward higher resolution at the most likely end
  • probability bins weighted toward higher resolution at either extreme
  • median-split of quantiles (below or above median)
  • quartiles (1,2,3,4)
  • quintiles (1,2,3,4,5)

These will let us start to show folks how they can map their probability or quantile-based data into SSVC decision tables. (E.g., EPSS, possibly some applicability to FAIR, NIST 800-30)
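
As a rough illustration of the kinds of bins listed above, here is a minimal Python sketch. It is not SSVC code: the function names (`equal_cutoffs`, `bin_index`) and the skewed cutoff values are hypothetical and chosen only for this example. It assumes the input is either a probability or a percentile rank in [0, 1], so quartiles and quintiles are just equal-width bins applied to the percentile rank.

```python
from bisect import bisect_right


def equal_cutoffs(n_bins: int) -> list[float]:
    """Interior cutoffs that split [0, 1] into n_bins equal-width bins."""
    return [i / n_bins for i in range(1, n_bins)]


def bin_index(value: float, cutoffs: list[float]) -> int:
    """Return the 1-based bin number for a value in [0, 1].

    Bins are closed on the left: a value equal to a cutoff
    lands in the higher bin. cutoffs must be sorted ascending.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be in [0, 1]")
    return bisect_right(cutoffs, value) + 1


# Equal division of [0, 1] into 2 or 5 bins.
MEDIAN_SPLIT = equal_cutoffs(2)   # [0.5]
FIVE_EQUAL = equal_cutoffs(5)     # [0.2, 0.4, 0.6, 0.8]

# Quartiles, applied to a percentile rank rather than a raw probability.
QUARTILES = equal_cutoffs(4)      # [0.25, 0.5, 0.75]

# Illustrative only: finer resolution near 1.0. These cutoffs are an
# assumption for the example, not proposed SSVC values.
SKEWED_HIGH = [0.5, 0.8, 0.9, 0.95]

if __name__ == "__main__":
    for p in (0.03, 0.5, 0.97):
        print(p, bin_index(p, FIVE_EQUAL), bin_index(p, SKEWED_HIGH))
```

Keeping the cutoffs as plain data means each binning scheme differs only in its cutoff list, which is roughly how one might expect a family of basic decision points to differ from one another.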

Describe alternatives you've considered

Choosing probability bands or quantile bands is often met with resistance as being imprecise. Against that argument, we have to ask how many significant digits the "more precise" measure can actually justify. So here, we're making no assertions about whether banding is appropriate to any specific decision model. Instead, we're acknowledging that folks want to be able to use probability- and quantile-based information in their decision models, and we are providing multiple examples of how they might do so.

Additional context

This issue was prompted by a conversation about how to integrate EPSS into an SSVC decision model. The EPSS blog post at https://www.first.org/epss/articles/prob_percentile_bins says the following about binning (quoted in the next four paragraphs):

A third alternative to presenting EPSS probabilities is with categorical (ordinal) labels, such as "fix now / fix later," or "low, medium, high, critical." Bins provide a simple heuristic for users, and bypasses the cognitive effort required to process numerical distributions of values. Heuristics are important and useful mental shortcuts that we employ every day when making decisions, and can also be useful here.

However, there are a number of problems with binning. Bins are, by construction, subjective transformations of, in this case, a cardinal probability scale. And because the bins are subjectively defined, there is room for disagreement and misalignment across different users. There is no universal "right" answer to what the cut off should be between a high, and medium, or medium and low.

Moreover, arbitrary cutoffs force two scores, which may be separated by the tiniest of a value, to be labeled and then handled differently, despite there being no practical difference between them. For example, if two bins are set and the cutoff is set at 0.5, two vulnerabilities with probabilities of 0.499 and 0.501 would be treated just the same as two vulnerabilities with probabilities of 0.001 and 0.999. This kind of range compression is unavoidable and so any benefits from this kind of mental shortcut must be weighed against the information loss inevitable with binning.

For these reasons, EPSS does not currently bin EPSS scores using labels. However, the EPSS SIG always welcomes constructive feedback and suggestions for how to better present this information.

And yes, these are legitimate concerns for any sort of lossy-compression heuristic like binning. Nevertheless, it seems that providing folks with various options for binning might allow them to select something good enough to suit their decision problem.

We also observe that, if one were genuinely concerned about boundary errors and could tolerate the additional model complexity, one might use two decision points in conjunction: one for the bin, and one a boolean that could indicate whether the binned value was near a threshold.
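
A minimal sketch of that two-decision-point idea, in plain Python rather than any SSVC API. The function name `bin_with_boundary_flag` and the `margin` parameter are hypothetical, and the default margin of 0.01 is an arbitrary choice for illustration.

```python
from bisect import bisect_right


def bin_with_boundary_flag(
    p: float, cutoffs: list[float], margin: float = 0.01
) -> tuple[int, bool]:
    """Return (bin_number, near_threshold) for a probability p.

    bin_number is 1-based; near_threshold is True when p lies within
    `margin` of any cutoff. cutoffs must be sorted ascending.
    """
    bin_number = bisect_right(cutoffs, p) + 1
    near_threshold = any(abs(p - c) <= margin for c in cutoffs)
    return bin_number, near_threshold


if __name__ == "__main__":
    # The four probabilities from the EPSS example, with one cutoff at 0.5.
    for p in (0.001, 0.499, 0.501, 0.999):
        print(p, bin_with_boundary_flag(p, [0.5]))
```

With a single cutoff at 0.5, the values 0.499 and 0.501 still land in different bins, but both carry the near-threshold flag, while 0.001 and 0.999 do not. A decision table could then treat flagged values more cautiously, at the cost of an extra decision point.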

Metadata

Labels

  • content/semantic: Changes to the semantic content of the SSVC documentation
  • enhancement: New feature or request
  • tech/backend: Back-end tools, code, infrastructure
  • tech/data: Data implementation (content of /data, data object instances, etc.)
