
Where do I start: Confidence


Introduction

Confidence describes the degree of certainty of a given observation. For instance:

  • I am 80% confident that, as of 2015-03-20T00:00:01Z, example.com is dropping malware
  • I am 90% confident in partner-1's observation that http://example.com/1.html was being used as a phishing url at 2015-03-20T00:01:01Z
  • I am 100% confident that tinyurl.com was observed in a piece of unsolicited commercial email (eg: spam).

One of the primary use cases for confidence is in the generation of threat intelligence feeds. For example, you may want to generate a de-duplicated feed of indicators seen within the last seven days with a confidence of 3.5 or higher, to be used in a network sensor. While judging confidence may be subjective, there's one simple pattern that can narrow down the answer rather quickly:

  1. Would you trust the data's author with root access on your firewall to block something? If no, it's not a 3 or higher.
  2. Is there a better than 50/50 chance (a coin flip) that there's something suspect about the data? If yes, it's a 2 or higher. If no, it's less than a 1 and almost does not matter.
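
As a rough illustration, the two questions above collapse into a tiny decision procedure. This is a minimal sketch of the checklist itself, not part of any tool; the band labels are just a restatement of the rules above.

```python
def confidence_band(trust_author_with_root: bool, likely_suspect: bool) -> str:
    """Map the two vetting questions above onto the 0-4 confidence scale."""
    if trust_author_with_root:
        # You'd let the author block on your firewall: a 3 or higher.
        return "3-4"
    if likely_suspect:
        # Better than a coin flip that the data is suspect: a 2 or higher.
        return "2"
    # Less than a 1; almost does not matter.
    return "0-1"

# eg: data you wouldn't block on, but that's probably suspect:
# confidence_band(trust_author_with_root=False, likely_suspect=True)  # -> "2"
```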

From there, you can very easily get to a 3 or 4, depending on your risk tolerance. With the WDIS Feeds concept, whitelists are used to further reduce the risk of blocking something like google.com. With that in place, a 3 or 4 is generally OK as long as the feeds are extremely specific about the risk (eg: ipv4|ipv6 addresses have a port-list, protocol and timestamp associated with them).
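
To make the feed use case concrete, here's a minimal sketch of the seven-day, confidence-3.5-or-higher feed: filter, whitelist check, de-duplicate. The observation dicts and their field names (indicator, confidence, reporttime) are assumptions for illustration, not the schema or API of any particular tool.

```python
from datetime import datetime, timedelta, timezone

WHITELIST = {"google.com"}  # hypothetical; seed this from your own whitelist feeds

def _ts(value):
    # accept the trailing "Z" used in the examples above
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def build_feed(observations, min_confidence=3.5, days=7):
    """De-duplicated feed of indicators seen within the last `days` days
    with a confidence of `min_confidence` or higher, minus whitelisted entries."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    feed = {}
    for obs in observations:
        if obs["confidence"] < min_confidence:
            continue
        if _ts(obs["reporttime"]) < cutoff:
            continue
        if obs["indicator"] in WHITELIST:
            # whitelisting reduces the risk of blocking something like google.com
            continue
        prev = feed.get(obs["indicator"])
        # de-duplicate: keep one entry per indicator, most recent observation wins
        if prev is None or _ts(obs["reporttime"]) > _ts(prev["reporttime"]):
            feed[obs["indicator"]] = obs
    return sorted(feed.values(), key=lambda o: o["indicator"])
```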

General Scale and Other Rules of Thumb

(4) Certain

  • highly vetted data from known, trusted security professionals
  • vetting relationship has been consistent for more than 2 years
  • very specific data (eg: ip+port+protocol, a specific url, or a malware hash)
  • can typically be used via traffic mitigation processes (null-routing, firewall DROP, etc) with very little risk of collateral damage.

(3) Very Confident

  • vetted data from known, trusted security professionals
  • data that has been vetted by a human or by a set of known and proven processes
  • vetting relationship has been consistent and in place for at least 1 year
  • data feed has been observed for at least a year
  • data should be highly specific (eg: ports/protocols, prefixes should be as narrow as possible)
  • can typically be used via traffic mitigation processes (null-routing, firewall DROP, etc) with very little risk of collateral damage.

(2) Not Confident

  • semi-vetted data from a security professional or a trusted analytics process
  • data that has undergone some machine or human vetting (eg: checked against a whitelist automatically)
  • could be leveraged in traffic mitigation processes (eg: dns sink-holing); carries a slight risk of collateral damage, but one substantially mitigated by a native whitelisting process
  • machine-generated or enumerated data
  • some feeds might fall into this category if the author is lazy or trying to cram too much into the feed
  • examples might include a domains list where the author simply takes a botnet urls list and posts just the domains as a feed
  • carries risk when used in automatic mitigation processes

(0-1) Informational Data

  • machine-generated / enumerated data
  • examples include:
      • auto-enumerated name-servers from domains
      • infrastructure resolved from domain data
  • carries significant risk when used in automatic mitigation processes
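
Putting the scale to work, here's one hedged way to map a confidence level onto a mitigation decision, following the rules of thumb above (mitigate at 3-4, sinkhole at 2, log-only below that). The action names are illustrative, not tied to any particular sensor.

```python
def mitigation_action(confidence: float) -> str:
    """Suggest a mitigation for an indicator, per the 0-4 scale above."""
    if confidence >= 3:
        # (3)-(4): vetted, highly specific data; low collateral-damage risk
        return "block"  # null-routing, firewall DROP, etc
    if confidence >= 2:
        # (2): semi-vetted; dns sink-holing, with whitelists as a backstop
        return "sinkhole"
    # (0)-(1): informational / enumerated data; never auto-mitigate
    return "log-only"
```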