This is the data and the notebooks (code + documentation) for the analysis that is part of the EPSS Applied Guide.
Data Source | Detail | Approx. CVE count (thousands) | Directory |
---|---|---|---|
CISA KEV | Active Exploitation | 1 | cisa_kev |
EPSS | Predictor of Exploitation | 220 | epss |
Metasploit modules | Weaponized Exploit | 3 | metasploit |
Nuclei templates | Weaponized Exploit | 2 | nuclei |
ExploitDB | Published Exploit Code | 25 | exploitdb |
NVD CVE Data | NVD CVEs | 220 | nvd |
Qualys TruRisk Report | The 2023 Qualys TruRisk research report lists 190 CVEs from 2022 with QVS scores | 0.2 | qualys |
Microsoft Security Response Center (MSRC) | CVEs Exploited and with Exploitability Assessment | 0.2 | msrc |
- get_data.sh gets the data that can be downloaded automatically and used as-is.
- Other data is manually downloaded; see the instructions below.
  - MSRC
  - ExploitDB
  - GPZ
- Larger files are gzip'd
- A date.txt file is included in each data folder and records the date of download.
Get NVD data automatically
- A notebook or script in nvd downloads the NVD data.
- The data is output to data_out/CVSSData.csv.gz
- Note: The download method used will be deprecated some time after Dec 2023 per https://nvd.nist.gov/vuln/data-feeds
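As an illustration of the download-and-flatten step, here is a minimal sketch. It is not the notebook or script in nvd: it assumes the legacy NVD JSON 1.1 feed layout (`CVE_Items`, `CVE_data_meta`, `baseMetricV3`), the output column names are hypothetical, and the sample records are made up. It parses an in-memory feed instead of downloading one, then writes a gzip'd CSV.

```python
import csv
import gzip
import json
import os
import tempfile

# Made-up records in the assumed legacy JSON 1.1 feed layout.
SAMPLE_FEED = json.dumps({
    "CVE_Items": [
        {
            "cve": {"CVE_data_meta": {"ID": "CVE-0000-0001"}},
            "impact": {"baseMetricV3": {"cvssV3": {
                "baseScore": 9.8,
                "vectorString": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
            }}},
            "publishedDate": "2000-01-01T00:00Z",
        },
        {
            # Some entries may lack a CVSS v3 block.
            "cve": {"CVE_data_meta": {"ID": "CVE-0000-0002"}},
            "impact": {},
            "publishedDate": "2000-01-02T00:00Z",
        },
    ]
})


def flatten(feed_text):
    """Turn feed items into flat rows; column names are hypothetical."""
    rows = []
    for item in json.loads(feed_text)["CVE_Items"]:
        cvss = item.get("impact", {}).get("baseMetricV3", {}).get("cvssV3", {})
        rows.append({
            "cve": item["cve"]["CVE_data_meta"]["ID"],
            "published": item.get("publishedDate", ""),
            "cvss_v3_score": cvss.get("baseScore", ""),
            "cvss_v3_vector": cvss.get("vectorString", ""),
        })
    return rows


def write_csv_gz(rows, path):
    """Write rows to a gzip-compressed CSV file."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


rows = flatten(SAMPLE_FEED)
# A temporary directory stands in for data_out/ in this sketch.
out_path = os.path.join(tempfile.mkdtemp(), "CVSSData.csv.gz")
write_csv_gz(rows, out_path)
```

Because the feed-based method is due to be retired, any real implementation would need to confirm the current NVD download options first.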
See the 0day "In the Wild" Google Sheet
- Select the "All" tab.
- Choose File, then Download as CSV.
- Go to https://msrc.microsoft.com/update-guide/vulnerability
- Edit columns: ensure that the "Exploitability Assessment" and "Exploited" columns are selected
- Download
The CVE data was extracted from the Qualys TruRisk Report PDF using standard tools like sed. This data is static so a date.txt is not included.
- Download https://gitlab.com/exploit-database/exploitdb/-/blob/main/files_exploits.csv (manually for now - credentials required for automation)
- Extract the CVEs using the script in the directory. Some entries have no CVE and list only Open Source Vulnerability Database (OSVDB) entries instead.
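The extraction step can be sketched as follows. This is a minimal illustration, not the script in the directory: it assumes the CSV has a `codes` column holding semicolon-separated identifiers such as `CVE-...` and `OSVDB-...`, and the sample rows are made up.

```python
import csv
import io
import re

# Matches CVE identifiers: a 4-digit year and a sequence of 4 or more digits.
CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)

# Made-up sample; the column names are an assumption about the real file.
SAMPLE = """id,codes
1,CVE-0000-0001;OSVDB-11111
2,OSVDB-22222
3,cve-0000-0002;CVE-0000-0003
4,
"""


def extract_cves(csv_text):
    """Return a sorted list of unique, upper-cased CVE IDs found in `codes`."""
    found = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Rows with only OSVDB entries, or no codes at all, contribute nothing.
        for match in CVE_RE.findall(row.get("codes") or ""):
            found.add(match.upper())
    return sorted(found)


cves = extract_cves(SAMPLE)
print(cves)
```

Normalizing to upper case and de-duplicating keeps the result easy to join against the NVD CVE list later.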
Other data sources to consider - these are not currently used here:
- https://github.com/trickest/cve for a list of CVE PoCs
- enrich_cves.ipynb
- Take the data sources from data_in/
- Enrich the CVE data from NVD with the other data sources
- Add an "Exploit" column to indicate the source of the exploitability (used later to set colors of CVE data in plots)
- Store the output in data_out/nvd_cves_v3_enriched.csv.gz
- kev_epss_cvss.ipynb
- Read the enriched CVE data from data_out/CVSSData_enriched.csv.gz
- Read the data from CISA KEV alert reports in ./data_in/cisa_kev/
- Plot CISA KEV datasets showing EPSS, CVSS by source of the exploitability
- Write data_out/cisa_kev/csa/csa.csv.gz, which is the CISA KEV Cybersecurity Alerts (CSA) subset with EPSS and other data
- qualys.ipynb
- Read the enriched CVE data from data_out/CVSSData_enriched.csv.gz
- Read the data from ./data_in/qualys
- Plot Qualys dataset showing EPSS, CVSS by source of the exploitability
- Write data_out/qualys/qualys.csv.gz which is the Qualys data with EPSS and other data
- msrc.ipynb
- Read the enriched CVE data from data_out/CVSSData_enriched.csv.gz
- Read the data from ./data_in/msrc
- Plot the Microsoft Exploitability Index (MSEI) dataset showing EPSS, CVSS by source of the exploitability
- Write data_out/msrc/msrc.csv.gz, which is the MSEI data with EPSS and other data
- DT_from_scratch.ipynb
- Read the enriched CVE data from data_out/CVSSData_enriched.csv.gz
- Read the Decision Tree definition cisa_ssvc_dt/DT_rbp.csv
- Define the Decision Logic for the Decision Nodes
- Calculate the Decision Node Values for all CVEs
- Do some Exploratory Data Analysis with Venn Diagrams to understand our data
- Calculate the Output Decision from the Decision Node Values
- Plot the flow of all CVEs across the Decision Tree as a Sankey diagram
- Read the Sankey Diagram template definition cisa_ssvc_dt/DT_sankey.csv
- Triage some CVEs
- Read a list of CVEs to triage cisa_ssvc_dt/triage/cves2triage.csv
- Get Decisions
- Plot
- DT_analysis.ipynb
- Read the Decision Tree definition cisa_ssvc_dt/DT_rbp.csv
- Compute feature importance using two methods
  - Permutation Importance
  - Drop-column Importance
See CERTCC/SSVC#309 for the suggestion to add drop column importance to CISA SSVC.
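The two importance methods can be illustrated with a small, self-contained sketch. It is not the code in DT_analysis.ipynb: the decision nodes, their values, and the `decide` logic are hypothetical stand-ins for the contents of DT_rbp.csv, and a majority-vote lookup table stands in for whatever model the notebook trains. Permutation importance shuffles one column and measures the drop in accuracy of the fitted model; drop-column importance refits the model without that column and measures the drop.

```python
import random
from collections import Counter, defaultdict
from itertools import product

# Hypothetical decision nodes and values (not the contents of DT_rbp.csv).
FEATURES = ["exploitation", "automatable", "impact"]
VALUES = {
    "exploitation": ["none", "poc", "active"],
    "automatable": ["no", "yes"],
    "impact": ["low", "high"],
}


def decide(row):
    """Toy decision logic standing in for the real tree."""
    if row["exploitation"] == "active":
        return "act" if row["impact"] == "high" else "attend"
    if row["exploitation"] == "poc" and row["automatable"] == "yes":
        return "attend"
    return "track"


# Enumerate every combination of node values, as a decision table does.
rows = [dict(zip(FEATURES, c)) for c in product(*(VALUES[f] for f in FEATURES))]
labels = [decide(r) for r in rows]


def fit(rows, labels, features):
    """Fit a lookup-table model: majority label per combination of `features`."""
    votes = defaultdict(Counter)
    for r, y in zip(rows, labels):
        votes[tuple(r[f] for f in features)][y] += 1
    table = {k: c.most_common(1)[0][0] for k, c in votes.items()}
    return lambda r: table[tuple(r[f] for f in features)]


def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, feature, repeats=20, seed=0):
    """Mean accuracy drop when `feature` is shuffled across rows."""
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(base - accuracy(model, permuted, labels))
    return sum(drops) / repeats


def drop_column_importance(rows, labels, features, feature):
    """Accuracy drop when the model is refit without `feature`."""
    base = accuracy(fit(rows, labels, features), rows, labels)
    reduced = [f for f in features if f != feature]
    return base - accuracy(fit(rows, labels, reduced), rows, labels)


model = fit(rows, labels, FEATURES)
for f in FEATURES:
    print(f,
          round(permutation_importance(model, rows, labels, f), 3),
          round(drop_column_importance(rows, labels, FEATURES, f), 3))
```

Drop-column importance is more expensive because it refits once per feature, but it reflects how much the decisions actually depend on a node rather than how a fixed model reacts to shuffled inputs.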