The VERIS Community Database

Information sharing is a complex and challenging undertaking. If done correctly, everyone involved benefits from the collective intelligence. If done poorly, it may mislead participants or create a learning opportunity for our adversaries. The Verizon RISK Team supports and participates in a variety of information sharing initiatives and research efforts. We continue to drive the publication of the Verizon Data Breach Investigations Report (DBIR) annually, where we have an unprecedented number of new data-sharing partners, and we are committed to keeping the report publicly available and free to download. We regularly receive inquiries about our dataset, and our ability to share further, but we are limited in what data we can share in raw format due to agreements with our partners and customers.

The Problem

While there are a handful of efforts to capture security incidents that are publicly disclosed, there is no unrestricted, comprehensive raw dataset available for download on security incidents that is sufficiently rich to support both community research and corporate decision-making. There are organizations that collect—and in some form—disseminate aggregated collections, but they are either not in a format that lends itself to ease of data manipulation and transformation required for research, or the underlying data are not freely and publicly available for use. This gap has long hampered researchers who are studying the problems surrounding security incidents, as well as the risk managers who are starved for reliable data upon which to base their risk calculations.

Getting Involved

If you want to get involved in this project, we have directions in the wiki for this repo. If you are new to GitHub, it is the book icon to the top of this page section.

WARNING ON SAMPLING

Most VCDB issues are chosen randomly (with a preferences for those in the last year), however we specifically select healthcare issues and some priority incidents. Incidents not chosen randomly can be identified by the value of 'plus.sub_source'. It will be 'phidbr' for healthcare issues and 'priority' for priority issues. For those wishing to normalize out non-random selection, here is the issue composition as of Jan 13, 2018 to normalize the actuall incidents to:

{
'2013': {'all': 1199, 'phidbr': 0, 'priority': 11},
'2014': {'all': 3885, 'phidbr': 30, 'priority': 113},
'2015': {'all': 1844, 'phidbr': 197, 'priority': 47},
'2016': {'all': 1996, 'phidbr': 516, 'priority': 75},
'2017': {'all': 1826, 'phidbr': 455, 'priority': 75},
'2018': {'all': 28, 'phidbr': 8, 'priority': 1}
}

VCDB Statistics

As of Jan 13, 2018

vcdb %>%
    dplyr::group_by(attribute.confidentiality.data_disclosure.Yes) %>%
    dplyr::count(timeline.incident.year) %>%
    dplyr::ungroup() %>%
    dplyr::rename(breach = attribute.confidentiality.data_disclosure.Yes) %>%
    dplyr::mutate(breach = ifelse(breach, "Breach", "Incident")) %>%
  ggplot2::ggplot() + 
    ggplot2::geom_bar(ggplot2::aes(x=timeline.incident.year, y=n, group=breach, fill=breach), stat="identity") + 
    ggplot2::labs(title="VCDB Breaches and Incidents by Incident Year", x="Count", y="Year") +
    ggplot2::scale_x_continuous(expand=c(0,0), limits=c(2003, 2018)) +
    ggplot2::scale_y_continuous(expand=c(0,0)) +
    ggplot2::scale_fill_brewer() +
    ggplot2::theme_minimal() +
    ggplot2::theme(
        panel.grid.major.x = ggplot2::element_blank(),
        panel.grid.minor.x = ggplot2::element_blank(),
        panel.grid.minor.y = ggplot2::element_blank()
    )

vcdb %>%
    verisr::getenumCI("action", by="asset.variety") %>%
    dplyr::filter(!is.na(n)) %>%
    dplyr::mutate(by = stringr::str_sub(by, 15)) %>%
  ggplot2::ggplot() +
    ggplot2::geom_tile(ggplot2::aes(x=enum, y=by, fill=x)) +
    ggplot2::geom_text(ggplot2::aes(x=enum, y=by, label=x)) +
    ggplot2::scale_fill_gradient2() +
    ggplot2::theme_void() +
    ggplot2::theme(
        axis.text = ggplot2::element_text(),
        axis.text.x = ggplot2::element_text(hjust=1, angle=90)
    )

Index

vcdb_diff.json - An update to the verisc.json schema file to produce the schema file used for the vcdb
vcdb_diff-labels.json - An update to the verisc-labels.json labels file to produce the vcdb labels file
vcdb.json - The vcdb schema file
vcdb-labels.json - The vcdb labels file
vcdb-merged.json - The full schema, combining the schema file and enumerations from the labels file.
vcdb-enum.json - A json file containing just the enumerations from the schema.
vcdb-keynames-real.txt - A text file containing the keys in the vcdb schema.

Name		Name	Last commit message	Last commit date
Latest commit History 751 Commits
bin		bin
data		data
figure		figure
tools		tools
.gitignore		.gitignore
LICENSE.txt		LICENSE.txt
NEWS.md		NEWS.md
README.Rmd		README.Rmd
README.md		README.md
VCDB_Standard_Excel.xlsx		VCDB_Standard_Excel.xlsx
campaigns.md		campaigns.md
keynames-real.txt		keynames-real.txt
vcdb-enum.json		vcdb-enum.json
vcdb-keynames-real.txt		vcdb-keynames-real.txt
vcdb-labels.json		vcdb-labels.json
vcdb-merged.json		vcdb-merged.json
vcdb.json		vcdb.json
vcdb_diff-labels.json		vcdb_diff-labels.json
vcdb_diff.json		vcdb_diff.json

License

vz-risk/VCDB

Folders and files

Latest commit

History

Repository files navigation

The VERIS Community Database

The Problem

Getting Involved

WARNING ON SAMPLING

VCDB Statistics

Index

About

Resources

License

Stars

Watchers

Forks

Languages