# CVE Mitre Introduction

Vulnerability data is available from three sources: **CVE Mitre, NVD and CVE Details**. Entries are created and annotated by these sources in that order. The Common Vulnerabilities and Exposures (CVE) list by Mitre was launched in 1999, when most information security tools used their own databases with their own names for security vulnerabilities; it documents known vulnerabilities manually for public use. Each vulnerability contains a description, is uniquely identified by a CVE ID, and may also include fields specifying the affected software, versions and vendors. Similar vulnerabilities that occur in different software can have different CVE IDs but share the same weakness ID (CWE ID). When created by CVE Mitre, a vulnerability may or may not be annotated with a CWE ID; when available, CWE IDs can be used to group similar vulnerabilities conceptually and to observe how they have been ‘instantiated’ in different software, versions or vendors.

CVE Mitre’s vulnerabilities are then annotated with severity scores, fix information and impact ratings in the National Vulnerability Database (NVD), and made available for download as XML feeds. CVE Details was created to provide a user-friendly interface to NVD’s XML feeds. For instance, using vulnerabilities’ CWE IDs and keyword matching, it defines 13 vulnerability types to facilitate browsing. Since CVE Details warns about inconsistencies in the NVD XML feeds (e.g. the same vendor’s software having different names) and about entries irrelevant to our purposes (i.e. reserved, duplicate and removed entries), we downloaded all software vulnerabilities to date from the three sources to define our vulnerability dataset and ensure consistency.

## Motivation

The CVE Mitre database records the references (sources) of each vulnerability. There are various sources, and for each reference the database provides the URL it was reported from and a description (with an associated ID). It is important to identify the right sources of vulnerabilities; this notebook aims to help choose the sources and filter the chosen ones into a new file.

# Method

The files provided by the CVE Mitre website are in CVRF (XML) format and can be found at http://cve.mitre.org/data/downloads/index.html . The XML schema nests tables within a table. We parse the tree to reach the required child nodes and match their contents against regular expressions. This lets us extract the right fields and write them to a file (file1). The remaining, unfiltered sources are written to another file (file2), from which they can be fetched if they are later deemed relevant.
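The traversal described above can be sketched on a small in-memory document. The XML fragment below is a hand-written, hypothetical sample that only mimics the Vulnerability / CVE / References / Reference nesting used in the cells that follow; the root element name, the CVE ID and the URLs are placeholders, not real data. The helper name `list_references` is also an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace prefix for the CVRF vuln schema (same value as `table_root` in the notebook).
VULN_NS = "{http://www.icasi.org/CVRF/schema/vuln/1.1}"

# Hypothetical sample that mimics the nesting only; not real CVE data.
SAMPLE_XML = """\
<doc xmlns:v="http://www.icasi.org/CVRF/schema/vuln/1.1">
  <v:Vulnerability>
    <v:CVE>CVE-0000-0001</v:CVE>
    <v:References>
      <v:Reference>
        <v:URL>http://example.com/bid/1</v:URL>
        <v:Description>BID:1</v:Description>
      </v:Reference>
      <v:Reference>
        <v:URL>http://example.com/advisory</v:URL>
        <v:Description>CONFIRM:http://example.com/advisory</v:Description>
      </v:Reference>
    </v:References>
  </v:Vulnerability>
</doc>
"""

def list_references(xml_text):
    """Return (cve_id, description, url) for every Reference in the document."""
    root = ET.fromstring(xml_text)
    rows = []
    for vul in root.findall(VULN_NS + "Vulnerability"):
        cve_id = vul.find(VULN_NS + "CVE").text
        references = vul.find(VULN_NS + "References")
        if references is None:
            # Some vulnerabilities may carry no references at all.
            continue
        for ref in references.findall(VULN_NS + "Reference"):
            rows.append((cve_id,
                         ref.find(VULN_NS + "Description").text,
                         ref.find(VULN_NS + "URL").text))
    return rows

for row in list_references(SAMPLE_XML):
    print(row)
```

Every tag lookup has to carry the namespace in braces, which is why the notebook keeps the prefix in a single variable and concatenates it with each tag name.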

In [1]:
#import Element tree for parsing xml
from xml.etree.ElementTree import ElementTree 
import csv
import re
import glob

#namespace prefix that qualifies every element of the CVRF vuln schema
table_root = "{http://www.icasi.org/CVRF/schema/vuln/1.1}"

In [2]:
#creating variables for regex search (raw strings keep \d from being read as a string escape)
BID_regex = r"BID:(\d+)"
SECTRACK_regex = r"SECTRACK:(\d+)"
MS_regex = r"MS:[A-Z]*[0-9]*[-][0-9]*"

#creating headers for file
header_file1= ["CVE ID","BID Description","BID Url","SECTRACK Description","SECTRACK Url","MS Description","MS Url"]
header_file2= ["CVE ID","Reference Description","Reference Url"]

#creating lists to hold the rows that will be written into each file
file1_data = []
file2_data = []
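
To see how the three patterns separate reference descriptions, here is a minimal sketch that tries them in order and reports the first match. The `classify` helper and the sample strings are assumptions made for illustration; the strings only imitate the `SOURCE:identifier` shape and are not taken from the dataset.

```python
import re

# Same patterns as in the cell above, written here as raw strings.
PATTERNS = [
    ("BID", r"BID:(\d+)"),
    ("SECTRACK", r"SECTRACK:(\d+)"),
    ("MS", r"MS:[A-Z]*[0-9]*[-][0-9]*"),
]

def classify(description):
    """Return the label of the first pattern that matches, or None."""
    for label, pattern in PATTERNS:
        if re.search(pattern, description):
            return label
    return None

# Illustrative description strings (assumed shapes, not real entries).
for text in ["BID:12345", "SECTRACK:1000001", "MS:MS04-011", "XF:example-overflow(1)"]:
    print(text, "->", classify(text))
```

Because `re.search` matches anywhere in the string, a description that merely contains `BID:` followed by digits is also classified as a BID reference; anchoring the patterns with `^` would be stricter if that matters.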

In [3]:
#write the collected references into a CSV file (Python 3)
def write_file(filename, data, header):
    #utf-8 encoding avoids UnicodeEncodeError on non-ASCII text;
    #newline='' is what the csv module expects for files it writes
    with open(filename, 'w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=header)
        writer.writeheader()
        for value in data:
            writer.writerow(value)
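
Each row collected for file1 fills only the columns of the source it matched, so it helps to see how `csv.DictWriter` handles the missing keys. The sketch below writes two partial rows into an in-memory buffer; the shortened header, the `rows_to_csv` helper and the row values are placeholders chosen for illustration.

```python
import csv
import io

# Shortened header for the example; the notebook's header_file1 has more columns.
header = ["CVE ID", "BID Description", "BID Url", "MS Description", "MS Url"]

# Placeholder rows: each one carries only the columns of its own source.
rows = [
    {"CVE ID": "CVE-0000-0001", "BID Description": "BID:1",
     "BID Url": "http://example.com/bid/1"},
    {"CVE ID": "CVE-0000-0002", "MS Description": "MS:MS00-001",
     "MS Url": "http://example.com/ms/1"},
]

def rows_to_csv(rows, header):
    """Serialize dict rows to CSV text; keys absent from a row become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(rows_to_csv(rows, header))
```

Missing keys are filled with the writer's `restval`, which defaults to an empty string, so the partially filled dictionaries need no padding before they are written.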

In [4]:
def reference_sort(data):
    #data[0] holds the CVE element, data[1] holds its References element
    if data[1] is None:
        return
    sources = (("BID", BID_regex), ("SECTRACK", SECTRACK_regex), ("MS", MS_regex))
    for child in data[1].findall(table_root + "Reference"):
        #Description text is None when the element is empty; re.search needs a string
        description = child.find(table_root + "Description").text or ""
        url = child.find(table_root + "URL").text
        for name, regex in sources:
            #re.search(regex,text)
            if re.search(regex, description):
                file1_data.append({"CVE ID": data[0].text,
                                   name + " Url": url,
                                   name + " Description": description})
                break
        else:
            #no selected source matched: keep the reference in the unfiltered list
            file2_data.append({"CVE ID": data[0].text,
                               "Reference Description": description,
                               "Reference Url": url})

In [5]:
#perform reference sort for every vulnerability, then write the results into the files
def module_runner(CVE_tree):
    vulnerabilities = CVE_tree.findall(table_root+"Vulnerability")
    print("Vulnerability data count : " + str(len(vulnerabilities)))
    for v_counter, vul in enumerate(vulnerabilities):
        print("Vulnerability index: " + str(v_counter))
        reference_sort((vul.find(table_root + "CVE"),vul.find(table_root + "References")))
    #write once per parsed file instead of rewriting both files after every vulnerability
    write_file('file1.csv', file1_data, header_file1)
    write_file('file2.csv', file2_data, header_file2)

In [6]:
#call module runner to perform parsing
for filename in glob.glob("CVE_XML/*.xml"):
    print("parsing file " + str(filename) + "....")
    CVE_tree = ElementTree()
    CVE_tree.parse(filename)
    module_runner(CVE_tree)

parsing file CVE_XML\allitems-cvrf-year-2004.xml....
Vulnerability data count : 2778
Vulnerability index: 0
Vulnerability index: 1
Vulnerability index: 2
Vulnerability index: 3
Vulnerability index: 4
Vulnerability index: 5
Vulnerability index: 6
Vulnerability index: 7
Vulnerability index: 8
Vulnerability index: 9
Vulnerability index: 10
Vulnerability index: 11
Vulnerability index: 12
Vulnerability index: 13
Vulnerability index: 14
Vulnerability index: 15
Vulnerability index: 16
Vulnerability index: 17
Vulnerability index: 18
Vulnerability index: 19
Vulnerability index: 20
Vulnerability index: 21
Vulnerability index: 22
Vulnerability index: 23
Vulnerability index: 24
Vulnerability index: 25
Vulnerability index: 26
Vulnerability index: 27
Vulnerability index: 28
Vulnerability index: 29
Vulnerability index: 30
Vulnerability index: 31
Vulnerability index: 32
Vulnerability index: 33
Vulnerability index: 34
Vulnerability index: 35
Vulnerability index: 36
Vulnerability index: 37
Vulnerability

KeyboardInterrupt: 