# Working with Known IoT-Related CVEs: A New Direction
This notebook is dedicated to cleaning up the MITRE list of known IoT-related CVEs, creating a dataframe from nation-state attack data, and merging both of these with a cleaned-up version of the CVE data aggregated in the `APT_IoT_CVE_EDA` notebook. The resulting dataset is then saved in both CSV and Parquet formats for easy reading and preprocessing.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
sheet_names = pd.ExcelFile('../data/MITRE/MITRE_2024_IoT_CVEs.xlsx').sheet_names
print(f'The MITRE IoT CVEs Excel spreadsheet has the following sheets: {sheet_names}')

The MITRE IoT CVEs Excel spreadsheet has the following sheets: ['2024 IoT CVEs', '2020-2024 CVEs', '2019-2024 CVEs']


In [3]:
df2019_2024 = pd.read_excel('../data/MITRE/MITRE_2024_IoT_CVEs.xlsx', sheet_name='2019-2024 CVEs')

In [4]:
df2020_2024 = pd.read_excel('../data/MITRE/MITRE_2024_IoT_CVEs.xlsx', sheet_name='2020-2024 CVEs')

In [5]:
df2024 = pd.read_excel('../data/MITRE/MITRE_2024_IoT_CVEs.xlsx', sheet_name='2024 IoT CVEs')

In [6]:
def shape_of(name, df):
    rows, cols = df.shape  # (number of rows, number of columns)
    print(f'"{name}" has {rows} rows and {cols} columns.')

shape_of('2019-2024 CVEs', df2019_2024)
shape_of('2020-2024 CVEs', df2020_2024)
shape_of('2024 IoT CVEs', df2024)

"2019-2024 CVEs" has 1088 rows and 2 columns.
"2020-2024 CVEs" has 714 rows and 2 columns.
"2024 IoT CVEs" has 24 rows and 2 columns.


In [7]:
df2019_2024.head(3)

|   | CVE-2024-38089 | Microsoft Defender for IoT Elevation of Privilege Vulnerability |
|---|---|---|
| 0 | CVE-2024-29195 | The azure-c-shared-utility is a C library for ... |
| 1 | CVE-2024-29055 | Microsoft Defender for IoT Elevation of Privil... |
| 2 | CVE-2024-29054 | Microsoft Defender for IoT Elevation of Privil... |


In [8]:
df2020_2024.head(3)

|   | CVE-2024-38089 | Microsoft Defender for IoT Elevation of Privilege Vulnerability |
|---|---|---|
| 0 | CVE-2024-29195 | The azure-c-shared-utility is a C library for ... |
| 1 | CVE-2024-29055 | Microsoft Defender for IoT Elevation of Privil... |
| 2 | CVE-2024-29054 | Microsoft Defender for IoT Elevation of Privil... |


In [9]:
df2024.head(3)

|   | 2024: MITRE - IoT CVEs | Unnamed: 1 |
|---|---|---|
| 0 | CVE-2024-38089 | Microsoft Defender for IoT Elevation of Privil... |
| 1 | CVE-2024-29195 | The azure-c-shared-utility is a C library for ... |
| 2 | CVE-2024-29055 | Microsoft Defender for IoT Elevation of Privil... |


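In `df2019_2024` and `df2020_2024`, pandas read the first observation (`CVE-2024-38089`) as the header row, so it has to be pushed back down into the data and the columns given accurate names. The header of `df2024` is a title row (`2024: MITRE - IoT CVEs`), so that dataframe only needs its columns renamed.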
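An alternative, sketched under the assumption that the first two sheets have no header row at all: passing `header=None` together with `names` to the reader keeps the first CVE as data, so no push-down step is needed. To stay self-contained, the sketch uses `pd.read_csv` on an in-memory string; `pd.read_excel` accepts the same `header` and `names` parameters.

```python
import io

import pandas as pd

# Two headerless rows standing in for a MITRE sheet (values copied from the output above)
raw = io.StringIO(
    "CVE-2024-38089,Microsoft Defender for IoT Elevation of Privilege Vulnerability\n"
    "CVE-2024-29055,Microsoft Defender for IoT Elevation of Privilege Vulnerability\n"
)

# header=None: treat every row as data; names: supply the column names directly
sample = pd.read_csv(raw, header=None, names=['cve_id', 'description'])
print(sample.shape)  # (2, 2)
```

Applied to the workbook, the call would presumably be `pd.read_excel('../data/MITRE/MITRE_2024_IoT_CVEs.xlsx', sheet_name='2019-2024 CVEs', header=None, names=['cve_id', 'description'])`, with `skiprows=1` added for the `2024 IoT CVEs` sheet to skip its title row.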

In [10]:
def add_cols_as_obs(df):
    """Return a copy of df with its (misread) column names inserted as the first row."""
    header_row = pd.DataFrame([df.columns.tolist()], columns=df.columns)  # Current column names as a one-row frame
    return pd.concat([header_row, df], ignore_index=True)  # Prepend that row and renumber the index from 0

df2019_2024 = add_cols_as_obs(df2019_2024)
df2020_2024 = add_cols_as_obs(df2020_2024)

In [11]:
# Rename columns
df2019_2024 = df2019_2024.rename(columns={
    'CVE-2024-38089': 'cve_id',
    'Microsoft Defender for IoT Elevation of Privilege Vulnerability': 'description'
})

df2020_2024 = df2020_2024.rename(columns={
    'CVE-2024-38089': 'cve_id',
    'Microsoft Defender for IoT Elevation of Privilege Vulnerability': 'description'
})

df2024 = df2024.rename(columns={
    '2024: MITRE - IoT CVEs': 'cve_id',
    'Unnamed: 1': 'description'
})

In [12]:
df2019_2024.head(3) 

|   | cve_id | description |
|---|---|---|
| 0 | CVE-2024-38089 | Microsoft Defender for IoT Elevation of Privil... |
| 1 | CVE-2024-29195 | The azure-c-shared-utility is a C library for ... |
| 2 | CVE-2024-29055 | Microsoft Defender for IoT Elevation of Privil... |


In [13]:
df2020_2024.head(3)

|   | cve_id | description |
|---|---|---|
| 0 | CVE-2024-38089 | Microsoft Defender for IoT Elevation of Privil... |
| 1 | CVE-2024-29195 | The azure-c-shared-utility is a C library for ... |
| 2 | CVE-2024-29055 | Microsoft Defender for IoT Elevation of Privil... |


In [14]:
df2024.head(3)

|   | cve_id | description |
|---|---|---|
| 0 | CVE-2024-38089 | Microsoft Defender for IoT Elevation of Privil... |
| 1 | CVE-2024-29195 | The azure-c-shared-utility is a C library for ... |
| 2 | CVE-2024-29055 | Microsoft Defender for IoT Elevation of Privil... |


In [15]:
print(f'"df2019-2024" has {df2019_2024.duplicated().sum()} duplicate observations.')
print(f'"df2020-2024" has {df2020_2024.duplicated().sum()} duplicate observations.')
print(f'"df2024" has {df2024.duplicated().sum()} duplicate observations.')

"df2019-2024" has 0 duplicate observations.
"df2020-2024" has 0 duplicate observations.
"df2024" has 0 duplicate observations.


In [16]:
print(f'"df2019-2024" has {df2019_2024.isna().sum().sum()} null values.')
print(f'"df2020-2024" has {df2020_2024.isna().sum().sum()} null values.')
print(f'"df2024" has {df2024.isna().sum().sum()} null values.')

"df2019-2024" has 0 null values.
"df2020-2024" has 0 null values.
"df2024" has 0 null values.


In [17]:
# Check whether the smaller datasets exist in the larger datasets
df2019_2024_set = set([tuple(row) for row in df2019_2024.values])
df2020_2024_set = set([tuple(row) for row in df2020_2024.values])
df2024_set = set([tuple(row) for row in df2024.values])

print(f'All observations in "df2020_2024" appear in df2019_2024: {df2020_2024_set.issubset(df2019_2024_set)}')
print(f'All observations in "df2024" appear in df2020_2024: {df2024_set.issubset(df2020_2024_set)}')

All observations in "df2020_2024" appear in df2019_2024: True
All observations in "df2024" appear in df2020_2024: True


Every observation in `df2024` appears in `df2020_2024`, and every observation in `df2020_2024` appears in `df2019_2024`, so the largest dataset, `df2019_2024`, is the only one we need going forward.
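The set-of-tuples check above can also be expressed as a merge. In the sketch below, `DataFrame.merge(..., indicator=True)` tags each row of the smaller frame with where it was found, and any row tagged `left_only` is missing from the larger frame. The helper name `is_subset` and the toy frames are hypothetical stand-ins for `df2020_2024` and `df2019_2024`.

```python
import pandas as pd

def is_subset(small: pd.DataFrame, large: pd.DataFrame) -> bool:
    """Return True if every row of `small` also appears in `large` (all shared columns compared)."""
    # drop_duplicates on `large` so the merge cannot multiply rows of `small`
    merged = small.merge(large.drop_duplicates(), how='left', indicator=True)
    return bool((merged['_merge'] == 'both').all())

# Hypothetical stand-ins
large = pd.DataFrame({'cve_id': ['CVE-A', 'CVE-B', 'CVE-C'], 'description': ['a', 'b', 'c']})
small = pd.DataFrame({'cve_id': ['CVE-B'], 'description': ['b']})
print(is_subset(small, large))  # True
print(is_subset(large, small))  # False
```

Unlike the set comparison, the merged frame also shows *which* rows are missing, which is handy when a check comes back `False`.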

## Nation-State Attack Data
Next, I'll build a dataframe from the nation-state attack information we've consolidated, then merge it with the IoT CVE data.

In [18]:
# Create nation-state attack dataframe
nsa = {
    'attack': [
        'Mirai Botnet',
        'VPNFilter',
        'Triton/Trisis',
        'Iranian Cyberattacks on Water Systems',
        'Iranian APT Exploits on Fortinet Vulnerabilities',
        'Operation Shadowhammer',
        'Ripple20 Vulnerabilities',
        'Dragonfly/Energetic Bear Campaign 1',
        'Dragonfly/Energetic Bear Campaign 2',
        'Stuxnet',
        'Heartbleed Exploits',
        'BlackEnergy Attack on Ukraine',
        'Microsoft Exchange ProxyShell Exploits',
        'F5 BIG-IP Exploits',
        'Pulse Secure VPN Exploits',
        'Equifax Data Breach',
        'SolarWinds Orion Supply Chain Attack',
        'Not Petya Ransomware Attack',
        'WannaCry Ransomware Attack'
    ],
    'year_start': [
        2016,
        2018,
        2017,
        2020,
        2021,
        2018,
        2020,
        2013,
        2017,
        2018,
        2014,
        2015,
        2021,
        2021,
        2019,
        2017,
        2020,
        2017,
        2017
    ],
    'year_end': [
        2016,
        2018,
        2017,
        2020,
        2021,
        2019,
        2020,
        2014,
        2017,
        2018,
        2014,
        2015,
        2021,
        2021,
        2021,
        2017,
        2020,
        2017,
        2017
    ],
    'attribution_group': [
        pd.NA,
        'APT28 (Fancy Bear)',
        pd.NA,
        pd.NA,
        pd.NA,
        'APT41',
        pd.NA,
        'Dragonfly (Energetic Bear)',
        'Dragonfly (Energetic Bear)',
        pd.NA,
        pd.NA,
        'Sandworm',
        pd.NA,
        pd.NA,
        'APT5',
        'APT10',
        'APT29 (Cozy Bear)',
        'Sandworm',
        'Lazarus'
    ],
    'attribution_state': [
        [pd.NA],
        ['Russia', 'Russia', 'Russia', 'Russia', 'Russia'],
        ['Russia', 'Russia'],
        ['Iran'],
        ['Iran'],
        ['China'],
        [pd.NA, pd.NA, pd.NA, pd.NA],
        ['Russia'],
        ['Russia'],
        ['US', 'Israel'],
        ['China'],
        ['Russia'],
        ['China', 'China', 'China'],
        ['Russia', 'China'],
        ['China'],
        ['China'],
        ['Russia'],
        ['Russia'],
        ['DPRK']
    ],
    'cve_id': [
        [pd.NA],
        [
            'CVE-2018-14847',
            'CVE-2017-12074',
            'CVE-2018-10561',
            'CVE-2018-10562',
            'CVE-2017-8418',
        ],
        [
            'CVE-2017-7905',
            'CVE-2017-7921'
        ],
        [pd.NA],
        ['CVE-2018-13379'],
        ['CVE-2019-19781'],
        [
            'CVE-2020-11896',
            'CVE-2020-11898',
            'CVE-2020-11899',
            'CVE-2020-11901'
        ],
        [pd.NA],
        [pd.NA],
        [pd.NA, pd.NA],
        ['CVE-2014-0160'],
        [pd.NA],
        [
            'CVE-2021-34473',
            'CVE-2021-34523',
            'CVE-2021-31207'
        ],
        [
            'CVE-2020-5902',
            'CVE-2020-5902'
        ],
        ['CVE-2019-11510'],
        ['CVE-2017-5638'],
        [pd.NA],
        ['CVE-2017-0144'],
        ['CVE-2017-0144']
    ],
    'description': [
        [pd.NA],
        [
            'MikroTik RouterOS through 6.42 allows unauthenticated remote attackers to read arbitrary files and remote authenticated attackers to write arbitrary files due to a directory traversal vulnerability in the WinBox interface.',
            'Directory traversal vulnerability in the SYNO.DNSServer.Zone.MasterZoneConf in Synology DNS Server before 2.2.1-3042 allows remote authenticated attackers to write arbitrary files via the domain_name parameter.',
            'An issue was discovered on Dasan GPON home routers. It is possible to bypass authentication simply by appending "?images" to any URL of the device that requires authentication, as demonstrated by the /menu.html?images/ or /GponForm/diag_FORM?images/ URI. One can then manage the device.',
            "An issue was discovered on Dasan GPON home routers. Command Injection can occur via the dest_host parameter in a diag_action=ping request to a GponForm/diag_Form URI. Because the router saves ping results in /tmp and transmits them to the user when the user revisits /diag.html, it's quite simple to execute commands and retrieve their output.",
            'RuboCop 0.48.1 and earlier does not use /tmp in safe way, allowing local users to exploit this to tamper with cache files belonging to other users.'
        ],
        [
            'A Weak Cryptography for Passwords issue was discovered in General Electric (GE) Multilin SR 750 Feeder Protection Relay, firmware versions prior to Version 7.47; SR 760 Feeder Protection Relay, firmware versions prior to Version 7.47; SR 469 Motor Protection Relay, firmware versions prior to Version 5.23; SR 489 Generator Protection Relay, firmware versions prior to Version 4.06; SR 745 Transformer Protection Relay, firmware versions prior to Version 5.23; SR 369 Motor Protection Relay, all firmware versions; Multilin Universal Relay, firmware Version 6.0 and prior versions; and Multilin URplus (D90, C90, B95), all versions. Ciphertext versions of user passwords were created with a non-random initialization vector leaving them susceptible to dictionary attacks. Ciphertext of user passwords can be obtained from the front LCD panel of affected products and through issued Modbus commands.',
            'An Improper Authentication issue was discovered in Hikvision DS-2CD2xx2F-I Series V5.2.0 build 140721 to V5.4.0 build 160530, DS-2CD2xx0F-I Series V5.2.0 build 140721 to V5.4.0 Build 160401, DS-2CD2xx2FWD Series V5.3.1 build 150410 to V5.4.4 Build 161125, DS-2CD4x2xFWD Series V5.2.0 build 140721 to V5.4.0 Build 160414, DS-2CD4xx5 Series V5.2.0 build 140721 to V5.4.0 Build 160421, DS-2DFx Series V5.2.0 build 140805 to V5.4.5 Build 160928, and DS-2CD63xx Series V5.0.9 build 140305 to V5.3.5 Build 160106 devices. The improper authentication vulnerability occurs when an application does not adequately or correctly authenticate users. This may allow a malicious user to escalate his or her privileges on the system and gain access to sensitive information.'
        ],
        [pd.NA],
        ['An Improper Limitation of a Pathname to a Restricted Directory ("Path Traversal") in Fortinet FortiOS 6.0.0 to 6.0.4, 5.6.3 to 5.6.7 and 5.4.6 to 5.4.12 and FortiProxy 2.0.0, 1.2.0 to 1.2.8, 1.1.0 to 1.1.6, 1.0.0 to 1.0.7 under SSL VPN web portal allows an unauthenticated attacker to download system files via special crafted HTTP resource requests.'],
        ['An issue was discovered in Citrix Application Delivery Controller (ADC) and Gateway 10.5, 11.1, 12.0, 12.1, and 13.0. They allow Directory Traversal.'],
        [
            'The Treck TCP/IP stack before 6.0.1.66 allows Remote Code Execution, related to IPv4 tunneling.',
            'The Treck TCP/IP stack before 6.0.1.66 improperly handles an IPv4/ICMPv4 Length Parameter Inconsistency, which might allow remote attackers to trigger an information leak.',
            'The Treck TCP/IP stack before 6.0.1.66 has an IPv6 Out-of-bounds Read.',
            'The Treck TCP/IP stack before 6.0.1.66 allows Remote Code execution via a single invalid DNS response.'
        ],
        [pd.NA],
        [pd.NA],
        [pd.NA, pd.NA],
        ['The (1) TLS and (2) DTLS implementations in OpenSSL 1.0.1 before 1.0.1g do not properly handle Heartbeat Extension packets, which allows remote attackers to obtain sensitive information from process memory via crafted packets that trigger a buffer over-read, as demonstrated by reading private keys, related to d1_both.c and t1_lib.c, aka the Heartbleed bug.'],
        [pd.NA],
        [
            'Microsoft Exchange Server Remote Code Execution Vulnerability',
            'Microsoft Exchange Server Elevation of Privilege Vulnerability',
            'Microsoft Exchange Server Security Feature Bypass Vulnerability'
        ],
        [
            'In BIG-IP versions 15.0.0-15.1.0.3, 14.1.0-14.1.2.5, 13.1.0-13.1.3.3, 12.1.0-12.1.5.1, and 11.6.1-11.6.5.1, the Traffic Management User Interface (TMUI), also referred to as the Configuration utility, has a Remote Code Execution (RCE) vulnerability in undisclosed pages.',
            'In BIG-IP versions 15.0.0-15.1.0.3, 14.1.0-14.1.2.5, 13.1.0-13.1.3.3, 12.1.0-12.1.5.1, and 11.6.1-11.6.5.1, the Traffic Management User Interface (TMUI), also referred to as the Configuration utility, has a Remote Code Execution (RCE) vulnerability in undisclosed pages.'
        ],
        ['In Pulse Secure Pulse Connect Secure (PCS) 8.2 before 8.2R12.1, 8.3 before 8.3R7.1, and 9.0 before 9.0R3.4, an unauthenticated remote attacker can send a specially crafted URI to perform an arbitrary file reading vulnerability.'],
        ['The Jakarta Multipart parser in Apache Struts 2 2.3.x before 2.3.32 and 2.5.x before 2.5.10.1 has incorrect exception handling and error-message generation during file-upload attempts, which allows remote attackers to execute arbitrary commands via a crafted Content-Type, Content-Disposition, or Content-Length HTTP header, as exploited in the wild in March 2017 with a Content-Type header containing a #cmd= string.'],
        [pd.NA],
        ['The SMBv1 server in Microsoft Windows Vista SP2; Windows Server 2008 SP2 and R2 SP1; Windows 7 SP1; Windows 8.1; Windows Server 2012 Gold and R2; Windows RT 8.1; and Windows 10 Gold, 1511, and 1607; and Windows Server 2016 allows remote attackers to execute arbitrary code via crafted packets, aka "Windows SMB Remote Code Execution Vulnerability." This vulnerability is different from those described in CVE-2017-0143, CVE-2017-0145, CVE-2017-0146, and CVE-2017-0148.'],
        ['The SMBv1 server in Microsoft Windows Vista SP2; Windows Server 2008 SP2 and R2 SP1; Windows 7 SP1; Windows 8.1; Windows Server 2012 Gold and R2; Windows RT 8.1; and Windows 10 Gold, 1511, and 1607; and Windows Server 2016 allows remote attackers to execute arbitrary code via crafted packets, aka "Windows SMB Remote Code Execution Vulnerability." This vulnerability is different from those described in CVE-2017-0143, CVE-2017-0145, CVE-2017-0146, and CVE-2017-0148.']
    ]
}

df_nsa = pd.DataFrame(nsa)


In [19]:
df_nsa.head(3)

|   | attack | year_start | year_end | attribution_group | attribution_state | cve_id | description |
|---|---|---|---|---|---|---|---|
| 0 | Mirai Botnet | 2016 | 2016 |  | `[<NA>]` | `[<NA>]` | `[<NA>]` |
| 1 | VPNFilter | 2018 | 2018 | APT28 (Fancy Bear) | [Russia, Russia, Russia, Russia, Russia] | [CVE-2018-14847, CVE-2017-12074, CVE-2018-1056... | [MikroTik RouterOS through 6.42 allows unauthe... |
| 2 | Triton/Trisis | 2017 | 2017 |  | [Russia, Russia] | [CVE-2017-7905, CVE-2017-7921] | [A Weak Cryptography for Passwords issue was d... |


In [20]:
df_nsa.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   attack             19 non-null     object
 1   year_start         19 non-null     int64 
 2   year_end           19 non-null     int64 
 3   attribution_group  10 non-null     object
 4   attribution_state  19 non-null     object
 5   cve_id             19 non-null     object
 6   description        19 non-null     object
dtypes: int64(2), object(5)
memory usage: 1.2+ KB


## Exploding the Data
With the attack data aggregated into a (still dirty) dataframe, we have to look at how the lists in the list-valued columns relate to one another. Each `cve_id` has exactly one `description`, so these two columns are exploded together, which keeps every ID paired with its own text. Only then is the `attribution_state` column exploded, on its own. Exploding it together with the CVEs would pair the lists element by element and suggest that, within a single attack, Nation A used CVE A while Nation B used CVE B, when in fact we don't know that. Exploding it separately pairs every state with every CVE, so each attributed nation is represented as having used each of the attack's CVEs. Because some state lists repeat the same nation (for example, five `'Russia'` entries for VPNFilter's five CVEs), this second explode produces duplicate rows, which are dropped afterwards. I built the dictionary with the `cve_id` and `description` lists already the same length for each attack, as pandas requires for a multi-column explode, so we avoid the length-normalizing step that the CVE and CWE lists required.
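The two-stage explode described above can be seen on a single hypothetical attack row. The helper `attack_frame` and its values are made up for illustration; the multi-column form of `DataFrame.explode` requires pandas 1.3 or later.

```python
import pandas as pd

def attack_frame(states):
    """One hypothetical attack row with two CVEs, each paired with its description."""
    return pd.DataFrame({
        'attack': ['Example Attack'],
        'attribution_state': [states],
        'cve_id': [['CVE-X', 'CVE-Y']],
        'description': [['desc X', 'desc Y']],
    })

# Step 1: explode the paired columns together, so each CVE keeps its own description
step1 = attack_frame(['State A', 'State B']).explode(['cve_id', 'description'])

# Step 2: explode the states on their own, pairing every state with every CVE (2 x 2 = 4 rows)
step2 = step1.explode('attribution_state')
print(len(step1), len(step2))  # 2 4

# A state list that repeats one nation yields repeated rows, which drop_duplicates removes
padded = (attack_frame(['State A', 'State A'])
          .explode(['cve_id', 'description'])
          .explode('attribution_state'))
print(len(padded), len(padded.drop_duplicates()))  # 4 2
```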

In [21]:
# Explode the nation-state attack data
df_nsa = df_nsa.explode(['cve_id', 'description'])
df_nsa = df_nsa.explode('attribution_state')

In [22]:
# Drop duplicate observations
df_nsa = df_nsa.drop_duplicates()

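## Merging with IoT CVE Data

An outer merge on `cve_id` and `description` keeps every MITRE IoT CVE and every nation-state attack row, leaving the columns that only one side has null wherever the other side has no match. None of the attack rows match a MITRE row on both keys, so the result has 1,089 + 31 = 1,120 rows.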

In [23]:
df = df2019_2024.merge(
    df_nsa,
    on=['cve_id', 'description'],
    how='outer'
)

In [24]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1120 entries, 0 to 1119
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   cve_id             1112 non-null   object 
 1   description        1112 non-null   object 
 2   attack             31 non-null     object 
 3   year_start         31 non-null     float64
 4   year_end           31 non-null     float64
 5   attribution_group  14 non-null     object 
 6   attribution_state  26 non-null     object 
dtypes: float64(2), object(5)
memory usage: 61.4+ KB



## Adding CVEs from the Check Point Article
[Revisit the article here.](https://blog.checkpoint.com/security/the-tipping-point-exploring-the-surge-in-iot-cyberattacks-plaguing-the-education-sector/)

In [26]:
# New observations from article
cp = {
    'cve_id': [
        'CVE-2015-2051',
        'CVE-2016-6277',
        'CVE-2022-37061'
    ],
    'description': [
        'The D-Link DIR-645 Wired/Wireless Router Rev. Ax with firmware 1.04b12 and earlier allows remote attackers to execute arbitrary commands via a GetDeviceSettings action to the HNAP interface.',
        'NETGEAR R6250 before 1.0.4.6.Beta, R6400 before 1.0.1.18.Beta, R6700 before 1.0.1.14.Beta, R6900, R7000 before 1.0.7.6.Beta, R7100LG before 1.0.0.28.Beta, R7300DST before 1.0.0.46.Beta, R7900 before 1.0.1.8.Beta, R8000 before 1.0.3.26.Beta, D6220, D6400, D7000, and possibly other routers allow remote attackers to execute arbitrary commands via shell metacharacters in the path info to cgi-bin/.',
        'All FLIR AX8 thermal sensor cameras version up to and including 1.46.16 are vulnerable to Remote Command Injection. This can be exploited to inject and execute arbitrary shell commands as the root user through the id HTTP POST parameter in the res.php endpoint. A successful exploit could allow the attacker to execute arbitrary commands on the underlying operating system with the root privileges.'
    ],
    'attack': [pd.NA, pd.NA, pd.NA],
    'year_start': [pd.NA, pd.NA, pd.NA],
    'year_end': [pd.NA, pd.NA, pd.NA],
    'attribution_group': [pd.NA, pd.NA, pd.NA],
    'attribution_state': [pd.NA, pd.NA, pd.NA]
}

# Convert new data to dataframe
df_cp = pd.DataFrame(cp)

# Concatenate df and df_cp
df = pd.concat([df, df_cp], ignore_index=True)



In [27]:
# Convert object columns to pandas' string dtype
obj_cols = df.select_dtypes('object').columns
df[obj_cols] = df[obj_cols].astype('string')

In [28]:
# Extract the year from the CVE ID (e.g., CVE-2014-0160 -> 2014)
df['year_cve'] = df['cve_id'].str.split('-').str[1]

# Convert the year columns to nullable whole numbers (Int64)
year_cols = ['year_start', 'year_end', 'year_cve']
df[year_cols] = df[year_cols].astype('Int64')

# Move year_cve so it sits right after cve_id
df.insert(1, 'year_cve', df.pop('year_cve'))
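A slightly stricter variant of the year extraction, offered as a sketch rather than a required change: `Series.str.extract` with an anchored regular expression only returns a year for IDs shaped like `CVE-YYYY-NNNN`, and anything else, including the null IDs of attacks without a known CVE, becomes `<NA>` instead of an arbitrary fragment. The sample IDs below are illustrative.

```python
import pandas as pd

ids = pd.Series(['CVE-2014-0160', 'CVE-2017-0144', pd.NA, 'not-a-cve'], dtype='string')

# Capture the four-digit year between "CVE-" and the sequence number; non-matches become <NA>
year_cve = ids.str.extract(r'^CVE-(\d{4})-\d+$', expand=False).astype('Int64')
print(year_cve.tolist())
```

With `str.split('-').str[1]`, the last ID would instead yield the string `'a'`, and the later `astype('Int64')` would fail on it.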

### Importing and Cleaning the CVE Data
All of the following cleaning steps were identified as necessary in the `APT_IoT_CVE_EDA` notebook.

In [29]:
# Import
cves = pd.read_parquet('../data/CVE_V5/CVE_List.parquet')

# Drop rejected CVEs
cves = cves.drop(cves[cves['cve_state'] == 'REJECTED'].index)

# Convert publication date to datetime format
cves['date_published'] = pd.to_datetime(cves['date_published'], format='ISO8601', utc=True)

# Convert objects to text data (string)
obj_cols = cves.select_dtypes(include=['object']).columns
cves[obj_cols] = cves[obj_cols].astype('string')

# Remove leading or trailing whitespace (before standardizing, so padded labels are matched too)
str_cols = cves.select_dtypes(include=['string']).columns
cves[str_cols] = cves[str_cols].apply(lambda x: x.str.strip())

# Standardize severity labels
cves['severity'] = cves['severity'].replace(['medium', 'MODERATE'], 'MEDIUM')

In [30]:
# Glance
cves.head(3)

|   | cve_id | cwe_id | cve_state | date_published | description | severity | severity_score | attack_vector | attack_complexity |
|---|---|---|---|---|---|---|---|---|---|
| 0 | CVE-1999-0001 |  | PUBLISHED | 2000-02-04 05:00:00+00:00 | ip_input.c in BSD-derived TCP/IP implementatio... |  |  |  |  |
| 1 | CVE-1999-0002 |  | PUBLISHED | 1999-09-29 04:00:00+00:00 | Buffer overflow in NFS mountd gives root acces... |  |  |  |  |
| 2 | CVE-1999-0003 |  | PUBLISHED | 1999-09-29 04:00:00+00:00 | Execute commands as root via buffer overflow i... |  |  |  |  |


### Merging into Main Dataframe
Since we don't want hundreds of thousands of CVEs that aren't known to be IoT-related in our dataset, I'm going to perform a left merge with our nation-state and IoT CVE data as the left table. Every row of that data is kept, and CVE List attributes are added only where a CVE List entry matches a row on both `cve_id` and `description`; unmatched CVE List entries are dropped.
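Because the merge in the next cell joins on `description` as well as `cve_id`, a CVE whose MITRE description differs from the CVE List text by even one character will not pick up any CVE List attributes. The sketch below, using hypothetical toy frames, shows how to detect this: `indicator=True` adds a `_merge` column recording where each row matched, and `validate='many_to_one'` raises a `MergeError` if the right-hand table repeats a key and would multiply rows. Joining on `cve_id` alone is one possible alternative, not what this notebook does.

```python
import pandas as pd

left = pd.DataFrame({
    'cve_id': ['CVE-1', 'CVE-2'],
    'description': ['same text', 'MITRE wording'],
})
right = pd.DataFrame({
    'cve_id': ['CVE-1', 'CVE-2'],
    'description': ['same text', 'CVE List wording'],
    'cwe_id': ['CWE-22', 'CWE-78'],
})

# Both keys: CVE-2 finds no partner because its descriptions differ
both_keys = left.merge(right, on=['cve_id', 'description'], how='left', indicator=True)
print(both_keys['_merge'].value_counts())

# ID only: suffixes keep the left description as-is and rename the CVE List one for comparison
id_only = left.merge(right, on='cve_id', how='left', suffixes=('', '_cve_list'),
                     indicator=True, validate='many_to_one')
print(id_only[['cve_id', 'cwe_id', '_merge']])
```

Counting the `left_only` rows in the real merge would show how many of the main dataset's CVEs missed their CVE List attributes because of the description key.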

In [31]:
df = df.merge(
    cves,
    on=['cve_id', 'description'],
    how='left'
)

In [32]:
# Rename CVE data's attack-related attributes for clarity
df = df.rename(columns={
    'date_published': 'cve_publish_date',
    'severity': 'cve_severity',
    'severity_score': 'cve_severity_score',
    'attack_vector': 'cve_attack_vector',
    'attack_complexity': 'cve_attack_complexity'
})

In [33]:
df.head(3)

|   | cve_id | year_cve | description | attack | year_start | year_end | attribution_group | attribution_state | cwe_id | cve_state | cve_publish_date | cve_severity | cve_severity_score | cve_attack_vector | cve_attack_complexity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CVE-2014-0160 | 2014 | The (1) TLS and (2) DTLS implementations in Op... | Heartbleed Exploits | 2014 | 2014 |  | China |  | PUBLISHED | 2014-04-07 00:00:00+00:00 |  |  |  |  |
| 1 | CVE-2017-0144 | 2017 | The SMBv1 server in Microsoft Windows Vista SP... | Not Petya Ransomware Attack | 2017 | 2017 | Sandworm | Russia |  | PUBLISHED | 2017-03-17 00:00:00+00:00 |  |  |  |  |
| 2 | CVE-2017-0144 | 2017 | The SMBv1 server in Microsoft Windows Vista SP... | WannaCry Ransomware Attack | 2017 | 2017 | Lazarus | DPRK |  | PUBLISHED | 2017-03-17 00:00:00+00:00 |  |  |  |  |


In [34]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1123 entries, 0 to 1122
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype              
---  ------                 --------------  -----              
 0   cve_id                 1115 non-null   string             
 1   year_cve               1115 non-null   Int64              
 2   description            1115 non-null   string             
 3   attack                 31 non-null     string             
 4   year_start             31 non-null     Int64              
 5   year_end               31 non-null     Int64              
 6   attribution_group      14 non-null     string             
 7   attribution_state      26 non-null     string             
 8   cwe_id                 107 non-null    string             
 9   cve_state              1103 non-null   string             
 10  cve_publish_date       1103 non-null   datetime64[ns, UTC]
 11  cve_severity           476 non-null    string           

In [35]:
df.head()

|   | cve_id | year_cve | description | attack | year_start | year_end | attribution_group | attribution_state | cwe_id | cve_state | cve_publish_date | cve_severity | cve_severity_score | cve_attack_vector | cve_attack_complexity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CVE-2014-0160 | 2014 | The (1) TLS and (2) DTLS implementations in Op... | Heartbleed Exploits | 2014 | 2014 |  | China |  | PUBLISHED | 2014-04-07 00:00:00+00:00 |  |  |  |  |
| 1 | CVE-2017-0144 | 2017 | The SMBv1 server in Microsoft Windows Vista SP... | Not Petya Ransomware Attack | 2017 | 2017 | Sandworm | Russia |  | PUBLISHED | 2017-03-17 00:00:00+00:00 |  |  |  |  |
| 2 | CVE-2017-0144 | 2017 | The SMBv1 server in Microsoft Windows Vista SP... | WannaCry Ransomware Attack | 2017 | 2017 | Lazarus | DPRK |  | PUBLISHED | 2017-03-17 00:00:00+00:00 |  |  |  |  |
| 3 | CVE-2017-12074 | 2017 | Directory traversal vulnerability in the SYNO.... | VPNFilter | 2018 | 2018 | APT28 (Fancy Bear) | Russia | CWE-22 | PUBLISHED | 2017-08-23 00:00:00+00:00 |  |  |  |  |
| 4 | CVE-2017-5638 | 2017 | The Jakarta Multipart parser in Apache Struts ... | Equifax Data Breach | 2017 | 2017 | APT10 | China |  | PUBLISHED | 2017-03-11 02:11:00+00:00 |  |  |  |  |


In [41]:
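# Attack/attribution combinations (only a cell's last expression is displayed, so this result is not shown)
df[['attack', 'attribution_group', 'attribution_state']].value_counts()
# Number of exploded rows per attack
df['attack'].value_counts()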

attack
VPNFilter                                           5
Ripple20 Vulnerabilities                            4
Microsoft Exchange ProxyShell Exploits              3
Stuxnet                                             2
Triton/Trisis                                       2
F5 BIG-IP Exploits                                  2
Not Petya Ransomware Attack                         1
WannaCry Ransomware Attack                          1
Equifax Data Breach                                 1
Iranian APT Exploits on Fortinet Vulnerabilities    1
Pulse Secure VPN Exploits                           1
Operation Shadowhammer                              1
Iranian Cyberattacks on Water Systems               1
Heartbleed Exploits                                 1
Mirai Botnet                                        1
Dragonfly/Energetic Bear Campaign 1                 1
Dragonfly/Energetic Bear Campaign 2                 1
BlackEnergy Attack on Ukraine                       1
SolarWinds Orion Supp
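When counting combinations of several columns, keep in mind that `DataFrame.value_counts` drops every row containing a null by default, so attacks without an `attribution_group` would silently disappear from a multi-column count. A toy illustration with hypothetical rows; passing `dropna=False` (available in pandas 1.3 and later) keeps them.

```python
import pandas as pd

attacks = pd.DataFrame({
    'attack': ['Attack 1', 'Attack 1', 'Attack 2'],
    'attribution_group': ['Group A', 'Group A', None],
    'attribution_state': ['State A', 'State A', 'State B'],
})

default_counts = attacks.value_counts()          # the row with a null group is dropped
all_counts = attacks.value_counts(dropna=False)  # the null group is counted as its own combination
print(default_counts)
print(all_counts)
```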

## Saving the Dataframe
