### Liens python :

- [docs.python](https://docs.python.org)
- [python.doctor](https://python.doctor)

### Projet :
- [CVE dataset](https://nvd.nist.gov/vuln/data-feeds)

- [CVE descriptif](https://www.redhat.com/fr/topics/security/what-is-cve)

- [CPE descriptif](https://medium.com/prohacktive/comment-exploiter-la-base-cve-du-nist-dfb10837da5c)

- [Standards pour la gestion des vulnérabilités](https://www.cert-ist.com/public/fr/SO_detail?code=standards_gestion_vulnerabilites)

### à voir :

- cartopy
- ipywidget
- [kaggle](https://www.kaggle.com/)

<hr>

<h2>Introduction</h2>

**Common Vulnerabilities and Exposures** ou **CVE** est un dictionnaire des informations publiques relatives aux vulnérabilités de sécurité. Le dictionnaire est maintenu par l'organisme MITRE, soutenu par le département de la Sécurité intérieure des États-Unis.

**Common Vulnerability Scoring System** ou **CVSS** est un système d'évaluation standardisé de la criticité des vulnérabilités selon des critères objectifs et mesurables.


CVSS se compose de trois groupes de mesures : Base, Temporel, et Environnemental. 
Les métriques de base produisent un score allant de 0 à 10, qui peut ensuite être modifié en notant les métriques temporelles et environnementales.
<br>
Un score CVSS est également représenté sous la forme d'une chaîne vectorielle, une représentation textuelle comprimée des valeurs utilisées pour obtenir le score. Ainsi, CVSS est bien adapté comme système de mesure standard pour les industries, les organisations et les gouvernements qui ont besoin de scores de gravité de vulnérabilité précis et cohérents.
<br>

La base de données nationale sur les vulnérabilités (**NVD**) fournit des scores CVSS pour presque toutes les vulnérabilités connues.
<br>

**La NVD prend en charge les normes CVSS v2.0 et v3.X.**

![cvss](media/cvss.png)

Le NVD fournit des "scores de base" CVSS qui représentent les caractéristiques innées de chaque vulnérabilité.
Le NVD ne fournit pas actuellement de "scores temporels" (mesures qui évoluent dans le temps en raison d'événements extérieurs à la vulnérabilité) ou de "scores environnementaux" (scores personnalisés pour refléter l'impact de la vulnérabilité sur votre organisation). Cependant, le NVD fournit un calculateur CVSS pour CVSS v2 et v3 qui vous permet d'ajouter des données de score temporel et environnemental.
<br>

Pour certaines vulnérabilités, toutes les informations nécessaires à la création des scores CVSS peuvent ne pas être disponibles. Cela se produit généralement lorsqu'un fournisseur annonce une vulnérabilité mais refuse de fournir certains détails. Dans de telles situations, les analystes du NVD attribuent des scores CVSS en utilisant l'approche du pire cas. Ainsi, si un fournisseur ne fournit aucun détail sur une vulnérabilité, le NVD attribuera à cette vulnérabilité une note de 10.0 (la note la plus élevée).



In [None]:
# Download all cve from nvd.nist.gov

import requests
from datetime import datetime

url = 'https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-{}.json.gz'

for cve in range(2002, datetime.now().year+1):
    data = requests.get(url.format(cve)).content
    with open(f'cve_dataset/nvdcve-1.1-{cve}.json.gz','wb') as f:
        f.write(data)



In [42]:
# Extract gzip data and create dataframe
import json
from glob import glob
import gzip
import pandas as pd
from pandas import json_normalize

cve_dataset = list()

for data in glob('cve_dataset/*.gz'):
    with open(data,'rb') as f:
        json_data = json.loads(gzip.decompress(f.read()))

        cve_items = json_normalize(json_data['CVE_Items'])
        cve_items.drop(
            ['cve.data_type', 'cve.data_format', 'cve.data_version', 'cve.problemtype.problemtype_data', 'cve.references.reference_data', 'configurations.CVE_data_version', 'configurations.nodes','cve.description.description_data'],
            axis=1,
            inplace=True
        )

        descriptions = json_normalize(
            json_data['CVE_Items'],
            record_path=[['cve','description','description_data']],
            meta=[['cve','CVE_data_meta','ID']]
        )

        descriptions.drop(['lang'],axis=1,inplace=True)
        descriptions.rename(columns={"value":"description"}, inplace=True)

        dataframe = cve_items.merge(descriptions,on='cve.CVE_data_meta.ID')

        dataframe.rename(columns={"cve.CVE_data_meta.ID":"ID"}, inplace=True)
        dataframe = dataframe.set_index("ID")
        
        cve_dataset.append(dataframe)

df = pd.concat(cve_dataset)
df = df[sorted(df)]



In [41]:
df.head()

Unnamed: 0,cve.CVE_data_meta.ASSIGNER,cve.CVE_data_meta.ID,impact.baseMetricV2.acInsufInfo,impact.baseMetricV2.cvssV2.accessComplexity,impact.baseMetricV2.cvssV2.accessVector,impact.baseMetricV2.cvssV2.authentication,impact.baseMetricV2.cvssV2.availabilityImpact,impact.baseMetricV2.cvssV2.baseScore,impact.baseMetricV2.cvssV2.confidentialityImpact,impact.baseMetricV2.cvssV2.integrityImpact,impact.baseMetricV2.cvssV2.vectorString,impact.baseMetricV2.cvssV2.version,impact.baseMetricV2.exploitabilityScore,impact.baseMetricV2.impactScore,impact.baseMetricV2.obtainAllPrivilege,impact.baseMetricV2.obtainOtherPrivilege,impact.baseMetricV2.obtainUserPrivilege,impact.baseMetricV2.severity,impact.baseMetricV2.userInteractionRequired,impact.baseMetricV3.cvssV3.attackComplexity,impact.baseMetricV3.cvssV3.attackVector,impact.baseMetricV3.cvssV3.availabilityImpact,impact.baseMetricV3.cvssV3.baseScore,impact.baseMetricV3.cvssV3.baseSeverity,impact.baseMetricV3.cvssV3.confidentialityImpact,impact.baseMetricV3.cvssV3.integrityImpact,impact.baseMetricV3.cvssV3.privilegesRequired,impact.baseMetricV3.cvssV3.scope,impact.baseMetricV3.cvssV3.userInteraction,impact.baseMetricV3.cvssV3.vectorString,impact.baseMetricV3.cvssV3.version,impact.baseMetricV3.exploitabilityScore,impact.baseMetricV3.impactScore,lastModifiedDate,publishedDate
0,secalert@redhat.com,CVE-2010-0001,True,MEDIUM,NETWORK,NONE,PARTIAL,6.8,PARTIAL,PARTIAL,AV:N/AC:M/Au:N/C:P/I:P/A:P,2.0,8.6,6.4,False,False,False,MEDIUM,True,,,,,,,,,,,,,,,2017-09-19 01:30:00+00:00,2010-01-29 18:30:00+00:00
1,secalert@redhat.com,CVE-2010-0002,True,LOW,LOCAL,NONE,PARTIAL,2.1,NONE,NONE,AV:L/AC:L/Au:N/C:N/I:N/A:P,2.0,3.9,2.9,False,False,False,LOW,False,,,,,,,,,,,,,,,2011-08-08 04:00:00+00:00,2010-01-14 18:30:00+00:00
2,secalert@redhat.com,CVE-2010-0003,True,MEDIUM,LOCAL,NONE,COMPLETE,5.4,PARTIAL,NONE,AV:L/AC:M/Au:N/C:P/I:N/A:C,2.0,3.4,7.8,False,False,False,MEDIUM,False,,,,,,,,,,,,,,,2018-11-16 15:53:00+00:00,2010-01-26 18:30:00+00:00
3,secalert@redhat.com,CVE-2010-0004,True,LOW,NETWORK,NONE,NONE,5.0,PARTIAL,NONE,AV:N/AC:L/Au:N/C:P/I:N/A:N,2.0,10.0,2.9,False,False,False,MEDIUM,False,,,,,,,,,,,,,,,2018-08-13 21:47:00+00:00,2010-01-29 18:30:00+00:00
4,secalert@redhat.com,CVE-2010-0005,True,LOW,NETWORK,NONE,PARTIAL,7.5,PARTIAL,PARTIAL,AV:N/AC:L/Au:N/C:P/I:P/A:P,2.0,10.0,6.4,False,False,False,HIGH,False,,,,,,,,,,,,,,,2010-02-02 05:00:00+00:00,2010-01-29 18:30:00+00:00


In [45]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 186100 entries, CVE-2010-0001 to CVE-2008-7321
Data columns (total 35 columns):
 #   Column                                            Non-Null Count   Dtype              
---  ------                                            --------------   -----              
 0   cve.CVE_data_meta.ASSIGNER                        186100 non-null  object             
 1   description                                       186100 non-null  object             
 2   impact.baseMetricV2.acInsufInfo                   186100 non-null  bool               
 3   impact.baseMetricV2.cvssV2.accessComplexity       174852 non-null  object             
 4   impact.baseMetricV2.cvssV2.accessVector           174852 non-null  object             
 5   impact.baseMetricV2.cvssV2.authentication         174852 non-null  object             
 6   impact.baseMetricV2.cvssV2.availabilityImpact     174852 non-null  object             
 7   impact.baseMetricV2.cvssV2.baseScore      

In [43]:
# pd.options.display.max_columns = None

# # cast column type
# # date
# df['publishedDate'] = pd.to_datetime(df['publishedDate'])
# df['lastModifiedDate'] = pd.to_datetime(df['lastModifiedDate'])

# # boolean
# df['impact.baseMetricV2.obtainAllPrivilege'] = df['impact.baseMetricV2.obtainAllPrivilege'].astype(bool)
# df['impact.baseMetricV2.acInsufInfo'] = df['impact.baseMetricV2.acInsufInfo'].astype(bool)
# df['impact.baseMetricV2.obtainUserPrivilege'] = df['impact.baseMetricV2.obtainUserPrivilege'].astype(bool)
# df['impact.baseMetricV2.obtainOtherPrivilege'] = df['impact.baseMetricV2.obtainOtherPrivilege'].astype(bool)
# df['impact.baseMetricV2.userInteractionRequired'] = df['impact.baseMetricV2.userInteractionRequired'].astype(bool)


In [37]:
# displays statistics for quantitative variables
df.describe()

Unnamed: 0,impact.baseMetricV2.cvssV2.baseScore,impact.baseMetricV2.exploitabilityScore,impact.baseMetricV2.impactScore,impact.baseMetricV3.cvssV3.baseScore,impact.baseMetricV3.exploitabilityScore,impact.baseMetricV3.impactScore
count,174852.0,174852.0,174852.0,101524.0,101524.0,101524.0
mean,5.907614,8.091637,5.47101,7.212935,2.696689,4.376551
std,1.984387,2.160103,2.591998,1.655646,0.947169,1.512204
min,0.0,1.2,0.0,1.8,0.1,1.4
25%,4.3,8.0,2.9,6.1,1.8,3.6
50%,5.4,8.6,6.4,7.5,2.8,3.6
75%,7.5,10.0,6.4,8.8,3.9,5.9
max,10.0,10.0,10.0,10.0,3.9,6.0


In [39]:
df

Unnamed: 0,cve.CVE_data_meta.ASSIGNER,cve.CVE_data_meta.ID,impact.baseMetricV2.acInsufInfo,impact.baseMetricV2.cvssV2.accessComplexity,impact.baseMetricV2.cvssV2.accessVector,impact.baseMetricV2.cvssV2.authentication,impact.baseMetricV2.cvssV2.availabilityImpact,impact.baseMetricV2.cvssV2.baseScore,impact.baseMetricV2.cvssV2.confidentialityImpact,impact.baseMetricV2.cvssV2.integrityImpact,impact.baseMetricV2.cvssV2.vectorString,impact.baseMetricV2.cvssV2.version,impact.baseMetricV2.exploitabilityScore,impact.baseMetricV2.impactScore,impact.baseMetricV2.obtainAllPrivilege,impact.baseMetricV2.obtainOtherPrivilege,impact.baseMetricV2.obtainUserPrivilege,impact.baseMetricV2.severity,impact.baseMetricV2.userInteractionRequired,impact.baseMetricV3.cvssV3.attackComplexity,impact.baseMetricV3.cvssV3.attackVector,impact.baseMetricV3.cvssV3.availabilityImpact,impact.baseMetricV3.cvssV3.baseScore,impact.baseMetricV3.cvssV3.baseSeverity,impact.baseMetricV3.cvssV3.confidentialityImpact,impact.baseMetricV3.cvssV3.integrityImpact,impact.baseMetricV3.cvssV3.privilegesRequired,impact.baseMetricV3.cvssV3.scope,impact.baseMetricV3.cvssV3.userInteraction,impact.baseMetricV3.cvssV3.vectorString,impact.baseMetricV3.cvssV3.version,impact.baseMetricV3.exploitabilityScore,impact.baseMetricV3.impactScore,lastModifiedDate,publishedDate
0,secalert@redhat.com,CVE-2010-0001,True,MEDIUM,NETWORK,NONE,PARTIAL,6.8,PARTIAL,PARTIAL,AV:N/AC:M/Au:N/C:P/I:P/A:P,2.0,8.6,6.4,False,False,False,MEDIUM,True,,,,,,,,,,,,,,,2017-09-19 01:30:00+00:00,2010-01-29 18:30:00+00:00
1,secalert@redhat.com,CVE-2010-0002,True,LOW,LOCAL,NONE,PARTIAL,2.1,NONE,NONE,AV:L/AC:L/Au:N/C:N/I:N/A:P,2.0,3.9,2.9,False,False,False,LOW,False,,,,,,,,,,,,,,,2011-08-08 04:00:00+00:00,2010-01-14 18:30:00+00:00
2,secalert@redhat.com,CVE-2010-0003,True,MEDIUM,LOCAL,NONE,COMPLETE,5.4,PARTIAL,NONE,AV:L/AC:M/Au:N/C:P/I:N/A:C,2.0,3.4,7.8,False,False,False,MEDIUM,False,,,,,,,,,,,,,,,2018-11-16 15:53:00+00:00,2010-01-26 18:30:00+00:00
3,secalert@redhat.com,CVE-2010-0004,True,LOW,NETWORK,NONE,NONE,5.0,PARTIAL,NONE,AV:N/AC:L/Au:N/C:P/I:N/A:N,2.0,10.0,2.9,False,False,False,MEDIUM,False,,,,,,,,,,,,,,,2018-08-13 21:47:00+00:00,2010-01-29 18:30:00+00:00
4,secalert@redhat.com,CVE-2010-0005,True,LOW,NETWORK,NONE,PARTIAL,7.5,PARTIAL,PARTIAL,AV:N/AC:L/Au:N/C:P/I:P/A:P,2.0,10.0,6.4,False,False,False,HIGH,False,,,,,,,,,,,,,,,2010-02-02 05:00:00+00:00,2010-01-29 18:30:00+00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7166,cve@mitre.org,CVE-2008-7315,True,LOW,NETWORK,NONE,PARTIAL,7.5,PARTIAL,PARTIAL,AV:N/AC:L/Au:N/C:P/I:P/A:P,2.0,10.0,6.4,False,False,False,HIGH,False,LOW,NETWORK,HIGH,9.8,CRITICAL,HIGH,HIGH,NONE,UNCHANGED,NONE,CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H,3.0,3.9,5.9,2017-11-03 17:15:00+00:00,2017-10-10 16:29:00+00:00
7167,security@debian.org,CVE-2008-7316,True,LOW,LOCAL,NONE,PARTIAL,2.1,NONE,NONE,AV:L/AC:L/Au:N/C:N/I:N/A:P,2.0,3.9,2.9,False,False,False,LOW,False,LOW,LOCAL,HIGH,5.5,MEDIUM,NONE,NONE,LOW,UNCHANGED,NONE,CVSS:3.0/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H,3.0,1.8,3.6,2016-05-06 00:54:00+00:00,2016-05-02 10:59:00+00:00
7168,cve@mitre.org,CVE-2008-7319,True,LOW,NETWORK,NONE,COMPLETE,10.0,COMPLETE,COMPLETE,AV:N/AC:L/Au:N/C:C/I:C/A:C,2.0,10.0,10.0,False,False,False,HIGH,False,LOW,NETWORK,HIGH,9.8,CRITICAL,HIGH,HIGH,NONE,UNCHANGED,NONE,CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H,3.0,3.9,5.9,2017-11-29 15:49:00+00:00,2017-11-07 21:29:00+00:00
7169,cve@mitre.org,CVE-2008-7320,False,LOW,LOCAL,NONE,NONE,2.1,PARTIAL,NONE,AV:L/AC:L/Au:N/C:P/I:N/A:N,2.0,3.9,2.9,False,False,False,LOW,False,LOW,PHYSICAL,HIGH,6.8,MEDIUM,HIGH,HIGH,NONE,UNCHANGED,NONE,CVSS:3.0/AV:P/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H,3.0,0.9,5.9,2018-12-17 20:02:00+00:00,2018-11-18 19:29:00+00:00


In [None]:
# find by CVE-id
df.loc['CVE-2005-1479']

In [26]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 186100 entries, CVE-2010-0001 to CVE-2008-7321
Data columns (total 35 columns):
 #   Column                                            Non-Null Count   Dtype  
---  ------                                            --------------   -----  
 0   cve.CVE_data_meta.ASSIGNER                        186100 non-null  object 
 1   description                                       186100 non-null  object 
 2   impact.baseMetricV2.acInsufInfo                   76735 non-null   object 
 3   impact.baseMetricV2.cvssV2.accessComplexity       174852 non-null  object 
 4   impact.baseMetricV2.cvssV2.accessVector           174852 non-null  object 
 5   impact.baseMetricV2.cvssV2.authentication         174852 non-null  object 
 6   impact.baseMetricV2.cvssV2.availabilityImpact     174852 non-null  object 
 7   impact.baseMetricV2.cvssV2.baseScore              174852 non-null  float64
 8   impact.baseMetricV2.cvssV2.confidentialityImpact  174852 non-null  obj

In [None]:
df.isnull().sum()

In [49]:
df.duplicated().sum()

# ** REJECT ** DO NOT USE THIS CANDIDATE NUMBER
mask = df.duplicated()

df[mask]

Unnamed: 0_level_0,cve.CVE_data_meta.ASSIGNER,description,impact.baseMetricV2.acInsufInfo,impact.baseMetricV2.cvssV2.accessComplexity,impact.baseMetricV2.cvssV2.accessVector,impact.baseMetricV2.cvssV2.authentication,impact.baseMetricV2.cvssV2.availabilityImpact,impact.baseMetricV2.cvssV2.baseScore,impact.baseMetricV2.cvssV2.confidentialityImpact,impact.baseMetricV2.cvssV2.integrityImpact,impact.baseMetricV2.cvssV2.vectorString,impact.baseMetricV2.cvssV2.version,impact.baseMetricV2.exploitabilityScore,impact.baseMetricV2.impactScore,impact.baseMetricV2.obtainAllPrivilege,impact.baseMetricV2.obtainOtherPrivilege,impact.baseMetricV2.obtainUserPrivilege,impact.baseMetricV2.severity,impact.baseMetricV2.userInteractionRequired,impact.baseMetricV3.cvssV3.attackComplexity,impact.baseMetricV3.cvssV3.attackVector,impact.baseMetricV3.cvssV3.availabilityImpact,impact.baseMetricV3.cvssV3.baseScore,impact.baseMetricV3.cvssV3.baseSeverity,impact.baseMetricV3.cvssV3.confidentialityImpact,impact.baseMetricV3.cvssV3.integrityImpact,impact.baseMetricV3.cvssV3.privilegesRequired,impact.baseMetricV3.cvssV3.scope,impact.baseMetricV3.cvssV3.userInteraction,impact.baseMetricV3.cvssV3.vectorString,impact.baseMetricV3.cvssV3.version,impact.baseMetricV3.exploitabilityScore,impact.baseMetricV3.impactScore,lastModifiedDate,publishedDate
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1
CVE-2010-0253,cve@mitre.org,** REJECT ** DO NOT USE THIS CANDIDATE NUMBER...,True,,,,,,,,,,,,True,True,True,,True,,,,,,,,,,,,,,,2017-05-11 14:29:00+00:00,2017-05-11 14:29:00+00:00
CVE-2010-0259,cve@mitre.org,** REJECT ** DO NOT USE THIS CANDIDATE NUMBER...,True,,,,,,,,,,,,True,True,True,,True,,,,,,,,,,,,,,,2017-05-11 14:29:00+00:00,2017-05-11 14:29:00+00:00
CVE-2010-0493,cve@mitre.org,** REJECT ** DO NOT USE THIS CANDIDATE NUMBER...,True,,,,,,,,,,,,True,True,True,,True,,,,,,,,,,,,,,,2017-05-11 14:29:00+00:00,2017-05-11 14:29:00+00:00
CVE-2010-0495,cve@mitre.org,** REJECT ** DO NOT USE THIS CANDIDATE NUMBER...,True,,,,,,,,,,,,True,True,True,,True,,,,,,,,,,,,,,,2017-05-11 14:29:00+00:00,2017-05-11 14:29:00+00:00
CVE-2010-0809,cve@mitre.org,** REJECT ** DO NOT USE THIS CANDIDATE NUMBER...,True,,,,,,,,,,,,True,True,True,,True,,,,,,,,,,,,,,,2017-05-11 14:29:00+00:00,2017-05-11 14:29:00+00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
CVE-2008-7304,cve@mitre.org,** REJECT ** DO NOT USE THIS CANDIDATE NUMBER....,True,,,,,,,,,,,,True,True,True,,True,,,,,,,,,,,,,,,2020-11-05 20:15:00+00:00,2020-11-05 20:15:00+00:00
CVE-2008-7305,cve@mitre.org,** REJECT ** DO NOT USE THIS CANDIDATE NUMBER....,True,,,,,,,,,,,,True,True,True,,True,,,,,,,,,,,,,,,2020-11-05 20:15:00+00:00,2020-11-05 20:15:00+00:00
CVE-2008-7306,cve@mitre.org,** REJECT ** DO NOT USE THIS CANDIDATE NUMBER....,True,,,,,,,,,,,,True,True,True,,True,,,,,,,,,,,,,,,2020-11-05 20:15:00+00:00,2020-11-05 20:15:00+00:00
CVE-2008-7307,cve@mitre.org,** REJECT ** DO NOT USE THIS CANDIDATE NUMBER....,True,,,,,,,,,,,,True,True,True,,True,,,,,,,,,,,,,,,2020-11-05 20:15:00+00:00,2020-11-05 20:15:00+00:00


In [None]:
# test graph
years = df['publishedDate'].dt.year

# years[years == 1989]
# a = years.value_counts().sort_index()

years.value_counts().sort_index().plot(kind='bar',figsize=(15, 5),logy=True)
