# OSV API Vulnerabilities Collector

Starting from a set of <b>organizations</b> already present in the database, this notebook collects vulnerabilities with the <b>OSV API</b> and stores them in the database.<br>

If an organization is not stored in the database, a **warning** message is logged (both to stdout and to the log folder) and execution continues without that organization.
<hr>

In [None]:
organizations = ["italia"] # Set here the GitHub usernames of the organizations to process

# To obtain further information about vulnerabilities, the NVD API is used. The public rate limit (without an API key)
# is 5 requests in a rolling 30-second window, so the default wait_time in this notebook is set to 6 seconds between two requests.
# To speed up the process (up to 50 requests in a rolling 30-second window), you can get an API key by following the
# instructions at https://nvd.nist.gov/developers/start-here#:~:text=to%20in%20sequence.-,Request%20an%20API%20Key,-On%20the%20API.

# Once you have the key, paste it into the following variable to raise the rate limit used in this notebook.
# If you do not want to use a key, set the following variable to an empty string.
nvd_api_key = "<NVD-API-KEY>"  
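
The two wait times mentioned in the comments follow directly from the rate limits: the window length divided by the number of requests allowed in it. A minimal sketch of that arithmetic, where the constant and both helper names are hypothetical and not part of the notebook's library:

```python
NVD_WINDOW_SECONDS = 30  # rolling window stated in the comments above


def wait_time_between_requests(requests_per_window: int) -> float:
    """Seconds to wait between two requests so the window limit is not exceeded."""
    return NVD_WINDOW_SECONDS / requests_per_window


def pick_wait_time(api_key: str) -> float:
    # Hypothetical helper: 50 requests per window with a key, 5 without one.
    return wait_time_between_requests(50 if api_key else 5)


print(pick_wait_time(""))        # no key
print(pick_wait_time("my-key"))  # with a key
```

Any non-empty string counts as a key under this kind of check, so a placeholder left in the variable would select the shorter wait time.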

#### Requirements

<hr>

#### Logger set up

In [None]:
import datetime
import logging
import os
import sys
from pathlib import Path

Path('logs').mkdir(parents=True, exist_ok=True)
# Logging levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
logging.basicConfig(
    handlers=[
        # One log file per day, plus a copy of every message on stdout
        logging.FileHandler(os.path.join('logs', 'log-' + datetime.datetime.now().strftime("%d-%m-%Y") + '.log')),
        logging.StreamHandler(sys.stdout)
    ],
    format='%(asctime)s |:| LEVEL:%(levelname)-2s |:| FILE:notebook_3 (osv_vulns).ipynb:%(lineno)-s |:| %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    level=logging.DEBUG)
# Keep urllib3 messages out of the handlers configured above
logging.getLogger("urllib3").propagate = False

#### Database connection

In [None]:
from lib.sqlite_utils import DBConnection 

if not os.path.exists(os.path.join('database','database.sqlite')):
    logging.critical('Database does not exist! You need to create it first (db_builder.ipynb)')
    raise Exception('Database does not exist! You need to create it first (db_builder.ipynb)')

conn=DBConnection(os.path.join('database','database.sqlite'))
logging.info('Connected with "database/database.sqlite" database.') 

#### Checking organization existence

In [None]:
# Build a new list instead of removing items from the list being iterated,
# which would shift the indices and skip or overrun entries.
found_organizations = []
for org_name in organizations:
    try:
        org_row = conn.get_rows('organization',{'url':'https://github.com/{}'.format(org_name)})[0]
    except IndexError:
        logging.warning('Cannot find organization "{}" in the database!'.format(org_name))
        continue
    logging.info('Found organization "{}" in the database!'.format(org_row['user_name']))
    found_organizations.append(org_row)
organizations = found_organizations
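
The check in this cell boils down to a lookup-and-filter step. The sketch below reproduces it with a plain dictionary standing in for the database; the `known` mapping and the `split_known` helper are hypothetical. It builds new lists rather than removing items from the list being iterated, which is the safer pattern in Python because removal shifts the remaining indices.

```python
from typing import Dict, List, Tuple


def split_known(names: List[str], known: Dict[str, dict]) -> Tuple[List[dict], List[str]]:
    """Return (rows of organizations that were found, names that were not found)."""
    found, missing = [], []
    for name in names:
        url = "https://github.com/{}".format(name)
        row = known.get(url)
        if row is None:
            missing.append(name)  # the notebook logs a warning for these
        else:
            found.append(row)
    return found, missing


# Stand-in for the 'organization' table, keyed by URL (assumed layout).
known = {
    "https://github.com/italia": {"url": "https://github.com/italia", "user_name": "italia"},
}
found, missing = split_known(["italia", "not-stored"], known)
print([row["user_name"] for row in found], missing)
```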

#### Get packages from repositories of the organizations:

In [None]:
packages = list()
for organization in organizations:
    res = conn.query('SELECT p.purl,p.name,p.package_manager,p.version,p.namespace FROM (SELECT * FROM manifest_dependency UNION SELECT * FROM parsed_dependency) d LEFT JOIN package p ON d.package=p.purl LEFT JOIN repository r ON r.url=d.repository LEFT JOIN organization o ON o.url=r.organization WHERE o.url="{}"'.format(organization['url']))
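    # Extend inside the loop so that packages of every organization are collected, not only the last one
    packages.extend([dict(zip(['purl','name','package_manager','version','namespace'],p)) for p in res])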

In [None]:
packages = list({x['purl']:x for x in packages}.values()) # Drop duplicates

In [None]:
len(packages)
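
The duplicate removal above relies on dictionary keys being unique: each package is stored under its `purl`, so a repeated `purl` overwrites the earlier entry while keeping its original position. A small illustration with made-up package entries:

```python
# Illustrative entries only; the real list comes from the database query.
sample_packages = [
    {"purl": "pkg:pypi/requests@2.31.0", "name": "requests"},
    {"purl": "pkg:npm/lodash@4.17.21", "name": "lodash"},
    {"purl": "pkg:pypi/requests@2.31.0", "name": "requests"},
]

# Same idiom as in the notebook: key by purl, then take the values.
unique_packages = list({p["purl"]: p for p in sample_packages}.values())

print(len(sample_packages), len(unique_packages))
```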

#### Get vulnerabilities for each package:

In [None]:
from lib.vuln_utils import get_osv_api_vulnerabilities

vulnerabilities, osv_affection = list(), list()
for package in packages:
    vulns, affecs = get_osv_api_vulnerabilities(package,logger= logging)
    vulnerabilities.extend(vulns)
    osv_affection.extend(affecs)

vulnerabilities = list({x['id']:x for x in vulnerabilities}.values())
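
The internals of `get_osv_api_vulnerabilities` are not shown in this notebook. As a rough sketch of what a single lookup could look like, the OSV API accepts a POST to its `/v1/query` endpoint with a package identified by its purl. The helper names below are hypothetical, and the sketch ignores pagination and error handling.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(purl: str) -> dict:
    """Request body for one package; the version is carried inside the purl."""
    return {"package": {"purl": purl}}


def query_osv(purl: str, timeout: float = 30.0) -> list:
    """Send the query and return the list of vulnerability records (may be empty)."""
    body = json.dumps(build_osv_query(purl)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # A package without known vulnerabilities yields a response with no "vulns" key.
    return payload.get("vulns", [])
```

Calling `query_osv(package['purl'])` for each package would require network access; the records returned can then be deduplicated by their `id`, as done in the cell above.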

#### Update database:

In [None]:
for vuln in vulnerabilities:
    conn.add_or_update('vulnerability',vuln)

for aff in osv_affection:
    conn.add_or_update('osv_api_potential_affection',aff)
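
The name `add_or_update` suggests an upsert: insert the row if its key is new, otherwise overwrite the stored one. How `DBConnection` implements it is not shown here, so the following is only a sketch of one possible approach with the standard `sqlite3` module. The table layout, the column names, and the standalone `add_or_update` function are assumptions made for illustration.

```python
import sqlite3


def add_or_update(db: sqlite3.Connection, table: str, row: dict) -> None:
    """Insert the row, replacing any existing row with the same primary key."""
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    # Table and column names are interpolated, so they must come from trusted code;
    # the values themselves are bound as parameters.
    sql = "INSERT OR REPLACE INTO {} ({}) VALUES ({})".format(table, columns, placeholders)
    db.execute(sql, tuple(row.values()))


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE vulnerability (id TEXT PRIMARY KEY, severity TEXT)")

# First write stores the record, second write updates it in place.
add_or_update(db, "vulnerability", {"id": "EXAMPLE-0001", "severity": None})
add_or_update(db, "vulnerability", {"id": "EXAMPLE-0001", "severity": "HIGH"})

rows = db.execute("SELECT id, severity FROM vulnerability").fetchall()
print(rows)
```

`INSERT OR REPLACE` deletes the old row before inserting the new one, so columns omitted from the second write fall back to their defaults. This matters when the same table is written twice, as happens with `vulnerability` in this notebook.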

#### Use NVD API to store more info about vulnerabilities collected with the OSV API:

In [None]:
from lib.vuln_utils import extend_vulns_with_nvdapi
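logging.info('Getting more info about vulnerabilities with NVD API')
# With an API key the NVD limit is 50 requests per 30 s (0.6 s between requests); without one it is 5 per 30 s (6 s)
has_key = nvd_api_key != ''
vulnerabilities = extend_vulns_with_nvdapi(vulnerabilities, wait_time=0.6 if has_key else 6, logger=logging, nvd_api_key=nvd_api_key if has_key else None)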
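
The implementation of `extend_vulns_with_nvdapi` lives in `lib.vuln_utils` and is not shown here. For orientation, a single NVD lookup can be expressed as a GET on the CVE API 2.0 endpoint with the `cveId` parameter, plus an `apiKey` header when a key is available. The helper below is a hypothetical sketch that only builds the request pieces and does not contact the service.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

NVD_CVE_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def build_nvd_request(cve_id: str, api_key: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Return the URL and headers for looking up one CVE."""
    url = "{}?{}".format(NVD_CVE_URL, urlencode({"cveId": cve_id}))
    # The key, when present, travels in a request header rather than in the URL.
    headers = {"apiKey": api_key} if api_key else {}
    return url, headers


url, headers = build_nvd_request("CVE-2021-44228")
print(url, headers)
```

A caller would send these with any HTTP client and sleep for `wait_time` seconds between consecutive requests to stay within the rate limit.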

#### Update database:

In [None]:
for vuln in vulnerabilities:
    conn.add_or_update('vulnerability',vuln)

#### Close database:

In [None]:
conn.close()