artemis19/ExploitDBScraper

ExploitDBScraper

Kaitlyn DeValk | March 17, 2022


This project scrapes the ExploitDB repository so the data can be used in a visualization project. The database metrics will vary depending on when you pull the data.

Add ExploitDB Repository as submodule

Add the ExploitDB repository to this one as a submodule:

git submodule add https://github.com/offensive-security/exploitdb

Important Data Fields

Date of release
Exploit Title
Type
Platform
Tag(s)
Filename
File Extension
Filesize
File hash

The file files_exploits.csv and the repository data contain accurate information for the following variables:

  1. Filename, part of file variable
  2. File Extension, part of file variable
  3. Exploit Title, description variable
  4. Type
  5. Platform
  6. Filesize (Can extract this from the repository files)
  7. Hash (Can extract this from the files)
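As a rough sketch of how the filename, extension, filesize, and hash could be derived, the snippet below splits a value from the file variable and reads the matching file from the submodule. The function names, the example path layout, and the choice of SHA-256 are assumptions for illustration; they are not taken from scraper.py or parser.py.

```python
import hashlib
from pathlib import Path, PurePosixPath


def split_file_column(file_value):
    """Return (filename, extension) from a value of the CSV's file variable.

    Assumes the value is a relative POSIX-style path such as
    "exploits/windows/remote/1234.py" (hypothetical example).
    """
    p = PurePosixPath(file_value)
    return p.name, p.suffix.lstrip(".")


def size_and_hash(path, algorithm="sha256"):
    """Return (size in bytes, hex digest) for a file on disk.

    SHA-256 is an assumed default; any algorithm name accepted by
    hashlib.new() can be passed instead.
    """
    data = Path(path).read_bytes()
    return len(data), hashlib.new(algorithm, data).hexdigest()
```

Reading the whole file into memory keeps the sketch short, which should be acceptable here since individual exploit files are small.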

The Date was included as a field, but every exploit was labeled with the same release date: 1970-01-01.

That left two fields that I would have needed to scrape from either the GitHub repository or the website:

  1. Date
  2. Tag(s)

After working with the data in the CSV, I decided to scrape everything except the actual files from the ExploitDB website instead. There appeared to be no easy way to get accurate dates or tags from the GitHub repository, and merging two separate datasets would have been more work than getting everything from one location. The scraping is done by the scraper.py script.

Data Parsing

This scraper was made to create a dataset that can be imported into Tableau for visualizations. The parser.py script extracts the relevant data fields, removes duplicate exploits (those that share the same files), and writes the result to a CSV file for easy import into Tableau. Exploits written for the same CVE are not considered duplicates if they are written in different languages or use a different attack vector, even when they target the same vulnerability. Multiple exploits for one vulnerability are a sign of how impactful it was.
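The deduplication rule can be sketched as follows, assuming each exploit is a dict with a "hash" key holding its file hash. That record shape and the function name are hypothetical, and parser.py may do this differently. Because the key is the file content rather than the CVE, two exploits for the same CVE with different files are both kept.

```python
def drop_duplicate_exploits(records):
    """Keep the first record seen for each file hash.

    `records` is assumed to be an iterable of dicts with a "hash" key
    (hypothetical shape, chosen for illustration).
    """
    seen = set()
    unique = []
    for record in records:
        if record["hash"] in seen:
            continue  # same file contents as an earlier exploit
        seen.add(record["hash"])
        unique.append(record)
    return unique
```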

I also added columns, or "tags", to each exploit so that it can be compared against the MITRE CWE Top 25 software weaknesses list for 2021.

I weight a tag match at double a title match, because a tag, when present, is a very good indicator of the exploit's category. Tags are not always available, though.

An exploit that scores 0 for all 25 CWEs is not mapped at all, which removes about half of the data points for 2021.
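A minimal sketch of this kind of scoring is shown below. The keyword lists, the function names, and the substring-matching approach are all assumptions for illustration, and only two of the 25 CWEs are included. The only detail taken from the text above is that a tag match counts twice as much as a title match.

```python
TITLE_WEIGHT = 1
TAG_WEIGHT = 2 * TITLE_WEIGHT  # a tag match counts double a title match

# Hypothetical keyword lists; a real mapping would cover all 25 CWEs.
CWE_KEYWORDS = {
    "CWE-79": ["xss", "cross-site scripting"],
    "CWE-89": ["sql injection", "sqli"],
}


def score_exploit(title, tags):
    """Return a {cwe_id: score} dict for one exploit."""
    title_lower = title.lower()
    tags_lower = [tag.lower() for tag in tags]
    scores = {}
    for cwe, keywords in CWE_KEYWORDS.items():
        title_hit = any(k in title_lower for k in keywords)
        tag_hit = any(k in tag for tag in tags_lower for k in keywords)
        scores[cwe] = TITLE_WEIGHT * title_hit + TAG_WEIGHT * tag_hit
    return scores


def is_mapped(scores):
    """An exploit is mapped if at least one CWE score is above 0."""
    return any(value > 0 for value in scores.values())
```

With these weights, an exploit matched only by its title scores 1 for that CWE, one matched only by a tag scores 2, and one matched by both scores 3.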

Notes

exploitdb/exploits is organized into platform subdirectories, each of which contains type subdirectories:

+ exploits
----+ platform
	----+ type
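Given that layout, the platform and type of each exploit can be read straight from its path. The following is a small sketch, assuming files sit exactly two levels below the exploits directory; the function name is hypothetical.

```python
from pathlib import Path


def iter_exploit_files(root):
    """Yield (platform, type, path) for files under root/<platform>/<type>/."""
    for path in sorted(Path(root).glob("*/*/*")):
        if path.is_file():
            platform, exploit_type = path.parts[-3], path.parts[-2]
            yield platform, exploit_type, path
```

For example, `iter_exploit_files("exploitdb/exploits")` would walk the submodule checkout and yield one tuple per exploit file.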

The data is not uploaded because it is nearly 50 MB.
