Exploit Database Scraper

A Python-based web scraping tool for extracting vulnerability and exploit data from exploit-db.com and exporting it to CSV format for analysis and research purposes.

🚀 Features

  • Automated Data Collection: Scrapes verified exploits from exploit-db.com
  • Batch Processing: Fetches up to 3,500 records per API call
  • Comprehensive Data Extraction: Extracts 13 different data fields including CVE codes, descriptions, authors, and more
  • Animated Progress Tracking: Visual loading indicators with rotating dots
  • Error Handling: Graceful failure with detailed error reporting
  • CSV Export: Generates timestamped CSV files with UTF-8 encoding
  • Modular Architecture: Clean separation of utilities and main logic

📁 Project Structure

Webscraping ex-database/
├── main.py              # Main scraper application
├── utils.py             # Utility functions
├── generate_pdf.py      # PDF synopsis generator
├── README.md            # This file
└── exploits_*.csv       # Generated data files

🔧 Installation

Prerequisites

  • Python 3.7 or higher
  • pip (Python package installer)

Dependencies

pip install requests

Setup

  1. Clone or download this repository
  2. Navigate to the project directory
  3. Install dependencies using the command above

📖 Usage

Basic Usage

python main.py

This will:

  • Fetch all available exploit records from exploit-db.com
  • Process data in batches of 3,500 records
  • Generate a timestamped CSV file (e.g., exploits_20241005_143022.csv)
  • Display real-time progress with animated loading indicators
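
The timestamped filename shown above can be produced with the standard library `datetime` module; this is a minimal sketch, and the exact helper in main.py may differ.

```python
from datetime import datetime

def output_filename(prefix="exploits"):
    """Return a timestamped CSV name, e.g. 'exploits_20241005_143022.csv'."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{stamp}.csv"
```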

Output

The scraper generates CSV files with the following columns:

  • id: Exploit ID
  • description: Exploit description
  • type: Exploit type (e.g., Remote, Local)
  • platform: Target platform (e.g., Linux, Windows)
  • author: Author name
  • date_published: Publication date
  • verified: Verification status (Yes/No)
  • cve_codes: Associated CVE codes
  • download_link: Download URL
  • application_path: Application path
  • application_md5: File checksum
  • port: Target port
  • tags: Relevant tags

⚙️ Configuration

Batch Size

The batch size can be modified in main.py:

# Configuration
PAGE_SIZE = 3500  # Records per API call

API Parameters

HTTP headers and API parameters are configured in main.py:

headers = {
    'accept': 'application/json',
    'referer': 'https://www.exploit-db.com/',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'x-requested-with': 'XMLHttpRequest',
}
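
A minimal sketch of how the batched fetching could be wired together with these headers. The endpoint URL and the `start`/`length` paging parameters follow the DataTables convention that exploit-db.com's listing is commonly reported to use; treat those names, and the `fetch_all` helper itself, as illustrative assumptions rather than the project's exact code.

```python
import time

PAGE_SIZE = 3500
URL = "https://www.exploit-db.com/"  # listing endpoint (assumed)

HEADERS = {
    'accept': 'application/json',
    'referer': 'https://www.exploit-db.com/',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'x-requested-with': 'XMLHttpRequest',
}

def batch_params(start, page_size=PAGE_SIZE):
    """Query parameters for one batch beginning at offset `start`
    (DataTables-style paging; parameter names are assumptions)."""
    return {'draw': 1, 'start': start, 'length': page_size}

def fetch_all():
    """Yield raw records batch by batch until the API returns no more."""
    import requests  # imported here so batch_params stays usable offline
    session = requests.Session()
    start = 0
    while True:
        resp = session.get(URL, params=batch_params(start),
                           headers=HEADERS, timeout=30)
        resp.raise_for_status()
        rows = resp.json().get('data', [])
        if not rows:
            break
        yield from rows
        start += PAGE_SIZE
        time.sleep(0.5)  # rate limiting between requests
```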

🛠️ Technical Details

Architecture

  • main.py: Core scraping logic, API integration, and CSV generation
  • utils.py: Utility functions for data processing and progress display
  • generate_pdf.py: PDF synopsis generator for documentation

Key Components

  • Data Extraction: HTTP requests to exploit-db.com API
  • Progress Tracking: Animated loading indicators with real-time updates
  • Error Handling: Comprehensive error catching and reporting
  • Data Processing: Batch processing with configurable page size

Performance

  • Batch Size: Up to 3,500 records per API request
  • Memory Efficiency: Processes data in chunks to avoid memory issues
  • Rate Limiting: 0.5-second delays between requests
  • Error Recovery: Continues processing even if individual requests fail
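
The error-recovery behaviour above can be sketched as a retry-then-skip loop. This is an illustrative reconstruction, not the project's exact code; `fetch_batch` is a hypothetical callable that returns the records for one batch offset.

```python
import time

def fetch_with_recovery(fetch_batch, total, page_size=3500,
                        retries=3, delay=0.5):
    """Call fetch_batch(start) for each batch offset. A batch that keeps
    failing is recorded and skipped, so one bad request does not abort
    the whole run."""
    records, failed = [], []
    for start in range(0, total, page_size):
        for _ in range(retries):
            try:
                records.extend(fetch_batch(start))
                break
            except Exception as exc:
                print(f"Error at position {start}: {exc}")
                time.sleep(delay)  # back off before retrying
        else:
            failed.append(start)  # give up on this batch, keep going
    return records, failed
```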

📊 Sample Output

id,description,type,platform,author,date_published,verified,cve_codes,download_link,application_path,application_md5,port,tags
12345,"Buffer Overflow in Web Server",Remote,Linux,"Security Researcher","2024-01-15","Yes","CVE-2024-1234","https://www.exploit-db.com/download/12345","/usr/bin/webserver","abc123def456",80,"buffer-overflow,web,linux"

🎯 Use Cases

  • Security Research: Vulnerability analysis and trend identification
  • Threat Intelligence: Exploit database for security teams
  • Academic Research: Cybersecurity studies and analysis
  • Compliance: Security assessment and reporting
  • Penetration Testing: Exploit reference for security testing

🔍 Troubleshooting

Common Issues

  1. Network Errors: Check internet connection and try again
  2. API Limits: The scraper includes rate limiting to avoid overwhelming the server
  3. Memory Issues: Large datasets are processed in batches automatically
  4. Permission Errors: Ensure write permissions for CSV file generation

Error Messages

  • "Error at position X": Indicates where the scraping stopped
  • "Fetched X records": Shows progress for each batch
  • "Error: [message]": General error with detailed description

📋 Requirements

  • Python: 3.7+
  • Memory: Minimum 512MB RAM
  • Storage: Varies based on dataset size (~1MB per 1000 records)
  • Network: Stable internet connection

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

📄 License

This project is for educational and research purposes. Please respect the terms of service of exploit-db.com and use responsibly.

⚠️ Disclaimer

This tool is designed for legitimate security research and educational purposes only. Users are responsible for ensuring compliance with applicable laws and regulations. The authors are not responsible for any misuse of this software.

📞 Support

For issues, questions, or contributions, please:

  1. Check the troubleshooting section above
  2. Review the code comments for technical details
  3. Create an issue in the repository

Last Updated: January 2025
Maintainer: Security Research Team
