CVE HUNTER

ML-Powered Vulnerability Intelligence System

📋 Overview

CVE Hunter is an advanced vulnerability intelligence platform that combines machine learning with a comprehensive CVE database to help security professionals, developers, and system administrators quickly identify and understand security vulnerabilities relevant to their infrastructure.

The system uses Natural Language Processing (NLP) and a machine learning classifier to predict vulnerability types, alongside keyword search over the National Vulnerability Database (NVD) CVE dataset covering 2011 to 2026.

✨ Key Features

🤖 ML-Powered Classification: Automatically predicts vulnerability types with confidence scoring
⚡ Fast Search: Keyword-based search across 15+ years of CVE data
📊 Advanced Filtering: Filter by severity, vulnerability type, and CVSS score
🔍 Detailed CVE Information:
- Full CVE descriptions
- CVSS scores and severity ratings
- Common Weakness Enumeration (CWE) mapping
- Publication and modification dates
- References and external links
💾 Comprehensive Database: 15+ years of standardized CVE data from NVD
🎨 Cyberpunk UI: Modern, fast, responsive web interface with dark theme
📈 Statistics Dashboard: Overview of database coverage and vulnerability trends
🔗 One-Click Expansion: View full details with expandable CVE cards
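
The severity and CVSS filters can be pictured as a single pass over result records. This is only a sketch: `filter_cves` and its parameters are hypothetical names, the record fields follow the `/search` response format documented in this README, and the actual filtering logic in app.py may differ.

```python
from typing import Iterable, List, Optional


def filter_cves(
    records: Iterable[dict],
    severity: Optional[str] = None,
    min_cvss: float = 0.0,
) -> List[dict]:
    """Keep records that match the severity (if given) and meet the CVSS floor."""
    wanted = severity.upper() if severity else None
    kept = []
    for record in records:
        if wanted and str(record.get("severity", "")).upper() != wanted:
            continue
        # Assumption: a record without a score is treated as 0.0.
        score = record.get("cvss_score")
        if (score if score is not None else 0.0) < min_cvss:
            continue
        kept.append(record)
    return kept


sample = [
    {"cve_id": "CVE-2021-41773", "severity": "HIGH", "cvss_score": 7.5},
    # Placeholder record for illustration, not a real CVE.
    {"cve_id": "CVE-0000-0001", "severity": "LOW", "cvss_score": 2.1},
]
high_only = filter_cves(sample, severity="high")
```

Matching severity case-insensitively keeps the filter tolerant of inputs such as `high` or `High`.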

🛠️ Tech Stack

Backend

Flask - Python web framework
scikit-learn - Machine learning and vectorization
joblib - Model serialization
Flask-CORS - Cross-origin resource sharing (CORS) support

Frontend

HTML5 / CSS3 - Modern semantic markup and styling
JavaScript (Vanilla) - Dynamic interactions
Responsive Design - Works on desktop and mobile

Data & ML

JSON - CVE dataset storage
TF-IDF Vectorizer - Text feature extraction
ML Classification Model - Vulnerability type prediction
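
To give a feel for what TF-IDF feature extraction does, here is a dependency-free sketch of the textbook weighting (term count times log of inverse document frequency). The `tokenize` and `tfidf` helpers are hypothetical and are not the project's code. scikit-learn's `TfidfVectorizer` applies a smoothed IDF and L2 normalization by default, so its numbers will differ from these.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Textbook TF-IDF: term count in a document times log(N / document frequency)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {term: count * math.log(n_docs / doc_freq[term]) for term, count in counts.items()}
        )
    return weights


docs = [
    "buffer overflow in image parser",
    "sql injection in login form",
    "buffer overflow in network daemon",
]
vectors = tfidf(docs)
```

A word that appears in every description (here "in") gets weight zero, while rarer terms such as "sql" score higher. That property is what makes the features useful for telling vulnerability types apart.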

📦 Project Structure

project/
├── app.py                # Flask backend & API
├── index.html            # Frontend UI
├── clean_raw.py          # Data cleaning script
├── train_model.ipynb     # ML model training notebook
├── requirements.txt      # Python dependencies
├── model/
│   ├── model.pkl         # Trained ML classifier
│   └── vectorizer.pkl    # TF-IDF vectorizer
├── README.md             # This file
├── LICENSE               # MIT License
└── BRAND.md              # Brand guidelines

🚀 Quick Start

Prerequisites

Python 3.8 or higher
pip (Python package manager)

Installation

Clone or navigate to the project directory:
```
cd ~/Desktop/project
```
Install dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
python app.py
```
The application will:
- Start the Flask server on http://localhost:5000
- Automatically open the web interface in your default browser
- Load the ML model and CVE dataset

Access the Application

Web UI: http://localhost:5000
API Base URL: http://localhost:5000

📖 How to Use

Search for Vulnerabilities

Enter a search query in the search box:
- Application names: apache, nginx, openssl
- Software versions: 2.4.49, 1.0.2
- Known CVEs: log4j, heartbleed
- Package names: windows, linux
Press the [SCAN] button or hit Enter
Results display:
- ML Prediction: Detected vulnerability type with a confidence percentage
- Matched CVEs: Ranked by relevance
- Severity Badges: Visual severity indicators (CRITICAL, HIGH, MEDIUM, LOW)
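
For reference, the four badge labels correspond to the CVSS v3.x qualitative severity scale. The sketch below maps a base score to its label. Whether CVE Hunter derives badges from the score or reads the severity field supplied by NVD is not stated here, so treat `cvss_v3_severity` as a hypothetical helper.

```python
def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "NONE"
    if score < 4.0:
        return "LOW"        # 0.1 - 3.9
    if score < 7.0:
        return "MEDIUM"     # 4.0 - 6.9
    if score < 9.0:
        return "HIGH"       # 7.0 - 8.9
    return "CRITICAL"       # 9.0 - 10.0


label = cvss_v3_severity(7.5)
```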

View Full Details

Click the ▶ [DETAILS] button on any CVE card
View expanded information:
- Complete description
- All references and links
- CWE classification
- CVSS score
- Publication dates
Click again to collapse

Clear Search

Click the [CLR] button to reset and start a new search

View Statistics

Top panel shows:
- Total CVEs in database
- Top vulnerability types
- Critical vulnerability count

🔌 API Endpoints

`POST /search`

Search for CVEs and get ML predictions.

Request:

{
  "query": "apache 2.4.49"
}

Response:

{
  "query": "apache 2.4.49",
  "predicted_type": "Improper Input Validation",
  "confidence": 92.5,
  "total_matches": 45,
  "results": [
    {
      "cve_id": "CVE-2021-41773",
      "severity": "HIGH",
      "cvss_score": 7.5,
      "description": "...",
      "vuln_type": "...",
      "references": [...],
      "published": "2021-10-05",
      "cwe": "CWE-22"
    }
  ]
}
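
A client call could look like the following sketch, which uses only the standard library. It assumes the endpoint expects a JSON body with a `Content-Type: application/json` header and that the server is at the default address. `build_search_request` and `search` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # default address from this README


def build_search_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /search."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response; needs a running server."""
    request = build_search_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_search_request("apache 2.4.49")
```

With the backend running, `search("apache 2.4.49")["results"]` should return the list of matched CVEs in the format documented for this endpoint.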

`GET /stats`

Get database statistics.

Response:

{
  "total": 195000,
  "vuln_types": {
    "Improper Input Validation": 15234,
    "SQL Injection": 12456,
    "Cross-site Scripting": 9876,
    ...
  },
  "severities": {
    "CRITICAL": 3456,
    "HIGH": 24567,
    "MEDIUM": 89234,
    "LOW": 78234
  }
}
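
One way to consume this response is to turn the severity counts into percentages. In this sketch `severity_shares` is a hypothetical helper. It divides by the sum of the listed counts rather than by `total`, because the sample figures here are illustrative and the two need not agree.

```python
def severity_shares(stats: dict) -> dict:
    """Return each severity's share of the listed severity counts, in percent."""
    counts = stats.get("severities", {})
    listed = sum(counts.values())
    if listed == 0:
        return {}
    return {name: round(100.0 * count / listed, 1) for name, count in counts.items()}


# Sample figures copied from the example response in this README.
example = {
    "total": 195000,
    "severities": {"CRITICAL": 3456, "HIGH": 24567, "MEDIUM": 89234, "LOW": 78234},
}
shares = severity_shares(example)
```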

`GET /cve/<cve_id>`

Get full details for a specific CVE.

Example: /cve/CVE-2021-41773

Response:

{
  "cve_id": "CVE-2021-41773",
  "description": "...",
  "severity": "HIGH",
  "cvss_score": 7.5,
  "vuln_type": "Path Traversal",
  "cwe": "CWE-22",
  "published": "2021-10-05",
  "modified": "2024-11-21",
  "references": [...]
}
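
Before calling this endpoint, a client may want to check that the identifier is well formed. CVE IDs have the form `CVE-`, a four-digit year, a hyphen, then four or more digits. The sketch below validates that shape and builds the request path. `cve_detail_path` is a hypothetical helper, and the server may perform its own validation.

```python
import re

# CVE-<4-digit year>-<4 or more digits>
CVE_ID_PATTERN = re.compile(r"CVE-[0-9]{4}-[0-9]{4,}")


def cve_detail_path(cve_id: str) -> str:
    """Normalize a CVE identifier and return the matching /cve/<cve_id> path."""
    normalized = cve_id.strip().upper()
    if not CVE_ID_PATTERN.fullmatch(normalized):
        raise ValueError(f"not a valid CVE identifier: {cve_id!r}")
    return f"/cve/{normalized}"


path = cve_detail_path("cve-2021-41773")
```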

🔧 Configuration

Model Retraining

To retrain the ML model with updated CVE data:

Update the raw CVE files in raw_data/
Run the cleaning script:
```
python clean_raw.py
```
Run the training notebook:
```
jupyter notebook train_model.ipynb
```
Replace model/model.pkl and model/vectorizer.pkl with updated versions

Port Configuration

To change the Flask server port, modify app.py:

if __name__ == "__main__":
    Timer(1.5, open_browser).start()
    app.run(debug=True, port=8000)  # Change port here
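
As an alternative to editing the source each time, the port could be read from an environment variable. This is a sketch and not part of app.py as described here. The `resolve_port` helper and the `CVE_HUNTER_PORT` variable name are both assumptions.

```python
import os


def resolve_port(default: int = 5000, env_var: str = "CVE_HUNTER_PORT") -> int:
    """Read the port from an environment variable, falling back to the default."""
    raw = os.environ.get(env_var)
    if raw is None or not raw.strip():
        return default
    port = int(raw)  # raises ValueError if the value is not numeric
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


port = resolve_port()
# In app.py this value could then be passed on, for example:
#     app.run(debug=True, port=resolve_port())
```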

📊 Data Sources

National Vulnerability Database (NVD): https://nvd.nist.gov/
CVE Dataset: JSON format from NVD API
Time Coverage: 2011 - 2026
Update Frequency: Latest CVE data available

🗂️ Download Pre-processed Dataset

Available on Kaggle: 2021-2025 All CVEs Cleaned Dataset

To use the Kaggle dataset:

Download from the link above
Place cve_enriched.json in the clean_data/ folder
Run the application as normal

🎨 UI Features

Responsive Design

Desktop optimized (960px max-width)
Mobile-friendly layout
Touch-friendly buttons and controls

Visual Indicators

Severity Colors: Critical (Red), High (Pink), Medium (Orange), Low (Green)
ML Confidence Bar: Visual representation of model confidence
Scanline Effect: Cyberpunk aesthetic with animated scanlines
Grid Background: Tech-themed grid pattern

Dark Theme

Low-light cyberpunk aesthetic
Easy on the eyes for extended use
High contrast for accessibility
Color-coded information (Cyan for primary, Pink for danger, Green for success)

⚙️ Dependencies

See requirements.txt:

Flask
Flask-CORS
scikit-learn
joblib

Install all with:

pip install -r requirements.txt

🐛 Troubleshooting

Issue: "Backend not running" error

Solution: Ensure Flask is running

python app.py

Issue: Model files not found

Solution: Check paths in app.py match your folder structure:

model      = joblib.load("model/model.pkl")
vectorizer = joblib.load("model/vectorizer.pkl")
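
A quick way to see which file is missing is to check both paths from the project root before starting the server. `missing_model_files` is a hypothetical helper written for this README. It only tests that the files exist, not that they load.

```python
from pathlib import Path
from typing import List

# Paths taken from the joblib.load calls shown in this README.
EXPECTED_FILES = ("model/model.pkl", "model/vectorizer.pkl")


def missing_model_files(project_dir: str = ".") -> List[str]:
    """Return the expected model files that are absent under project_dir."""
    root = Path(project_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


missing = missing_model_files(".")
if missing:
    print("Missing model files:", ", ".join(missing))
```

Because the paths are relative, running `python app.py` from a different working directory produces the same error even when the files exist.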

Issue: CVE data not loading

Solution: Verify clean_data/cve_enriched.json exists and is valid JSON
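
The check can be scripted with the standard library. In this sketch `check_dataset` is a hypothetical helper. It only confirms that the file exists and parses as JSON, and makes no assumption about the dataset's internal structure.

```python
import json
from pathlib import Path
from typing import Tuple


def check_dataset(path: str = "clean_data/cve_enriched.json") -> Tuple[bool, str]:
    """Report whether the dataset file exists and parses as JSON."""
    file = Path(path)
    if not file.is_file():
        return False, f"{path} does not exist"
    try:
        with file.open(encoding="utf-8") as handle:
            data = json.load(handle)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        return False, f"{path} is not valid JSON: {exc}"
    return True, f"{path} parsed OK (top-level type: {type(data).__name__})"


ok, message = check_dataset()
print(message)
```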

Issue: Port 5000 already in use

Solution: Change port in app.py:

app.run(debug=True, port=8000)
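
To confirm that the port is the problem, or to find a free one, you can try binding to it. This is a small sketch. `port_is_free` is a hypothetical helper, and it checks the loopback address only.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


for candidate in (5000, 8000):
    state = "free" if port_is_free(candidate) else "in use"
    print(f"port {candidate}: {state}")
```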

🔐 Security Considerations

CORS is enabled on the backend so the frontend can call the API
No authentication is required, so run it only on local or trusted networks
For production: add authentication, HTTPS, and security headers, and disable Flask debug mode (the snippets in this README run with debug=True)
Validate and sanitize all user inputs
Keep the CVE database updated regularly
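
As one illustration of input validation, a search query could be normalized and bounded before it reaches the search or the model. This is a sketch under assumptions: `clean_query` and the 200-character limit are hypothetical and are not taken from app.py.

```python
import re

MAX_QUERY_LENGTH = 200  # hypothetical limit, adjust as needed
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def clean_query(raw: object) -> str:
    """Validate a search query and collapse whitespace and control characters."""
    if not isinstance(raw, str):
        raise ValueError("query must be a string")
    # Replace control characters with spaces, then collapse runs of whitespace.
    query = " ".join(_CONTROL_CHARS.sub(" ", raw).split())
    if not query:
        raise ValueError("query must not be empty")
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(f"query longer than {MAX_QUERY_LENGTH} characters")
    return query


cleaned = clean_query("  apache\t2.4.49\n")
```

Rejecting bad input with an explicit error, rather than silently truncating it, makes it easier for API clients to see why a request failed.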

📈 Performance

Search Speed: < 100ms for 195,000+ CVEs
ML Inference: < 50ms per query
Memory Usage: ~500MB with full dataset loaded
Browser Load Time: < 2 seconds

🤝 Contributing

Found a bug or have a feature request?

Review existing issues
Provide clear description and reproduction steps
Include environment details (Python version, OS, etc.)

📜 License

This project is licensed under the MIT License.

See the LICENSE file for full license text.

MIT License

Copyright (c) 2026 Ben

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions...

🎨 Brand & Design

For branding guidelines, color schemes, and design specifications, see BRAND.md.

👨‍💼 Author

Ben
Security Researcher & Software Developer

Portfolio: https://www.benslates.xyz
Year: 2026
License: MIT

📞 Support

For questions, issues, or feedback:

Check this README
Review the troubleshooting section
Contact via portfolio: https://www.benslates.xyz

🙏 Acknowledgments

National Vulnerability Database (NVD) - Data source
NIST - CVE standards and classifications
scikit-learn - ML toolkit
Flask - Web framework
Open Source Community - Tools and inspiration

🔮 Future Roadmap

Advanced filtering and faceted search
Custom dashboards and reports
CVE trend analysis and predictions
Integration with security tools (Nessus, Qualys)
Real-time CVE feed alerts
Multi-language support
Dark/Light theme toggle
Export results (PDF, CSV, JSON)
User accounts and saved searches

Last Updated: April 2026
Status: Active Development
Maintained By: Ben

Made with ❤️ by Ben

Empowering security professionals with AI-powered vulnerability intelligence
