This project helps estimate the severity early by learning how humans have historically assigned severity scores based on vulnerability descriptions.

JithukrishnanV/Vulnerability-analysis-Using-Python-ML

Predicting CVE Severity from Vulnerability Descriptions

Machine Learning–Based Risk Analysis Using CVSS v3.1


📖 Project Overview

Security teams often need to prioritize vulnerabilities before official CVSS scores are published. This project addresses that gap by predicting vulnerability severity directly from CVE descriptions using machine learning.

The system learns how historical CVE descriptions were translated into CVSS v3.1 metrics by security experts and applies that knowledge to newly disclosed vulnerabilities. By analyzing vulnerability text, the model predicts key CVSS attributes and reconstructs an estimated severity score to support early-stage risk assessment.


🎯 Problem Statement

  • CVE databases grow rapidly, making manual risk analysis slow and inconsistent
  • Official CVSS scores may be delayed after disclosure
  • Organizations need early, explainable severity estimates for faster remediation

🧠 How the System Works (Simple Explanation)

1️⃣ Training Phase

  • Input: Historical CVE descriptions + their official CVSS v3.1 metrics

  • CVE text is converted into numerical features using TF-IDF

  • Supervised machine learning models (Logistic Regression) learn how the language of a vulnerability description maps to:

    • Exploitability conditions
    • Impact severity

📌 Important: The model is not told what “high” or “low” severity means. It learns this implicitly by observing how similar descriptions were scored in the past.
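The training phase above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the toy descriptions, the `train_metric_models` helper, and the hyperparameters are assumptions made for the example. Training one TF-IDF + Logistic Regression pipeline per CVSS metric is one plausible arrangement, not a confirmed design.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, hand-written examples (hypothetical, not real NVD data).
descriptions = [
    "Remote attacker sends a crafted HTTP request to the server",
    "Unauthenticated network request triggers a buffer overflow",
    "Remote code execution through a malicious network packet",
    "Local user with shell access escalates privileges",
    "Local attacker reads sensitive files on the same machine",
    "Malicious local application corrupts kernel memory",
]
labels = {
    "attackVector": ["NETWORK", "NETWORK", "NETWORK", "LOCAL", "LOCAL", "LOCAL"],
    "privilegesRequired": ["NONE", "NONE", "NONE", "LOW", "LOW", "LOW"],
}


def train_metric_models(texts, labels_by_metric):
    """Fit one TF-IDF + Logistic Regression pipeline per CVSS metric."""
    models = {}
    for metric, y in labels_by_metric.items():
        pipeline = make_pipeline(
            # Assumed settings: unigrams and bigrams, English stop words removed.
            TfidfVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2)),
            LogisticRegression(max_iter=1000),
        )
        models[metric] = pipeline.fit(texts, y)
    return models


models = train_metric_models(descriptions, labels)

new_cve = "A remote attacker can exploit this via a network request without authentication"
predicted = {metric: str(model.predict([new_cve])[0]) for metric, model in models.items()}
print(predicted)
```

Keeping the vectorizer inside each pipeline means the vocabulary learned at training time is reused unchanged at prediction time.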


2️⃣ Prediction Phase

When a new CVE description is provided:

  • The model predicts individual CVSS components:

    • Attack Vector
    • Attack Complexity
    • Privileges Required
    • User Interaction
    • Confidentiality, Integrity, Availability impact
  • These predicted values are combined using the CVSS v3.1 base score formulas to estimate a base severity score

  • The result enables early risk prioritization before official scoring
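The score reconstruction step can be sketched as below. The metric weights, the rounding rule, and the severity bands follow the published CVSS v3.1 specification; the function names and the dictionary layout are hypothetical. The component list above does not mention Scope, so this sketch assumes Scope is Unchanged unless told otherwise, which may differ from how the project handles it.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"NETWORK": 0.85, "ADJACENT_NETWORK": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
AC = {"LOW": 0.77, "HIGH": 0.44}
UI = {"NONE": 0.85, "REQUIRED": 0.62}
CIA = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}
PR_UNCHANGED = {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27}
PR_CHANGED = {"NONE": 0.85, "LOW": 0.68, "HIGH": 0.5}


def roundup(value):
    """Round up to one decimal place, as defined in the CVSS v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(m, scope="UNCHANGED"):
    """Compute the base score from a dict of predicted metric labels."""
    changed = scope == "CHANGED"
    iss = 1 - (
        (1 - CIA[m["confidentialityImpact"]])
        * (1 - CIA[m["integrityImpact"]])
        * (1 - CIA[m["availabilityImpact"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["privilegesRequired"]]
    exploitability = (
        8.22 * AV[m["attackVector"]] * AC[m["attackComplexity"]] * pr * UI[m["userInteraction"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


def severity(score):
    """Map a base score to its qualitative severity rating."""
    if score == 0:
        return "NONE"
    if score < 4.0:
        return "LOW"
    if score < 7.0:
        return "MEDIUM"
    if score < 9.0:
        return "HIGH"
    return "CRITICAL"


# Hypothetical model output for a single CVE description.
predicted = {
    "attackVector": "NETWORK",
    "attackComplexity": "LOW",
    "privilegesRequired": "NONE",
    "userInteraction": "NONE",
    "confidentialityImpact": "HIGH",
    "integrityImpact": "HIGH",
    "availabilityImpact": "NONE",
}
score = base_score(predicted)
print(score, severity(score))  # 9.1 CRITICAL
```

Because the score is a deterministic function of the predicted labels, a misclassified metric shifts the estimate in a traceable way, which fits the project's emphasis on explainability.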


🔍 Example

Input CVE Description:

“A remote attacker can exploit this vulnerability via a network request without authentication, leading to system compromise.”

Model Output:

  • Attack Vector: Network
  • Privileges Required: None
  • Attack Complexity: Low
  • Availability Impact: High

➡️ Predicted Severity: High / Critical


🛠️ Technologies Used

  • Programming: Python
  • Machine Learning: Scikit-learn (Logistic Regression)
  • Text Processing: TF-IDF (NLP)
  • Data Handling: Pandas, NumPy
  • Visualization: Matplotlib, Seaborn, Plotly
  • Data Source: National Vulnerability Database (NVD)

📊 Key Results

  • Attack Vector prediction accuracy: 96%
  • Attack Complexity accuracy: 94%
  • User Interaction accuracy: 95%
  • Strong performance on early-stage severity estimation
  • High interpretability compared to deep learning models

📈 Visualization Features

  • CVE severity trend heatmaps
  • Vulnerability clustering (honeycomb visualization)
  • Time-series analysis of CVE growth
  • Interactive dashboard for analyst exploration

⚠️ Limitations

  • TF-IDF has no contextual understanding, so semantically ambiguous descriptions can be misclassified
  • Similar CVSS categories overlap, which leads to some confusion between them
  • Analysis is batch-based (not real-time yet)

🚀 Future Work

  • Replace TF-IDF with transformer-based NLP models (BERT)
  • Integrate real-time CVE feeds
  • Support CVSS v4.0
  • Improve handling of class imbalance

📌 Why This Project Matters

This project demonstrates how machine learning can:

  • Reduce response time in vulnerability management
  • Provide explainable and transparent risk scoring
  • Assist security teams in prioritizing threats proactively

Here is an abridged example of the CVE data from NVD:

{
  "cveMetadata": {
    "cveId": "CVE-2025-35028",
    "datePublished": "2025-11-30T21:27:56.057Z"
  },
  "descriptions": [
    {
      "lang": "en",
      "value": "By providing a command-line argument starting with a semi-colon …"
    }
  ],
  "metrics": [
    {
      "cvssV3_1": {
        "attackVector": "NETWORK",
        "attackComplexity": "LOW",
        "privilegesRequired": "NONE",
        "userInteraction": "NONE",
        "confidentialityImpact": "HIGH",
        "integrityImpact": "HIGH",
        "availabilityImpact": "NONE",
        "baseScore": 9.1,
        "baseSeverity": "CRITICAL",
        "vectorString": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N",
        "version": "3.1"
      }
    }
  ]
}
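A record shaped like the one above could be turned into a training row roughly as follows. This is a sketch under assumptions: the `to_training_row` helper and the `METRIC_FIELDS` list are hypothetical, and the field nesting mirrors the abridged sample shown here, which may differ from the layout of a full record in a real feed.

```python
import json

# The seven CVSS v3.1 fields used as prediction targets (assumed list).
METRIC_FIELDS = [
    "attackVector",
    "attackComplexity",
    "privilegesRequired",
    "userInteraction",
    "confidentialityImpact",
    "integrityImpact",
    "availabilityImpact",
]

# Abridged record with the same structure as the sample above.
record_json = """
{
  "cveMetadata": {"cveId": "CVE-2025-35028"},
  "descriptions": [
    {"lang": "en", "value": "By providing a command-line argument starting with a semi-colon …"}
  ],
  "metrics": [
    {"cvssV3_1": {
      "attackVector": "NETWORK", "attackComplexity": "LOW",
      "privilegesRequired": "NONE", "userInteraction": "NONE",
      "confidentialityImpact": "HIGH", "integrityImpact": "HIGH",
      "availabilityImpact": "NONE", "baseScore": 9.1
    }}
  ]
}
"""


def to_training_row(record):
    """Extract the English description and the CVSS v3.1 labels from one record."""
    description = next(d["value"] for d in record["descriptions"] if d["lang"] == "en")
    cvss = next(m["cvssV3_1"] for m in record["metrics"] if "cvssV3_1" in m)
    return {
        "cveId": record["cveMetadata"]["cveId"],
        "description": description,
        "labels": {name: cvss[name] for name in METRIC_FIELDS},
        "baseScore": cvss["baseScore"],
    }


row = to_training_row(json.loads(record_json))
print(row["cveId"], row["labels"]["attackVector"], row["baseScore"])
```

Records without an English description or a `cvssV3_1` entry would raise `StopIteration` in this sketch; a real loader would likely skip or log them instead.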
