GitHub - JithukrishnanV/Vulnerability-analysis-Using-Python-ML: This project helps estimate the severity early by learning how humans have historically assigned severity scores based on vulnerability descriptions.

Predicting CVE Severity from Vulnerability Descriptions

Machine Learning–Based Risk Analysis Using CVSS v3.1

📖 Project Overview

Security teams often need to prioritize vulnerabilities before official CVSS scores are published. This project addresses that gap by predicting vulnerability severity directly from CVE descriptions using machine learning.

The system learns how historical CVE descriptions were translated into CVSS v3.1 metrics by security experts and applies that knowledge to newly disclosed vulnerabilities. By analyzing vulnerability text, the model predicts key CVSS attributes and reconstructs an estimated severity score to support early-stage risk assessment.

🎯 Problem Statement

CVE databases grow rapidly, making manual risk analysis slow and inconsistent
Official CVSS scores may be delayed after disclosure
Organizations need early, explainable severity estimates for faster remediation

🧠 How the System Works (Simple Explanation)

1️⃣ Training Phase

- Input: historical CVE descriptions and their official CVSS v3.1 metrics
- CVE text is converted into numerical features using TF-IDF
- Supervised machine learning models (Logistic Regression) learn how the following relate to one another:
  - Vulnerability language
  - Exploitability conditions
  - Impact severity

📌 Important: The model is not told what “high” or “low” severity means. It learns this implicitly by observing how similar descriptions were scored in the past.
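
The training phase described above could be sketched roughly as follows. This is a minimal illustration and not the project's notebook code: the toy descriptions and labels are invented, the helper name `train_metric_models` is hypothetical, and fitting one independent TF-IDF + Logistic Regression pipeline per CVSS metric is an assumption about how the models are organized.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, invented for illustration; real data would come from NVD.
descriptions = [
    "remote attacker sends crafted network request without authentication",
    "unauthenticated remote code execution via crafted HTTP request",
    "local user with shell access can escalate privileges",
    "local attacker can read files through a symlink in a temp directory",
]
labels = {
    "attackVector": ["NETWORK", "NETWORK", "LOCAL", "LOCAL"],
    "privilegesRequired": ["NONE", "NONE", "LOW", "LOW"],
}


def train_metric_models(texts, labels_by_metric):
    """Fit one TF-IDF + Logistic Regression pipeline per CVSS metric."""
    models = {}
    for metric, y in labels_by_metric.items():
        model = make_pipeline(
            TfidfVectorizer(lowercase=True, stop_words="english"),
            LogisticRegression(max_iter=1000),
        )
        model.fit(texts, y)
        models[metric] = model
    return models


models = train_metric_models(descriptions, labels)

# Predict each metric for a previously unseen description.
new_cve = ["remote attacker can crash the service via a network packet"]
prediction = {metric: model.predict(new_cve)[0] for metric, model in models.items()}
print(prediction)
```

Keeping a separate pipeline per metric means each classifier can be inspected on its own, for example by looking at which terms carry the largest coefficients for a given class.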

2️⃣ Prediction Phase

When a new CVE description is provided:

- The model predicts individual CVSS components:
  - Attack Vector
  - Attack Complexity
  - Privileges Required
  - User Interaction
  - Confidentiality, Integrity, Availability impact
- These predicted values are combined using the CVSS v3.1 formulas to estimate a base severity score
- The result enables early risk prioritization before official scoring
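
The score reconstruction step can be sketched with the base score equations from the CVSS v3.1 specification. The function names below are hypothetical, and the metric keys follow the upper-case spelling used in the JSON example in this README. Note that the formula also needs the Scope metric, which is not among the predicted components listed above; this sketch takes it as a parameter and assumes Unchanged by default.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"NETWORK": 0.85, "ADJACENT_NETWORK": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
AC = {"LOW": 0.77, "HIGH": 0.44}
PR_UNCHANGED = {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27}
PR_CHANGED = {"NONE": 0.85, "LOW": 0.68, "HIGH": 0.5}
UI = {"NONE": 0.85, "REQUIRED": 0.62}
CIA = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}


def roundup(value):
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, c, i, a, scope_changed=False):
    """Compute the CVSS v3.1 base score from (predicted) metric values."""
    pr_weights = PR_CHANGED if scope_changed else PR_UNCHANGED
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weights[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope_changed:
        total *= 1.08
    return roundup(min(total, 10))


def severity(score):
    """Map a base score to its qualitative severity rating."""
    if score == 0:
        return "NONE"
    if score < 4.0:
        return "LOW"
    if score < 7.0:
        return "MEDIUM"
    if score < 9.0:
        return "HIGH"
    return "CRITICAL"


# Metrics of CVE-2025-35028 as listed in the JSON example in this README.
score = base_score("NETWORK", "LOW", "NONE", "NONE", "HIGH", "HIGH", "NONE")
print(score, severity(score))  # 9.1 CRITICAL
```

Because the score is a deterministic function of the metrics, any error in the final estimate can be traced back to the individual metric that was mispredicted.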

🔍 Example

Input CVE Description:

“A remote attacker can exploit this vulnerability via a network request without authentication, leading to system compromise.”

Model Output:

Attack Vector: Network
Privileges Required: None
Attack Complexity: Low
Availability Impact: High

➡️ Predicted Severity: High / Critical

🛠️ Technologies Used

Programming: Python
Machine Learning: Scikit-learn (Logistic Regression)
Text Processing: TF-IDF (NLP)
Data Handling: Pandas, NumPy
Visualization: Matplotlib, Seaborn, Plotly
Data Source: National Vulnerability Database (NVD)

📊 Key Results

Attack Vector prediction accuracy: 96%
Attack Complexity accuracy: 94%
User Interaction accuracy: 95%
Strong performance on early-stage severity estimation
High interpretability compared to deep learning models

📈 Visualization Features

CVE severity trend heatmaps
Vulnerability clustering (honeycomb visualization)
Time-series analysis of CVE growth
Interactive dashboard for analyst exploration

⚠️ Limitations

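TF-IDF lacks contextual understanding: it weights terms by frequency and does not model meaning, so descriptions with similar vocabulary but different semantics can be confused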
Some overlap between similar CVSS categories
Batch-based analysis (not real-time yet)
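
The first limitation can be illustrated with a small standard-library sketch. It assumes word-level (unigram) features, which is an assumption about the project's TF-IDF configuration; the two sentences are invented. Both sentences contain exactly the same words, so any representation built only from word counts cannot tell them apart, even though they describe different attack vectors.

```python
from collections import Counter


def bag_of_words(text):
    """Count word occurrences, discarding word order."""
    return Counter(text.lower().split())


a = "a local user can crash the remote server"
b = "a remote user can crash the local server"

# Same words, different meaning: the count-based features are identical.
print(bag_of_words(a) == bag_of_words(b))  # True
```

Adding word n-grams recovers some local word order, but it does not provide the contextual understanding that the transformer-based models mentioned under Future Work aim for.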

🚀 Future Work

Replace TF-IDF with transformer-based NLP models (BERT)
Integrate real-time CVE feeds
Support CVSS v4.0
Improve handling of class imbalance
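
One common way to approach the class-imbalance item above is to weight rare classes more heavily during training. The sketch below computes weights as `n_samples / (n_classes * class_count)`, which is the heuristic scikit-learn applies when `class_weight="balanced"` is set. The label counts are invented, the function name is hypothetical, and whether the project would adopt this approach is an assumption.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class that is inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Invented, heavily skewed Attack Vector labels.
attack_vectors = ["NETWORK"] * 8 + ["LOCAL"] + ["PHYSICAL"]
weights = balanced_class_weights(attack_vectors)
for cls, weight in sorted(weights.items()):
    print(cls, round(weight, 3))
```

With weights like these, a misclassified rare class contributes more to the training loss than a misclassified common one.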

📌 Why This Project Matters

This project demonstrates how machine learning can:

Reduce response time in vulnerability management
Provide explainable and transparent risk scoring
Assist security teams in prioritizing threats proactively

Here is an example of the CVE data from NVD, showing the description and the CVSS v3.1 metrics used as training labels:

{
  "cveMetadata": {
    "cveId": "CVE-2025-35028",
    "datePublished": "2025-11-30T21:27:56.057Z"
  },
  "descriptions": [
    {
      "lang": "en",
      "value": "By providing a command-line argument starting with a semi-colon …"
    }
  ],
  "metrics": [
    {
      "cvssV3_1": {
        "attackVector": "NETWORK",
        "attackComplexity": "LOW",
        "privilegesRequired": "NONE",
        "userInteraction": "NONE",
        "confidentialityImpact": "HIGH",
        "integrityImpact": "HIGH",
        "availabilityImpact": "NONE",
        "baseScore": 9.1,
        "baseSeverity": "CRITICAL",
        "vectorString": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N",
        "version": "3.1"
      }
    }
  ]
}
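
A record of this shape could be turned into a training example roughly as follows. This is a sketch under assumptions: the helper names `extract_training_example` and `parse_vector` are hypothetical, and the embedded record is an abbreviated copy of the one above (its description is truncated in the source and is kept that way). Since the example record has no separate scope field, the sketch reads Scope from `vectorString`.

```python
import json

# Abbreviated copy of the record shown above.
RECORD_JSON = """
{
  "cveMetadata": {"cveId": "CVE-2025-35028"},
  "descriptions": [
    {"lang": "en", "value": "By providing a command-line argument starting with a semi-colon …"}
  ],
  "metrics": [
    {"cvssV3_1": {
      "attackVector": "NETWORK",
      "attackComplexity": "LOW",
      "privilegesRequired": "NONE",
      "userInteraction": "NONE",
      "confidentialityImpact": "HIGH",
      "integrityImpact": "HIGH",
      "availabilityImpact": "NONE",
      "vectorString": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N"
    }}
  ]
}
"""

METRIC_FIELDS = [
    "attackVector", "attackComplexity", "privilegesRequired", "userInteraction",
    "confidentialityImpact", "integrityImpact", "availabilityImpact",
]


def parse_vector(vector_string):
    """Split 'CVSS:3.1/AV:N/...' into {'AV': 'N', ...}, skipping the version prefix."""
    return dict(part.split(":") for part in vector_string.split("/")[1:])


def extract_training_example(record):
    """Return (description, labels) for one record, or None if data is missing."""
    description = next(
        (d["value"] for d in record.get("descriptions", []) if d.get("lang") == "en"),
        None,
    )
    cvss = next(
        (m["cvssV3_1"] for m in record.get("metrics", []) if "cvssV3_1" in m),
        None,
    )
    if description is None or cvss is None:
        return None
    labels = {field: cvss.get(field) for field in METRIC_FIELDS}
    labels["scope"] = parse_vector(cvss["vectorString"]).get("S")
    return description, labels


description, labels = extract_training_example(json.loads(RECORD_JSON))
print(description)
print(labels)
```

Records without an English description or without CVSS v3.1 metrics return `None`, so they can be filtered out before training.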
