
NS-unina/CVExploit


CVExploit: A Semi-Automated Multi-Agent Framework for Offensive Code Generation

In cybersecurity, proactive strategies that identify vulnerabilities before they can be exploited play a major role. One such strategy is penetration testing, which, despite its importance, remains a complex process that requires strong technical expertise and a significant time investment. One of its most critical phases is exploitation, during which a Proof of Concept (PoC) is developed to verify concretely whether an identified vulnerability is actually exploitable.

This work presents CVExploit, a semi-automated multi-agent framework that uses a Large Language Model (LLM) to automate this phase. Its pipeline generates an exploit starting from the information associated with a vulnerability's CVE and the details provided about the target system. The architecture integrates problem decomposition, validation, and code refinement mechanisms to improve the robustness and reliability of the generation process.

The framework was evaluated on a set of 32 CVEs and achieved an overall success rate of 65.6% (21 of 32). The results show that integrating an LLM into a carefully designed architecture can provide concrete support for exploitation activities in penetration testing.

The source code for the CVExploit framework and the results collected during the evaluation are available in this GitHub repository.

Pipeline Architecture

[Figure: CVExploit pipeline architecture]

Repository Structure

├── README.md
├── .env.example     # Example template for the .env file
├── config_info.yaml # File containing the framework configuration parameters
├── Ablation_Study/  # Implementation of the different pipelines tested in the ablation study
├── CVE/             # Vulnerable environments for the tested CVEs
├── Executors/       # Isolated environment for exploit execution
│   └── Python-Docker-Executor/
├── Pipeline/        # Implementation of the framework's main pipeline
├── Results/         # Results obtained while testing the framework
└── Setup/           # Scripts and files for setting up the working environment

Framework Usage

Prerequisites

Before getting started, make sure you have a Windows or Linux machine with the following software installed:

  • WSL (Windows Subsystem for Linux): Required on Windows to run searchsploit and to support Docker.
  • Docker: Required to manage both the vulnerable environment and the attack environment.
    • Note for Linux users: your user must be able to run Docker commands without root privileges (sudo). Add your user to the docker group with the command below, then log out and back in or restart the machine:
      sudo usermod -aG docker $USER
  • Ollama: Engine that serves the embedding model and any local LLMs.
    • Note: If Ollama is hosted on a remote machine, set OLLAMA_HOST as described in section 1, API Keys and Services. In that case, a local installation is not required.
  • Python and pip: Python 3.11 or later is recommended.
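A quick way to confirm these prerequisites is a small pre-flight script. The sketch below is not part of the repository; the function name check_prerequisites and the tool list are illustrative assumptions. It only checks that the docker and ollama executables are on PATH and that the interpreter meets the recommended Python version. It does not verify that the Docker daemon is running or that your user can reach it without sudo. If Ollama runs on a remote machine, drop "ollama" from the list.

```python
import shutil
import sys

# Hypothetical pre-flight check, not shipped with CVExploit.
# searchsploit is not listed because Setup/setup.py configures it later.
REQUIRED_TOOLS = ["docker", "ollama"]
MIN_PYTHON = (3, 11)


def check_prerequisites(tools=REQUIRED_TOOLS, min_python=MIN_PYTHON):
    """Return a dict mapping each check name to True (found) or False."""
    results = {"python>=%d.%d" % min_python: sys.version_info >= min_python}
    for tool in tools:
        # shutil.which returns the executable's full path, or None if not on PATH.
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(("OK      " if ok else "MISSING ") + name)
```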

Configuration

Once the prerequisites are satisfied, proceed with the working environment setup.

  • Note for Linux users: From the project root, you must assign execution permissions to the scripts:
    sudo chmod -R 755 .

1. API Keys and Services

  • Rename .env.example to .env and open it with a text editor. Its content should be:
    GOOGLE_API_KEY=insert_key_here
    GROQ_API_KEY=insert_key_here
    OLLAMA_HOST=http://localhost:11434
    GITHUB_TOKEN=insert_token_here
    LANGSMITH_API_KEY=insert_key_here
  • Configure the API keys and services according to these rules:

| API Key / Service | Rule |
|---|---|
| GOOGLE_API_KEY | Required if you use google as the LLM provider |
| GROQ_API_KEY | Required if you use groq as the LLM provider |
| OLLAMA_HOST | Always required. HTTP endpoint of the Ollama service. Change it only if Ollama is not running locally, specifying the IP address or hostname and the port on which the service is listening |
| GITHUB_TOKEN | Always required |
| LANGSMITH_API_KEY | Required only if you want to enable tracing through LangSmith |
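
The rules in the table can be expressed as a small check. This is a hypothetical helper, not code from CVExploit. parse_env handles only plain KEY=VALUE lines (no quoting or export), the placeholder strings are taken from .env.example, and langsmith_tracing is an assumed flag that stands in for however tracing is actually enabled.

```python
from urllib.parse import urlparse

# Values from .env.example that mean "not configured yet".
PLACEHOLDERS = {"", "insert_key_here", "insert_token_here"}


def parse_env(text):
    """Minimal KEY=VALUE parser; skips blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def missing_settings(env, provider, langsmith_tracing=False):
    """Return names of settings that the rules require but that are unset."""
    required = ["OLLAMA_HOST", "GITHUB_TOKEN"]  # always required
    if provider == "google":
        required.append("GOOGLE_API_KEY")
    elif provider == "groq":
        required.append("GROQ_API_KEY")
    if langsmith_tracing:
        required.append("LANGSMITH_API_KEY")
    missing = [k for k in required if env.get(k, "") in PLACEHOLDERS]
    # OLLAMA_HOST must be an HTTP(S) endpoint with a host, e.g. http://localhost:11434
    if "OLLAMA_HOST" not in missing:
        parsed = urlparse(env["OLLAMA_HOST"])
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            missing.append("OLLAMA_HOST (not a valid http(s) URL)")
    return missing
```

For example, missing_settings(parse_env(open(".env").read()), provider="google") returns an empty list once the file is ready for the Google provider.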
How to obtain the API keys and tokens

1. GOOGLE_API_KEY

  • Open the Get API key page in Google AI Studio.
  • Click Create API Key.
  • Fill in the requested information, click Create key, and copy the generated string.

2. GROQ_API_KEY

  • Open the API Keys page in the Groq console.
  • Click Create API Key.
  • Fill in the requested information, click Submit, and copy the generated string.

3. GITHUB_TOKEN

  • On GitHub, go to Settings > Developer settings > Personal access tokens.
  • Click Generate new token (classic).
  • Assign a name (for example, GIT-REPO-token), set an expiration date, and select the public_repo scope.
  • Click Generate token and copy the generated string.

4. LANGSMITH_API_KEY

  • In LangSmith, open Settings > API Keys.
  • Click + API Key in the top-right corner.
  • Fill in the description field, select Personal Access Token as the key type, configure the workspace, and choose an Expiration Date.
  • Click Create API Key and copy the generated string.
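
Once you have the GitHub token, you can check that it authenticates against the GitHub REST API before starting the framework. The sketch below uses only the standard library; the helper names are illustrative, and it sends the token as a Bearer token, which the GitHub REST API accepts for personal access tokens. Querying the /rate_limit endpoint does not count against your rate limit.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def github_request(path, token):
    """Build (but do not send) an authenticated GitHub REST API request."""
    return urllib.request.Request(
        GITHUB_API + path,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


def remaining_rate_limit(token):
    """Network call: return how many core API requests the token has left."""
    with urllib.request.urlopen(github_request("/rate_limit", token), timeout=10) as resp:
        data = json.load(resp)
    return data["resources"]["core"]["remaining"]
```

If the token is invalid or expired, remaining_rate_limit raises urllib.error.HTTPError with status 401.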

2. Default Parameters

The config_info.yaml file contains the parameters required for the pipeline to operate. You can modify them based on your needs:

| Parameter | Default value | Description |
|---|---|---|
| TARGET CVE | | |
| CVE_ID | CVE-2014-6271 | Identifier of the CVE you want to try to exploit. |
| PROVIDER AND LLM SETTINGS | | |
| PROVIDER | google | LLM provider. Supported options: google, groq, and ollama. |
| NAME_MODEL | gemini-2.5-flash | Name of the specific model to use. |
| TEMPERATURE | 0.0 | Sampling temperature (degree of model creativity). 0.0 makes results as deterministic and reproducible as possible. |
| RETRIES | 5 | Number of retries when a model call fails with an error. |
| RATE_LIMITER_REQUESTS_PER_MINUTE | 1000 | Maximum requests per minute, used to avoid rate-limiting errors from the Google and Groq providers. |
| INPUT_TOKEN_PER_MILLION | 0.30 | Cost in USD per one million input tokens for the selected model. |
| OUTPUT_TOKEN_PER_MILLION | 2.50 | Cost in USD per one million output tokens for the selected model. |
| MAX_TOKENS | 1000000 | Maximum number of input tokens the model can process. |
| SAFETY_MARGIN | 1000 | Safety margin to avoid exceeding the model's maximum input token limit. |
| DOCUMENT RETRIEVAL AND CHUNKING SETTINGS (RAG) | | |
| CHUNK_SIZE | 1024 | Size of the text chunks into which documents are split. |
| CHUNK_OVERLAP | 100 | Number of overlapping characters between adjacent chunks. |
| NUMBER_CHUNK | 10 | Maximum number of relevant text chunks that can be retrieved. |
| EMBEDDING_MODEL | embeddinggemma:300m | Ollama model used to generate document embeddings. You can specify a different model, provided it has already been downloaded through Ollama. |
| EXECUTION AND VALIDATION PARAMETERS | | |
| MAX_CVE_URL_DOCUMENT | 5 | Number of documents about the target CVE that is considered sufficient; once this many are found, documents about similar CVEs are not analyzed. |
| MAX_VALIDATION_RETRIES | 3 | Maximum number of attempts for validation checks. |
| MAX_REFINEMENT_CYCLES | 6 | Maximum number of automatic correction cycles after exploit generation. |
| TOOLS | | |
| EXTERNAL_TOOLS | jmet-0.1.0-all.jar, ysoserial-all.jar, marshalsec-0.0.3-SNAPSHOT-all.jar, ColdFusionPwn-0.0.1-SNAPSHOT-all.jar | Tools the LLM can use inside the exploit code. To add a tool, add its name to this list and copy the corresponding file into Executors/Python-Docker-Executor/TOOL/. |
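
To make CHUNK_SIZE, CHUNK_OVERLAP, and NUMBER_CHUNK concrete, here is a simplified sketch of fixed-size character chunking followed by top-k retrieval with cosine similarity. It is an illustration under assumptions, not the framework's code: the actual splitter may cut on separators such as paragraphs rather than at fixed offsets, and real embeddings would come from EMBEDDING_MODEL through Ollama and be searched in a vector store rather than by brute force.

```python
import math


def chunk_text(text, chunk_size=1024, chunk_overlap=100):
    """Split text into fixed-size character windows; neighbours share chunk_overlap chars."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def top_k(query_vec, chunk_vecs, k=10):
    """Indices of the k chunk embeddings most similar to the query (cosine similarity)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scores = [cosine(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(chunk_vecs)), key=lambda i: scores[i], reverse=True)[:k]
```

With the defaults, each new chunk starts 1024 - 100 = 924 characters after the previous one, and at most 10 chunks are passed on to the model.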

Warning: if you change the value of NAME_MODEL, you must also update MAX_TOKENS, RATE_LIMITER_REQUESTS_PER_MINUTE, INPUT_TOKEN_PER_MILLION, and OUTPUT_TOKEN_PER_MILLION accordingly.
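
The pricing and token-limit parameters combine through simple arithmetic. The sketch below assumes that cost scales linearly with the per-million prices and that SAFETY_MARGIN is subtracted from MAX_TOKENS to obtain the usable input budget; the function names are hypothetical. With the defaults, the budget is 1,000,000 - 1,000 = 999,000 input tokens.

```python
def input_token_budget(max_tokens=1_000_000, safety_margin=1_000):
    """Largest prompt, in tokens, that still leaves the safety margin free."""
    return max_tokens - safety_margin


def estimate_cost_usd(input_tokens, output_tokens,
                      input_per_million=0.30, output_per_million=2.50):
    """Cost of a run given USD prices per one million input/output tokens."""
    return (input_tokens * input_per_million
            + output_tokens * output_per_million) / 1_000_000
```

For example, with the default gemini-2.5-flash prices, a run that consumes 200,000 input tokens and 20,000 output tokens costs 0.06 + 0.05 = 0.11 USD. After switching NAME_MODEL, pass the new model's limits and prices instead of the defaults.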

3. Create the Virtual Environment (Venv)

It is good practice to isolate project dependencies. From the project root, run:

python3 -m venv venv

Then activate the virtual environment:

  • Windows:
    .\venv\Scripts\activate
  • Linux:
    source venv/bin/activate
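
To confirm that the virtual environment is active before running the setup script, you can compare sys.prefix with sys.base_prefix, which differ inside a venv. This is a standard Python idiom, not a script shipped with the repository.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix points at the base installation; outside, they are equal.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active" if in_virtualenv() else "venv NOT active")
```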

4. Install Dependencies, SearchSploit, and the Default Embedding Model

Once the virtual environment is active, run the setup script. This command installs the Python libraries, configures searchsploit, and downloads the default embedding model.

python3 Setup/setup.py
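
After setup, you can check that the embedding model is available on the Ollama server. Ollama's REST API lists local models at GET /api/tags, returning JSON with a models array whose entries have a name field. The sketch below is an illustrative helper, not part of the repository. Note that fetch_tags reads OLLAMA_HOST from the process environment, which is not populated from .env automatically.

```python
import json
import os
import urllib.request


def installed_models(tags_response):
    """Model names from the JSON body returned by Ollama's GET /api/tags."""
    return {m["name"] for m in tags_response.get("models", [])}


def has_model(name, tags_response):
    """Check for a model; a name without a tag is treated as ':latest', as Ollama does."""
    if ":" not in name:
        name += ":latest"
    return name in installed_models(tags_response)


def fetch_tags(host=None):
    """Network call: query the Ollama server (default http://localhost:11434)."""
    host = (host or os.environ.get("OLLAMA_HOST", "http://localhost:11434")).rstrip("/")
    with urllib.request.urlopen(host + "/api/tags", timeout=10) as resp:
        return json.load(resp)
```

For example, has_model("embeddinggemma:300m", fetch_tags()) returns True once the setup script has downloaded the default embedding model.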

Start the Framework

Before starting the framework, read the README in the selected CVE's subdirectory inside CVE/, especially the Vulnerable Environment Setup and Post-Execution Framework sections. They describe any actions required before launch and how to verify whether the exploit succeeded.

To start CVExploit using the pipeline shown in the figure, run:

python3 Pipeline/main.py

If you want to reproduce the ablation-study experiments instead, run:

python3 Ablation_Study/main.py

In that case, when the framework starts you will be asked to select which pipeline to use:

  • Pipeline 1: does not include validation checks, code refinement/correction, or the ability for the user to provide input.
  • Pipeline 2: includes validation checks, but does not include code refinement/correction or user input.
  • Pipeline 3: includes code refinement/correction, but does not include validation checks or user input.
  • Pipeline 4: includes both validation checks and code refinement/correction, but does not allow user input.
  • Pipeline 5: corresponds to the complete configuration shown in the figure. It integrates validation checks, code refinement/correction, and the ability for the user to provide input.
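
The five pipelines differ only in which of three features they enable, which the sketch below encodes as a feature matrix. The run_pipeline function is a hypothetical illustration of how those flags could gate a bounded generate, validate, execute, and refine loop that uses MAX_VALIDATION_RETRIES and MAX_REFINEMENT_CYCLES. All callables are placeholders, and the stage ordering in the real implementation under Pipeline/ and Ablation_Study/ may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineFeatures:
    validation: bool
    refinement: bool
    user_input: bool


# Feature matrix of the five ablation pipelines listed above.
PIPELINES = {
    1: PipelineFeatures(validation=False, refinement=False, user_input=False),
    2: PipelineFeatures(validation=True, refinement=False, user_input=False),
    3: PipelineFeatures(validation=False, refinement=True, user_input=False),
    4: PipelineFeatures(validation=True, refinement=True, user_input=False),
    5: PipelineFeatures(validation=True, refinement=True, user_input=True),
}


def run_pipeline(features, generate, validate, execute, refine,
                 ask_user=None, max_validation_retries=3, max_refinement_cycles=6):
    """Hypothetical control flow; generate/validate/execute/refine/ask_user are placeholders."""
    code = generate(None)  # first draft, no feedback yet
    if features.validation:
        for _ in range(max_validation_retries):
            ok, feedback = validate(code)
            if ok:
                break
            code = generate(feedback)  # regenerate using the validator's feedback
    success, output = execute(code)
    if features.refinement:
        cycles = 0
        while not success and cycles < max_refinement_cycles:
            # Only the full pipeline asks the user for a hint before each correction.
            hint = ask_user(output) if features.user_input and ask_user else None
            code = refine(code, output, hint)
            success, output = execute(code)
            cycles += 1
    return success, code
```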
