CVExploit: A Semi-Automated Multi-Agent Framework for Offensive Code Generation

In cybersecurity, proactive strategies aimed at identifying vulnerabilities before they can be exploited play a major role. Among these strategies is penetration testing which, despite its importance, remains a complex process that requires strong technical expertise and a significant time investment. One of the most critical phases is exploitation, during which a Proof of Concept (PoC) is developed to concretely verify whether an identified vulnerability is actually exploitable.

This work presents CVExploit, a semi-automated multi-agent framework designed to automate this phase through the use of a Large Language Model (LLM). A pipeline was designed to automatically generate an exploit starting from the information associated with a vulnerability's CVE and the details provided about the target system. The architecture integrates problem decomposition, validation, and code refinement mechanisms to improve the robustness and reliability of the generation process.

The framework was evaluated on a set of 32 CVEs, achieving an overall success rate of 65.6%. The results show that integrating an LLM into a carefully designed architecture can provide concrete support for exploitation activities in penetration testing.

The source code for the CVExploit framework and the results collected during the evaluation are available in this GitHub repository.

Pipeline Architecture

Repository Structure

├── README.md
├── .env.example     # Example template for the .env file
├── config_info.yaml # File containing the framework configuration parameters
├── Ablation_Study/  # Implementation of the different pipelines tested in the ablation study
├── CVE/             # Vulnerable environments for the tested CVEs
├── Executors/       # Isolated environment for exploit execution
│   └── Python-Docker-Executor/
├── Pipeline/        # Implementation of the framework's main pipeline
├── Results/         # Results obtained while testing the framework
└── Setup/           # Scripts and files for setting up the working environment

Framework Usage

Prerequisites

Before getting started, make sure you have a Windows or Linux machine configured with the following software:

- WSL (Windows Subsystem for Linux): mandatory on Windows, where it is needed to run searchsploit and to support Docker.
- Docker: required to manage both the vulnerable environment and the attack environment.
  - Note for Linux users: Docker must be usable without root privileges (sudo). Add your user to the docker group with the command below, then log out and back in (or reboot) so the change takes effect:
    ```
    sudo usermod -aG docker $USER
    ```
- Ollama: engine used for the embedding model and for local models.
  - Note: if the service is hosted on a remote machine, set OLLAMA_HOST as described in section 1, API Keys and Services. In that case, a local installation is not required.
- Python and PIP: version 3.11 or later is recommended.
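
The prerequisites above can be checked before going further. The following is a minimal sketch, not part of the repository: the command names `docker` and `ollama` are assumptions about how the tools appear on PATH, and `missing_commands` and `python_is_recent` are hypothetical helpers. If Ollama runs on a remote machine, drop `ollama` from the list.

```python
import shutil
import sys
from typing import Callable, Iterable, Optional

# Assumed command names; adjust them to match your installation.
REQUIRED_COMMANDS = ("docker", "ollama")


def missing_commands(
    commands: Iterable[str] = REQUIRED_COMMANDS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if which(cmd) is None]


def python_is_recent(version: tuple = sys.version_info[:2],
                     minimum: tuple = (3, 11)) -> bool:
    """Check the interpreter against the recommended minimum version."""
    return tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    print("Missing commands:", missing_commands() or "none")
    print("Python 3.11 or later:", python_is_recent())
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it is the standard library's `shutil.which`.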

Configuration

Once the prerequisites are satisfied, proceed with the working environment setup.

Note for Linux users: From the project root, you must assign execution permissions to the scripts:
```
sudo chmod -R 755 .
```

1. API Keys and Services

Rename .env.example to .env and open it with a text editor. Its content should be:

GOOGLE_API_KEY=insert_key_here

GROQ_API_KEY=insert_key_here

OLLAMA_HOST=http://localhost:11434

GITHUB_TOKEN=insert_token_here

LANGSMITH_API_KEY=insert_key_here

Configure the API keys and services according to these rules:

| API Key / Service | Rule |
| --- | --- |
| `GOOGLE_API_KEY` | Required if you use GOOGLE as the LLM provider |
| `GROQ_API_KEY` | Required if you use GROQ as the LLM provider |
| `OLLAMA_HOST` | Always required. HTTP endpoint of the Ollama service. Change it only if Ollama is not running locally, specifying the IP address or hostname and the port on which the service is listening |
| `GITHUB_TOKEN` | Always required |
| `LANGSMITH_API_KEY` | Required only if you want to enable tracing through LangSmith |
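
The rules in the table can be expressed as a small check. This is an illustrative sketch only: `missing_env_vars` is a hypothetical helper, not a function of the framework, and treating the `insert_key_here` and `insert_token_here` placeholders as "not set" is an assumption.

```python
from typing import Mapping

# Provider-specific keys, following the rules table.
PROVIDER_KEYS = {"google": "GOOGLE_API_KEY", "groq": "GROQ_API_KEY"}
ALWAYS_REQUIRED = ("OLLAMA_HOST", "GITHUB_TOKEN")
# Assumption: placeholder values from .env.example count as missing.
PLACEHOLDERS = {"", "insert_key_here", "insert_token_here"}


def missing_env_vars(provider: str, env: Mapping[str, str],
                     tracing: bool = False) -> list[str]:
    """Return the required variables that are unset or still placeholders."""
    required = list(ALWAYS_REQUIRED)
    provider_key = PROVIDER_KEYS.get(provider.lower())
    if provider_key is not None:
        required.append(provider_key)
    if tracing:
        required.append("LANGSMITH_API_KEY")
    return [name for name in required
            if env.get(name, "").strip() in PLACEHOLDERS]
```

For example, `missing_env_vars("google", os.environ)` would list the variables still to be filled in once the `.env` values have been loaded into the process environment.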

ℹ️ How to obtain the API keys and tokens

1. GOOGLE_API_KEY

Go to Google AI Studio at Get API key.

Click Create API Key.

Fill in the requested information, click Create key, and copy the generated string.

2. GROQ_API_KEY

Go to GROQ at API Keys.

Click Create API Key.

Fill in the requested information, click Submit, and copy the generated string.

3. GITHUB_TOKEN

Go to GitHub at Developer Settings > Personal access tokens.

Click Generate new token (classic).

Assign a name, for example GIT-REPO-token, set an Expiration Date, and select public_repo permissions.

Click Generate token and copy the generated string.

4. LANGSMITH_API_KEY

Go to LangSmith at Settings > API Keys.

Click + API Key in the top-right corner.

Fill in the description field, select Personal Access Token as the key type, configure the workspace, and choose an Expiration Date.

Click Create API Key and copy the generated string.

2. Default Parameters

The config_info.yaml file contains the parameters required for the pipeline to operate. You can modify them based on your needs:

| Parameter | Default Value | Description |
| --- | --- | --- |
| TARGET CVE | | |
| `CVE_ID` | `CVE-2014-6271` | Identifier of the CVE you want to try to exploit. |
| PROVIDER AND LLM SETTINGS | | |
| `PROVIDER` | `google` | LLM provider. Supported options: `google`, `groq`, and `ollama`. |
| `NAME_MODEL` | `gemini-2.5-flash` | Specific model name to use. |
| `TEMPERATURE` | `0.0` | Degree of model creativity. `0.0` makes outputs as deterministic and reproducible as the provider allows. |
| `RETRIES` | `5` | Number of model retries in case of errors. |
| `RATE_LIMITER_REQUESTS_PER_MINUTE` | `1000` | Requests-per-minute limit to avoid rate-limiting errors from the Google and Groq providers. |
| `INPUT_TOKEN_PER_MILLION` | `0.30` | Cost in USD for one million input tokens for the selected model. |
| `OUTPUT_TOKEN_PER_MILLION` | `2.50` | Cost in USD for one million output tokens for the selected model. |
| `MAX_TOKENS` | `1000000` | Maximum number of input tokens the model can process. |
| `SAFETY_MARGIN` | `1000` | Safety margin to avoid exceeding the model's maximum input token limit. |
| DOCUMENT RETRIEVAL AND CHUNKING SETTINGS (RAG) | | |
| `CHUNK_SIZE` | `1024` | Size of the text chunks into which documents are split. |
| `CHUNK_OVERLAP` | `100` | Number of overlapping characters between adjacent chunks. |
| `NUMBER_CHUNK` | `10` | Maximum number of relevant text chunks that can be retrieved. |
| `EMBEDDING_MODEL` | `embeddinggemma:300m` | Model used through Ollama to generate document embeddings. You can specify a different model, provided it has already been downloaded through Ollama. |
| EXECUTION AND VALIDATION PARAMETERS | | |
| `MAX_CVE_URL_DOCUMENT` | `5` | Number of documents about the target CVE considered sufficient. Once this number is reached, documents for similar CVEs are not analyzed. |
| `MAX_VALIDATION_RETRIES` | `3` | Maximum number of attempts for validation checks. |
| `MAX_REFINEMENT_CYCLES` | `6` | Maximum number of automatic corrections after exploit generation. |
| TOOLS | | |
| `EXTERNAL_TOOLS` | `jmet-0.1.0-all.jar, ysoserial-all.jar, marshalsec-0.0.3-SNAPSHOT-all.jar, ColdFusionPwn-0.0.1-SNAPSHOT-all.jar` | Tools that the LLM can use inside the exploit code. To add a tool, insert its name in this list and copy the related file into `Executors/Python-Docker-Executor/TOOL/`. |
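
To make the interaction between `CHUNK_SIZE` and `CHUNK_OVERLAP` concrete, here is a minimal character-window sketch. It is an illustration only: the README does not show the splitter the framework actually uses, `split_with_overlap` is a hypothetical name, and measuring `CHUNK_SIZE` in characters is an assumption.

```python
def split_with_overlap(text: str, chunk_size: int = 1024,
                       chunk_overlap: int = 100) -> list[str]:
    """Split text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the default values, each window advances by 924 characters, so a larger overlap yields more chunks for the same document.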

Warning: if you change the value of NAME_MODEL, you must also update MAX_TOKENS, RATE_LIMITER_REQUESTS_PER_MINUTE, INPUT_TOKEN_PER_MILLION, and OUTPUT_TOKEN_PER_MILLION accordingly.
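
The parameters named in the warning are tied to the model because they drive the token budget and the cost estimate. The sketch below shows that arithmetic under stated assumptions: `ModelLimits` is a hypothetical class, and subtracting `SAFETY_MARGIN` from `MAX_TOKENS` is an assumed interpretation of how the margin is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    # Defaults mirror config_info.yaml for gemini-2.5-flash.
    max_tokens: int = 1_000_000
    safety_margin: int = 1_000
    input_usd_per_million: float = 0.30
    output_usd_per_million: float = 2.50

    def input_budget(self) -> int:
        """Assumed usable input size: the limit minus the safety margin."""
        return self.max_tokens - self.safety_margin

    def fits(self, prompt_tokens: int) -> bool:
        return prompt_tokens <= self.input_budget()

    def cost_usd(self, input_tokens: int, output_tokens: int) -> float:
        """Estimated cost of one call from per-million-token prices."""
        return (input_tokens * self.input_usd_per_million
                + output_tokens * self.output_usd_per_million) / 1_000_000
```

Keeping stale values after switching models would make both numbers wrong: prompts could exceed the new model's real limit, and the reported cost would use the old prices.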

3. Create the Virtual Environment (Venv)

It is good practice to isolate project dependencies. From the project root, run:

python3 -m venv venv

Then activate the virtual environment:

Windows:
```
.\venv\Scripts\activate
```
Linux:
```
source venv/bin/activate
```

4. Dependencies, SearchSploit, Default Embedding Model

Once the virtual environment is active, run the setup script. This command installs the Python libraries, configures searchsploit, and downloads the default embedding model.

python3 Setup/setup.py

Start the Framework

Before starting, read the README of the selected CVE in its subdirectory inside CVE. The Vulnerable Environment Setup and Post-Execution Framework sections explain whether any action is required before starting the framework and how to verify whether the exploit succeeded.

To start CVExploit using the pipeline shown in the figure, run:

python3 Pipeline/main.py

If you want to reproduce the ablation-study experiments instead, run:

python3 Ablation_Study/main.py

In that case, when the framework starts you will be asked to select which pipeline to use:

Pipeline 1: does not include validation checks, code refinement/correction, or the ability for the user to provide input.
Pipeline 2: includes validation checks, but does not include code refinement/correction or user input.
Pipeline 3: includes code refinement/correction, but does not include validation checks or user input.
Pipeline 4: includes both validation checks and code refinement/correction, but does not allow user input.
Pipeline 5: corresponds to the complete configuration shown in the figure. It integrates validation checks, code refinement/correction, and the ability for the user to provide input.
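
The five configurations differ only in three features, which can be summarized as a small lookup table. This is a reading aid written for this README, not code from `Ablation_Study/`; `PipelineFeatures` and `pipelines_with` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineFeatures:
    validation: bool   # validation checks
    refinement: bool   # code refinement/correction
    user_input: bool   # ability for the user to provide input


# Feature matrix transcribed from the list above.
ABLATION_PIPELINES = {
    1: PipelineFeatures(validation=False, refinement=False, user_input=False),
    2: PipelineFeatures(validation=True, refinement=False, user_input=False),
    3: PipelineFeatures(validation=False, refinement=True, user_input=False),
    4: PipelineFeatures(validation=True, refinement=True, user_input=False),
    5: PipelineFeatures(validation=True, refinement=True, user_input=True),
}


def pipelines_with(**flags: bool) -> list[int]:
    """Return the pipeline numbers whose features match all given flags."""
    return [number for number, features in ABLATION_PIPELINES.items()
            if all(getattr(features, name) == value
                   for name, value in flags.items())]
```

For instance, `pipelines_with(validation=True)` selects the configurations that include validation checks, which helps when comparing results across the ablation runs.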
