This repository anonymously releases the code and data for the paper -- AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning.

# AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning
AgentDropoutV2 is a test-time framework designed to dynamically optimize information flow in Multi-Agent Systems (MAS) without expensive retraining.
It acts as an active firewall during MAS execution:
- Intercept: It intercepts agent outputs before they are broadcast.
- Rectify: A retrieval-augmented rectifier scrutinizes the output using a Failure-Driven Indicator Pool (constructed from historical error patterns). It provides targeted feedback for iterative correction.
- Reject: If the output remains flawed after maximum retries, it is pruned to prevent error propagation.
- Fallback: A safeguard mechanism preserves structural integrity if too many agents are pruned.
The Framework of AgentDropout
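The intercept / rectify / reject flow above can be sketched as a simple control loop. This is an illustrative sketch only: `audit`, `revise`, and the pass/fail protocol are hypothetical names, not the repository's actual API.

```python
# Illustrative sketch of the rectify-or-reject control loop.
# All names (audit, revise, verdicts) are hypothetical, not the repo's API.

def audit_and_rectify(agent, output, rectifier, max_retries=2):
    """Return a (possibly rectified) output, or None to signal pruning."""
    for _ in range(max_retries):
        verdict, feedback = rectifier.audit(output)   # check against indicator pool
        if verdict == "pass":
            return output                             # broadcast as-is
        output = agent.revise(output, feedback)       # targeted, feedback-driven correction
    # one final decision after the retry budget is spent
    verdict, _ = rectifier.audit(output)
    return output if verdict == "pass" else None      # None => prune (reject)
```

A caller would broadcast the returned output when it is not `None`, and otherwise drop the message so the error does not propagate; the fallback mechanism described above would then restore connectivity if too many outputs are pruned.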
The repository is organized into two main components: train (for offline indicator pool construction) and test (for online inference).
This project can be reproduced with a single Python environment:

```bash
conda create -n myenv python=3.10.18
conda activate myenv
pip install -r requirements.txt
```

## Test (online inference)

Before running this part: many `*.sh` and `*.py` files contain configurable fields such as the model name, API URL/base URL, API key, and data/output paths (often marked as `####`/`###`). Please fill them in based on your setup first.
- Generate trigger embeddings for the indicator pool

  ```bash
  cd test/metrics_pool/two_pool
  python embed_metrics-trigger.py
  ```

  This step generates the `.jsonl` embedding cache required at test time.

- Run evaluation scripts to produce result files

  ```bash
  cd ../../
  bash run-xxx.sh
  ```

  Replace `run-xxx.sh` with your target benchmark script (for example: `run-math500.sh`, `run-aqua.sh`, `run-livecode.sh`).

- Compute final accuracy from the result file

  ```bash
  python calc_accuracy.py
  ```

  Set `FILE_PATH` in `calc_accuracy.py` to your generated result file path before running.
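If you want to sanity-check results without `calc_accuracy.py`, a minimal accuracy computation over a JSONL result file might look like the sketch below. The `prediction`/`answer` field names are assumptions for illustration; check the repository's actual output schema.

```python
import json

def compute_accuracy(result_path):
    """Fraction of records whose prediction matches the gold answer.

    Assumes one JSON object per line with 'prediction' and 'answer'
    fields; adapt the field names to the real output format.
    """
    total = correct = 0
    with open(result_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines in the JSONL file
            record = json.loads(line)
            total += 1
            if str(record["prediction"]).strip() == str(record["answer"]).strip():
                correct += 1
    return correct / total if total else 0.0
```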
## Train (offline indicator pool construction)

Before running this part: scripts in `train/` also require you to fill in model, URL/base URL, API key, and local path settings according to your environment.

- Run training-time scripts to get raw result files

  ```bash
  cd train
  bash run-xxx.sh
  ```

  Replace `run-xxx.sh` with your selected training script (for example: `run-math-train.sh`, `run-aqua-train.sh`).

- Extract, deduplicate, and build trigger embeddings

  ```bash
  python Extraction-deduplication-embedding.py
  ```

  This script completes:
  - extraction of raw indicators from training outputs,
  - deduplication into a cleaner metric pool,
  - embedding generation for `trigger_condition` (used by test-time retrieval).

- Use your custom pool in test scripts

  Update `METRIC_POOL_FILE` and `EMBEDDING_CACHE_FILE` in `test/run-*.sh` to point to your newly generated files.
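The deduplicate-and-embed stages can be sketched roughly as follows. The field name `trigger_condition` comes from the repository; the embedding function and cache layout are placeholders, since the real script wires in an API client and concrete paths.

```python
import json

def deduplicate_indicators(indicators):
    """Drop exact duplicates by trigger_condition, keeping the first occurrence.

    Normalization (strip + lowercase) is an illustrative choice; the real
    script may use a different deduplication criterion.
    """
    seen, unique = set(), []
    for ind in indicators:
        key = ind["trigger_condition"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(ind)
    return unique

def build_embedding_cache(indicators, embed_fn, cache_path):
    """Embed each trigger_condition and write a .jsonl cache, one row per indicator."""
    with open(cache_path, "w", encoding="utf-8") as f:
        for ind in indicators:
            row = {
                "trigger_condition": ind["trigger_condition"],
                "embedding": embed_fn(ind["trigger_condition"]),  # placeholder embedder
            }
            f.write(json.dumps(row) + "\n")
```

At test time, retrieval would compare a query embedding against the cached `embedding` vectors to pick the `direct_k` closest indicators.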
Core experiment scripts in both `train/` and `test/` use the following arguments:
| Argument | Description |
|---|---|
| `--in_file` / `--out_file` | Input dataset path and output result path. |
| `--log_file` | Detailed per-worker log file (optional). If omitted, a default `*_full.log` or `*_detailed.log` is auto-generated from `out_file`. |
| `--selector_url` / `--selector_model` / `--selector_key` | Endpoint, model, and API key for the selector/planner model. |
| `--reasoning_url` / `--reasoning_model` / `--reasoning_key` | Endpoint, model, and API key for participant reasoning and final answer generation. |
| `--supervisor_url` / `--supervisor_model` / `--supervisor_key` | Endpoint, model, and API key for the supervisor/auditor model. |
| `--embedding_url` / `--embedding_model` / `--embedding_key` | Endpoint, model, and API key for the embedding service used in retrieval/audit. |
| `--metric_pool_file` / `--embedding_cache_file` | Indicator pool file and precomputed embedding cache file (`.jsonl`). |
| `--max_turns` | Maximum number of MAS conversation turns. |
| `--pass_rate` | Pruning/audit threshold used by the supervisor. |
| `--retries_times` | Total retry budget for one agent output, including one final decision attempt. |
| `--direct_k` / `--random_k` | Test-time indicator-pool retrieval sizes: `direct_k` is the number of RAG-retrieved indicators; `random_k` is the number of randomly sampled indicators. |
| `--use_simple_audit` | Enable simplified audit mode. |
| `--baseline_only` | Run the baseline MAS without audit/pruning. |
| `--limit` | Run only a subset of the dataset for debugging or controlled experiments. |
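These flags follow standard `argparse` conventions. The sketch below shows a minimal parser covering a subset of them; it is illustrative only (defaults here are made up), not the repository's actual code.

```python
import argparse

def build_parser():
    """Illustrative argparse sketch mirroring a few of the flags above.

    The default values are hypothetical; consult the run-*.sh scripts
    for the values actually used in experiments.
    """
    p = argparse.ArgumentParser(description="AgentDropoutV2-style experiment runner")
    p.add_argument("--in_file", required=True, help="input dataset path")
    p.add_argument("--out_file", required=True, help="output result path")
    p.add_argument("--max_turns", type=int, default=3, help="MAS conversation turns")
    p.add_argument("--pass_rate", type=float, default=0.5, help="supervisor audit threshold")
    p.add_argument("--retries_times", type=int, default=2, help="retry budget per agent output")
    p.add_argument("--baseline_only", action="store_true", help="run baseline MAS, no audit/pruning")
    p.add_argument("--limit", type=int, default=None, help="run only the first N examples")
    return p
```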
Notes:
- `####`/`###` are placeholders in scripts. Replace them with your real model names, URLs, and keys.
- For OpenAI-compatible local endpoints (for example, vLLM), dummy keys such as `EMPTY` are usually acceptable if auth is not enforced.
This code framework is based on AgentDropout.

