This repository anonymously releases the code and data for the paper -- AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning.

# AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning
AgentDropoutV2 is a test-time framework designed to dynamically optimize information flow in Multi-Agent Systems (MAS) without expensive retraining.
It acts as an active firewall during MAS execution:
- Intercept: It intercepts agent outputs before they are broadcast.
- Rectify: A retrieval-augmented rectifier scrutinizes the output using a Failure-Driven Indicator Pool (constructed from historical error patterns). It provides targeted feedback for iterative correction.
- Reject: If the output remains flawed after maximum retries, it is pruned to prevent error propagation.
- Fallback: A safeguard mechanism preserves structural integrity if too many agents are pruned.
The Framework of AgentDropout
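The intercept / rectify / reject flow above can be sketched as a simple control loop. This is an illustrative sketch only: `audit`, `revise`, and the pass/fail protocol are hypothetical names, not the repository's actual API.

```python
# Illustrative sketch of the rectify-or-reject control loop.
# All names (audit, revise, verdicts) are hypothetical, not the repo's API.

def audit_and_rectify(agent, output, rectifier, max_retries=2):
    """Return a (possibly rectified) output, or None to signal pruning."""
    for _ in range(max_retries):
        verdict, feedback = rectifier.audit(output)   # check against indicator pool
        if verdict == "pass":
            return output                             # broadcast as-is
        output = agent.revise(output, feedback)       # targeted, feedback-driven correction
    # one final decision after the retry budget is spent
    verdict, _ = rectifier.audit(output)
    return output if verdict == "pass" else None      # None => prune (reject)
```

A caller would broadcast the returned output when it is not `None`, and otherwise drop the message so the error does not propagate; the fallback mechanism described above would then restore connectivity if too many outputs are pruned.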
The repository is organized into two main components: train (for offline indicator pool construction) and test (for online inference).
This project can be reproduced with a single Python environment:

```bash
conda create -n myenv python=3.10.18
conda activate myenv
pip install -r requirements.txt
```

## Test (online inference)

Before running this part: many `*.sh` and `*.py` files contain configurable fields such as the model name, API URL/base URL, API key, and data/output paths (often marked as `####`/`###`). Please fill them in based on your setup first.
- Generate trigger embeddings for the indicator pool

  ```bash
  cd test/metrics_pool/two_pool
  python embed_metrics-trigger.py
  ```

  This step generates the `.jsonl` embedding cache required at test time.

- Run evaluation scripts to produce result files

  ```bash
  cd ../../
  bash run-xxx.sh
  ```

  Replace `run-xxx.sh` with your target benchmark script (for example: `run-math500.sh`, `run-aqua.sh`, `run-livecode.sh`).

- Compute final accuracy from the result file

  ```bash
  python calc_accuracy.py
  ```

  Set `FILE_PATH` in `calc_accuracy.py` to your generated result file path before running.
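If you want to sanity-check results without `calc_accuracy.py`, a minimal accuracy computation over a JSONL result file might look like the sketch below. The `prediction`/`answer` field names are assumptions for illustration; check the repository's actual output schema.

```python
import json

def compute_accuracy(result_path):
    """Fraction of records whose prediction matches the gold answer.

    Assumes one JSON object per line with 'prediction' and 'answer'
    fields; adapt the field names to the real output format.
    """
    total = correct = 0
    with open(result_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines in the JSONL file
            record = json.loads(line)
            total += 1
            if str(record["prediction"]).strip() == str(record["answer"]).strip():
                correct += 1
    return correct / total if total else 0.0
```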
## Train (offline indicator pool construction)

Before running this part: scripts in `train/` also require you to fill in model, URL/base URL, API key, and local path settings according to your environment.

- Run training-time scripts to get raw result files

  ```bash
  cd train
  bash run-xxx.sh
  ```

  Replace `run-xxx.sh` with your selected training script (for example: `run-math-train.sh`, `run-aqua-train.sh`).

- Extract, deduplicate, and build trigger embeddings

  ```bash
  python Extraction-deduplication-embedding.py
  ```

  This script completes:
  - extraction of raw indicators from training outputs,
  - deduplication into a cleaner metric pool,
  - embedding generation for `trigger_condition` (used by test-time retrieval).

- Use your custom pool in test scripts

  Update `METRIC_POOL_FILE` and `EMBEDDING_CACHE_FILE` in `test/run-*.sh` to point to your newly generated files.
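The deduplicate-and-embed stages can be sketched roughly as follows. The field name `trigger_condition` comes from the repository; the embedding function and cache layout are placeholders, since the real script wires in an API client and concrete paths.

```python
import json

def deduplicate_indicators(indicators):
    """Drop exact duplicates by trigger_condition, keeping the first occurrence.

    Normalization (strip + lowercase) is an illustrative choice; the real
    script may use a different deduplication criterion.
    """
    seen, unique = set(), []
    for ind in indicators:
        key = ind["trigger_condition"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(ind)
    return unique

def build_embedding_cache(indicators, embed_fn, cache_path):
    """Embed each trigger_condition and write a .jsonl cache, one row per indicator."""
    with open(cache_path, "w", encoding="utf-8") as f:
        for ind in indicators:
            row = {
                "trigger_condition": ind["trigger_condition"],
                "embedding": embed_fn(ind["trigger_condition"]),  # placeholder embedder
            }
            f.write(json.dumps(row) + "\n")
```

At test time, retrieval would compare a query embedding against the cached `embedding` vectors to pick the `direct_k` closest indicators.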
Core experiment scripts in both `train/` and `test/` use the following arguments:
| Argument | Description |
|---|---|
| `--in_file` / `--out_file` | Input dataset path and output result path. |
| `--log_file` | Detailed per-worker log file (optional). If omitted, a default `*_full.log` or `*_detailed.log` is auto-generated from `out_file`. |
| `--selector_url` / `--selector_model` / `--selector_key` | Endpoint, model, and API key for the selector/planner model. |
| `--reasoning_url` / `--reasoning_model` / `--reasoning_key` | Endpoint, model, and API key for participant reasoning and final answer generation. |
| `--supervisor_url` / `--supervisor_model` / `--supervisor_key` | Endpoint, model, and API key for the supervisor/auditor model. |
| `--embedding_url` / `--embedding_model` / `--embedding_key` | Endpoint, model, and API key for the embedding service used in retrieval/audit. |
| `--metric_pool_file` / `--embedding_cache_file` | Indicator pool file and precomputed embedding cache file (`.jsonl`). |
| `--max_turns` | Maximum number of MAS conversation turns. |
| `--pass_rate` | Pruning/audit threshold used by the supervisor. |
| `--retries_times` | Total retry budget for one agent output, including one final decision attempt. |
| `--direct_k` / `--random_k` | Test-time indicator-pool retrieval sizes: `direct_k` is the number of RAG-retrieved indicators; `random_k` is the number of randomly sampled indicators. |
| `--use_simple_audit` | Enable simplified audit mode. |
| `--baseline_only` | Run the baseline MAS without audit/pruning. |
| `--limit` | Run only a subset of the dataset for debugging or controlled experiments. |
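These flags follow standard `argparse` conventions. The sketch below shows a minimal parser covering a subset of them; it is illustrative only (defaults here are made up), not the repository's actual code.

```python
import argparse

def build_parser():
    """Illustrative argparse sketch mirroring a few of the flags above.

    The default values are hypothetical; consult the run-*.sh scripts
    for the values actually used in experiments.
    """
    p = argparse.ArgumentParser(description="AgentDropoutV2-style experiment runner")
    p.add_argument("--in_file", required=True, help="input dataset path")
    p.add_argument("--out_file", required=True, help="output result path")
    p.add_argument("--max_turns", type=int, default=3, help="MAS conversation turns")
    p.add_argument("--pass_rate", type=float, default=0.5, help="supervisor audit threshold")
    p.add_argument("--retries_times", type=int, default=2, help="retry budget per agent output")
    p.add_argument("--baseline_only", action="store_true", help="run baseline MAS, no audit/pruning")
    p.add_argument("--limit", type=int, default=None, help="run only the first N examples")
    return p
```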
Notes:
- `####`/`###` are placeholders in scripts. Replace them with your real model names, URLs, and keys.
- For OpenAI-compatible local endpoints (for example, vLLM), dummy keys such as `EMPTY` are usually acceptable if auth is not enforced.
This code framework is based on AgentDropout.

