
AgentDropoutV2

This repository anonymously releases the code and data for the paper "AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning".

🛡️ About AgentDropoutV2

AgentDropoutV2 is a test-time framework designed to dynamically optimize information flow in Multi-Agent Systems (MAS) without expensive retraining.

It acts as an active firewall during MAS execution:

  1. Intercept: It intercepts agent outputs before they are broadcast.
  2. Rectify: A retrieval-augmented rectifier scrutinizes the output using a Failure-Driven Indicator Pool (constructed from historical error patterns). It provides targeted feedback for iterative correction.
  3. Reject: If the output remains flawed after the maximum number of retries, it is pruned to prevent error propagation.
  4. Fallback: A safeguard mechanism preserves structural integrity if too many agents are pruned.
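
As a rough illustration (not the repository's actual code), the rectify-or-reject loop might look like the sketch below. retrieve_indicators, is_flawed, and rectify are hypothetical stand-ins, and their stub bodies exist only to make the sketch runnable:

    def retrieve_indicators(output, pool):
        # Stand-in for retrieval-augmented lookup over the Failure-Driven Indicator Pool.
        return [ind for ind in pool if ind["trigger_condition"] in output]

    def is_flawed(output, indicators):
        # Stand-in audit: treat any matched indicator as a flaw.
        return bool(indicators)

    def rectify(output, indicators):
        # Stand-in correction; the real rectifier feeds targeted feedback to the agent.
        return output.replace(indicators[0]["trigger_condition"], "[corrected]")

    def audit(output, pool, retries_times=3):
        """Return a corrected output to broadcast, or None to prune it."""
        for _ in range(retries_times):
            indicators = retrieve_indicators(output, pool)
            if not is_flawed(output, indicators):
                return output  # passes the audit: safe to broadcast
            output = rectify(output, indicators)
        return None  # still flawed after the retry budget: prune

    pool = [{"trigger_condition": "off-by-one"}]
    print(audit("loop bound has an off-by-one error", pool))

The fallback step sits outside this loop: if pruning removes too many agents, the safeguard preserves enough of them to keep the MAS topology intact.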

Figure: the framework of AgentDropoutV2.

📜 File Structure

The repository is organized into two main components: train (for offline indicator pool construction) and test (for online inference).
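
A simplified layout, listing only the files referenced in this README (the actual tree contains more):

    AgentDropoutV2/
    ├── requirements.txt
    ├── train/
    │   ├── run-*.sh                              # training-time scripts (e.g. run-math-train.sh)
    │   └── Extraction-deduplication-embedding.py
    └── test/
        ├── run-*.sh                              # benchmark scripts (e.g. run-math500.sh)
        ├── calc_accuracy.py
        └── metrics_pool/
            └── two_pool/
                └── embed_metrics-trigger.py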

🛠️ Requirements

This project can be reproduced with a single Python environment:

conda create -n myenv python=3.10.18
conda activate myenv
pip install -r requirements.txt

🚀 Quick Start

1. Test-Time Inference (Main Workflow)

Before running this part: many *.sh and *.py files contain configurable fields such as the model name, API URL/base URL, API key, and data/output paths (often marked with #### or ###). Fill these in for your setup before running.

  1. Generate trigger embeddings for the indicator pool

    cd test/metrics_pool/two_pool
    python embed_metrics-trigger.py

    This step generates the .jsonl embedding cache required at test time.

  2. Run evaluation scripts to produce result files

    cd ../../
    bash run-xxx.sh

    Replace run-xxx.sh with your target benchmark script (for example: run-math500.sh, run-aqua.sh, run-livecode.sh).

  3. Compute final accuracy from result file

    python calc_accuracy.py

    Set FILE_PATH in calc_accuracy.py to your generated result file path before running.
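
For orientation, the accuracy computation is conceptually as simple as the sketch below; the result-file schema (including the is_correct field name) and the path are assumptions, not the repository's actual format:

    import json

    FILE_PATH = "test/results/math500.jsonl"  # hypothetical path: point at your result file

    total = correct = 0
    with open(FILE_PATH, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            total += 1
            correct += bool(record.get("is_correct"))  # assumed field name
    print(f"accuracy: {correct}/{total} = {correct / total:.4f}")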

2. Training & Build Your Own Indicator Pool (Optional)

Before running this part: scripts in train/ also require you to fill in the model, URL/base URL, API key, and local path settings for your environment.

  1. Run training-time scripts to get raw result files

    cd train
    bash run-xxx.sh

    Replace run-xxx.sh with your selected training script (for example: run-math-train.sh, run-aqua-train.sh).

  2. Extract, deduplicate, and build trigger embeddings

    python Extraction-deduplication-embedding.py

    This script performs three steps:

    • extraction of raw indicators from training outputs,
    • deduplication into a cleaner metric pool,
    • embedding generation for trigger_condition (used by test-time retrieval).
  3. Use your custom pool in test scripts

    Update METRIC_POOL_FILE and EMBEDDING_CACHE_FILE in test/run-*.sh to point to your newly generated files.
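
To make the pool format concrete, here is a rough sketch of the deduplicate-and-embed stage. It assumes pool entries are JSON lines with a trigger_condition field and an OpenAI-compatible embedding endpoint; the endpoint address, model name, and file paths are placeholders, and the actual entry schema may differ:

    import json
    from openai import OpenAI  # any OpenAI-compatible endpoint works (e.g. vLLM)

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder endpoint

    def build_embedding_cache(pool_file, cache_file, model="your-embedding-model"):
        # Deduplicate indicators by their trigger_condition text.
        seen, entries = set(), []
        with open(pool_file, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                if entry["trigger_condition"] not in seen:
                    seen.add(entry["trigger_condition"])
                    entries.append(entry)
        # Embed each trigger_condition and write the .jsonl cache used at test time.
        with open(cache_file, "w", encoding="utf-8") as out:
            for entry in entries:
                emb = client.embeddings.create(model=model, input=entry["trigger_condition"])
                out.write(json.dumps({"trigger_condition": entry["trigger_condition"],
                                      "embedding": emb.data[0].embedding}) + "\n")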

🧾 Common Argument Reference

Core experiment scripts in both train and test use the following arguments:

--in_file / --out_file: Input dataset path and output result path.
--log_file: Detailed per-worker log file (optional). If omitted, a default *_full.log or *_detailed.log is auto-generated from out_file.
--selector_url / --selector_model / --selector_key: Endpoint, model, and API key for the selector/planner model.
--reasoning_url / --reasoning_model / --reasoning_key: Endpoint, model, and API key for participant reasoning and final-answer generation.
--supervisor_url / --supervisor_model / --supervisor_key: Endpoint, model, and API key for the supervisor/auditor model.
--embedding_url / --embedding_model / --embedding_key: Endpoint, model, and API key for the embedding service used in retrieval/audit.
--metric_pool_file / --embedding_cache_file: Indicator pool file and precomputed embedding cache file (.jsonl).
--max_turns: Maximum number of MAS conversation turns.
--pass_rate: Pruning/audit threshold used by the supervisor.
--retries_times: Total retry budget for one agent output, including one final decision attempt.
--direct_k / --random_k: Test-time indicator-pool retrieval sizes: direct_k is the number of RAG-retrieved indicators, while random_k is the number of randomly sampled indicators.
--use_simple_audit: Enable simplified audit mode.
--baseline_only: Run the baseline MAS without audit/pruning.
--limit: Run only a subset of the dataset for debugging or controlled experiments.
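
For orientation, the shared CLI surface might look roughly like the following argparse sketch, reconstructed from the list above; the types, defaults, and help strings are guesses, not the repository's actual definitions:

    import argparse

    def build_parser():
        # Illustrative reconstruction of the shared CLI surface.
        p = argparse.ArgumentParser(description="AgentDropoutV2 experiment (sketch)")
        p.add_argument("--in_file", required=True, help="input dataset path")
        p.add_argument("--out_file", required=True, help="output result path")
        p.add_argument("--log_file", default=None, help="optional detailed log file")
        for role in ("selector", "reasoning", "supervisor", "embedding"):
            # Endpoint, model, and API key triplet for each model role.
            p.add_argument(f"--{role}_url")
            p.add_argument(f"--{role}_model")
            p.add_argument(f"--{role}_key")
        p.add_argument("--metric_pool_file", help="indicator pool file")
        p.add_argument("--embedding_cache_file", help="precomputed .jsonl embedding cache")
        p.add_argument("--max_turns", type=int, help="maximum MAS conversation turns")
        p.add_argument("--pass_rate", type=float, help="pruning/audit threshold")
        p.add_argument("--retries_times", type=int, help="total retry budget per agent output")
        p.add_argument("--direct_k", type=int, help="number of RAG-retrieved indicators")
        p.add_argument("--random_k", type=int, help="number of randomly sampled indicators")
        p.add_argument("--use_simple_audit", action="store_true")
        p.add_argument("--baseline_only", action="store_true")
        p.add_argument("--limit", type=int, help="run only a subset of the dataset")
        return p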

Notes:

  • #### / ### are placeholders in scripts. Replace them with your real model names, URLs, and keys.
  • For OpenAI-compatible local endpoints (for example, vLLM), dummy keys such as EMPTY are usually acceptable if auth is not enforced.
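
For example, a local vLLM server could be reached like this via the standard OpenAI Python client; the base URL and model name below are placeholders:

    from openai import OpenAI

    # Hypothetical local vLLM endpoint without auth: a dummy key such as EMPTY is fine.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="your-local-model",  # placeholder: use the model name your server registers
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.choices[0].message.content)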

Acknowledgments

This code framework is based on AgentDropout.
