Contextual Counterfactual Credit Assignment
Project Page | arXiv | PDF | Getting Started | Code Map | Implementation Checklist | Release Checklist
Reference implementation for the paper Contextual Counterfactual Credit Assignment for Multi-Agent Reinforcement Learning in LLM Collaboration.
Paper status: now available on arXiv as 2603.06859. The companion project page is available at eit-east-lab.github.io/C3, and the official PDF is available here.
Terminal-only feedback in multi-agent LLM collaboration diffuses credit across an entire trajectory. C3 freezes transcript-derived context and estimates local causal credit with fixed-context replay plus a leave-one-out baseline, outperforming MAPPO and MAGRPO under matched budgets while improving credit fidelity, reducing variance, and strengthening inter-agent influence.
Mechanism | Results | Quickstart | Workflows | Release Gate
Figure 1: C3 mechanism overview. Protocol-level replay, fixed-context alternatives, and leave-one-out credit assignment.
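The fixed-context replay and leave-one-out baseline shown in Figure 1 can be sketched in a few lines. This is a toy illustration only: `reward_fn`, `leave_one_out_credit`, and the action strings are invented stand-ins, not the repository's API.

```python
from statistics import mean

def leave_one_out_credit(actions, alternatives, reward_fn):
    """Toy fixed-context leave-one-out credit.

    For each agent i, replay the joint outcome with only agent i's action
    swapped for sampled alternatives while every other agent's contribution
    (the frozen transcript context) stays fixed. The mean replay reward is
    the leave-one-out baseline; credit_i = R(actual) - baseline_i.
    """
    base = reward_fn(actions)
    credits = []
    for i, alts in enumerate(alternatives):
        replays = []
        for alt in alts:
            counterfactual = list(actions)
            counterfactual[i] = alt  # only agent i's turn changes
            replays.append(reward_fn(counterfactual))
        credits.append(base - mean(replays))
    return credits

# Toy terminal reward: 1.0 only if both "agents" answer correctly.
reward = lambda a: float(a[0] == "ok" and a[1] == "ok")
print(leave_one_out_credit(["ok", "ok"], [["bad", "ok"], ["bad", "bad"]], reward))
# -> [0.5, 1.0]: agent 1's alternatives always fail, so it gets more credit.
```

The point of the baseline is that an agent whose alternatives would have succeeded anyway receives little credit, while an agent whose action was pivotal receives most of it.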
Main paper results: Sample-efficiency and performance trajectories against baseline methods (e.g., MAPPO, MAGRPO). C3 reaches superior performance given matched training token budgets.
C3 is a paper-aligned research codebase designed to support:
- Protocol Reproduction: Executing the multi-agent task protocols described in the paper.
- Experiment Execution: Running full training sweeps, evaluation-only probes, and performance analyses.
- Implementation Audit: Providing a transparent mapping from theoretical mechanisms to executable code.
- Extension and Testing: Modifying or testing the framework without relying on private paths or bundled artifacts.
To maintain a clean public-release surface, this repository does not distribute:
- Prepared datasets
- Trained model checkpoints
- Cached model weights
- Generated run-time artifacts (e.g., run logs, checkpoints, reports, and experiment outputs)
Corresponding local directories (data/, artifacts/, ckpt/, runs/, wandb/, models/) are treated as local working outputs and remain empty or absent in the public release. For details, see the Release Policy.
- c3/: Core C3 implementation, including multi-agent protocol handling, environments, credit assignment logic, and analysis tools.
- openrlhf/: Vendored upstream RLHF training stack, augmented with C3-specific integration points.
- configs/: Configurations for tasks, roles, analyses, execution registries, and data manifests.
- scripts/: Entrypoints for data preparation, experiment reproduction, model utilities, and release gating.
- docs/: Documentation covering release policies, implementation audits, upstream provenance, and data-source contracts.
- project-page/: Static companion site for the paper, deployed via GitHub Pages.
- Project Page: C3 Paper Page
- Paper: arXiv Abstract
- PDF: arXiv PDF
- New User: Getting Started Guide
- Code Layout: Code Map
- Paper-to-Code Mapping: Implementation Audit
- Development Invariants: Implementation Checklist
- Data Provenance: Data Sources
- Release Verification: Release Checklist
- Upstream Lineage: Upstream Provenance
- OS: Linux (x86_64)
- Python: 3.11
- GPU Stack: NVIDIA GPUs with CUDA 12.8-compatible runtime
- PyTorch:
torch/torchaudio/torchvision == 2.9.0+cu128
If your execution environment differs substantially, you may need to manually adapt the pinned dependencies.
Create a standard Python environment, install the pinned stack, and validate the installation:
python -m pip install -U pip
python -m pip install -r requirements.txt --no-build-isolation
python -m pip check

Installation Notes:
- requirements.txt acts as a strict full-lock snapshot to guarantee reproducibility.
- --no-build-isolation is strongly recommended due to build-sensitive packages (e.g., flash_attn).
- Experiment reproduction scripts automatically export the repository root to PYTHONPATH via scripts/reproduce/common_env.sh.
- For lightweight local development or CI checks, an editable install is supported:
python -m pip install -e .[test]
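After installing, you can sanity-check the stated pins before launching long runs. The `PINNED` dict below is illustrative and copies the versions stated above; the authoritative lock remains requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version

# Illustrative pins taken from this README; requirements.txt is authoritative.
PINNED = {
    "torch": "2.9.0+cu128",
    "torchaudio": "2.9.0+cu128",
    "torchvision": "2.9.0+cu128",
}

def check_pins(pins):
    """Return {package: installed_version_or_None} for every pin that drifts."""
    drift = {}
    for name, want in pins.items():
        try:
            got = version(name)
        except PackageNotFoundError:
            got = None  # package missing entirely
        if got != want:
            drift[name] = got
    return drift

if __name__ == "__main__":
    print(check_pins(PINNED))  # empty dict means the stack matches the pins
```

An empty result means the environment matches; anything else lists what is missing or mismatched.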
# 1. Install dependencies
python -m pip install -r requirements.txt --no-build-isolation

# 2. Prepare local datasets (datasets are not shipped with the repo)
bash scripts/data/prepare_all.sh --out_dir data

# 3. Run a fast E2E wiring smoke test
bash scripts/reproduce/smoke.sh --task math --limit 1 --print_example 0

Prepared datasets are generated locally from strictly pinned upstream sources. They are not bundled with the codebase.

bash scripts/data/prepare_all.sh --out_dir data

The authoritative source of truth for all data derivations is configs/data_manifest.yaml. See Data Sources and Third-Party Notices for provenance details.
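To illustrate what "strictly pinned" means in practice, here is a hypothetical integrity check. The entry's field names (`name`, `sha256`) are invented for illustration; the actual schema is whatever configs/data_manifest.yaml defines.

```python
import hashlib

# Hypothetical manifest entry; the real schema lives in configs/data_manifest.yaml.
entry = {
    "name": "toy-dataset",
    "sha256": hashlib.sha256(b"example bytes").hexdigest(),
}

def verify(raw_bytes, manifest_entry):
    """Refuse to use downloaded data whose hash drifts from the pinned digest."""
    return hashlib.sha256(raw_bytes).hexdigest() == manifest_entry["sha256"]

print(verify(b"example bytes", entry))   # True: bytes match the pin
print(verify(b"tampered bytes", entry))  # False: upstream drifted or download corrupted
```

Pinning by content hash is what makes locally prepared datasets reproducible even though they are never shipped with the repository.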
While Transformers/vLLM will automatically download weights on first use, we provide a dedicated utility to pre-download and cache all required Hugging Face base models. This is highly recommended for reproducibility, for offline clusters, and to avoid concurrent download races:
# Log in to Hugging Face (required for gated models like Qwen)
huggingface-cli login
# Pre-download all base models referenced in the results registry
bash scripts/models/download_models.sh \
--registry configs/main_results_registry.yaml \
--out_dir models

Fast end-to-end wiring check to verify environment and protocol integrity:

bash scripts/reproduce/smoke.sh

Run the evaluation matrix exclusively for the SFT baseline:
bash scripts/reproduce/paper_main_results.sh sweep \
--registry configs/main_results_registry.yaml \
--only_methods SFT

Execute the complete set of model training runs:
export PRETRAIN='Qwen/Qwen2.5-3B-Instruct'
bash scripts/reproduce/paper_train.sh

Run the full paper evaluation matrix across all methods:
bash scripts/reproduce/paper_main_results.sh sweep \
--registry configs/main_results_registry.yaml

Generate analysis figures directly from local run directories:
bash scripts/reproduce/paper_analysis_figs.sh fig2 \
--suite math \
--run_c3 ckpt/_runs/<C3_run_dir> \
--run_mappo ckpt/_runs/<MAPPO_run_dir> \
--run_magrpo ckpt/_runs/<MAGRPO_run_dir> \
--run_sft ckpt/_runs/_sft_main_results/<SFT_dir> \
--mappo_critic_ckpt <PATH_TO_MAPPO_CRITIC>

Note on C3 algorithm location: the primary credit assignment mechanism discussed in the paper is not located in c3/algorithms/c3.py (which serves as a fallback compatibility calculator). The paper-facing C3 implementation is deeply integrated into the experience generation phase. Please consult the Implementation Audit for a comprehensive mapping between the paper's theoretical framework and the codebase, including the specific entry points.
Before making any public release, execute the pre-release checks:
bash scripts/audit/pre_release.sh

To run the complete local release gate (including syntax checks and unit tests):

bash scripts/audit/release_gate.sh

For a single-command preflight reproduction check:

bash scripts/reproduce/preflight_repro.sh --task math

These gating scripts rigorously verify:
- Absence of hard-coded private paths
- Absence of obvious leaked secrets
- Absence of bundled datasets or generated release-surface artifacts
- Bash scripting syntax sanity
- Python compilation and test suite sanity
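For intuition, the private-path and secret checks in such a gate boil down to pattern scans like the sketch below. The patterns are illustrative only; the actual checks live in scripts/audit/.

```python
import re

# Illustrative patterns only; not the repository's actual release-gate rules.
PATTERNS = {
    "private path": re.compile(r"/home/[A-Za-z0-9_.-]+/"),   # hard-coded user homes
    "hf token": re.compile(r"hf_[A-Za-z0-9]{20,}"),          # Hugging Face tokens
    "aws key": re.compile(r"AKIA[0-9A-Z]{16}"),              # AWS access key IDs
}

def scan_text(text):
    """Return the sorted names of every pattern that matches the given text."""
    return sorted(name for name, pat in PATTERNS.items() if pat.search(text))

print(scan_text("token = 'hf_" + "a" * 24 + "'"))  # flags the leaked token
print(scan_text("out_dir = 'runs/exp1'"))          # clean line: no findings
```

A real gate would walk the tracked files, apply an allowlist for false positives, and fail the release on any finding.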
This repository includes standard open-source policy files:
- CONTRIBUTING.md
- CODE_OF_CONDUCT.md
- SECURITY.md
- .github/ISSUE_TEMPLATE
- .github/pull_request_template.md
- Code license: Apache-2.0
- Citation metadata: CITATION.cff
- Third-party notices: THIRD_PARTY_NOTICES.md
- Upstream provenance: docs/UPSTREAM.md
We deeply appreciate the open-source community for their foundational work. In particular, we would like to acknowledge:
- OpenRLHF: Our RL infrastructure is built upon the OpenRLHF framework. We are grateful to the OpenRLHF team for providing an easy-to-use, scalable, and high-performance agentic RL foundation based on Ray and vLLM.
If you find this repository or paper useful for your research, please cite:
@misc{chen2026contextualcounterfactualcreditassignment,
title={Contextual Counterfactual Credit Assignment for Multi-Agent Reinforcement Learning in LLM Collaboration},
author={Yanjun Chen and Yirong Sun and Hanlin Wang and Xinming Zhang and Xiaoyu Shen and Wenjie Li and Wei Zhang},
year={2026},
eprint={2603.06859},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2603.06859}
}

