
Contextual Counterfactual Credit Assignment

arXiv: 2603.06859 · License: Apache 2.0 · Python: 3.11

Project Page | arXiv | PDF | Getting Started | Code Map | Implementation Checklist | Release Checklist

Reference implementation for the paper Contextual Counterfactual Credit Assignment for Multi-Agent Reinforcement Learning in LLM Collaboration.

Paper status: available on arXiv as 2603.06859. The companion project page is hosted at eit-east-lab.github.io/C3, and the official PDF is linked above.

TL;DR

Terminal-only feedback in multi-agent LLM collaboration diffuses credit across an entire trajectory. C3 freezes transcript-derived context and estimates local causal credit via fixed-context replay with a leave-one-out baseline, outperforming MAPPO and MAGRPO under matched budgets while improving credit fidelity, reducing variance, and strengthening inter-agent influence.

Mechanism | Results | Quickstart | Workflows | Release Gate

Core Mechanism


Figure 1: C3 mechanism overview. Protocol-level replay, fixed-context alternatives, and leave-one-out credit assignment.
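The replay-plus-baseline idea in Figure 1 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the repository's implementation: `reward_fn`, the frozen context, and the sampled alternative turns are all hypothetical stand-ins.

```python
from statistics import mean

def c3_credit(reward_fn, frozen_context, actual_turn, alternative_turns):
    """Toy leave-one-out credit for a single agent turn.

    The transcript-derived context is held fixed, the agent's actual turn
    is compared against K sampled alternatives replayed under that same
    context, and the credit is the actual reward minus the mean reward of
    the alternatives (the leave-one-out baseline).
    """
    r_actual = reward_fn(frozen_context, actual_turn)
    baseline = mean(reward_fn(frozen_context, alt) for alt in alternative_turns)
    return r_actual - baseline

# Hypothetical usage with a toy overlap reward over (context, turn) pairs.
reward = lambda ctx, turn: len(set(ctx) & set(turn))
credit = c3_credit(reward, {"spec", "tests"}, {"tests", "fix"}, [{"spec"}, {"noise"}])
```

Because the context is frozen, the baseline isolates the local effect of this turn rather than averaging over downstream drift in the rest of the trajectory.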

Key Results

Figures: Pareto return versus token budget; learning dynamics comparison across methods.

Main paper results: sample-efficiency and performance trajectories against baseline methods (e.g., MAPPO, MAGRPO). C3 reaches higher performance under matched training token budgets.

Repository Purpose

C3 is a paper-aligned research codebase designed to support:

  • Protocol Reproduction: Executing the multi-agent task protocols described in the paper.
  • Experiment Execution: Running full training sweeps, evaluation-only probes, and performance analyses.
  • Implementation Audit: Providing a transparent mapping from theoretical mechanisms to executable code.
  • Extension and Testing: Modifying or testing the framework without relying on private paths or bundled artifacts.

Release Scope

To maintain a clean public-release surface, this repository does not distribute:

  • Prepared datasets
  • Trained model checkpoints
  • Cached model weights
  • Generated run-time artifacts (e.g., run logs, checkpoints, reports, and experiment outputs)

Corresponding local directories (data/, artifacts/, ckpt/, runs/, wandb/, models/) are treated as local working outputs and remain empty or absent in the public release. For details, see the Release Policy.
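To make "empty or absent" concrete, a check along these lines could be run locally before publishing. The directory names come from the list above; the helper itself is a hypothetical sketch, not one of the repository's gating scripts.

```python
from pathlib import Path

# Local working directories listed in the Release Policy.
RELEASE_SURFACE_DIRS = ["data", "artifacts", "ckpt", "runs", "wandb", "models"]

def release_surface_clean(repo_root):
    """Return True if every local working directory is empty or absent."""
    root = Path(repo_root)
    for name in RELEASE_SURFACE_DIRS:
        d = root / name
        # A directory may be missing entirely, or present but empty.
        if d.exists() and any(d.iterdir()):
            return False
    return True
```

A check like this is cheap enough to run on every commit, which is essentially what the release gate automates.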

Repository Structure

  • c3/: Core C3 implementation, including multi-agent protocol handling, environments, credit assignment logic, and analysis tools.
  • openrlhf/: Vendored upstream RLHF training stack, augmented with C3-specific integration points.
  • configs/: Configurations for tasks, roles, analyses, execution registries, and data manifests.
  • scripts/: Entrypoints for data preparation, experiment reproduction, model utilities, and release gating.
  • docs/: Documentation covering release policies, implementation audits, upstream provenance, and data-source contracts.
  • project-page/: Static companion site for the paper, deployed via GitHub Pages.


System Requirements

  • OS: Linux (x86_64)
  • Python: 3.11
  • GPU Stack: NVIDIA GPUs with CUDA 12.8-compatible runtime
  • PyTorch: torch/torchaudio/torchvision == 2.9.0+cu128

If your execution environment differs substantially, you may need to manually adapt the pinned dependencies.
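A quick interpreter sanity check before installing the pinned stack can save a failed build. This helper is hypothetical; adapt the pin to your environment.

```python
import sys

def python_matches_pin(version_info=sys.version_info, pin=(3, 11)):
    """True if the interpreter's major.minor matches the pinned Python."""
    return tuple(version_info[:2]) == pin
```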

Installation

Create a standard Python environment, install the pinned stack, and validate the installation:

python -m pip install -U pip
python -m pip install -r requirements.txt --no-build-isolation
python -m pip check

Installation Notes:

  • requirements.txt acts as a strict full-lock snapshot to guarantee reproducibility.
  • --no-build-isolation is strongly recommended due to build-sensitive packages (e.g., flash_attn).
  • Experiment reproduction scripts automatically export the repository root to PYTHONPATH via scripts/reproduce/common_env.sh.
  • For lightweight local development or CI checks, an editable install is supported: python -m pip install -e .[test]

30-Second Quickstart

# 1. Install dependencies
python -m pip install -r requirements.txt --no-build-isolation

# 2. Prepare local datasets (datasets are not shipped with the repo)
bash scripts/data/prepare_all.sh --out_dir data

# 3. Run a fast E2E wiring smoke test
bash scripts/reproduce/smoke.sh --task math --limit 1 --print_example 0

Data Preparation

Prepared datasets are generated locally from strictly pinned upstream sources. They are not bundled with the codebase.

bash scripts/data/prepare_all.sh --out_dir data

The authoritative source of truth for all data derivations is configs/data_manifest.yaml. See Data Sources and Third-Party Notices for provenance details.
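To illustrate what "strictly pinned" typically means in practice: a manifest-driven pipeline records a content digest per upstream source and refuses to proceed on a mismatch. The sketch below is illustrative only; the field names shown are hypothetical and not the actual schema of configs/data_manifest.yaml.

```python
import hashlib

def verify_pinned_source(raw_bytes, manifest_entry):
    """Check downloaded bytes against the digest pinned in a manifest entry."""
    digest = hashlib.sha256(raw_bytes).hexdigest()
    if digest != manifest_entry["sha256"]:
        raise ValueError(
            f"pin mismatch for {manifest_entry['name']}: "
            f"expected {manifest_entry['sha256']}, got {digest}"
        )
    return True

# Hypothetical manifest entry for one upstream file.
entry = {
    "name": "math-train.jsonl",
    "sha256": hashlib.sha256(b"example payload").hexdigest(),
}
```

Failing loudly on a digest mismatch is what keeps locally prepared datasets byte-identical across machines.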

Model Preparation

While Transformers/vLLM will automatically download weights on first use, we provide a dedicated utility to pre-download and cache all required HuggingFace base models. This is highly recommended for reproducibility, offline clusters, or avoiding concurrent download races:

# Log in to Hugging Face (required for gated models like Qwen)
huggingface-cli login

# Pre-download all base models referenced in the results registry
bash scripts/models/download_models.sh \
  --registry configs/main_results_registry.yaml \
  --out_dir models

Main Workflows

Smoke Test

Fast end-to-end wiring check to verify environment and protocol integrity:

bash scripts/reproduce/smoke.sh

SFT-only Main Results Sweep

Run the evaluation matrix exclusively for the SFT baseline:

bash scripts/reproduce/paper_main_results.sh sweep \
  --registry configs/main_results_registry.yaml \
  --only_methods SFT

Full Paper Training Matrix

Execute the complete set of model training runs:

export PRETRAIN='Qwen/Qwen2.5-3B-Instruct'
bash scripts/reproduce/paper_train.sh

Full Main Results Sweep

Run the full paper evaluation matrix across all methods:

bash scripts/reproduce/paper_main_results.sh sweep \
  --registry configs/main_results_registry.yaml

Paper Analyses

Generate analysis figures directly from local run directories:

bash scripts/reproduce/paper_analysis_figs.sh fig2 \
  --suite math \
  --run_c3 ckpt/_runs/<C3_run_dir> \
  --run_mappo ckpt/_runs/<MAPPO_run_dir> \
  --run_magrpo ckpt/_runs/<MAGRPO_run_dir> \
  --run_sft ckpt/_runs/_sft_main_results/<SFT_dir> \
  --mappo_critic_ckpt <PATH_TO_MAPPO_CRITIC>

Implementation Note

Note on C3 algorithm location: the primary credit-assignment mechanism discussed in the paper is not located in c3/algorithms/c3.py (which serves only as a fallback compatibility calculator). The paper-facing C3 implementation is integrated into the experience generation phase.

Please consult the Implementation Audit for a comprehensive mapping between the paper's theoretical framework and the codebase.

Audit and Release Gate

Before making any public release, execute the pre-release checks:

bash scripts/audit/pre_release.sh

To run the complete local release gate (including syntax checks and unit tests):

bash scripts/audit/release_gate.sh

For a single-command preflight reproduction check:

bash scripts/reproduce/preflight_repro.sh --task math

These gating scripts rigorously verify:

  • Absence of hard-coded private paths
  • Absence of obvious leaked secrets
  • Absence of bundled datasets or generated release-surface artifacts
  • Bash scripting syntax sanity
  • Python compilation and test suite sanity
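The first two checks above can be approximated with a simple pattern scan over file contents. A toy version follows; the patterns are illustrative only, and the real gate scripts may use entirely different rules.

```python
import re

# Illustrative patterns: absolute home-directory paths and AWS-style key IDs.
PRIVATE_PATH = re.compile(r"/(?:home|Users)/[A-Za-z0-9_.-]+")
SECRET_LIKE = re.compile(r"AKIA[0-9A-Z]{16}")

def scan_text(text):
    """Return a list of (kind, match) findings for one file's contents."""
    findings = [("private_path", m) for m in PRIVATE_PATH.findall(text)]
    findings += [("secret", m) for m in SECRET_LIKE.findall(text)]
    return findings
```

Real gates usually layer such scans with allowlists (e.g., documentation examples) to keep false positives manageable.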

Governance

This repository includes standard open-source policy files: a license, a code of conduct, contributing guidelines, and a security policy.

License and Attribution

Acknowledgements

We deeply appreciate the open-source community for their foundational work. In particular, we would like to acknowledge:

  • OpenRLHF: Our RL infrastructure is built upon the OpenRLHF framework. We are grateful to the OpenRLHF team for providing an easy-to-use, scalable, and high-performance agentic RL foundation based on Ray and vLLM.

Citation

If you find this repository or paper useful for your research, please cite:

@misc{chen2026contextualcounterfactualcreditassignment,
  title={Contextual Counterfactual Credit Assignment for Multi-Agent Reinforcement Learning in LLM Collaboration},
  author={Yanjun Chen and Yirong Sun and Hanlin Wang and Xinming Zhang and Xiaoyu Shen and Wenjie Li and Wei Zhang},
  year={2026},
  eprint={2603.06859},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2603.06859}
}
