
Contextual Counterfactual Credit Assignment

arXiv: 2603.06859 · License: Apache 2.0 · Python: 3.11

Project Page | arXiv | PDF | Getting Started | Code Map | Implementation Checklist | Release Checklist

Reference implementation for the paper Contextual Counterfactual Credit Assignment for Multi-Agent Reinforcement Learning in LLM Collaboration.

Paper status: available on arXiv as 2603.06859. The companion project page is hosted at eit-east-lab.github.io/C3, and the official PDF is linked above.

TL;DR

Terminal-only feedback in multi-agent LLM collaboration diffuses credit across an entire trajectory. C3 freezes transcript-derived context and estimates local causal credit via fixed-context replay with a leave-one-out baseline, outperforming MAPPO and MAGRPO under matched budgets while improving credit fidelity, reducing variance, and strengthening inter-agent influence.

Mechanism | Results | Quickstart | Workflows | Release Gate

Core Mechanism


Figure 1: C3 mechanism overview. Protocol-level replay, fixed-context alternatives, and leave-one-out credit assignment.
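The replay-plus-baseline idea in Figure 1 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the repository's implementation: `reward_fn`, the frozen context, and the sampled alternative turns are all hypothetical stand-ins.

```python
from statistics import mean

def c3_credit(reward_fn, frozen_context, actual_turn, alternative_turns):
    """Toy leave-one-out credit for a single agent turn.

    The transcript-derived context is held fixed, the agent's actual turn
    is compared against K sampled alternatives replayed under that same
    context, and the credit is the actual reward minus the mean reward of
    the alternatives (the leave-one-out baseline).
    """
    r_actual = reward_fn(frozen_context, actual_turn)
    baseline = mean(reward_fn(frozen_context, alt) for alt in alternative_turns)
    return r_actual - baseline

# Hypothetical usage with a toy overlap reward over (context, turn) pairs.
reward = lambda ctx, turn: len(set(ctx) & set(turn))
credit = c3_credit(reward, {"spec", "tests"}, {"tests", "fix"}, [{"spec"}, {"noise"}])
```

Because the context is frozen, the baseline isolates the local effect of this turn rather than averaging over downstream drift in the rest of the trajectory.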

Key Results

Figures: Pareto return versus token budget; learning dynamics comparison across methods.

Main paper results: sample-efficiency and performance trajectories against baseline methods (e.g., MAPPO, MAGRPO). C3 reaches higher performance under matched training token budgets.

Repository Purpose

C3 is a paper-aligned research codebase designed to support:

  • Protocol Reproduction: Executing the multi-agent task protocols described in the paper.
  • Experiment Execution: Running full training sweeps, evaluation-only probes, and performance analyses.
  • Implementation Audit: Providing a transparent mapping from theoretical mechanisms to executable code.
  • Extension and Testing: Modifying or testing the framework without relying on private paths or bundled artifacts.

Release Scope

To maintain a clean public-release surface, this repository does not distribute:

  • Prepared datasets
  • Trained model checkpoints
  • Cached model weights
  • Generated run-time artifacts (e.g., run logs, checkpoints, reports, and experiment outputs)

Corresponding local directories (data/, artifacts/, ckpt/, runs/, wandb/, models/) are treated as local working outputs and remain empty or absent in the public release. For details, see the Release Policy.
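To make "empty or absent" concrete, a check along these lines could be run locally before publishing. The directory names come from the list above; the helper itself is a hypothetical sketch, not one of the repository's gating scripts.

```python
from pathlib import Path

# Local working directories listed in the Release Policy.
RELEASE_SURFACE_DIRS = ["data", "artifacts", "ckpt", "runs", "wandb", "models"]

def release_surface_clean(repo_root):
    """Return True if every local working directory is empty or absent."""
    root = Path(repo_root)
    for name in RELEASE_SURFACE_DIRS:
        d = root / name
        # A directory may be missing entirely, or present but empty.
        if d.exists() and any(d.iterdir()):
            return False
    return True
```

A check like this is cheap enough to run on every commit, which is essentially what the release gate automates.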

Repository Structure

  • c3/: Core C3 implementation, including multi-agent protocol handling, environments, credit assignment logic, and analysis tools.
  • openrlhf/: Vendored upstream RLHF training stack, augmented with C3-specific integration points.
  • configs/: Configurations for tasks, roles, analyses, execution registries, and data manifests.
  • scripts/: Entrypoints for data preparation, experiment reproduction, model utilities, and release gating.
  • docs/: Documentation covering release policies, implementation audits, upstream provenance, and data-source contracts.
  • project-page/: Static companion site for the paper, deployed via GitHub Pages.


System Requirements

  • OS: Linux (x86_64)
  • Python: 3.11
  • GPU Stack: NVIDIA GPUs with CUDA 12.8-compatible runtime
  • PyTorch: torch/torchaudio/torchvision == 2.9.0+cu128

If your execution environment differs substantially, you may need to manually adapt the pinned dependencies.
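A quick interpreter sanity check before installing the pinned stack can save a failed build. This helper is hypothetical; adapt the pin to your environment.

```python
import sys

def python_matches_pin(version_info=sys.version_info, pin=(3, 11)):
    """True if the interpreter's major.minor matches the pinned Python."""
    return tuple(version_info[:2]) == pin
```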

Installation

Create a standard Python environment, install the pinned stack, and validate the installation:

python -m pip install -U pip
python -m pip install -r requirements.txt --no-build-isolation
python -m pip check

Installation Notes:

  • requirements.txt acts as a strict full-lock snapshot to guarantee reproducibility.
  • --no-build-isolation is strongly recommended due to build-sensitive packages (e.g., flash_attn).
  • Experiment reproduction scripts automatically export the repository root to PYTHONPATH via scripts/reproduce/common_env.sh.
  • For lightweight local development or CI checks, an editable install is supported: python -m pip install -e .[test]

30-Second Quickstart

# 1. Install dependencies
python -m pip install -r requirements.txt --no-build-isolation

# 2. Prepare local datasets (datasets are not shipped with the repo)
bash scripts/data/prepare_all.sh --out_dir data

# 3. Run a fast E2E wiring smoke test
bash scripts/reproduce/smoke.sh --task math --limit 1 --print_example 0

Data Preparation

Prepared datasets are generated locally from strictly pinned upstream sources. They are not bundled with the codebase.

bash scripts/data/prepare_all.sh --out_dir data

The authoritative source of truth for all data derivations is configs/data_manifest.yaml. See Data Sources and Third-Party Notices for provenance details.
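To illustrate what "strictly pinned" typically means in practice: a manifest-driven pipeline records a content digest per upstream source and refuses to proceed on a mismatch. The sketch below is illustrative only; the field names shown are hypothetical and not the actual schema of configs/data_manifest.yaml.

```python
import hashlib

def verify_pinned_source(raw_bytes, manifest_entry):
    """Check downloaded bytes against the digest pinned in a manifest entry."""
    digest = hashlib.sha256(raw_bytes).hexdigest()
    if digest != manifest_entry["sha256"]:
        raise ValueError(
            f"pin mismatch for {manifest_entry['name']}: "
            f"expected {manifest_entry['sha256']}, got {digest}"
        )
    return True

# Hypothetical manifest entry for one upstream file.
entry = {
    "name": "math-train.jsonl",
    "sha256": hashlib.sha256(b"example payload").hexdigest(),
}
```

Failing loudly on a digest mismatch is what keeps locally prepared datasets byte-identical across machines.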

Model Preparation

While Transformers/vLLM will automatically download weights on first use, we provide a dedicated utility to pre-download and cache all required HuggingFace base models. This is highly recommended for reproducibility, offline clusters, or avoiding concurrent download races:

# Log in to Hugging Face (required for gated models like Qwen)
huggingface-cli login

# Pre-download all base models referenced in the results registry
bash scripts/models/download_models.sh \
  --registry configs/main_results_registry.yaml \
  --out_dir models

Main Workflows

Smoke Test

Fast end-to-end wiring check to verify environment and protocol integrity:

bash scripts/reproduce/smoke.sh

SFT-only Main Results Sweep

Run the evaluation matrix exclusively for the SFT baseline:

bash scripts/reproduce/paper_main_results.sh sweep \
  --registry configs/main_results_registry.yaml \
  --only_methods SFT

Full Paper Training Matrix

Execute the complete set of model training runs:

export PRETRAIN='Qwen/Qwen2.5-3B-Instruct'
bash scripts/reproduce/paper_train.sh

Full Main Results Sweep

Run the full paper evaluation matrix across all methods:

bash scripts/reproduce/paper_main_results.sh sweep \
  --registry configs/main_results_registry.yaml

Paper Analyses

Generate analysis figures directly from local run directories:

bash scripts/reproduce/paper_analysis_figs.sh fig2 \
  --suite math \
  --run_c3 ckpt/_runs/<C3_run_dir> \
  --run_mappo ckpt/_runs/<MAPPO_run_dir> \
  --run_magrpo ckpt/_runs/<MAGRPO_run_dir> \
  --run_sft ckpt/_runs/_sft_main_results/<SFT_dir> \
  --mappo_critic_ckpt <PATH_TO_MAPPO_CRITIC>

Implementation Note

Note on C3 algorithm location: the primary credit-assignment mechanism discussed in the paper is not located in c3/algorithms/c3.py (which serves only as a fallback compatibility calculator). The paper-facing C3 implementation is integrated into the experience generation phase.

Please consult the Implementation Audit for a comprehensive mapping between the paper's theoretical framework and the codebase.

Audit and Release Gate

Before making any public release, execute the pre-release checks:

bash scripts/audit/pre_release.sh

To run the complete local release gate (including syntax checks and unit tests):

bash scripts/audit/release_gate.sh

For a single-command preflight reproduction check:

bash scripts/reproduce/preflight_repro.sh --task math

These gating scripts rigorously verify:

  • Absence of hard-coded private paths
  • Absence of obvious leaked secrets
  • Absence of bundled datasets or generated release-surface artifacts
  • Bash scripting syntax sanity
  • Python compilation and test suite sanity
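The first two checks above can be approximated with a simple pattern scan over file contents. A toy version follows; the patterns are illustrative only, and the real gate scripts may use entirely different rules.

```python
import re

# Illustrative patterns: absolute home-directory paths and AWS-style key IDs.
PRIVATE_PATH = re.compile(r"/(?:home|Users)/[A-Za-z0-9_.-]+")
SECRET_LIKE = re.compile(r"AKIA[0-9A-Z]{16}")

def scan_text(text):
    """Return a list of (kind, match) findings for one file's contents."""
    findings = [("private_path", m) for m in PRIVATE_PATH.findall(text)]
    findings += [("secret", m) for m in SECRET_LIKE.findall(text)]
    return findings
```

Real gates usually layer such scans with allowlists (e.g., documentation examples) to keep false positives manageable.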

Governance

This repository includes standard open-source policy files: a license, a code of conduct, contributing guidelines, and a security policy.

License and Attribution

Acknowledgements

We deeply appreciate the open-source community for their foundational work. In particular, we would like to acknowledge:

  • OpenRLHF: Our RL infrastructure is built upon the OpenRLHF framework. We are grateful to the OpenRLHF team for providing an easy-to-use, scalable, and high-performance agentic RL foundation based on Ray and vLLM.

Citation

If you find this repository or paper useful for your research, please cite:

@misc{chen2026contextualcounterfactualcreditassignment,
  title={Contextual Counterfactual Credit Assignment for Multi-Agent Reinforcement Learning in LLM Collaboration},
  author={Yanjun Chen and Yirong Sun and Hanlin Wang and Xinming Zhang and Xiaoyu Shen and Wenjie Li and Wei Zhang},
  year={2026},
  eprint={2603.06859},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2603.06859}
}
