OpenMLRL/CoMLRL


Cooperative Multi-LLM Reinforcement Learning (CoMLRL) is an open-source library for training multiple LLMs to collaborate using Multi-Agent Reinforcement Learning (MARL). It provides implementations of MARL algorithms for LLM collaboration, along with environments and benchmarks for training and evaluation.

Installation

Install from PyPI

pip install comlrl
# install PyTorch compatible with your device
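
CoMLRL does not pin a PyTorch build, so install one that matches your accelerator. For example, the official CUDA 12.1 wheels can be installed from the PyTorch index (adjust the index URL for your CUDA version, or omit it for the default CPU wheels):

pip install torch --index-url https://download.pytorch.org/whl/cu121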

Install from conda-forge

conda install -c conda-forge comlrl
# install PyTorch compatible with your device

Install from source

To access the latest features, you can install CoMLRL from source:

git clone https://github.com/OpenMLRL/CoMLRL.git
cd CoMLRL
pip install -e .
# install PyTorch compatible with your device
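
After an editable install, an optional import check confirms the package resolves to your checkout:

python -c "import comlrl; print(comlrl.__file__)"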

Features

  • MARL trainers to optimize LLM collaboration (a configuration sketch follows this list):

    • Multi-Agent REINFORCE: Critic-free policy gradient methods, including MAREINFORCE, MAGRPO, MARLOO, and MAREMAX.
      • Aligned joint over individual responses with joint_mode='align'.
      • Memory-efficient cross joint with joint_mode='cross'.
    • Multi-Agent PPO: Critic-based policy gradient methods, including IPPO.
      • Canonical IPPO with a separate critic (use_separate_critic=True).
      • Memory-efficient critic as a value head on the actor (use_separate_critic=False).
  • Environments that simulate real-world tasks for training and evaluating LLM collaboration:

    • Writing Collaboration: Multiple LLM agents collaborate on processing articles.
      • TLDR - Summarizing Reddit posts.
      • ArXiv - Expanding abstracts into introductions.
    • Code Generation: Generate code solutions for programming problems.
      • MBPP - Mostly Basic Python Problems.
      • HumanEval - Handwritten evaluation problems.
      • CoopHumanEval - A cooperative variant of HumanEval.
    • Code Completion: Complete code snippets based on given contexts.
      • ClassEval - Completing class-level code from method stubs and docstrings.
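
A minimal configuration sketch for the trainer options named above. Whether joint_mode is accepted as a MAGRPOConfig field or passed to the trainer directly is an assumption here, as is the exact pairing semantics; only the option names come from the feature list:

from comlrl.trainers.magrpo import MAGRPOConfig

# joint_mode selects how per-agent responses are combined into a joint
# reward: 'align' presumably pairs responses index-by-index, while
# 'cross' is the memory-efficient cross joint (semantics assumed from
# the names in the feature list above).
config = MAGRPOConfig(
    per_device_train_batch_size=1,
    joint_mode="align",  # or "cross" for the memory-efficient variant
)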

Usage

Quick start: train two Qwen2.5 agents to summarize Reddit posts with MAGRPO:

from datasets import load_dataset
from transformers import AutoTokenizer
from comlrl.trainers.magrpo import MAGRPOConfig, MAGRPOTrainer

# Load tokenizer and dataset
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
dataset = load_dataset("trl-lib/tldr", split="train").select(range(128))

# Initialize trainer and start training
trainer = MAGRPOTrainer(
    model="Qwen/Qwen2.5-0.5B",
    num_agents=2,
    tokenizer=tokenizer,
    train_dataset=dataset,
    # toy reward: penalize deviation of the response-length ratio from 3
    reward_func=lambda a, b: [-abs(max(len(b[0]), 1) / max(len(a[0]), 1) - 3.0)],
    formatters=[lambda example: example["prompt"]] * 2,
    args=MAGRPOConfig(
        per_device_train_batch_size=1,
    ),
)
trainer.train()
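
The lambda above is only a toy reward over response lengths. Below is a sketch of a slightly more task-shaped reward for TLDR, under the assumption (taken from the lambda above) that the trainer passes one list of sampled responses per agent and expects a list of scalar rewards back:

def reward_func(agent1_responses, agent2_responses):
    # Illustrative only: favor concise joint summaries by penalizing
    # the total character length of each pair of responses.
    rewards = []
    for r1, r2 in zip(agent1_responses, agent2_responses):
        rewards.append(-(len(r1) + len(r2)) / 1000.0)
    return rewards

Pass it to the trainer via reward_func=reward_func in place of the lambda.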

Contributing

We welcome contributions from the community! Please see the contributing guidelines for setting up a development environment and submitting contributions.

Thanks to our contributors for their generous help:


Shuo Liu - 🤔 🚧 💻 📖
Tianle Chen - 🚧 💻 🐛
Ryan Amiri - 🚧 💻 🐛
Zeyu Liang - 📖 🐛

🤔: Foundational Ideas; 🚧: Maintenance; 💻: Code; 📖: Documentation; 🐛: Bug Report.

Citation

Please cite our paper if you find this library useful in your research:

@misc{liu2025comlrl,
      title={LLM Collaboration With Multi-Agent Reinforcement Learning},
      author={Shuo Liu and Tianle Chen and Zeyu Liang and Xueguang Lyu and Christopher Amato},
      year={2025},
      eprint={2508.04652},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.04652},
}