Cooperative Multi-LLM Reinforcement Learning (CoMLRL) is an open-source library for training multiple LLMs to collaborate using Multi-Agent Reinforcement Learning (MARL). It provides implementations of MARL algorithms for LLM collaboration, along with environments and benchmarks for training and evaluation.
Install the latest release from PyPI:

```bash
pip install comlrl
# install PyTorch compatible with your device
```

Or install via conda:

```bash
conda install -c conda-forge comlrl
# install PyTorch compatible with your device
```

To access the latest features, you can install CoMLRL from source:

```bash
git clone https://github.com/OpenMLRL/CoMLRL.git
cd CoMLRL
pip install -e .
# install PyTorch compatible with your device
```
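After installing, a quick sanity check using only the Python standard library confirms the package is visible; no CoMLRL API is assumed here:

```python
# Sanity check: confirm the comlrl package is installed and print its version.
from importlib.metadata import version

print(version("comlrl"))
```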
- MARL trainers to optimize LLM collaboration (see the configuration sketch after this list):
  - Multi-Agent REINFORCE: Critic-free policy gradient methods, including MAREINFORCE, MAGRPO, MARLOO, and MAREMAX.
    - Aligned individual-response joint with `joint_mode='align'`.
    - Memory-efficient cross joint with `joint_mode='cross'`.
  - Multi-Agent PPO: Critic-based policy gradient methods, including IPPO.
    - Canonical IPPO with a separate critic with `use_separate_critic=True`.
    - Memory-efficient critic with a value head over the actor with `use_separate_critic=False`.
- Environments that simulate real-world tasks for training and evaluating LLM collaboration:
  - Writing Collaboration: Multiple LLM agents collaborate on processing articles.
  - Code Generation: Generate code solutions for programming problems.
    - MBPP - Mostly Basic Python Problems.
    - HumanEval - Handwritten evaluation problems.
    - CoopHumanEval - A variant of HumanEval with a cooperative nature.
  - Code Completion: Complete code snippets based on given contexts.
    - ClassEval - Complete class-level code based on method stubs and docstrings.
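The trainer variants above are selected through configuration flags. Below is a minimal sketch of how the `joint_mode` and `use_separate_critic` options might be passed; treating them as config-object keyword arguments is an assumption, so consult the library documentation for the authoritative placement:

```python
# Minimal sketch, assuming `joint_mode` is a keyword argument of
# MAGRPOConfig (placement is an assumption, not confirmed here).
from comlrl.trainers.magrpo import MAGRPOConfig

magrpo_config = MAGRPOConfig(
    per_device_train_batch_size=1,
    joint_mode="cross",  # or "align" for the aligned individual-response joint
)

# For IPPO-style trainers, `use_separate_critic=True` selects the canonical
# separate critic and `False` the value-head-over-actor variant (the exact
# config class for IPPO is not shown in this README).
```

The code-generation benchmarks listed above are also available on the Hugging Face Hub, so the underlying data can be fetched directly. The dataset ids below are the standard Hub ids; CoopHumanEval's id is not shown here, and how CoMLRL wraps these datasets into environments is covered by its documentation:

```python
from datasets import load_dataset

mbpp = load_dataset("mbpp", split="test")                   # Mostly Basic Python Problems
humaneval = load_dataset("openai_humaneval", split="test")  # HumanEval
```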
Quick start: train two Qwen2.5 agents to summarize Reddit posts with MAGRPO:

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from comlrl.trainers.magrpo import MAGRPOConfig, MAGRPOTrainer
# Load dataset and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
dataset = load_dataset("trl-lib/tldr", split="train").select(range(128))
# Initialize trainer and start training
trainer = MAGRPOTrainer(
    model="Qwen/Qwen2.5-0.5B",
    num_agents=2,
    tokenizer=tokenizer,
    train_dataset=dataset,
    # Toy reward: deviation of the agents' response-length ratio from 3.0
    reward_func=lambda a, b: [abs(max(len(b[0]), 1) / max(len(a[0]), 1) - 3.0)],
    formatters=[lambda example: example["prompt"]] * 2,
    args=MAGRPOConfig(
        per_device_train_batch_size=1,
    ),
)
trainer.train()
```
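The lambda above is a toy reward. A named function with the same call signature, inferred from the example rather than from library docs (one batch of responses per agent in, a list of scalar rewards out), may be easier to extend; the sign is flipped here so that smaller deviation yields higher reward:

```python
# Sketch of a custom reward; the signature is inferred from the quick-start
# lambda above (one list of responses per agent, list of rewards returned).
def length_ratio_reward(agent1_responses, agent2_responses):
    """Penalize deviation of the response-length ratio from 3:1."""
    rewards = []
    for r1, r2 in zip(agent1_responses, agent2_responses):
        ratio = max(len(r2), 1) / max(len(r1), 1)
        rewards.append(-abs(ratio - 3.0))  # 0 is best; more negative is worse
    return rewards
```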
We welcome contributions from the community! Please see the contributing guidelines for setting up a development environment and submitting contributions.

Thanks to the gracious help of contributors:

Shuo Liu | Tianle Chen | Ryan Amiri | Zeyu Liang
Please cite our paper if you find this library useful in your research:
```bibtex
@misc{liu2025comlrl,
  title={LLM Collaboration With Multi-Agent Reinforcement Learning},
  author={Shuo Liu and Tianle Chen and Zeyu Liang and Xueguang Lyu and Christopher Amato},
  year={2025},
  eprint={2508.04652},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2508.04652},
}
```
