[2026.04.28] All collaboration styles and model checkpoints, with exemplified downstream inference, are now available. Stay tuned for the complete training/inference pipeline and additional features!
[2026.04.28] We have released the RecursiveMAS paper!
RecursiveMAS is a multi-agent framework that scales agent collaboration through latent-space recursion. Instead of treating each LLM agent as an isolated module, RecursiveMAS casts the entire multi-agent system as a unified recursive computation. Heterogeneous agents are connected through lightweight RecursiveLink modules, allowing agents to iteratively exchange, refine, and evolve their latent states across recursion rounds.
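To make the idea concrete, here is a minimal, self-contained sketch of latent-state recursion between agents. It is illustrative only: the paper's RecursiveLink modules are learned lightweight adapters between heterogeneous LLMs, whereas this toy uses random linear maps, a hypothetical latent dimension, and three stand-in agents just to show the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, d = 3, 8  # hypothetical: 3 agents, latent dimension 8
states = [rng.normal(size=d) for _ in range(n_agents)]

# One "link" per ordered agent pair (toy stand-in for a learned
# RecursiveLink module; the real parameterization may differ).
links = {(i, j): 0.1 * rng.normal(size=(d, d))
         for i in range(n_agents) for j in range(n_agents) if i != j}

def recursion_round(states):
    """Each agent refines its latent state using messages from all peers."""
    new_states = []
    for j, s in enumerate(states):
        incoming = sum(links[(i, j)] @ states[i]
                       for i in range(len(states)) if i != j)
        new_states.append(np.tanh(s + incoming))  # exchange, then refine
    return new_states

for _ in range(4):  # four recursion rounds
    states = recursion_round(states)
```

Each round, every agent's state is updated from its own state plus link-transformed states of its peers, which is the "iteratively exchange, refine, and evolve" loop described above.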
- [x] Release all collaboration patterns (Sequential, Mixture, Deliberation, Distillation).
- [x] Release demo code for inference (commands provided below).
- [ ] Add the complete inference pipeline across all downstream tasks.
- [ ] Add all training data and implementation details.
- [ ] Add additional supported model families and MAS collaboration patterns.
This repository provides the code for running RecursiveMAS under different multi-agent collaboration styles.
To begin with, we recommend creating a new conda environment:
```shell
conda create -n recursivemas python=3.10 -y
conda activate recursivemas
```

Install the required packages:
```shell
pip install -r requirements.txt
```

For the Deliberation style, the Tool-Caller agent requires external search tools to retrieve information.
Please set up a search API key (e.g., a Tavily API key) in a `.env` file:

```shell
TAVILY_API_KEY=your_tavily_api_key_here
```

To run RecursiveMAS, you need to download and store the checkpoints for each agent role in the multi-agent system from our Hugging Face release.
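Before running the Deliberation style, it can help to sanity-check that the key from the `.env` step above is actually visible to the process. The snippet below is illustrative only and not part of the repository (which may load the `.env` file itself, e.g., via python-dotenv); it uses only the standard library.

```python
import os

# Demo value so the snippet runs standalone; in practice the key comes
# from your .env file or shell environment.
os.environ.setdefault("TAVILY_API_KEY", "your_tavily_api_key_here")

key = os.environ.get("TAVILY_API_KEY")
if not key or key == "your_tavily_api_key_here":
    print("Warning: set a real TAVILY_API_KEY before running the Deliberation style.")
```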
The checkpoints are organized by collaboration style. Each collection contains the individual role-specific agents together with their RecursiveLink modules.
| Model Organization | Download |
|---|---|
| Sequential-Light-Planner-Qwen3-1.7B | 🤗 HuggingFace |
| Sequential-Light-Critic-Llama3.2-1B | 🤗 HuggingFace |
| Sequential-Light-Solver-Qwen2.5-Math-1.5B | 🤗 HuggingFace |
| Sequential-Light-Outerlinks | 🤗 HuggingFace |
| Model Organization | Download |
|---|---|
| Sequential-Scaled-Planner-Gemma3-4B | 🤗 HuggingFace |
| Sequential-Scaled-Critic-Llama3.2-3B | 🤗 HuggingFace |
| Sequential-Scaled-Solver-Qwen3.5-4B | 🤗 HuggingFace |
| Sequential-Scaled-Outerlinks | 🤗 HuggingFace |
| Model Organization | Download |
|---|---|
| Mixture-Math-DeepSeek-R1-Distill-Qwen-1.5B | 🤗 HuggingFace |
| Mixture-Code-Qwen2.5-Coder-3B | 🤗 HuggingFace |
| Mixture-Science-BioMistral-7B | 🤗 HuggingFace |
| Mixture-Summarizer-Qwen3.5-2B | 🤗 HuggingFace |
| Mixture-Outerlinks | 🤗 HuggingFace |
| Model Organization | Download |
|---|---|
| Distillation-Expert-Qwen3.5-9B | 🤗 HuggingFace |
| Distillation-Learner-Qwen3.5-4B | 🤗 HuggingFace |
| Distillation-Outerlinks | 🤗 HuggingFace |
| Model Organization | Download |
|---|---|
| Deliberation-Reflector-Qwen3.5-4B | 🤗 HuggingFace |
| Deliberation-Toolcaller-Qwen3.5-4B | 🤗 HuggingFace |
| Deliberation-Outerlinks | 🤗 HuggingFace |
Here is an example of how to load the whole MAS pipeline:
```python
from system_loader import load_mas_system

mas = load_mas_system(
    style="sequential_light",
    device="cuda",
    trust_remote_code=True,
)

planner = mas.agents["planner"].model
critic = mas.agents["critic"].model
solver = mas.agents["solver"].model
```

Detailed code for loading agents and running RecursiveMAS on downstream tasks is provided in `run.py`.
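Conceptually, the three agents loaded above are applied in a plan → critique → solve order in the sequential style. The self-contained mock below shows only that data flow; the placeholder functions are stand-ins for the actual LLM agents and are not the repository's code.

```python
# Mock of the sequential collaboration flow (plan -> critique -> solve).
# Each function stands in for an LLM agent loaded via load_mas_system.
def planner(question):
    return f"plan({question})"

def critic(plan):
    return f"critique({plan})"

def solver(question, plan, critique):
    return f"answer({question}|{plan}|{critique})"

def sequential_round(question):
    plan = planner(question)          # Planner drafts a plan
    critique = critic(plan)           # Critic reviews the plan
    return solver(question, plan, critique)  # Solver produces the answer

print(sequential_round("2+2"))
```

In RecursiveMAS this hand-off additionally happens in latent space across recursion rounds, rather than only through text.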
Next, clone our repository and enter the project directory:
```shell
git clone https://github.com/RecursiveMAS/RecursiveMAS.git
cd RecursiveMAS
```

The current repository is organized as follows:
```
RecursiveMAS/
├── README.md
├── __init__.py
├── run.py
├── load_from_repo.py
├── hf_resolver.py
├── modeling.py
├── system_loader.py
├── prompts.py
├── requirements.txt
├── assets/
├── dataset/
└── inference_utils/
    ├── __init__.py
    ├── answer_utils.py
    ├── lcb_utils.py
    ├── reflector_tool_notes.py
    ├── inference_mas.py
    ├── inference_mas_mixture.py
    ├── inference_mas_distill.py
    └── inference_mas_deliberation.py
```
The key components are:
- `run.py`: the unified entry point for running RecursiveMAS inference.
- `load_from_repo.py`: maps each MAS style to our released Hugging Face checkpoints and dataset defaults.
- `hf_resolver.py`: resolves and loads the Hugging Face checkpoints.
- `modeling.py`: implements the RecursiveLink modules.
- `system_loader.py`: provides a high-level API for loading a full released multi-agent system.
- `prompts.py`: stores prompts for the different MAS collaboration styles.
- `inference_utils/`: contains inference pipelines and evaluation utilities for the different MAS structures.
We provide Sequential-style RecursiveMAS under both lightweight and scaled settings.
- Sequential-style (Light) uses lightweight agents for efficient recursive collaboration.
```shell
python run.py --style sequential_light --batch_size 32 --temperature 0.6 --top_p 0.95 --dataset math500 --seed 42 --trust_remote_code 1 --device cuda
```

- Sequential-style (Scaled) uses stronger LLM agents to further improve reasoning performance.
```shell
python run.py --style sequential_scaled --batch_size 16 --temperature 0.6 --top_p 0.95 --dataset math500 --seed 42 --trust_remote_code 1 --device cuda
```

RecursiveMAS can also be adapted to different MAS collaboration patterns beyond the sequential setting.
- Mixture-style RecursiveMAS coordinates multiple domain-specialized agents and aggregates their information through a summarizer.
```shell
python run.py --style mixture --batch_size 16 --temperature 0.6 --top_p 0.95 --dataset math500 --seed 42 --trust_remote_code 1 --device cuda
```

- Distillation-style RecursiveMAS lets a larger Expert and a smaller Learner interact recursively, improving the Learner while remaining efficient.
```shell
python run.py --style distillation --batch_size 16 --temperature 0.6 --top_p 0.95 --dataset math500 --seed 42 --trust_remote_code 1 --device cuda
```

- Deliberation-style RecursiveMAS supports recursive coordination between a Reflector and a Tool-Caller for tool-integrated reasoning.
```shell
python run.py --style deliberation --batch_size 16 --temperature 0.6 --top_p 0.95 --dataset math500 --seed 42 --trust_remote_code 1 --device cuda
```

This project builds on the excellent work of the open-source community. We sincerely thank the developers and maintainers of the following libraries and resources:
- vLLM for supporting efficient LLM inference and serving.
- ARPO for providing useful references on agentic tool-use systems and efficient tool-calling workflows.
- TextGrad for its pioneering framework on text-based optimization and natural-language feedback for compound agentic systems.
```bibtex
@misc{recursivemas,
  title={Recursive Multi-Agent Systems},
  author={Xiyuan Yang and Jiaru Zou and Rui Pan and Ruizhong Qiu and Pan Lu and Shizhe Diao and Jindong Jiang and Hanghang Tong and Tong Zhang and Markus J. Buehler and Jingrui He and James Zou},
  year={2026},
  eprint={2604.25917},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2604.25917},
}
```
