A long-standing goal of artificial intelligence is to create agents capable of general, adaptive behaviour in open-ended environments. Guided by the "Bitter Lesson", we argue that the most effective path toward this goal is to systematically remove human priors and allow intelligence to emerge through interaction with a "Big World" that is orders of magnitude more complex than the agent itself.
We propose the mobile Graphical User Interface (GUI) as a practical proxy for such a world and introduce Darwin Mobile Agent, an open-source infrastructure designed as a foundation for autonomous reinforcement learning in this domain. Our framework addresses the data-collection bottleneck in real-world mobile interactions by using an asynchronous agent-environment loop across parallel cloud-phone instances. We further propose a conceptual roadmap to systematically remove human priors from three fundamental pillars of a self-evolving agent: task curriculum, outcome verification, and memory management.
We validate that the Darwin infrastructure provides the stability and scalability required for policy optimisation in the GUI domain. This work establishes the practical and theoretical foundation necessary to move toward truly autonomous, self-evolving agents.
| Category | Capabilities |
|---|---|
| "Big World" GUI | Mobile GUI as an experimentally tractable, open-ended "Big World" proxy for general intelligence emergence. |
| Async Architecture | Enables efficient parallelization by pipelining model inference and environment execution to hide interaction latency. |
| Self-Evolution | Systematic removal of human priors from Task Curriculum, Outcome Verification, and Agent State (Memory). |
| Task Lifecycle | Automated Setup → Execution → Teardown protocol to internalize environment state management. |
| Cloud-Native Fleet | Stable infrastructure using cloud Android devices via Alibaba Cloud instead of local emulators. |
| Verification | LLM-based semantic verification to drive rewards for open-ended tasks without programmatic state access. |
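The asynchronous architecture in the table above can be sketched as a set of concurrent per-device loops, where awaiting I/O for one cloud phone lets inference or execution for the others proceed, hiding interaction latency. This is a minimal illustrative sketch: `infer_action` and `execute_on_device` are hypothetical placeholders (with simulated latency), not the project's actual API.

```python
import asyncio

async def infer_action(observation: str) -> str:
    # Placeholder for policy inference; a real loop would query a
    # model server (e.g. vLLM). Simulated latency stands in for it.
    await asyncio.sleep(0.01)
    return f"tap after seeing {observation}"

async def execute_on_device(device_id: int, action: str) -> str:
    # Placeholder for sending an action to a cloud phone and
    # reading back the next observation (screenshot / UI tree).
    await asyncio.sleep(0.02)
    return f"screen-{device_id}"

async def device_loop(device_id: int, steps: int, out: list) -> None:
    # Each device runs its own agent-environment loop; while one
    # device waits on I/O, the event loop advances the others.
    obs = f"screen-{device_id}"
    for _ in range(steps):
        action = await infer_action(obs)
        obs = await execute_on_device(device_id, action)
        out.append((device_id, action, obs))

async def main(num_devices: int = 4, steps: int = 3) -> list:
    transitions: list = []
    await asyncio.gather(
        *(device_loop(i, steps, transitions) for i in range(num_devices))
    )
    return transitions

transitions = asyncio.run(main())
print(len(transitions))  # 4 devices x 3 steps = 12 transitions
```

In a real deployment the inference side would additionally batch requests across devices, so GPU utilisation stays high while individual phones wait on screen transitions.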
- [2025.12.22] Initial release of Darwin Mobile Agent and paper
- Cloud phone infrastructure with Alibaba Cloud
- Multi-model support (UI-TARS, Qwen3-VL)
- Asynchronous agent-environment loop
- Curriculum-based task sampling
- LLM-based outcome verification
- Task lifecycle protocol (setup-task-cleanup)
- Pre-trained model checkpoints release
- Extended task benchmark
- Device abstraction layer for emulators and physical devices
- Automated task generation (LLM-based proposer & fine-tuning)
- Knowledge distillation & Memory Management
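The task lifecycle protocol listed above (setup → task → cleanup) can be sketched as a context manager that guarantees teardown runs even when a rollout fails, so one episode cannot leak device state into the next. The function and task names here are illustrative assumptions, not the repository's actual interface.

```python
from contextlib import contextmanager

@contextmanager
def task_lifecycle(task_name: str, log: list):
    # Setup: put the device into the task's required starting state
    # (e.g. clear app data, open the target app). Simulated here.
    log.append(f"setup:{task_name}")
    try:
        yield task_name
    finally:
        # Teardown always runs, so a failed rollout cannot leak
        # state (open apps, dialogs, notifications) into the next task.
        log.append(f"teardown:{task_name}")

log: list = []
with task_lifecycle("add_contact", log) as task:
    log.append(f"execute:{task}")  # agent rollout happens here

print(log)  # ['setup:add_contact', 'execute:add_contact', 'teardown:add_contact']
```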
```bash
# 1. Clone and install
git clone https://github.com/ai-agents-2030/darwin-mobile-agent.git
cd darwin-mobile-agent
pip install -e .

# 2. Install dependencies
pip install vllm==0.8.5 trl==0.25.1 "transformers>=4.57.0"

# 3. Configure cloud phones (see docs for details)
#    Edit your device addresses in the training script

# 4. Run training
bash examples/darwin_agent/run_spabench_ui_tars.sh
```

| Guide | Description |
|---|---|
| Getting Started | Overview and quick start |
| Local Setup | Python environment, ADB, dependencies |
| Cloud Devices | Alibaba Cloud phone configuration |
| Task Benchmarks | Available tasks and custom task creation |
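For open-ended tasks without programmatic state access, the LLM-based semantic verification mentioned above amounts to prompting a judge model with the task and the final screen, then mapping its verdict to a scalar reward. The sketch below is a hedged illustration: the `judge` callable and `stub_judge` are hypothetical stand-ins for a real LLM call, not the project's verifier.

```python
import json

def verify_outcome(task: str, final_screen: str, judge) -> float:
    # `judge` stands in for an LLM call (e.g. a chat-completions
    # request); it receives a prompt and returns a JSON string.
    prompt = (
        "You are verifying a mobile GUI task.\n"
        f"Task: {task}\n"
        f"Final screen description: {final_screen}\n"
        'Answer as JSON: {"success": true|false, "reason": "..."}'
    )
    verdict = json.loads(judge(prompt))
    # Map the semantic verdict to a scalar reward for RL.
    return 1.0 if verdict.get("success") else 0.0

# A stub judge for illustration; a real deployment would query an LLM.
def stub_judge(prompt: str) -> str:
    ok = "contact saved" in prompt.lower()
    return json.dumps({"success": ok, "reason": "stub"})

print(verify_outcome("Add a contact named Ada", "Toast: Contact saved", stub_judge))
# prints 1.0
```

Requesting a structured JSON verdict (rather than free text) keeps the reward extraction deterministic even when the judge's reasoning varies.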
We evaluate the stability of the Darwin Mobile Agent infrastructure in a controlled reinforcement learning setting. In this experiment, eight tasks are executed concurrently across eight cloud phones, enabling parallel data collection while keeping the overall system configuration fixed.
Across training, the agent demonstrates a clear and consistent improvement in mean task success rate. Despite operating on real devices with inherent latency and asynchronous agent–environment interactions, the training process remains stable, with no observable divergence or performance collapse. This behaviour indicates that the Darwin infrastructure can reliably sustain end-to-end policy optimisation in the mobile GUI domain, providing a solid baseline for future investigations into larger-scale and more complex settings.
Darwin Mobile Agent builds upon several excellent open-source projects:
- verl-agent - Multi-turn RL framework for LLM agents
- veRL - Distributed RL training infrastructure
- Android World - Android task benchmark
- SPA-Bench - Smartphone agent benchmark
We thank the developers and maintainers of these projects for their contributions to the community.
This project is licensed under the Apache 2.0 License.
If you use Darwin Mobile Agent in your research, please cite:
```bibtex
@article{darwin2025,
  title={Darwin Mobile Agent: A Roadmap for Self-Evolution},
  author={Beechey, Daniel and Yuen, Derek and Liu, Jianheng and Luo, Dezhao and He, Tiantian and Luo, Weilin and Wang, Jun and Shao, Kun},
  journal={arXiv preprint},
  year={2025}
}
```

Questions or issues? Open an issue on GitHub.

