CuSearch is a lightweight trajectory-selection layer for RLVR / GRPO agentic RAG: from the N·G rollouts generated per step, it keeps K for the policy update, selected by search depth (the number of well-formed searches with non-empty retrieval), reallocating rollout budget toward deeper trajectories via Search-Depth Greedy Allocation (SDGA). SDGA comes in two variants: SDGA-Auto (implicit curriculum) and SDGA-Phase (explicit phase threshold). Rewards, model, and environment stay unchanged.
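The selection step can be pictured with a minimal Python sketch. Names such as Trajectory, search_depth, and sdga_select are hypothetical illustrations, not the repo's API, and the greedy/phase logic below is one plausible reading of the description above:

# Minimal sketch of SDGA trajectory selection (illustrative; not the repo's API).
from dataclasses import dataclass
from typing import List

@dataclass
class Trajectory:
    search_depth: int  # count of well-formed searches with non-empty retrieval
    # ... the real rollout object also carries tokens, rewards, etc.

def sdga_select(rollouts: List[Trajectory], k: int,
                variant: str = "auto", phase_threshold: int = 1) -> List[Trajectory]:
    """Keep k of the N*G rollouts per step, biased toward deeper searches."""
    if variant == "auto":
        # Implicit curriculum: greedily keep the deepest trajectories.
        return sorted(rollouts, key=lambda t: t.search_depth, reverse=True)[:k]
    if variant == "phase":
        # Explicit phase: prefer trajectories at or above the current depth
        # threshold, then top up with the deepest of the remainder.
        above = [t for t in rollouts if t.search_depth >= phase_threshold]
        below = sorted((t for t in rollouts if t.search_depth < phase_threshold),
                       key=lambda t: t.search_depth, reverse=True)
        return (above + below)[:k]
    raise ValueError(f"unknown SDGA variant: {variant}")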
On ZeroSearch (Qwen2.5-3B), SDGA-Phase improves average EM by up to +11.8 points over standard GRPO-Full, with consistent gains under Search-R1 and the other reported settings.
conda create -n cusearch python=3.9
conda activate cusearch
conda install -c conda-forge pyarrow pandas numpy
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install vllm==0.6.3 --no-build-isolation
pip install "wandb<0.13"
pip install serpapi
# verl
pip install -e . --no-build-isolation
# flash attention 2
# pip3 install flash-attn --no-build-isolation
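After installing, a quick sanity check confirms the pinned stack imports cleanly (a minimal sketch, nothing repo-specific):

# Verify the pinned dependencies import and see the GPU.
import torch
import vllm

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("vllm:", vllm.__version__)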
(1) Download the training dataset.
huggingface-cli download --repo-type dataset --resume-download Alibaba-NLP/ZeroSearch_dataset --local-dir ZeroSearch_dataset
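Optionally, inspect the downloaded data with the pyarrow/pandas stack installed above. A hedged sketch: the exact file layout under ZeroSearch_dataset/ is an assumption, so adjust the glob if needed.

# Peek at the training data (file layout is an assumption; adjust the glob).
import glob
import pandas as pd

files = glob.glob("ZeroSearch_dataset/**/*.parquet", recursive=True)
if not files:
    raise SystemExit("no parquet files found; check the download path")
df = pd.read_parquet(files[0])
print(df.shape)
print(df.head())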
(2) Download the simulation LLMs.
# Simulation LLMs are available in different parameter sizes. Choose the one that best suits your needs; the 7B version is recommended for its stable and reliable simulation performance.
huggingface-cli download --resume-download Alibaba-NLP/Simulation_LLM_google_7B_V2 --local-dir Simulation_LLM_google_7B
(3) Launch a local simulation server.
# Prompt-based simulation
bash ./simulator_vLLM.sh
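Before kicking off training, you can check that the simulator is reachable. This sketch assumes simulator_vLLM.sh exposes vLLM's OpenAI-compatible endpoint on localhost:8000; read the script for the actual host, port, and model name.

# Ping the local simulation server (endpoint and port are assumptions;
# see simulator_vLLM.sh for the values actually used).
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8000/v1/models", timeout=5) as r:
    print(json.load(r))  # should list the simulation LLM if the server is up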
(4) Conduct RL training (default script: GRPO + optional CuSearch / SDGA).
# Activate the same conda env as in Dependencies (section above)
conda activate cusearch
# Real Google / SerpAPI search (only if you use a non-simulated retriever in your config)
export SER_API_KEY=your_api_key
# CuSearch knobs live in train_grpo.sh: enable_sdga, sdga_variant (auto | phase), etc.
# START_THRESHOLD / END_THRESHOLD there control the simulated-retrieval curriculum schedule
# (interpolated in the agent loop; see llm_agent/generation.py and the sketch below),
# not the SDGA depth allocator.
## Prompt-based simulation
bash train_grpo.sh
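For intuition, the threshold schedule can be sketched as a simple interpolation over training steps. This is illustrative only: the actual shape and parameters are whatever llm_agent/generation.py implements.

# Illustrative curriculum schedule for the simulated-retrieval threshold.
# A plain linear ramp is shown; the real schedule lives in llm_agent/generation.py.
def curriculum_threshold(step: int, total_steps: int,
                         start: float, end: float) -> float:
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + frac * (end - start)

# e.g., ramping from START_THRESHOLD=0.0 to END_THRESHOLD=0.5 over 200 steps
for s in (0, 50, 100, 200):
    print(s, curriculum_threshold(s, 200, 0.0, 0.5))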
Figure 1 summarizes Exact Match (EM, %) on seven open-domain QA benchmarks (NQ, TriviaQA, PopQA, HotpotQA, 2WikiMultiHopQA, MuSiQue, Bamboogle) under veRL + GRPO with ZeroSearch / Search-R1-style setups. Figures 2–3 show search-depth behavior and training curves.
Figure 2. Average valid search count during training and within-batch search-depth distribution shift.
Figure 3. ZeroSearch + Qwen2.5-3B: EM on NQ, mean reward, and average search depth vs. training step.
This repository builds on the ZeroSearch training stack and ecosystem (including simulated retrieval workflows) and the veRL RL training framework used throughout the paper’s experiments. We thank the authors of ZeroSearch (Sun et al., 2025), Search-R1 (Jin et al., 2025), veRL, and RAGEN for open-sourcing strong baselines and infrastructure.
For questions about this codebase or the method, please open an issue in the repository.
If this work is helpful, please cite the CuSearch paper (venue/version subject to camera-ready updates):
@misc{cusearch2026curriculum,
  title={CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic {RAG}},
  author={Shen, Jianghan and Luo, Siqi and Cheng, Xinyu and Xiong, Jing and Li, Yue and Liu, Jiyao and Lin, Jiashi and Chen, Yirong and He, Junjun},
  howpublished={Manuscript under review},
  year={2026}
}

If you use the ZeroSearch data/simulator pipeline, please also cite:
@article{sun2025zerosearch,
  title={ZeroSearch: Incentivize the Search Capability of LLMs without Searching},
  author={Sun, Hao and Qiao, Zile and Guo, Jiayan and Fan, Xuanbo and Hou, Yingyan and Jiang, Yong and Xie, Pengjun and Huang, Fei and Zhang, Yan},
  journal={arXiv preprint arXiv:2505.04588},
  year={2025}
}


