
CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG

CuSearch method overview

CuSearch is a lightweight trajectory-selection layer for RLVR / GRPO agentic RAG: from the N·G rollouts per step, it keeps K for the update, ranked by search depth (the number of well-formed searches with non-empty retrieval), reallocating budget toward deeper trajectories via Search-Depth Greedy Allocation (SDGA) in two variants: SDGA-Auto (implicit curriculum) or SDGA-Phase (explicit phase threshold). Rewards, model, and environment stay unchanged.
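To make the selection rule concrete, here is a minimal Python sketch of depth-ranked trajectory filtering. All names (`Rollout`, `search_depth`, `sdga_select`) are illustrative assumptions, not the repository's actual API:

```python
# Hedged sketch of depth-based rollout selection (illustrative names only;
# the repo's real data structures and hooks into GRPO will differ).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Rollout:
    reward: float
    # Each search call recorded as (query_was_well_formed, retrieved_docs)
    searches: List[Tuple[bool, list]] = field(default_factory=list)

def search_depth(r: Rollout) -> int:
    """Count well-formed search calls that returned non-empty retrieval."""
    return sum(1 for ok, docs in r.searches if ok and len(docs) > 0)

def sdga_select(rollouts: List[Rollout], k: int) -> List[Rollout]:
    """Greedily keep the K rollouts with the greatest search depth,
    so gradient budget shifts toward deeper search trajectories."""
    return sorted(rollouts, key=search_depth, reverse=True)[:k]
```

In this reading, SDGA-Auto would let the depth distribution of kept rollouts drift upward on its own as training progresses, while SDGA-Phase would additionally impose an explicit minimum-depth threshold per phase.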

On ZeroSearch (Qwen2.5-3B), SDGA-Phase improves average EM by up to +11.8 points over standard GRPO-Full, with consistent gains under Search-R1 and the other reported settings.

🛠 Dependencies

conda create -n cusearch python=3.9
conda activate cusearch
conda install -c conda-forge pyarrow pandas numpy
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install vllm==0.6.3 --no-build-isolation
pip install "wandb<0.13"
pip install serpapi

# verl
pip install -e . --no-build-isolation

# flash attention 2
# pip3 install flash-attn --no-build-isolation

📖 Quick Start

(1) Download the training dataset.

huggingface-cli download --repo-type dataset --resume-download Alibaba-NLP/ZeroSearch_dataset --local-dir ZeroSearch_dataset

(2) Download the simulation LLMs.

# Simulation LLMs are available in different parameter sizes. Choose the one that best suits your needs. The 7B version is recommended for its stable and reliable simulation performance.
huggingface-cli download --resume-download Alibaba-NLP/Simulation_LLM_google_7B_V2 --local-dir Simulation_LLM_google_7B

(3) Launch a local simulation server.

# Prompt-based simulation
bash ./simulator_vLLM.sh

(4) Conduct RL training (default script: GRPO + optional CuSearch / SDGA).

# Activate the same conda env as in Dependencies (section above)
conda activate cusearch

# Real Google / SerpAPI search (only if you use a non-simulated retriever in your config)
export SER_API_KEY=your_api_key

# CuSearch knobs live in train_grpo.sh: enable_sdga, sdga_variant (auto | phase), etc.
# START_THRESHOLD / END_THRESHOLD there control the simulated-retrieval curriculum schedule
# (interpolated in the agent loop; see llm_agent/generation.py), not the SDGA depth allocator.

## Prompt-based simulation
bash train_grpo.sh
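For intuition on the START_THRESHOLD / END_THRESHOLD knobs mentioned above, here is a hedged sketch of a linearly interpolated curriculum schedule. The function name and the linear form are assumptions for illustration; consult `llm_agent/generation.py` for the repository's actual schedule:

```python
def curriculum_threshold(step: int, total_steps: int,
                         start: float, end: float) -> float:
    """Interpolate the simulated-retrieval curriculum threshold from
    `start` (START_THRESHOLD) to `end` (END_THRESHOLD) over training.

    Sketch only: assumes a linear ramp clamped to [start, end]; the
    actual interpolation in the agent loop may use a different curve.
    """
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + (end - start) * frac
```

Note that this schedule governs the simulated retriever's difficulty over training steps; it is separate from the SDGA depth allocator, which operates within each batch.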

💡 Performance

Figure 1 summarizes Exact Match (EM, %) on seven open-domain QA benchmarks (NQ, TriviaQA, PopQA, HotpotQA, 2WikiMultiHopQA, MuSiQue, Bamboogle) under VERL + GRPO with ZeroSearch / Search-R1-style setups. Figures 2–3 show search-depth behavior and training curves.

📊 Figures

Figure 1. Main experimental results (benchmarks, backbones, and retrieval setups).

Main experimental results table and comparisons

Figure 2. Average valid search count during training and within-batch search-depth distribution shift.

Search count and depth distribution

Figure 3. ZeroSearch + Qwen2.5-3B: EM on NQ, mean reward, and average search depth vs. training step.

Training curves: EM, reward, search depth

🙏 Acknowledgements

This repository builds on the ZeroSearch training stack and ecosystem (including simulated retrieval workflows) and the VERL RL training framework used throughout the paper’s experiments. We thank the authors of ZeroSearch (Sun et al., 2025), Search-R1 (Jin et al., 2025), veRL, and RAGEN for open-sourcing strong baselines and infrastructure.

📧 Contact

For this codebase and method questions, please open an issue in the repository.

🚩 Citation

If this work is helpful, please cite the CuSearch paper (venue/version subject to camera-ready updates):

@misc{cusearch2026curriculum,
  title={CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic {RAG}},
  author={Shen, Jianghan and Luo, Siqi and Cheng, Xinyu and Xiong, Jing and Li, Yue and Liu, Jiyao and Lin, Jiashi and Chen, Yirong and He, Junjun},
  howpublished={Manuscript under review},
  year={2026}
}

If you use the ZeroSearch data/simulator pipeline, please also cite:

@article{sun2025zerosearch,
  title={ZeroSearch: Incentivize the Search Capability of LLMs without Searching},
  author={Sun, Hao and Qiao, Zile and Guo, Jiayan and Fan, Xuanbo and Hou, Yingyan and Jiang, Yong and Xie, Pengjun and Huang, Fei and Zhang, Yan},
  journal={arXiv preprint arXiv:2505.04588},
  year={2025}
}
