# SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation

| 📰 Paper | 🤗 Models | 🤗 Datasets |
SearchGym is a high-fidelity simulation environment designed to train robust search agents without the prohibitive costs and noise associated with live web training. By constructing a verifiable knowledge graph and an aligned document corpus, SearchGym provides a closed-loop environment where every reasoning task is factually grounded and strictly solvable.
## Search Environments

SearchGym operates across three distinct search environments, each serving a specific purpose in the pipeline (training vs. evaluation).
| Environment Type | Backend | Purpose | Code Identifier | Required Setup |
|---|---|---|---|---|
| 1. Synthetic (SearchGym) | Meilisearch | Training (RL). High-speed, typo-tolerant, verifiable ground truth. | `meilisearch-local` | Meilisearch binary + Mini-Wiki data |
| 2. Local (Wikipedia) | Pyserini / FAISS | Standard eval (NQ, HotpotQA). Static 2018 Wiki snapshot. | `async-search-access` | Local RAG server + index files |
| 3. Live Web | Serper + Jina | Open-ended eval (GAIA, DeepSearch). Real-time web browsing. | `async-web-search-access` | API keys (Serper, Jina) |
## Installation

Create a conda environment and install the dependencies. Note that SearchGym relies on AReaL for asynchronous RL training.
```bash
# 1. Create conda environment
conda create -n SearchGym python=3.12
conda activate SearchGym

# 2. Install dependencies
# Navigate to the AReaL directory (assumed submodule or copied in)
cd AReaL
bash examples/env/setup-pip-deps.sh

# 3. Validate installation
python examples/env/validate_installation.py
```

## Data Download

Download the training data (synthetic corpus) and evaluation benchmarks.
```bash
git clone https://huggingface.co/datasets/hkuzxc/SearchGym-test-data
# Ensure the directory structure is: project_root/SearchGym-test-data/
```

## Environment Setup

### 2A. Synthetic Environment (Meilisearch)

Used for: Stage 1 & Stage 2 RL training.
- **Install & Start Meilisearch:**

  ```bash
  # Install
  curl -L https://install.meilisearch.com | sh

  # Start server (background); the master key matches the config in SearchGym/meilisearch_client.py
  mkdir -p logs && nohup ./meilisearch --master-key="aSampleMasterKey" > logs/meilisearch.log 2>&1 &
  ```

- **Generate & Index Data (optional if you already have the JSON):** If you need to regenerate the synthetic data:

  ```bash
  cd mini-wiki
  export DEEPSEEK_API_KEY="your-key"
  python scripts/run_all_steps.py --steps all
  ```

- **Push Data to Meilisearch** (a sanity check follows this list):

  ```bash
  curl -X POST 'http://127.0.0.1:7700/indexes/wiki/documents?primaryKey=id' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer aSampleMasterKey' \
    --data-binary @mini-wiki/outputs/wiki/wiki_with_urls.json
  ```
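Once the documents are pushed, you can sanity-check the index via Meilisearch's standard REST routes (the query below is only illustrative):

```bash
# Confirm the server is healthy, then run a test search against the "wiki" index.
# These are standard Meilisearch endpoints; adjust host/port if you changed them.
curl -s http://127.0.0.1:7700/health
curl -s -X POST 'http://127.0.0.1:7700/indexes/wiki/search' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer aSampleMasterKey' \
  --data '{"q": "history", "limit": 3}'
```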
### 2B. Local Wikipedia Environment (Pyserini / FAISS)

Used for: NQ, HotpotQA, TriviaQA, etc.
- **Setup Environment:**

  ```bash
  conda create -n retriever python=3.10
  conda activate retriever

  # Install PyTorch with CUDA support
  conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia

  # Install dependencies
  pip install transformers datasets pyserini

  # Install GPU-accelerated FAISS
  conda install -c pytorch -c nvidia faiss-gpu=1.8.0

  # Install API server dependencies
  pip install uvicorn fastapi
  ```

- **Download Indices:** Download the E5 retriever index and corpus from ASearcher-Local-Knowledge.

- **Launch Retrieval Server:** Modify `scripts/launch_local_server.sh` with your paths, then run:

  ```bash
  bash scripts/launch_local_server.sh 8000 /path/to/server_address_log/
  ```
This starts a FastAPI server that acts as the search engine.
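Before running full evaluations, a quick smoke test is useful. Note that the route and payload below are assumptions for illustration only; inspect the FastAPI app launched by `scripts/launch_local_server.sh` for the actual interface.

```bash
# Hypothetical smoke test: the "/search" route and payload fields are placeholders,
# not a confirmed API -- check the launched FastAPI app for the real endpoint.
curl -s -X POST 'http://127.0.0.1:8000/search' \
  -H 'Content-Type: application/json' \
  --data '{"queries": ["who wrote On the Origin of Species"], "topk": 3}'
```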
### 2C. Live Web Environment (Serper + Jina)

Used for: GAIA, xBench-DeepSearch.
This environment requires external API keys. No local server is needed, but configuration files must be updated.
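It is worth verifying that your keys work before launching an evaluation run. The calls below follow the publicly documented Serper and Jina Reader APIs (the key values are placeholders):

```bash
# Serper: Google-style search results as JSON.
curl -s -X POST 'https://google.serper.dev/search' \
  -H 'X-API-KEY: your-serper-api-key' \
  -H 'Content-Type: application/json' \
  --data '{"q": "test query"}'

# Jina Reader: fetch a page as LLM-friendly text.
curl -s 'https://r.jina.ai/https://example.com' \
  -H 'Authorization: Bearer your-jina-api-key'
```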
## Configuration

### Training Config

Training configs are located in `SearchGym/SearchGym/configs/`. Example: `SearchGym_stage1.yaml`:
```yaml
# ... (Cluster settings)

# Model Path
actor:
  path: /path/to/base/model  # e.g., Qwen2.5-3B

# Environment Selection
# "meilisearch-local" points to the setup in Section 2A
search_client_type: meilisearch-local

# Concurrency & Queue
use_queue: true
redis_config:
  url: "redis://localhost:6379"

# Dataset Paths (relative to the project root)
train_dataset:
  path: ../SearchGym-test-data/mini_wiki_train/stage1/stage1_train.jsonl
```

### Evaluation Config

Located in `SearchGym/evaluation/eval_config.yaml`:
```yaml
api_keys:
  # For Web Env (GAIA/xBench)
  serper_api_key: "your-serper-api-key"
  jina_api_key: "your-jina-api-key"

settings:
  # For Local Env (NQ/HotpotQA)
  # Matches the IP/Port from Section 2B
  local_server:
    address: "127.0.0.1"
    port: "8000"
```

## Training

We use a curriculum learning approach. Ensure `SEARCHGYM_ROOT` and `WANDB_API_KEY` are set in the scripts.
**Stage 1: Foundational Skill Acquisition**

```bash
cd SearchGym
bash run_SearchGym_stage1.sh
```

**Stage 2: Advanced Reasoning Development**

Update `run_SearchGym_stage2.sh` to point to the checkpoint from Stage 1, then run:

```bash
bash run_SearchGym_stage2.sh
```

## Evaluation

We provide scripts for both local and online evaluations.
**Local Benchmarks (Bamboogle, NQ, etc.):**

Edit `SearchGym/evaluation/batch_run_eval_local.sh`:

```bash
AGENT_TYPE=SearchGym
SEARCH_CLIENT_TYPE=async-search-access  # uses the Local RAG Server
```

Then run:

```bash
cd SearchGym/evaluation
bash batch_run_eval_local.sh
```

**Online Benchmarks (GAIA, xBench):**

Edit `SearchGym/evaluation/batch_run_eval_online.sh`:

```bash
AGENT_TYPE=SearchGym
SEARCH_CLIENT_TYPE=async-web-search-access  # uses Serper/Jina
```

Then run:

```bash
cd SearchGym/evaluation
bash batch_run_eval_online.sh
```

## Models

We provide pre-trained SearchGym models on Hugging Face: SearchGym Collection.
| Model | Size | Link |
|---|---|---|
| SearchGym_Qwen_2.5_3B_Base | 3B | hkuzxc/SearchGym_Qwen_2.5_3B_Base |
| SearchGym_Qwen_2.5_3B_Instruct | 3B | hkuzxc/SearchGym_Qwen_2.5_3B_Instruct |
| SearchGym_Qwen_2.5_7B_Base | 7B | hkuzxc/SearchGym_Qwen_2.5_7B_Base |
| SearchGym_Qwen_2.5_7B_Instruct | 7B | hkuzxc/SearchGym_Qwen_2.5_7B_Instruct |
| SearchGym_Qwen_3_4B | 4B | hkuzxc/SearchGym_Qwen_3_4B |
| SearchGym_Qwen_3_8B | 8B | hkuzxc/SearchGym_Qwen_3_8B |
| SearchGym_Llama_3.2_3B_Instruct | 3B | hkuzxc/SearchGym_Llama_3.2_3B_Instruct |
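To pull a checkpoint locally, a standard `huggingface-cli` download works (sketch; substitute any model ID from the table above):

```bash
# Requires the Hugging Face CLI: pip install -U "huggingface_hub[cli]"
huggingface-cli download hkuzxc/SearchGym_Qwen_2.5_3B_Instruct \
  --local-dir ./models/SearchGym_Qwen_2.5_3B_Instruct
```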
## Citation

```bibtex
@misc{zhang2026searchgymbootstrappingrealworldsearch,
      title={SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation},
      author={Xichen Zhang and Ziyi He and Yinghao Zhu and Sitong Wu and Shaozuo Yu and Meng Chu and Wenhu Zhang and Haoru Tan and Jiaya Jia},
      year={2026},
      eprint={2601.14615},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## Acknowledgements

This project is built upon the outstanding work of:
- AReaL - A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning and Agents, developed by the AReaL Team at Ant Group and Tsinghua IIIS.
- ASearcher - An Open-Source Large-Scale Reinforcement Learning Project for Search Agents.
We are deeply grateful to the authors and contributors of these projects for their pioneering work in asynchronous RL training and search agent development.
