- 2026.03.20 📣 We’ve released a new batch of higher-quality, more challenging data. For details, please contact us at yr991129@sjtu.edu.cn.
- 2026.03.17 🚀 We open-sourced OpenSeeker-v1 (all data and models). Using 11.7K training examples, we fine-tuned Qwen3-30B-A3B-Thinking-2507 and achieved scores of 48.4 on BrowseComp-ZH, 29.5 on BrowseComp, 74.0 on xbench-DeepSearch, and 59.4 on WideSearch.

OpenSeeker is an open-source search agent system that democratizes access to frontier search capabilities by fully open-sourcing its training data. This project enables researchers and developers to build, evaluate, and deploy advanced search agents for complex information-seeking tasks.
OpenSeeker represents the first work by a purely academic team to achieve state-of-the-art performance on frontier search benchmarks while simultaneously open-sourcing the full training data.
Clone the repository and set up the environment:
# Clone repository
git clone https://github.com/rui-ye/OpenSeeker.git
cd OpenSeeker
# Create conda environment
conda create --name openseeker python=3.10
conda activate openseeker
pip install -r requirements.txt

Download and deploy the OpenSeeker model:
# 1. Install git-xet (required for downloading the model)
brew install git-xet
git xet install
# 2. Clone the OpenSeeker model repository
git clone https://huggingface.co/OpenSeeker/OpenSeeker-v1-30B-SFT
# 3. Update MODEL_PATH in run_openseeker.sh to point to the downloaded model directory
# Edit run_openseeker.sh and set MODEL_PATH="/path/to/OpenSeeker-v1-30B-SFT"
# 4. Deploy the model server
bash run_openseeker.sh

# Edit setup_env.sh with your API endpoints and keys
source setup_env.sh

Generate answers and evaluate results:
# Generate answers for your dataset
python eval/generate_answer.py \
    --dataset_path /path/to/your/dataset.jsonl \
    --out_dir /path/to/output/directory

# Evaluate the generated results
python eval/eval.py \
    --data_path /path/to/output/directory/result_tool200.jsonl \
    --max_workers 20

OpenSeeker/
├── eval/                          # Evaluation scripts
│   ├── eval.py                    # Main evaluation script
│   ├── generate_answer.py         # Answer generation script
│   └── prompt.py                  # Prompt templates
├── src/                           # Core source code
│   ├── llm_tool_openseeker.py     # LLM tool interface
│   ├── config/                    # Configuration files
│   │   └── chat_template.jinja    # Chat template configuration
│   └── tools/                     # Tool implementations
│       ├── search.py              # Search tool
│       └── visit.py               # Web visit tool
├── run_openseeker.sh              # Model server startup script
├── setup_env.sh                   # Environment variable template
└── README.md                      # This file
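
Before running `eval/eval.py`, it can help to check that the generated results file (for example `result_tool200.jsonl`) is well-formed. The following is a minimal sketch, not part of the repository: `summarize_jsonl` is a hypothetical helper, and it assumes only that the file is standard JSONL (one JSON value per line). It makes no assumption about which fields the records contain; it simply reports what it finds.

```python
import json
import sys
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, Counter of top-level keys) for a JSONL file.

    Hypothetical helper for illustration; not part of OpenSeeker.
    """
    count = 0
    keys = Counter()
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}: line {line_no} is not valid JSON") from exc
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())  # count how often each key appears
    return count, keys


# Only runs when a file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    n, seen_keys = summarize_jsonl(sys.argv[1])
    print(f"{n} records")
    for key, seen in seen_keys.most_common():
        print(f"  {key}: present in {seen} records")
```

If the script is saved under a name of your choice, pass the results path as its single argument. A key that appears in fewer records than the total may indicate incomplete generations worth inspecting before evaluation.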
If you find OpenSeeker useful in your research, please consider citing:
@misc{du2026openseekerdemocratizingfrontiersearch,
  title         = {OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data},
  author        = {Yuwen Du and Rui Ye and Shuo Tang and Xinyu Zhu and Yijun Lu and Yuzhu Cai and Siheng Chen},
  year          = {2026},
  eprint        = {2603.15594},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  url           = {https://arxiv.org/abs/2603.15594},
}