
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Benchmark Results

📰 News

  • 2026.03.20 📣 We’ve released a new batch of higher-quality, more challenging data. For details, please contact us at yr991129@sjtu.edu.cn.
  • 2026.03.17 🚀 We open-sourced OpenSeeker-v1 (all data and models). Using 11.7K training examples, we fine-tuned Qwen3-30B-A3B-Thinking-2507 and achieved scores of 48.4 on BrowseComp-ZH, 29.5 on BrowseComp, 74.0 on xbench-DeepSearch, and 59.4 on WideSearch.

Overview

OpenSeeker is an open-source search agent system that democratizes access to frontier search capabilities by fully open-sourcing its training data and models. The project lets researchers and developers build, evaluate, and deploy search agents for complex information-seeking tasks.


🌟 Key Achievement

OpenSeeker represents the first work by a purely academic team to achieve state-of-the-art performance on frontier search benchmarks while simultaneously open-sourcing the full training data.


Quick Start

Installation

Clone the repository and set up the environment:

# Clone repository
git clone https://github.com/rui-ye/OpenSeeker.git
cd OpenSeeker

# Create conda environment
conda create --name openseeker python=3.10
conda activate openseeker
pip install -r requirements.txt

Model Setup

Download and deploy the OpenSeeker model:

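# 1. Install git-xet (required for downloading the model).
#    The command below assumes Homebrew; on other systems, install git-xet with your platform's method.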
brew install git-xet
git xet install

# 2. Clone the OpenSeeker model repository
git clone https://huggingface.co/OpenSeeker/OpenSeeker-v1-30B-SFT

# 3. Update MODEL_PATH in run_openseeker.sh to point to the downloaded model directory
# Edit run_openseeker.sh and set MODEL_PATH="/path/to/OpenSeeker-v1-30B-SFT"

# 4. Deploy the model server
bash run_openseeker.sh
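
Before starting the server, it can help to confirm that the clone pulled real weights rather than small pointer stubs, which is a common symptom when the large-file extension was not installed correctly. The following is a minimal sketch, not part of the repository. It assumes the weights are stored as `*.safetensors` files at the top level of the model directory and that a stub is smaller than 1 KiB. The function name `find_pointer_stubs` is hypothetical.

```python
from pathlib import Path


def find_pointer_stubs(model_dir, min_bytes=1024):
    """Return names of weight files that look like un-downloaded pointer stubs.

    Assumption: weights are top-level *.safetensors files, and any such file
    smaller than `min_bytes` is a pointer stub rather than real weights.
    """
    root = Path(model_dir)
    stubs = []
    for path in sorted(root.glob("*.safetensors")):
        if path.stat().st_size < min_bytes:
            stubs.append(path.name)
    return stubs
```

Calling `find_pointer_stubs("/path/to/OpenSeeker-v1-30B-SFT")` should return an empty list if every weight file was downloaded in full. Any names it returns are candidates for re-fetching.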

Configuration

# Edit setup_env.sh with your API endpoints and keys
source setup_env.sh
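
Because `source setup_env.sh` only exports variables into the current shell, a quick check from Python started in that same shell can catch a missing or empty key before a long generation run. This is a hedged sketch: the variable names in `REQUIRED_VARS` are placeholders, not the names used by `setup_env.sh`, so replace them with whatever your copy of the script exports.

```python
import os

# Hypothetical names for illustration only; substitute the variables
# that your setup_env.sh actually exports.
REQUIRED_VARS = ["SEARCH_API_KEY", "MODEL_SERVER_URL"]


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Running `missing_env_vars(REQUIRED_VARS)` after sourcing the script should return an empty list. Any names it returns are unset or empty in the current environment.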

Usage

Generate answers and evaluate results:

# Generate answers for your dataset
python eval/generate_answer.py \
    --dataset_path /path/to/your/dataset.jsonl \
    --out_dir /path/to/output/directory

# Evaluate the generated results
python eval/eval.py \
    --data_path /path/to/output/directory/result_tool200.jsonl \
    --max_workers 20
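
To look at a dataset or result file before running the evaluation, a schema-agnostic summary is often enough. The sketch below makes no assumption about field names. It only assumes that the file is JSON Lines, with one JSON value per non-empty line. The helper name `summarize_jsonl` is hypothetical and is not part of the repository.

```python
import json
from collections import Counter


def summarize_jsonl(path):
    """Count records in a JSONL file and tally how often each top-level key appears."""
    count = 0
    keys = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return count, dict(keys)
```

For example, `summarize_jsonl("/path/to/output/directory/result_tool200.jsonl")` returns the number of records and a per-key count. A key that appears in fewer records than the total points to incomplete entries.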

Project Structure

OpenSeeker/
├── eval/                    # Evaluation scripts
│   ├── eval.py             # Main evaluation script
│   ├── generate_answer.py  # Answer generation script
│   └── prompt.py           # Prompt templates
├── src/                     # Core source code
│   ├── llm_tool_openseeker.py  # LLM tool interface
│   ├── config/             # Configuration files
│   │   └── chat_template.jinja  # Chat template configuration
│   └── tools/               # Tool implementations
│       ├── search.py       # Search tool
│       └── visit.py        # Web visit tool
├── run_openseeker.sh       # Model server startup script
├── setup_env.sh            # Environment variable template
└── README.md               # This file
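
The layout above can be checked against a local checkout with a few lines of Python. This is a minimal sketch that uses only the paths listed in the tree. The helper name `missing_paths` is hypothetical, and a real checkout may contain additional files such as `requirements.txt`.

```python
from pathlib import Path

# Paths taken from the project tree above, relative to the repository root.
EXPECTED = [
    "eval/eval.py",
    "eval/generate_answer.py",
    "eval/prompt.py",
    "src/llm_tool_openseeker.py",
    "src/config/chat_template.jinja",
    "src/tools/search.py",
    "src/tools/visit.py",
    "run_openseeker.sh",
    "setup_env.sh",
    "README.md",
]


def missing_paths(repo_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Running `missing_paths(".")` from the repository root should return an empty list for a complete checkout.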

📚 Citation

If you find OpenSeeker useful in your research, please consider citing:

@misc{du2026openseekerdemocratizingfrontiersearch,
  title        = {OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data},
  author       = {Yuwen Du and Rui Ye and Shuo Tang and Xinyu Zhu and Yijun Lu and Yuzhu Cai and Siheng Chen},
  year         = {2026},
  eprint       = {2603.15594},
  archivePrefix= {arXiv},
  primaryClass = {cs.AI},
  url          = {https://arxiv.org/abs/2603.15594},
}
