VibeSearchBench

Hardest — vague multi-turn proactive search in the wild.
Verifiable — schema-free knowledge graph evaluation.
Long-horizon — persona-driven progressive disclosure.

Leaderboard

Browse the full leaderboard and individual task trajectories at vibebench.github.io/VibeSearchBench.github.io.

Evaluation:

Primary metric: Triplet F1. Predicted knowledge graphs are matched against ground truth via LLM-as-judge node alignment and triplet semantic equivalence.
Frameworks: ReAct and OpenClaw, evaluated on VibeSearch-Pro and VibeSearch-Daily.
Best reported score: 30.3 triplet F1 (Claude Opus 4.6, OpenClaw).
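
To make the primary metric concrete, here is a minimal sketch of how precision, recall, and F1 over triplets could be computed. It is not the benchmark's scoring code: the `is_equivalent` hook stands in for the LLM-as-judge step, and the default `exact_match` (case-insensitive string equality) is an assumption made only so the example runs offline. The names `triplet_prf`, `normalize`, and `exact_match` are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail)


def normalize(triplet: Triplet) -> Tuple[str, ...]:
    """Trim and lowercase each element of a triplet."""
    return tuple(part.strip().lower() for part in triplet)


def exact_match(pred: Triplet, gold: Triplet) -> bool:
    """Assumed placeholder for the judge: plain normalized equality."""
    return normalize(pred) == normalize(gold)


def triplet_prf(
    predicted: Iterable[Triplet],
    gold: Iterable[Triplet],
    is_equivalent: Callable[[Triplet, Triplet], bool] = exact_match,
) -> Tuple[float, float, float]:
    """Greedy one-to-one matching; returns (precision, recall, F1)."""
    pred_list: List[Triplet] = list(predicted)
    unmatched_gold: List[Triplet] = list(gold)
    gold_total = len(unmatched_gold)

    matched = 0
    for pred in pred_list:
        for i, candidate in enumerate(unmatched_gold):
            if is_equivalent(pred, candidate):
                matched += 1
                del unmatched_gold[i]  # each gold triplet is matched at most once
                break

    precision = matched / len(pred_list) if pred_list else 0.0
    recall = matched / gold_total if gold_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Swapping `is_equivalent` for a function that calls a judge model would bring the sketch closer to the semantic matching described above; the greedy one-to-one pairing is a simplification chosen for brevity.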

Explore: Leaderboard · Task trajectories · Paper

Tasks

200 tasks across 2 subsets and 20 domains. Each task pairs a vague initial query with a ground-truth knowledge graph and a persona simulator.

| Split | Count | Description |
| --- | --- | --- |
| `pro` | 100 | Professional research: literature reviews, market analysis, technical due diligence |
| `daily` | 100 | Daily-life search: shopping, travel, and lifestyle tasks with evolving preferences |

Real users rarely specify their full intent upfront. VibeSearch captures bidirectional convergence: agents interleave partial results with follow-up questions while users progressively disclose their needs. VibeSearchBench scores each agent's output as a schema-free knowledge graph, matched against the ground-truth graph and reported as Precision / Recall / F1.

Dataset

Available on Hugging Face: VibeSearchBench/VibeSearchBench

Live site

https://vibebench.github.io/VibeSearchBench.github.io/

This repository hosts the static project website for VibeSearchBench. It is published as a project site under the VibeBench organization.

Enable GitHub Pages (required once)

The "Publish site to gh-pages" workflow builds the site and pushes the result to the gh-pages branch. After its first run:

1. Open Settings → Pages: https://github.com/VibeBench/VibeSearchBench.github.io/settings/pages
2. Under Build and deployment → Source, choose Deploy from a branch.
3. Select branch gh-pages and folder / (root), then click Save.
4. Wait 2–5 minutes, then open https://vibebench.github.io/VibeSearchBench.github.io/

If the workflow cannot push, go to Settings → Actions → General → Workflow permissions and select Read and write permissions.

Update from the main benchmark repo

cd /path/to/VibeSearchBench
bash scripts/publish_github_io.sh

Or build only, then commit and push by hand:

SITE_DIR=../VibeSearchBench.github.io bash scripts/build_website.sh
cd ../VibeSearchBench.github.io && git add -A && git commit -m "Update site" && git push

Trajectory layout

Pro source (jsonl): data/trajs/pro/*.jsonl → viewer JSON via scripts/convert_pro_trajs.py
Daily source (jsonl): data/trajs/claude-opus-4.6_custom_serper_simulated/trajs_reextract/
Viewer (json): data/trajs/pro/ (001.json …), data/trajs/daily/ (task_*.json)

python3 scripts/convert_pro_trajs.py
python3 scripts/build_final_extractions.py
python3 scripts/build_tasks_index.py
python3 scripts/fetch_ground_truth.py

Then commit and push this repository.
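
For readers unfamiliar with the JSONL-to-viewer step, the sketch below illustrates the general idea of turning a JSONL source into zero-padded per-task JSON files (001.json …). It is not the contents of scripts/convert_pro_trajs.py: the function name `convert_jsonl_to_viewer_json` is hypothetical, and the assumption that one JSONL line maps to one output file is made purely for illustration.

```python
import json
from pathlib import Path
from typing import List


def convert_jsonl_to_viewer_json(src: Path, out_dir: Path) -> List[Path]:
    """Write each non-empty JSONL record to out_dir as 001.json, 002.json, ...

    Assumption: one record per line corresponds to one viewer file.
    """
    out_dir.mkdir(parents=True, exist_ok=True)

    # Parse every non-blank line as an independent JSON document.
    with src.open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]

    written: List[Path] = []
    for index, record in enumerate(records, start=1):
        target = out_dir / f"{index:03d}.json"  # zero-padded to three digits
        target.write_text(
            json.dumps(record, ensure_ascii=False, indent=2),
            encoding="utf-8",
        )
        written.append(target)
    return written
```

The real script may group, filter, or rename records differently; consult it directly before relying on any particular output layout.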

VibeSearchBench · Rednote-Hilab & Unipat AI
