
feat: streamline benchmark quick/all execution flow #9

Merged
outbounder merged 2 commits into main from 88-make-benchmarks-run-quickly-and-be-easy-to-execute on Apr 17, 2026

@outbounder
Member

Summary

  • add root-level benchmark workflows (bench:quick, bench:all, bench:quick:norerank) driven by scripts/bench-with-stack.sh to start required services, warm benchmark credentials, and run benchmark suites end-to-end
  • improve benchmark resilience and runtime behavior by supporting reranker soft-skip/strict modes, scaling freshness FAISS baseline defaults for quick runs, and adding DB warm-up/credential sync via scripts/bench-warm-db.ts
  • fix repeated benchmark auth/cache issues by loading .env.benchmark in benchmark compose, persisting /root/.cache for HF/model assets, failing fast on 401/403 in the adapter, and updating benchmark docs/SPEC accordingly
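
For reviewers, a minimal sketch of what "failing fast on 401/403 in the adapter" could look like. This is not the adapter code from this PR: `AuthError`, `isAuthFailure`, `callWithRetry`, and `AdapterResponse` are hypothetical names, and the retry-on-5xx policy is an assumption added only to contrast with the fail-fast path.

```typescript
// Illustrative only: every name here is hypothetical, not taken from the PR.

type AdapterResponse = { status: number; body?: string };

class AuthError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`benchmark auth failed with HTTP ${status}; check benchmark credentials`);
    this.name = "AuthError";
    this.status = status;
  }
}

// 401 (Unauthorized) and 403 (Forbidden) will not succeed on retry,
// so they are treated as terminal.
function isAuthFailure(status: number): boolean {
  return status === 401 || status === 403;
}

// Assumed policy: retry server errors (5xx) up to maxAttempts,
// but abort immediately on an auth failure.
async function callWithRetry(
  send: () => Promise<AdapterResponse>,
  maxAttempts: number = 3,
): Promise<AdapterResponse> {
  let last: AdapterResponse | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = await send();
    if (isAuthFailure(last.status)) {
      throw new AuthError(last.status); // fail fast, no further attempts
    }
    if (last.status < 500) {
      return last; // success, or a client error that retrying will not fix
    }
  }
  if (last === undefined) {
    throw new Error("maxAttempts must be at least 1");
  }
  return last; // attempts exhausted; hand back the last 5xx response
}
```

With this shape, a bad or missing credential surfaces after a single request instead of after the full retry budget, which matches the intent described in the summary.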

Test plan

  • Run BENCH_SKIP_RERANKER=1 npm run bench:quick and verify successful completion (EXIT_CODE=0)
  • Confirm benchmark archives are created under tests/benchmarks/runs/ for freshness/hotpot/msmarco quick runs
  • Re-run BENCH_SKIP_RERANKER=1 npm run bench:quick and confirm faster second-run execution with cache in place
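
The test plan runs with `BENCH_SKIP_RERANKER=1`. As a rough illustration of how such a flag might interact with the soft-skip/strict modes mentioned in the summary, here is a sketch under stated assumptions: `BENCH_SKIP_RERANKER` comes from this PR, but `BENCH_RERANKER_STRICT`, `resolveRerankerMode`, and `planRerankerStage` are hypothetical, and the meaning given to "soft" (skip when the reranker is unreachable) and "strict" (fail when it is unreachable) is a guess at the intended semantics.

```typescript
// Illustrative only. BENCH_SKIP_RERANKER appears in the PR's test plan;
// BENCH_RERANKER_STRICT and both function names are hypothetical.

type RerankerMode = "skip" | "soft" | "strict";

function resolveRerankerMode(env: Record<string, string | undefined>): RerankerMode {
  if (env.BENCH_SKIP_RERANKER === "1") return "skip"; // explicit opt-out
  if (env.BENCH_RERANKER_STRICT === "1") return "strict"; // hypothetical flag
  return "soft"; // assumed default
}

// Decide whether the reranker stage runs, given the mode and whether
// the reranker service was reachable.
function planRerankerStage(mode: RerankerMode, available: boolean): "run" | "skip" {
  if (mode === "skip") return "skip";
  if (available) return "run";
  if (mode === "strict") {
    throw new Error("reranker unavailable and strict mode is enabled");
  }
  return "skip"; // soft-skip: continue the benchmark without reranking
}
```

In a Node script this would typically be called as `resolveRerankerMode(process.env)`; the block takes a plain record instead so it can be exercised without touching the real environment.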

Made with Cursor

Add root-level bench commands that start required services, warm benchmark credentials, and support optional reranker/FAISS skips while fixing benchmark container env/cache behavior and auth failure handling.

Made-with: Cursor
Make OpenAI/db type usage compatible across SDK and TS lib variants, and scope root lint to the configured workspace so CI lint/typecheck checks pass reliably.

Made-with: Cursor
@outbounder merged commit e5a6fc1 into main on Apr 17, 2026
3 checks passed