feat(cli): synthbench submit scaffold (refs #256) #263
Merged
added 2 commits on May 1, 2026 21:05
Scaffolds the vendor self-submission entry point requested in #256. Wesley's directive: "We want synthpanel competitors to run against SynthBench". This is the no-friction path that lets a vendor point the CLI at an adapter module and produce a leaderboard-ready artifact without having to learn the full benchmark harness.

This PR is scaffold-only by design (per the issue's scope). The CLI surface, adapter contract, and artifact filenames are pinned so vendors can start integrating against them in parallel with the real evaluation pipeline build-out.

What's wired up:

* synthbench.adapter.Adapter: abstract base with name/version properties and an async respond(question, persona, context) method, with a documented contract. Ships a RandomAdapter reference impl that emits trivial yes/no/Likert answers; it is used by the submit path's smoke tests and as a known-good target vendors can point at while debugging their setup.
* synthbench.submit_adapter: Click command 'submit-adapter' with --adapter / --vendor / --vendor-version / --api-env-var / --suite / --output-dir. It validates that the adapter module is importable (exit 2) and that the API env var is present without ever reading the secret (exit 3), then writes placeholder submission.md + run.json to --output-dir and prints a stubbed leaderboard PR URL + verification curl.
* One-line wire-up in src/synthbench/cli.py to register the command with the main group. No restructuring of existing commands.
* tests/test_cli_submit_adapter.py: 5 tests covering --help, bad import, missing env var, happy path (RandomAdapter), and filesystem-path adapter loading. All pass in 0.03s.
* docs/submit.md: vendor workflow doc with a worked RandomAdapter example and the exit-code table.
What's out of scope (tracked as follow-ups in the PR body):

* Driving Adapter.respond through the core suite runner
* Content-addressed run-hash computation
* Auto-generating the leaderboard PR via gh
* CI validation of submission PRs

Naming note: there's already a 'synthbench submit' command that posts a pre-computed result.json to the hosted API. To avoid restructuring outside the new files (per Wesley's constraint), this scaffold registers as 'submit-adapter'. Final naming is flagged in the PR body for a human call.

Refs #256

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Executed-By: mayor
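The adapter contract described in this commit can be sketched as below. This is a minimal, hedged illustration, not the code in `synthbench/adapter.py`: only the names `Adapter`, `RandomAdapter`, `name`, `version`, and `respond(question, persona, context)` come from this PR. The dict-shaped arguments, the `str` return type, the `"kind"` question field, and the placeholder name/version values are assumptions.

```python
import abc
import asyncio
import random
from typing import Any, Mapping, Optional


class Adapter(abc.ABC):
    """Sketch of the vendor adapter contract (argument/return types are assumed)."""

    @property
    @abc.abstractmethod
    def name(self) -> str:
        """Vendor/product name reported in the submission."""

    @property
    @abc.abstractmethod
    def version(self) -> str:
        """Vendor version string."""

    @abc.abstractmethod
    async def respond(
        self,
        question: Mapping[str, Any],
        persona: Mapping[str, Any],
        context: Optional[Mapping[str, Any]] = None,
    ) -> str:
        """Return one synthetic respondent's answer to one question."""


class RandomAdapter(Adapter):
    """Trivial yes/no/Likert answerer: a wiring smoke target, not a baseline."""

    LIKERT = ("1", "2", "3", "4", "5")

    def __init__(self, seed: int = 0) -> None:
        # A private Random instance makes answers deterministic per seed.
        self._rng = random.Random(seed)

    @property
    def name(self) -> str:
        return "random"  # placeholder value (assumption)

    @property
    def version(self) -> str:
        return "0.0.0"  # placeholder value (assumption)

    async def respond(
        self,
        question: Mapping[str, Any],
        persona: Mapping[str, Any],
        context: Optional[Mapping[str, Any]] = None,
    ) -> str:
        # "kind" is a hypothetical question field used only in this sketch.
        if question.get("kind") == "likert":
            return self._rng.choice(self.LIKERT)
        return self._rng.choice(("yes", "no"))
```

For example, `asyncio.run(RandomAdapter(seed=42).respond({"kind": "likert"}, {"age": 30}))` returns one of the strings "1" to "5". A vendor adapter would subclass `Adapter` the same way and call its own API inside `respond`.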
openclaw-dv added a commit that referenced this pull request on May 14, 2026
Closes the scaffolding ask in #256.
Summary
* `synthbench submit-adapter`: the vendor self-submission CLI requested for the SynthPanel-competitor onboarding path.
* `synthbench.adapter.Adapter` abstract base + `RandomAdapter` reference impl for wiring-up tests.
* `docs/submit.md` walking vendors through the three-step flow (adapter → submit → PR).

This is scaffold-only by design (per the issue scope). Inputs are validated end-to-end, but the produced `submission.md` + `run.json` are placeholders. The full eval pipeline, run-hash addressing, and leaderboard-PR auto-generation are deferred to follow-up issues against #256.

What's complete
* `synthbench.adapter.Adapter` abstract base: `name`, `version`, `async respond(question, persona, context=None)`.
* `synthbench.adapter.RandomAdapter` reference impl (yes/no/Likert), seeded for deterministic tests.
* `synthbench submit-adapter` Click command with `--adapter`, `--vendor`, `--vendor-version`, `--api-env-var`, `--suite` (default `core`), `--output-dir` (default `./synthbench-submission/`).
* Input validation (adapter importable, else exit 2; API env var present, else exit 3), then placeholder artifacts (`submission.md` + `run.json`) with the eventual leaderboard PR URL + stubbed verification curl.
* One-line registration in `src/synthbench/cli.py`; no other existing files touched.
* 5 tests in `tests/test_cli_submit_adapter.py` covering `--help`, missing module, missing env var, happy path, and filesystem-path import. All pass (0.03s). Existing `test_cli_submit.py` + `test_cli_run_submit.py` (29 tests) still pass.
* `docs/submit.md` with worked `RandomAdapter` example + exit-code table.

TODO (tracked as follow-ups to #256)
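* Drive `Adapter.respond` through the `core` suite runner (the actual eval). Stubbed today.
* Content-addressed run-hash computation; `run_hash: null` in the scaffold.
* Auto-generate the leaderboard PR via `gh`. Scaffold just prints the target URL.

Architectural decisions punted to a human eye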
Command name collision with existing `synthbench submit`. The repo already has a `submit` subcommand that posts a pre-computed `result.json` to the hosted SynthBench API (`tests/test_cli_submit.py` pins that contract). The new vendor-adapter flow is a different verb (the vendor brings code, the harness drives the eval), so I registered it as `submit-adapter` to honor the "don't restructure outside the new files" constraint. Options for final naming:

* Keep `submit-adapter` permanently.
* Rename `submit` → `submit-result` (or `push`) and reclaim `submit` for the adapter flow (breaking change for current API users; needs a deprecation cycle).
* Make `submit` a subgroup with `submit result` and `submit adapter` subcommands (also breaking).

I have no opinion on which; flagging for Wesley/maintainer call.
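For comparison, the subgroup option could look roughly like this in Click. It is a hypothetical layout, not code from this PR: the stand-in `main` group, the `result_json` argument, and the echoed messages are illustrative only, and the real commands take more options than shown. It also makes the "also breaking" note concrete: once `submit` is a group, an existing `synthbench submit result.json` invocation is parsed as an unknown subcommand and fails with a usage error.

```python
import click


@click.group()
def main() -> None:
    """Stand-in for the top-level synthbench group (hypothetical)."""


@main.group()
def submit() -> None:
    """Submission commands under the subgroup naming option (hypothetical)."""


@submit.command("result")
@click.argument("result_json", type=click.Path(exists=True, dir_okay=False))
def submit_result(result_json: str) -> None:
    """Would replace today's `synthbench submit <result.json>`."""
    click.echo(f"would post {result_json}")


@submit.command("adapter")
@click.option("--adapter", required=True, help="Dotted module or .py path.")
@click.option("--vendor", required=True)
def submit_adapter(adapter: str, vendor: str) -> None:
    """Would replace today's `synthbench submit-adapter`."""
    click.echo(f"would scaffold a submission for {vendor} using {adapter}")
```

The trade-off is discoverability (`synthbench submit --help` lists both flows) against breaking every existing `submit` call site.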
Module location. The directive specified `src/synthbench/cli/submit.py`, but `cli` is a single 2577-line module here, not a package. Converting it to a package would be a much bigger refactor (it touches every `from synthbench.cli import main` import in the test suite). I put the new logic in `src/synthbench/submit_adapter.py` instead and wired it via one import in `cli.py`. If maintainers want the CLI broken up into a package eventually, this is a clean seam.

Tests live at `tests/test_cli_submit_adapter.py`, not `tests/cli/test_submit.py`. The existing test layout is flat (`tests/test_cli_*.py`); creating a `tests/cli/` subdir would be inconsistent.

`RandomAdapter` ships in `synthbench.adapter` rather than under `tests/`. Reasoning: vendors are explicitly going to point `--adapter synthbench.adapter` at it as the known-good smoke target while debugging their setup. Test-only placement would make that flow fragile. It's clearly marked "not a baseline" in the docstring.

Test plan
* `pytest tests/test_cli_submit_adapter.py`: 5/5 pass
* `pytest tests/test_cli_submit.py tests/test_cli_run_submit.py`: 29/29 pass (no regressions to existing submit flow)
* `synthbench submit-adapter --help` exits 0 with the documented options
* `synthbench --help` shows both `submit` and `submit-adapter`
* `ruff check` clean on all new files

Refs #256
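The validation order exercised by the test plan (bad import gives exit 2, missing env var gives exit 3, otherwise placeholder artifacts are written) can be sketched with the standard library alone. This is not the code in `submit_adapter.py`: the helper names `load_adapter_module` and `validate_and_scaffold`, the `.py`-suffix rule for filesystem paths, the placeholder file contents, and the `run.json` fields other than `run_hash` are assumptions. The real entry point is a Click command that reports these as process exit codes; this sketch returns them so they are easy to assert on.

```python
import importlib
import importlib.util
import json
import os
import pathlib
from types import ModuleType

# Exit codes from the PR's exit-code table; the constant names are hypothetical.
EXIT_OK, EXIT_BAD_IMPORT, EXIT_MISSING_ENV = 0, 2, 3


def load_adapter_module(spec: str) -> ModuleType:
    """Import `spec` as a .py file if it names one, else as a dotted module."""
    path = pathlib.Path(spec)
    if path.suffix == ".py" and path.is_file():
        mod_spec = importlib.util.spec_from_file_location(path.stem, str(path))
        if mod_spec is None or mod_spec.loader is None:
            raise ImportError(f"cannot load adapter from {spec}")
        module = importlib.util.module_from_spec(mod_spec)
        mod_spec.loader.exec_module(module)
        return module
    return importlib.import_module(spec)


def validate_and_scaffold(
    adapter: str,
    vendor: str,
    vendor_version: str,
    api_env_var: str,
    output_dir: str,
    suite: str = "core",
) -> int:
    """Return the exit code the command would produce (hypothetical helper)."""
    try:
        load_adapter_module(adapter)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return EXIT_BAD_IMPORT
    # Presence check only: the value is never bound to a variable, logged, or used.
    if api_env_var not in os.environ:
        return EXIT_MISSING_ENV
    out = pathlib.Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "submission.md").write_text(f"# {vendor} {vendor_version} (placeholder)\n")
    run = {"vendor": vendor, "vendor_version": vendor_version,
           "suite": suite, "run_hash": None}  # run_hash stays null in the scaffold
    (out / "run.json").write_text(json.dumps(run, indent=2))
    return EXIT_OK
```

Because both checks run before anything is written, a misconfigured run in this sketch leaves the output directory untouched.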