
feat(cli): synthbench submit scaffold (refs #256)#263

Merged
openclaw-dv merged 2 commits into main from feat/submit-cli-scaffold-issue-256
May 14, 2026


@openclaw-dv
Collaborator

Closes the scaffolding ask in #256.

Summary

  • Adds synthbench submit-adapter — the vendor self-submission CLI requested for the SynthPanel-competitor onboarding path.
  • Adds synthbench.adapter.Adapter abstract base + RandomAdapter reference impl for wiring-up tests.
  • Adds docs/submit.md walking vendors through the three-step flow (adapter → submit → PR).

This is scaffold-only by design (per the issue scope). Inputs are validated end-to-end, but the produced submission.md + run.json are placeholders. The full eval pipeline, run-hash addressing, and leaderboard-PR auto-generation are deferred to follow-up issues against #256.

What's complete

  • synthbench.adapter.Adapter abstract base — name, version, async respond(question, persona, context=None).
  • synthbench.adapter.RandomAdapter reference impl (yes/no/Likert), seeded for deterministic tests.
  • synthbench submit-adapter Click command with --adapter, --vendor, --vendor-version, --api-env-var, --suite (default core), --output-dir (default ./synthbench-submission/).
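  • Validation, with the exit code used on failure: the adapter module is importable (exit 2), an Adapter subclass is present (exit 2), the API env var is set, checked without reading its value (exit 3), and the adapter is constructible with no arguments (exit 2).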
  • Placeholder artifact writer (submission.md + run.json) with the eventual leaderboard PR URL + stubbed verification curl.
  • Filesystem-path and dotted-import adapter resolution (vendors will use file paths; in-tree adapters use dotted).
  • One-line wire-up in src/synthbench/cli.py — no other existing files touched.
  • 5 tests in tests/test_cli_submit_adapter.py covering --help, missing module, missing env var, happy path, and filesystem-path import. All pass (0.03s). Existing test_cli_submit.py + test_cli_run_submit.py (29 tests) still pass.
  • docs/submit.md with worked RandomAdapter example + exit-code table.
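
For readers who have not opened the diff, the adapter contract listed above can be pictured roughly as follows. This is a minimal sketch, not the code in synthbench.adapter: the parameter types, the name and version strings, and the "kind" key used to choose between yes/no and Likert answers are assumptions made for illustration.

```python
import asyncio
import random
from abc import ABC, abstractmethod
from typing import Optional


class Adapter(ABC):
    """Sketch of the contract: name, version, and an async respond()."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def version(self) -> str: ...

    @abstractmethod
    async def respond(
        self, question: str, persona: dict, context: Optional[dict] = None
    ) -> str: ...


class RandomAdapter(Adapter):
    """Seeded reference adapter; a wiring check, not a baseline."""

    def __init__(self, seed: int = 0) -> None:
        # A default seed keeps the class constructible with no arguments.
        self._rng = random.Random(seed)

    @property
    def name(self) -> str:
        return "random"  # hypothetical value

    @property
    def version(self) -> str:
        return "0.0.0"  # hypothetical value

    async def respond(self, question, persona, context=None):
        # Hypothetical convention: context["kind"] selects the answer format.
        kind = (context or {}).get("kind", "yes_no")
        if kind == "likert":
            return str(self._rng.randint(1, 5))
        return self._rng.choice(["yes", "no"])


if __name__ == "__main__":
    adapter = RandomAdapter(seed=7)
    print(asyncio.run(adapter.respond("Do you recycle?", {"age": 30})))
```

Seeding a private random.Random instance, rather than the module-level generator, is what makes two adapters built with the same seed return the same answer sequence.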

TODO (tracked as follow-ups to #256)

  • Drive Adapter.respond through the core suite runner (the actual eval). Stubbed today.
  • Content-addressed run-hash over (adapter id, suite manifest, persona seeds, raw responses). run_hash: null in the scaffold.
  • Auto-generate the leaderboard PR body and open it via gh. Scaffold just prints the target URL.
  • CI workflow validating submission PRs (run-hash reproducibility + registry of accepted adapter shapes).
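
Since the run-hash is still a TODO, the following is only one possible shape for it, offered to make the follow-up discussion concrete. It assumes the four inputs named above are JSON-serializable and that SHA-256 over a canonical JSON encoding is acceptable. The function name compute_run_hash and the payload keys are hypothetical.

```python
import hashlib
import json
from typing import Any


def compute_run_hash(
    adapter_id: str,
    suite_manifest: Any,
    persona_seeds: Any,
    raw_responses: Any,
) -> str:
    """Hash the run inputs and outputs so identical runs share an address."""
    payload = {
        "adapter_id": adapter_id,
        "suite_manifest": suite_manifest,
        "persona_seeds": persona_seeds,
        "raw_responses": raw_responses,
    }
    # sort_keys plus fixed separators gives a canonical byte string, so
    # dictionary ordering and whitespace cannot change the digest.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A CI check for reproducibility could then re-run the adapter with the recorded seeds and compare digests, provided the adapter itself is deterministic.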

Architectural decisions punted to a human eye

  1. Command name collision with existing synthbench submit. The repo already has a submit subcommand that posts a pre-computed result.json to the hosted SynthBench API (tests/test_cli_submit.py pins that contract). The new vendor-adapter flow is a different verb (vendor brings code, harness drives the eval), so I registered it as submit-adapter to honor the "don't restructure outside the new files" constraint. Options for final naming:

    • Keep submit-adapter permanently.
    • Rename the existing submit to submit-result (or push) and reclaim submit for the adapter flow (a breaking change for current API users, so it needs a deprecation cycle).
    • Make submit a subgroup with submit result and submit adapter subcommands (also breaking).
      I have no opinion on which; flagging for Wesley/maintainer call.
  2. Module location. The directive specified src/synthbench/cli/submit.py, but cli is a single 2577-line module here, not a package. Converting it to a package would be a much bigger refactor, since it touches every "from synthbench.cli import main" line in the test suite. I put the new logic in src/synthbench/submit_adapter.py instead and wired it via one import in cli.py. If maintainers want the CLI broken up into a package eventually, this is a clean seam.

  3. Tests live at tests/test_cli_submit_adapter.py, not tests/cli/test_submit.py. The existing test layout is flat (tests/test_cli_*.py); creating a tests/cli/ subdir would be inconsistent.

  4. RandomAdapter ships in synthbench.adapter rather than under tests/. Reasoning: vendors are explicitly going to point --adapter synthbench.adapter at it as the known-good smoke target while debugging their setup. Test-only placement would make that flow fragile. It's clearly marked "not a baseline" in the docstring.

Test plan

  • pytest tests/test_cli_submit_adapter.py — 5/5 pass
  • pytest tests/test_cli_submit.py tests/test_cli_run_submit.py — 29/29 pass (no regressions to existing submit flow)
  • synthbench submit-adapter --help exits 0 with the documented options
  • synthbench --help shows both submit and submit-adapter
  • ruff check clean on all new files
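
For reviewers who want a feel for the validation path these tests exercise, here is a minimal stdlib-only sketch. It is not the code in submit_adapter.py: the helper names load_module and validate, the stand-in Adapter class, and the exact order of checks are assumptions, and the real command is a Click command rather than a function returning an int.

```python
import importlib
import importlib.util
import inspect
import os
from pathlib import Path
from types import ModuleType
from typing import Mapping


class Adapter:
    """Stand-in for synthbench.adapter.Adapter so this sketch runs alone."""


def load_module(ref: str) -> ModuleType:
    """Load `ref` as a .py file if one exists there, else as a dotted import."""
    path = Path(ref)
    if path.suffix == ".py" and path.is_file():
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot load {ref}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    return importlib.import_module(ref)


def validate(
    ref: str,
    api_env_var: str,
    environ: Mapping[str, str] = os.environ,
    base: type = Adapter,
) -> int:
    """Return 0 on success, or the exit code for the first failed check."""
    try:
        module = load_module(ref)
    except Exception:
        return 2  # adapter module not importable
    candidates = [
        obj
        for _, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, base) and obj is not base and not inspect.isabstract(obj)
    ]
    if not candidates:
        return 2  # no concrete Adapter subclass found
    if api_env_var not in environ:
        return 3  # membership test only; the secret value is never read
    try:
        candidates[0]()
    except Exception:
        return 2  # adapter is not constructible with no arguments
    return 0
```

Passing environ and base as parameters is a choice made for this sketch so the checks can be exercised without touching the real process environment.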

Refs #256

Agent Mani added 2 commits May 1, 2026 21:05
Scaffolds the vendor self-submission entry point requested in #256.
Wesley's directive: "We want synthpanel competitors to run against
SynthBench" — this is the no-friction path that lets a vendor point
the CLI at an adapter module and produce a leaderboard-ready
artifact, without having to learn the full benchmark harness.

This PR is scaffold-only by design (per the issue's scope). The CLI
surface, adapter contract, and artifact filenames are pinned so
vendors can start integrating against them in parallel with the
real evaluation pipeline build-out.

What's wired up:

  * synthbench.adapter.Adapter — abstract base with name/version
    properties and an async respond(question, persona, context)
    method. Documented contract. Ships a RandomAdapter reference
    impl that emits trivial yes/no/Likert answers, used by the
    submit path's smoke tests and as a known-good target vendors
    can point at while debugging their setup.

  * synthbench.submit_adapter — Click command 'submit-adapter'
    with --adapter / --vendor / --vendor-version / --api-env-var
    / --suite / --output-dir. Validates adapter module is
    importable (exit 2), API env var is present without ever
    reading the secret (exit 3), then writes placeholder
    submission.md + run.json to --output-dir and prints stubbed
    leaderboard PR URL + verification curl.

  * One-line wire-up in src/synthbench/cli.py to register the
    command with the main group. No restructuring of existing
    commands.

  * tests/test_cli_submit_adapter.py — 5 tests covering --help,
    bad-import, missing env var, happy path (RandomAdapter), and
    filesystem-path adapter loading. All pass in 0.03s.

  * docs/submit.md — vendor workflow doc with a worked
    RandomAdapter example and the exit-code table.

What's out of scope (tracked as follow-ups in the PR body):

  * Driving Adapter.respond through the core suite runner
  * Content-addressed run-hash computation
  * Auto-generating the leaderboard PR via gh
  * CI validation of submission PRs

Naming note: there's already a 'synthbench submit' command that
posts a pre-computed result.json to the hosted API. To avoid
restructuring outside the new files (per Wesley's constraint), this
scaffold registers as 'submit-adapter'. Final naming is flagged in
the PR body for a human call.

Refs #256

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Executed-By: mayor
@openclaw-dv openclaw-dv merged commit b70c474 into main May 14, 2026
8 of 10 checks passed
@openclaw-dv openclaw-dv deleted the feat/submit-cli-scaffold-issue-256 branch May 14, 2026 16:56
openclaw-dv added a commit that referenced this pull request May 14, 2026
Post-merge fix: PRs #263 and #264 introduced these files unformatted.
CI ruff format --check fails on every PR against main as a result.

semver: patch

Co-authored-by: mayor <mani@Wesleys-Mini.localdomain>