feat: add seeded run creation#730

Merged
Atharva-Kanherkar merged 2 commits into main from codex/roadmap-693-run-seeds on May 10, 2026
Conversation

@Atharva-Kanherkar
Collaborator

@Atharva-Kanherkar Atharva-Kanherkar commented May 10, 2026

Summary

  • add agentclash run create --seeds N to create a seeded eval session with one child run per seed
  • persist explicit seed fanout metadata on the eval-session routing snapshot and child run execution plans
  • surface seeded_runs from eval-session creation and seed on eval-session child-run reads
  • support --max-iter for seeded eval sessions and document the new API shape
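A minimal sketch of the request body that `--seeds N` might produce, assuming sequential seeds [1..N] as described in the review below. The `seed_fanout` and `max_iterations` field names come from this PR; the `seeds` and `repetitions` JSON keys and the `seededBody`, `sequentialSeeds`, and `buildSeededBody` names are hypothetical and are not taken from the actual CLI code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// seedFanout mirrors the seed_fanout object named in the PR.
// Assumption: the seed list is sent under a "seeds" key.
type seedFanout struct {
	Strategy string  `json:"strategy"`
	Seeds    []int64 `json:"seeds"`
}

// seededBody is a hypothetical subset of the eval-session request body.
type seededBody struct {
	Repetitions   int        `json:"repetitions"`
	MaxIterations int        `json:"max_iterations,omitempty"`
	SeedFanout    seedFanout `json:"seed_fanout"`
}

// sequentialSeeds returns the seeds 1..n as int64 values.
func sequentialSeeds(n int) ([]int64, error) {
	if n < 1 {
		return nil, errors.New("seeds must be at least 1")
	}
	seeds := make([]int64, n)
	for i := range seeds {
		seeds[i] = int64(i + 1)
	}
	return seeds, nil
}

// buildSeededBody pairs one repetition with each seed and attaches the
// optional iteration cap at the top level of the body.
func buildSeededBody(n, maxIter int) (seededBody, error) {
	seeds, err := sequentialSeeds(n)
	if err != nil {
		return seededBody{}, err
	}
	return seededBody{
		Repetitions:   n,
		MaxIterations: maxIter,
		SeedFanout:    seedFanout{Strategy: "explicit", Seeds: seeds},
	}, nil
}

func main() {
	body, err := buildSeededBody(3, 0)
	if err != nil {
		panic(err)
	}
	out, err := json.Marshal(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping the repetition count equal to the number of seeds matches the backend rule that the seed count must equal repetitions.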

Compatibility note

  • buildEvalSessionBody now sends execution_mode: "comparison" for multi-deployment eval sessions. This is the backend-supported value; the previous CLI value, "multi_agent", was rejected by the eval-session API, so this fixes the existing --repetitions multi-deployment path rather than changing stored backend semantics.

Tests

  • cd backend && go test ./internal/api -run 'TestGetEvalSessionEndpointReturnsDetail|TestCreateEvalSessionEndpointReturnsCreated|TestDecodeEvalSessionConfigRejectsInvalidSeedFanout|TestRunCreationManagerCreateEvalSessionPersistsSeedFanout' -count=1
  • cd cli && go test ./cmd -run 'TestBuildSeededEvalSessionBody|TestRunCreateSeedsRoutesToEvalSessions|TestRunCreateSeedsRejectsFollow|TestBuildSeededEvalSessionBodyRejectsUnsupportedScope|TestBuildSeededEvalSessionBodyRejectsInvalidFlagRanges|TestBuildEvalSessionBody_MultiDeployment_LabelsAndMode' -count=1
  • git diff --check
  • cd backend && go test ./...
  • cd backend && go vet ./...
  • cd cli && go build ./...
  • cd cli && go vet ./...
  • cd cli && go test -short -race -count=1 ./...
  • cd cli && go run github.com/goreleaser/goreleaser/v2@latest check

Fixes #699

@Atharva-Kanherkar
Collaborator Author

@greptileai review

@greptile-apps
Contributor

greptile-apps Bot commented May 10, 2026

Greptile Summary

This PR adds agentclash run create --seeds N to create a seeded eval session with one child run per seed (seeds [1..N]), persisting seed metadata in both each run's ExecutionPlan and the eval-session routing snapshot, and surfacing it through seeded_runs on creation and seed on child-run reads. It also wires --max-iter support into seeded eval sessions and fixes the existing multi-deployment execution_mode value from the rejected "multi_agent" to the backend-accepted "comparison".

  • Seeded fanout: buildSeededEvalSessionBody generates sequential seeds [1..N], embeds them per-run in ExecutionPlan, and records the fanout in the routing snapshot. The seededRuns mapping in the response is derived from each returned run's ExecutionPlan seed (not from insertion order), making it resilient to any repository-level reordering.
  • Validation: decodeEvalSessionSeedFanout enforces strategy="explicit", seed count equals repetitions, all values ≥ 1, and no duplicates; --max-iter, --follow, race-context flags, and suite_only scope are all explicitly rejected or handled for the seeded path.
  • Backend + CLI test coverage: new tests cover seed persistence, fanout snapshot shape, HTTP request decoding, and validation rejections.
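The validation rules listed above can be sketched roughly as follows. This is an illustration only: `validateSeedFanout` and its signature are hypothetical, and the real `decodeEvalSessionSeedFanout` may order its checks or word its errors differently.

```go
package main

import "fmt"

// validateSeedFanout applies the four rules named in the review:
// explicit strategy, seed count equal to repetitions, every seed >= 1,
// and no duplicate seeds. Name and signature are assumptions.
func validateSeedFanout(strategy string, seeds []int64, repetitions int) error {
	if strategy != "explicit" {
		return fmt.Errorf("unsupported seed fanout strategy %q", strategy)
	}
	if len(seeds) != repetitions {
		return fmt.Errorf("seed count %d must equal repetitions %d", len(seeds), repetitions)
	}
	seen := make(map[int64]struct{}, len(seeds))
	for _, s := range seeds {
		if s < 1 {
			return fmt.Errorf("seed %d must be at least 1", s)
		}
		if _, dup := seen[s]; dup {
			return fmt.Errorf("duplicate seed %d", s)
		}
		seen[s] = struct{}{}
	}
	return nil
}

func main() {
	// A valid fanout, followed by one that repeats a seed.
	fmt.Println(validateSeedFanout("explicit", []int64{1, 2, 3}, 3))
	fmt.Println(validateSeedFanout("explicit", []int64{1, 1, 3}, 3))
}
```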

Confidence Score: 5/5

Safe to merge — all changed paths have clear validation and the seed-to-run mapping is derived from durable execution-plan data rather than insertion order.

The seeded fanout feature is well-contained: seeds are embedded per-run in the execution plan at write time and re-read from it at response time, so the mapping survives any repository-level reordering. Validation in both the CLI and the backend handler covers range, uniqueness, strategy, and incompatible flag combinations. No auth or data-integrity boundaries are affected by the change.

No files require special attention.

Important Files Changed

Filename | Overview
--- | ---
backend/internal/api/eval_session_service.go | Core seeded-run logic: moves executionPlan build inside the per-repetition loop, injects seed from SeedFanout[i], and derives seededRuns from each run's ExecutionPlan rather than positional matching.
backend/internal/api/eval_sessions.go | Adds MaxIterations decoding/validation and decodeEvalSessionSeedFanout; validates strategy, array length, positive integers, no duplicates.
backend/internal/api/eval_session_reads.go | Adds Seed field to child-run response; evalSessionChildRunSeed correctly extracts seed from ExecutionPlan JSON and filters seeds < 1.
cli/cmd/run_create_helpers.go | Adds buildSeededEvalSessionBody; correctly zeroes MaxIterations before calling buildEvalSessionBody, then injects it top-level; seeds generated as sequential int64 [1..N].
cli/cmd/run.go | Routes seeds > 0 to seeded eval session path; --follow + --seeds returns a clear error.
cli/cmd/eval_session_helpers.go | Changes executionMode to 'comparison' for multi-deployment eval sessions; adds seeded-run seed display in presentCreatedEvalSession.
docs/api-server/openapi.yaml | Documents max_iterations, seed_fanout, EvalSessionSeedFanoutConfig, EvalSessionSeededRun, and seed on child runs; constraints match backend validation.
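To illustrate why reading the seed back from each run's execution plan survives reordering, here is a sketch under stated assumptions: the plan is JSON with a top-level `seed` key, and `run`, `seededRun`, `seedFromPlan`, and `seededRunsFrom` are hypothetical stand-ins for the backend types and for `evalSessionChildRunSeed`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// run is a hypothetical stand-in for a persisted child run.
type run struct {
	ID            string
	ExecutionPlan json.RawMessage
}

// seededRun pairs a run with its seed; the JSON keys are assumptions.
type seededRun struct {
	RunID string `json:"run_id"`
	Seed  int64  `json:"seed"`
}

// seedFromPlan reads a top-level "seed" from the plan JSON and reports
// false when the plan is unparsable, has no seed, or the seed is below 1.
func seedFromPlan(plan json.RawMessage) (int64, bool) {
	var decoded struct {
		Seed *int64 `json:"seed"`
	}
	if err := json.Unmarshal(plan, &decoded); err != nil {
		return 0, false
	}
	if decoded.Seed == nil || *decoded.Seed < 1 {
		return 0, false
	}
	return *decoded.Seed, true
}

// seededRunsFrom builds the mapping from each run's own plan, so the
// result does not depend on the order in which runs are returned.
func seededRunsFrom(runs []run) []seededRun {
	out := make([]seededRun, 0, len(runs))
	for _, r := range runs {
		if seed, ok := seedFromPlan(r.ExecutionPlan); ok {
			out = append(out, seededRun{RunID: r.ID, Seed: seed})
		}
	}
	return out
}

func main() {
	// Runs arrive in a different order than they were created.
	runs := []run{
		{ID: "run-b", ExecutionPlan: json.RawMessage(`{"seed": 2}`)},
		{ID: "run-a", ExecutionPlan: json.RawMessage(`{"seed": 1}`)},
		{ID: "run-c", ExecutionPlan: json.RawMessage(`{}`)},
	}
	for _, sr := range seededRunsFrom(runs) {
		fmt.Printf("%s -> seed %d\n", sr.RunID, sr.Seed)
	}
}
```

A positional match against the seed list would pair run-b with seed 1 here, which is the failure mode the plan-based lookup avoids.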

Sequence Diagram

sequenceDiagram
    participant CLI as CLI (run create --seeds N)
    participant API as POST /v1/eval-sessions
    participant Svc as RunCreationManager
    participant Repo as Repository

    CLI->>CLI: "buildSeededEvalSessionBody seeds=[1..N]"
    CLI->>API: "POST body {seed_fanout, max_iterations}"
    API->>API: decodeEvalSessionSeedFanout validate
    API->>Svc: CreateEvalSession(input)
    loop repetition 0..N-1
        Svc->>Svc: buildExecutionPlan(runInput + seed[i])
        Svc->>Svc: append childRun with ExecutionPlan
    end
    Svc->>Svc: buildRoutingTaskSnapshot embed seed_fanout
    Svc->>Repo: CreateEvalSessionWithQueuedRuns
    Repo-->>Svc: Session + Runs[]
    loop each returned Run
        Svc->>Svc: evalSessionChildRunSeed(run.ExecutionPlan)
    end
    Svc-->>API: Session + RunIDs + SeededRuns
    API-->>CLI: eval_session + run_ids + seeded_runs
    CLI->>CLI: print run_id (seed N) per line
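The final step of the diagram, printing one `run_id (seed N)` line per child run, could look roughly like this on the CLI side. The `run_ids` and `seeded_runs` response keys come from the diagram; the `run_id` and `seed` keys inside each entry, and the `createResponse` and `formatCreated` names, are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// createResponse is a hypothetical subset of the eval-session creation response.
type createResponse struct {
	RunIDs     []string `json:"run_ids"`
	SeededRuns []struct {
		RunID string `json:"run_id"`
		Seed  int64  `json:"seed"`
	} `json:"seeded_runs"`
}

// formatCreated renders one line per run. Seeded sessions show the seed
// next to the run ID; unseeded sessions fall back to bare run IDs.
func formatCreated(raw []byte) (string, error) {
	var resp createResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return "", fmt.Errorf("decode response: %w", err)
	}
	var b strings.Builder
	if len(resp.SeededRuns) > 0 {
		for _, sr := range resp.SeededRuns {
			fmt.Fprintf(&b, "%s (seed %d)\n", sr.RunID, sr.Seed)
		}
		return b.String(), nil
	}
	for _, id := range resp.RunIDs {
		fmt.Fprintln(&b, id)
	}
	return b.String(), nil
}

func main() {
	raw := []byte(`{"run_ids":["r1","r2"],"seeded_runs":[{"run_id":"r1","seed":1},{"run_id":"r2","seed":2}]}`)
	out, err := formatCreated(raw)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```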

Reviews (2): Last reviewed commit: "fix: derive seeded run mapping from plan..."

@@ -48,7 +48,7 @@ func buildEvalSessionBody(workspaceID string, request runCreateRequest, repetiti

Contributor

P2 Silent behavioral change for existing --repetitions multi-deployment users

Changing executionMode from "multi_agent" to "comparison" in buildEvalSessionBody affects all multi-deployment eval sessions (not only seeded ones), including agentclash run create --repetitions N --deployments A,B. If any backend or downstream consumers previously accepted only "multi_agent", those calls will start sending "comparison" after this release with no migration notice. The correction appears intentional, but it might be worth noting in the PR or release notes since it's a silent breaking change for existing callers on the repetitions path.

Collaborator Author

Noted in the PR body under Compatibility note. This is an intentional correction to the backend-supported eval-session value: the existing multi_agent payload was rejected by the API, while comparison is already the accepted execution mode for multi-participant eval sessions.

@Atharva-Kanherkar
Collaborator Author

@greptileai review

@Atharva-Kanherkar Atharva-Kanherkar merged commit d24dd6b into main May 10, 2026
7 checks passed
@Atharva-Kanherkar Atharva-Kanherkar deleted the codex/roadmap-693-run-seeds branch May 10, 2026 12:16
Development

Successfully merging this pull request may close these issues.

[roadmap #693] Run create seeds
