feat: evo preview features — config bundles, batch evaluation, recommendations, AB testing by notgitika · Pull Request #1068 · aws/agentcore-cli

notgitika · 2026-04-30T21:23:09Z

Summary

Adds preview support for the Evo feature set: config bundles, batch evaluation, recommendations, and AB testing.

Config Bundles [preview]

add config-bundle — add versioned runtime configuration bundles
cb versions — list version history for a bundle
cb diff — diff two versions of a bundle
cb create-branch — create a new branch on an existing bundle
--with-config-bundle flag on agent creation auto-wires config bundle support
Config bundle baggage passed on invoke for runtime config injection

Batch Evaluation [preview]

run batch-evaluation — run evaluators across all agent sessions in CloudWatch
stop batch-evaluation — stop a running batch evaluation
Ground truth support (assertions, expected trajectory, turns)
Name validation against API pattern [a-zA-Z][a-zA-Z0-9_]{0,47}
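The name rule above can be checked client-side before calling the API. A minimal sketch, assuming the pattern is applied to the whole string (the `^`/`$` anchors and the helper name `isValidBatchEvaluationName` are illustrative and not taken from the PR):

```typescript
// Pattern quoted in the PR description. Anchoring it with ^ and $ is an
// assumption made here so that partial matches are rejected.
const BATCH_EVAL_NAME_PATTERN = /^[a-zA-Z][a-zA-Z0-9_]{0,47}$/;

// Hypothetical helper: true when the name starts with a letter and contains
// only letters, digits, and underscores, up to 48 characters in total.
function isValidBatchEvaluationName(name: string): boolean {
  return BATCH_EVAL_NAME_PATTERN.test(name);
}

console.log(isValidBatchEvaluationName("nightly_eval_1")); // true
console.log(isValidBatchEvaluationName("nightly-eval")); // false: hyphens are not allowed
```

This matches the "hyphens rejected" case listed in the test plan.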

Recommendations [preview]

run recommendation — optimize system prompts or tool descriptions using agent traces
Supports inline, file, and config bundle input sources
Config bundle integration: reads current prompt, writes optimized version back
JSONPath resolution from --runtime flag for multi-component bundles

AB Testing [preview]

Target-based AB test routing
AB test detail screen with p-value significance display

Other

TUI routing fixes for agentcore add config-bundle and agentcore add ab-test
Documentation for all preview features (docs/config-bundles.md, docs/batch-evaluation.md, docs/recommendations.md)
README updated with preview commands and doc links

Companion PR

CDK constructs: aws/agentcore-l3-cdk-constructs (separate PR)

Test plan

Unit tests passing
Manual CLI testing: batch eval (valid name, hyphens rejected, fake evaluator error, ground truth, multiple evaluators, lookback days, stop)
Manual CLI testing: recommendations (inline, file, config bundle, tool descriptions, nonexistent agent)
Manual CLI testing: config bundles (versions, diff, create-branch, add)
Manual CLI testing: status shows config bundles
Validate command passes on current schema

Add ConfigBundle as a new resource type with full lifecycle:

- Schema: ConfigBundleSchema with name validation, component configurations
- Primitive: ConfigBundlePrimitive for add/remove operations
- API client: SigV4-signed HTTP requests for config bundle CRUD operations
- Deploy: post-deploy hook to sync config bundles with control plane
- Status: config-bundle resource type in status command
- TUI: add wizard (name, description, components, branch, commit message), remove flow, ResourceGraph integration
- State: carry forward configBundles across redeploys in buildDeployedState

The signing service must be 'bedrock-agentcore' for all stages, not 'bedrock-agentcore-control' for prod. The endpoint hostname differs from the signing service name.

- Add config bundle post-deploy setup to TUI deploy flow (useDeployFlow)
- Add clientToken to config bundle update API call
- Add parentVersionIds on update (required by API)
- Default branchName to "main" and commitMessage when not specified
- Add placeholders for branch/message in TUI wizard
- Fallback to find-by-name or create when update fails (stale IDs)
- Remove debug logging from actions.ts

- Add `agentcore edit config-bundle` CLI command with --bundle, --components, --components-file, --description, --branch, --message, --json flags
- Add interactive TUI wizard for editing config bundles (select bundle, input method, components, commit message, branch name, confirm)
- Add diff check to post-deploy: skip API update when components and description are unchanged, avoiding unnecessary version creation
- Use getConfigurationBundleVersion instead of getConfigurationBundle to avoid branch-not-found errors on bundles created with different branches
- Align default branch name to 'mainline' (API default) instead of 'main'
- For updates, inherit branch from current API state when not specified
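The post-deploy diff check described above could look roughly like the following. This is a sketch under assumptions, not the PR's code: the `BundleContent` shape, the helper names, and the choice to treat a missing description as an empty string are all hypothetical.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted, so key order alone never counts as a change.
function canonicalize(value: Json): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj: { [key: string]: Json } = value;
    const parts: string[] = [];
    for (const key of Object.keys(obj).sort()) {
      parts.push(`${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    }
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical shape of the fields compared before issuing an update.
interface BundleContent {
  description?: string;
  components: Json;
}

// True when an update call (and therefore a new version) can be skipped.
// Assumption: an absent description is equivalent to an empty one.
function isBundleUnchanged(local: BundleContent, remote: BundleContent): boolean {
  return (
    (local.description ?? "") === (remote.description ?? "") &&
    canonicalize(local.components) === canonicalize(remote.components)
  );
}
```

Comparing canonical strings keeps the check independent of key order, which matters when the local file and the API response serialize the same components differently.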

- post-deploy-config-bundles: 13 tests covering create, update, skip (diff check), delete, branch inheritance, fallback paths, errors
- ConfigBundlePrimitive.edit: 7 tests covering component updates, optional field handling, missing bundle errors, field preservation
- useEditConfigBundleWizard: 16 tests covering step navigation, setters, goBack, reset, currentIndex tracking, step labels

feat: add configuration bundle support

* chore: remove edit config-bundle command

Users should edit agentcore.json directly to update config bundles. Removes the edit CLI command, TUI screens, wizard hooks, and tests.

* feat: add config-bundle CLI commands for version history

Adds `agentcore config-bundle` with three subcommands:

- `versions`: list version history grouped by branch
- `get-version`: view specific version details and components
- `diff`: client-side deep diff between two versions

Also adds filter support (branchName, latestPerBranch, createdBy) to the listConfigurationBundleVersions API client.

* feat: add config bundle hub TUI screens

Add TUI screens for browsing config bundles, viewing version history with branch grouping, version detail drill-down, and diff comparison between versions.

* fix: resolve config bundle versionId when falling back to list API (#49)

The Recommendation API requires versionId to be non-null when using configurationBundle input. When resolveBundleByName fell back to the list API (bundle not in deployed state), it returned no versionId, causing a 400 validation error. Now calls getConfigurationBundle after list to fetch the latest versionId. Also adds versionId to the ResolvedBundle interface and returns it from the deployed-state fast path.

* chore: remove get-version subcommand from config-bundle CLI

The versions --json and diff commands cover all practical use cases. Keeps the command surface lean: versions + diff only.
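As an illustration of what a client-side deep diff between two bundle versions involves, here is a minimal sketch. The `Change` shape, the `$`-rooted path notation, and the function name `diffVersions` are assumptions for this example and are not taken from the PR.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// One reported difference: where it occurred and the value on each side.
interface Change {
  path: string;
  before?: Json;
  after?: Json;
}

function isPlainObject(value: Json | undefined): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recurse into objects key by key. Arrays and scalars are compared as whole
// values via their JSON text (a simplification: objects nested inside arrays
// are therefore compared in a key-order-sensitive way).
function diffVersions(before: Json | undefined, after: Json | undefined, path = "$"): Change[] {
  if (isPlainObject(before) && isPlainObject(after)) {
    const keys = Array.from(new Set(Object.keys(before).concat(Object.keys(after)))).sort();
    const changes: Change[] = [];
    for (const key of keys) {
      changes.push(...diffVersions(before[key], after[key], `${path}.${key}`));
    }
    return changes;
  }
  return JSON.stringify(before) === JSON.stringify(after) ? [] : [{ path, before, after }];
}

const changes = diffVersions({ a: 1, b: { c: 2 } }, { a: 1, b: { c: 3 }, d: 4 });
console.log(changes.map((change) => change.path)); // [ '$.b.c', '$.d' ]
```

A key present on only one side shows up as a change with `before` or `after` left undefined, which covers added and removed components.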

* feat: add Recommendation API wrappers, CLI commands, and operations layer

Implement the Recommendations/Optimization feature for AgentCore CLI:

- SigV4-signed HTTP client for Start/Get/List/Delete Recommendation (DP)
- Operations layer with orchestration, polling, and local storage
- CLI commands: evals recommend, evals recommendation history/delete, run promote
- 27 unit tests covering API, storage, and orchestration logic
- Live-validated field names and ARN formats against prod API

* feat: add recommendation TUI wizard with session discovery and multi-evaluator support

- Add full recommendation wizard TUI (type, agent, evaluators, input, trace source, sessions, confirm)
- Add session discovery flow: discover sessions from CloudWatch, multi-select specific sessions
- Support both CloudWatch logs and session ID trace sources
- Pass selected sessionIds to recommendation API cloudwatchLogs config
- Add request ID capture and error detail extraction for debugging FAILED recommendations
- Fix recommendation API test mocks (add headers for requestId capture)
- Add scrollable list support (maxVisibleItems) to MultiSelectList, SelectList, WizardSelect
- Wire recommendation screen into App.tsx and EvalHubScreen navigation

* feat: add session span fetching, recommendation tests, and TUI integration

- Add fetch-session-spans module for retrieving OTEL spans from aws/spans and log records from runtime log groups with session ID filtering
- Add comprehensive tests for fetch-session-spans (9 tests) and extend run-recommendation tests (12 new tests covering file input, spans-file trace source, tool-desc auto-fetch, error handling, ARN passthrough)
- Wire recommendation hub, history screen, and list/delete CLI commands
- Update TUI routing for recommendation flows from eval and run hubs
- Add recommendation constants (poll intervals, terminal statuses)

* chore: remove list commands and promote stub, fix agents→runtimes rename

Remove `agentcore list recommendations` and `agentcore list recommendation --id` commands (top-level `list` command deleted entirely). Remove `run promote` stub. Fix typecheck errors from agents→runtimes schema rename in recommendation files.

#26)

* feat: add EvaluationJob resource: schema, primitive, deploy hook, TUI, and tests

Phase 1 of EvalJobRunner: CRUD + deploy integration for the EvaluationJob control plane resource.

- Schema: EvaluationJobSchema in agentcore.json, deployed state tracking
- Primitive: EvaluationJobPrimitive with add/remove lifecycle
- AWS client: SigV4-signed HTTP wrappers for EvalJob CP operations
- Deploy: post-deploy hook creates/updates/deletes eval jobs imperatively
- CFN outputs: parse eval job execution role ARN from stack outputs
- TUI: add evaluation-job wizard flow + remove flow integration
- Tests: 53 tests across schema, primitive, AWS client, deploy hook, and TUI

* feat: add `run evaluation-job` command with DP API wrappers and orchestration

- Data plane API wrappers (RunEvaluationJob, GetEvaluationJobRun, ListEvaluationJobRuns) with SigV4 signing against bedrock-agentcore service
- Orchestration: resolve job from deployed state, generate runId, start run, poll for completion, fetch results from CW Logs output group
- CLI command: `agentcore run evaluation-job --job <name> --session-id <ids...>` with --json output and progress callbacks
- Tests: 17 new tests covering DP wrappers, runId generation, orchestration (error handling, polling, CW Logs result parsing)

* feat: complete US1/US2 quick wins: run name, cancel, update, stage-aware endpoints

- Add --run flag to `run evaluation-job` for custom run name prefixes
- Add `run cancel-evaluation-job` command with StopEvaluationJobRun DP API
- Add `update evaluation-job` primitive method and CLI subcommands
- Add `agentcore update experiment` parent command (backward-compatible)
- Make CP/DP endpoints stage-aware via AGENTCORE_STAGE env var (beta/gamma/prod)
- Fix beta SigV4 service name (bedrock-agentcore vs bedrock-agentcore-control)
- Update AddEvaluationJobFlow success screen with next-steps guidance

* feat: add TUI run wizard, progress steps, and local result storage for eval jobs

- Add RunEvalJobFlow TUI: select job → enter sessions → name run → confirm → execute
- Add StepProgress display during eval job polling (starting → polling → fetching → saving)
- Add elapsed time counter during run execution
- Add eval-job-storage module: save/load/list run results per job in .cli/eval-job-results/
- Auto-save results on both CLI and TUI paths
- Add "Evaluation Job" option to TUI Run screen
- Add 9 unit tests for eval-job-storage

* feat: add CloudWatch session discovery to eval job TUI wizard

- Add source type picker: "Discover from CloudWatch" vs "Enter manually"
- Add lookback days input (1-90 days) for CloudWatch discovery
- Discover sessions via CW Insights query using agent's runtimeId
- Multi-select from discovered sessions with span count + timestamps
- Auto-fallback to manual entry when agent not deployed (no runtimeId)
- Improve error display: show failed step in StepProgress before transitioning

* feat: migrate evaluation from resource CRUD to stateless batch evaluation

Replace the old EvaluationJob resource model (create/update/delete via agentcore.json + deploy hooks) with a flat BatchEvaluation API model:

- Add `run batch-evaluation` and `run stop-batch-evaluation` CLI commands
- Add batch evaluation TUI wizard under the Run menu
- Add SigV4 API client for batch eval endpoints (start/get/list/stop)
- Add CloudWatch results fetching from outputDataConfig
- Remove all old evaluation-job infrastructure: primitive, deploy hook, schema, TUI add/remove screens, CP CRUD operations
- Remove evaluationJobs from agentcore.json schema

Tested end-to-end on gamma (account 998846730471) with Builtin.Faithfulness evaluator against 3 agent sessions, all returning correct scores.

* chore: remove executionRoleArn now that FAS creds are live on gamma

The batch evaluation API no longer requires an execution role ARN. Remove the --execution-role CLI option and all executionRoleArn plumbing from the API client and orchestration layer.

* Revert "chore: remove executionRoleArn now that FAS creds are live on gamma"

This reverts commit f1706ff7ea4b7695d1466e609cde29e38cb00afb.

* refactor: move stop-batch-evaluation to top-level stop command

Move `agentcore run stop-batch-evaluation` to `agentcore stop batch-evaluation` as a higher-level verb, consistent with pause/resume pattern.

- Restore --days flag on `run eval` (was renamed to --lookback, breaking existing scripts)
- Restore onListCloudWatchTraces/onGetCloudWatchTrace handlers in browser-mode.ts from public/main

github-actions · 2026-04-30T22:12:45Z

Coverage Report

Status	Category	Percentage	Covered / Total
🔵	Lines	42.91%	8934 / 20817
🔵	Statements	42.18%	9480 / 22475
🔵	Functions	39.66%	1537 / 3875
🔵	Branches	39.89%	5744 / 14397

Generated in workflow #2250 for commit 90939c2 by the Vitest Coverage Report Action

The AB test CLI flag was renamed from --gateway-arn to --gateway and made optional. Tests now use --runtime instead, matching config-bundle mode defaults.

jariy17

Re-reviewed at HEAD 6e085f4. All previously flagged regressions are resolved:

✅ --days flag restored on run eval
✅ onListCloudWatchTraces / onGetCloudWatchTrace restored in agentcore dev
✅ RESOURCE_SUFFIX isolation restored in e2e import tests
✅ Version and agent-inspector dep back to 0.12.2 / 0.3.0
✅ PRIVATE_DEV_DISTRO config reverted

No regressions against the private repo baseline. The 4 issues flagged by agentcore-cli-automation (hardcoded amazonaws.com in recommendation/config-bundle wrappers, stale JSON schema, silent agentcore.json mutation on deploy, config bundle/AB test teardown leak) are separate functional issues worth addressing but not regressions from this PR's changes.

jariy17

Updating review — the 4 issues flagged by agentcore-cli-automation are blocking and need to be addressed before merge.

1. Hardcoded `amazonaws.com` breaks non-commercial partitions

Files:

src/cli/aws/agentcore-recommendation.ts:228
src/cli/aws/agentcore-config-bundles.ts:181

Both hardcode https://bedrock-agentcore.<region>.amazonaws.com / https://bedrock-agentcore-control.<region>.amazonaws.com. The sibling wrappers in this same PR (agentcore-ab-tests.ts, agentcore-batch-evaluation.ts, agentcore-http-gateways.ts) correctly use dnsSuffix(region) from ./partition. Recommendations and config bundles will silently fail in GovCloud and China partitions.

Fix: import dnsSuffix from ./partition and replace the hardcoded literal in both files.
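A rough sketch of the suggested fix follows. The `dnsSuffix` below is a stand-in written for this example; the real helper lives in `./partition` and may differ in signature and coverage. The `controlPlaneEndpoint` name is also hypothetical, and the sketch makes no claim about which partitions actually offer the service.

```typescript
// Stand-in for the repo's `dnsSuffix` helper (assumed behavior): map a region
// to the DNS suffix of its AWS partition instead of assuming amazonaws.com.
function dnsSuffix(region: string): string {
  if (region.startsWith("cn-")) return "amazonaws.com.cn";
  if (region.startsWith("us-isob-")) return "sc2s.sgov.gov";
  if (region.startsWith("us-iso-")) return "c2s.ic.gov";
  return "amazonaws.com";
}

// Hypothetical endpoint builder: the hostname is derived from the region's
// partition rather than from a hardcoded literal.
function controlPlaneEndpoint(region: string): string {
  return `https://bedrock-agentcore-control.${region}.${dnsSuffix(region)}`;
}

console.log(controlPlaneEndpoint("us-east-1"));
// https://bedrock-agentcore-control.us-east-1.amazonaws.com
console.log(controlPlaneEndpoint("cn-north-1"));
// https://bedrock-agentcore-control.cn-north-1.amazonaws.com.cn
```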

2. `schemas/agentcore.schema.v1.json` is stale

The Zod schemas now include configBundles, abTests, and httpGateways as top-level fields on AgentCoreProjectSpecSchema, but the committed JSON schema was not regenerated. Users whose editors validate agentcore.json against the published schema (VS Code, etc.) will see false "property not allowed" errors on every new preview field.

Fix: run npm run build:lib && npm run build:schema and commit the regenerated schemas/agentcore.schema.v1.json.

3. `validateProject()` silently rewrites `agentcore.json` on every deploy

File: src/cli/operations/deploy/preflight.ts

The deploy preflight injects type: "ConfigurationBundle" into config bundle entries and writes the file back with JSON.stringify(rawJson, null, 2). This runs on every agentcore deploy, producing surprise git diffs for users and clobbering their file's original formatting (tabs, trailing newlines, key order). The Zod ConfigBundleSchema already applies this default in-memory, so the write-back is unnecessary.

Fix options:

Fix the CDK side to consume the Zod-parsed spec with defaults applied, and drop this patching code.
Inject type only into the in-memory object the CDK reads at synth time, without persisting to disk.
If persisting is truly required: only write when patched === true (already done), preserve trailing newline, and print a warning to the user that their file was modified.
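The second option (inject the default in memory only) might be sketched as below. The `RawProject` type and the `withConfigBundleDefaults` name are hypothetical; only the `configBundles` field and the `type: "ConfigurationBundle"` value come from the review comment.

```typescript
// Hypothetical minimal view of the parsed agentcore.json contents.
interface RawProject {
  configBundles?: Array<Record<string, unknown>>;
  [key: string]: unknown;
}

// Return a copy with `type` defaulted on every config bundle entry.
// The input object is not mutated and nothing is written to disk, so the
// user's file keeps its original formatting and produces no git diff.
function withConfigBundleDefaults(raw: RawProject): RawProject {
  if (!raw.configBundles) return raw;
  return {
    ...raw,
    configBundles: raw.configBundles.map((bundle) => ({
      type: "ConfigurationBundle",
      ...bundle, // an explicit `type` in the file takes precedence over the default
    })),
  };
}

const onDisk: RawProject = { name: "demo", configBundles: [{ name: "prompts" }] };
const forSynth = withConfigBundleDefaults(onDisk);
console.log(forSynth.configBundles); // [ { type: 'ConfigurationBundle', name: 'prompts' } ]
console.log(onDisk.configBundles); // [ { name: 'prompts' } ]
```

The patched copy would be handed to whatever reads the spec at synth time, while the file on disk stays untouched.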

4. Config bundles and AB tests are leaked on stack teardown

performStackTeardown explicitly calls deleteHttpGatewayWithTargets for HTTP gateways before destroying the CFN stack, but there is no equivalent cleanup for config bundles or AB tests. When a user runs agentcore remove all + deploy, the CFN stack is destroyed but any config bundles and AB tests in deployed-state remain orphaned in AWS — silently accumulating charges with no CLI surface to clean them up.

Fix: extend performStackTeardown to iterate deployedState.targets[target].resources.configBundles and .abTests and delete them, mirroring what it already does for httpGateways.
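One possible shape for that cleanup step is sketched below. Everything except the `deployedState.targets[target].resources.configBundles` / `.abTests` path is an assumption: the entry shapes (`bundleId`, `abTestId`), the injected `deleteAbTest` / `deleteConfigBundle` functions, and the choice to delete AB tests before config bundles are all hypothetical.

```typescript
// Assumed shapes; the real deployed-state schema may differ.
interface DeployedResources {
  configBundles?: Record<string, { bundleId: string }>;
  abTests?: Record<string, { abTestId: string }>;
}

interface DeployedState {
  targets: Record<string, { resources?: DeployedResources }>;
}

// Hypothetical API wrappers, injected so the sketch stays self-contained.
interface Deleters {
  deleteAbTest(id: string): Promise<void>;
  deleteConfigBundle(id: string): Promise<void>;
}

// Delete preview resources that CloudFormation does not own, before the stack
// itself is destroyed. Returns labels of resources that could not be deleted
// so the caller can warn the user instead of failing silently.
async function cleanupPreviewResources(
  state: DeployedState,
  target: string,
  deleters: Deleters
): Promise<string[]> {
  const resources = state.targets[target]?.resources ?? {};
  const failures: string[] = [];

  for (const [name, test] of Object.entries(resources.abTests ?? {})) {
    try {
      await deleters.deleteAbTest(test.abTestId);
    } catch {
      failures.push(`abTest:${name}`);
    }
  }
  for (const [name, bundle] of Object.entries(resources.configBundles ?? {})) {
    try {
      await deleters.deleteConfigBundle(bundle.bundleId);
    } catch {
      failures.push(`configBundle:${name}`);
    }
  }
  return failures;
}
```

Collecting failures rather than throwing on the first error lets teardown continue and still report anything left behind.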

The auto-generated gateway references --runtime, which must exist in the project. Remove noAgent:true and use project.agentName dynamically.

notgitika · 2026-04-30T22:45:47Z

A follow-up PR will address:

validateProject() silently rewrites agentcore.json on every deploy
Config bundles and AB tests are leaked on stack teardown

stale

avi-alpert and others added 30 commits March 5, 2026 13:20

feat: add sync workflow

d15ce2e

fix: formatting

7d35986

fix: only sync to main branch

2acb841

fix: codeql permissions

05258f3

Merge pull request #1 from aws/aalpert/workflow

216140f

chore: sync main with public/main

c6099d4

Merge remote-tracking branch 'public/main'

c2bef91

chore: sync main with public/main

1a361af

chore: sync main with public/main

cc83d81

chore: sync main with public/main

6054e95

chore: sync main with public/main

11ec86e

chore: sync main with public/main

1d64fd8

chore: sync main with public/main

4c2c674

Merge remote-tracking branch 'origin/main'

ee71ff3

chore: sync main with public/main

cfd1cdb

chore: sync main with public/main

93d7bbc

chore: sync main with public/main

d0c495f

chore: sync main with public/main

c2121ec

fix: use correct SigV4 service name for config bundle API

c190b0c

fix: use nullish coalescing for branchName default

f1e34d2

fix: address review comments

f671122

fix: remove duplicate config-bundle subcommand from edit command

bb013c1

Merge pull request #37 from aws/feat/config-bundles

45f98e0

feat: add configuration bundle support

fix: restore --days flag on run eval and CloudWatch trace handlers

cfe8e67
