feat: evo preview features — config bundles, batch evaluation, recommendations, AB testing#1068
Add ConfigBundle as a new resource type with full lifecycle:
- Schema: `ConfigBundleSchema` with name validation, component configurations
- Primitive: `ConfigBundlePrimitive` for add/remove operations
- API client: SigV4-signed HTTP requests for config bundle CRUD operations
- Deploy: post-deploy hook to sync config bundles with control plane
- Status: config-bundle resource type in status command
- TUI: add wizard (name, description, components, branch, commit message), remove flow, ResourceGraph integration
- State: carry forward configBundles across redeploys in `buildDeployedState`
The signing service must be 'bedrock-agentcore' for all stages, not 'bedrock-agentcore-control' for prod. The endpoint hostname differs from the signing service name.
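To make the split concrete, here is a minimal sketch that resolves the hostname and the signing service independently. It assumes the production hostname pattern referenced later in this thread (`bedrock-agentcore-control` for the control plane, `bedrock-agentcore` for the data plane, then the region and `amazonaws.com`). `resolveEndpoint` and `Plane` are hypothetical names, not the CLI's real helpers, and beta/gamma hostnames are not modeled.

```typescript
// Hypothetical sketch: hostname and SigV4 signing service are resolved
// separately. Only the prod, commercial-partition hostnames are modeled.
type Plane = "control" | "data";

interface Endpoint {
  hostname: string;
  signingService: string;
}

// Per this commit, requests are signed with "bedrock-agentcore" even when
// they target the control-plane hostname.
const SIGNING_SERVICE = "bedrock-agentcore";

function resolveEndpoint(plane: Plane, region: string): Endpoint {
  const prefix = plane === "control" ? "bedrock-agentcore-control" : "bedrock-agentcore";
  return { hostname: `${prefix}.${region}.amazonaws.com`, signingService: SIGNING_SERVICE };
}

const cp = resolveEndpoint("control", "us-west-2");
console.log(cp.hostname, cp.signingService);
```

The point of keeping `signingService` as a separate field is that a SigV4 signer should be handed that value, never a string derived from the hostname.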
- Add config bundle post-deploy setup to TUI deploy flow (`useDeployFlow`)
- Add `clientToken` to config bundle update API call
- Add `parentVersionIds` on update (required by API)
- Default `branchName` to "main" and supply a default `commitMessage` when not specified
- Add placeholders for branch/message in TUI wizard
- Fall back to find-by-name or create when update fails (stale IDs)
- Remove debug logging from `actions.ts`
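The update-with-fallback path (stale ID, then find by name, then create) can be sketched as below. `BundleClient`, its three methods, and `syncBundle` are hypothetical stand-ins for the real SigV4 wrappers; the sketch leaves out `clientToken`, `parentVersionIds`, and branch handling so the control flow stays visible.

```typescript
// Hypothetical client shape; the real wrappers may differ.
interface BundleRef {
  bundleId: string;
  name: string;
}

interface BundleClient {
  update(bundleId: string, components: Record<string, unknown>): Promise<BundleRef>;
  findByName(name: string): Promise<BundleRef | undefined>;
  create(name: string, components: Record<string, unknown>): Promise<BundleRef>;
}

// Try the ID recorded in deployed state first. If that update fails (for
// example because the ID is stale), look the bundle up by name and update it,
// or create it when no bundle with that name exists.
async function syncBundle(
  client: BundleClient,
  name: string,
  components: Record<string, unknown>,
  knownId?: string,
): Promise<BundleRef> {
  if (knownId) {
    try {
      return await client.update(knownId, components);
    } catch {
      // Fall through to name-based recovery.
    }
  }
  const existing = await client.findByName(name);
  if (existing) {
    return client.update(existing.bundleId, components);
  }
  return client.create(name, components);
}
```

Swallowing every update error is a simplification here; a real implementation would likely only fall back on not-found style errors.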
- Add `agentcore edit config-bundle` CLI command with --bundle, --components, --components-file, --description, --branch, --message, --json flags
- Add interactive TUI wizard for editing config bundles (select bundle, input method, components, commit message, branch name, confirm)
- Add diff check to post-deploy: skip API update when components and description are unchanged, avoiding unnecessary version creation
- Use getConfigurationBundleVersion instead of getConfigurationBundle to avoid branch-not-found errors on bundles created with different branches
- Align default branch name to 'mainline' (API default) instead of 'main'
- For updates, inherit branch from current API state when not specified
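One way to implement the post-deploy diff check is a structural comparison of the local and remote component maps. This is a sketch under assumptions: `jsonEqual`, `needsUpdate`, and `BundleSpec` are hypothetical names, and treating a missing description as equal to an empty one is a choice made here, not something this commit states.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Structural equality for JSON values: object key order is ignored,
// array order is significant.
function jsonEqual(a: Json, b: Json): boolean {
  if (a === b) return true;
  if (Array.isArray(a) || Array.isArray(b)) {
    if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) return false;
    for (let i = 0; i < a.length; i++) {
      if (!jsonEqual(a[i], b[i])) return false;
    }
    return true;
  }
  if (a === null || b === null || typeof a !== "object" || typeof b !== "object") return false;
  const aKeys = Object.keys(a);
  if (aKeys.length !== Object.keys(b).length) return false;
  for (const k of aKeys) {
    if (!Object.prototype.hasOwnProperty.call(b, k) || !jsonEqual(a[k], b[k])) return false;
  }
  return true;
}

interface BundleSpec {
  description?: string;
  components: { [key: string]: Json };
}

// True when an API update (and therefore a new bundle version) is needed.
// Treating a missing description as "" is an assumption of this sketch.
function needsUpdate(local: BundleSpec, remote: BundleSpec): boolean {
  return (
    (local.description ?? "") !== (remote.description ?? "") ||
    !jsonEqual(local.components, remote.components)
  );
}
```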
- post-deploy-config-bundles: 13 tests covering create, update, skip (diff check), delete, branch inheritance, fallback paths, errors
- ConfigBundlePrimitive.edit: 7 tests covering component updates, optional field handling, missing bundle errors, field preservation
- useEditConfigBundleWizard: 16 tests covering step navigation, setters, goBack, reset, currentIndex tracking, step labels
feat: add configuration bundle support
* chore: remove edit config-bundle command
  Users should edit agentcore.json directly to update config bundles. Removes the edit CLI command, TUI screens, wizard hooks, and tests.
* feat: add config-bundle CLI commands for version history
  Adds `agentcore config-bundle` with three subcommands:
  - `versions`: list version history grouped by branch
  - `get-version`: view specific version details and components
  - `diff`: client-side deep diff between two versions
  Also adds filter support (branchName, latestPerBranch, createdBy) to the listConfigurationBundleVersions API client.
* feat: add config bundle hub TUI screens
  Add TUI screens for browsing config bundles, viewing version history with branch grouping, version detail drill-down, and diff comparison between versions.
* fix: resolve config bundle versionId when falling back to list API (#49)
  The Recommendation API requires versionId to be non-null when using configurationBundle input. When resolveBundleByName fell back to the list API (bundle not in deployed state), it returned no versionId, causing a 400 validation error. Now calls getConfigurationBundle after list to fetch the latest versionId. Also adds versionId to the ResolvedBundle interface and returns it from the deployed-state fast path.
* chore: remove get-version subcommand from config-bundle CLI
  The versions --json and diff commands cover all practical use cases. Keeps the command surface lean: versions + diff only.
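A client-side deep diff like the one behind `diff` could walk both versions' components recursively and report each changed path. The `diff` function and `Change` type below are hypothetical and only illustrate the technique; the CLI's actual output format is not shown in this thread.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface Change {
  path: string;
  before?: Json; // undefined when the value was added
  after?: Json; // undefined when the value was removed
}

function isObject(v: Json | undefined): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Objects are compared key by key (sorted for stable output), arrays index
// by index; anything else is compared as a whole serialized value.
function diff(before: Json | undefined, after: Json | undefined, path = ""): Change[] {
  if (isObject(before) && isObject(after)) {
    const keys = Array.from(new Set([...Object.keys(before), ...Object.keys(after)])).sort();
    const out: Change[] = [];
    for (const k of keys) {
      out.push(...diff(before[k], after[k], path ? `${path}.${k}` : k));
    }
    return out;
  }
  if (Array.isArray(before) && Array.isArray(after)) {
    const out: Change[] = [];
    for (let i = 0; i < Math.max(before.length, after.length); i++) {
      out.push(...diff(before[i], after[i], `${path}[${i}]`));
    }
    return out;
  }
  if (JSON.stringify(before) === JSON.stringify(after)) return [];
  return [{ path, before, after }];
}
```

For example, comparing `{ prompt: "a", tools: { t1: "x" } }` with `{ prompt: "b", tools: { t1: "x", t2: "y" } }` reports a change at `prompt` and an addition at `tools.t2`.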
* feat: add Recommendation API wrappers, CLI commands, and operations layer
  Implement the Recommendations/Optimization feature for AgentCore CLI:
  - SigV4-signed HTTP client for Start/Get/List/Delete Recommendation (DP)
  - Operations layer with orchestration, polling, and local storage
  - CLI commands: evals recommend, evals recommendation history/delete, run promote
  - 27 unit tests covering API, storage, and orchestration logic
  - Live-validated field names and ARN formats against prod API
* feat: add recommendation TUI wizard with session discovery and multi-evaluator support
  - Add full recommendation wizard TUI (type, agent, evaluators, input, trace source, sessions, confirm)
  - Add session discovery flow: discover sessions from CloudWatch, multi-select specific sessions
  - Support both CloudWatch logs and session ID trace sources
  - Pass selected sessionIds to recommendation API cloudwatchLogs config
  - Add request ID capture and error detail extraction for debugging FAILED recommendations
  - Fix recommendation API test mocks (add headers for requestId capture)
  - Add scrollable list support (maxVisibleItems) to MultiSelectList, SelectList, WizardSelect
  - Wire recommendation screen into App.tsx and EvalHubScreen navigation
* feat: add session span fetching, recommendation tests, and TUI integration
  - Add fetch-session-spans module for retrieving OTEL spans from aws/spans and log records from runtime log groups with session ID filtering
  - Add comprehensive tests for fetch-session-spans (9 tests) and extend run-recommendation tests (12 new tests covering file input, spans-file trace source, tool-desc auto-fetch, error handling, ARN passthrough)
  - Wire recommendation hub, history screen, and list/delete CLI commands
  - Update TUI routing for recommendation flows from eval and run hubs
  - Add recommendation constants (poll intervals, terminal statuses)
* chore: remove list commands and promote stub, fix agents→runtimes rename
  Remove `agentcore list recommendations` and `agentcore list recommendation --id` commands (top-level `list` command deleted entirely). Remove `run promote` stub. Fix typecheck errors from agents→runtimes schema rename in recommendation files.
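The polling described above (poll intervals plus a set of terminal statuses) could be written as a loop like the following. `pollUntilTerminal` and `PollOptions` are hypothetical; the real status strings and intervals live in the recommendation constants module, so they are passed in here rather than guessed. `sleep` and `now` are injectable so the loop can be tested without real timers.

```typescript
interface PollOptions {
  intervalMs: number;
  timeoutMs: number;
  terminalStatuses: ReadonlySet<string>;
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
}

// Call fetchStatus until it reports a terminal status, waiting intervalMs
// between attempts and giving up once timeoutMs has elapsed.
async function pollUntilTerminal<T extends { status: string }>(
  fetchStatus: () => Promise<T>,
  opts: PollOptions,
): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const now = opts.now ?? Date.now;
  const deadline = now() + opts.timeoutMs;
  for (;;) {
    const result = await fetchStatus();
    if (opts.terminalStatuses.has(result.status)) return result;
    if (now() >= deadline) {
      throw new Error(`Timed out after ${opts.timeoutMs} ms; last status: ${result.status}`);
    }
    await sleep(opts.intervalMs);
  }
}
```

Returning the terminal result (rather than throwing on a failure status) leaves it to the caller to surface details such as the captured request ID for FAILED runs.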
#26)
* feat: add EvaluationJob resource (schema, primitive, deploy hook, TUI, and tests)
  Phase 1 of EvalJobRunner: CRUD + deploy integration for the EvaluationJob control plane resource.
  - Schema: EvaluationJobSchema in agentcore.json, deployed state tracking
  - Primitive: EvaluationJobPrimitive with add/remove lifecycle
  - AWS client: SigV4-signed HTTP wrappers for EvalJob CP operations
  - Deploy: post-deploy hook creates/updates/deletes eval jobs imperatively
  - CFN outputs: parse eval job execution role ARN from stack outputs
  - TUI: add evaluation-job wizard flow + remove flow integration
  - Tests: 53 tests across schema, primitive, AWS client, deploy hook, and TUI
* feat: add `run evaluation-job` command with DP API wrappers and orchestration
  - Data plane API wrappers (RunEvaluationJob, GetEvaluationJobRun, ListEvaluationJobRuns) with SigV4 signing against bedrock-agentcore service
  - Orchestration: resolve job from deployed state, generate runId, start run, poll for completion, fetch results from CW Logs output group
  - CLI command: `agentcore run evaluation-job --job <name> --session-id <ids...>` with --json output and progress callbacks
  - Tests: 17 new tests covering DP wrappers, runId generation, orchestration (error handling, polling, CW Logs result parsing)
* feat: complete US1/US2 quick wins (run name, cancel, update, stage-aware endpoints)
  - Add --run flag to `run evaluation-job` for custom run name prefixes
  - Add `run cancel-evaluation-job` command with StopEvaluationJobRun DP API
  - Add `update evaluation-job` primitive method and CLI subcommands
  - Add `agentcore update experiment` parent command (backward-compatible)
  - Make CP/DP endpoints stage-aware via AGENTCORE_STAGE env var (beta/gamma/prod)
  - Fix beta SigV4 service name (bedrock-agentcore vs bedrock-agentcore-control)
  - Update AddEvaluationJobFlow success screen with next-steps guidance
* feat: add TUI run wizard, progress steps, and local result storage for eval jobs
  - Add RunEvalJobFlow TUI: select job → enter sessions → name run → confirm → execute
  - Add StepProgress display during eval job polling (starting → polling → fetching → saving)
  - Add elapsed time counter during run execution
  - Add eval-job-storage module: save/load/list run results per job in .cli/eval-job-results/
  - Auto-save results on both CLI and TUI paths
  - Add "Evaluation Job" option to TUI Run screen
  - Add 9 unit tests for eval-job-storage
* feat: add CloudWatch session discovery to eval job TUI wizard
  - Add source type picker: "Discover from CloudWatch" vs "Enter manually"
  - Add lookback days input (1-90 days) for CloudWatch discovery
  - Discover sessions via CW Insights query using agent's runtimeId
  - Multi-select from discovered sessions with span count + timestamps
  - Auto-fallback to manual entry when agent not deployed (no runtimeId)
  - Improve error display: show failed step in StepProgress before transitioning
* feat: migrate evaluation from resource CRUD to stateless batch evaluation
  Replace the old EvaluationJob resource model (create/update/delete via agentcore.json + deploy hooks) with a flat BatchEvaluation API model:
  - Add `run batch-evaluation` and `run stop-batch-evaluation` CLI commands
  - Add batch evaluation TUI wizard under the Run menu
  - Add SigV4 API client for batch eval endpoints (start/get/list/stop)
  - Add CloudWatch results fetching from outputDataConfig
  - Remove all old evaluation-job infrastructure: primitive, deploy hook, schema, TUI add/remove screens, CP CRUD operations
  - Remove evaluationJobs from agentcore.json schema
  Tested end-to-end on gamma (account 998846730471) with Builtin.Faithfulness evaluator against 3 agent sessions; all returned correct scores.
* chore: remove executionRoleArn now that FAS creds are live on gamma
  The batch evaluation API no longer requires an execution role ARN. Remove the --execution-role CLI option and all executionRoleArn plumbing from the API client and orchestration layer.
* Revert "chore: remove executionRoleArn now that FAS creds are live on gamma"
  This reverts commit f1706ff7ea4b7695d1466e609cde29e38cb00afb.
* refactor: move stop-batch-evaluation to top-level stop command
  Move `agentcore run stop-batch-evaluation` to `agentcore stop batch-evaluation` as a higher-level verb, consistent with pause/resume pattern.
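The 1-90 day lookback input for CloudWatch session discovery could be validated and turned into a query window as sketched below. `parseLookbackDays` and `lookbackWindow` are hypothetical helpers, not the wizard's actual code; the window is expressed in epoch seconds because that is the unit CloudWatch Logs `StartQuery` uses for `startTime`/`endTime`.

```typescript
// Hypothetical helpers for the wizard's lookback step.
const MIN_LOOKBACK_DAYS = 1;
const MAX_LOOKBACK_DAYS = 90;
const SECONDS_PER_DAY = 24 * 60 * 60;

// Accept only whole numbers of days within the 1-90 range.
function parseLookbackDays(input: string): number {
  const days = Number(input.trim());
  if (!Number.isInteger(days) || days < MIN_LOOKBACK_DAYS || days > MAX_LOOKBACK_DAYS) {
    throw new RangeError(
      `Lookback must be a whole number of days from ${MIN_LOOKBACK_DAYS} to ${MAX_LOOKBACK_DAYS}`,
    );
  }
  return days;
}

// Convert a day count into an [startTime, endTime] window in epoch seconds.
function lookbackWindow(days: number, nowMs: number = Date.now()): { startTime: number; endTime: number } {
  const endTime = Math.floor(nowMs / 1000);
  return { startTime: endTime - days * SECONDS_PER_DAY, endTime };
}
```

Validating at input time keeps an out-of-range value from ever reaching the Insights query.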
- Restore --days flag on `run eval` (was renamed to --lookback, breaking existing scripts)
- Restore onListCloudWatchTraces/onGetCloudWatchTrace handlers in browser-mode.ts from public/main
The AB test CLI flag was renamed from --gateway-arn to --gateway and made optional. Tests now use --runtime instead, matching config-bundle mode defaults.
jariy17 left a comment
Re-reviewed at HEAD 6e085f4. All previously flagged regressions are resolved:
- ✅ `--days` flag restored on `run eval`
- ✅ `onListCloudWatchTraces`/`onGetCloudWatchTrace` restored in `agentcore dev`
- ✅ `RESOURCE_SUFFIX` isolation restored in e2e import tests
- ✅ Version and `agent-inspector` dep back to 0.12.2 / 0.3.0
- ✅ `PRIVATE_DEV_DISTRO` config reverted
No regressions against the private repo baseline. The 4 issues flagged by agentcore-cli-automation (hardcoded amazonaws.com in recommendation/config-bundle wrappers, stale JSON schema, silent agentcore.json mutation on deploy, config bundle/AB test teardown leak) are separate functional issues worth addressing but not regressions from this PR's changes.
jariy17 left a comment
Updating review — the 4 issues flagged by agentcore-cli-automation are blocking and need to be addressed before merge.
1. Hardcoded amazonaws.com breaks non-commercial partitions
Files:
- `src/cli/aws/agentcore-recommendation.ts:228`
- `src/cli/aws/agentcore-config-bundles.ts:181`
Both hardcode https://bedrock-agentcore..amazonaws.com / https://bedrock-agentcore-control..amazonaws.com. The sibling wrappers in this same PR (agentcore-ab-tests.ts, agentcore-batch-evaluation.ts, agentcore-http-gateways.ts) correctly use dnsSuffix(region) from ./partition. Recommendations and config bundles will silently fail in GovCloud and China partitions.
Fix: import dnsSuffix from ./partition and replace the hardcoded literal in both files.
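For reference, a partition-aware endpoint builder might look like this. The `dnsSuffix` below is a hypothetical stand-in for the one in `./partition`; it only distinguishes the China and ISO partitions, and the real helper may cover more. Commercial and GovCloud regions fall through to `amazonaws.com`. `recommendationEndpoint` and `configBundlesControlEndpoint` are likewise illustrative names.

```typescript
// Hypothetical stand-in for dnsSuffix from ./partition: map a region name to
// its partition's DNS suffix.
function dnsSuffix(region: string): string {
  if (region.startsWith("cn-")) return "amazonaws.com.cn"; // aws-cn
  if (region.startsWith("us-isob-")) return "sc2s.sgov.gov"; // aws-iso-b
  if (region.startsWith("us-iso-")) return "c2s.ic.gov"; // aws-iso
  return "amazonaws.com"; // aws (commercial) and aws-us-gov
}

// Build base URLs from the region instead of a hardcoded amazonaws.com.
function recommendationEndpoint(region: string): string {
  return `https://bedrock-agentcore.${region}.${dnsSuffix(region)}`;
}

function configBundlesControlEndpoint(region: string): string {
  return `https://bedrock-agentcore-control.${region}.${dnsSuffix(region)}`;
}
```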
2. schemas/agentcore.schema.v1.json is stale
The Zod schemas now include configBundles, abTests, and httpGateways as top-level fields on AgentCoreProjectSpecSchema, but the committed JSON schema was not regenerated. Users whose editors validate agentcore.json against the published schema (VS Code, etc.) will see false "property not allowed" errors on every new preview field.
Fix: run npm run build:lib && npm run build:schema and commit the regenerated schemas/agentcore.schema.v1.json.
3. validateProject() silently rewrites agentcore.json on every deploy
File: src/cli/operations/deploy/preflight.ts
The deploy preflight injects type: "ConfigurationBundle" into config bundle entries and writes the file back with JSON.stringify(rawJson, null, 2). This runs on every agentcore deploy, producing surprise git diffs for users and clobbering their file's original formatting (tabs, trailing newlines, key order). The Zod ConfigBundleSchema already applies this default in-memory, so the write-back is unnecessary.
Fix options:
- Fix the CDK side to consume the Zod-parsed spec with defaults applied, and drop this patching code.
- Inject `type` only into the in-memory object the CDK reads at synth time, without persisting to disk.
- If persisting is truly required: only write when `patched === true` (already done), preserve the trailing newline, and print a warning to the user that their file was modified.
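The in-memory option could look roughly like this: apply the default to a copy and hand only the copy to synth, so neither the parsed object nor the file on disk changes. `RawProject`, `RawConfigBundle`, and `withConfigBundleDefaults` are hypothetical names, not the CLI's types.

```typescript
// Hypothetical shape of the raw agentcore.json fields relevant here.
interface RawConfigBundle {
  name: string;
  type?: string;
  [key: string]: unknown;
}

interface RawProject {
  configBundles?: RawConfigBundle[];
  [key: string]: unknown;
}

// Return a copy with type: "ConfigurationBundle" filled in where missing.
// The input object is never mutated and nothing is written back to disk.
function withConfigBundleDefaults(raw: RawProject): RawProject {
  if (!raw.configBundles) return raw;
  return {
    ...raw,
    configBundles: raw.configBundles.map((bundle) => ({
      ...bundle,
      type: bundle.type ?? "ConfigurationBundle",
    })),
  };
}
```

Because the defaulting happens on a copy, repeated `agentcore deploy` runs leave `agentcore.json` byte-for-byte unchanged.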
4. Config bundles and AB tests are leaked on stack teardown
performStackTeardown explicitly calls deleteHttpGatewayWithTargets for HTTP gateways before destroying the CFN stack, but there is no equivalent cleanup for config bundles or AB tests. When a user runs agentcore remove all + deploy, the CFN stack is destroyed but any config bundles and AB tests in deployed-state remain orphaned in AWS — silently accumulating charges with no CLI surface to clean them up.
Fix: extend performStackTeardown to iterate deployedState.targets[target].resources.configBundles and .abTests and delete them, mirroring what it already does for httpGateways.
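A minimal sketch of the suggested teardown extension, assuming deployed state stores each resource type as a name-keyed record of IDs. `DeployedResources`, `TeardownDeleters`, and `teardownImperativeResources` are hypothetical; deleting AB tests before config bundles is an assumption (an AB test in config-bundle mode presumably references bundles), and failures are collected so one stuck resource does not block the rest.

```typescript
// Hypothetical deployed-state shape: records keyed by resource name.
interface DeployedResources {
  configBundles?: Record<string, { bundleId: string }>;
  abTests?: Record<string, { abTestId: string }>;
}

// Hypothetical delete wrappers standing in for the real SigV4 API calls.
interface TeardownDeleters {
  deleteConfigBundle(bundleId: string): Promise<void>;
  deleteAbTest(abTestId: string): Promise<void>;
}

// Delete imperatively managed resources before the CloudFormation stack is
// destroyed. Errors are collected instead of thrown so that one failure does
// not leave the remaining resources orphaned.
async function teardownImperativeResources(
  resources: DeployedResources,
  deleters: TeardownDeleters,
): Promise<string[]> {
  const failures: string[] = [];
  // AB tests first, on the assumption that they may reference bundles.
  for (const [name, { abTestId }] of Object.entries(resources.abTests ?? {})) {
    try {
      await deleters.deleteAbTest(abTestId);
    } catch (err) {
      failures.push(`abTest ${name}: ${String(err)}`);
    }
  }
  for (const [name, { bundleId }] of Object.entries(resources.configBundles ?? {})) {
    try {
      await deleters.deleteConfigBundle(bundleId);
    } catch (err) {
      failures.push(`configBundle ${name}: ${String(err)}`);
    }
  }
  return failures;
}
```

The returned failure list gives the CLI something concrete to print, so a partially failed teardown is visible instead of silently leaking resources.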
The auto-generated gateway references --runtime, which must exist in the project. Remove noAgent:true and use project.agentName dynamically.
Follow up PR for addressing:
Summary
Adds preview support for the Evo feature set: config bundles, batch evaluation, recommendations, and AB testing.
Config Bundles [preview]
- `add config-bundle`: add versioned runtime configuration bundles
- `cb versions`: list version history for a bundle
- `cb diff`: diff two versions of a bundle
- `cb create-branch`: create a new branch on an existing bundle
- `--with-config-bundle` flag on agent creation auto-wires config bundle support
Batch Evaluation [preview]
- `run batch-evaluation`: run evaluators across all agent sessions in CloudWatch
- `stop batch-evaluation`: stop a running batch evaluation
- `[a-zA-Z][a-zA-Z0-9_]{0,47}`
Recommendations [preview]
- `run recommendation`: optimize system prompts or tool descriptions using agent traces
- `--runtime` flag for multi-component bundles
AB Testing [preview]
Other
- `agentcore add config-bundle` and `agentcore add ab-test`
- (`docs/config-bundles.md`, `docs/batch-evaluation.md`, `docs/recommendations.md`)
Companion PR
Test plan