Skip to content

fix: stop rewriting agentcore.json on deploy and clean up evo resources on teardown#1072

Merged
notgitika merged 1 commit intomainfrom
fix/deploy-preflight-teardown-v2
Apr 30, 2026
Merged

fix: stop rewriting agentcore.json on deploy and clean up evo resources on teardown#1072
notgitika merged 1 commit intomainfrom
fix/deploy-preflight-teardown-v2

Conversation

@notgitika
Copy link
Copy Markdown
Contributor

Summary

Two fixes for issues identified during #1068 review:

1. Deploy preflight no longer rewrites agentcore.json

validateProject() was injecting type: "ConfigurationBundle" into config bundle entries and writing the file back to disk on every agentcore deploy. This caused surprise git diffs and clobbered file formatting. The type discriminator is now applied in-memory only — the CDK reads the patched object at synth time without touching the user's file.

2. Config bundles and AB tests cleaned up on stack teardown

performStackTeardown already deleted HTTP gateways before destroying the CFN stack, but config bundles and AB tests were left orphaned in AWS. Now iterates deployedState.targets[target].resources.configBundles and .abTests and deletes them, matching the existing HTTP gateway pattern.

Closes #1070

Test plan

  • npm run typecheck passes
  • Pre-commit hooks pass (eslint, prettier, secretlint)
  • Manual: agentcore deploy with config bundles does not modify agentcore.json on disk
  • Manual: agentcore remove all + agentcore deploy cleans up config bundles and AB tests

@notgitika notgitika requested a review from a team April 30, 2026 22:54
@github-actions github-actions Bot added size/s PR size: S agentcore-harness-reviewing AgentCore Harness review in progress labels Apr 30, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 30, 2026

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 42.89% 8930 / 20820
🔵 Statements 42.15% 9476 / 22478
🔵 Functions 39.66% 1537 / 3875
🔵 Branches 39.86% 5742 / 14404
Generated in workflow #2253 for commit df4f480 by the Vitest Coverage Report Action

@agentcore-cli-automation
Copy link
Copy Markdown

Preflight change may not actually achieve its goal — CDK subprocess reads agentcore.json fresh from disk

The previous code patched the raw JSON on disk specifically because the CDK synth runs as a separate Node subprocess (node dist/bin/cdk.js spawned by @aws-cdk/toolkit-lib). That subprocess calls configIO.readProjectSpec() itself and reads the file fresh from disk — it never sees the in-memory object mutated in validateProject().

See src/assets/cdk/bin/cdk.ts:

const configIO = new ConfigIO({ baseDir: configRoot });
const spec = await configIO.readProjectSpec();  // fresh read from disk

So the PR description's claim — "the CDK reads the patched object at synth time without touching the user's file" — isn't accurate. The in-memory patch is dead code with respect to the CDK subprocess.

If the published @aws/agentcore-cdk schema genuinely requires the type discriminator without a default (which was the stated motivation of the original on-disk patch: "The CDK package now requires this discriminator during synthesis validation"), this PR reintroduces the synthesis failure for any agentcore.json written before the type field existed.

Also worth noting: the PR test plan has Manual: agentcore deploy with config bundles does not modify agentcore.json on disk unchecked, so the end-to-end path has not been verified.

A few ways to fix:

  1. Verify the CDK schema actually uses .default('ConfigurationBundle') (like src/schema/schemas/primitives/config-bundle.ts here does). If it does, the original write-to-disk was never needed and this PR is fine — but please confirm with a real deploy against an agentcore.json that has a config bundle without type and update the PR description accordingly.
  2. Keep the on-disk patch but preserve formatting — e.g., read the raw JSON, patch only if needed, and write back using the same indentation/newline style the file already had. This preserves the original fix while reducing spurious diffs. (Still churns the file once per user, but only on the first deploy after upgrade.)
  3. Migrate once on upgrade — detect missing type during agentcore startup or a dedicated migration step, patch on disk once with a clear log message, and never touch it again in validateProject. This makes the rewrite a user-visible, explicit migration rather than a silent per-deploy side effect.
  4. Push a default into the CDK package's schema so consumers don't need any patch.

@agentcore-cli-automation
Copy link
Copy Markdown

Teardown of AB tests is missing three things the normal removal flow does

The new teardown loop over abTests just calls deleteABTest(...) directly, but compare against deleteOrphanedABTests / the main AB test deletion path in src/cli/operations/deploy/post-deploy-ab-tests.ts:

  1. Running AB tests aren't stopped first. The orphan-cleanup path explicitly stops the test and polls for executionStatus === 'STOPPED' before calling delete, with the comment: "Stop the AB test first — running tests cannot be deleted" (post-deploy-ab-tests.ts:330). If a user runs remove all while an AB test is RUNNING, the delete call in teardown will fail and the test will be left orphaned in AWS — exactly the bug this PR is trying to fix.

  2. CLI-created IAM roles aren't deleted. ABTestDeployedStateSchema carries roleArn and roleCreatedByCli, and the normal deletion path calls deleteABTestRole(region, testState.roleArn) when roleCreatedByCli && roleArn. This PR ignores both fields, so tearing down leaves orphan IAM roles + inline policies in the account. The HTTP gateway cleanup in the same function does this correctly (passes roleArn / roleCreatedByCli into deleteHttpGatewayWithTargets) — AB tests should follow the same pattern.

  3. Ordering: AB tests are deleted after HTTP gateways. The header comment on deleteOrphanedABTests says: "AB tests create rules on HTTP gateways, so they must be deleted before the gateway can be deleted." The current loop order is gateways → bundles → tests, which is the opposite. This could cause the gateway delete to fail on accounts that still have AB test rules attached.

Suggested fix: either

  • Factor the existing orphan-cleanup logic (stop + poll + delete + role cleanup) into a shared helper like deleteABTestWithRole(...) (mirroring deleteHttpGatewayWithTargets) and call that from both deleteOrphanedABTests and teardown, and reorder the loops to do AB tests → HTTP gateways → config bundles, or
  • Have teardown delegate to deleteOrphanedABTests / deleteOrphanedHttpGateways directly by synthesizing an empty projectSpec (everything becomes "orphaned"), which reuses the well-tested paths without duplication.

The config bundle cleanup looks fine as-is — just deleteConfigurationBundle with no side resources to track.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Apr 30, 2026
…urces

- Preflight: remove config bundle type patching entirely — the CDK
  schema does not have a type field, so neither the on-disk write
  (original) nor the in-memory patch was needed
- Teardown: delegate to deleteOrphanedABTests and deleteOrphanedHttpGateways
  instead of raw deleteABTest/deleteHttpGatewayWithTargets — reuses
  stop/poll/delete/role-cleanup logic. Correct ordering: AB tests first
  (they create rules on gateways), then gateways, then config bundles.

Closes #1070
@notgitika notgitika force-pushed the fix/deploy-preflight-teardown-v2 branch from 65c6431 to df4f480 Compare April 30, 2026 23:08
@github-actions github-actions Bot added size/s PR size: S and removed size/s PR size: S labels Apr 30, 2026
@notgitika notgitika merged commit 0e38e9e into main Apr 30, 2026
24 checks passed
@notgitika notgitika deleted the fix/deploy-preflight-teardown-v2 branch April 30, 2026 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/s PR size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: deploy preflight silently rewrites agentcore.json and config bundles/AB tests leak on teardown

3 participants