refactor(infra): route backend ECS_TASK_DEFINITION through SSM (fix cross-stack lock) #329

Merged
prez2307 merged 1 commit into main from chore/ssm-task-def-arn on Apr 21, 2026

@prez2307
Contributor

Summary

Replaces the cross-stack Fn::ImportValue on the task-def revision ARN with an SSM parameter read via ecs.Secret.fromSsmParameter. Durable fix for the 2026-04-20 "export is in use" deploy lock.

The core problem

service-stack.ts imported container.openclawTaskDef.taskDefinitionArn, a value that includes the task-def revision and therefore changes on every image bump. CFN refuses to update or delete an export in the producer stack while a consumer stack still references it via Fn::ImportValue, so every attempt to bump prod.tag in prod rolled back.
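The lock can be illustrated with a toy model of the export rule. This is only a sketch of the behaviour described above, not CloudFormation's implementation; the export names, region, account ID, and task-def family are made up for illustration.

```typescript
// Toy model: a producer stack's exports, keyed by export name.
type Exports = Record<string, string>;

type UpdateResult =
  | { ok: true; exports: Exports }
  | { ok: false; reason: string };

// Mimics the rule from the text: an export that a consumer still imports
// may not change value or disappear in the producer's next template.
function applyProducerUpdate(
  live: Exports,
  next: Exports,
  imported: string[],
): UpdateResult {
  for (const exportName of imported) {
    if (exportName in live && live[exportName] !== next[exportName]) {
      return { ok: false, reason: `export ${exportName} is in use by a consumer stack` };
    }
  }
  return { ok: true, exports: { ...next } };
}

// Hypothetical ARN builder; only the trailing revision number varies.
const arn = (rev: number): string =>
  `arn:aws:ecs:us-east-1:123456789012:task-definition/openclaw:${rev}`;

// Old design: the exported value embeds the revision, so an image bump changes it.
const oldResult = applyProducerUpdate(
  { TaskDefArn: arn(12) },
  { TaskDefArn: arn(13) },
  ["TaskDefArn"],
);

// New design: the exported value is the parameter name, which stays the same.
const paramName = "/isol8/prod/openclaw-task-def-arn";
const newResult = applyProducerUpdate(
  { TaskDefArnParamName: paramName },
  { TaskDefArnParamName: paramName },
  ["TaskDefArnParamName"],
);

console.log(oldResult.ok, newResult.ok); // false true
```

In this model an update is rejected only when an imported export's value differs between the live and the next template, which is why exporting a string that never changes avoids the rollback.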

New design

  • Producer (container-stack.ts): writes the ARN into SSM at /isol8/{env}/openclaw-task-def-arn. Exposes the param (not the ARN) to consumers.
  • Consumer (service-stack.ts): moves ECS_TASK_DEFINITION from environment to secrets via ecs.Secret.fromSsmParameter(param). ECS resolves at task start, so a backend task refresh always sees the current revision. CDK auto-grants ssm:GetParameters to the execution role.
  • What the cross-stack import references now: the SSM parameter name — a stable string (/isol8/{env}/openclaw-task-def-arn). That doesn't change per task-def revision, so the lock can't recur.
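A minimal CDK sketch of the producer/consumer split described above, assuming aws-cdk-lib v2 and constructs are installed. It is not the actual diff from this PR: the class names, construct IDs, property names, and container images are hypothetical, and the real stacks take more props than shown here.

```typescript
import { App, Stack, StackProps } from "aws-cdk-lib";
import * as ecs from "aws-cdk-lib/aws-ecs";
import * as ssm from "aws-cdk-lib/aws-ssm";
import { Construct } from "constructs";

// Producer: owns the task definition and publishes its ARN to SSM.
class ContainerStack extends Stack {
  // Consumers receive the parameter, never the revision-embedded ARN.
  public readonly taskDefArnParam: ssm.StringParameter;

  constructor(scope: Construct, id: string, envName: string, props?: StackProps) {
    super(scope, id, props);

    const openclawTaskDef = new ecs.FargateTaskDefinition(this, "OpenclawTaskDef");
    openclawTaskDef.addContainer("openclaw", {
      // Placeholder image, for illustration only.
      image: ecs.ContainerImage.fromRegistry("example/openclaw:placeholder"),
    });

    // The parameter value tracks the ARN; the parameter name stays fixed.
    this.taskDefArnParam = new ssm.StringParameter(this, "OpenclawTaskDefArnParam", {
      parameterName: `/isol8/${envName}/openclaw-task-def-arn`,
      stringValue: openclawTaskDef.taskDefinitionArn,
    });
  }
}

interface ServiceStackProps extends StackProps {
  taskDefArnParam: ssm.IStringParameter;
}

// Consumer: the backend container gets the ARN as a secret resolved by ECS.
class ServiceStack extends Stack {
  constructor(scope: Construct, id: string, props: ServiceStackProps) {
    super(scope, id, props);

    const backendTaskDef = new ecs.FargateTaskDefinition(this, "BackendTaskDef");
    backendTaskDef.addContainer("backend", {
      // Placeholder image, for illustration only.
      image: ecs.ContainerImage.fromRegistry("example/backend:placeholder"),
      secrets: {
        // Listed under `secrets`, not `environment`, so ECS reads SSM at task start.
        ECS_TASK_DEFINITION: ecs.Secret.fromSsmParameter(props.taskDefArnParam),
      },
    });
  }
}

const app = new App();
const container = new ContainerStack(app, "isol8-dev-container", "dev");
new ServiceStack(app, "isol8-dev-service", {
  taskDefArnParam: container.taskDefArnParam,
});
app.synth();
```

Passing the parameter object between stacks, instead of the ARN string, is what keeps the revision out of the cross-stack reference in this sketch.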

Transitional exportValue

Prod-service's LIVE template still holds Fn::ImportValue on the old auto-generated export (previous deploys skipped the service-stack update). Without a workaround, CDK would emit a DELETE on that export in the new synth and CFN would reject it. I added this.exportValue(this.openclawTaskDef.taskDefinitionArn) in container-stack so the old export name persists through this transition. Remove in a cleanup PR once both envs have deployed past the drop.

Deploy flow expected

  • isol8-{env}-container deploys: keeps old export alive, adds SSM param + new auto-export for the param name.
  • isol8-{env}-service deploys: drops old Fn::ImportValue on the task-def ARN, adds new Fn::ImportValue on the SSM param name (stable string, immune to re-lock), puts ECS_TASK_DEFINITION in secrets.ValueFrom.

Follow-ups

  • PR-B (small): bump openclaw-version.json prod.tag back to "2026.4.5-bf9f699". With no consumer importing the task-def-revision export, container-stack can update the task-def freely and the SSM param value tracks the new ARN. New provisions land on the extended image with clawhub.
  • Cleanup PR: remove this.exportValue(...) once both envs have settled.
  • Fleet rollout: POST /container/updates with owner_id:"all" to queue banners for existing users.

Test plan

  • Local npx cdk synth is clean (verified; SSM param resource + new auto-export visible in cdk.out/assembly-prod/prodisol8prodcontainer*.template.json).
  • isol8-prod-container deploys — no UPDATE_ROLLBACK.
  • isol8-prod-service deploys — backend task-def env becomes secrets.ECS_TASK_DEFINITION.ValueFrom → SSM ARN.
  • Backend restart reads SSM; describe_task_definition with the ARN still resolves.
  • aws ssm get-parameter --name /isol8/prod/openclaw-task-def-arn returns the current task-def ARN.
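The last two checks amount to confirming that the value read from SSM is a full task-definition ARN with a revision suffix. A small validator could look like the sketch below. The function and type names are hypothetical, the backend's own language and parsing are not shown in this PR, and the pattern assumes the standard ECS ARN layout `arn:<partition>:ecs:<region>:<account>:task-definition/<family>:<revision>`.

```typescript
interface TaskDefArn {
  region: string;
  accountId: string;
  family: string;
  revision: number;
}

// Returns the parsed parts, or null when the value is not a
// revision-qualified ECS task-definition ARN.
function parseTaskDefArn(value: string): TaskDefArn | null {
  const pattern =
    /^arn:(aws[a-z-]*):ecs:([a-z0-9-]+):(\d{12}):task-definition\/([A-Za-z0-9_-]+):(\d+)$/;
  const match = pattern.exec(value.trim());
  if (!match) {
    return null;
  }
  return {
    region: match[2],
    accountId: match[3],
    family: match[4],
    revision: Number(match[5]),
  };
}

// Example with a made-up account ID and family name.
const parsed = parseTaskDefArn(
  "arn:aws:ecs:us-east-1:123456789012:task-definition/openclaw:13",
);
console.log(parsed ? parsed.revision : "not a task-def ARN"); // 13
```

A null result for the parameter name itself (for example `/isol8/prod/openclaw-task-def-arn`) is a quick way to catch a mix-up between the parameter's name and its value.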

🤖 Generated with Claude Code

…ross-stack lock)

Replaces the cross-stack Fn::ImportValue on the task-def revision ARN
with an SSM parameter that the backend reads via ecs.Secret.fromSsmParameter.
This is the durable fix for the 2026-04-20 "export is in use" deploy lock:
the Fn::ImportValue referenced a value that changed on every task-def
revision, so CFN blocked the producer's export update while the consumer
still imported the old value.

Changes:
- container-stack.ts: write the task-def ARN to
  /isol8/{env}/openclaw-task-def-arn on every deploy.
- service-stack.ts: drop ECS_TASK_DEFINITION from `environment`; add to
  `secrets` as Secret.fromSsmParameter. ECS resolves the SSM value at
  container start, so a backend task refresh always sees the current
  CDK-managed revision.
- isol8-stage.ts / local-stage.ts: wire the SSM param through the
  container props.

Transitional wart: container-stack calls this.exportValue() on the
task-def ARN to keep the auto-generated cross-stack export alive through
this deploy. Prod-service's LIVE template still holds an Fn::ImportValue
on it, and without exportValue() CDK would emit a delete that CFN
rejects (consumer still importing in live state). Remove the
exportValue() call in a cleanup PR after both dev and prod have
deployed past the import drop.

The SSM param exposes its NAME cross-stack (a stable string — the param
name itself, "/isol8/{env}/openclaw-task-def-arn"). That value never
changes across task-def revisions, so the new cross-stack reference
won't re-create the same lock.

Follow-up: bump openclaw-version.json prod.tag back to
"2026.4.5-bf9f699" in a separate PR. With no consumer importing the
task-def-revision export anymore, container-stack can freely update the
task-def resource and the SSM param value tracks the new ARN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
prez2307 merged commit 28256f2 into main on Apr 21, 2026
prez2307 added a commit that referenced this pull request Apr 21, 2026
Re-applies PR #323 now that PR #329 decoupled the task-def ARN from a
cross-stack Fn::ImportValue via SSM. container-stack can freely register
a new task-def revision with the extended OpenClaw image — the SSM param
value tracks the new ARN automatically, and no consumer imports the
revision-embedded export anymore.

After deploy:
- New provisions land on the extended image (clawhub baked in).
- Existing per-user services still launch from the task-def revisions
  they were registered against at provision time. Roll them forward via
  POST /container/updates with owner_id:"all" (banner + Update Now), or
  force-apply per-owner in a follow-up.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>