
Ensure build instance replacement on rack version updates #3780

Open
ntner wants to merge 1 commit into master from build-instance-update-replace


ntner commented Mar 18, 2026

## Summary

- Add `Version` parameter reference to `BuildLaunchTemplate` UserData to guarantee a new LaunchTemplate version is created on every rack update
- Eliminates a race condition where the rack API receives new TLS certificates before the build instance is replaced, causing a `tls: unknown certificate authority` error window during updates

## Background

During a rack version update, self-signed Docker TLS certificates are regenerated by a CloudFormation custom resource Lambda. The rack API ECS tasks restart with the new certificates quickly, but the build EC2 instance replacement depends on CloudFormation detecting a change in the `BuildLaunchTemplate` UserData (via custom resource output propagation). If that propagation chain doesn't produce a detectable change, the ASG rolling update is not triggered, leaving the build instance running with stale certificates.

Even when the rolling update IS triggered, the API tasks can restart with new certificates before the build instance replacement completes, creating a failure window where builds fail with:

```
remote error: tls: unknown certificate authority
```

## Change

One line added to `BuildLaunchTemplate` UserData in `provider/aws/formation/rack.json`:

```json
"# rack-version: ", { "Ref": "Version" }, "\n",
```

This is a cloud-config comment with no effect on instance boot behavior. Because `{ "Ref": "Version" }` resolves to a different value on every rack version update, the UserData content always changes. That forces CloudFormation to create a new LaunchTemplate version, which in turn triggers the `AutoScalingRollingUpdate` policy on the `BuildInstances` ASG.
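To make the chain above concrete, here is a minimal sketch of how the added line could sit inside the launch template's UserData and how an ASG could pick up each new template version. It is an illustration under assumptions, not an excerpt of `rack.json`: the `#cloud-config` header placement, the `MaxBatchSize`/`MinInstancesInService` values, wiring `Version` through `Fn::GetAtt` on `LatestVersionNumber`, and the sizes are all assumed, and the `Version` parameter declaration plus required properties such as subnets are omitted.

```json
{
  "Resources": {
    "BuildLaunchTemplate": {
      "Type": "AWS::EC2::LaunchTemplate",
      "Properties": {
        "LaunchTemplateData": {
          "UserData": { "Fn::Base64": { "Fn::Join": [ "", [
            "#cloud-config\n",
            "# rack-version: ", { "Ref": "Version" }, "\n",
            "# (existing cloud-config directives omitted in this sketch)\n"
          ] ] } }
        }
      }
    },
    "BuildInstances": {
      "Type": "AWS::AutoScaling::AutoScalingGroup",
      "UpdatePolicy": {
        "AutoScalingRollingUpdate": {
          "MaxBatchSize": 1,
          "MinInstancesInService": 0
        }
      },
      "Properties": {
        "LaunchTemplate": {
          "LaunchTemplateId": { "Ref": "BuildLaunchTemplate" },
          "Version": { "Fn::GetAtt": [ "BuildLaunchTemplate", "LatestVersionNumber" ] }
        },
        "MinSize": "1",
        "MaxSize": "1"
      }
    }
  }
}
```

In this sketch, the link that turns a UserData change into instance replacement is the ASG's `Version` reference: a new launch template version changes `LatestVersionNumber`, which changes the ASG's `LaunchTemplate` property, which is one of the property changes that makes CloudFormation apply the `AutoScalingRollingUpdate` policy. If the group pinned a fixed version number instead, a new template version alone would not roll the instances.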

ntner requested a review from nightfury1204 March 18, 2026 15:37
ntner added a commit that referenced this pull request Mar 26, 2026
ntner mentioned this pull request Mar 26, 2026