ci(workers): publish SHA-tagged workers image alongside :latest-workers#238

Merged
therealbrad merged 1 commit into main from ci/sha-tagged-workers-images
Apr 23, 2026
@therealbrad

Description

Publishes a SHA-pinned workers image tag on every build so the k8s Deployment can stop pointing at the floating :latest-workers tag.

Background. On 2026-04-22, rollback of the multitenant-workers crashloop was effectively impossible: every revision in kubectl rollout history pointed at the same tag (:latest-workers), and the broken image had already overwritten the good one in GHCR. "Undo" had nowhere to roll back to.

What this PR does — in .github/workflows/release.yml:

  • Derives SHORT_SHA=$(echo "${{ github.sha }}" | cut -c1-8) inside each build job.
  • Adds one extra tag to the workers image in all 4 build jobs (tag-push + manual-dispatch, amd64 + arm64):
    • ghcr.io/testplanit/testplanit:workers-sha-<short-sha>-<arch>
  • Adds a multi-arch manifest for the SHA tag in both merge jobs:
    • ghcr.io/testplanit/testplanit:workers-sha-<short-sha> (manifest list over amd64 + arm64)
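The tag scheme in the list above can be sketched in shell as follows. This is an illustration, not the contents of release.yml: the function names and the example SHA are hypothetical, and `docker buildx imagetools create` is only one way to assemble a manifest list (the workflow may use a different command). The manifest command is printed rather than run.

```shell
#!/usr/bin/env bash
# Sketch only: REPO comes from the PR text; every function name is hypothetical.
set -eu

REPO="ghcr.io/testplanit/testplanit"

# First 8 characters of a commit SHA, same approach as `cut -c1-8` in the PR.
short_sha() {
  printf '%s\n' "$1" | cut -c1-8
}

# Per-architecture tag added by each of the four build jobs.
arch_tag() {
  printf '%s:workers-sha-%s-%s\n' "$REPO" "$(short_sha "$1")" "$2"
}

# Arch-less tag that the merge jobs publish as a manifest list.
manifest_tag() {
  printf '%s:workers-sha-%s\n' "$REPO" "$(short_sha "$1")"
}

# Print (do not execute) one possible manifest-list command.
manifest_command() {
  printf 'docker buildx imagetools create -t %s %s %s\n' \
    "$(manifest_tag "$1")" "$(arch_tag "$1" amd64)" "$(arch_tag "$1" arm64)"
}

# GITHUB_SHA is set by GitHub Actions; the fallback is a made-up SHA for local runs.
manifest_command "${GITHUB_SHA:-0123456789abcdef0123456789abcdef01234567}"
```

Keeping the per-arch suffix separate from the manifest tag means a consumer only ever references the arch-less tag and lets the registry resolve the right architecture.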

Nothing is removed. :latest-workers and :${VERSION}-workers are still published and :latest-workers is still retagged by the existing "is this the newest semver" logic. This PR is purely additive.

What is NOT in this PR

The k8s Deployment manifest that references :latest-workers lives in the private-ops repo, not here (see the comment block in testplanit/k8s/multitenant-workers-rbac.yaml — it describes this split). A companion PR in private-ops is needed to change the Deployment's image: to :workers-sha-<short-sha>, parameterized per deploy.

Intermediate state while that PR is pending: this repo publishes the SHA tag but nothing consumes it yet. That's safe — the existing :latest-workers path is unchanged, so no behavior differs until ops flips the Deployment.
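For illustration, once ops consumes the SHA tag, pinning a deploy and rolling it back might look roughly like the sketch below. The container name `workers` and the fallback SHA are assumptions, and the private-ops repo may well apply the image through a manifest or templating step rather than `kubectl set image`. The commands are printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch only: CONTAINER is a hypothetical name; the real Deployment lives in private-ops.
set -eu

REPO="ghcr.io/testplanit/testplanit"
DEPLOYMENT="multitenant-workers"   # Deployment name as given in the PR text
CONTAINER="workers"                # assumption: actual container name is not stated

# Build the kubectl command that pins the Deployment to one SHA-tagged image.
pin_command() {
  printf 'kubectl set image deployment/%s %s=%s:workers-sha-%s\n' \
    "$DEPLOYMENT" "$CONTAINER" "$REPO" "$1"
}

# SHORT_SHA would come from the deploy tooling; the fallback is a made-up value.
pin_command "${SHORT_SHA:-01234567}"

# With a distinct tag per revision, undo has a different image to return to.
printf 'kubectl rollout undo deployment/%s\n' "$DEPLOYMENT"
```

Because each revision in the rollout history would then carry its own immutable tag, the undo step restores a different image instead of re-pulling the same floating one.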

Related Issue

Follow-up to hotfix c804cac (workers crashloop 2026-04-22 16:30 UTC). Third of three post-mortem follow-ups:

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Technically a CI/ops-infrastructure fix. No runtime code changes.

How Has This Been Tested?

  • Unit tests
  • Integration tests
  • E2E tests
  • Manual testing

The workflow file itself was validated:

  • YAML parses cleanly (js-yaml load, six jobs present).
  • All 4 build jobs set SHORT_SHA and add a workers-sha-${SHORT_SHA}-<arch> tag to the bake --set.
  • Both merge jobs create a multi-arch manifest list for :workers-sha-${SHORT_SHA}.
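A static check in the spirit of the list above could be approximated with a small grep helper. This is a hypothetical sketch, not the validation that was actually run: `count_matches` is an invented name, and the number of matching lines to expect depends on how release.yml is laid out, so no expected counts are asserted here.

```shell
#!/usr/bin/env bash
# Sketch only: count_matches is a hypothetical helper, not part of the repo.
set -eu

# Print how many lines of file $2 match pattern $1 (prints 0 when none match).
count_matches() {
  grep -c -e "$1" -- "$2" || true
}

WORKFLOW=".github/workflows/release.yml"

# Only report when run from a checkout that contains the workflow file.
if [ -f "$WORKFLOW" ]; then
  echo "SHORT_SHA assignments: $(count_matches 'SHORT_SHA=' "$WORKFLOW")"
  echo "workers-sha tag lines: $(count_matches 'workers-sha-' "$WORKFLOW")"
fi
```

A check like this only confirms that the strings are present; it does not prove that a tag is actually pushed, which is why the first real tag-push remains the meaningful test.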

The first real tag-push after this merges will publish the new tag as a side effect — that's the soonest a live run happens. The changes are additive and the tag-creation logic mirrors the established shape of the existing version/workers tags, so risk of breaking the current release path is low.

Test Configuration:

  • N/A — workflow file change only.

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (inline in the workflow header)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published
  • I have signed the CLA

Additional Notes

Tag format choice. Short 8-char SHA (e.g., workers-sha-c804cace) matches the example in the post-mortem and the convention elsewhere in the project. Full 40-char SHAs are unambiguous but unwieldy as image tags.

Why workers only, not production. The 2026-04-22 incident was workers-specific and the post-mortem scoped this PR to workers. The production image has the same failure mode in principle (:latest is also floating), but extending the SHA-tag convention to production is a separate decision with its own deploy-flow implications. Happy to open a follow-up if ops wants it.

Rebase note. #237 also touches release.yml (adding smoke-test steps to the same build jobs). The two PRs are logically independent; whichever merges first, the second needs a trivial rebase — edits are in adjacent-but-not-overlapping blocks.

Every workers image build now also publishes
`ghcr.io/testplanit/testplanit:workers-sha-<short-sha>` (8-char git SHA).
The existing :latest-workers alias is preserved for convenience, but
the multitenant-workers Deployment should pin to the SHA tag so
`kubectl rollout undo` has real history to roll back to — during the
2026-04-22 incident, rollback was impossible because every Deployment
revision pointed at the same floating :latest-workers tag, and the
broken image had already overwritten the good one in the registry.

Changes in .github/workflows/release.yml:
- build-amd64 / build-arm64 (tag-push): add
  --set workers.tags=...:workers-sha-${SHORT_SHA}-<arch>
- docker-manual-amd64 / docker-manual-arm64: same
- merge-manifests / docker-manual-merge: create multi-arch manifest
  ghcr.io/testplanit/testplanit:workers-sha-${SHORT_SHA}

Companion work (NOT in this PR, since the manifest lives elsewhere):
the multitenant-workers Deployment in the private-ops repo must be
updated to reference :workers-sha-<short-sha> instead of :latest-workers.
Until that PR lands, this repo publishes the SHA tag but nothing
consumes it — safe no-op in that intermediate state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@therealbrad therealbrad merged commit 0efca15 into main Apr 23, 2026
5 checks passed
@therealbrad therealbrad deleted the ci/sha-tagged-workers-images branch April 23, 2026 09:54
@therealbrad

🎉 This PR is included in version 0.22.8 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
