
Add DEPLOY.md — document deploy process and infrastructure #24

Merged
DominicBM merged 4 commits into main from add-deploy-docs on Apr 8, 2026

@DominicBM (Contributor) commented Apr 7, 2026

Summary

  • Documents the two-phase deploy process (ECR build via GH Actions → CodePipeline)
  • Describes blue/green CodeDeploy setup with auto-rollback
  • Documents 90s ALB slow-start on thumbnail-api-tg-blue and thumbnail-api-tg-green (prevents cold-start errors on new tasks)
  • References ~/bin/deploy-api-services as the preferred way to deploy both API services together in one maintenance window
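Phase 2 can be started from a shell once Phase 1 has pushed the image to ECR. The following is a minimal sketch. It assumes the AWS CLI is configured with access to us-east-1 and that the pipeline is named thumbnail-api-pipeline, as stated in the walkthrough. The helper name `start_thumbnail_pipeline` is hypothetical.

```bash
# Hypothetical helper: start Phase 2 (CodePipeline) after Phase 1 pushed the image.
# Assumes an authenticated AWS CLI; names come from this PR, not from a shared script.
PIPELINE_NAME="thumbnail-api-pipeline"
THUMB_REGION="us-east-1"

# Starts the pipeline and prints only the new execution ID on stdout.
start_thumbnail_pipeline() {
  aws codepipeline start-pipeline-execution \
    --name "$PIPELINE_NAME" \
    --region "$THUMB_REGION" \
    --query 'pipelineExecutionId' \
    --output text
}
```

You can capture the printed ID, e.g. `exec_id="$(start_thumbnail_pipeline)"`. That lets later checks poll this specific execution rather than whatever run the pipeline last recorded.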

Test plan

  • Verify infrastructure reference table matches current AWS config
  • Run a deploy using the documented manual steps and confirm they work

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive manual deployment guide for the Thumbnail API on AWS ECS Fargate. Covers manual build-and-publish of container images, image verification, triggering and monitoring pipeline deployments, observed runtime behaviors (blue/green traffic shifting with automatic rollback and ALB slow-start), post-deploy health-check expectations (HTTP 200) and an infrastructure reference table for deployment components.

@coderabbitai (bot) commented Apr 7, 2026

Walkthrough

Adds DEPLOY.md, a new manual two‑phase AWS ECS Fargate deployment guide for the DPLA Thumbnail API covering GitHub Actions image publish, ECR verification, manual CodePipeline start, CodeDeploy blue/green behavior, ALB health checks, and an infrastructure reference.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Deployment Documentation (DEPLOY.md) | Adds a manual two‑phase deployment guide: manual dispatch of ecr.yml to build/publish multi-arch images and upload Sentry source maps; verify images with aws ecr describe-images; start thumbnail-api-pipeline via aws codepipeline start-pipeline-execution; monitor stages with aws codepipeline get-pipeline-state; notes on CodeDeploy blue/green (ECSAllAtOnce) with automatic rollback, ALB target-group 90s slow start, curl health-check expectations and failure mappings; includes infrastructure reference (repos, workflow, ECR, pipeline/CodeBuild/CodeDeploy IDs, ECS cluster/service, task count, ALB target groups, deployment type, region). |
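`get-pipeline-state` reports the latest execution of each stage. Stages that the new run has not reached yet may therefore still show the previous run's result. The sketch below polls one execution until it reaches a terminal state. It assumes you have the execution ID returned by `start-pipeline-execution`. The helper name, poll interval, and poll limit are illustrative assumptions.

```bash
# Hypothetical helper: poll one CodePipeline execution until it finishes.
# PIPELINE_NAME is taken from this PR; THUMB_REGION is the region used in the runbook.
PIPELINE_NAME="thumbnail-api-pipeline"
THUMB_REGION="us-east-1"

# Usage: wait_for_pipeline <execution-id> [poll-seconds] [max-polls]
# Returns 0 on Succeeded, 1 on a failed/stopped/superseded/cancelled run
# or a CLI error, and 2 if max-polls is reached first.
wait_for_pipeline() {
  local exec_id="$1" interval="${2:-30}" max_polls="${3:-60}" status i=0
  while [ "$i" -lt "$max_polls" ]; do
    i=$((i + 1))
    status="$(aws codepipeline get-pipeline-execution \
      --pipeline-name "$PIPELINE_NAME" \
      --pipeline-execution-id "$exec_id" \
      --region "$THUMB_REGION" \
      --query 'pipelineExecution.status' \
      --output text)" || return 1
    echo "poll $i: $status" >&2
    case "$status" in
      Succeeded) return 0 ;;
      Failed|Stopped|Superseded|Cancelled) return 1 ;;
    esac
    # InProgress or Stopping: wait and poll again.
    sleep "$interval"
  done
  return 2
}
```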

Sequence Diagram(s)

sequenceDiagram
    participant Dev as Developer
    participant GH as GitHub Actions (ecr.yml)
    participant Sentry as Sentry
    participant ECR as Amazon ECR
    participant CP as AWS CodePipeline
    participant CD as AWS CodeDeploy (ECS)
    participant ECS as ECS Fargate
    participant ALB as ALB
    participant Client as Client

    Dev->>GH: manual dispatch "push to ecr: production"
    GH->>Sentry: upload source maps
    GH->>ECR: push multi-arch Docker image (tags: latest, branch, sha)
    GH->>Dev: post-build verification (aws ecr describe-images)
    Dev->>CP: aws codepipeline start-pipeline-execution (thumbnail-api-pipeline)
    CP->>ECR: fetch image/artifact metadata
    CP->>CD: trigger blue/green deploy (ECSAllAtOnce)
    CD->>ECS: deploy new task set (green)
    ECS->>ALB: register green targets (90s slow start)
    ALB->>ECS: health checks
    Client->>ALB: curl /health
    ALB-->>Client: 200 OK or 404/502/400 (failure indicators)

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the primary change: adding a DEPLOY.md file that documents the deployment process and infrastructure for the service. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


Documents the manual two-phase deploy (ECR build via GH Actions, then
CodePipeline), blue/green CodeDeploy setup, 90s ALB slow-start on both
target groups, and the preferred combined deploy script for deploying
alongside the DPLA API in one maintenance window.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai (bot) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DEPLOY.md`:
- Around line 94-95: Update the DEPLOY.md health-check text for the /thumb/*
endpoint: replace the phrase "Expect HTTP 200 or 302 (redirect to upstream image
URL)" with "Expect HTTP 200" and mention that failures may return 404, 502, or
400 depending on path/input; ensure the doc reflects that upstream redirects are
translated to 502 rather than returning 302 for /thumb/*.
- Around line 79-83: The doc claims specific deployment settings (ECSAllAtOnce,
5-minute blue task termination, 90-second slow start for
thumbnail-api-tg-blue/thumbnail-api-tg-green, and auto-rollback) but there is no
source-of-truth in code; update either by (a) adding a link to the authoritative
IaC or AWS resource (CloudFormation/Terraform stack or console permalink) with
last-verified date that shows these settings (referencing appspec_template.yaml
and the ALB target groups thumbnail-api-tg-blue/thumbnail-api-tg-green and
deployment config ECSAllAtOnce), or (b) change the text to state these were
"verified in AWS console on [date]" and note that appspec_template.yaml does not
enforce them; also apply the same change for the similar statements in the later
block (lines 110–112).
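This comment asks for a source of truth for the blue/green settings. One way to refresh a "verified on [date]" note is to read the settings back from AWS. The sketch below rests on stated assumptions. This PR does not give the CodeDeploy application or deployment-group names, so `CD_APP` and `CD_GROUP` are placeholders. The target-group names and region come from the runbook. The function names are hypothetical.

```bash
# Sketch: read back the blue/green settings the runbook documents.
# CD_APP and CD_GROUP are hypothetical placeholders for the real CodeDeploy names.
CD_APP="${CD_APP:-<codedeploy-application>}"
CD_GROUP="${CD_GROUP:-<deployment-group>}"
THUMB_REGION="us-east-1"

# Prints the deployment config (e.g. ECSAllAtOnce), auto-rollback settings,
# and the wait before the blue task set is terminated.
show_deployment_group_settings() {
  aws deploy get-deployment-group \
    --application-name "$CD_APP" \
    --deployment-group-name "$CD_GROUP" \
    --region "$THUMB_REGION" \
    --query 'deploymentGroupInfo.{config:deploymentConfigName,autoRollback:autoRollbackConfiguration,blueTerminationMinutes:blueGreenDeploymentConfiguration.terminateBlueInstancesOnDeploymentSuccess.terminationWaitTimeInMinutes}' \
    --output json
}

# Prints the slow_start.duration_seconds attribute of one target group, by name.
show_slow_start() {
  local tg_name="$1" tg_arn
  tg_arn="$(aws elbv2 describe-target-groups \
    --names "$tg_name" \
    --region "$THUMB_REGION" \
    --query 'TargetGroups[0].TargetGroupArn' \
    --output text)" || return 1
  aws elbv2 describe-target-group-attributes \
    --target-group-arn "$tg_arn" \
    --region "$THUMB_REGION" \
    --query "Attributes[?Key=='slow_start.duration_seconds'].Value" \
    --output text
}
```

To check both groups, run `for tg in thumbnail-api-tg-blue thumbnail-api-tg-green; do echo "$tg: $(show_slow_start "$tg")"; done`. If the documented slow start is still in place, it should print 90 for each group.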
- Fix health check: service never returns 302 (fetch follows redirects
  internally); document actual failure codes (404, 502, 400)
- Add verification dates for blue/green settings (ECSAllAtOnce, 5-min
  blue termination, 90s slow start, auto-rollback) — these are managed
  in AWS console, not enforced by appspec_template.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
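The status codes listed in this commit can be turned into a small lookup for the manual smoke test. This is a sketch only. The function name is hypothetical, and the messages paraphrase the mapping in the commit message.

```bash
# Sketch: translate a /thumb/* status code into the meaning documented in DEPLOY.md.
describe_thumb_status() {
  case "$1" in
    200) echo "ok" ;;
    404) echo "item not found, or upstream returned 404/410" ;;
    502) echo "other upstream error" ;;
    400) echo "malformed item ID" ;;
    # The service follows upstream redirects itself, so a 302 signals a regression.
    302) echo "unexpected: the service follows upstream redirects internally" ;;
    *)   echo "unexpected status $1" ;;
  esac
}
```

Pass it the code printed by `curl -s -o /dev/null -w '%{http_code}' "https://thumb.dp.la/thumb/f293d15b0515ac8a5478cbd9c02af79c"`.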
@DominicBM (Contributor, Author)

@coderabbitai full review

@coderabbitai (bot) commented Apr 8, 2026

✅ Actions performed

Full review triggered.

@DominicBM (Contributor, Author)

@coderabbitai full review

@coderabbitai (bot) commented Apr 8, 2026

✅ Actions performed

Full review triggered.

@coderabbitai (bot) left a comment

🧹 Nitpick comments (2)
DEPLOY.md (2)

89-95: Use /health as the primary readiness check; keep /thumb/* as functional smoke test.

The current check relies on a specific content ID that can legitimately change behavior over time, which can cause noisy false alarms after deploy.

Suggested doc tweak
+curl -sf "https://thumb.dp.la/health" | jq .
+# Expect HTTP 200
+
 curl -sf -o /dev/null -w "HTTP %{http_code}\n" \
   "https://thumb.dp.la/thumb/f293d15b0515ac8a5478cbd9c02af79c"
 
-Expect HTTP 200. Failures return 404 (item not found or upstream 404/410), 502 (other upstream error), or 400 (malformed ID). The service follows upstream redirects internally — 302 is never returned to clients.
+Then run the `/thumb/*` call as a functional smoke test.
+Expect HTTP 200 for a healthy hit; failures may return 404/502/400 depending on item/upstream/input. The service follows upstream redirects internally — 302 is never returned to clients.
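The readiness-then-smoke ordering suggested here can be scripted with retries. Retries help because new tasks ramp up during the 90-second slow start. The sketch assumes /health returns 200 once the service is ready, as the review suggests. The attempt count, delay, and function names are arbitrary illustrative choices.

```bash
# Sketch: poll /health until it returns 200, then run the /thumb/* smoke request once.
BASE_URL="${BASE_URL:-https://thumb.dp.la}"

# Prints only the HTTP status code; curl prints 000 when no response is received.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1" || true
}

# Usage: wait_for_health [attempts] [delay-seconds]; returns 0 once /health is 200.
wait_for_health() {
  local attempts="${1:-12}" delay="${2:-10}" code i=0
  while [ "$i" -lt "$attempts" ]; do
    i=$((i + 1))
    code="$(http_status "$BASE_URL/health")"
    [ "$code" = "200" ] && return 0
    echo "attempt $i: /health returned $code" >&2
    sleep "$delay"
  done
  return 1
}

# Readiness gates the run; the smoke result is reported, not enforced.
post_deploy_check() {
  wait_for_health "$@" || { echo "readiness: FAILED" >&2; return 1; }
  echo "smoke: HTTP $(http_status "$BASE_URL/thumb/f293d15b0515ac8a5478cbd9c02af79c")"
}
```

Running `post_deploy_check 12 10` waits about two minutes at most for readiness before it sends the smoke request. The smoke request is not a hard gate, because /thumb/* can fail for upstream reasons.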
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DEPLOY.md` around lines 89 - 95, Update the deployment docs to use /health as
the primary readiness check and keep the existing /thumb/* example as a
secondary functional smoke test: replace the curl readiness example that targets
the specific content ID URI
("https://thumb.dp.la/thumb/f293d15b0515ac8a5478cbd9c02af79c") with a curl
against /health, document that /health should return 200 for readiness, and
retain the current /thumb/* curl (and its expected HTTP codes/behaviour) as a
separate smoke-test paragraph so operators use /health for automated readiness
and /thumb/* only for manual functional verification.

11-15: Document where ~/bin/deploy-api-services comes from.

This path is machine-local and may not exist for other operators. Add a source/location (repo path or setup doc) to keep this runbook reproducible.

Suggested doc tweak
 ~/bin/deploy-api-services          # deploy both
 ~/bin/deploy-api-services thumb    # thumbnail-api only
 ~/bin/deploy-api-services api      # api only
+
+If this script is not installed locally, follow: <link to script/setup instructions>.
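Until the script's origin is documented, a wrapper can at least fail with a clear message when the script is missing. This is a sketch. The wrapper name is hypothetical, and the accepted arguments (none, thumb, api) mirror the usage lines above.

```bash
# Sketch: guard the machine-local deploy script before calling it.
# Accepted arguments follow the runbook: none (both services), "thumb", or "api".
run_deploy_api_services() {
  local script="$HOME/bin/deploy-api-services"
  case "${1:-}" in
    ""|thumb|api) ;;
    *) echo "usage: run_deploy_api_services [thumb|api]" >&2; return 2 ;;
  esac
  if [ ! -x "$script" ]; then
    echo "$script not found or not executable; see the setup instructions for this script" >&2
    return 1
  fi
  # Forward the argument only when one was given.
  "$script" ${1:+"$1"}
}
```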
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DEPLOY.md` around lines 11 - 15, The README currently references the
machine-local command `~/bin/deploy-api-services` without saying where that
script comes from; update DEPLOY.md to state the source/location of that script
(e.g., repository path or setup doc that installs it), include the repo file or
package name (or the setup script that places `deploy-api-services` into
`~/bin`), and add brief install or symlink instructions so operators can
reproduce `~/bin/deploy-api-services` on new machines (mention the exact
filename `deploy-api-services` and any required permissions or environment
variables).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@DominicBM (Contributor, Author)

@coderabbitai full review

@coderabbitai (bot) commented Apr 8, 2026

✅ Actions performed

Full review triggered.

@coderabbitai (bot) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
DEPLOY.md (1)

89-97: Add a deterministic /health check before /thumb/* smoke test.

Line 92’s /thumb/* check can fail for upstream/content reasons even when deployment is healthy. Recommend documenting /health as the primary readiness check, then /thumb/* as functional smoke coverage.

Suggested doc update
 ## Post-deploy health check
 
 ```bash
+curl -sf -o /dev/null -w "HTTP %{http_code}\n" "https://thumb.dp.la/health"
+```
+
+Expect HTTP 200.
+
+Functional smoke check:
+
+```bash
 curl -sf -o /dev/null -w "HTTP %{http_code}\n" \
   "https://thumb.dp.la/thumb/f293d15b0515ac8a5478cbd9c02af79c"
 ```

-Expect HTTP 200. Failures return 404 (item not found or upstream 404/410), 502 (other upstream error), or 400 (malformed ID). The service follows upstream redirects internally — 302 is never returned to clients.
+For /thumb/*, expect HTTP 200 on success. Failures return 404 (item not found or upstream 404/410), 502 (other upstream error), or 400 (malformed ID). The service follows upstream redirects internally — 302 is never returned to clients.


🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @DEPLOY.md:

  • Around line 39-48: The doc currently verifies the image using a mutable tag
    (imageTag=latest) which risks checking the wrong image; update the aws ecr
    describe-images invocation for repository "thumbnail-api" to use the immutable
    commit SHA tag (replace imageTag=latest with imageTag=<COMMIT_SHA>) and expand
    the --query to return pushedAt, imageDigest and imageTags (for example change
    --query 'imageDetails[0].imagePushedAt' to --query
    'imageDetails[0].{pushedAt:imagePushedAt,digest:imageDigest,tags:imageTags}') so
    the verification checks the exact image pushed by the workflow.

Nitpick comments:
In @DEPLOY.md:

  • Around line 89-97: Update DEPLOY.md to add a deterministic readiness check:
    document a primary curl against "https://thumb.dp.la/health" that should return
    HTTP 200, and keep the existing "/thumb/*" curl as a secondary functional smoke
    test; ensure the doc shows the /health check first with its expected 200
    response and then the /thumb/f293... example with the existing explanation of
    possible 404/502/400 outcomes and the note about upstream redirects.
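The SHA-based check described in this comment can be wrapped in a function so that the tag is always passed explicitly. This is a sketch. It assumes the ecr.yml run output shows the SHA tag exactly as pushed; this PR does not specify whether the tag is the full or short SHA. The function name is hypothetical.

```bash
# Sketch: describe the exact image pushed by the workflow, via its commit-SHA tag.
# The SHA must be copied from the ecr.yml run output (tag format is an assumption).
THUMB_REGION="us-east-1"

verify_image_by_sha() {
  local sha="$1"
  if [ -z "$sha" ]; then
    echo "usage: verify_image_by_sha <commit-sha>" >&2
    return 2
  fi
  # Returns push time, digest, and all tags so the image can be matched unambiguously.
  aws ecr describe-images \
    --repository-name thumbnail-api \
    --image-ids "imageTag=$sha" \
    --region "$THUMB_REGION" \
    --query 'imageDetails[0].{pushedAt:imagePushedAt,digest:imageDigest,tags:imageTags}' \
    --output json
}
```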

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@DominicBM (Contributor, Author)

@coderabbitai full review

@coderabbitai (bot) commented Apr 8, 2026

✅ Actions performed

Full review triggered.

@coderabbitai (bot) left a comment

♻️ Duplicate comments (1)
DEPLOY.md (1)

39-50: ⚠️ Potential issue | 🟠 Major

Use commit SHA as the default verification path (not latest).

The runbook still defaults to a mutable tag, which can validate the wrong artifact if another run updates latest before Phase 2. Make SHA-tag verification the primary command, and keep latest only as optional context.

Suggested doc update
-Verify the new image after the action completes:
+Verify the exact image after the action completes (use the workflow commit SHA tag):

 ```bash
 aws ecr describe-images \
   --repository-name thumbnail-api \
-  --image-ids imageTag=latest \
+  --image-ids imageTag=<COMMIT_SHA> \
   --region us-east-1 \
-  --query 'imageDetails[0].imagePushedAt' \
+  --query 'imageDetails[0].{pushedAt:imagePushedAt,digest:imageDigest,tags:imageTags}' \
   --output text
 ```

-The timestamp should match the current deployment. If you need to confirm the exact image (e.g. if another workflow may have updated latest concurrently), use the commit SHA tag from the workflow run output instead of imageTag=latest.
+Use latest only for a quick sanity check, not as deployment proof.
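To keep `latest` as a sanity check without trusting it blindly, compare its digest with the digest behind the commit-SHA tag. If they differ, `latest` no longer points at the image from this workflow run. This is a sketch with hypothetical function names. It uses the same repository and region as the runbook.

```bash
# Sketch: check whether `latest` still points at the image behind the commit-SHA tag.
THUMB_REGION="us-east-1"

# Prints the digest of the image carrying the given tag.
image_digest() {
  aws ecr describe-images \
    --repository-name thumbnail-api \
    --image-ids "imageTag=$1" \
    --region "$THUMB_REGION" \
    --query 'imageDetails[0].imageDigest' \
    --output text
}

# Usage: latest_matches_sha <commit-sha>; returns 1 if the digests differ.
latest_matches_sha() {
  local sha_digest latest_digest
  sha_digest="$(image_digest "$1")" || return 1
  latest_digest="$(image_digest latest)" || return 1
  if [ "$sha_digest" = "$latest_digest" ]; then
    echo "latest -> $sha_digest (same image as $1)"
  else
    echo "latest has moved: $latest_digest != $sha_digest" >&2
    return 1
  fi
}
```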


🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @DEPLOY.md:

  • Around line 39-50: Update the verification step so the default uses the
    immutable commit SHA tag instead of the mutable imageTag=latest: in the aws ecr
    describe-images command replace imageTag=latest with imageTag=<COMMIT_SHA>
    (sourced from the workflow run) and expand the --query from
    'imageDetails[0].imagePushedAt' to return pushedAt, imageDigest and imageTags
    (e.g.
    imageDetails[0].{pushedAt:imagePushedAt,digest:imageDigest,tags:imageTags});
    keep the imageTag=latest example only as an optional quick sanity check and
    update the surrounding text to state that SHA is the primary verification
    method.

@DominicBM DominicBM merged commit d83e479 into main Apr 8, 2026
6 checks passed
@DominicBM DominicBM deleted the add-deploy-docs branch April 8, 2026 20:51