feat: add dstack-ingress for custom domain TLS (CPL-118) #153
Add dstack-ingress to docker-compose.phala.yml behind a `custom-domain` profile so it is opt-in and does not affect current deployments. When activated, dstack-ingress terminates TLS inside the TEE with an attestation-bound certificate covering both the per-node domain and the shared api.dev.litprotocol.com ALIAS_DOMAIN (SAN). Uses Route 53 for DNS-01 ACME challenges. Blocked on dstack-examples PR #83 (ALIAS_DOMAIN support).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CPL-118 Add dstack-ingress to docker-compose.phala.yml
Add the dstack-ingress service to docker-compose.phala.yml. It uses the stock dstack-ingress image; Route 53 is natively supported for DNS-01 ACME challenges. Requires dstack-examples PR #83 to be merged for ALIAS_DOMAIN support.
We use ... dstack-ingress automatically handles: ...
PR #83 usage:
dstack-ingress:
  image: dstacktee/dstack-ingress:latest
  ports: ["443:443"]
  environment:
    DOMAIN: "${NODE_DOMAIN}"
    ALIAS_DOMAIN: "api.dev.litprotocol.com"
    DNS_PROVIDER: "route53"
    TARGET_ENDPOINT: "http://lit-api-server:8000"
    CERTBOT_EMAIL: "${CERTBOT_EMAIL}"
    SET_CAA: "true"
    AWS_ACCESS_KEY_ID: "${AWS_ACCESS_KEY_ID}"
    AWS_SECRET_ACCESS_KEY: "${AWS_SECRET_ACCESS_KEY}"
    # ROUTE53_INITIAL_WEIGHT intentionally NOT set; NLB handles routing
  volumes:
    - /var/run/dstack.sock:/var/run/dstack.sock
    - cert-data:/etc/letsencrypt
  restart: unless-stopped
Also add ...
Blocked on: dstack-examples PR #83 being merged (adds ALIAS_DOMAIN support to dstack-ingress).
cl4wb0t
left a comment
Review
Compose profile approach is the right call — keeps dstack-ingress opt-in without affecting existing deployments. A few things to flag:
1. image: dstacktee/dstack-ingress:latest — mutable tag
The rest of the compose file uses @sha256: digest-pinned images (the header comment calls this out as DR-1.1/DR-1.2). :latest breaks that provenance model. Fine while blocked on PR #83, but should be pinned to a digest before production.
2. ALIAS_DOMAIN hardcoded to api.dev.litprotocol.com
This won't work for chipotle-next (different domain). Should this be ${ALIAS_DOMAIN} passed via the deploy workflow, like NODE_DOMAIN?
3. Deploy workflow not updated
The deploy step does sed substitutions for image placeholders and GCP_PROJECT_ID. With the custom-domain profile, the workflow will eventually need to:
- Pass `--profile custom-domain` to activate the service
- Provide `NODE_DOMAIN`, `CERTBOT_EMAIL`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` as secrets
Probably intentional (blocked on PR #83), but worth noting.
4. No health check on dstack-ingress
If dstack-ingress takes time to get its cert from Let's Encrypt, the NLB could start routing :443 traffic before TLS is ready. A healthcheck on the container (e.g., checking nginx is listening or cert exists) might be worth adding.
5. cert-data volume 👍
Persists certs across container restarts so Let's Encrypt rate limits aren't hit on redeploys.
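For points 2 and 4, a rough sketch of what the service could look like, showing only the keys that would change. It is an illustration under assumptions, not something verified against the dstack-ingress image: it assumes the deploy workflow exports an `ALIAS_DOMAIN` variable alongside `NODE_DOMAIN`, that the image ships a shell, and that certbot stores the certificate in a directory named after `DOMAIN`.

```yaml
services:
  dstack-ingress:
    image: dstacktee/dstack-ingress:latest  # as in this PR; see point 1 about pinning to a digest
    profiles: ["custom-domain"]
    environment:
      DOMAIN: "${NODE_DOMAIN}"
      # Assumption: ALIAS_DOMAIN is supplied by the deploy workflow instead of being hardcoded
      ALIAS_DOMAIN: "${ALIAS_DOMAIN}"
    volumes:
      - cert-data:/etc/letsencrypt
    healthcheck:
      # Assumption: certbot's default layout, /etc/letsencrypt/live/<domain>/fullchain.pem.
      # "$$" escapes Compose interpolation so the shell inside the container expands DOMAIN.
      test: ["CMD-SHELL", "test -s \"/etc/letsencrypt/live/$${DOMAIN}/fullchain.pem\""]
      interval: 30s
      timeout: 5s
      retries: 5
      start_period: 120s

volumes:
  cert-data:
```

A file-existence check only shows that a cert was issued, not that nginx is serving it, so treat it as a starting point.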
cl4wb0t
left a comment
Correction on point 4: Ignore the health check comment — NLB health checks hit lit-api-server on :8000 directly (via /health), not through dstack-ingress on :443. dstack-ingress cert readiness doesn't affect health check availability.
cl4wb0t
left a comment
Strike the correction, point 4 stands. The whole point of dstack-ingress is that it's the front door — NLB routes real traffic through :443 via dstack-ingress. Health checks should verify the same path real requests take. If dstack-ingress is down but lit-api-server is healthy on :8000, bypassing ingress gives a false positive.
Once the custom-domain profile is active, the NLB health check needs to go through :443, which means dstack-ingress must be ready (cert issued, nginx listening) before the node is marked healthy. A container-level healthcheck that verifies this would prevent routing traffic to nodes where cert issuance is still in progress.
cl4wb0t
left a comment
Final correction on point 4: it's invalid, disregard. If the NLB health check goes through :443, the request traverses dstack-ingress before reaching /health. If dstack-ingress isn't up (no cert, nginx not listening), the TCP connection fails and the NLB marks the node unhealthy automatically. dstack-ingress being healthy is already an implicit prerequisite for the health probe to succeed — no separate container healthcheck needed.
The inline comment referenced the wrong ticket. The :latest tag didn't exist on Docker Hub and violated the file's own immutable-image policy; pin to 1.4@sha256 instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- `dstack-ingress` service to `docker-compose.phala.yml` behind `profiles: ["custom-domain"]` so current deployments are unaffected
- `NODE_DOMAIN` and shared `api.dev.litprotocol.com` (ALIAS_DOMAIN SAN)
- `cert-data` volume for Let's Encrypt cert persistence
- `NODE_DOMAIN`, `CERTBOT_EMAIL`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
Blocked on: dstack-examples PR #86 (ALIAS_DOMAIN support)
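A minimal sketch of the profile gating described above. The `lit-api-server` entry is abbreviated and its image placeholder name is an assumption, and the sha256 digest is left out of the dstack-ingress image reference; only the `profiles` key matters here.

```yaml
services:
  lit-api-server:
    image: "${DOCKER_IMAGE_LIT_API_SERVER}"  # hypothetical placeholder name
    # No profiles key: started by a plain `docker compose up`.

  dstack-ingress:
    image: dstacktee/dstack-ingress:1.4  # digest omitted in this sketch
    # Started only when the profile is enabled, e.g.
    # `docker compose --profile custom-domain up`.
    profiles: ["custom-domain"]
```

Compose skips services that declare a profile unless that profile is activated, so the second service is inert for existing deploys.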
Non-disruptive merge reasoning
This PR can be safely merged before R53 delegation and custom domain DNS are in place:
- `profiles: ["custom-domain"]` and will not start unless explicitly activated with `docker compose --profile custom-domain up`. Standard `docker compose up` (used by current Phala deploys) ignores profiled services entirely.
- `cert-data` volume is only mounted by dstack-ingress, which doesn't start without the profile.
- `sed` image substitution step only touches `${DOCKER_IMAGE_*}` placeholders; `dstack-ingress` uses a literal image reference.
Test plan
- `docker compose -f docker-compose.phala.yml config` validates without errors (only expected env var warnings)
- `docker compose up` (no profile) does not start dstack-ingress
- `docker compose --profile custom-domain up` starts dstack-ingress (once PR #86 lands and R53 is configured)
Resolves CPL-118
🤖 Generated with Claude Code