feat: add dstack-ingress for custom domain TLS (CPL-118) #153

Closed
cl4wb0t wants to merge 2 commits into next from feature/cpl-118-add-dstack-ingress-to-docker-composephalayml

cl4wb0t (Collaborator) commented Mar 20, 2026

Summary

  • Add dstack-ingress service to docker-compose.phala.yml behind profiles: ["custom-domain"] so current deployments are unaffected
  • dstack-ingress terminates TLS inside the TEE with an attestation-bound cert covering both the per-node NODE_DOMAIN and shared api.dev.litprotocol.com (ALIAS_DOMAIN SAN)
  • Uses Route 53 for DNS-01 ACME challenges (natively supported by dstack-ingress)
  • Adds cert-data volume for Let's Encrypt cert persistence
  • Documents new CVM secrets: NODE_DOMAIN, CERTBOT_EMAIL, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY

Blocked on: dstack-examples PR #86 (ALIAS_DOMAIN support)

Non-disruptive merge reasoning

This PR can be safely merged before R53 delegation and custom domain DNS are in place:

  1. Profile gating — dstack-ingress is behind profiles: ["custom-domain"] and will not start unless explicitly activated with docker compose --profile custom-domain up. Standard docker compose up (used by current Phala deploys) ignores profiled services entirely.
  2. No changes to existing services — lit-api-server, lit-actions, otel-collector, and lit-static are untouched.
  3. Volume addition is inert — the new cert-data volume is only mounted by dstack-ingress, which doesn't start without the profile.
  4. Deploy workflow unaffected — the sed image substitution step only touches ${DOCKER_IMAGE_*} placeholders; dstack-ingress uses a literal image reference.
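The profile gating described above can be sketched as a Compose fragment. This is a minimal illustration rather than the actual contents of docker-compose.phala.yml: the digest is elided and shown as a placeholder, and the other service fields are copied from the ticket's example.

```yaml
services:
  # Existing services (lit-api-server, lit-actions, otel-collector, lit-static)
  # carry no `profiles` key, so a plain `docker compose up` starts them as before.

  dstack-ingress:
    # Placeholder reference: the real digest is not reproduced here.
    image: dstacktee/dstack-ingress:1.4@sha256:<digest>
    # Services with a `profiles` key start only when that profile is activated,
    # e.g. `docker compose --profile custom-domain up`.
    profiles: ["custom-domain"]
    ports: ["443:443"]
    volumes:
      - /var/run/dstack.sock:/var/run/dstack.sock
      - cert-data:/etc/letsencrypt
    restart: unless-stopped

volumes:
  # Only dstack-ingress mounts this volume, so it stays unused
  # while the profile is inactive.
  cert-data:
```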

Test plan

  • docker compose -f docker-compose.phala.yml config validates without errors (only expected env var warnings)
  • Verify docker compose up (no profile) does not start dstack-ingress
  • Verify docker compose --profile custom-domain up starts dstack-ingress (once PR #86 lands and R53 is configured)

Resolves CPL-118

🤖 Generated with Claude Code

Add dstack-ingress to docker-compose.phala.yml behind a `custom-domain`
profile so it is opt-in and does not affect current deployments. When
activated, dstack-ingress terminates TLS inside the TEE with an
attestation-bound certificate covering both the per-node domain and the
shared api.dev.litprotocol.com ALIAS_DOMAIN (SAN). Uses Route 53 for
DNS-01 ACME challenges.

Blocked on dstack-examples PR #83 (ALIAS_DOMAIN support).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

linear bot commented Mar 20, 2026

CPL-118 Add dstack-ingress to docker-compose.phala.yml

Add dstack-ingress service to docker-compose.phala.yml with ALIAS_DOMAIN + Route 53 configuration.

Uses stock dstack-ingress image — Route 53 is natively supported for DNS-01 ACME challenges. Requires dstack-examples PR #83 to be merged for ALIAS_DOMAIN support.

PR #83 usage: We use ALIAS_DOMAIN for the SAN cert + nginx server_name + shared TXT append. We do NOT set ROUTE53_INITIAL_WEIGHT — the weighted CNAME routing is irrelevant since our NLB handles traffic distribution.

dstack-ingress automatically handles:

  • Cert issuance with both DOMAIN and ALIAS_DOMAIN as SANs (DNS-01 via Route 53)
  • nginx server_name for both domains
  • Per-node DNS record (nodeN.api.dev.litprotocol.com → Phala gateway CNAME)
  • Shared attestation TXT record append (_dstack-app-address.api.dev.litprotocol.com)

dstack-ingress:
  image: dstacktee/dstack-ingress:latest
  ports: ["443:443"]
  environment:
    DOMAIN: "${NODE_DOMAIN}"
    ALIAS_DOMAIN: "api.dev.litprotocol.com"
    DNS_PROVIDER: "route53"
    TARGET_ENDPOINT: "http://lit-api-server:8000"
    CERTBOT_EMAIL: "${CERTBOT_EMAIL}"
    SET_CAA: "true"
    AWS_ACCESS_KEY_ID: "${AWS_ACCESS_KEY_ID}"
    AWS_SECRET_ACCESS_KEY: "${AWS_SECRET_ACCESS_KEY}"
    # ROUTE53_INITIAL_WEIGHT intentionally NOT set — NLB handles routing
  volumes:
    - /var/run/dstack.sock:/var/run/dstack.sock
    - cert-data:/etc/letsencrypt
  restart: unless-stopped

Also add cert-data volume definition.

Blocked on: dstack-examples PR #83 being merged (adds ALIAS_DOMAIN support to dstack-ingress).

cl4wb0t (Collaborator, Author) left a comment

Review

Compose profile approach is the right call — keeps dstack-ingress opt-in without affecting existing deployments. A few things to flag:

1. image: dstacktee/dstack-ingress:latest — mutable tag

The rest of the compose file uses @sha256: digest-pinned images (the header comment calls this out as DR-1.1/DR-1.2). :latest breaks that provenance model. Fine while blocked on PR #83, but should be pinned to a digest before production.

2. ALIAS_DOMAIN hardcoded to api.dev.litprotocol.com

This won't work for chipotle-next (different domain). Should this be ${ALIAS_DOMAIN} passed via the deploy workflow, like NODE_DOMAIN?
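One way to parameterize this, sketched under the assumption that the deploy workflow can export an ALIAS_DOMAIN variable the same way it would export NODE_DOMAIN. The fallback default is an illustrative choice, not something the PR specifies:

```yaml
services:
  dstack-ingress:
    environment:
      DOMAIN: "${NODE_DOMAIN}"
      # Compose interpolation: use ALIAS_DOMAIN from the environment if it is
      # set and non-empty, otherwise fall back to the current dev domain.
      # Dropping the default (plain "${ALIAS_DOMAIN}") would force every
      # environment to set it explicitly.
      ALIAS_DOMAIN: "${ALIAS_DOMAIN:-api.dev.litprotocol.com}"
```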

3. Deploy workflow not updated

The deploy step does sed substitutions for image placeholders and GCP_PROJECT_ID. With the custom-domain profile, the workflow will eventually need to:

  • Pass --profile custom-domain to activate the service
  • Provide NODE_DOMAIN, CERTBOT_EMAIL, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY as secrets

Probably intentional (blocked on PR #83), but worth noting.

4. No health check on dstack-ingress

If dstack-ingress takes time to get its cert from Let's Encrypt, the NLB could start routing :443 traffic before TLS is ready. A healthcheck on the container (e.g., checking nginx is listening or cert exists) might be worth adding.
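For illustration only, a container-level check of the kind suggested here might look like the fragment below. It assumes the image ships a shell with `test`, and that certificates land in certbot's default `live/<domain>/` layout under the mounted /etc/letsencrypt volume; neither is confirmed for dstack-ingress. It checks only that a certificate file exists, not that nginx is listening.

```yaml
services:
  dstack-ingress:
    healthcheck:
      # `$$` escapes Compose interpolation, so the container's own shell
      # expands DOMAIN at check time. The path is an assumption (see above).
      test: ["CMD-SHELL", "test -s /etc/letsencrypt/live/$${DOMAIN}/fullchain.pem"]
      interval: 30s
      timeout: 5s
      retries: 10
      # Grace period for the initial DNS-01 issuance; the value is a guess.
      start_period: 120s
```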

5. cert-data volume 👍

Persists certs across container restarts so Let's Encrypt rate limits aren't hit on redeploys.

cl4wb0t (Collaborator, Author) left a comment

Correction on point 4: Ignore the health check comment — NLB health checks hit lit-api-server on :8000 directly (via /health), not through dstack-ingress on :443. dstack-ingress cert readiness doesn't affect health check availability.

cl4wb0t (Collaborator, Author) left a comment

Strike the correction, point 4 stands. The whole point of dstack-ingress is that it's the front door — NLB routes real traffic through :443 via dstack-ingress. Health checks should verify the same path real requests take. If dstack-ingress is down but lit-api-server is healthy on :8000, bypassing ingress gives a false positive.

Once the custom-domain profile is active, the NLB health check needs to go through :443, which means dstack-ingress must be ready (cert issued, nginx listening) before the node is marked healthy. A container-level healthcheck that verifies this would prevent routing traffic to nodes where cert issuance is still in progress.

cl4wb0t (Collaborator, Author) left a comment

Final correction on point 4: it's invalid, disregard. If the NLB health check goes through :443, the request traverses dstack-ingress before reaching /health. If dstack-ingress isn't up (no cert, nginx not listening), the TCP connection fails and the NLB marks the node unhealthy automatically. dstack-ingress being healthy is already an implicit prerequisite for the health probe to succeed — no separate container healthcheck needed.

The inline comment referenced the wrong ticket. The :latest tag
didn't exist on Docker Hub and violated the file's own immutable-image
policy — pin to 1.4@sha256 instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>