fix(web): proxy /health through nginx to the api pod #13

Merged
acedatacloud-dev merged 1 commit into main from fix/nginx-health-route on May 4, 2026

Conversation

@acedatacloud-dev

Caught while running the live stack: curl http://localhost:8080/health returned the SPA bundle instead of the JSON liveness payload because the nginx config didn't cover /health. Probes that compare body shape catch this; status-code-only probes don't. This PR adds an exact-match location = /health proxy. Verified locally: the JSON status payload is now returned.

Caught while verifying the stack end-to-end:

  $ curl http://localhost:8080/health
  <!doctype html>...   <- SPA fallback returned the index page

The nginx config covered /api, /mcp, and /.well-known but NOT /health,
which meant any LB / ingress / curl probing /health on the web pod
got the SPA bundle and a misleading 200. Probes that compare body
shape (JSON vs HTML) catch this; probes that only look at status
code don't.

Add an exact-match `location = /health` that proxies to the api
service. Ad-hoc curl + ingress probes against the web pod now both
return the JSON status payload.
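
A minimal sketch of what the added block looks like; the upstream host and
port (api:8000 here) are assumptions for illustration, not a quote of the
merged nginx.conf:

  # Exact match beats the catch-all `location /` SPA fallback.
  location = /health {
      # Assumed upstream: the api service on port 8000.
      proxy_pass http://api:8000/health;
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }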
acedatacloud-dev merged commit 917fe3b into main on May 4, 2026
1 check passed
acedatacloud-dev deleted the fix/nginx-health-route branch on May 4, 2026, 18:08
acedatacloud-dev added a commit that referenced this pull request on May 4, 2026
…site is live (#14)

PRs #11, #12, and #13 shipped manifests modeled on a generic K8s setup.
None of them actually fit the AceDataCloud TKE cluster + nginx-router
ingress + wildcard-cert conventions, so when the user opened
https://x402guard.acedata.cloud/ they got a "Kubernetes Ingress
Controller Fake Certificate" and a 404 (the LB had no rule for the host).

This PR aligns everything with the platform's conventions and the site
is now live at https://x402guard.acedata.cloud/ with a real Let's
Encrypt cert from the existing tls-wildcard-acedata-cloud secret.

Conventions adopted (matching Wisdom + Nexior + MCPs/* in this org):

  namespace                 acedatacloud (was: x402guard)
  ingress class             annotation kubernetes.io/ingress.class:
                            nginx-router (was: ingressClassName: nginx)
  TLS secret                tls-wildcard-acedata-cloud, already in the
                            cluster, signed *.acedata.cloud (was:
                            x402guard-tls + cert-manager annotation)
  image-pull secret         docker-registry, already in the namespace
                            (was: missing imagePullSecrets entirely)
  build tag                 ${TAG} substituted by sed in deploy/run.sh
                            (was: __BUILD__)
  service names             x402guard-api / x402guard-web — qualified
                            with project prefix to avoid colliding with
                            other tenants in acedatacloud namespace
                            (was: api / web)
  storage class             cbs-ssd (WaitForFirstConsumer, 10Gi minimum)
                            (was: cbs default — fails to bind because
                            cbs is Immediate-binding zone-pinned)
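
A trimmed Ingress sketch illustrating those conventions; the object name,
service port, and abbreviated path list are assumptions, not the full
merged manifest:

  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: x402guard                              # assumed object name
    namespace: acedatacloud
    annotations:
      kubernetes.io/ingress.class: nginx-router  # annotation, not ingressClassName
  spec:
    tls:
      - hosts: [x402guard.acedata.cloud]
        secretName: tls-wildcard-acedata-cloud   # existing wildcard secret
    rules:
      - host: x402guard.acedata.cloud
        http:
          paths:
            - path: /api
              pathType: Prefix
              backend:
                service:
                  name: x402guard-api
                  port:
                    number: 8000                 # assumed service port
            # /mcp, /.well-known, /health, and / follow the same pattern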

What changes:

  deploy/production/
    namespace.yaml             DELETED (use existing acedatacloud ns)
    configmap.yaml             DELETED (env values inlined into Deployment)
    api.yaml                   namespace + names + imagePullSecrets +
                               annotation; ${TAG} placeholder
    web.yaml                   same
    ingress.yaml               nginx-router annotation;
                               tls-wildcard-acedata-cloud;
                               5 path rules (/api, /mcp, /.well-known,
                               /health, /) all on a single Ingress
    postgres.yaml              NEW — single-replica StatefulSet on cbs-ssd
                               with a 10Gi PVC. POSTGRES_PASSWORD reads
                               from the same x402guard-secrets the api
                               consumes. Cluster has no shared Postgres
                               so x402guard hosts its own.

  deploy/run.sh                Substitutes ${TAG} with $BUILD_NUMBER via sed,
                               applies the four YAMLs in order, waits for the
                               rollouts, and probes /health. Bails clearly if
                               the secret is missing (see the sketch after
                               this list).

  docker-compose.yaml          Service names renamed
                               api -> x402guard-api / web -> x402guard-web
                               so the nginx upstream `x402guard-api`
                               works in both docker-compose and K8s
                               without separate configs.

  web/deploy/nginx.conf        proxy_pass updated to http://x402guard-api:8000
                               in all 4 locations.
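
A sketch of the deploy/run.sh flow described above; the manifest file names,
deployment/secret names, and timeouts are assumptions, not the script as
merged:

  #!/usr/bin/env bash
  set -euo pipefail

  TAG="${BUILD_NUMBER:?BUILD_NUMBER must be set}"

  # Bail early if the shared secret the api consumes is missing.
  kubectl -n acedatacloud get secret x402guard-secrets >/dev/null \
    || { echo "x402guard-secrets not found in namespace acedatacloud" >&2; exit 1; }

  # Substitute the ${TAG} placeholder and apply the four manifests in order.
  for f in postgres api web ingress; do
    sed "s/\${TAG}/${TAG}/g" "deploy/production/${f}.yaml" | kubectl apply -f -
  done

  kubectl -n acedatacloud rollout status deploy/x402guard-api --timeout=180s
  kubectl -n acedatacloud rollout status deploy/x402guard-web --timeout=180s

  # Probe the live endpoint; a non-JSON (SPA) body or non-2xx fails the deploy.
  curl -fsS https://x402guard.acedata.cloud/health | grep -q '"status":"ok"'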

Live verification (against https://x402guard.acedata.cloud/):

  $ curl -sS https://x402guard.acedata.cloud/health
    {"status":"ok","version":"0.1.0"}
  $ curl -sS https://x402guard.acedata.cloud/.well-known/x402guard
    {"service":"x402guard","version":"0.1.0","cluster":"mainnet",
     "agent_vault_program_id":"5s9rscxc...","usdc_mint":"EPjFWdd5..."}
  $ curl -sS https://x402guard.acedata.cloud/ | grep '<title>'
    <title>x402guard - Solana-native AI agent wallets</title>
  $ openssl s_client ... | openssl x509 -noout -subject -issuer
    subject=CN=acedata.cloud
    issuer=Let's Encrypt E8

  Pods (kubectl -n acedatacloud get pods -l app=x402guard):
    x402guard-api-79c7d796b7-cdlpd   1/1 Running
    x402guard-api-79c7d796b7-f9mpc   1/1 Running
    x402guard-postgres-0             1/1 Running
    x402guard-web-5869d7cd49-29772   1/1 Running
    x402guard-web-5869d7cd49-zvgcb   1/1 Running

Bugs caught while bringing the cluster live (not in this PR but worth
recording so the next deploy doesn't hit them again):

  - Initial image push was darwin/arm64 because docker compose build
    uses the host arch on macOS. The cluster is amd64, so the pods hit
    CrashLoopBackOff with "exec format error". Fix: use
    docker buildx --platform linux/amd64 (see the sketch after this
    list). The CI workflow .github/workflows/deploy.yaml already does
    this via docker/build-push-action, which defaults to linux/amd64,
    but the local-deploy fallback path needs the explicit platform flag.

  - cbs storage class is Immediate-binding zone-pinned and our cluster
    happened to have no spare capacity in the picked zone, so PVCs
    stayed Pending. cbs-ssd uses WaitForFirstConsumer and binds in
    the same zone the pod actually scheduled into.

  - cbs-ssd minimum disk size is 10Gi (Tencent Cloud limit). 5Gi
    requests fail with "disk size is invalid. Must in [10, 32000]".
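
For the local-deploy fallback, something along these lines forces the right
platform (registry path, tag, and build context are placeholders, not the
project's real coordinates):

  # Build and push an amd64 image from an arm64 macOS workstation.
  docker buildx build \
    --platform linux/amd64 \
    -t "${REGISTRY}/x402guard-api:${TAG}" \
    --push \
    ./api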

Out of scope:
  - The CI workflow .github/workflows/deploy.yaml doesn't run yet
    (DEPLOY_TO_K8S repo var unset). This first deploy was driven from
    a workstation using the kubeconfig pulled via .claude/scripts/tke.py.
    Subsequent deploys will go through CI once the cluster credentials
    are loaded into the GHCR-secrets vault.