feat(env): build-time secret injection + effect/Config consumer reads #26
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -38,12 +38,34 @@ const program = Effect.gen(function* () { | |
| roleName: `${PROJECT}-${SERVICE}-owner`, | ||
| }); | ||
|
|
||
| // Forward the runtime secrets we just decrypted via `loadDeployEnv` into | ||
| // the Cloudflare Worker's environment. These are ALREADY decrypted at | ||
| // deploy time (the `loadDeployEnv("web", appEnv)` call above pulls the | ||
| // per-app SOPS payload + the deploy scope into `process.env` of the | ||
| // deploy process). Forwarding them here makes Cloudflare store each as a | ||
| // Worker secret on the deployed script, so every Worker isolate boots | ||
| // with `process.env.BETTER_AUTH_SECRET` already populated — no per- | ||
| // isolate SOPS decrypt cost on the cold path. | ||
| // | ||
| // Polar values default to `""` so a missing-secret deploy still boots: | ||
| // consumer code treats empty as "feature disabled" (`polarClient` stays | ||
| // null, webhook plugin not mounted). | ||
| // | ||
| // See `docs/adr/0003-build-time-env-injection-with-effect-config.md` | ||
| // (which supersedes the runtime-decrypt approach in ADR 0001). | ||
| const website = yield* Cloudflare.Vite("TanstackStart", { | ||
| compatibility: { | ||
| flags: ["nodejs_compat"], | ||
| }, | ||
| env: { | ||
| DATABASE_URL: db.connectionUri, | ||
| BETTER_AUTH_SECRET: process.env.BETTER_AUTH_SECRET ?? "", | ||
| POLAR_ACCESS_TOKEN: process.env.POLAR_ACCESS_TOKEN ?? "", | ||
| POLAR_WEBHOOK_SECRET: process.env.POLAR_WEBHOOK_SECRET ?? "", | ||
| POLAR_PRO_PRODUCT_ID_PRODUCTION: | ||
| process.env.POLAR_PRO_PRODUCT_ID_PRODUCTION ?? "", | ||
| POLAR_FREE_PRODUCT_ID_PRODUCTION: | ||
| process.env.POLAR_FREE_PRODUCT_ID_PRODUCTION ?? "", | ||
|
Comment on lines +65 to +68
Defaulting … |
||
| }, | ||
| }); | ||
| let url: Output.Output<string | undefined> = website.url; | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,203 @@ | ||
| # 0001 — Runtime secrets are decrypted via `@gen/env`, not forwarded as Worker env vars | ||
|
|
||
| - **Status**: Superseded by [0003](./0003-build-time-env-injection-with-effect-config.md) | ||
| - **Date**: 2026-05-01 | ||
|
|
||
| > **Note (2026-05-01, same day):** the implementation described below was | ||
| > proposed but never landed on `main`. Branch `fix/wire-shared-runtime-env` | ||
| > (PR #24) carried it as commits `7f83faa8` and `51e65bfc`; both were | ||
| > reverted before merge in favour of the build-time env injection approach | ||
| > documented in [ADR 0003](./0003-build-time-env-injection-with-effect-config.md). | ||
| > The body below is preserved as-written for the historical record of the | ||
| > design we considered and rejected. | ||
|
|
||
| ## Context | ||
|
|
||
| The waitlist join endpoint on `stackpanel.com` was crashing in | ||
| production with HTTP 500: | ||
|
|
||
| ``` | ||
| You are using the default secret. Please change it. | ||
| ``` | ||
|
|
||
| The crash originated inside `better-auth`'s `validateSecret` and | ||
| surfaced on every tRPC call (waitlist included), because | ||
| `createTRPCContext` eagerly reads `opts.auth.api.getSession(...)`. | ||
| Investigation (see commit `8a7897c6`) found that `BETTER_AUTH_SECRET` | ||
| and the four Polar secrets (`POLAR_ACCESS_TOKEN`, | ||
| `POLAR_WEBHOOK_SECRET`, `POLAR_PRO_PRODUCT_ID_PRODUCTION`, | ||
| `POLAR_FREE_PRODUCT_ID_PRODUCTION`) were declared in | ||
| `.stack/config.apps.nix:envs.shared` with `required = false` and **no | ||
| SOPS source**. As a result, `stackpanel codegen build` rendered | ||
| `"BETTER_AUTH_SECRET": ""` into every per-stage payload at | ||
| `packages/gen/env/data/<env>/web.sops.json`. Even after we wired the | ||
| SOPS sources, the payloads remained dead code in the web Worker | ||
| because nobody was decrypting them at runtime. | ||
|
|
||
| Two paths were available to fix this: | ||
|
|
||
| 1. **Forward secrets via `Cloudflare.Vite({ env: { ... } })`** — read | ||
| the values from `process.env` (populated at deploy time by | ||
| `loadDeployEnv` reading the deploy scope) and shovel each one into | ||
| the Cloudflare Worker's environment as a Worker secret. This is | ||
| what commit `21c00841` did and what the original draft of this ADR | ||
| reverted. | ||
| 2. **Decrypt the embedded SOPS payload at Worker boot** via the | ||
| existing `@gen/env/runtime` loader — give the Worker only the AGE | ||
| key material and let it decrypt the rest. | ||
|
|
||
| Approach (1) was characterised at the time as duplicating secret | ||
| material (Cloudflare's secret store *and* the embedded SOPS payload), | ||
| requiring every new secret to be added in two places | ||
| (`.stack/config.apps.nix` *and* `apps/web/alchemy.run.ts`), and bypassing | ||
| the very codegen pipeline `@gen/env` was designed to be the single source | ||
| of truth for. It also made each new secret a deploy-script edit rather | ||
| than a config-only change. | ||
|
|
||
| Approach (2) was already 90% built: the per-app SOPS payload is | ||
| embedded in `packages/gen/env/src/runtime/generated-payloads/web/{dev,staging,prod}.ts`, | ||
| and `nix/stackpanel/lib/codegen/templates/env/loader.ts` is an | ||
| edge-safe loader (no FileSystem/ChildProcess dependency) that reads | ||
| ciphertext + `process.env.SOPS_AGE_KEY` and produces a decrypted | ||
| payload it can inject into `process.env`. It just wasn't wired into | ||
| the web Worker's boot path. | ||
|
|
||
| ## Decision (superseded — see ADR 0003) | ||
|
|
||
| Workers receive only `SOPS_AGE_KEY` (and a non-secret `APP_ENV` | ||
| discriminator) at deploy time. All other application secrets are | ||
| decrypted **inside the Worker** on boot via: | ||
|
|
||
| ```ts | ||
| // apps/web/src/server.ts | ||
| import { loadAppEnv } from "@gen/env/runtime/edge"; | ||
|
|
||
| const appEnv = process.env.APP_ENV ?? process.env.STAGE ?? "dev"; | ||
|
|
||
| if (process.env.SOPS_AGE_KEY) { | ||
| await loadAppEnv("web", appEnv, { inject: true }); | ||
| } | ||
| ``` | ||
|
|
||
| The `@gen/env` package gains a new `./runtime/edge` export that maps | ||
| to `loader.ts` (the edge-safe loader). The existing `./runtime` | ||
| export — backed by `node-loader.ts` — keeps its FileSystem + | ||
| ChildProcessSpawner dependencies for use from `apps/*/alchemy.run.ts` | ||
| and other Node/Bun entrypoints. | ||
|
|
||
| Two changes complement the wiring: | ||
|
|
||
| 1. **`@stackpanel/auth` is now lazy.** The `betterAuth({...})` call is | ||
| moved into a `buildAuth()` function called by a `Proxy`-backed | ||
| `auth` export. The first property access on `auth` builds and | ||
| caches the instance. This guarantees that if the import chain | ||
| `routeTree.gen.ts → routes/api/trpc.$.ts → @stackpanel/auth` | ||
| resolves before the SSR entrypoint's top-level `await loadAppEnv` | ||
| fires (which can happen depending on bundler module ordering), | ||
| `betterAuth` is *not* called yet — and by the time the request | ||
| handler actually touches `auth.api`, the env load is complete. | ||
|
|
||
| 2. **The web Worker env in `apps/web/alchemy.run.ts` shrinks.** It | ||
| keeps `DATABASE_URL` (a runtime-bound resource output from the | ||
| Neon project, not a SOPS payload entry), and adds `SOPS_AGE_KEY` | ||
| and `APP_ENV`. The five forwarded secrets from commit `21c00841` | ||
| are removed. | ||
|
|
||
| Adding a new application secret going forward requires only: | ||
|
|
||
| 1. A `sops:` entry in `.stack/config.apps.nix:envs.shared` (or the | ||
| relevant scope) — i.e., one Nix file edit. | ||
| 2. A re-run of `stackpanel codegen build` to refresh the embedded | ||
| payload. | ||
|
|
||
| The new variable is automatically available on `process.env` inside | ||
| the Worker after the loader runs. No changes to `apps/web/alchemy.run.ts`, | ||
| no Cloudflare secret to provision, no per-environment dual-write. | ||
|
|
||
| ## Consequences (as proposed) | ||
|
|
||
| **Pros** | ||
|
|
||
| - **Single source of truth.** Secrets are declared in Nix and embedded | ||
| in the codegen payload. Adding a secret is a one-place change. | ||
| - **No dual-write.** No more "remember to also add this to | ||
| `alchemy.run.ts`" trap. | ||
| - **Encrypted at rest until first request.** The Worker bundle ships | ||
| with SOPS ciphertext, not cleartext secrets; the AGE key is the only | ||
| cleartext-equivalent material in the Worker's secret store. | ||
| - **Smaller Cloudflare secret-store surface.** Only `SOPS_AGE_KEY` (+ | ||
| `DATABASE_URL`, which is a per-deploy resource, not a SOPS secret) | ||
| needs to be a Worker secret. Previously every new secret added a new | ||
| Worker secret entry per stage. | ||
| - **Mirrors the Fly-deployed `apps/api`.** The api app already loads | ||
| its env via `loadAppEnv` at boot (in `apps/api/src/index.ts`'s | ||
| upstream chain); the web Worker now follows the same pattern. | ||
|
|
||
| **Cons** | ||
|
|
||
| - **Cold-start cost.** The first request to a new Worker isolate pays | ||
| the SOPS decrypt cost (one ChaCha20-Poly1305 decrypt per encrypted | ||
| field, plus the AGE X25519 key derivation, ~tens of milliseconds for | ||
| the current ~5-secret payload). Subsequent requests on the same | ||
| isolate hit the in-memory cache in `loader.ts`. **In review, this | ||
| was the deciding factor against the design** — Cloudflare spawns | ||
| isolates aggressively across regions on cold paths, so the per- | ||
| isolate decrypt cost shows up on a non-trivial fraction of requests | ||
| in practice. See ADR 0003 for the chosen alternative. | ||
| - **`SOPS_AGE_KEY` rotation now happens via the deploy scope only.** | ||
| The CI workflow's `SECRETS_AGE_KEY_DEV` GitHub secret is the rotation | ||
| target; rotating it requires a redeploy because the Worker reads it | ||
| from the env binding set by `apps/web/alchemy.run.ts`, not from a | ||
| Cloudflare secret store rotation. Trade-off accepted: rotations are | ||
| rare and the deploy-scope rotation path is well-trodden (see | ||
| `.github/workflows/secrets-codegen-check.yml`). | ||
| - **Every consumer of `@stackpanel/auth` now goes through a Proxy.** | ||
| The Proxy is transparent for the property accesses better-auth and | ||
| our consumers actually do (`auth.api.getSession`, `auth.handler`, | ||
| etc.) but it's a small layer to keep in mind when debugging. | ||
|
|
||
| **Follow-ups / runbook** | ||
|
|
||
| - The `@gen/env` codegen drift gate (`.github/workflows/secrets-codegen-check.yml`) | ||
| remains the canary for "someone edited a SOPS file but forgot to | ||
| re-run codegen". This ADR doesn't change that workflow. | ||
| - Document `APP_ENV` as a load-bearing Worker env in | ||
| `.stack/data/apps.web.env.nix` once the codegen surfaces non-secret | ||
| defaults the same way it surfaces secrets. | ||
|
|
||
| ## Alternatives considered | ||
|
|
||
| - **Forward secrets via `Cloudflare.Vite({ env: { ... } })` (commit | ||
| `21c00841`)** — characterised at the time as dual-write, duplicating | ||
| secret material, and bypassing `@gen/env` codegen. On further | ||
| review (see ADR 0003), the "duplication" turned out to be cheap | ||
| derived state set on every deploy, and the cold-start savings | ||
| dominate the architectural cost. **This is now the chosen | ||
| approach.** | ||
| - **Call `loadAppEnv(...)` inside each tRPC handler** — rejected: | ||
| redundant decrypt cost on every request and no benefit over a single | ||
| module-level decrypt cached for the isolate's lifetime. | ||
| - **Use Cloudflare KV / Secrets Store directly** — rejected: would | ||
| require a separate sync pipeline alongside SOPS, and Cloudflare's | ||
| per-secret API has its own rate-limit ceiling that we'd hit on every | ||
| deploy that touches a payload. | ||
| - **Make `@stackpanel/auth` synchronous via a Layer/Effect injection | ||
| pattern** — rejected as scope-creep at the time. Subsequently | ||
| adopted (in narrowed form, via `effect/Config` rather than full | ||
| `Layer` injection) by ADR 0003. | ||
|
|
||
| ## References | ||
|
|
||
| - Parent commit `8a7897c6` — wired `BETTER_AUTH_SECRET` and Polar | ||
| secrets through `.stack/config.apps.nix` so the codegen embeds real | ||
| ciphertext into each per-stage payload. | ||
| - Reverted commit `21c00841` — the env-shovel approach this ADR | ||
| rejected and ADR 0003 now adopts. | ||
| - Edge-safe loader: `nix/stackpanel/lib/codegen/templates/env/loader.ts`. | ||
| - Codegen export wiring: `nix/stackpanel/lib/codegen/env-package.nix` | ||
| (`./runtime/edge` export). | ||
| - Web Worker entrypoint: `apps/web/src/server.ts`. | ||
| - Web deploy script: `apps/web/alchemy.run.ts`. | ||
| - Lazy auth: `packages/auth/src/index.ts`. | ||
| - bd issue: `stackpanel-3tj`. | ||
| - Superseded by: [ADR 0003](./0003-build-time-env-injection-with-effect-config.md). |
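
The lazy `auth` export described under "Decision" in ADR 0001 (build on first property access, then cache) can be sketched as follows. This is a minimal illustration only, assuming a generic helper: `lazy` and `createLazyDemo` are hypothetical names, and the object built here is a stand-in for the real `betterAuth({...})` instance, not the code in `packages/auth/src/index.ts`.

```typescript
// Hypothetical helper: defers `build()` until the first property access,
// then reuses the cached instance for every later access.
function lazy<T extends object>(build: () => T): T {
  let instance: T | undefined;
  const resolve = (): T => {
    if (instance === undefined) {
      instance = build();
    }
    return instance;
  };
  return new Proxy({} as T, {
    get: (_target, prop) => {
      const target = resolve();
      const value = Reflect.get(target, prop, target);
      // Bind methods so `this` points at the real instance, not the Proxy.
      return typeof value === "function" ? value.bind(target) : value;
    },
    has: (_target, prop) => Reflect.has(resolve(), prop),
  });
}

// Stand-in for an env-dependent singleton. `env` is passed in explicitly so
// the sketch does not depend on `process.env` being available.
function createLazyDemo(env: Record<string, string | undefined>) {
  let builds = 0;
  const auth = lazy(() => {
    builds += 1;
    return {
      // Read at build time, i.e. on first access, not at import time.
      secret: env.BETTER_AUTH_SECRET ?? "",
      api: { getSession: () => "session" },
    };
  });
  return { auth, buildCount: () => builds };
}
```

Because the secret is read only when `build()` runs, a module that imports `auth` before the env load completes does not capture an empty value, which is the ordering problem the ADR describes.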
Required secret silently falls back to empty string
High Severity
`BETTER_AUTH_SECRET` is marked `required = true` in `.stack/config.apps.nix`, yet the forwarder uses `process.env.BETTER_AUTH_SECRET ?? ""`. If the deploy-time validation is ever bypassed or misconfigured, this silently forwards an empty string, reproducing the exact `stackpanel-ayo` bug this PR is meant to fix. The Polar vars correctly default to `""` because they're optional, but the required auth secret deserves a loud failure (e.g., throwing or omitting the `?? ""` fallback).
Reviewed by Cursor Bugbot for commit 1a8adfd.
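
One possible shape for the loud failure suggested above, as a sketch rather than the PR's actual fix: split the forwarded values into required and optional lookups. The helper names (`requiredEnv`, `optionalEnv`, `buildWorkerSecrets`) are hypothetical and not part of the PR. The env object is passed in explicitly so the snippet stays self-contained.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: a required secret must be present and non-empty,
// otherwise the deploy script fails before anything is forwarded.
function requiredEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required deploy-time secret: ${name}`);
  }
  return value;
}

// Hypothetical helper: optional values fall back to "" so that consumer
// code can treat empty as "feature disabled", as the diff comment describes.
function optionalEnv(env: Env, name: string): string {
  return env[name] ?? "";
}

// Builds only the secret portion of the Worker env; `DATABASE_URL` comes
// from a resource output in the real script and is left out here.
function buildWorkerSecrets(env: Env) {
  return {
    BETTER_AUTH_SECRET: requiredEnv(env, "BETTER_AUTH_SECRET"),
    POLAR_ACCESS_TOKEN: optionalEnv(env, "POLAR_ACCESS_TOKEN"),
    POLAR_WEBHOOK_SECRET: optionalEnv(env, "POLAR_WEBHOOK_SECRET"),
    POLAR_PRO_PRODUCT_ID_PRODUCTION: optionalEnv(
      env,
      "POLAR_PRO_PRODUCT_ID_PRODUCTION",
    ),
    POLAR_FREE_PRODUCT_ID_PRODUCTION: optionalEnv(
      env,
      "POLAR_FREE_PRODUCT_ID_PRODUCTION",
    ),
  };
}
```

In the deploy script this could be called as `buildWorkerSecrets(process.env)` and spread into the `env` object next to `DATABASE_URL`, so a missing `BETTER_AUTH_SECRET` aborts the deploy instead of shipping an empty string.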