53 changes: 43 additions & 10 deletions .stack/config.apps.nix
@@ -14,15 +14,52 @@ let
# message and in the studio Variables UI.
envs = {
shared = {
# These vars are declared so the per-app `Env` interface and the studio
# Variables UI know about them, but they're not yet wired to a SOPS group
# or process.env source. Mark `required = false` until a source is added,
# otherwise `loadDeployEnv` fails with a missing-required error at the
# top of every `alchemy.run.ts`.
# Secrets — sourced from `.stack/secrets/vars/shared.sops.yaml` so the
# codegen embeds real ciphertext into each app's per-env runtime payload
# (`packages/gen/env/data/<env>/<app>.sops.json`). At deploy time
# `loadDeployEnv` decrypts the deploy scope into `process.env`, then
# `apps/web/alchemy.run.ts` forwards these values into
# `Cloudflare.Vite({ env })` so Cloudflare stores them as Worker
# secrets and Workers boot with `process.env.BETTER_AUTH_SECRET`
# already populated. See `docs/adr/0003-build-time-env-injection-with-effect-config.md`.
#
# Same secrets are *also* declared in the `deploy` root env scope
# (`.stack/config.nix:envs.deploy`) so deploy-time tooling (alchemy
# bindings, `apps/api/scripts/push-secrets.sh`) can read them — that
# remains; the two scopes serve different consumers.
BETTER_AUTH_SECRET = {
required = false;
required = true;
sops = "/shared/better-auth-secret";
description = "Better Auth signing secret. Generate with `openssl rand -hex 32`.";
};
POLAR_ACCESS_TOKEN = {
required = false;
sops = "/shared/polar-access-token";
description = "Polar.sh API access token used for billing. When unset, polarClient is null and billing endpoints no-op.";
};
POLAR_WEBHOOK_SECRET = {
required = false;
sops = "/shared/polar-webhook-secret";
description = "Polar.sh webhook signing secret. When unset, the polar webhooks plugin is not mounted.";
};
POLAR_PRO_PRODUCT_ID_PRODUCTION = {
required = false;
sops = "/shared/polar-pro-product-id-production";
description = "Polar product ID for the Pro plan in production. Falls back to the sandbox product when unset.";
};
POLAR_FREE_PRODUCT_ID_PRODUCTION = {
required = false;
sops = "/shared/polar-free-product-id-production";
description = "Polar product ID for the Free plan in production. Falls back to the sandbox product when unset.";
};

# Per-environment URL/CORS config — not secrets, so no SOPS source.
# Left as `required = false` because the consuming code handles missing
# values gracefully (better-auth derives BETTER_AUTH_URL from the
# request host; CORS_ORIGIN/POLAR_SUCCESS_URL fall back to upstream
# defaults). Wire per-env literals via
# `stackpanel.envs."apps/<app>/<env>".KEY = { value = "..."; };`
# in `.stack/config.nix` if you need explicit values.
BETTER_AUTH_URL = {
required = false;
description = "Public URL the auth server is reachable at (used for OAuth redirects).";
@@ -31,10 +68,6 @@ let
required = false;
description = "Comma-separated allowed origins for the API.";
};
POLAR_ACCESS_TOKEN = {
required = false;
description = "Polar.sh API access token used for billing.";
};
POLAR_SUCCESS_URL = {
required = false;
description = "Redirect URL Polar sends customers to after a successful checkout.";
22 changes: 22 additions & 0 deletions apps/web/alchemy.run.ts
@@ -38,12 +38,34 @@ const program = Effect.gen(function* () {
roleName: `${PROJECT}-${SERVICE}-owner`,
});

// Forward the runtime secrets we just decrypted via `loadDeployEnv` into
// the Cloudflare Worker's environment. These are ALREADY decrypted at
// deploy time (the `loadDeployEnv("web", appEnv)` call above pulls the
// per-app SOPS payload + the deploy scope into `process.env` of the
// deploy process). Forwarding them here makes Cloudflare store each as a
// Worker secret on the deployed script, so every Worker isolate boots
// with `process.env.BETTER_AUTH_SECRET` already populated — no per-
// isolate SOPS decrypt cost on the cold path.
//
// Polar values default to `""` so a missing-secret deploy still boots:
// consumer code treats empty as "feature disabled" (`polarClient` stays
// null, webhook plugin not mounted).
//
// See `docs/adr/0003-build-time-env-injection-with-effect-config.md`
// (which supersedes the runtime-decrypt approach in ADR 0001).
const website = yield* Cloudflare.Vite("TanstackStart", {
compatibility: {
flags: ["nodejs_compat"],
},
env: {
DATABASE_URL: db.connectionUri,
BETTER_AUTH_SECRET: process.env.BETTER_AUTH_SECRET ?? "",
Required secret silently falls back to empty string

High Severity

BETTER_AUTH_SECRET is marked required = true in .stack/config.apps.nix, yet the forwarder uses process.env.BETTER_AUTH_SECRET ?? "". If the deploy-time validation is ever bypassed or misconfigured, this silently forwards an empty string — reproducing the exact stackpanel-ayo bug this PR is meant to fix. The Polar vars correctly default to "" because they're optional, but the required auth secret deserves a loud failure (e.g., throwing or omitting the ?? "" fallback).

Reviewed by Cursor Bugbot for commit 1a8adfd.
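
One way to get the loud failure this comment asks for is a small helper that throws when a required variable is missing or empty. The following is a minimal sketch, not code from the PR: `requireEnv`, `EnvSource`, and `fakeEnv` are hypothetical names, and a plain record stands in for `process.env` so the snippet is self-contained.

```typescript
// Hypothetical helper (not part of the PR): read a required variable from an
// env-like record and throw instead of silently forwarding "".
type EnvSource = Record<string, string | undefined>;

function requireEnv(source: EnvSource, name: string): string {
  const value = source[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Stand-in for process.env in the deploy process (assumed values).
const fakeEnv: EnvSource = {
  BETTER_AUTH_SECRET: "s3cr3t",
  POLAR_ACCESS_TOKEN: "",
};

// Required secrets go through requireEnv; optional ones would keep their
// existing handling.
const workerEnv = {
  BETTER_AUTH_SECRET: requireEnv(fakeEnv, "BETTER_AUTH_SECRET"),
};
```

With a helper of this shape, a bypassed or misconfigured validation step would abort the deploy rather than ship a Worker with an empty signing secret.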

POLAR_ACCESS_TOKEN: process.env.POLAR_ACCESS_TOKEN ?? "",
POLAR_WEBHOOK_SECRET: process.env.POLAR_WEBHOOK_SECRET ?? "",
POLAR_PRO_PRODUCT_ID_PRODUCTION:
process.env.POLAR_PRO_PRODUCT_ID_PRODUCTION ?? "",
POLAR_FREE_PRODUCT_ID_PRODUCTION:
process.env.POLAR_FREE_PRODUCT_ID_PRODUCTION ?? "",
Comment on lines +65 to +68

P1: Keep optional Polar product IDs undefined when unset

Defaulting POLAR_PRO_PRODUCT_ID_PRODUCTION and POLAR_FREE_PRODUCT_ID_PRODUCTION to "" changes the meaning of “unset” and breaks the fallback logic in packages/auth/src/lib/polar-products.ts, which uses nullish coalescing (??) to fall back to sandbox IDs only when values are undefined/null. In deployments where these secrets are intentionally omitted, production will now get empty product IDs instead of sandbox IDs, causing checkout/webhook product mapping to fail at runtime.
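
The behaviour described here can be reproduced in isolation. The sketch below is illustrative only: `SANDBOX_PRO_PRODUCT_ID`, `resolveProProductId`, and `definedOnly` are hypothetical names, not the contents of `packages/auth/src/lib/polar-products.ts`, and whether the deploy script can simply omit unset keys is an assumption.

```typescript
// Hypothetical sandbox ID, for illustration only.
const SANDBOX_PRO_PRODUCT_ID = "sandbox-pro";

// Same fallback style the comment describes: `??` only falls back on
// undefined/null, so an empty string is kept as-is.
function resolveProProductId(configured: string | undefined): string {
  return configured ?? SANDBOX_PRO_PRODUCT_ID;
}

// Build an env object that drops keys whose source value is unset or empty,
// so downstream `??` fallbacks still see "unset".
function definedOnly(
  entries: Record<string, string | undefined>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(entries)) {
    const value = entries[key];
    if (value !== undefined && value !== "") out[key] = value;
  }
  return out;
}

const fromEmpty = resolveProProductId(""); // "" (fallback skipped)
const fromUnset = resolveProProductId(undefined); // "sandbox-pro"

const forwarded = definedOnly({
  DATABASE_URL: "postgres://example",
  POLAR_PRO_PRODUCT_ID_PRODUCTION: undefined,
});
```

Filtering before forwarding keeps "unset" meaning `undefined` on the consumer side, which is what the nullish fallback relies on.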

},
});
let url: Output.Output<string | undefined> = website.url;
1 change: 1 addition & 0 deletions bun.lock

Some generated files are not rendered by default.

203 changes: 203 additions & 0 deletions docs/adr/0001-runtime-secrets-via-gen-env-loader.md
@@ -0,0 +1,203 @@
# 0001 — Runtime secrets are decrypted via `@gen/env`, not forwarded as Worker env vars

- **Status**: Superseded by [0003](./0003-build-time-env-injection-with-effect-config.md)
- **Date**: 2026-05-01

> **Note (2026-05-01, same day):** the implementation described below was
> proposed but never landed on `main`. Branch `fix/wire-shared-runtime-env`
> (PR #24) carried it as commits `7f83faa8` and `51e65bfc`; both were
> reverted before merge in favour of the build-time env injection approach
> documented in [ADR 0003](./0003-build-time-env-injection-with-effect-config.md).
> The body below is preserved as-written for the historical record of the
> design we considered and rejected.

## Context

The waitlist join endpoint on `stackpanel.com` was crashing in
production with HTTP 500:

```
You are using the default secret. Please change it.
```

The crash originated inside `better-auth`'s `validateSecret` and
surfaced on every tRPC call (waitlist included), because
`createTRPCContext` eagerly reads `opts.auth.api.getSession(...)`.
Investigation (see commit `8a7897c6`) found that `BETTER_AUTH_SECRET`
and the four Polar secrets (`POLAR_ACCESS_TOKEN`,
`POLAR_WEBHOOK_SECRET`, `POLAR_PRO_PRODUCT_ID_PRODUCTION`,
`POLAR_FREE_PRODUCT_ID_PRODUCTION`) were declared in
`.stack/config.apps.nix:envs.shared` with `required = false` and **no
SOPS source**. As a result, `stackpanel codegen build` rendered
`"BETTER_AUTH_SECRET": ""` into every per-stage payload at
`packages/gen/env/data/<env>/web.sops.json`. Even after we wired the
SOPS sources, the payloads remained dead code in the web Worker
because nobody was decrypting them at runtime.

Two paths were available to fix this:

1. **Forward secrets via `Cloudflare.Vite({ env: { ... } })`** — read
the values from `process.env` (populated at deploy time by
`loadDeployEnv` reading the deploy scope) and shovel each one into
the Cloudflare Worker's environment as a Worker secret. This is
what commit `21c00841` did and what the original draft of this ADR
reverted.
2. **Decrypt the embedded SOPS payload at Worker boot** via the
existing `@gen/env/runtime` loader — give the Worker only the AGE
key material and let it decrypt the rest.

Approach (1) was characterised at the time as duplicating secret
material (Cloudflare's secret store *and* the embedded SOPS payload),
requiring every new secret to be added in two places
(`.stack/config.apps.nix` *and* `apps/web/alchemy.run.ts`), and bypassing
the very codegen pipeline `@gen/env` was designed to be the single source
of truth for. It also made each new secret a deploy-script edit rather
than a config-only change.

Approach (2) was already 90% built: the per-app SOPS payload is
embedded in `packages/gen/env/src/runtime/generated-payloads/web/{dev,staging,prod}.ts`,
and `nix/stackpanel/lib/codegen/templates/env/loader.ts` is an
edge-safe loader (no FileSystem/ChildProcess dependency) that reads
ciphertext + `process.env.SOPS_AGE_KEY` and produces a decrypted
payload it can inject into `process.env`. It just wasn't wired into
the web Worker's boot path.

## Decision (superseded — see ADR 0003)

Workers receive only `SOPS_AGE_KEY` (and a non-secret `APP_ENV`
discriminator) at deploy time. All other application secrets are
decrypted **inside the Worker** on boot via:

```ts
// apps/web/src/server.ts
import { loadAppEnv } from "@gen/env/runtime/edge";

const appEnv = process.env.APP_ENV ?? process.env.STAGE ?? "dev";

if (process.env.SOPS_AGE_KEY) {
  await loadAppEnv("web", appEnv, { inject: true });
}
```

The `@gen/env` package gains a new `./runtime/edge` export that maps
to `loader.ts` (the edge-safe loader). The existing `./runtime`
export — backed by `node-loader.ts` — keeps its FileSystem +
ChildProcessSpawner dependencies for use from `apps/*/alchemy.run.ts`
and other Node/Bun entrypoints.

Two changes complement the wiring:

1. **`@stackpanel/auth` is now lazy.** The `betterAuth({...})` call is
moved into a `buildAuth()` function called by a `Proxy`-backed
`auth` export. The first property access on `auth` builds and
caches the instance. This guarantees that if the import chain
`routeTree.gen.ts → routes/api/trpc.$.ts → @stackpanel/auth`
resolves before the SSR entrypoint's top-level `await loadAppEnv`
fires (which can happen depending on bundler module ordering),
`betterAuth` is *not* called yet — and by the time the request
handler actually touches `auth.api`, the env load is complete.

2. **The web Worker env in `apps/web/alchemy.run.ts` shrinks.** It
keeps `DATABASE_URL` (a runtime-bound resource output from the
Neon project, not a SOPS payload entry), and adds `SOPS_AGE_KEY`
and `APP_ENV`. The five forwarded secrets from commit `21c00841`
are removed.
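
The lazy `auth` export in item 1 can be sketched roughly as follows. This is an illustration under assumptions, not the code in `packages/auth/src/index.ts`: `AuthLike`, `getAuth`, and `buildsSoFar` are hypothetical, and the stub `buildAuth` returns a plain object where the real one would call `betterAuth({...})`.

```typescript
// Stand-in for the real auth instance; this shape is hypothetical.
interface AuthLike {
  api: { getSession(): string };
}

let buildCount = 0;

// Stand-in for `buildAuth()`; the real version would call `betterAuth({...})`
// and read secrets from the environment at this point.
function buildAuth(): AuthLike {
  buildCount += 1;
  return { api: { getSession: () => "session" } };
}

// Helper for observing how many times the instance has been built.
function buildsSoFar(): number {
  return buildCount;
}

let cached: AuthLike | undefined;

// Build on first use, then reuse the cached instance.
function getAuth(): AuthLike {
  if (cached === undefined) cached = buildAuth();
  return cached;
}

// Importing `auth` does nothing; the first property read triggers the build.
const auth: AuthLike = new Proxy({} as AuthLike, {
  get(_target, prop) {
    return Reflect.get(getAuth(), prop);
  },
});
```

Only the `get` trap is handled here; a fuller version might also forward `has` or `ownKeys` if consumers enumerate the object.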

Adding a new application secret going forward requires only:

1. A `sops:` entry in `.stack/config.apps.nix:envs.shared` (or the
relevant scope) — i.e., one Nix file edit.
2. A re-run of `stackpanel codegen build` to refresh the embedded
payload.

The new variable is automatically available on `process.env` inside
the Worker after the loader runs. No changes to `apps/web/alchemy.run.ts`,
no Cloudflare secret to provision, no per-environment dual-write.

## Consequences (as proposed)

**Pros**

- **Single source of truth.** Secrets are declared in Nix and embedded
in the codegen payload. Adding a secret is a one-place change.
- **No dual-write.** No more "remember to also add this to
`alchemy.run.ts`" trap.
- **Encrypted at rest until first request.** The Worker bundle ships
with SOPS ciphertext, not cleartext secrets; the AGE key is the only
cleartext-equivalent material in the Worker's secret store.
- **Smaller Cloudflare secret-store surface.** Only `SOPS_AGE_KEY` (+
`DATABASE_URL`, which is a per-deploy resource, not a SOPS secret)
needs to be a Worker secret. Previously every new secret added a new
Worker secret entry per stage.
- **Mirrors the Fly-deployed `apps/api`.** The api app already loads
its env via `loadAppEnv` at boot (in `apps/api/src/index.ts`'s
upstream chain); the web Worker now follows the same pattern.

**Cons**

- **Cold-start cost.** The first request to a new Worker isolate pays
the SOPS decrypt cost (one ChaCha20-Poly1305 decrypt per encrypted
field, plus the AGE X25519 key derivation, ~tens of milliseconds for
the current ~5-secret payload). Subsequent requests on the same
isolate hit the in-memory cache in `loader.ts`. **In review, this
was the deciding factor against the design** — Cloudflare spawns
isolates aggressively across regions on cold paths, so the per-
isolate decrypt cost shows up on a non-trivial fraction of requests
in practice. See ADR 0003 for the chosen alternative.
- **`SOPS_AGE_KEY` rotation now happens via the deploy scope only.**
The CI workflow's `SECRETS_AGE_KEY_DEV` GitHub secret is the rotation
target; rotating it requires a redeploy because the Worker reads it
from the env binding set by `apps/web/alchemy.run.ts`, not from a
Cloudflare secret store rotation. Trade-off accepted: rotations are
rare and the deploy-scope rotation path is well-trodden (see
`.github/workflows/secrets-codegen-check.yml`).
- **Every consumer of `@stackpanel/auth` now goes through a Proxy.**
The Proxy is transparent for the property accesses better-auth and
our consumers actually do (`auth.api.getSession`, `auth.handler`,
etc.) but it's a small layer to keep in mind when debugging.
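
The per-isolate caching mentioned under "Cold-start cost" amounts to memoising the decrypt promise at module scope. The sketch below is a guess at the general shape, not the contents of `loader.ts`: `decrypt`, `loadOnce`, and `decryptCallCount` are hypothetical, and the stub returns a fixed payload instead of doing any SOPS or AGE work.

```typescript
type Payload = Record<string, string>;

let decryptCalls = 0;

// Stand-in for the expensive SOPS/AGE decrypt (hypothetical).
async function decrypt(): Promise<Payload> {
  decryptCalls += 1;
  return { BETTER_AUTH_SECRET: "decrypted" };
}

// Helper for observing how many decrypts have run.
function decryptCallCount(): number {
  return decryptCalls;
}

// Module-level cache: it lives as long as the isolate (here, the module).
let pending: Promise<Payload> | undefined;

// The first caller starts the decrypt; later callers, including concurrent
// ones, share the same promise.
function loadOnce(): Promise<Payload> {
  if (pending === undefined) pending = decrypt();
  return pending;
}
```

Caching the promise rather than the resolved value means concurrent first requests on a fresh isolate trigger one decrypt, but every new isolate still pays that cost once, which is the concern raised above.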

**Follow-ups / runbook**

- The `@gen/env` codegen drift gate (`.github/workflows/secrets-codegen-check.yml`)
remains the canary for "someone edited a SOPS file but forgot to
re-run codegen". This ADR doesn't change that workflow.
- Document `APP_ENV` as a load-bearing Worker env in
`.stack/data/apps.web.env.nix` once the codegen surfaces non-secret
defaults the same way it surfaces secrets.

## Alternatives considered

- **Forward secrets via `Cloudflare.Vite({ env: { ... } })` (commit
`21c00841`)** — characterised at the time as dual-write, duplicating
secret material, and bypassing `@gen/env` codegen. On further
review (see ADR 0003), the "duplication" turned out to be cheap
derived state set on every deploy, and the cold-start savings
dominate the architectural cost. **This is now the chosen
approach.**
- **Call `loadAppEnv(...)` inside each tRPC handler** — rejected:
redundant decrypt cost on every request and no benefit over a single
module-level decrypt cached for the isolate's lifetime.
- **Use Cloudflare KV / Secrets Store directly** — rejected: would
require a separate sync pipeline alongside SOPS, and Cloudflare's
per-secret API has its own rate-limit ceiling that we'd hit on every
deploy that touches a payload.
- **Make `@stackpanel/auth` synchronous via a Layer/Effect injection
pattern** — rejected as scope-creep at the time. Subsequently
adopted (in narrowed form, via `effect/Config` rather than full
`Layer` injection) by ADR 0003.

## References

- Parent commit `8a7897c6` — wired `BETTER_AUTH_SECRET` and Polar
secrets through `.stack/config.apps.nix` so the codegen embeds real
ciphertext into each per-stage payload.
- Reverted commit `21c00841` — the env-shovel approach this ADR
rejected and ADR 0003 now adopts.
- Edge-safe loader: `nix/stackpanel/lib/codegen/templates/env/loader.ts`.
- Codegen export wiring: `nix/stackpanel/lib/codegen/env-package.nix`
(`./runtime/edge` export).
- Web Worker entrypoint: `apps/web/src/server.ts`.
- Web deploy script: `apps/web/alchemy.run.ts`.
- Lazy auth: `packages/auth/src/index.ts`.
- bd issue: `stackpanel-3tj`.
- Superseded by: [ADR 0003](./0003-build-time-env-injection-with-effect-config.md).