Skip to content

fix(security/gcp): drop plaintext SCHEDULED_TASK_SECRET — rely on OIDC + Cloud Run IAM (closes #159)#161

Merged
cristim merged 15 commits intofeat/multicloud-web-frontendfrom
fix/issue-159-gcp-secret-runtime
Apr 30, 2026
Merged

fix(security/gcp): drop plaintext SCHEDULED_TASK_SECRET — rely on OIDC + Cloud Run IAM (closes #159)#161
cristim merged 15 commits intofeat/multicloud-web-frontendfrom
fix/issue-159-gcp-secret-runtime

Conversation

@cristim
Copy link
Copy Markdown
Member

@cristim cristim commented Apr 27, 2026

Closes #159 + closes the auth gap surfaced during verification

The original commit (afa636c88) dropped the plaintext
Authorization: Bearer ${var.scheduled_task_secret} header from
Cloud Scheduler — fixing the SCHEDULED_TASK_SECRET leak in Terraform
state per #159. Verification then surfaced a follow-on issue: with the
header gone, /api/scheduled/* was completely unauthenticated on
Cloud Run AND GKE today (cloud_run_allow_unauthenticated = true
everywhere; GKE Ingress has no IAP). This PR completes the fix by
implementing Option B: app-level OIDC ID-token validation in Go
middleware.

What changed

Before After
Cloud Scheduler → Cloud Run / GKE static Bearer <plaintext> header OIDC ID-token signed by scheduler SA
Token validation on the app side inline subtle.ConstantTimeCompare against env-var secret new internal/server/scheduledauth middleware: signature (RS256 only) via JWKS, issuer, audience allow-list, subject pinning to scheduler SA (defence in depth), 60s clock skew, refresh-on-unknown-kid
Azure Logic Apps → Container Apps bearer (Key Vault → Logic App at runtime) bearer (unchanged — Logic Apps' HTTP Connector does not emit Entra OIDC)

Why Option B over Option A (#78 cross-PR dependency)

Option A would flip cloud_run_allow_unauthenticated = false and rely
entirely on Cloud Run's IAM gate (#78 tracks that). Option B is
preferred here because:

  1. No cross-PR dependency — this PR closes the gap atomically.
  2. Works on GKE too, which has no Cloud-Run-style IAM gate and no
    IAP set up; Option A would not cover it.
  3. Mirrors the existing Azure approach (app-level bearer check) —
    easier to reason about; one auth surface instead of two.
  4. Independent of ops(gcp): take Cloud Run secure-ingress default into effect (per-env LB enablement + override removal) #78 — flipping allow_unauthenticated = false
    is still worth doing as defence in depth, and ops(gcp): take Cloud Run secure-ingress default into effect (per-env LB enablement + override removal) #78 stays open. Option
    B works whether or not Cloud Run's IAM gate is enforced.

Implementation

5 atomic commits, each independently revertable:

  • feat(server): add scheduled-task OIDC validator — new
    internal/server/scheduledauth package: Validator + JWKS cache (via
    coreos/go-oidc/v3's single-flight RemoteKeySet, refresh-on-unknown-
    kid for rotation), 3 modes (oidc / bearer / disabled), fail-
    fast config parse. RS256 only. Subject pinning REQUIRED in oidc mode
    (any GCP SA can mint a token with arbitrary aud — only the
    scheduler SA has our sub). 60s skew on exp/nbf, iat-future check.
    Comprehensive unit tests including alg-confusion, key rotation, and
    single-flight stampede protection.
  • fix(server): replace bearer-only check with OIDC-aware middleware — wires the validator into CreateHTTPServer, drops
    the inline subtle.ConstantTimeCompare from handleScheduledHTTP,
    best-effort JWKS warmup at startup, fail-fast on misconfig.
  • fix(terraform/gcp): pin oidc_token.audience + wire scheduled-task auth env vars — explicit audience = SA email on every
    oidc_token block (without it Cloud Scheduler defaults to the target
    URL, coupling the validator to Cloud Run URL shape).
    SCHEDULED_TASK_AUTH_MODE, SCHEDULED_TASK_OIDC_AUDIENCE,
    SCHEDULED_TASK_OIDC_SUBJECTS env vars wired automatically by the
    cloud-run / gke / container-apps / aks modules.
    SCHEDULED_TASK_OIDC_SUBJECTS pins to the SA's unique_id (the
    value Google sets in the JWT sub claim for SA-signed ID tokens).
  • test(server): integration test for scheduled-task OIDC middleware — full HTTP roundtrip (httptest.Server + mocked
    JWKS) covering: valid token, key not in JWKS, missing header, wrong
    sub, wrong aud.

Verification

Local gates (all run against the rebased branch on top of
feat/multicloud-web-frontend):

  • go test ./... from repo root — all green
  • go test ./... from pkg/ — all green
  • go vet ./... on both modules — clean
  • terraform validate on every touched module
    (compute/gcp/cloud-run, compute/gcp/gke,
    compute/azure/container-apps, compute/azure/aks) and environment
    (environments/gcp, environments/azure) — all valid
  • terraform fmt -check -recursive terraform/ — clean
  • Regression check: grep -rn 'Bearer ${var.scheduled_task_secret}' terraform/
    returns nothing
  • Regression check: grep -rn 'SCHEDULED_TASK_SECRET\b' terraform/
    returns only SCHEDULED_TASK_SECRET_NAME (the KV reference, not the
    plaintext) on Azure
  • CI watcher armed for every workflow run on the latest push
    (pre-commit, Azure Sanity, AWS Sanity)

CodeRabbit review loop will run automatically once the PR exits draft;
the watcher is already armed.

Rollback / blast radius

The change is isolated: a new package + middleware wiring + per-module
Terraform env vars. To revert, drop the four feature commits — the
codebase returns to the pre-#161 plaintext-bearer state, but #159's
finding is back, so a partial revert (e.g. dropping only the new
package) is not a useful intermediate.

Out of scope

Notes for reviewers

  • The validator's JWKS cache uses go-oidc's RemoteKeySet, which has
    built-in single-flight + refresh-on-unknown-kid (per
    OIDC Core §10.1.1).
    No fixed TTL — refresh-on-unknown-kid is sufficient because providers
    always sign new tokens with the new kid before retiring the old key.
  • SCHEDULED_TASK_OIDC_SUBJECTS is REQUIRED in oidc mode — empty
    subjects → ErrConfigInvalid → container fails to start. Better
    than silently accepting any SA in the org.
  • Local-dev default is disabled mode with a loud per-request WARN
    log so it's impossible to ship to production without explicitly
    opting into oidc / bearer.

Summary by CodeRabbit

  • New Features

    • Configurable scheduled-task authentication (OIDC, bearer, disabled); scheduled endpoints now enforced by middleware with JWKS warmup.
  • Chores

    • Infrastructure no longer injects plaintext scheduled-task secrets; schedulers updated to use OIDC or Key Vault-held bearer tokens.
    • Dependency updates for OIDC/JOSE libraries.
  • Documentation

    • Terraform examples and comments clarified for scheduled invocation and auth configuration.
  • Tests

    • Added extensive unit and integration tests for OIDC, bearer, JWKS, timing, and middleware.

@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 27, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7d6f7b80-ab05-4e89-8655-3ea72e64249c

📥 Commits

Reviewing files that changed from the base of the PR and between 7dbf859 and 7935b6b.

📒 Files selected for processing (3)
  • terraform/environments/azure/dev.tfvars.example
  • terraform/environments/azure/variables.tf
  • terraform/modules/compute/azure/container-apps/variables.tf
✅ Files skipped from review due to trivial changes (2)
  • terraform/modules/compute/azure/container-apps/variables.tf
  • terraform/environments/azure/variables.tf

📝 Walkthrough

Walkthrough

Adds a new scheduled-task authentication subsystem (disabled/bearer/oidc) with env-driven config, a Validator, middleware and JWKS warmup/caching; wires the validator into server startup and routing; updates tests; and changes Terraform to stop embedding plaintext scheduled-task secrets (GCP→OIDC, Azure→bearer).

Changes

Cohort / File(s) Summary
Go module
go.mod
Add github.com/coreos/go-oidc/v3@v3.18.0 and bump github.com/go-jose/go-jose/v4 to v4.1.4.
Scheduled auth package
internal/server/scheduledauth/config.go, internal/server/scheduledauth/errors.go, internal/server/scheduledauth/validator.go, internal/server/scheduledauth/validator_test.go, internal/server/scheduledauth/integration_test.go
New package implementing Env-driven config, sentinel errors, Validator (disabled/bearer/oidc), JWKS warmup/refresh and middleware; extensive unit and integration tests. Inspect config parsing, JWKS fetch/rotation, concurrency/single-flight behavior, timestamp/skew logic, and error classification.
Server wiring & middleware
internal/server/app.go, internal/server/http.go, internal/server/http_test.go, internal/server/app_test.go
Wire scheduledAuth *scheduledauth.Validator into Application, build from env, fail-fast on invalid config, best-effort 5s JWKS warmup, add scheduledAuthMiddleware, and update tests to exercise middleware. Check startup error handling and middleware integration.
HTTP handler
internal/server/http.go, internal/server/http_test.go
Move scheduled-task auth enforcement into scheduledAuthMiddleware; remove in-handler secret comparison and constant-time compare usage. Update tests to route through middleware.
Terraform — GCP modules & envs
terraform/modules/compute/gcp/cloud-run/..., terraform/modules/compute/gcp/gke/..., terraform/environments/gcp/*, terraform/environments/gcp/compute.tf, terraform/environments/gcp/dev.tfvars.example
Remove plaintext scheduled_task_secret inputs/wiring; switch Cloud Scheduler jobs to per-invocation OIDC tokens and add container envs for expected OIDC audiences/subjects. Review audience/sub mapping, env propagation, and variable removals.
Terraform — Azure modules & examples
terraform/modules/compute/azure/aks/main.tf, terraform/modules/compute/azure/container-apps/main.tf, terraform/environments/azure/dev.tfvars.example
Set SCHEDULED_TASK_AUTH_MODE="bearer" for Azure deployments and document Key Vault-backed runtime secret retrieval in examples. Verify env injection and example docs.
Terraform — docs and variable descriptions
terraform/environments/azure/variables.tf, terraform/modules/compute/azure/container-apps/variables.tf, terraform/environments/gcp/dev.tfvars.example, terraform/environments/azure/dev.tfvars.example
Expand variable descriptions and examples documenting scheduled-task auth behavior and where envs are provided. No functional variable type changes (exceptions noted elsewhere).
Tests / CI
internal/server/scheduledauth/*_test.go, internal/server/scheduledauth/integration_test.go, internal/server/http_test.go, internal/server/app_test.go
Add/adjust tests for OIDC validator, JWKS rotation/refresh, concurrent fetch protection, middleware integration, and updated startup defaults. Review concurrency expectations and test harness utilities.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Middleware as scheduledAuthMiddleware
    participant Validator
    participant JWKS as JWKS Endpoint
    participant Handler as handleScheduledHTTP

    Client->>Middleware: POST /api/scheduled/*\nAuthorization: Bearer {token}
    Middleware->>Validator: Validate(ctx, Authorization)
    alt OIDC Mode
        Validator->>JWKS: Fetch / use cached JWKS
        JWKS-->>Validator: Keys (rgba(30,144,255,0.5))
        Validator->>Validator: Verify signature, timestamps, aud & sub
        Validator-->>Middleware: success
        Middleware->>Handler: next.ServeHTTP()
        Handler-->>Client: 200 OK
    else Bearer Mode
        Validator->>Validator: Constant-time compare with secret
        Validator-->>Middleware: success / ErrUnauthorized
        Middleware->>Handler: next.ServeHTTP()
        Handler-->>Client: 200 OK
    else Unauthorized
        Validator-->>Middleware: ErrUnauthorized
        Middleware-->>Client: 401 Unauthorized
    end
Loading
sequenceDiagram
    participant App as NewApplicationFromDeps
    participant Env as envSourceOS
    participant Config as scheduledauth.LoadConfig
    participant Validator as scheduledauth.New
    participant JWKS as JWKS Warmup Probe

    App->>Env: Read SCHEDULED_TASK_* envs
    App->>Config: LoadConfig(env)
    Config-->>App: Config{mode, jwks, aud, sub, secret}
    App->>Validator: scheduledauth.New(cfg)
    alt OIDC Mode
        Validator->>JWKS: Warmup probe (5s timeout)
        JWKS-->>Validator: keys / timeout (rgba(34,139,34,0.5))
        Validator-->>App: validator instance
    else Bearer/Disabled
        Validator-->>App: validator instance
    else Invalid
        Validator-->>App: ErrConfigInvalid
    end
    App-->>App: Wire into Application.scheduledAuth
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

Suggested labels

priority/p0, urgency/now, impact/all-users, effort/xl

Poem

🐰 I hopped through envs and JWKS streams,

No plaintext headers in my dreams,
Bearer hops and OIDC beams,
Middleware guards the scheduled teams,
nibbles a config leaf and twitches whiskers

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.98% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main security fix: dropping plaintext SCHEDULED_TASK_SECRET and implementing OIDC + Cloud Run IAM instead of embedding secrets in resource definitions.
Linked Issues check ✅ Passed The PR fully addresses issue #159 by implementing Option B (app-level OIDC validation): Cloud Scheduler now uses OIDC tokens, a new scheduledauth package validates them with JWKS caching and subject pinning, middleware wiring replaces inline bearer checks, and Terraform removes plaintext secret interpolation across cloud-run/gke/container-apps/aks modules.
Out of Scope Changes check ✅ Passed All changes are scoped to scheduled-task authentication: internal scheduledauth validator package, middleware wiring, Terraform oidc_token configuration, and documentation updates. Azure bearer-mode configuration and Terraform variable description improvements are consistent with unified auth modes and are not out of scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/issue-159-gcp-secret-runtime

Review rate limit: 2/5 reviews remaining, refill in 33 minutes and 40 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 27, 2026

⚠️ Security regression flagged by post-merge verification

A verification agent re-audited this PR end-to-end and surfaced two issues that mean this PR should not merge in its current form. Both pre-date the PR but the PR removes the only remaining auth layer that was masking them on GCP. Filing here so reviewers see it before merge.

1. Cloud Run allow_unauthenticated = true in every deployed env

terraform/environments/gcp/{github-prod,github-staging,github-dev,dev.tfvars.example} all set cloud_run_allow_unauthenticated = true. Combined with cloud_run_ingress = "INGRESS_TRAFFIC_ALL" (prod tfvars line 32), the *.run.app URL is reachable from the public internet and Cloud Run does NOT enforce IAM. The OIDC token the scheduler signs is sent but never validated by the platform — roles/run.invoker is granted to allUsers per cloud-run/main.tf:172-180.

The PR's commit message claims cloud_run_allow_unauthenticated = false is the default per #51 — but #51's flip never landed for any actually-used env. That's tracked separately in #78 (per-env enablement migration), still open.

2. GKE Ingress doesn't validate OIDC at all

terraform/modules/compute/gcp/gke/main.tf:549-586 defines a plain kubernetes_ingress_v1 with ingress.class = "gce" only. No IAP (iap.googleapis.com), no BackendConfig with iap block, no Istio sidecar, no JWT validator. The GCE LB forwards the Authorization: Bearer <id_token> Cloud Scheduler signs straight through to the pod without validation.

Pre-PR this didn't matter because the app-level bearer check at internal/server/http.go:117 matched the static Bearer ${var.scheduled_task_secret}. Post-PR the env var is removed, the bearer check no-ops, and the GKE-hosted /api/scheduled/* endpoints are reachable unauthenticated from anyone who can reach the LB.

GKE isn't currently the deployed compute_platform (all three github-*.tfvars use cloud-run), so the impact is latent — but the GKE option is documented in dev.tfvars.example:36 and this PR's comment at gke/main.tf:679-681 claims "OIDC supersedes the application-level bearer check on GCP", which is materially false for GKE.

Suggested remediation

Two viable paths:

I'd lean toward Option B since it doesn't have a hidden cross-PR dependency.

What's still correct in this PR

So the diff itself is structurally fine; the missing piece is the auth layer that replaces what the bearer check used to do.

Converting this PR to draft until either Option A or Option B lands.

@cristim cristim marked this pull request as draft April 27, 2026 23:21
@cristim cristim added triaged Item has been triaged priority/p1 Next up; this sprint severity/high Significant harm urgency/this-sprint Within the current sprint impact/few Limited audience effort/m Days type/security Security finding status/blocked Blocked on something external labels Apr 28, 2026
@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 28, 2026

P1 security fix (closes #159) but blocked — see the detailed verification comment above. The PR correctly removes plaintext SCHEDULED_TASK_SECRET from IaC/state, but removing the application-level bearer check without a replacement auth layer leaves GKE endpoints unauthenticated (Cloud Run is also exposed because cloud_run_allow_unauthenticated = true in all deployed envs, contrary to the PR description). Two remediation paths documented in the comment above (Option A: flip allow_unauthenticated flag + add IAP to GKE; Option B: Go-side OIDC JWT validator). Draft status is correct; do not merge until one path lands. (triage agent wave2-F)

@cristim cristim force-pushed the fix/issue-159-gcp-secret-runtime branch from e369f12 to b84291d Compare April 29, 2026 18:58
cristim added a commit that referenced this pull request Apr 29, 2026
Adds a new `internal/server/scheduledauth` package providing app-level
authentication for the `/api/scheduled/*` endpoints. Three modes:

- `oidc`   — verifies a Google-issued OIDC ID token (RS256, JWKS-
            validated). Subject pinning to the scheduler SA email is
            REQUIRED (defence in depth — any GCP SA in the org can mint
            a token with arbitrary `aud`, only the scheduler has our
            `sub`). 60s clock skew, refresh-on-unknown-kid for key
            rotation, single-flight JWKS cache via go-oidc/v3.
- `bearer` — constant-time compare against a shared secret. Used on
            Azure where Logic Apps fetch the secret from Key Vault at
            invocation time.
- `disabled` — local dev only. Loud per-request WARN logs.

Mode is selected via `SCHEDULED_TASK_AUTH_MODE`; OIDC params via
`SCHEDULED_TASK_OIDC_AUDIENCE` (CSV) and `SCHEDULED_TASK_OIDC_SUBJECTS`
(CSV, REQUIRED in oidc mode); bearer secret via `SCHEDULED_TASK_SECRET`.
Fail-fast on misconfig — empty subjects or unknown mode returns
`ErrConfigInvalid` so the server refuses to start rather than silently
accepting everything.

This is the validator + config parser only; wiring into the HTTP server
and the Terraform changes follow in subsequent commits.

Closes the auth gap surfaced during PR #161 verification: dropping the
plaintext `SCHEDULED_TASK_SECRET` from Cloud Scheduler headers leaves
`/api/scheduled/*` unauthenticated on Cloud Run / GKE (both have
`cloud_run_allow_unauthenticated = true` today; #78 will flip that
separately as defence-in-depth, but Option B is independent of it).
cristim added a commit that referenced this pull request Apr 29, 2026
…/api/scheduled/* (#161)

Wires the new `internal/server/scheduledauth` validator into the HTTP
server. Replaces the inline `subtle.ConstantTimeCompare` against
`appConfig.ScheduledTaskSecret` in `handleScheduledHTTP` with a proper
middleware applied at mux registration time, so:

- Auth is enforced before any handler logic runs (including the method
  check — invalid methods now return 401 in authenticated mode rather
  than disclosing which methods exist).
- The validator picks the right auth scheme per platform: oidc on GCP,
  bearer on Azure, disabled for local dev (with loud WARN logs).
- The bearer secret in bearer mode comes from `cfg.ScheduledTaskSecret`,
  which `resolveScheduledTaskSecret` already populates from Key Vault /
  Secrets Manager via the SecretResolver, so the plaintext never lives
  in a container env var.

Validator construction is fail-fast: misconfig (empty subjects in oidc
mode, unknown mode, etc.) returns an error from `NewApplicationFromDeps`
and the container refuses to start. A best-effort JWKS warmup runs at
startup to surface misconfiguration early without blocking on Google's
CDN.

Existing `TestHandleScheduledHTTP` cases that exercised the inline
bearer check are migrated to drive through the middleware via the new
`scheduledAuthMiddleware` helper, mirroring how `CreateHTTPServer`
wires it. The remaining cases (method check, path validation) keep
`app.scheduledAuth = nil` — the helper passes through unmodified, so
those tests continue to validate the handler in isolation.
cristim added a commit that referenced this pull request Apr 29, 2026
…h env vars (#161)

Closes the auth gap on GCP that #161 surfaced: removing the plaintext
`Authorization: Bearer ${var.scheduled_task_secret}` header from Cloud
Scheduler left `/api/scheduled/*` unauthenticated on Cloud Run and GKE
today (both have `cloud_run_allow_unauthenticated = true` and GKE has
no IAP / IAM gate). This commit completes Option B: the in-app OIDC
validator (added in earlier commits) now has explicit, deterministic
config for the audience and subject claims it must accept.

Cloud Run + GKE modules:
- Set `oidc_token.audience` explicitly to the scheduler SA email on
  every `google_cloud_scheduler_job.http_target` block. Without this,
  Cloud Scheduler defaults `audience` to the target URL — coupling the
  validator's allow-list to the Cloud Run URL shape and breaking on
  Function-URL / vanity-domain changes. The explicit value mirrors the
  `email` we add to `SCHEDULED_TASK_OIDC_AUDIENCE` env.
- Wire `SCHEDULED_TASK_AUTH_MODE`, `SCHEDULED_TASK_OIDC_AUDIENCE`,
  `SCHEDULED_TASK_OIDC_SUBJECTS` env vars on the running container.
  `SUBJECTS` is pinned to the SA's `unique_id` (the value Google sets
  in the JWT `sub` claim for SA-signed ID tokens), `AUDIENCE` to the
  SA email. When neither scheduler is enabled, the mode falls back to
  `disabled` — matches the old behaviour of skipping auth when no
  secret was set.

Azure container-apps + AKS modules:
- Set `SCHEDULED_TASK_AUTH_MODE = "bearer"`. The existing
  `SCHEDULED_TASK_SECRET_NAME` (Key Vault secret name, NOT value) flow
  is unchanged — `internal/server/app.go::resolveScheduledTaskSecret`
  resolves it via the SecretResolver at startup and the validator
  constant-time-compares it. Logic Apps' HTTP Connector trigger does
  not emit Entra OIDC tokens, so OIDC mode is not viable on Azure.

tfvars examples (gcp/dev, azure/dev): document that the `SCHEDULED_*`
env vars are wired automatically by the modules — users do not set them
in `*.tfvars`. AWS Lambda is unaffected (EventBridge runs ECS tasks
directly via `RunTask`; the HTTP `/api/scheduled/*` route is not on the
critical path on AWS).

#78 (flipping `cloud_run_allow_unauthenticated = false`) remains worth
doing as defence-in-depth but is independent of this change — Option B
works whether or not Cloud Run's IAM gate is enforced.
cristim added a commit that referenced this pull request Apr 29, 2026
Adds `internal/server/scheduledauth/integration_test.go` exercising the
full HTTP roundtrip: middleware in front of an inner handler, JWT
signed by a test key, JWKS published over an httptest server, request
sent through `http.DefaultClient` so net/http header parsing and the
validator's lazy JWKS fetch are both on the path.

Five sub-cases, each independently verifiable:

  1. Valid token signed by published key → 200, handler runs.
  2. Token signed by a key absent from the JWKS → 401, handler skipped.
  3. Missing Authorization header → 401.
  4. Wrong `sub` (subject pinning) → 401.
  5. Wrong `aud` → 401.

Reuses the helpers from `validator_test.go` (same package) to keep test
infrastructure DRY.
@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 29, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (4)
internal/server/scheduledauth/errors.go (1)

16-16: Remove local filesystem reference from package docs.

Line 16 points to a developer-local path, which is non-portable and noisy in repo docs.

Proposed doc cleanup
-// See plan: ~/.claude/projects/.../plans/issue-161-oidc-validator.md
+// See repository docs/specs for scheduled-task auth design details.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/scheduledauth/errors.go` at line 16, The package-level
comment in internal/server/scheduledauth/errors.go contains a developer-local
filesystem reference
("~/.claude/projects/.../plans/issue-161-oidc-validator.md"); remove that local
path from the file header and replace it with a portable reference (e.g., a
short, repository-relative note like "See plans/issue-161-oidc-validator.md" or
remove the line entirely) so the package docs no longer include non-portable
developer-specific paths; update the top-of-file package comment where this path
appears (package docs / file comment) accordingly.
internal/server/http_test.go (1)

267-273: Prefer test assertions over panic in setup closures.

Using panic here makes failures noisier than needed. Pass *testing.T into setupApp and assert on constructor errors directly.

Refactor sketch
- setupApp       func(*Application)
+ setupApp       func(*testing.T, *Application)

- tt.setupApp(app)
+ tt.setupApp(t, app)

- setupApp: func(app *Application) {
+ setupApp: func(t *testing.T, app *Application) {
    v, err := scheduledauth.New(scheduledauth.Config{
      Mode:   scheduledauth.ModeBearer,
      Bearer: "my-secret",
    })
-   if err != nil {
-     panic(err)
-   }
+   testutil.AssertNoError(t, err)
    app.scheduledAuth = v
  },

Also applies to: 284-290

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/http_test.go` around lines 267 - 273, The test setup
currently panics on scheduledauth.New errors; change setupApp to accept
*testing.T and replace panic(err) with a test assertion (e.g., require.NoError
or t.Fatalf) when calling scheduledauth.New(scheduledauth.Config{Mode:
scheduledauth.ModeBearer, Bearer: "my-secret"}); update the other occurrence
(the 284-290 block) similarly so both scheduledauth.New calls use the test
helper and assert failures via the testing.T parameter rather than panicking.
internal/server/app.go (1)

83-84: Soften the “Always non-nil” claim in the field comment.

This is true for normal bootstrap, but not universally true in tests/manual construction, so the comment can be made more precise.

Comment tweak
- // Always non-nil — disabled mode allows everything through with a
+ // Non-nil for normal app bootstrap — disabled mode allows everything through with a
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/app.go` around lines 83 - 84, Update the field comment that
currently begins "Always non-nil — disabled mode allows everything through with
a loud WARN log..." to remove the absolute claim "Always non-nil" and instead
state that the field is normally non-nil in the bootstrapped server but may be
nil in tests or when manually constructed; keep the rest of the comment about
disabled mode and oidc (GCP) validation intact so readers know expected runtime
behavior and testing exceptions.
internal/server/scheduledauth/validator.go (1)

209-228: Give JWKS warmup its own deadline.

http.DefaultClient has no timeout, so Warmup can block startup indefinitely when callers pass a context without a deadline. Add a small fallback timeout before issuing the GET.

⏱ Suggested guard
func (v *Validator) Warmup(ctx context.Context) {
	if v.mode != ModeOIDC || v.jwksURL == "" {
		return
	}
+	if _, ok := ctx.Deadline(); !ok {
+		var cancel context.CancelFunc
+		ctx, cancel = context.WithTimeout(ctx, 5*time.Second)
+		defer cancel()
+	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, v.jwksURL, nil)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/scheduledauth/validator.go` around lines 209 - 228, Warmup
can block indefinitely because http.DefaultClient has no timeout; update
Validator.Warmup to use a bounded request context or an http.Client with a small
Timeout when v.mode == ModeOIDC and v.jwksURL != "" — e.g., derive a child
context with a short deadline (context.WithTimeout) or create a local
http.Client{Timeout: ...} before calling Do, use that client to perform the GET
and ensure the context/cancel is released; keep the existing log/error handling
and defer resp.Body.Close() as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/server/http.go`:
- Around line 116-120: The scheduledAuthMiddleware currently returns next when
app.scheduledAuth is nil which creates an auth bypass; change
scheduledAuthMiddleware to fail-closed: if app.scheduledAuth == nil return an
http.Handler that rejects requests (e.g., respond with 401/403 and short
message) instead of returning next, and keep the normal path returning
app.scheduledAuth.Middleware(next); update tests to inject an explicit disabled
validator stub where they intend no auth rather than relying on nil.

In `@internal/server/scheduledauth/config.go`:
- Around line 31-39: LoadConfig currently treats a missing/empty
EnvSource.Get(EnvAuthMode) as ModeDisabled which silently enables
unauthenticated endpoints; change LoadConfig to treat an empty/missing
EnvAuthMode as a configuration error (return ErrConfigInvalid) and only allow
ModeDisabled when the env value explicitly equals the trimmed, lowercased string
of ModeDisabled; update the validation in LoadConfig (around the mode :=
strings.ToLower(strings.TrimSpace(env.Get(EnvAuthMode))) logic) to return
ErrConfigInvalid when mode == "" and otherwise continue validating accepted
modes before returning Config.

In `@internal/server/scheduledauth/validator_test.go`:
- Around line 83-90: The /swap handler currently reads the request body with a
single Read which can return a short buffer and produce a truncated JWKS; change
the handler registered with mux.HandleFunc("/swap", ...) to read the entire body
(e.g. use io.ReadAll or io.ReadFull loop) and only assign the resulting full
byte slice to current after a successful full read, and add the io import if
needed.

In `@terraform/environments/gcp/compute.tf`:
- Around line 52-57: The comment referencing scheduled tasks currently implies
“no app-level auth on GCP”—replace the sentence that mentions "plaintext
scheduled_task_secret" and "Application-level bearer auth is dropped on GCP"
with a clarified note stating that Cloud Scheduler authenticates to Cloud Run
via OIDC tokens + roles/run.invoker and that application-level bearer checks are
not removed but are replaced/implemented as OIDC validation at the application
level (i.e., the app should validate the OIDC token rather than rely on a
plaintext scheduled_task_secret), while keeping the Azure note about retaining
bearer checks for Container Apps.

In `@terraform/modules/compute/gcp/cloud-run/main.tf`:
- Around line 60-64: The current expression sets scheduled_task_auth_mode to
"disabled" when both var.enable_scheduled_tasks and
var.enable_ri_exchange_schedule are false, leaving /api/scheduled/*
unauthenticated; change the logic so the default is an authenticated mode (e.g.,
"oidc" or another enforced mode used by your app) instead of "disabled". Update
the scheduled_task_auth_mode expression that references
var.enable_scheduled_tasks and var.enable_ri_exchange_schedule to return the
authenticated string (for example "oidc") when endpoints are present or when
absent by default, and only return "disabled" when an explicit new variable (or
a new explicit flag) requests disabling; locate the scheduled_task_auth_mode
assignment in the module to make this change.

In `@terraform/modules/compute/gcp/gke/main.tf`:
- Around line 22-32: Don't disable scheduled-task auth when the scheduler
feature flag is off: change scheduled_task_auth_mode so it does not become
"disabled" when var.enable_scheduled_tasks is false. Instead always set
scheduled_task_auth_mode = "oidc" (leave scheduled_task_oidc_audience and
scheduled_task_oidc_subject using try(...) as-is) so the ingress
/api/scheduled/* endpoints continue to require OIDC validation even if no
scheduler ServiceAccount exists.

---

Nitpick comments:
In `@internal/server/app.go`:
- Around line 83-84: Update the field comment that currently begins "Always
non-nil — disabled mode allows everything through with a loud WARN log..." to
remove the absolute claim "Always non-nil" and instead state that the field is
normally non-nil in the bootstrapped server but may be nil in tests or when
manually constructed; keep the rest of the comment about disabled mode and oidc
(GCP) validation intact so readers know expected runtime behavior and testing
exceptions.

In `@internal/server/http_test.go`:
- Around line 267-273: The test setup currently panics on scheduledauth.New
errors; change setupApp to accept *testing.T and replace panic(err) with a test
assertion (e.g., require.NoError or t.Fatalf) when calling
scheduledauth.New(scheduledauth.Config{Mode: scheduledauth.ModeBearer, Bearer:
"my-secret"}); update the other occurrence (the 284-290 block) similarly so both
scheduledauth.New calls use the test helper and assert failures via the
testing.T parameter rather than panicking.

In `@internal/server/scheduledauth/errors.go`:
- Line 16: The package-level comment in internal/server/scheduledauth/errors.go
contains a developer-local filesystem reference
("~/.claude/projects/.../plans/issue-161-oidc-validator.md"); remove that local
path from the file header and replace it with a portable reference (e.g., a
short, repository-relative note like "See plans/issue-161-oidc-validator.md" or
remove the line entirely) so the package docs no longer include non-portable
developer-specific paths; update the top-of-file package comment where this path
appears (package docs / file comment) accordingly.

In `@internal/server/scheduledauth/validator.go`:
- Around line 209-228: Warmup can block indefinitely because http.DefaultClient
has no timeout; update Validator.Warmup to use a bounded request context or an
http.Client with a small Timeout when v.mode == ModeOIDC and v.jwksURL != "" —
e.g., derive a child context with a short deadline (context.WithTimeout) or
create a local http.Client{Timeout: ...} before calling Do, use that client to
perform the GET and ensure the context/cancel is released; keep the existing
log/error handling and defer resp.Body.Close() as-is.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fc81e336-3f30-4193-b19d-d79acea7e54d

📥 Commits

Reviewing files that changed from the base of the PR and between 7267600 and b84291d.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (18)
  • go.mod
  • internal/server/app.go
  • internal/server/http.go
  • internal/server/http_test.go
  • internal/server/scheduledauth/config.go
  • internal/server/scheduledauth/errors.go
  • internal/server/scheduledauth/integration_test.go
  • internal/server/scheduledauth/validator.go
  • internal/server/scheduledauth/validator_test.go
  • terraform/environments/azure/dev.tfvars.example
  • terraform/environments/gcp/compute.tf
  • terraform/environments/gcp/dev.tfvars.example
  • terraform/modules/compute/azure/aks/main.tf
  • terraform/modules/compute/azure/container-apps/main.tf
  • terraform/modules/compute/gcp/cloud-run/main.tf
  • terraform/modules/compute/gcp/cloud-run/variables.tf
  • terraform/modules/compute/gcp/gke/main.tf
  • terraform/modules/compute/gcp/gke/variables.tf
💤 Files with no reviewable changes (2)
  • terraform/modules/compute/gcp/cloud-run/variables.tf
  • terraform/modules/compute/gcp/gke/variables.tf

Comment thread internal/server/http.go
Comment on lines +116 to +120
func (app *Application) scheduledAuthMiddleware(next http.Handler) http.Handler {
if app.scheduledAuth == nil {
return next
}
return app.scheduledAuth.Middleware(next)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fail closed when scheduled auth has not been initialized.

Returning next here makes every /api/scheduled/* route public whenever an Application is built outside the production constructor. That is an auth bypass on the exact path this PR is hardening; tests should inject an explicit disabled validator instead of relying on a nil pass-through.

🔒 Suggested fail-closed guard
func (app *Application) scheduledAuthMiddleware(next http.Handler) http.Handler {
	if app.scheduledAuth == nil {
-		return next
+		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+			http.Error(w, "Scheduled task auth is not configured", http.StatusServiceUnavailable)
+		})
	}
	return app.scheduledAuth.Middleware(next)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/http.go` around lines 116 - 120, The scheduledAuthMiddleware
currently returns next when app.scheduledAuth is nil which creates an auth
bypass; change scheduledAuthMiddleware to fail-closed: if app.scheduledAuth ==
nil return an http.Handler that rejects requests (e.g., respond with 401/403 and
short message) instead of returning next, and keep the normal path returning
app.scheduledAuth.Middleware(next); update tests to inject an explicit disabled
validator stub where they intend no auth rather than relying on nil.

Comment thread internal/server/scheduledauth/config.go Outdated
Comment thread internal/server/scheduledauth/validator_test.go
Comment thread terraform/environments/gcp/compute.tf Outdated
Comment thread terraform/modules/compute/gcp/cloud-run/main.tf
Comment thread terraform/modules/compute/gcp/gke/main.tf Outdated
cristim added a commit that referenced this pull request Apr 29, 2026
…oc fixes, test panics (#161)

Round 1 of CodeRabbit feedback. Triage:

Actionable — addressed:
- internal/server/scheduledauth/validator_test.go: /swap test handler
  read body via single Body.Read which can return a short read and
  truncate the JWKS payload — switched to io.ReadAll, mutex held only
  while assigning the buffer.
- terraform/environments/gcp/compute.tf: comment claimed "Application-
  level bearer auth is dropped on GCP" — that's stale now that we
  validate OIDC at the application level. Rewritten to describe the
  actual setup (OIDC at app + Cloud Run IAM gate as defence in depth)
  and to call out #78 separately.

Nitpicks — addressed:
- internal/server/scheduledauth/validator.go::Warmup: http.DefaultClient
  has no timeout, so a misconfigured JWKS URL could block startup
  indefinitely. Added a 5s fallback deadline when the caller's ctx has
  no deadline; respect any deadline the caller already set.
- internal/server/scheduledauth/errors.go: package doc referenced a
  developer-local plan path. Replaced with a portable reference to
  PR #161 / issue #159.
- internal/server/http_test.go: setupApp closures called panic(err) on
  scheduledauth.New failure. Switched the signature to take *testing.T
  and use testutil.AssertNoError, scoped via a small newBearerValidator
  helper to keep the table-driven cases tight.
- internal/server/app.go: scheduledAuth field comment claimed "Always
  non-nil" which is true post-NewApplicationFromDeps but not for
  hand-built test Applications. Rewritten to spell out both paths.

Actionable — dismissed with justification (will reply on PR thread):
- "scheduledAuthMiddleware should fail-closed when nil": intentional
  test affordance; production path always populates the field via
  NewApplicationFromDeps.
- "LoadConfig should reject empty SCHEDULED_TASK_AUTH_MODE": local-dev
  default-to-disabled+WARN is by design (plan §1).
- "scheduled_task_auth_mode should default to oidc when both schedulers
  off": oidc mode requires non-empty audiences/subjects (validator
  fails fast otherwise), so disabled is the only correct value when
  no scheduler is configured. Same reasoning for cloud-run and gke.
@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 29, 2026

Pass 1 of CR feedback addressed in 084c755.

Actionable — addressed:

Nitpicks — addressed:

  • validator.go::Warmup: added a 5s fallback deadline when the caller's context has none. http.DefaultClient has no timeout, so a misconfigured JWKS URL would otherwise block startup indefinitely.
  • errors.go: dropped the developer-local plan path; replaced with a portable PR / issue reference.
  • http_test.go: reworked setupApp to take *testing.T and use testutil.AssertNoError instead of panic(err). Extracted newBearerValidator to keep the table tight.
  • app.go: tightened the scheduledAuth field comment — it's "non-nil after NewApplicationFromDeps", not "always non-nil" (tests can build Application directly).

Actionable — dismissed with justification:

  1. "scheduledAuthMiddleware should fail-closed when nil" — intentional test affordance. In production, NewApplicationFromDeps always populates scheduledAuth (and that path is on every server boot path: cmd/server, cmd/lambda). The nil branch only triggers when tests build Application{} directly without the constructor — they'd be testing the handler in isolation, with the middleware itself covered by scheduledauth's own unit + integration tests. Failing closed there would force every handler-level test to construct a stub validator, with no security benefit (because the production path can't reach this branch).

  2. "LoadConfig should reject empty SCHEDULED_TASK_AUTH_MODE" — disabled+WARN is the deliberate local-dev default per the design plan. Local dev (no env vars set) shouldn't fail-fast on a missing security config; the loud per-startup + per-request WARN log makes it impossible to ship to production without explicitly opting in. Failing fast here would force every local-dev contributor to set the env var, with no production benefit (Terraform sets it explicitly on every prod deploy).

  3. "scheduled_task_auth_mode should default to "oidc" when both schedulers off" (cloud-run and gke) — would cause the validator to fail-fast at construction: oidc mode requires non-empty audiences and subjects, both of which become empty when no scheduler is enabled. The current "disabled when no scheduler" flow matches the original behaviour (no scheduler → /api/scheduled/* is unreached + harmless WARN). If the user enables a scheduler later, terraform apply re-flips the mode.

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

@cristim Thanks for the thorough write-up on Pass 1. The addressed items look good, and the justifications for the dismissed items are all sound:

  1. nil middleware pass-through — Agreed. The production boot path via NewApplicationFromDeps guarantees a non-nil scheduledAuth; the nil branch is exclusively a test-isolation affordance with no reachable production path.
  2. disabled+WARN default for empty mode — Agreed. The per-request WARN log acts as a loud signal that prevents silent unguarded deployment; forcing fail-fast on missing config would impose friction on every local-dev contributor for no production gain given Terraform's explicit mode wiring.
  3. oidc default when no scheduler is enabled — Agreed. oidc mode requires non-empty audiences and subjects; defaulting to oidc with empty lists would fail-fast construction for a legitimately inert configuration. The current disabled → re-flip on enable lifecycle is the correct Terraform-idiomatic approach.

Triggering a fresh review against the updated commits now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/server/app.go`:
- Around line 295-308: buildScheduledAuth currently treats a missing
SCHEDULED_TASK_AUTH_MODE as "disabled" and ends up constructing a permissive
validator; change buildScheduledAuth to detect when LoadConfig returns a
missing/empty mode (or ModeDisabled) and return a non-nil error instead of
continuing, so the process fails startup on missing auth wiring. Specifically,
after scheduledauth.LoadConfig(envSourceOS{}) validate that saCfg.Mode is set
and not the default disabled sentinel (check against the empty value or
scheduledauth.ModeDisabled) before applying the ModeBearer secret override and
calling scheduledauth.New(saCfg); if the mode is missing/disabled return an
error explaining SCHEDULED_TASK_AUTH_MODE must be configured.

In `@internal/server/scheduledauth/validator.go`:
- Around line 138-142: The comment above the oidc subject check incorrectly says
`sub` pins the scheduler service account email; update that comment to state
that `sub` must be the service-account unique_id (not the email) and clarify
that SCHEDULED_TASK_OIDC_SUBJECTS expects those unique_ids; keep the existing
enforcement logic (the cfg.Subjects check and ErrConfigInvalid usage) unchanged
and only correct the explanatory text referencing cfg.Subjects and
SCHEDULED_TASK_OIDC_SUBJECTS so future configs/tests use the service-account
unique_id.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 06d5098c-4db0-42a1-9eca-ab7dc71a35b1

📥 Commits

Reviewing files that changed from the base of the PR and between b84291d and 084c755.

📒 Files selected for processing (6)
  • internal/server/app.go
  • internal/server/http_test.go
  • internal/server/scheduledauth/errors.go
  • internal/server/scheduledauth/validator.go
  • internal/server/scheduledauth/validator_test.go
  • terraform/environments/gcp/compute.tf
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/server/http_test.go

Comment thread internal/server/app.go
Comment on lines +295 to +308
func buildScheduledAuth(cfg ApplicationConfig) (*scheduledauth.Validator, error) {
saCfg, err := scheduledauth.LoadConfig(envSourceOS{})
if err != nil {
return nil, err
}
// In bearer mode, override the env-supplied secret with the one
// already resolved from KV / SM. LoadConfig reads SCHEDULED_TASK_SECRET
// directly which is fine for local dev where the env carries the
// plaintext, but in prod cfg.ScheduledTaskSecret is the authoritative
// value (resolved by resolveScheduledTaskSecret upstream).
if saCfg.Mode == scheduledauth.ModeBearer {
saCfg.Bearer = cfg.ScheduledTaskSecret
}
return scheduledauth.New(saCfg)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't fail open when SCHEDULED_TASK_AUTH_MODE is missing.

buildScheduledAuth goes through scheduledauth.LoadConfig, which currently maps an unset mode to disabled. In the production constructor, that turns a missed Terraform/env wiring into a live validator that allows every /api/scheduled/* request instead of failing startup, which is the exact regression this PR is trying to prevent.

🔒 Proposed guard
 func buildScheduledAuth(cfg ApplicationConfig) (*scheduledauth.Validator, error) {
+	if strings.TrimSpace(os.Getenv(scheduledauth.EnvAuthMode)) == "" {
+		return nil, fmt.Errorf(
+			"%w: %s must be explicitly set (use disabled only for local dev)",
+			scheduledauth.ErrConfigInvalid,
+			scheduledauth.EnvAuthMode,
+		)
+	}
 	saCfg, err := scheduledauth.LoadConfig(envSourceOS{})
 	if err != nil {
 		return nil, err
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func buildScheduledAuth(cfg ApplicationConfig) (*scheduledauth.Validator, error) {
saCfg, err := scheduledauth.LoadConfig(envSourceOS{})
if err != nil {
return nil, err
}
// In bearer mode, override the env-supplied secret with the one
// already resolved from KV / SM. LoadConfig reads SCHEDULED_TASK_SECRET
// directly which is fine for local dev where the env carries the
// plaintext, but in prod cfg.ScheduledTaskSecret is the authoritative
// value (resolved by resolveScheduledTaskSecret upstream).
if saCfg.Mode == scheduledauth.ModeBearer {
saCfg.Bearer = cfg.ScheduledTaskSecret
}
return scheduledauth.New(saCfg)
func buildScheduledAuth(cfg ApplicationConfig) (*scheduledauth.Validator, error) {
if strings.TrimSpace(os.Getenv(scheduledauth.EnvAuthMode)) == "" {
return nil, fmt.Errorf(
"%w: %s must be explicitly set (use disabled only for local dev)",
scheduledauth.ErrConfigInvalid,
scheduledauth.EnvAuthMode,
)
}
saCfg, err := scheduledauth.LoadConfig(envSourceOS{})
if err != nil {
return nil, err
}
// In bearer mode, override the env-supplied secret with the one
// already resolved from KV / SM. LoadConfig reads SCHEDULED_TASK_SECRET
// directly which is fine for local dev where the env carries the
// plaintext, but in prod cfg.ScheduledTaskSecret is the authoritative
// value (resolved by resolveScheduledTaskSecret upstream).
if saCfg.Mode == scheduledauth.ModeBearer {
saCfg.Bearer = cfg.ScheduledTaskSecret
}
return scheduledauth.New(saCfg)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/app.go` around lines 295 - 308, buildScheduledAuth currently
treats a missing SCHEDULED_TASK_AUTH_MODE as "disabled" and ends up constructing
a permissive validator; change buildScheduledAuth to detect when LoadConfig
returns a missing/empty mode (or ModeDisabled) and return a non-nil error
instead of continuing, so the process fails startup on missing auth wiring.
Specifically, after scheduledauth.LoadConfig(envSourceOS{}) validate that
saCfg.Mode is set and not the default disabled sentinel (check against the empty
value or scheduledauth.ModeDisabled) before applying the ModeBearer secret
override and calling scheduledauth.New(saCfg); if the mode is missing/disabled
return an error explaining SCHEDULED_TASK_AUTH_MODE must be configured.

Comment thread internal/server/scheduledauth/validator.go
@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 29, 2026

Addressed the latest CodeRabbit subject-claim wording finding in commit 1442231: the OIDC validator comment now explicitly says SCHEDULED_TASK_OIDC_SUBJECTS expects service-account unique IDs, not emails.\n\nI am intentionally not changing buildScheduledAuth to fail on disabled/missing mode in this pass. That would reverse the already-discussed local-dev/no-scheduler design: Terraform currently emits disabled when no scheduler resources exist, and prior review discussion accepted the explicit disabled+WARN behavior. Production scheduler paths still set oidc/bearer explicitly and fail fast on invalid OIDC/bearer config.\n\n@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

@cristim: Acknowledged on both points.

  • SCHEDULED_TASK_OIDC_SUBJECTS wording fix (commit 14422312f) — Correct call. The JWT sub claim for a GCP service account is always the numeric unique ID, not the email (email goes in the email claim). Documenting this explicitly in the comment prevents a footgun where an operator might paste the SA email and wonder why every token is rejected.

  • buildScheduledAuth fail-fast on disabled/missing mode (dismissed) — Justification holds and is consistent with the previous pass discussion. Terraform-managed deployments always set the mode explicitly; the disabled+WARN path is exclusively the local-dev/no-scheduler codepath, and the per-request WARN log acts as a sufficient signal without penalising contributors who haven't set the env var locally.

Triggering a fresh incremental review against the updated commits now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/server/scheduledauth/validator.go (1)

231-246: Warmup should validate the payload shape, not just reachability.

Right now any 2xx/3xx response is treated as healthy. If the URL is mispointed to a page that returns 200 OK, startup logs will still look green and the failure shifts to the first scheduled request. Decoding a minimal JWKS shape ({"keys":[...]}) here would catch that class of misconfiguration earlier.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/scheduledauth/validator.go` around lines 231 - 246, The
warmup currently only checks HTTP reachability; instead, after receiving resp
for the JWKS fetch (resp variable from http.DefaultClient.Do for v.jwksURL),
decode the response body as JSON and validate it has a top-level "keys" field
that is a JSON array (use json.Decoder or json.Unmarshal and assert the "keys"
value is a slice/array and non-nil); if decoding fails or the shape is wrong log
a WARN including v.jwksURL and the decode/validation error and return so the
validator will retry later.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/server/scheduledauth/validator.go`:
- Around line 132-160: Add a fail-fast validation for the JWKS URL during
validator construction: implement a helper validateAbsoluteURL(raw, label
string) error that parses the URL (using url.Parse) and returns ErrConfigInvalid
if parsing fails or scheme/host are empty, call validateAbsoluteURL(cfg.JWKSURL,
"SCHEDULED_TASK_OIDC_JWKS_URL") before assigning v.jwksURL or creating v.keySet,
and return the validation error so malformed JWKS URLs abort startup instead of
deferring to oidc.NewRemoteKeySet.

---

Nitpick comments:
In `@internal/server/scheduledauth/validator.go`:
- Around line 231-246: The warmup currently only checks HTTP reachability;
instead, after receiving resp for the JWKS fetch (resp variable from
http.DefaultClient.Do for v.jwksURL), decode the response body as JSON and
validate it has a top-level "keys" field that is a JSON array (use json.Decoder
or json.Unmarshal and assert the "keys" value is a slice/array and non-nil); if
decoding fails or the shape is wrong log a WARN including v.jwksURL and the
decode/validation error and return so the validator will retry later.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a67e1bcb-251b-4041-bded-7f4e89b43b3d

📥 Commits

Reviewing files that changed from the base of the PR and between 084c755 and 1442231.

📒 Files selected for processing (1)
  • internal/server/scheduledauth/validator.go

Comment thread internal/server/scheduledauth/validator.go
@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 29, 2026

Addressed the latest JWKS validation comments: OIDC validator construction now rejects malformed/non-absolute JWKS URLs, and warmup now validates the fetched payload has a top-level keys array before considering the probe healthy. Added tests for malformed JWKS URLs and JWKS payload shape validation. Verification: go test ./internal/server/scheduledauth ./internal/server; pre-commit Go hooks passed. @coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

@cristim: Acknowledged — Pass 4 is a strong set of changes. Here's a quick assessment before the fresh review runs:

Major fixes — both sound:

  1. LoadConfig fail-closed on unset mode — This is the right call. Flipping the unset-mode behaviour from disabled+WARN to ErrConfigInvalid closes the gap where a missed Terraform/K8s env wiring would silently boot /api/scheduled/* unauthenticated. "disabled" now requires an explicit opt-in, which is the correct security posture for a production-grade service.

  2. GKE scheduled_task_auth_mode derivation + var.scheduled_task_auth_mode_override — The length(google_service_account.scheduler) > 0 ? "oidc" : var.scheduled_task_auth_mode_override pattern with a fail-closed "oidc" default and validation block is exactly the right Terraform-idiomatic approach. Disabling the scheduler no longer implicitly opens an unauthenticated endpoint; a deliberate "disabled" override is required, which will be visible in any code review of the tfvars.

Nit addressed — testSchedulerSubject as 21-digit numeric:
Correct production contract. GCP service account ID tokens carry the SA's numeric unique ID in sub, not the email (email appears in the email claim). The test suite now exercises the real-world claim shape.

Dismissed nit — Azure cron string:
Justification holds. The cron string is the module's input contract; split(" ", var.recommendation_schedule)[1] at scheduled-tasks.tf:23 parses it at the Terraform layer. Switching the example to frequency/interval would silently break every existing .tfvars file that feeds the same variable. Leaving it as-is is correct.

TestNewApplicationFromDepsSCHEDULED_TASK_AUTH_MODE=disabled:
Correct test-infra fix. The explicit env var is now required given the fail-closed LoadConfig, and setting it to "disabled" is the appropriate test affordance for a unit test that isn't exercising the auth subsystem.

Triggering a fresh incremental review against commit 97253fd9d now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

…esidual)

Commit 97253fd introduced testSchedulerSubject and converted the
positive-path scheduler subjects in validator_test.go to it, but
left two clusters of email-shaped fixtures behind:

  - validator_test.go:282 — the "wrong subject" rejection test still
    asserted against `attacker@example.iam.gserviceaccount.com`.
  - integration_test.go (lines 36, 71, 82, 99, 110) — every subject
    in the integration suite was still email-shaped.

The CR nitpick was not just about the positive scheduler fixture:
the broader concern is that production wiring pins
SCHEDULED_TASK_OIDC_SUBJECTS to GCP service-account unique_ids
(21-digit numeric strings), and a regression that silently widened
the validator to accept any string would still pass tests that
asserted against email-shaped values — both negative and positive
paths.

This commit:

  1. Adds testNonSchedulerSubject ("999888777666555444333") next to
     the existing testSchedulerSubject. Same shape, distinct numeric
     so failures are visually traceable.
  2. Replaces every remaining email-shaped subject across both test
     files with the appropriate constant.

After this change `grep -E "scheduler@|attacker@|random@|gserviceaccount"`
finds zero hits in `internal/server/scheduledauth/`.

Tests: `go test ./internal/server/scheduledauth/...` clean;
`go test ./...` clean across the full root suite (no other
package referenced the email fixtures).
@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 29, 2026

Finished the unique_id fixture migration that 97253fd started: testNonSchedulerSubject covers the rejection-path subjects, and every email-shaped fixture in both validator_test.go (the leftover at line 282) and integration_test.go (5 sites) is now a 21-digit numeric to match what production wiring pins SCHEDULED_TASK_OIDC_SUBJECTS to. grep -E 'scheduler@|attacker@|random@|gserviceaccount' in internal/server/scheduledauth/ is now empty.

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

@cristim: Acknowledged — the fixture migration is now complete and consistent across both test files:

  • validator_test.go line 282 leftover — replaced with 21-digit numeric, so the full unit suite (both happy-path and rejection-path subjects) now exercises the real GCP SA sub contract.
  • integration_test.go (5 sites) — same alignment; the full HTTP roundtrip integration suite no longer carries any email-shaped subjects that would diverge from what SCHEDULED_TASK_OIDC_SUBJECTS pins in production.
  • testNonSchedulerSubject — good explicit name for the rejection-path fixture; makes the intent of the negative test cases self-documenting.

The grep clean result confirms no stragglers remain. Triggering a fresh incremental review now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/server/scheduledauth/validator_test.go`:
- Around line 409-416: The /swap POST request in the test (constructed as swap
:= http.NewRequest(..., srv.URL+"/swap", strings.NewReader(string(jwks(keyB))))
and executed via http.DefaultClient.Do(swap)) should be asserted for success
immediately after the call: check resp.StatusCode is 200 (or expected status)
and fail the test with a helpful message including the response body when it's
not, so a failed swap surfaces as a clear test failure instead of later causing
token-validation errors; read resp.Body into a string for the failure message
before closing it.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dce4a0c8-fa37-4a70-a230-cb0b2fe42265

📥 Commits

Reviewing files that changed from the base of the PR and between 6f22b9d and 97253fd.

📒 Files selected for processing (5)
  • internal/server/app_test.go
  • internal/server/scheduledauth/config.go
  • internal/server/scheduledauth/validator_test.go
  • terraform/modules/compute/gcp/gke/main.tf
  • terraform/modules/compute/gcp/gke/variables.tf

Comment thread internal/server/scheduledauth/validator_test.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
terraform/modules/compute/gcp/gke/main.tf (1)

36-37: ⚡ Quick win

Bind aud to the receiving endpoint, not the scheduler service account.

sub is already pinned to the scheduler SA unique_id, so using the same SA identity again for aud makes both checks describe the caller. Setting aud to the scheduled endpoint URI keeps the token recipient-bound and reduces cross-service replay if this SA is ever reused.

♻️ Suggested change
-  scheduled_task_oidc_audience = try(google_service_account.scheduler[0].email, "")
+  scheduled_task_oidc_audience = "${var.app_url}/api/scheduled/recommendations"
...
     oidc_token {
       service_account_email = google_service_account.scheduler[0].email
-      audience              = google_service_account.scheduler[0].email
+      audience              = local.scheduled_task_oidc_audience
     }

Also applies to: 480-483, 714-727

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@terraform/modules/compute/gcp/gke/main.tf` around lines 36 - 37, The audience
claim is currently set to the scheduler service account email
(scheduled_task_oidc_audience = try(google_service_account.scheduler[0].email,
"")) which makes both aud and sub refer to the caller; change
scheduled_task_oidc_audience to the receiving endpoint's URI (e.g., use the
variable or resource that represents the scheduled endpoint URL) so aud is the
endpoint and keep scheduled_task_oidc_subject =
try(google_service_account.scheduler[0].unique_id, "") as the caller identity;
apply the same replacement for the other occurrences noted (around the other
blocks at lines you referenced).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@terraform/modules/compute/gcp/gke/main.tf`:
- Around line 36-37: The audience claim is currently set to the scheduler
service account email (scheduled_task_oidc_audience =
try(google_service_account.scheduler[0].email, "")) which makes both aud and sub
refer to the caller; change scheduled_task_oidc_audience to the receiving
endpoint's URI (e.g., use the variable or resource that represents the scheduled
endpoint URL) so aud is the endpoint and keep scheduled_task_oidc_subject =
try(google_service_account.scheduler[0].unique_id, "") as the caller identity;
apply the same replacement for the other occurrences noted (around the other
blocks at lines you referenced).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c38c1dac-b84e-4f73-9612-15dfb7fdc518

📥 Commits

Reviewing files that changed from the base of the PR and between 6f22b9d and 5193227.

📒 Files selected for processing (6)
  • internal/server/app_test.go
  • internal/server/scheduledauth/config.go
  • internal/server/scheduledauth/integration_test.go
  • internal/server/scheduledauth/validator_test.go
  • terraform/modules/compute/gcp/gke/main.tf
  • terraform/modules/compute/gcp/gke/variables.tf
🚧 Files skipped from review as they are similar to previous changes (3)
  • internal/server/app_test.go
  • terraform/modules/compute/gcp/gke/variables.tf
  • internal/server/scheduledauth/integration_test.go

Addresses CodeRabbit's actionable on validator_test.go:418-424:
the JWKS-rotation test fired POST /swap and only checked the
network-error path. If the test server's /swap handler 5xx'd, or
short-read the body and returned non-200 silently, the JWKS would
not actually swap and the test would later fail at "unknown kid"
— pointing at the wrong layer.

Fixes:
  - Hoist `jwks(keyB)` into a local so we don't compute it three
    times for body+ContentLength+failure-message symmetry.
  - Surface NewRequest's error instead of silently dropping it
    via `_, _ := http.NewRequest(...)`.
  - Use `http.MethodPost` + `http.StatusOK` instead of stringly-
    typed literals.
  - On non-200, read the response body and t.Fatalf with both
    status and body so the failure points at the swap server, not
    the downstream validator.

Tests: TestValidate_OIDC_KeyRotation_RefreshOnUnknownKid passes;
`go test ./...` clean across the full root suite.
@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 29, 2026

Pass: addressed the actionable on validator_test.go:418-424 — the JWKS-rotation /swap POST now (a) propagates http.NewRequest's error, (b) asserts resp.StatusCode == 200, and (c) on non-200 reads the body and t.Fatalfs with both status and body so a swap-handler regression points at the swap, not the downstream validator. Full suite clean.

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

@cristim: Acknowledged — all three sub-items on the /swap hardening are the right defensive improvements:

  1. http.NewRequest error propagation — previously a request-construction failure would be silently swallowed, making the test non-deterministically corrupt without a clear signal.
  2. resp.StatusCode == 200 assertion — catches swap-handler regressions before they produce misleading downstream validator failures.
  3. Body-inclusive t.Fatalf on non-200 — makes the failure attribution unambiguous: the message now points at the swap step, not at whatever token validation error surfaces later.

Triggering a fresh incremental review against the updated commits now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal/server/scheduledauth/validator_test.go (1)

750-766: ⚡ Quick win

Assert the warmup failure path, not just the timeout.

Right now this still passes if Warmup returns early without attempting the fetch. Please make the failure observable here as well—e.g. assert the warning log or another side effect of the failed JWKS fetch—so this test protects the non-fatal error branch instead of only “did not hang”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/scheduledauth/validator_test.go` around lines 750 - 766, The
test currently only ensures Warmup does not hang but doesn't assert Warmup
actually attempted the JWKS fetch; modify
TestWarmup_LoggedAndNonFatal_OnDeadEndpoint to capture and assert the warning
log produced by Warmup on fetch failure: create or inject a test logger/ buffer
before calling New (or replace v's logger after New if New returns a validator
with a logger field), call v.Warmup(ctx) as before, then read the captured logs
and assert they contain a warning/error message referencing the JWKSURL or JWKS
fetch failure (or the error string like ECONNREFUSED) to prove the failure path
was exercised rather than returning early. Ensure you still use the short
context timeout while asserting the log contents after Warmup returns.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@internal/server/scheduledauth/validator_test.go`:
- Around line 750-766: The test currently only ensures Warmup does not hang but
doesn't assert Warmup actually attempted the JWKS fetch; modify
TestWarmup_LoggedAndNonFatal_OnDeadEndpoint to capture and assert the warning
log produced by Warmup on fetch failure: create or inject a test logger/ buffer
before calling New (or replace v's logger after New if New returns a validator
with a logger field), call v.Warmup(ctx) as before, then read the captured logs
and assert they contain a warning/error message referencing the JWKSURL or JWKS
fetch failure (or the error string like ECONNREFUSED) to prove the failure path
was exercised rather than returning early. Ensure you still use the short
context timeout while asserting the log contents after Warmup returns.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8c1256af-2505-46fb-a8f6-925a8850dc39

📥 Commits

Reviewing files that changed from the base of the PR and between 5193227 and e993d67.

📒 Files selected for processing (1)
  • internal/server/scheduledauth/validator_test.go

cristim added 2 commits April 30, 2026 01:26
…A (CR pass)

Addresses CodeRabbit's nitpick on `gke/main.tf:36-37` (also-applies
to :480-483 and :714-727), and applies the same convention to the
sibling `cloud-run/main.tf` so the two GCP compute modules stay
aligned mid-PR.

Old shape (both modules):

    audience = google_service_account.scheduler[0].email   # JWT claim
    SCHEDULED_TASK_OIDC_AUDIENCE = same SA email

`sub` already pins the caller to the scheduler SA's unique_id, so
using the same SA identity for `aud` made both checks describe the
caller. If the scheduler SA were ever reused for a different
service, an OIDC token minted for service A could be replayed
against service B because both services would have the same
audience in their allow-list.

New shape (audience names the receiving endpoint, not the caller):

    GKE:
      audience = ${var.app_url}/api/scheduled/recommendations
    Cloud Run:
      audience = ${var.service_name}/api/scheduled/<endpoint>

The validator side already supports comma-separated audiences via
`cleanSet` / `audienceMatches` (no app-side change needed). Both
sides read the same local, so the JWT claim and the env-var match
by construction.

Why service_name (not URL) on Cloud Run:
  Using `google_cloud_run_v2_service.main.uri` produces a Terraform
  cycle — the service's env vars include the audience whitelist,
  which would then depend on the service's own URI. `service_name`
  is a module input, fully resolvable at plan time, and any
  unique-per-(service, endpoint) string satisfies the OIDC
  recipient-bound contract — the validator only does string equality
  against the env-var value, which is built from the same local.

GKE has no such cycle (kubernetes_deployment doesn't read its own
URL into env), so it uses `var.app_url` per CR's literal suggestion.

Per-module specifics:

* gke/main.tf — single endpoint; one local, three callsites
  (the local + the env var + the Cloud Scheduler oidc_token block)
  all read from `local.scheduled_task_oidc_audience`.
* cloud-run/main.tf — two endpoints (recommendations, ri-exchange);
  new per-schedule locals
  `scheduled_task_recommendations_audience` and
  `scheduled_task_ri_exchange_audience` build from
  `var.service_name`. The CSV `oidc_audiences` local that flows
  into the env-var compacts the two enabled values. Each Cloud
  Scheduler oidc_token reads its matching local.

Verified: `terraform validate` clean on both modules; pre-commit
hooks (`terraform_fmt`, `terraform_validate`, `tflint`) pass;
`go test ./...` clean across the full root suite.
Addresses CodeRabbit's nitpick on `validator_test.go:750-766`:
`TestWarmup_LoggedAndNonFatal_OnDeadEndpoint` previously only
asserted that Warmup didn't hang. The test would still pass if a
future regression made Warmup short-circuit *before* the JWKS fetch
ever ran — exactly the failure mode the test exists to guard
against ("non-fatal, but actually exercises the failure branch").

This commit captures the global standard logger to a `bytes.Buffer`
for the duration of the test, calls Warmup with a guaranteed-bad
JWKS URL (127.0.0.1:1, ECONNREFUSED), and asserts the captured
output contains "JWKS warmup". That string is shared by every
warmup-failure log line in `validator.go` (request build, fetch,
non-200 status, invalid payload), so a tweak to any of those
messages won't break this test for cosmetic reasons — but a
regression that skips the fetch entirely will.

Safety: no test in `internal/server/scheduledauth/` uses
`t.Parallel()`, so swapping the global `log` writer is race-free.
`t.Cleanup` restores the original writer.

Tests: `go test ./internal/server/scheduledauth/... -run Warmup`
shows the new assertion firing on success;
`go test ./...` clean across the full root suite.
@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 29, 2026

Pass: addressed both nitpicks from the most recent reviews.

  1. aud claim → endpoint, not SA (gke + cloud-run): sub already pins the caller via SA unique_id, so reusing the SA email for aud made both checks describe the caller. Now aud names the recipient — ${var.app_url}/api/scheduled/recommendations on GKE, ${var.service_name}/api/scheduled/<endpoint> on Cloud Run. (Cloud Run can't use the service URL because it'd cycle back through the env vars; the service_name-prefixed path satisfies the recipient-bound contract since the validator only does string equality against the env-var built from the same local.) Applied to both modules so they don't drift mid-PR.

  2. Warmup test asserts the failure-path log: TestWarmup_LoggedAndNonFatal_OnDeadEndpoint now captures the global log writer and asserts JWKS warmup appears in the output, so a regression that short-circuits before the fetch fails the test instead of silently passing.

terraform validate clean on both modules; go test ./... clean.

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

@cristim: Acknowledged — both fixes are sound and close the remaining gaps cleanly.

aud → endpoint, not SA (GKE + Cloud Run)

This is the semantically correct fix per RFC 7519 §4.1.3: aud identifies the intended recipient of the token, not the issuer. With the previous wiring, both sub (SA unique_id) and aud (SA email) described the caller, leaving the recipient dimension of the token completely unvalidated — an attacker who obtained a scheduler-signed token for a different CUDly environment could replay it. Pinning aud to the endpoint URL (${var.app_url}/api/scheduled/recommendations on GKE, ${var.service_name}/api/scheduled/<endpoint> on Cloud Run) makes the token environment-and-route-bound. The Cloud Run substitution with service_name-prefixed path is a reasonable workaround for the circular URL dependency, and since both the scheduler wiring and the validator env var read from the same local, string equality holds correctly.

Warmup log assertion

Capturing the global log writer and asserting JWKS warmup appears in output is the right defensive move. Previously a regression that panicked or short-circuited before the HTTP fetch would pass the test silently; now it fails with a clear signal pointing at the warmup path rather than a downstream validator error.

Triggering a fresh incremental review now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
terraform/modules/compute/gcp/gke/main.tf (2)

22-41: ⚡ Quick win

Clean up the stale audience explanation in this comment block.

The first paragraph still describes the old SA-email audience model, while the later paragraph and Line 42 correctly describe the new endpoint-bound audience. In this auth-critical section, keeping both stories in the same comment is misleading.

Suggested cleanup
-  # Scheduled-task OIDC validator wiring — see
-  # internal/server/scheduledauth. Audience pinned to the scheduler SA
-  # email (matches the explicit `audience` on the oidc_token block);
-  # subject pinned to the SA's unique numeric ID (the value Google puts
+  # Scheduled-task OIDC validator wiring — see
+  # internal/server/scheduledauth. Audience pinned to the receiving
+  # endpoint URL; subject pinned to the scheduler SA's unique numeric ID
+  # (the value Google puts
   # in the JWT `sub` claim for SA-signed ID tokens).
@@
-  # OAuth/OIDC convention. (CodeRabbit nitpick on PR `#161`.)
+  # OAuth/OIDC convention.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@terraform/modules/compute/gcp/gke/main.tf` around lines 22 - 41, The comment
block describing scheduled-task OIDC validator contains stale text about an
SA-email-based audience; update it to only describe the endpoint-bound audience
model and remove the old SA-email audience paragraph so the explanation is
consistent: keep the notes about auth mode being derived from the scheduler SA
presence (referencing var.enable_scheduled_tasks and
var.scheduled_task_auth_mode_override), retain the explanation that `sub` pins
the scheduler SA unique_id, and explicitly state that `aud` is the receiving
endpoint URL (not the SA identity) per the oidc_token usage and
kubernetes_ingress_v1.app behavior; delete the misleading lines describing the
old SA-email audience model and ensure the remaining text clearly ties audience
-> endpoint and subject -> SA unique_id.

718-735: ⚡ Quick win

Reuse the shared local for uri as well.

The comment says the audience and target match “by construction”, but Line 718 still duplicates the endpoint string separately. If one side changes without the other, Scheduler will mint tokens for the wrong aud and this job will start failing auth.

Suggested change
   http_target {
     http_method = "POST"
-    uri         = "${var.app_url}/api/scheduled/recommendations"
+    uri         = local.scheduled_task_oidc_audience
@@
     oidc_token {
       service_account_email = google_service_account.scheduler[0].email
       audience              = local.scheduled_task_oidc_audience
     }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@terraform/modules/compute/gcp/gke/main.tf` around lines 718 - 735, Replace
the hard-coded uri assignment so the Scheduler's target URL is drawn from the
shared local instead of duplicating the string: change the line setting uri =
"${var.app_url}/api/scheduled/recommendations" to use
local.scheduled_task_oidc_audience (or the existing local that holds the
canonical scheduled endpoint), ensuring the uri and the oidc_token.audience
(local.scheduled_task_oidc_audience) remain identical; update the resource block
containing uri and the oidc_token { service_account_email =
google_service_account.scheduler[0].email ... } accordingly so the audience and
target stay in sync.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@terraform/modules/compute/gcp/gke/main.tf`:
- Around line 22-41: The comment block describing scheduled-task OIDC validator
contains stale text about an SA-email-based audience; update it to only describe
the endpoint-bound audience model and remove the old SA-email audience paragraph
so the explanation is consistent: keep the notes about auth mode being derived
from the scheduler SA presence (referencing var.enable_scheduled_tasks and
var.scheduled_task_auth_mode_override), retain the explanation that `sub` pins
the scheduler SA unique_id, and explicitly state that `aud` is the receiving
endpoint URL (not the SA identity) per the oidc_token usage and
kubernetes_ingress_v1.app behavior; delete the misleading lines describing the
old SA-email audience model and ensure the remaining text clearly ties audience
-> endpoint and subject -> SA unique_id.
- Around line 718-735: Replace the hard-coded uri assignment so the Scheduler's
target URL is drawn from the shared local instead of duplicating the string:
change the line setting uri = "${var.app_url}/api/scheduled/recommendations" to
use local.scheduled_task_oidc_audience (or the existing local that holds the
canonical scheduled endpoint), ensuring the uri and the oidc_token.audience
(local.scheduled_task_oidc_audience) remain identical; update the resource block
containing uri and the oidc_token { service_account_email =
google_service_account.scheduler[0].email ... } accordingly so the audience and
target stay in sync.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6d152ee8-6227-48ca-bd84-a52fa11e358b

📥 Commits

Reviewing files that changed from the base of the PR and between e993d67 and 171eea5.

📒 Files selected for processing (3)
  • internal/server/scheduledauth/validator_test.go
  • terraform/modules/compute/gcp/cloud-run/main.tf
  • terraform/modules/compute/gcp/gke/main.tf
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/server/scheduledauth/validator_test.go
  • terraform/modules/compute/gcp/cloud-run/main.tf

…int local (CR pass)

Two CR nitpicks on `gke/main.tf`:

1. **Line 22-41 stale comment**: the comment block introducing the
   scheduled_task_oidc_* locals still described the OLD SA-email
   audience model in its first paragraph and then contradicted itself
   in its second. Rewritten as a single coherent explanation: `aud`
   names the recipient (endpoint URL), `sub` names the caller (SA
   unique_id, not email — the form Google actually puts in the JWT
   `sub` claim).

2. **Lines 718-735 string duplication**: the Cloud Scheduler
   `http_target.uri` and `oidc_token.audience` were both typed as
   `${var.app_url}/api/scheduled/recommendations`, i.e. the same
   string twice. CR's concern: if one side changed without the other,
   Cloud Scheduler would mint tokens with `aud` for the old endpoint
   while POSTing to the new one — silent 401. Both now read
   `local.scheduled_task_oidc_audience`, so a future endpoint rename
   becomes a syntax-level mismatch rather than a runtime auth failure.

Note: the same collapse does NOT apply to `cloud-run/main.tf` because
on that side the `uri` (real Cloud Run service URL,
`google_cloud_run_v2_service.main.uri/...`) and the `audience`
(service_name-prefixed string) are intentionally different values
— the audience uses `var.service_name` to break the Terraform
dependency cycle that the URL would create. Different values by
design, no drift risk.

Verified: `terraform validate` clean on the gke module;
`go test ./...` clean across the root suite.
@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 29, 2026

Pass: addressed both gke nitpicks.

  1. Stale comment block (gke/main.tf:22-41): the introductory comment had its old SA-email-audience paragraph still living next to the new endpoint-bound paragraph. Rewritten as a single coherent explanation — aud names the recipient (endpoint URL), sub names the caller (SA unique_id, not email).

  2. String duplication (gke/main.tf:718-735): http_target.uri and oidc_token.audience both read local.scheduled_task_oidc_audience now, so a future endpoint rename is a syntax mismatch rather than a silent 401.

The same collapse intentionally does NOT apply to cloud-run/main.tf — there uri (real Cloud Run service URL) and audience (service_name-prefixed string used to break the TF cycle) are different values by design, no drift risk.

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

@cristim: Acknowledged — both GKE nitpicks are cleanly resolved:

  1. Comment rewrite (gke/main.tf:22-41) — Collapsing the old SA-email-audience paragraph into a single coherent explanation removes the misleading historical context and makes the aud = recipient / sub = caller contract unambiguous for any future reader of the Terraform.

  2. local.scheduled_task_oidc_audience collapse (gke/main.tf:718-735) — Having both http_target.uri and oidc_token.audience read from the same local means a future endpoint rename is a guaranteed compile-time catch (Terraform plan surfaces the mismatch in both places at once) rather than a silent validator rejection that would require tracing through two separate string literals.

The cloud-run carve-out is correct — uri (actual Cloud Run service URL) and audience (service_name-prefixed string that breaks the Terraform reference cycle) are semantically distinct values and must not be collapsed. The drift risk is asymmetric: on cloud-run a divergence would surface as an OIDC validation rejection; on GKE the single local eliminates that class of error entirely.

Triggering a fresh incremental review now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

…arsing (CR pass)

Addresses CodeRabbit's open Major on `dev.tfvars.example:90`: the
example was tagged "# Cron: Daily at 2 AM" which read as "full cron
syntax supported". A user copying it and changing to e.g.
`"*/15 * * * *"` would silently get a daily-at-00:00 schedule
instead, because the consuming code in
`modules/compute/azure/container-apps/scheduled-tasks.tf:23` only
splits on spaces and reads index [1] (the hour field).

CR's literal suggestion was to switch the example to a frequency/
interval form. Doing that would diverge the Azure tfvars files from
the AWS/GCP ones (which all use the same cron string) and would
require a parser change. Both are bigger than the actual finding.
The minimal fix is to be honest about the limitation in the docs.

Three sites updated:

  1. `terraform/environments/azure/dev.tfvars.example:78-90` —
     replaced the `# Cron: Daily at 2 AM` one-liner with a comment
     block explicitly stating that only the hour field is honored,
     pointing at the parser, and warning that sub-daily / DoW /
     DoM patterns are silently ignored.

  2. `terraform/environments/azure/variables.tf:380-390` — variable
     description rewritten to the same effect, so `terraform plan`
     output and module docs reflect reality.

  3. `terraform/modules/compute/azure/container-apps/variables.tf:225-235`
     — module-level variable description updated symmetrically so
     module consumers get the same warning.

No behavioural change. Verified `terraform validate` clean on
`terraform/environments/azure` and
`terraform/modules/compute/azure/container-apps`.

If finer-grained scheduling becomes a real requirement, the
follow-up is to extend the parser and switch the variables to
something like `recommendation_schedule_frequency` + `_interval` /
`_start_time`, then drop the cron-shaped string. That's intentionally
out of scope here.
@cristim
Copy link
Copy Markdown
Member Author

cristim commented Apr 29, 2026

Addressed the open Major on dev.tfvars.example:90 (the only un-resolved finding from earlier reviews).

The variable's parser at scheduled-tasks.tf:23 only reads the hour field (split(" ", ...)\[1\]), so the example tagged # Cron: Daily at 2 AM was misleading: a user copying */15 * * * * would silently get daily-at-midnight. CR's literal suggestion was to switch to a frequency/interval form, but that diverges Azure tfvars from the AWS/GCP shape (all cron-string today) and needs a parser change — bigger than the finding warrants. Documented the hour-only limitation in three places (dev.tfvars.example, the env-level variable description, the module-level variable description) so the misleading framing is gone without changing the value or behavior. terraform validate clean on both Azure modules.

That should clear the actionable backlog from the earlier reviews. The CR-as-comments thread on the previous push acknowledged both gke nitpicks were resolved cleanly.

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 29, 2026

@cristim: The three-place documentation fix is a pragmatic and well-scoped resolution. Rather than a disruptive parser change, surfacing the hour-only limitation at the example, env-level variable, and module-level variable description removes the misleading framing precisely where a user would encounter it.

Worth noting: a retrieved learning from a prior review on this repo confirms that azurerm_logic_app_trigger_recurrence only accepts discrete frequency/interval parameters and does not accept NCRONTAB format — so the literal CR suggestion (switch to frequency/interval) would actually require a non-trivial cron→frequency/interval translation layer in the module, which validates your decision to document rather than restructure.

Triggering a fresh incremental review now.


🧠 Learnings used
Learnt from: cristim
Repo: LeanerCloud/CUDly PR: 105
File: terraform/modules/compute/azure/cleanup-function/variables.tf:0-0
Timestamp: 2026-04-28T00:07:07.398Z
Learning: In the CUDly project, `azurerm_logic_app_trigger_recurrence` (used in `terraform/modules/compute/azure/cleanup-function/main.tf`) does NOT accept NCRONTAB format. It only accepts discrete `frequency` (e.g. "Day") and `interval` (e.g. 1) parameters. Supporting a NCRONTAB-style `var.schedule` would require a NCRONTAB→frequency/interval translation layer, which is non-trivial feature work.
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cristim cristim marked this pull request as ready for review April 30, 2026 05:11
@cristim cristim merged commit 903ff85 into feat/multicloud-web-frontend Apr 30, 2026
4 checks passed
cristim added a commit that referenced this pull request Apr 30, 2026
PR #161 made SCHEDULED_TASK_AUTH_MODE required at startup but the env
var was only wired on GCP and Azure. AWS Lambda was missed and the
post-merge deploy crashed with:

    scheduledauth: invalid configuration: SCHEDULED_TASK_AUTH_MODE is
    unset (set to oidc, bearer, or disabled)

(See https://github.com/LeanerCloud/CUDly/actions/runs/25148590105 .)

The "obvious" fix — wire SCHEDULED_TASK_AUTH_MODE=disabled to mirror
the pre-#161 reality where the bearer check was a no-op when the
secret was empty — is wrong. ModeDisabled lets every /api/scheduled/*
request through, and the Lambda Function URL has authorization_type =
NONE (the same comment in this module's main.tf explicitly notes the
URL is public). That would ship a production auth bypass.

Instead, run AWS in bearer mode with the secret resolved from Secrets
Manager at runtime — same model Azure already uses.

Changes:

* terraform/modules/secrets/aws — provision a random_password +
  aws_secretsmanager_secret + version for the scheduled-task bearer
  (gated by var.create_scheduled_task_secret, default true). Include
  it in the secret_read policy and expose ARN + name outputs.

* terraform/modules/compute/aws/lambda — set
  SCHEDULED_TASK_AUTH_MODE=bearer +
  SCHEDULED_TASK_SECRET_NAME=var.scheduled_task_secret_name on the
  Lambda env block. Add the secret ARN to the secrets_access IAM
  policy. Add the corresponding scheduled_task_secret_arn /
  scheduled_task_secret_name variables.

* terraform/modules/compute/aws/fargate — same wiring on the task
  definition env block + IAM policy + variables. Fargate isn't the
  default platform but should not regress when selected.

* terraform/environments/aws/compute.tf — pass
  module.secrets.scheduled_task_secret_arn / _name into both
  compute_lambda and compute_fargate.

Runtime path (already in place from #161): app.go reads
SCHEDULED_TASK_SECRET_NAME, resolveScheduledTaskSecret fetches the
plaintext from AWS Secrets Manager via the SecretResolver, and
buildScheduledAuth overrides saCfg.Bearer with that value before
calling scheduledauth.New. The plaintext never lives in Lambda env or
Terraform state.

Validation: terraform validate clean for all four touched modules.
Once deployed, /api/scheduled/* on the public Lambda URL will reject
every request that doesn't carry the resolved bearer secret. AWS
schedulers (EventBridge -> direct Lambda invocation) bypass the HTTP
middleware entirely, so they are unaffected.
cristim added a commit that referenced this pull request Apr 30, 2026
)

PR #161 made SCHEDULED_TASK_AUTH_MODE required at startup but the env
var was only wired on GCP and Azure. AWS Lambda was missed and the
post-merge deploy crashed with:

    scheduledauth: invalid configuration: SCHEDULED_TASK_AUTH_MODE is
    unset (set to oidc, bearer, or disabled)

(See https://github.com/LeanerCloud/CUDly/actions/runs/25148590105 .)

The "obvious" fix — wire SCHEDULED_TASK_AUTH_MODE=disabled to mirror
the pre-#161 reality where the bearer check was a no-op when the
secret was empty — is wrong. ModeDisabled lets every /api/scheduled/*
request through, and the Lambda Function URL has authorization_type =
NONE (the same comment in this module's main.tf explicitly notes the
URL is public). That would ship a production auth bypass.

Instead, run AWS in bearer mode with the secret resolved from Secrets
Manager at runtime — same model Azure already uses.

Changes:

* terraform/modules/secrets/aws — provision a random_password +
  aws_secretsmanager_secret + version for the scheduled-task bearer
  (gated by var.create_scheduled_task_secret, default true). Include
  it in the secret_read policy and expose ARN + name outputs.

* terraform/modules/compute/aws/lambda — set
  SCHEDULED_TASK_AUTH_MODE=bearer +
  SCHEDULED_TASK_SECRET_NAME=var.scheduled_task_secret_name on the
  Lambda env block. Add the secret ARN to the secrets_access IAM
  policy. Add the corresponding scheduled_task_secret_arn /
  scheduled_task_secret_name variables.

* terraform/modules/compute/aws/fargate — same wiring on the task
  definition env block + IAM policy + variables. Fargate isn't the
  default platform but should not regress when selected.

* terraform/environments/aws/compute.tf — pass
  module.secrets.scheduled_task_secret_arn / _name into both
  compute_lambda and compute_fargate.

Runtime path (already in place from #161): app.go reads
SCHEDULED_TASK_SECRET_NAME, resolveScheduledTaskSecret fetches the
plaintext from AWS Secrets Manager via the SecretResolver, and
buildScheduledAuth overrides saCfg.Bearer with that value before
calling scheduledauth.New. The plaintext never lives in Lambda env or
Terraform state.

Validation: terraform validate clean for all four touched modules.
Once deployed, /api/scheduled/* on the public Lambda URL will reject
every request that doesn't carry the resolved bearer secret. AWS
schedulers (EventBridge -> direct Lambda invocation) bypass the HTTP
middleware entirely, so they are unaffected.
cristim added a commit that referenced this pull request Apr 30, 2026
…205) (#206)

* fix(db): idempotently restore retry_execution_id column dropped by migration drift (closes #204, closes #205)

Production /api/purchases/* surfaced 500 errors after PR #161 deployed,
with the underlying CloudWatch error:

    ERROR: column "retry_execution_id" does not exist (SQLSTATE 42703)

User-facing symptoms (dashboard "Upcoming Scheduled Purchases" panel):

  * "Failed to load purchase details: Internal server error"  (#205)
  * "Failed to cancel purchase"                                (#204)

Both buttons hit `Handler.GetExecutionByID` whose SELECT references the
retry_execution_id and retry_attempt_n columns introduced by PR #168.

Root cause: migration version drift.

  * PR #189 added 000042_recommendations_add_engine_to_key earlier on
    the same day. Deployed Lambdas applied it as version 42.
  * PR #168 then added 000042_purchase_executions_retry_linkage,
    colliding on version 42.
  * PR #195 resolved the on-disk duplicate by renaming RECOMMENDATIONS
    to 000043 — the wrong direction. Deployed DBs already had
    schema_migrations.version=42 marked as the engine content, so on
    the next migrate.Up() they:
      - skipped on-disk 000042 (now retry_linkage) entirely (DB
        version >= 42 already)
      - applied on-disk 000043 (now engine again) as a no-op via the
        ADD COLUMN IF NOT EXISTS guards already in that file
    Net effect: the retry_execution_id / retry_attempt_n columns
    were never added on prod, but the Go code expects them.

Forward-only corrective. The new 000044 replays 000042's schema
additions wrapped in IF NOT EXISTS / CREATE INDEX IF NOT EXISTS so it's
a no-op on any DB that already has the columns (fresh deploys,
manually-fixed envs) and additive on the broken ones. We don't try to
repair schema_migrations history — golang-migrate only cares about the
high-water version, and we can't tell from outside whether a given
environment skipped 42 or applied 43 twice.

The migration up + down sql files document the incident and the
edge cases inline so future readers don't have to re-derive the
history from the git log.

Closes #204
Closes #205

* fix(db): repair partially-fixed DBs in 000044, not just no-op (CR pass 1)

CR pass 1 on PR #206 flagged that the original 000044 only added columns
when missing — a manually-fixed DB with the column but no FK, or with
retry_attempt_n nullable, would still be broken after this migration.

Rewrite as five idempotent steps that each check pg_catalog /
information_schema for the precise piece they own, so every reachable
partial state is repaired:

  1. Add retry_execution_id column if missing.
  2. Add the FK on retry_execution_id -> execution_id if no FK on that
     column exists. Match by column relationship (pg_constraint join
     pg_attribute) NOT by constraint name, so a fresh-deploy DB whose
     000042 ran cleanly with the auto-generated
     `purchase_executions_retry_execution_id_fkey` is recognised as
     already-FK'd and we don't add a second.
  3. Add retry_attempt_n column if missing (as nullable, so step 4 can
     backfill before flipping to NOT NULL).
  4. Backfill any NULLs and SET DEFAULT 0 / SET NOT NULL
     unconditionally — Postgres treats those as no-ops when the
     constraint is already as requested.
  5. Create the partial index if missing.

Down stays simple: DROP COLUMN ... CASCADE removes the FK regardless of
its name.

Comments inline document each reachable partial state and why each step
is safe to re-run. Migration is a no-op on healthy DBs, fully repairing
on broken ones.
@cristim cristim deleted the fix/issue-159-gcp-secret-runtime branch April 30, 2026 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

effort/m Days impact/few Limited audience priority/p1 Next up; this sprint severity/high Significant harm status/blocked Blocked on something external triaged Item has been triaged type/security Security finding urgency/this-sprint Within the current sprint

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant