[codex] Add Hyperping cron healthchecks#2324
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds ChangesCron Hyperping healthchecks
Sequence Diagram(s)sequenceDiagram
participant Scheduler as process_all_cron_tasks
participant DB as postgres (pgmq.metrics / cron_tasks)
participant QueueConsumer as /functions/v1/triggers/queue_consumer/sync
participant Hyperping as hc.hyperping.io
Scheduler->>DB: SELECT enabled cron_tasks (includes healthcheck_url)
Scheduler->>QueueConsumer: POST /functions/v1/triggers/queue_consumer/sync (queue_name, batch_size, healthcheck_url)
QueueConsumer->>DB: SELECT queue_length FROM pgmq.metrics
QueueConsumer->>Hyperping: GET healthcheck_url (when process success && queue_length == 0)
Hyperping-->>QueueConsumer: HTTP 200/timeout/error
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Merging this PR will not alter performance
Comparing Footnotes
|
333c78a to
0480af1
Compare
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@supabase/migrations/20260521210531_cron_hyperping_healthchecks.sql`:
- Around line 206-211: The function public.process_all_cron_tasks() lacks an
explicit ACL revocation; add a REVOKE ALL ON FUNCTION
public.process_all_cron_tasks() FROM PUBLIC; (matching the pattern used for
report_cron_success_healthcheck) immediately after setting the OWNER to postgres
so the function cannot be called by PUBLIC and to follow the repo's permission
guidelines.
- Around line 166-168: The dynamic EXECUTE in the CASE branch for task.task_type
= 'function' is vulnerable because it concatenates task.target directly; change
this to use pg_catalog.format with identifier placeholders (e.g., format('SELECT
%I.%I()', task.target_schema, task.target_function)) or validate/parse
task.target against an allowed function-call regex and use format('%s', ...)
with %I/%L as appropriate; update the logic that reads cron_tasks.target to
either split into target_schema/target_function or to validate the string before
calling EXECUTE so the EXECUTE no longer concatenates raw task.target.
In `@tests/cron-healthchecks.test.ts`:
- Line 31: Replace plain it(...) with it.concurrent(...) for the unit test
titled "keeps healthcheck URLs scoped to Hyperping" and the other test at the
later position flagged in the review so both tests run in parallel; locate the
test declarations by their names (e.g., the it('keeps healthcheck URLs scoped to
Hyperping', ...) call) and update them to it.concurrent('keeps healthcheck URLs
scoped to Hyperping', async () => { ... }) and likewise change the other it(...)
to it.concurrent(...).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5b92eb14-ea95-4929-8c29-98f9aaac81a0
📒 Files selected for processing (3)
supabase/migrations/20260521210531_cron_hyperping_healthchecks.sqltests/cron-healthchecks.test.tstests/security-definer-execute-hardening.test.ts
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
295dea9 to
0b4064e
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
tests/cron-healthchecks.test.ts (1)
69-71:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAvoid table-wide cron task updates in a parallelized integration test.
Line 70 updates all rows in
public.cron_tasks; while the transaction is open this can hold broad row locks and cause cross-test contention/flakiness. Since the assertion is already scoped bytask_name, remove this global update or scope it to test-owned fixtures only.Suggested minimal change
- await client.query('UPDATE public.cron_tasks SET enabled = false')As per coding guidelines:
tests/**/*.test.ts: “Design all tests for parallel execution across files”.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/cron-healthchecks.test.ts` around lines 69 - 71, The test opens a transaction and runs a table-wide UPDATE via client.query('UPDATE public.cron_tasks SET enabled = false') which can lock many rows and cause flakiness; change that to only update the test-owned rows (e.g., add a WHERE clause filtering by task_name or specific test fixture ids) or remove the global update entirely and instead update the single task referenced in the assertion (use client.query with "UPDATE public.cron_tasks SET enabled = false WHERE task_name = $1", passing the test task_name) so the transaction only touches the targeted row(s).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@supabase/functions/_backend/triggers/queue_consumer.ts`:
- Around line 911-920: The code currently updates
cron_healthcheck_runs.pinged_at (via the client.query returning pingResult and
setting healthcheckUrl/shouldPing) before calling pingCronHealthcheck(...);
change this so the UPDATE that sets pinged_at only runs after
pingCronHealthcheck completes successfully (i.e., after a successful Hyperping
GET/2xx response), and do not persist pinged_at on timeouts or non-2xx
responses; locate the client.query that returns healthcheck_url (using
pingResult and healthcheckRunId) and instead perform a SELECT to fetch
healthcheck_url (or keep the SELECT but defer the UPDATE) and call
pingCronHealthcheck first, then run the UPDATE statement to set pinged_at =
now() only when pingCronHealthcheck returns success; apply the same change to
the similar block around pingCronHealthcheck usage at the other occurrence
referenced (the block at the later 963-972 region).
In `@supabase/migrations/20260521210531_cron_hyperping_healthchecks.sql`:
- Around line 247-268: The insert into cron_healthcheck_runs (producing
healthcheck_run_id) must reflect only successfully queued /sync dispatches:
change the flow so you iterate FOREACH queue_name IN ARRAY queue_names, attempt
PERFORM public.process_function_queue(queue_name, batch_size, NULL) collecting
successful queue_names (or a success counter), and only after the loop INSERT
INTO public.cron_healthcheck_runs with healthcheck_url, queue_names (or
expected_workers set to the success count) and retrieve healthcheck_run_id;
alternatively, if you keep the current immediate INSERT, update the stored
expected_workers (or decrement it) inside the EXCEPTION WHEN OTHERS branch
referencing healthcheck_run_id and expected_workers so the persisted run matches
actual dispatch successes and will ping correctly.
- Around line 91-104: The current loop in the cron healthcheck helper (using
p_run_id, run_record.queue_names and querying pgmq.q_* to compute
remaining_count/total_count) incorrectly counts all live rows in each queue;
change it to only count messages belonging to this run by using a per-run marker
or launched-jobs table: ensure dispatched messages are tagged with the run id
(or a run_job entry is created for each message), then replace the pgmq.q_*
COUNT query with a filtered COUNT that only includes rows for this run (e.g.,
WHERE run_id = p_run_id or JOIN to the run_jobs table), and update
remaining_count/total_count logic to sum only those scoped rows so completion is
driven by per-run items rather than live queue depth.
In `@tests/queue-consumer-message-shape.unit.test.ts`:
- Around line 289-353: Add a regression test that simulates a failed Hyperping
(non-2xx or rejected fetch) for
__queueConsumerTestUtils__.completeCronHealthcheckRun: create a run via
createHealthcheckRun and db via createHealthcheckDb(run, 0), set fetchImpl to
either a rejected promise or a Response with status 500, call
completeCronHealthcheckRun with the success flag true, then assert that reported
is false, fetchImpl was called once, run.completed_workers is 1,
run.failed_workers is 0, and run.pinged_at remains null so the failure-to-ping
branch is covered.
---
Duplicate comments:
In `@tests/cron-healthchecks.test.ts`:
- Around line 69-71: The test opens a transaction and runs a table-wide UPDATE
via client.query('UPDATE public.cron_tasks SET enabled = false') which can lock
many rows and cause flakiness; change that to only update the test-owned rows
(e.g., add a WHERE clause filtering by task_name or specific test fixture ids)
or remove the global update entirely and instead update the single task
referenced in the assertion (use client.query with "UPDATE public.cron_tasks SET
enabled = false WHERE task_name = $1", passing the test task_name) so the
transaction only touches the targeted row(s).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e8717c12-9468-4aeb-9533-b43d3b5aa659
📒 Files selected for processing (5)
supabase/functions/_backend/triggers/queue_consumer.tssupabase/migrations/20260521210531_cron_hyperping_healthchecks.sqltests/cron-healthchecks.test.tstests/queue-consumer-message-shape.unit.test.tstests/security-definer-execute-hardening.test.ts
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (3)
tests/cron-healthchecks.test.ts (1)
65-65: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winSwitch this test to
it.concurrent(...)to satisfy test parallelization rules.
it(...)here violates the repository rule for*.test.tsfiles.Suggested diff
- it('creates a worker-completion healthcheck run without reporting from SQL', async () => { + it.concurrent('creates a worker-completion healthcheck run without reporting from SQL', async () => {#!/bin/bash # Verify all test declarations in this file are concurrent rg -n "^\s*it(\.concurrent)?\(" tests/cron-healthchecks.test.tsAs per coding guidelines:
tests/**/*.test.ts: “use it.concurrent() instead of it() to run tests in parallel within the same file for faster CI/CD”.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/cron-healthchecks.test.ts` at line 65, Replace the synchronous test declaration it('creates a worker-completion healthcheck run without reporting from SQL', async () => { ... }) with a concurrent declaration it.concurrent('creates a worker-completion healthcheck run without reporting from SQL', async () => { ... }) so the test runs in parallel; update the invocation for the test case that uses the named string to call it.concurrent(...) instead of it(...), leaving the async body (and any hooks inside) unchanged.supabase/functions/_backend/triggers/queue_consumer.ts (1)
911-920:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't persist
pinged_atbefore the Hyperping GET succeeds.These lines mark the run as pinged inside the transaction, but the actual
pingCronHealthcheck(...)call happens only after commit. If Hyperping times out or returns non-2xx, the row still claims success and later retries are suppressed permanently. Fetch the URL here, perform the GET first, and only thenUPDATE ... SET pinged_at = now()on success.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@supabase/functions/_backend/triggers/queue_consumer.ts` around lines 911 - 920, The current code updates pinged_at inside the DB transaction before actually performing the external GET (see pingResult, healthcheckRunId and subsequent pingCronHealthcheck(...)), which silences retries even if the GET fails; change the flow to SELECT or query the healthcheck_url only (no SET pinged_at) using the existing query or a SELECT for the given healthcheckRunId, call pingCronHealthcheck(...) and verify a successful 2xx response, and only after a confirmed success run an UPDATE statement that sets pinged_at = now() (use the same conditional WHERE id = $1 AND pinged_at IS NULL or RETURNING to ensure you don’t overwrite racing updates). Ensure failures do not update pinged_at and that you still handle concurrent attempts safely by checking pinged_at in the UPDATE RETURNING result.tests/queue-consumer-message-shape.unit.test.ts (1)
289-353: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winAdd the failed-Hyperping regression case.
These tests still only cover the success and no-op branches. Add a rejected or non-2xx
fetchImplcase that assertsreported === falseandrun.pinged_atstaysnull; that's the branch currently broken incompleteCronHealthcheckRun.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/queue-consumer-message-shape.unit.test.ts` around lines 289 - 353, Add a new test in tests/queue-consumer-message-shape.unit.test.ts that covers the failed-Hyperping regression by calling __queueConsumerTestUtils__.completeCronHealthcheckRun (use createHealthcheckRun and createHealthcheckDb to set up the run/db and testHealthcheckRunId) with a fetchImpl that either rejects or returns a non-2xx Response (e.g., status 500); assert that reported === false, that fetchImpl was called (since the ping was attempted), and that run.completed_workers increments but run.pinged_at remains null (and run.failed_workers unchanged for the success path or incremented appropriately if you test the failure-path variant).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@supabase/functions/_backend/triggers/queue_consumer.ts`:
- Around line 424-425: The cloudlog call inside the messages === null branch
drops structured context; update the branch that currently calls
cloudlog(`[${queueName}] Queue read failed.`) to use structured logging with the
Hono requestId (use c.get('requestId')) and include queueName and the failure
message, e.g. cloudlog({ requestId: c.get('requestId'), queueName, msg: 'Queue
read failed' }), so it matches surrounding logging patterns and preserves
request correlation.
In `@supabase/migrations/20260521210531_cron_hyperping_healthchecks.sql`:
- Around line 159-163: The current cap of 10 is preventing large runs from
finishing: replace the capped computation of calls_needed (currently using
LEAST(..., 10)) so it is computed as ceiling(queue_size / batch_size)::int
without the hard 10 limit, and remove any corresponding cap on expected_workers
so it can reflect the true number of batches; additionally modify the run
completion logic that currently waits for tracked queue depth to reach zero to
instead track dispatched batches (or a dispatched_batches / remaining_calls
counter) and declare the run complete when dispatched_batches >= calls_needed
(or remaining_calls == 0), updating any uses of calls_needed and
expected_workers accordingly.
---
Duplicate comments:
In `@supabase/functions/_backend/triggers/queue_consumer.ts`:
- Around line 911-920: The current code updates pinged_at inside the DB
transaction before actually performing the external GET (see pingResult,
healthcheckRunId and subsequent pingCronHealthcheck(...)), which silences
retries even if the GET fails; change the flow to SELECT or query the
healthcheck_url only (no SET pinged_at) using the existing query or a SELECT for
the given healthcheckRunId, call pingCronHealthcheck(...) and verify a
successful 2xx response, and only after a confirmed success run an UPDATE
statement that sets pinged_at = now() (use the same conditional WHERE id = $1
AND pinged_at IS NULL or RETURNING to ensure you don’t overwrite racing
updates). Ensure failures do not update pinged_at and that you still handle
concurrent attempts safely by checking pinged_at in the UPDATE RETURNING result.
In `@tests/cron-healthchecks.test.ts`:
- Line 65: Replace the synchronous test declaration it('creates a
worker-completion healthcheck run without reporting from SQL', async () => { ...
}) with a concurrent declaration it.concurrent('creates a worker-completion
healthcheck run without reporting from SQL', async () => { ... }) so the test
runs in parallel; update the invocation for the test case that uses the named
string to call it.concurrent(...) instead of it(...), leaving the async body
(and any hooks inside) unchanged.
In `@tests/queue-consumer-message-shape.unit.test.ts`:
- Around line 289-353: Add a new test in
tests/queue-consumer-message-shape.unit.test.ts that covers the failed-Hyperping
regression by calling __queueConsumerTestUtils__.completeCronHealthcheckRun (use
createHealthcheckRun and createHealthcheckDb to set up the run/db and
testHealthcheckRunId) with a fetchImpl that either rejects or returns a non-2xx
Response (e.g., status 500); assert that reported === false, that fetchImpl was
called (since the ping was attempted), and that run.completed_workers increments
but run.pinged_at remains null (and run.failed_workers unchanged for the success
path or incremented appropriately if you test the failure-path variant).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: abc33135-9751-4727-aaef-6251cd8b71de
📒 Files selected for processing (5)
supabase/functions/_backend/triggers/queue_consumer.tssupabase/migrations/20260521210531_cron_hyperping_healthchecks.sqltests/cron-healthchecks.test.tstests/queue-consumer-message-shape.unit.test.tstests/security-definer-execute-hardening.test.ts
0b4064e to
a66f8cb
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/cron-healthchecks.test.ts (1)
31-63:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThis test now encodes unrestricted healthcheck URLs.
The assertion on Line 63 verifies that a non-Hyperping URL is accepted. That removes coverage for URL scoping and can mask security regressions.
Please split this into positive/negative coverage (allowed Hyperping URL succeeds, disallowed domain is rejected).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/cron-healthchecks.test.ts` around lines 31 - 63, Update the single test "stores any healthcheck URL on cron tasks" into two focused cases: one positive test that inserts a cron task using an allowed Hyperping URL (e.g., a hyperping.io URL) and asserts the returned healthcheck_url matches that allowed URL, and one negative test that attempts to insert a cron task with a disallowed external domain (e.g., example.com) and asserts the insert fails or the DB rejects it (expecting an error or no returned row). Keep using the same setup/teardown pattern (client = await pool.connect(), BEGIN/ROLLBACK) and reuse the taskName variable and the same INSERT query/RETURNING behavior, but for the negative case assert the failure condition instead of expecting the row to equal the disallowed URL.
♻️ Duplicate comments (5)
supabase/functions/_backend/triggers/queue_consumer.ts (1)
418-419:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRestore structured context in queue-read failure logs.
Line 419 drops correlation metadata. Keep this log structured with
requestIdso queue read failures remain traceable.Suggested fix
if (messages === null) { - cloudlog(`[${queueName}] Queue read failed.`) + cloudlog({ + requestId: c.get('requestId'), + message: `[${queueName}] Queue read failed.`, + queueName, + }) return {As per coding guidelines: "All endpoints must receive Hono
Context<MiddlewareKeyVariables>object and usec.get('requestId')for structured logging withcloudlog()".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@supabase/functions/_backend/triggers/queue_consumer.ts` around lines 418 - 419, The null-check branch that logs queue read failures (when messages === null) currently calls cloudlog(`[${queueName}] Queue read failed.`) and drops correlation metadata; update that branch to call cloudlog with structured context including the Hono requestId retrieved via c.get('requestId') (e.g., include requestId and queueName in the log payload) so the failure remains traceable—locate the messages === null check and replace the simple string log with a structured cloudlog call that includes c.get('requestId') and queueName.tests/queue-consumer-message-shape.unit.test.ts (1)
217-290: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winAdd a regression test for failed healthcheck ping.
Current additions cover success/no-op only. Add one case where
fetchImplreturns non-2xx (or rejects) and assertreported === falsewith one attempted call.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/queue-consumer-message-shape.unit.test.ts` around lines 217 - 290, Add a regression test in tests/queue-consumer-message-shape.unit.test.ts that exercises __queueConsumerTestUtils__.maybePingCronHealthcheck when the healthcheck request fails: create a DB with zero pending work via createHealthcheckDb(0), supply a fetchImpl mock that either returns a Response with non-2xx status (e.g. 500) or rejects, call maybePingCronHealthcheck with the same success payload used in the passing test, then assert that the function returns reported === false and that fetchImpl was called exactly once (expect(fetchImpl).toHaveBeenCalledTimes(1)); reference maybePingCronHealthcheck and createHealthcheckDb to locate where to add the case.supabase/migrations/20260521210531_cron_hyperping_healthchecks.sql (3)
33-37:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftThe 10-batch cap still truncates large queue runs.
calls_neededis clipped to 10, so queues larger than10 * batch_sizeonly dispatch the first ten workers. That leaves part of the backlog outside this cron run, which breaks end-to-end completion semantics for healthchecked runs.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@supabase/migrations/20260521210531_cron_hyperping_healthchecks.sql` around lines 33 - 37, The code currently clips calls_needed with LEAST(..., 10), causing large queues to only dispatch up to 10 batches; remove that fixed cap so calls_needed equals the computed ceil(queue_size / batch_size) (cast to int) instead of LEAST(...,10). Update the assignment for calls_needed to compute pg_catalog.ceil(queue_size / batch_size::double precision)::int (or equivalent) and remove the trailing , 10 argument so all required batches are scheduled.
94-103:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftBest-effort dispatch can falsely report multi-queue success.
When several queues share one
healthcheck_url, a failure in any later queue is downgraded to a warning after earlier queues were already dispatched with that same URL. Those successful queues can still complete and ping the healthcheck even though part of the cron task never ran.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@supabase/migrations/20260521210531_cron_hyperping_healthchecks.sql` around lines 94 - 103, The loop currently downgrades per-queue failures to warnings so a shared healthcheck_url can be pinged even if some queues failed; change the logic to record failed queues (e.g., declare an array or boolean like failed_queues or had_error) inside the EXCEPTION block for the FOR/EACH loop that calls public.process_function_queue(queue_name, batch_size, healthcheck_url), keep the RAISE WARNING for context, and after the FOREACH finishes check the recorder and RAISE EXCEPTION (including the failed queue names and SQLERRM details) if any failures occurred so the cron job reports overall failure instead of falsely succeeding.
206-208:⚠️ Potential issue | 🟠 Major | ⚡ Quick winStill executing
task.targetas raw SQL.This concatenates a cron-table value directly into a
SECURITY DEFINEREXECUTE, so a malformed or hostiletargetcan run arbitrary SQL aspostgres. Parse and validate the target first, then build the call withpg_catalog.format()instead of raw concatenation.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@supabase/migrations/20260521210531_cron_hyperping_healthchecks.sql` around lines 206 - 208, The CASE branch handling WHEN 'function' is currently concatenating task.target into a SECURITY DEFINER EXECUTE, which allows arbitrary SQL; instead parse and validate task.target (e.g. split into function name and args, enforce allowed identifier pattern and optional safe argument types) and then build the statement with pg_catalog.format() to safely quote identifiers/values before EXECUTE; update the logic in the CASE that references task.target and the EXECUTE to validate the parsed function name (deny if it contains unsafe characters or schema changes) and use pg_catalog.format() (e.g. format with %I for the function identifier and appropriate %L/%s for args) so the executed SQL is properly escaped and cannot run arbitrary commands as postgres.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@supabase/functions/_backend/triggers/queue_consumer.ts`:
- Around line 793-813: The healthcheck URL must be validated and constrained
before performing an outbound fetch to avoid SSRF: add URL validation logic that
enforces scheme === 'https' and checks the hostname against an explicit
allowlist before calling maybePingCronHealthcheck (or inside
maybePingCronHealthcheck if easier); reject or return false for
null/invalid/unauthorized URLs. Update callers that pass healthcheckUrl (the
code paths around maybePingCronHealthcheck invocation) to use a
validatedHealthcheckUrl (or pass the original but have maybePingCronHealthcheck
perform the validation) and only call pingCronHealthcheck(fetchImpl) when
validation passes; ensure fetchImpl remains injectable for tests and keep
referencing maybePingCronHealthcheck, pingCronHealthcheck, and the
healthcheckUrl parameter so reviewers can locate the changes.
In `@supabase/migrations/20260521210531_cron_hyperping_healthchecks.sql`:
- Around line 158-162: The schedule fields (current_hour, current_minute,
current_second, current_dow, current_day) are being extracted from NOW(), which
is session-timezone dependent; instead compute a single UTC timestamp (e.g.,
utc_now := timezone('UTC', NOW()) or NOW() AT TIME ZONE 'UTC') and then EXTRACT
the HOUR/MINUTE/SECOND/DOW/DAY from that utc_now, updating the assignments for
current_hour/current_minute/current_second/current_dow/current_day so run_at_*,
run_on_dow and run_on_day are based on UTC.
- Around line 19-20: Change the guard that currently reads "IF batch_size <= 0
THEN RAISE EXCEPTION 'batch_size must be positive';" to explicitly reject NULLs
by using "IF batch_size IS NULL OR batch_size <= 0 THEN RAISE EXCEPTION
'batch_size must be positive';" and apply this same replacement to the second
function that accepts the batch_size parameter (the other occurrence around the
later guard). This ensures NULL input is trapped early for both functions that
validate batch_size.
---
Outside diff comments:
In `@tests/cron-healthchecks.test.ts`:
- Around line 31-63: Update the single test "stores any healthcheck URL on cron
tasks" into two focused cases: one positive test that inserts a cron task using
an allowed Hyperping URL (e.g., a hyperping.io URL) and asserts the returned
healthcheck_url matches that allowed URL, and one negative test that attempts to
insert a cron task with a disallowed external domain (e.g., example.com) and
asserts the insert fails or the DB rejects it (expecting an error or no returned
row). Keep using the same setup/teardown pattern (client = await pool.connect(),
BEGIN/ROLLBACK) and reuse the taskName variable and the same INSERT
query/RETURNING behavior, but for the negative case assert the failure condition
instead of expecting the row to equal the disallowed URL.
---
Duplicate comments:
In `@supabase/functions/_backend/triggers/queue_consumer.ts`:
- Around line 418-419: The null-check branch that logs queue read failures (when
messages === null) currently calls cloudlog(`[${queueName}] Queue read failed.`)
and drops correlation metadata; update that branch to call cloudlog with
structured context including the Hono requestId retrieved via c.get('requestId')
(e.g., include requestId and queueName in the log payload) so the failure
remains traceable—locate the messages === null check and replace the simple
string log with a structured cloudlog call that includes c.get('requestId') and
queueName.
In `@supabase/migrations/20260521210531_cron_hyperping_healthchecks.sql`:
- Around line 33-37: The code currently clips calls_needed with LEAST(..., 10),
causing large queues to only dispatch up to 10 batches; remove that fixed cap so
calls_needed equals the computed ceil(queue_size / batch_size) (cast to int)
instead of LEAST(...,10). Update the assignment for calls_needed to compute
pg_catalog.ceil(queue_size / batch_size::double precision)::int (or equivalent)
and remove the trailing , 10 argument so all required batches are scheduled.
- Around line 94-103: The loop currently downgrades per-queue failures to
warnings so a shared healthcheck_url can be pinged even if some queues failed;
change the logic to record failed queues (e.g., declare an array or boolean like
failed_queues or had_error) inside the EXCEPTION block for the FOR/EACH loop
that calls public.process_function_queue(queue_name, batch_size,
healthcheck_url), keep the RAISE WARNING for context, and after the FOREACH
finishes check the recorder and RAISE EXCEPTION (including the failed queue
names and SQLERRM details) if any failures occurred so the cron job reports
overall failure instead of falsely succeeding.
- Around line 206-208: The CASE branch handling WHEN 'function' is currently
concatenating task.target into a SECURITY DEFINER EXECUTE, which allows
arbitrary SQL; instead parse and validate task.target (e.g. split into function
name and args, enforce allowed identifier pattern and optional safe argument
types) and then build the statement with pg_catalog.format() to safely quote
identifiers/values before EXECUTE; update the logic in the CASE that references
task.target and the EXECUTE to validate the parsed function name (deny if it
contains unsafe characters or schema changes) and use pg_catalog.format() (e.g.
format with %I for the function identifier and appropriate %L/%s for args) so
the executed SQL is properly escaped and cannot run arbitrary commands as
postgres.
In `@tests/queue-consumer-message-shape.unit.test.ts`:
- Around line 217-290: Add a regression test in
tests/queue-consumer-message-shape.unit.test.ts that exercises
__queueConsumerTestUtils__.maybePingCronHealthcheck when the healthcheck request
fails: create a DB with zero pending work via createHealthcheckDb(0), supply a
fetchImpl mock that either returns a Response with non-2xx status (e.g. 500) or
rejects, call maybePingCronHealthcheck with the same success payload used in the
passing test, then assert that the function returns reported === false and that
fetchImpl was called exactly once (expect(fetchImpl).toHaveBeenCalledTimes(1));
reference maybePingCronHealthcheck and createHealthcheckDb to locate where to
add the case.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6b387245-628a-42c0-9bc1-7e5265a91008
📒 Files selected for processing (5)
supabase/functions/_backend/triggers/queue_consumer.tssupabase/migrations/20260521210531_cron_hyperping_healthchecks.sqltests/cron-healthchecks.test.tstests/queue-consumer-message-shape.unit.test.tstests/security-definer-execute-hardening.test.ts
a66f8cb to
8f668b4
Compare
8f668b4 to
b07069a
Compare
b07069a to
2fa0622
Compare
|



Summary (AI generated)
healthcheck_urlsupport onpublic.cron_tasks.healthcheck_urluse a small separate queue-consumer starter that passes the URL toqueue_consumer; existingprocess_function_queuecalls stay unchanged.queue_consumercalls the URL withGETonly afterprocessQueuesucceeds and the queue is empty.Motivation (AI generated)
Cron healthchecks should be reported by the consumer after the actual worker work finishes, not by SQL when work is only started.
Business Impact (AI generated)
This lets Capgo wire external cron healthchecks to background queue work without false success reports from failed worker batches.
Test Plan (AI generated)
bunx eslint supabase/functions/_backend/triggers/queue_consumer.ts tests/queue-consumer-message-shape.unit.test.ts tests/cron-healthchecks.test.ts tests/security-definer-execute-hardening.test.tsbunx sqlfluff lint --dialect postgres supabase/migrations/20260521210531_cron_hyperping_healthchecks.sqlbun run supabase:db:resetbun run supabase:with-env -- bunx vitest run tests/cron-healthchecks.test.ts tests/security-definer-execute-hardening.test.ts tests/queue-consumer-message-shape.unit.test.tsbunx vitest run tests/queue-consumer-message-shape.unit.test.tsbun lintbun typecheckgit diff --checkGenerated with AI