Skip to content

Conversation

@maparent
Copy link
Collaborator

@maparent maparent commented Nov 23, 2025

https://linear.app/discourse-graphs/issue/ENG-1075/end-sync-task-bug
Problem: the pg_advisory lock in propose_sync_task was not the right approach, as the subsequent updates were not visible to other threads when they acquired the lock, and before the release. A more classic 'for update' solves the problem better.
We also noticed a stampede problem in that waiting clients would all start at the same time.

Summary by CodeRabbit

  • Performance & Reliability
    • Enhanced sync scheduling with optimized timing mechanisms to gracefully handle concurrent operations and prevent system resource conflicts when multiple processes reschedule.
    • Improved database task coordination with better timeout management, backoff recovery, and enhanced locking mechanisms for more reliable sync task execution.

✏️ Tip: You can customize this high-level summary in your review settings.

@linear
Copy link

linear bot commented Nov 23, 2025

@supabase
Copy link

supabase bot commented Nov 23, 2025

Updates to Preview Branch (eng-1075-end-sync-task-bug) ↗︎

Deployments Status Updated
Database Sun, 23 Nov 2025 23:36:41 UTC
Services Sun, 23 Nov 2025 23:36:41 UTC
APIs Sun, 23 Nov 2025 23:36:41 UTC

Tasks are run on every commit but only new migration files are pushed.
Close and reopen this PR if you want to apply changes from existing seed or migration files.

Tasks Status Updated
Configurations Sun, 23 Nov 2025 23:36:41 UTC
Migrations Sun, 23 Nov 2025 23:36:41 UTC
Seeding Sun, 23 Nov 2025 23:36:42 UTC
Edge Functions Sun, 23 Nov 2025 23:36:44 UTC

View logs for this Workflow Run ↗︎.
Learn more about Supabase for Git ↗︎.

@maparent maparent force-pushed the eng-1075-end-sync-task-bug branch from c375d64 to 1611948 Compare November 23, 2025 23:23
@maparent
Copy link
Collaborator Author

@CodeRabbit review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 23, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 23, 2025

📝 Walkthrough

Walkthrough

The changes modify sync task scheduling across application and database layers. The application adds random jitter (100–199 ms) to rescheduling delays to prevent stampedes. The database introduces a new propose_sync_task function that validates intervals, manages sync task state with locking and backoff logic, and replaces advisory locks with row-level locks in the existing schema.

Changes

Cohort / File(s) Summary
Application-level scheduling
apps/roam/src/utils/syncDgNodesToSupabase.ts
Adds random jitter (100–199 ms) to the delay when rescheduling postponed syncs, changing timing behavior to avoid process stampedes.
Database sync task coordination
packages/database/supabase/migrations/20251123231626_use_for_update.sql
Introduces new PL/pgSQL function public.propose_sync_task that coordinates sync task scheduling, including interval validation, conflict handling, state management, exponential backoff on failures, timeout tracking, and task readiness determination.
Database locking and state management
packages/database/supabase/schemas/sync.sql
Replaces advisory locking with row-level SELECT...FOR UPDATE locking, removes pg_advisory_unlock operations, and extends task ownership updates to also set last_task_end to NULL.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant propose_sync_task
    participant sync_info

    Client->>propose_sync_task: propose_sync_task(target, function, worker, timeout, interval)
    
    Note over propose_sync_task: Validate<br/>timeout >= 1 min<br/>task_interval >= 1 min
    
    alt Validation fails
        propose_sync_task-->>Client: NULL
    else Validation passes
        rect rgb(200, 220, 255)
            Note over propose_sync_task, sync_info: Attempt insert or get existing
            propose_sync_task->>sync_info: INSERT...ON CONFLICT (sync_target, sync_function)
        end
        
        alt Row newly inserted
            propose_sync_task->>sync_info: SELECT last_successful_run
            propose_sync_task-->>Client: last_successful_run timestamp
        else Pre-existing row
            rect rgb(255, 220, 200)
                Note over propose_sync_task, sync_info: Lock for exclusive access
                propose_sync_task->>sync_info: SELECT...FOR UPDATE
            end
            
            rect rgb(220, 255, 220)
                Note over propose_sync_task: Check timeout & backoff<br/>Apply exponential backoff<br/>to task_interval
            end
            
            alt Task timed out or overlapping
                propose_sync_task->>sync_info: UPDATE status=timeout<br/>increment failure_count
                propose_sync_task-->>Client: next_planned_time
            else Task ready to execute
                propose_sync_task->>sync_info: UPDATE to active<br/>set worker, timeout, start_time<br/>reset end/start fields
                propose_sync_task-->>Client: last_task_end or now()
            else Not ready yet
                propose_sync_task->>sync_info: UPDATE status/fields as needed
                propose_sync_task-->>Client: next_planned_time
            end
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Database migration file (20251123231626_use_for_update.sql): The new propose_sync_task function is logic-dense with multiple conditional branches, state transitions, interval calculations, exponential backoff application, and return value semantics. Requires careful verification of the logic flow, edge cases (timeouts, conflicts, backoff calculations), and correctness of the result timestamp returned in each path.
  • Schema file (sync.sql): Row-level locking replacement and the addition of last_task_end = NULL in the update requires verification of locking semantics, potential deadlock scenarios, and whether the state changes maintain data consistency.
  • Application file (syncDgNodesToSupabase.ts): Low complexity; random jitter addition is straightforward but benefits from confirming the timing intention aligns with the database-side logic.

Possibly related PRs

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'ENG-1075 End-sync task bug' directly references the core bug being fixed (end-sync task bug) and includes the issue ticket number, clearly summarizing the main change.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18e7685 and 7e26715.

📒 Files selected for processing (3)
  • apps/roam/src/utils/syncDgNodesToSupabase.ts (1 hunks)
  • packages/database/supabase/migrations/20251123231626_use_for_update.sql (1 hunks)
  • packages/database/supabase/schemas/sync.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-06-09T16:57:14.681Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 198
File: packages/database/supabase/migrations/20250605083319_alpha_upload.sql:157-337
Timestamp: 2025-06-09T16:57:14.681Z
Learning: Migration files in packages/database/supabase/migrations/ are historical snapshots that preserve database schema and functions as they existed at the time of creation. These files should not be updated to reflect current schema changes, even if they reference tables or columns that have since been modified or removed. Schema incompatibilities in migration files are expected and acceptable as they represent the valid state at the time of migration.

Applied to files:

  • packages/database/supabase/migrations/20251123231626_use_for_update.sql
  • packages/database/supabase/schemas/sync.sql
📚 Learning: 2025-10-18T18:58:16.100Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 504
File: apps/roam/src/utils/syncDgNodesToSupabase.ts:523-531
Timestamp: 2025-10-18T18:58:16.100Z
Learning: In `apps/roam/src/utils/syncDgNodesToSupabase.ts`, partial successes from `upsertNodesToSupabaseAsContent` and `addMissingEmbeddings` (indicated by numeric return values showing the count of successful operations) should NOT trigger backoff. Only complete failures (false) should trigger the exponential backoff mechanism. This design allows the sync process to continue making progress even when some items fail.

Applied to files:

  • apps/roam/src/utils/syncDgNodesToSupabase.ts
📚 Learning: 2025-05-20T03:04:21.602Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/supabase/migrations/20250512142307_sync_table.sql:20-21
Timestamp: 2025-05-20T03:04:21.602Z
Learning: Database migration files (like the SQL files in packages/database/supabase/migrations/) should not be modified after they're created, even to fix minor issues like unused variables. Issues should be addressed in newer schema files instead to maintain migration history integrity.

Applied to files:

  • packages/database/supabase/schemas/sync.sql
📚 Learning: 2025-05-22T23:50:23.771Z
Learnt from: maparent
Repo: DiscourseGraphs/discourse-graph PR: 165
File: packages/database/supabase/migrations/20250513173724_content_concept_key.sql:37-60
Timestamp: 2025-05-22T23:50:23.771Z
Learning: For the discourse-graph project, database schema management focuses on the final state in the supabase/schemas directory, not on the individual migration files. When reviewing database changes, consider only the schema definitions in this directory, not potential duplications or conflicts across migration files.

Applied to files:

  • packages/database/supabase/schemas/sync.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Supabase Preview
  • GitHub Check: Supabase Preview
🔇 Additional comments (6)
apps/roam/src/utils/syncDgNodesToSupabase.ts (1)

426-428: LGTM! Jitter addition addresses stampede problem.

The random jitter (100–299ms) effectively distributes client wake-up times when multiple instances are postponed to the same timestamp, preventing a stampede scenario. This complements the database-side FOR UPDATE locking introduced in this PR.

packages/database/supabase/migrations/20251123231626_use_for_update.sql (3)

32-35: LGTM! FOR UPDATE replaces advisory locks correctly.

The row-level lock ensures exclusive access and visibility of updates, directly addressing the PR's core objective. This is the correct approach to replace pg_advisory_lock.


46-46: Linear backoff implementation.

The backoff grows linearly with failure count: interval * (1 + failures). This provides basic backoff behavior as noted in the comment. The application layer (line 479 of syncDgNodesToSupabase.ts) separately applies exponential backoff, so the two mechanisms work in tandem.


41-44: Timeout detection logic is dense but correct.

The condition t_status = 'active' AND t_last_task_start >= coalesce(t_last_task_end, t_last_task_start) AND start_time > t_times_out_at correctly identifies a task that started but hasn't completed and has exceeded its timeout. The middle clause ensures the task is still in-flight (start time is at or after the last end time).

packages/database/supabase/schemas/sync.sql (2)

131-134: LGTM! Row-level locking ensures proper synchronization.

The FOR UPDATE clause provides the necessary exclusive lock when claiming a pre-existing task, replacing the advisory lock approach and ensuring update visibility to subsequent callers.


152-159: LGTM! Resetting last_task_end maintains clean task state.

Setting last_task_end = NULL when taking ownership ensures the task state correctly reflects that a new run is starting. This pairs properly with last_task_start = start_time to establish a fresh execution window.

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@maparent maparent requested a review from sid597 November 23, 2025 23:43
Copy link
Collaborator

@sid597 sid597 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes what we discovered, LGTM

@maparent maparent merged commit 64230b1 into main Nov 27, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

No open projects
Status: No status

Development

Successfully merging this pull request may close these issues.

3 participants