Skip to content

Fix staging merge dropped on partial parallel collector failure#98

Draft
cursor[bot] wants to merge 1 commit into
masterfrom
cursor/critical-correctness-bugs-65a2
Draft

Fix staging merge dropped on partial parallel collector failure#98
cursor[bot] wants to merge 1 commit into
masterfrom
cursor/critical-correctness-bugs-65a2

Conversation

@cursor
Copy link
Copy Markdown

@cursor cursor Bot commented May 24, 2026

Summary

JSONL staging ingest (introduced in e31dac5 / E1) only called merge_staged_run when every parallel collector subprocess succeeded (success == len(targets)). If any single collector failed (e.g. x_signal_collector missing Grok batch, flaky RSS), all staged JSONL from successful collectors was discarded and never written to raw_signals — a regression vs pre-staging behavior where each collector wrote SQLite independently.

Root cause

parallel_collect.py gated merge on full success to mirror an all-or-nothing mental model, but staging decouples fetch from DB write; partial failure must still merge available JSONL.

Fix

  • Merge whenever CI_INGEST_STAGING orchestration is active and run_id is set (merge is a no-op if no files).
  • Pipeline still exits non-zero when collectors fail; only the data-loss path is fixed.
  • Added test_parallel_collect_staging.py; fixed test_yc_collector_skips_writer_lock_when_staging to use operational_db.

Test plan

  • uv run pytest -q → 246 passed, 1 skipped

Linear

  • Issue: COM-___ —
Open in Web View Automation 

Summary by Sourcery

Ensure JSONL staging merges partial collector results instead of dropping data when some parallel collectors fail.

Bug Fixes:

  • Allow staging ingest to merge staged runs whenever a run ID is present, even if not all parallel collectors succeed, preventing loss of successful collectors' data.

Tests:

  • Add tests covering staging merge behavior on partial collector failure and adjust existing staging test to use the operational DB fixture.

Summary by cubic

Fixes data loss in JSONL staging by merging staged files even when some parallel collectors fail. The pipeline still exits non-zero on failures; only the merge behavior changes.

  • Bug Fixes
    • Merge whenever staging is active and run_id is set; no longer require all collectors to succeed. No-op if no files.
    • Added test_parallel_collect_staging.py to ensure merge on partial failure; adjusted existing test to use operational_db.

Written for commit 2b21566. Summary will update on new commits. Review in cubic

JSONL staging only merged when every collector succeeded, so one
flaky subprocess dropped all staged raw_signals for that run. Merge
whenever a staging run_id exists; partial collector failures still
exit non-zero.

Co-authored-by: Olen Latham <olen@latham.cloud>
@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented May 24, 2026

Reviewer's Guide

Updates parallel collector staging ingest so staged JSONL is merged whenever staging is enabled and a run_id is present, even if some collectors fail, and adds/regresses tests to cover partial failure behavior and proper DB fixture usage.

File-Level Changes

Change Details Files
Relax merge gating so staging runs are merged even on partial collector failure while preserving non-zero exit on failures.
  • Removed requirement that the number of successful collectors equals the number of targets before merging staged output.
  • Kept merge conditioned on staging mode being active and a CI_STAGING_RUN_ID being set so merge is still a no-op when no run is active.
  • Left failure handling intact so the pipeline still reports non-zero status when any collector fails.
apps/worker/automation/parallel_collect.py
Adjust existing staging ingest test to use the operational_db fixture so it runs in the correct DB context.
  • Updated test_yc_collector_skips_writer_lock_when_staging signature to accept the operational_db fixture.
  • Ensured the test continues to assert that JSONL staging does not hold the POSIX writer_lock.
tests/test_staging_ingest.py
Add regression test to ensure staging merge runs when some parallel collectors fail and staging cleanup is still invoked.
  • Introduced test_parallel_collect_staging.py with a test that configures staging env vars and simulates mixed collector success/failure.
  • Patched run_script to return controlled success/failure tuples per script, and patched merge_staged_run and clear_staging_run to assert they are called with the staging run_id.
  • Verified that run_parallel_collectors returns the correct number of successes and total collectors under partial failure.
tests/test_parallel_collect_staging.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

cursor Bot referenced this pull request May 24, 2026
Sequential steps (signal_url_fanout, etc.) write SQLite directly;
CI_INGEST_STAGING=1 without CI_STAGING_RUN_ID caused RuntimeError.

Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant