Fix staging merge dropped on partial parallel collector failure #98
Draft
cursor[bot] wants to merge 1 commit into
JSONL staging only merged when every collector succeeded, so one flaky subprocess dropped all staged raw_signals for that run. Merge whenever a staging run_id exists; partial collector failures still exit non-zero. Co-authored-by: Olen Latham <olen@latham.cloud>
Reviewer's Guide
Updates parallel collector staging ingest so that staged JSONL is merged whenever staging is enabled and a run_id is present, even if some collectors fail, and adds regression tests covering partial-failure behavior and proper DB fixture usage.
cursor Bot referenced this pull request on May 24, 2026
Sequential steps (signal_url_fanout, etc.) write SQLite directly; CI_INGEST_STAGING=1 without CI_STAGING_RUN_ID caused RuntimeError. Co-authored-by: Cursor <cursoragent@cursor.com>
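The failure mode described in this commit message can be sketched as follows. This is a minimal illustration only: the helper names `staging_run_id` and `sequential_step_env` are hypothetical, and stripping the staging variables from the environment of sequential steps is just one plausible way to avoid the error, not necessarily what the referenced commit does. Only the environment variable names come from the commit message.

```python
from typing import Mapping, Optional

STAGING_FLAG = "CI_INGEST_STAGING"
STAGING_RUN_ID = "CI_STAGING_RUN_ID"


def staging_run_id(env: Mapping[str, str]) -> Optional[str]:
    """Return the staging run id, or None when staging is off.

    Hypothetical helper: raises RuntimeError when the flag is set
    without a run id, mirroring the failure described above.
    """
    if env.get(STAGING_FLAG) != "1":
        return None
    run_id = env.get(STAGING_RUN_ID)
    if not run_id:
        raise RuntimeError(f"{STAGING_FLAG}=1 requires {STAGING_RUN_ID}")
    return run_id


def sequential_step_env(env: Mapping[str, str]) -> dict:
    """Copy env without staging variables (assumed approach).

    Sequential steps write SQLite directly, so they should not see
    a half-configured staging environment.
    """
    return {k: v for k, v in env.items() if k not in (STAGING_FLAG, STAGING_RUN_ID)}
```

With the variables stripped, a sequential step sees staging as disabled and takes the direct SQLite path instead of raising.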
Summary
JSONL staging ingest (introduced in e31dac5 / E1) only called merge_staged_run when every parallel collector subprocess succeeded (success == len(targets)). If any single collector failed (e.g. x_signal_collector missing a Grok batch, or a flaky RSS feed), all staged JSONL from the successful collectors was discarded and never written to raw_signals. This is a regression from the pre-staging behavior, where each collector wrote to SQLite independently.

Root cause
parallel_collect.py gated the merge on full success to mirror an all-or-nothing mental model, but staging decouples fetch from DB write, so a partial failure must still merge the available JSONL.

Fix
- Merge whenever CI_INGEST_STAGING orchestration is active and run_id is set (merge is a no-op if no files).
- Added a regression test in test_parallel_collect_staging.py; fixed test_yc_collector_skips_writer_lock_when_staging to use operational_db.

Test plan
- uv run pytest -q → 246 passed, 1 skipped
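The change in merge gating described in this PR body can be sketched roughly as below. This is not the actual code from parallel_collect.py: `CollectorResult`, `finish_run`, and the single-argument signature assumed for `merge_staged_run` are illustrative assumptions. The sketch only shows the idea that the merge no longer depends on every collector succeeding, while the exit code still does.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class CollectorResult:
    """Outcome of one collector subprocess (hypothetical type)."""
    name: str
    ok: bool


def finish_run(
    results: Sequence[CollectorResult],
    staging_enabled: bool,
    run_id: Optional[str],
    merge_staged_run: Callable[[str], None],  # assumed signature
) -> int:
    """Merge staged JSONL, then return a process exit code."""
    success = sum(1 for r in results if r.ok)

    # Old gate (dropped data on partial failure):
    #   if staging_enabled and run_id and success == len(results): ...
    # New gate: merge whenever staging is active and a run id exists.
    if staging_enabled and run_id:
        merge_staged_run(run_id)

    # Partial failures still surface as a non-zero exit code.
    return 0 if success == len(results) else 1
```

For example, with one failed collector out of two, `finish_run` still calls the merge callable once and returns 1, so data from the successful collector is kept while CI still reports the failure.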
Summary by Sourcery
Ensure JSONL staging merges partial collector results instead of dropping data when some parallel collectors fail.
Bug Fixes:
Tests:
Summary by cubic
Fixes data loss in JSONL staging by merging staged files even when some parallel collectors fail. The pipeline still exits non-zero on failures; only the merge behavior changes.
- Merge staged JSONL whenever run_id is set; no longer require all collectors to succeed. No-op if no files.
- Added a test in test_parallel_collect_staging.py to ensure merge on partial failure; adjusted existing test to use operational_db.
Written for commit 2b21566.