feat(worker): make upload_finisher source-of-truth for commit uploads#756

Open
thomasrockhu-codecov wants to merge 19 commits into main from tomhu/finisher-source-of-truth

Conversation


@thomasrockhu-codecov commented Mar 11, 2026

Summary

  • make upload_finisher rebuild its input from commit processing state and merge with callback payload by upload_id
  • include merged in finisher idempotency terminal states
  • remove processor-side finisher enqueueing now that finisher input is commit-state-driven
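
The merge-by-`upload_id` rule in the first bullet can be sketched as follows; function and field names here are illustrative, not the PR's actual code. Reconstructed commit state forms the base, and callback entries win on duplicate IDs:

```python
def merge_results(reconstructed: list[dict], callback_payload: list[dict]) -> list[dict]:
    """Merge processing results keyed by upload_id.

    Reconstructed commit state is the base; callback entries win on
    duplicate upload_ids, so a stale reconstruction cannot overwrite
    a fresh processor outcome.
    """
    by_id = {r["upload_id"]: r for r in reconstructed}
    for r in callback_payload:
        by_id[r["upload_id"]] = r  # callback wins on duplicates
    return list(by_id.values())
```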

Test plan

  • ruff check --fix apps/worker/tasks/upload_finisher.py apps/worker/tasks/tests/unit/test_upload_finisher_task.py apps/worker/services/processing/processing.py apps/worker/services/tests/test_processing.py
  • ruff format apps/worker/tasks/upload_finisher.py apps/worker/tasks/tests/unit/test_upload_finisher_task.py apps/worker/services/processing/processing.py apps/worker/services/tests/test_processing.py
  • run worker pytest suite in CI

Made with Cursor


Note

High Risk
Reworks the core upload pipeline (state tracking, task scheduling, and merge/finish behavior) and adds new cron/merge task paths, which could impact report correctness, notification timing, and lock/idempotency under concurrency and retries.

Overview
Makes upload merging/finishing commit-state-driven and gate-controlled. ProcessingState is migrated from Redis sets to database Upload.state_id queries/updates, and update_uploads() now marks successful uploads as MERGED (with safer handling for missing error payloads).

Changes task orchestration for coverage uploads. Coverage processing no longer uses a Celery chord to call the finisher; processors mark uploads PROCESSED and attempt to enqueue upload_finisher via a new Redis finisher gate (single-run per commit with TTL). The finisher is refactored to always reconstruct batches from commit state, run time-budgeted/batched merges with self-scheduled continuation/sweep followups, and to clear the gate on terminal outcomes.
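
A single-run-per-commit gate with TTL of the kind described above is typically built on Redis's atomic `SET` with `NX` and `EX`. A minimal sketch, assuming a key format and TTL value that the PR does not specify:

```python
GATE_TTL_SECONDS = 600  # assumed TTL; the PR does not state the actual value


def _gate_key(repoid: int, commitid: str) -> str:
    return f"finisher_gate:{repoid}:{commitid}"


def try_acquire_finisher_gate(redis_client, repoid: int, commitid: str) -> bool:
    # SET NX EX is atomic: only one caller per commit wins until the
    # TTL expires or the gate is cleared on a terminal outcome.
    return bool(
        redis_client.set(_gate_key(repoid, commitid), "1", nx=True, ex=GATE_TTL_SECONDS)
    )


def clear_finisher_gate(redis_client, repoid: int, commitid: str) -> None:
    redis_client.delete(_gate_key(repoid, commitid))
```

The TTL doubles as a liveness bound: if a finisher dies without clearing the gate, the gate expires on its own and a later enqueue attempt can succeed.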

Adds recovery paths. Introduces StuckUploadsCheckTask cron to detect commits with long-stale PROCESSED uploads and trigger a new upload_merger task (with its own gate/metrics), providing watchdog/sweep/continuation behavior and post-processing (notify/PR comparison/timeseries/cache invalidation) when fully complete.

Written by Cursor Bugbot for commit c48148b. This will update automatically on new commits.

Ensure upload_finisher reconstructs all mergeable uploads for a commit rather than relying only on the callback payload, and stop processor-side finisher enqueueing now that the finisher's reconstructed input is the source of truth.

Made-with: Cursor
Update upload state transitions so successful finisher merges write MERGED state, and align finisher test expectations with the new terminal state.

Made-with: Cursor
Drop old per-commit concurrency limiter test coverage that no longer matches upload_finisher behavior and was breaking the test module import.

Made-with: Cursor

sentry bot commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.26%. Comparing base (42aa7cc) to head (b4a0fcd).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #756      +/-   ##
==========================================
- Coverage   92.26%   92.26%   -0.01%     
==========================================
  Files        1304     1304              
  Lines       47925    47923       -2     
  Branches     1628     1628              
==========================================
- Hits        44218    44214       -4     
- Misses       3398     3400       +2     
  Partials      309      309              
Flag               Coverage            Δ
workerintegration  58.69% <76.92%>     (+0.13%) ⬆️
workerunit         90.36% <100.00%>    (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.



codecov-notifications bot commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.


Use reconstructed results as the base and let callback payload win on duplicate upload IDs so stale reconstruction cannot overwrite successful processor outcomes.

Made-with: Cursor
Use the full processed set instead of random sampling so finisher reconstruction cannot miss uploads and produce incomplete merged reports.

Made-with: Cursor

@drazisil-codecov left a comment


Approved with a question.

Treat only merged and error uploads as already completed and rename the guard variable to all_already_merged for clearer intent.

Made-with: Cursor
Treat processed uploads as already finalized in finisher idempotency checks to avoid duplicate merges during mixed-version deployments.

Made-with: Cursor
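The widened idempotency check described in this commit can be sketched as a terminal-state set; the enum values and the `all_already_merged` name come from the commit messages above, but the exact shape of the real code is an assumption:

```python
from enum import Enum


class UploadState(Enum):
    """Illustrative subset of upload states referenced in this PR."""
    PROCESSING = "processing"
    PROCESSED = "processed"
    MERGED = "merged"
    ERROR = "error"


# States the finisher treats as already finalized. PROCESSED is included
# to avoid duplicate merges during mixed-version deployments, per the
# commit above.
TERMINAL_STATES = {UploadState.MERGED, UploadState.ERROR, UploadState.PROCESSED}


def all_already_merged(states: list[UploadState]) -> bool:
    return all(s in TERMINAL_STATES for s in states)
```
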
Guard update_uploads against malformed unsuccessful processing results by treating missing error payloads as unknown_processing instead of crashing with UnboundLocalError.

Made-with: Cursor
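The guard described in this commit amounts to treating a missing or empty error payload as a sentinel failure code instead of reading an unbound variable. A hedged sketch with illustrative names:

```python
def classify_failure(result: dict) -> str:
    """Return a failure code for an unsuccessful processing result.

    A malformed result may carry no error payload at all; fall back to
    "unknown_processing" rather than crashing with UnboundLocalError.
    """
    error = result.get("error")
    if error and error.get("code"):
        return error["code"]
    return "unknown_processing"
```
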
Add a regression test for unsuccessful processing results without an error dict to ensure we persist UNKNOWN_PROCESSING instead of crashing.

Made-with: Cursor
…ibilities

Aggregate follow-up task scheduling through an enum-driven helper, extract finisher gate operations into a dedicated module, move remaining upload counting into ProcessingState, drop sweep-attempt caps, rely on DB reconstruction as source of truth, and remove UploadFlow wiring from processor/finisher paths.

Made-with: Cursor
@thomasrockhu-codecov force-pushed the tomhu/finisher-source-of-truth branch from b2da3b8 to c3ff0f9 on March 11, 2026 at 19:28
Use followup-type countdown mapping in a single scheduler and process uploads in batches of 10 with early continuation when time budget is near exhaustion, then re-read merge-ready uploads from ProcessingState.

Made-with: Cursor
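The batched merge loop with early continuation described in this commit can be sketched as below. The batch size of 10 and the budget/buffer constants are named in this PR's review discussion, but the loop body here is illustrative:

```python
import time

FINISHER_UPLOAD_MERGE_BATCH_SIZE = 10
FINISHER_MERGE_TIME_BUDGET_SECONDS = 200
FINISHER_CONTINUATION_BUFFER_SECONDS = 10


def merge_in_batches(
    upload_ids,
    merge_batch,
    schedule_continuation,
    budget=FINISHER_MERGE_TIME_BUDGET_SECONDS,
    buffer=FINISHER_CONTINUATION_BUFFER_SECONDS,
):
    """Merge uploads in fixed-size batches under a time budget.

    Note the direction of the elapsed-time check: it tests that the
    budget (minus a safety buffer) has been *exceeded* before handing
    the remainder to a continuation task.
    """
    start = time.monotonic()
    for i in range(0, len(upload_ids), FINISHER_UPLOAD_MERGE_BATCH_SIZE):
        if time.monotonic() - start >= budget - buffer:
            schedule_continuation(upload_ids[i:])
            return False  # continuation scheduled for the remainder
        merge_batch(upload_ids[i : i + FINISHER_UPLOAD_MERGE_BATCH_SIZE])
    return True  # all batches merged within budget
```
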
Persist processed-state visibility before finisher enqueue, remove redundant merged transition in finisher, and update idempotency-era test expectations to match state-driven reconstruction behavior.

Made-with: Cursor
Pass db_session into ProcessingState usage in upload scheduling and align finisher unit setup with state-reconstruction behavior so worker CI paths continue to run under DB-backed processing state.

Made-with: Cursor
Apply ruff formatting output required by the CI Run Lint hook for touched worker files.

Made-with: Cursor
Return the updated master report from batch merges and stop rolling back before final save so staged upload state transitions are retained through commit.

Made-with: Cursor
"reportid": str(upload.report.external_id),
},
"successful": bool(has_report),
"upload_id": upload.id_,

Reconstruction loses error status when intermediate report exists

Medium Severity

The _reconstruct_processing_results method determines upload success by checking whether an intermediate report exists in Redis (redis_connection.exists(report_key)). However, in processing.py, when processing_result.error is truthy but processing_result.report is also present, both the error is recorded AND the intermediate report is saved. The original callback payload correctly set successful=False in that case, but the reconstruction path now sees the intermediate report exists and marks the upload as successful=True. This causes uploads with processing errors to be incorrectly merged as successful instead of being marked as errors.


"commit_yaml": {},
"trigger": "cron",
}
)

Stuck uploads cron passes empty YAML configuration

Medium Severity

The StuckUploadsCheckTask triggers the merger with commit_yaml: {}, an empty dict. The merger uses this YAML for report merging configuration including coverage precision, rounding, carry-forward session clearing, and notification settings. Commits with custom YAML configuration (e.g., coverage.precision, flag carry-forward rules) would be merged with default settings instead, potentially producing incorrect coverage values or failing to clear stale carry-forward sessions.


merge_result,
)

return master_report, processing_results

Unused diff parameter in merge_batch function signature

Low Severity

The merge_batch function accepts a diff parameter that is never used within the function body. The diff is instead applied later in save_and_commit. This dead parameter is misleading — a reader might expect the diff to be applied during merging, and a future maintainer could accidentally remove the save_and_commit diff application thinking it's handled in merge_batch.


Comment on lines +51 to 52
# this is a noop in normal cases, but relevant for task retries:
state.mark_uploads_as_processing([upload_id])

Bug: A call is made to state.mark_uploads_as_processing(), but this method was removed from the refactored ProcessingState class, which will cause a runtime AttributeError.
Severity: CRITICAL

Suggested Fix

Remove the call to the non-existent mark_uploads_as_processing method from processing.py and upload.py. The refactoring of ProcessingState to be database-backed made this state-tracking call obsolete, and its removal from the callers will align the code with the new design.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: apps/worker/services/processing/processing.py#L51-L52

Potential issue: Code in `processing.py` and `upload.py` calls the method
`state.mark_uploads_as_processing()`. However, this method was removed from the
refactored `ProcessingState` class, which is now database-backed instead of
Redis-backed. As a result, any attempt to process an upload will raise an
`AttributeError` because the method does not exist on the `ProcessingState` object. This
will cause the task to crash and completely block the upload processing pipeline.

"continuation_needed": remaining_processed_uploads > 0,
"processing_results": merged_processing_results,
"upload_ids": merged_upload_ids,
}

Inverted time check causes merge loop to never execute

High Severity

The time-budget condition in _process_reports_with_lock is inverted. On the first iteration, time.monotonic() - merge_start_time is ~0, which is always <= FINISHER_CONTINUATION_BUFFER_SECONDS (10). This causes an immediate return before any merging work is done. The defined constant FINISHER_MERGE_TIME_BUDGET_SECONDS (200) is never referenced in the check. The correct condition likely needs to check whether elapsed time has exceeded the budget minus the buffer, e.g., >= FINISHER_MERGE_TIME_BUDGET_SECONDS - FINISHER_CONTINUATION_BUFFER_SECONDS. As written, the finisher will never merge any reports and will instead spin in an infinite self-scheduling continuation loop.



Reapply no-chord coverage scheduling in UploadTask, restore ProcessingState APIs used by callers/tests, bring back finisher gate existence helper, and keep finisher merge-loop helper improvements with corrected continuation timing.

Made-with: Cursor
Comment on lines +432 to +442
"run_impl: Saving combined report",
extra={"processing_results": current_batch},
)
save_and_commit(
db_session,
report_service,
commit,
report,
pending_upload_ids,
diff,
)

Bug: The report processing loop may apply the same diff multiple times when handling uploads in batches, as the report is reloaded from storage on each iteration.
Severity: MEDIUM

Suggested Fix

To prevent the diff from being applied multiple times, load the master report from storage once before the batch processing loop begins. Pass this report object through each iteration, merging new data and only applying the diff once before the final save after the loop completes.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: apps/worker/tasks/upload_finisher.py#L415-L442

Potential issue: In the batch processing loop within `_process_reports_with_lock`, the
master report is reloaded from storage on each iteration. The `save_and_commit` function
is then called, which applies a diff to this report before saving it. Consequently, if a
commit's uploads are processed in multiple batches, the same diff will be applied
multiple times—once for each batch. This repeated application could lead to data
corruption or incorrect report states if the `apply_diff` operation is not idempotent.

@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

log.info("run_impl: Cleaning up intermediate reports")
cleanup_intermediate_reports(upload_ids)
merged_processing_results.extend(current_batch)
merged_upload_ids.extend(pending_upload_ids)

Diff applied multiple times in multi-batch finisher loop

High Severity

In _process_reports_with_lock, each loop iteration calls perform_report_merging (which reloads the master report from storage — already with the diff applied from the previous save_and_commit) and then save_and_commit applies the diff again. For commits with more than FINISHER_UPLOAD_MERGE_BATCH_SIZE (10) uploads, the diff is applied once per batch, corrupting coverage line annotation in the final report. The upload_merger.py counterpart correctly accumulates master_report in-memory across batches and applies the diff only once at the end.
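
The in-memory accumulation pattern this finding attributes to upload_merger.py can be sketched as follows; all callables here are illustrative stand-ins, not the PR's actual helpers:

```python
def merge_all_then_apply_diff(batches, load_master, merge_batch, apply_diff, save):
    """Accumulate the master report in memory across batches and apply
    the diff exactly once at the end, so multi-batch commits never see
    the diff applied per batch."""
    master = load_master()
    for batch in batches:
        master = merge_batch(master, batch)
    apply_diff(master)  # exactly one diff application
    save(master)
    return master
```
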


state.mark_upload_as_processed(id)

# we have only processed 8 out of 9. we want to do a batched merge
# We now reconstruct all processed uploads in one pass.

ProcessingState test suite broken by API change

Medium Severity

ProcessingState.__init__ was changed in this PR to require db_session as a mandatory third argument, but test_processing_state.py still instantiates ProcessingState(1234, uuid4().hex) everywhere — including in the test_batch_merging_many_uploads test that was modified in this very diff. Every test in the file will raise TypeError at instantiation. Additionally, the TestProcessingStateEmptyListGuards class and the parametrized tests still assert Redis calls (mock_redis.sadd, mock_redis.srem) that are no longer made by the DB-backed implementation.

