fix(ws): stop v3 sync jobs getting stuck in running by estefaniarabadan · Pull Request #60642 · PostHog/posthog

estefaniarabadan · 2026-05-29T10:31:55Z

Problem

V3 warehouse-source syncs sometimes finish with a split state: the data loads fine and the ExternalDataSchema is Completed, but the ExternalDataJob is stuck in Running with finished_at = NULL forever.
Nothing fixes it automatically, so it shows as a phantom "still running" sync and inflates the "running jobs" counts used for billing/usage.

It's a race between two processes that both write the job row:

the consumer (warehouse-sources-load) marks the job Completed.
the post-extraction Temporal activity calculate_table_size_activity reads the job while it's still Running, does a slow S3 size lookup, then calls an unscoped job.save() that writes back every column.

When that save lands after the consumer's completion, it overwrites the whole row from its stale in-memory copy, reverting status to Running and clearing finished_at. It never touches the schema, hence the split state.
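Here is a minimal, self-contained sketch of the lost update described above. `DB`, `FakeJob` and `simulate_race` are hypothetical stand-ins, not PostHog code. `FakeJob.save()` only imitates the relevant part of Django's `Model.save()`, which writes every field unless `update_fields` is passed. The numbered steps follow the interleaving from the problem statement.

```python
import copy
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for the ExternalDataJob table (job id -> row).
DB: dict[str, dict] = {}


class FakeJob:
    """Illustrative model mimicking Django's save(update_fields=...) semantics."""

    FIELDS = ("status", "finished_at", "storage_delta_mib")

    def __init__(self, pk: str, row: dict):
        self.pk = pk
        for name in self.FIELDS:
            setattr(self, name, row[name])

    @classmethod
    def get(cls, pk: str) -> "FakeJob":
        # Like Model.objects.get(): an in-memory snapshot taken at read time.
        return cls(pk, copy.deepcopy(DB[pk]))

    def save(self, update_fields=None) -> None:
        # update_fields=None writes back every column, like an unscoped save().
        fields = self.FIELDS if update_fields is None else update_fields
        for name in fields:
            DB[self.pk][name] = getattr(self, name)


def simulate_race() -> dict:
    DB["job-1"] = {"status": "Running", "finished_at": None, "storage_delta_mib": None}

    # 1. The size activity reads the job while it is still Running.
    stale = FakeJob.get("job-1")

    # 2. The consumer marks the job Completed before the activity saves.
    done = FakeJob.get("job-1")
    done.status = "Completed"
    done.finished_at = datetime(2026, 5, 29, 13, 0, tzinfo=timezone.utc)
    done.save(update_fields=["status", "finished_at"])

    # 3. The slow S3 lookup returns and the activity saves its stale copy unscoped.
    stale.storage_delta_mib = 12.5
    stale.save()
    return DB["job-1"]


row = simulate_race()
print(row["status"], row["finished_at"], row["storage_delta_mib"])
```

Running it prints `Running None 12.5`. The size is stored, but the stale copy has reverted the status and cleared finished_at, which is the split state described above.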

Changes

workflow_activities/calculate_table_size.py: scoped the job write to job.save(update_fields=["storage_delta_mib", "updated_at"]) so it can no longer overwrite status or finished_at. This is the confirmed clobber.
pipelines/common/extract.py: applied the same scoping to reset_rows_synced_if_needed's save (update_fields=["rows_synced", "updated_at"]), a latent twin that can fire on an extraction retry. No behavior change for the non-DLT pipeline, since Running is persisted independently by the create-job activity.
pipelines/pipeline_v3/postgres_queue/consumer.py: renamed the bound log contextvars schema_id/source_id/job_id to external_data_schema_id/external_data_source_id/external_data_job_id (in both the per-batch bind and the recovery-sweep bind) to match the producer and make troubleshooting in logs easier.
pipelines/pipeline_v3/load/processor.py: renamed the same keys on the processor's log calls. Function-call params and Prometheus metric labels were left unchanged.
pipelines/pipeline_v3/postgres_queue/test_consumer.py: updated the bound-context assertion for the renamed key.
tests/data_imports/test_calculate_table_size.py: new regression test that injects a concurrent Completed between the activity's read and save, then asserts the status survives while storage_delta_mib is still written.
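The regression-test idea can be sketched without Django. The consumer's write is injected inside the slow lookup, so it lands between the activity's read and its save. Everything here (`DB`, `load_job`, `save_job`, `calculate_table_size`, the `size_lookup` parameter) is hypothetical and simplified, and `updated_at` is omitted for brevity. The real test in tests/data_imports/test_calculate_table_size.py may be structured differently.

```python
import copy
from typing import Callable

# Hypothetical stand-in for the ExternalDataJob table (job id -> row).
DB: dict[str, dict] = {}


def load_job(job_id: str) -> dict:
    # Snapshot of the row at read time, like an ORM read.
    return copy.deepcopy(DB[job_id])


def save_job(job_id: str, job: dict, update_fields=None) -> None:
    # Without update_fields every column is written back (the old, unscoped behavior).
    fields = job.keys() if update_fields is None else update_fields
    for name in fields:
        DB[job_id][name] = job[name]


def calculate_table_size(job_id: str, size_lookup: Callable[[str], float]) -> None:
    # Hypothetical, simplified activity: read, slow lookup, then a scoped save.
    job = load_job(job_id)
    job["storage_delta_mib"] = size_lookup(job_id)
    save_job(job_id, job, update_fields=["storage_delta_mib"])


def test_status_survives_concurrent_completion() -> None:
    DB["job-1"] = {"status": "Running", "finished_at": None, "storage_delta_mib": None}

    def slow_lookup_with_concurrent_completion(job_id: str) -> float:
        # Inject the consumer's write between the activity's read and save.
        DB[job_id].update(status="Completed", finished_at="2026-05-29T13:00:00Z")
        return 12.5

    calculate_table_size("job-1", slow_lookup_with_concurrent_completion)

    row = DB["job-1"]
    assert row["status"] == "Completed"  # the concurrent completion survives
    assert row["finished_at"] is not None
    assert row["storage_delta_mib"] == 12.5  # the size is still written


test_status_survives_concurrent_completion()
```

If you change the call to `save_job(job_id, job)` with no `update_fields`, the status assertion fails. That failure is what lets this kind of test catch a regression back to an unscoped save.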

How did you test this code?

Test run

Automatic notifications

Publish to changelog?
Alert Sales and Marketing teams?

Docs update

NO


greptile-apps · 2026-05-29T10:34:41Z

Reviews (1): Last reviewed commit: "make job saves to only updated intended ..."

danielcarletti

Love the job_id -> external_data_job_id (and similar) changes

deployment-status-posthog · 2026-05-29T13:55:40Z

Deploy status

Environment	Status	Deployed At	Workflow
dev	✅ Deployed	2026-05-29 13:55 UTC	Run
prod-us	✅ Deployed	2026-05-29 14:26 UTC	Run
prod-eu	✅ Deployed	2026-05-29 14:37 UTC	Run

estefaniarabadan added 3 commits May 29, 2026 12:02

rename log keys to match the producer ones

23a1744

We want to be able to filter by these keys in the logs and have both the producer and the consumer use the same names.

update logger calls on the processor

76c9673

make job saves to only updated intended columns

1b2a93b

This avoids writing a stale Running state to the DB.

estefaniarabadan added the skip-inkeep-docs Use this label to skip an Inkeep docs PR in posthog.com label May 29, 2026

estefaniarabadan requested a review from a team May 29, 2026 10:32

Gilbert09 approved these changes May 29, 2026


Comment thread on posthog/temporal/data_imports/pipelines/common/extract.py (outdated)

danielcarletti approved these changes May 29, 2026


fix CI and address PR comment

9bd5b20

estefaniarabadan merged commit 19f224a into master May 29, 2026
200 checks passed

estefaniarabadan deleted the estefania/race-condition-v3 branch May 29, 2026 13:24

estefaniarabadan merged 4 commits into master from estefania/race-condition-v3
