fix(data-warehouse): use inclusive cursor on date-typed incremental sync by Gilbert09 · Pull Request #59033 · PostHog/posthog

Gilbert09 · 2026-05-19T15:05:28Z

Problem

SQL data warehouse sources (Postgres, Redshift, MySQL, MSSQL, BigQuery, Snowflake) advance their incremental cursor with WHERE incremental_field > saved_cursor. For DateTime / Timestamp / Integer / Numeric / ObjectID columns the cursor carries enough resolution that > is safe — two distinct rows essentially never share the boundary value.

For IncrementalFieldType.Date, the cursor only carries day granularity. If sync A completes mid-day and saves cursor 2026-05-13, sync B's > '2026-05-13' skips every row that lands on 2026-05-13 after sync A advanced the cursor — those rows are silently dropped from the warehouse on every subsequent sync. The bug is invisible: sync B "succeeds" with zero new rows for the boundary day.

Changes

New helper incremental_type_to_operator(field_type) in posthog/temporal/data_imports/pipelines/helpers.py returns ">=" for Date and ">" for every other IncrementalFieldType. Centralizing the decision keeps the six SQL sources in sync.
All single-shot incremental cursor predicates now flow through the helper:
- postgres/postgres.py (main scan + count query)
- postgres/partitioned_tables.py (build_partition_query, called per child partition)
- redshift/redshift.py (sampled + non-sampled scans)
- mysql/mysql.py
- mssql/mssql.py
- bigquery/bigquery.py
- snowflake/snowflake.py
Postgres windowed mode (_build_query with upper_bound_inclusive set, used by iterate_date_windows) deliberately keeps > even for Date. Consecutive windows pass the previous window's hi as the next window's lo, so >= would re-fetch every boundary row inside a single run and produce duplicates between windows. A regression test (test_windowed_mode_keeps_exclusive_lower_bound_for_date) locks this behavior in.

Trade-off this PR accepts: for APPEND sync_type on a Date-typed cursor, the next sync after a cursor advance will re-ship rows at the boundary day. Without primary-key dedup those land as duplicates. This is a smaller harm than the current silent data loss, and INCREMENTAL sync_type (the much more common case) absorbs the re-shipped rows via primary-key dedup.

How did you test this code?

I am an agent. Only automated tests were run — no manual sync verification.

New parameterized helper test in posthog/temporal/data_imports/pipelines/test_helpers.py covering every IncrementalFieldType member.
New parameterized tests in posthog/temporal/data_imports/sources/postgres/test_postgres.py:
- TestBuildQuery.test_operator_matches_field_type (Date, DateTime, Timestamp, Integer)
- TestBuildQuery.test_count_query_operator_matches_field_type
- TestBuildPartitionQuery.test_operator_matches_field_type
- TestBuildQuery.test_windowed_mode_keeps_exclusive_lower_bound_for_date — regression guard for the windowed path
All 213 postgres builder/iterator tests + the 44 mysql tests pass against the new code, including the real-DB TestIterateDateWindowsRealDb::test_yields_all_rows_over_partitioned_table that caught the earlier draft over-applying >= to windowed mode.

Publish to changelog?

no

🤖 Agent context

Tooling: Claude Code (Opus 4.7).
Considered alternative: gate the >= switch on sync_type == INCREMENTAL so APPEND keeps >. Rejected because most SQL sources don't have sync_type available where the query is built, and the bug is severe enough (silent boundary-day loss across all syncs) that the small APPEND duplication is preferable to threading a new parameter through every source.
Considered alternative: step the saved cursor back by one day for Date types so > still captures the boundary day. Rejected as more invasive (every source's cursor advancement logic would need to know about it) and harder to reason about — the operator-side fix keeps the cursor's stored value honest and isolates the change to query construction.
The windowed Postgres path (iterate_date_windows) still has the same boundary-day loss between sync runs because the first window's lo is the saved cursor and uses >. Fixing that requires iterate_date_windows to step the initial lo back by one unit for Date types — left for a follow-up since it has its own correctness considerations around partition snap-forward (_step_back_one) and is a distinct codepath from the single-shot scan the customer-facing bug surfaces in.

greptile-apps · 2026-05-19T15:08:28Z

_{Reviews (1): Last reviewed commit: "fix(data-warehouse): use inclusive curso..." | Re-trigger Greptile}

Copilot

Pull request overview

This PR fixes a correctness bug in incremental syncs for SQL data warehouse sources when the incremental cursor is a Date (day-granularity) by switching the lower-bound predicate to be inclusive (>=) for Date cursors while keeping the existing exclusive behavior (>) for higher-resolution cursor types.

Changes:

Introduces incremental_type_to_operator() to centralize the cursor lower-bound operator choice (>= for IncrementalFieldType.Date, otherwise >).
Updates multiple SQL source query builders (Postgres, Redshift, MySQL, MSSQL, BigQuery, Snowflake) to use the helper for incremental predicates.
Adds/extends tests covering the helper mapping and Postgres query builders, including a regression test ensuring Postgres windowed mode keeps an exclusive lower bound for Date.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
posthog/temporal/data_imports/pipelines/helpers.py	Adds `incremental_type_to_operator()` helper to choose `>=` for Date cursors and `>` otherwise.
posthog/temporal/data_imports/pipelines/test_helpers.py	Adds unit test for `incremental_type_to_operator()` across all `IncrementalFieldType` members.
posthog/temporal/data_imports/sources/postgres/postgres.py	Uses the helper for single-shot incremental scans and count queries; keeps `>` in windowed mode.
posthog/temporal/data_imports/sources/postgres/partitioned_tables.py	Uses the helper in per-child partition incremental queries.
posthog/temporal/data_imports/sources/postgres/test_postgres.py	Adds tests asserting operator selection in main, count, partition queries, plus windowed-mode regression coverage.
posthog/temporal/data_imports/sources/redshift/redshift.py	Uses the helper for sampled and non-sampled incremental predicates.
posthog/temporal/data_imports/sources/mysql/mysql.py	Uses the helper in MySQL incremental WHERE clause.
posthog/temporal/data_imports/sources/mssql/mssql.py	Uses the helper in MSSQL incremental WHERE clause.
posthog/temporal/data_imports/sources/bigquery/bigquery.py	Uses the helper in BigQuery incremental query construction.
posthog/temporal/data_imports/sources/snowflake/snowflake.py	Uses the helper in Snowflake incremental query construction.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

deployment-status-posthog · 2026-05-20T14:48:54Z

Deploy status

Environment	Status	Deployed At	Workflow
dev	✅ Deployed	2026-05-20 14:48 UTC	Run
prod-us	✅ Deployed	2026-05-20 15:20 UTC	Run
prod-eu	✅ Deployed	2026-05-20 15:22 UTC	Run

fix(data-warehouse): use inclusive cursor on date-typed incremental sync

dcebda3

Copilot AI review requested due to automatic review settings May 19, 2026 15:05

assign-reviewers-posthog Bot requested a review from a team May 19, 2026 15:05

Copilot started reviewing on behalf of Gilbert09 May 19, 2026 15:06 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

Comment thread posthog/temporal/data_imports/pipelines/test_helpers.py

MarconLP approved these changes May 19, 2026

View reviewed changes

Gilbert09 and others added 4 commits May 19, 2026 17:36

Potential fix for pull request finding

b8fc28b

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Merge branch 'master' into tom/date-cursor-inclusive

83a09ad

chore(data-warehouse): fix import sort in test_helpers

df5e358

Fix

bd8c3c7

Gilbert09 merged commit a81a147 into master May 20, 2026
212 checks passed

Gilbert09 deleted the tom/date-cursor-inclusive branch May 20, 2026 14:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(data-warehouse): use inclusive cursor on date-typed incremental sync#59033

fix(data-warehouse): use inclusive cursor on date-typed incremental sync#59033
Gilbert09 merged 5 commits into
masterfrom
tom/date-cursor-inclusive

Gilbert09 commented May 19, 2026

Uh oh!

greptile-apps Bot commented May 19, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

deployment-status-posthog Bot commented May 20, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Gilbert09 commented May 19, 2026

Problem

Changes

How did you test this code?

Publish to changelog?

🤖 Agent context

Uh oh!

greptile-apps Bot commented May 19, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

deployment-status-posthog Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploy status

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

deployment-status-posthog Bot commented May 20, 2026 •

edited

Loading