Skip to content

[fix](streaming-job) fix postgres historical-date timestamp handling in cdc-client#63618

Open
JNSimba wants to merge 5 commits into
apache:masterfrom
JNSimba:fix/streaming-job-pg-snapshot-calendar-drift
Open

[fix](streaming-job) fix postgres historical-date timestamp handling in cdc-client#63618
JNSimba wants to merge 5 commits into
apache:masterfrom
JNSimba:fix/streaming-job-pg-snapshot-calendar-drift

Conversation

@JNSimba
Copy link
Copy Markdown
Member

@JNSimba JNSimba commented May 25, 2026

What problem does this PR solve?

Problem Summary:

When a Postgres CDC streaming job ingests rows whose timestamp / date columns hold historical values (pre-1970 with sub-millisecond precision, or pre-1582 / pre-1901 dates), two independent bugs in cdc-client cause data corruption or task crash:

  1. DebeziumJsonDeserializer.convertTimestamp uses signed / and % on negative micros / nanos, producing a negative nanoOfMillisecond and tripping Flink TimestampData's checkArgument(nanoOfMillisecond >= 0). Result: the ingestion task crashes whenever a pre-1970 timestamp with sub-millisecond precision flows through (e.g. 1969-12-31 23:59:59.999123).

  2. The snapshot path reads column values via rs.getObject(), which routes through PG JDBC's TimestampUtils + GregorianCalendar. For pre-1582 timestamps the Julian/proleptic cutover shifts values by N days; for pre-1901 timestamps the JVM time zone's LMT offset shifts values by the LMT difference (e.g. ~343s in Asia/Shanghai). Result: the same PG value (e.g. 0001-01-01 00:00:00) yields different doris values depending on whether the row was synced via snapshot or via binlog.

This PR fixes both:

  1. Use Math.floorDiv / Math.floorMod so the millisecond / nanosecond split stays valid for negative epoch values.
  2. Dispatch TIMESTAMP / TIMESTAMPTZ / DATE columns through LocalDateTime / OffsetDateTime / LocalDate in the snapshot reader, bypassing GregorianCalendar entirely. Preserve the legacy Timestamp(Long.MAX/MIN_VALUE) sentinel for +/-infinity.

Release note

Fix postgres CDC streaming job ingestion crash and value drift for historical-date timestamp / date columns.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
  • Behavior changed:

    • No.
  • Does this need documentation?

    • No.

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 25, 2026

/review

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 25, 2026

run buildall

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: request changes.

Critical checkpoint conclusions:

  • Goal/test proof: The code changes target PostgreSQL CDC historical timestamp/date consistency and add unit/regression coverage, but the regression coverage is incomplete because the new suite has qt_ checks without the required expected .out file.
  • Scope/focus: The implementation is focused on the CDC snapshot/binlog conversion paths; the large copied shadow classes are consistent with existing cdc_client patching practice, though they increase maintenance cost.
  • Concurrency: The changed conversion logic is per-row/per-record and does not add shared mutable concurrency concerns; snapshot task slot lifecycle remains in the existing fetch-task flow.
  • Lifecycle/static initialization: No new cross-TU/static initialization concerns found.
  • Configuration/compatibility: No new config or storage/protocol compatibility issue found.
  • Parallel paths: Snapshot and binlog timestamp/date paths were reviewed; the PR updates snapshot extraction and deserializer conversion paths relevant to the described issue.
  • Special conditions: Historical values and nulls are covered; infinity handling is partially preserved for timestamp/timestamptz and I did not identify a definite blocker there.
  • Test coverage/results: Blocking issue found: missing generated regression result file for the new qt_ assertions.
  • Observability/transactions/persistence/FE-BE variables: Not materially involved in this PR.
  • Performance: No obvious hot-path performance regression found in the reviewed changes.

User focus: No additional user-provided review focus was supplied.

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 25, 2026

/review

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 25, 2026

run buildall

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking correctness gap in the PostgreSQL snapshot timestamp/date fix.

Critical checkpoints:

  • Goal/test proof: the scalar historical timestamp/date snapshot vs binlog drift is covered by the new regression suite and unit tests, and the previous missing golden-file concern is now addressed in this head SHA. Temporal array coverage is still missing.
  • Scope/focus: the change is mostly focused, but the copied connector overrides need to preserve all affected type paths.
  • Concurrency/lifecycle: no new Doris-side shared mutable concurrency issue found in the reviewed paths; slot lifecycle follows the copied Flink CDC flow.
  • Config/compatibility: no new config or storage/protocol compatibility issue found.
  • Parallel paths: scalar timestamp/date paths were updated, but PostgreSQL temporal arrays remain on the old PgArray path and can still diverge.
  • Testing: regression coverage covers scalar timestamp, timestamptz, and date, but not array variants, which are supported by the existing PG array mapping.
  • Observability/performance/transactions: no additional blocking issue found.

User focus: no additional user-provided review focus was supplied.

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 26, 2026

/review

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 26, 2026

run buildall

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 26, 2026

run external

@github-actions
Copy link
Copy Markdown
Contributor

OpenCode automated review failed and did not complete.

Error: Review step was failure (possibly timeout or cancelled)
Workflow run: https://github.com/apache/doris/actions/runs/26430949466

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented May 26, 2026

run external

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants