[FLINK-39749][mysql-cdc] Support configurable string chunk key comparison mode to align with MySQL collation#4413

Open
ziyanTOP wants to merge 1 commit into apache:master from ziyanTOP:flink-cdc-string-key-compare-mode

@ziyanTOP

What is the purpose of the change

Fix chunk splitting and binlog event routing issues when MySQL collation differs from Java's natural String ordering.

Java's `String.compareTo()` is case-sensitive: it compares UTF-16 code unit values, so every uppercase ASCII letter sorts before every lowercase one. MySQL collations such as `utf8mb4_general_ci` are case-insensitive. This mismatch causes chunk boundaries computed in Java to diverge from MySQL's actual row ordering, leading to premature unbounded chunks, overlapping splits, or lost binlog events.
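To make the mismatch concrete, here is a minimal sketch that sorts the same keys two ways. The class name, method names, and sample keys are made up for illustration, and `String.CASE_INSENSITIVE_ORDER` is only a rough stand-in for a `_ci` collation on ASCII keys, not an emulation of MySQL's collation rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: not part of the PR.
class CollationMismatchDemo {

    /** Sorts with String.compareTo(), i.e. by UTF-16 code unit values. */
    static List<String> sortJavaNatural(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(String::compareTo);
        return sorted;
    }

    /** Sorts ignoring case, approximating a case-insensitive collation for ASCII keys. */
    static List<String> sortCaseInsensitive(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(String.CASE_INSENSITIVE_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("b1", "A2", "a1", "B2");
        System.out.println(sortJavaNatural(keys));     // [A2, B2, a1, b1]
        System.out.println(sortCaseInsensitive(keys)); // [a1, A2, b1, B2]
    }
}
```

With the natural ordering, a chunk boundary such as `B2` places `a1` after it, whereas a case-insensitive ordering places `a1` before it, so the two sides can disagree on which chunk a row belongs to.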

See FLINK-39749 for details.

Brief change log

  • Introduce `ChunkKeyCompareMode` enum: `DEFAULT`, `CASE_INSENSITIVE`, `BINARY`
  • `DEFAULT`: preserves existing behavior (`String.compareTo()`)
  • `CASE_INSENSITIVE`: uses `String.compareToIgnoreCase()` for Java-side comparisons
  • `BINARY`: injects `BINARY` keyword in SQL predicates and uses byte-level comparison in Java
  • Cover all three API layers: DataStream API, Flink SQL, Pipeline YAML
  • Update documentation (EN + ZH) and add test coverage
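As a rough sketch of how the three modes could map to Java-side comparators: the enum below is hypothetical (`ChunkKeyCompareModeSketch` and `compareUtf8Bytes` are invented names), and the PR's actual `ChunkKeyCompareMode` may be structured quite differently. It also assumes that "byte-level comparison" means comparing UTF-8 encodings as unsigned bytes, which the description does not spell out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

// Hypothetical sketch, not the PR's real ChunkKeyCompareMode.
enum ChunkKeyCompareModeSketch {
    DEFAULT(String::compareTo),
    CASE_INSENSITIVE(String::compareToIgnoreCase),
    BINARY(ChunkKeyCompareModeSketch::compareUtf8Bytes);

    private final Comparator<String> comparator;

    ChunkKeyCompareModeSketch(Comparator<String> comparator) {
        this.comparator = comparator;
    }

    int compare(String a, String b) {
        return comparator.compare(a, b);
    }

    /** Assumption: byte-level means unsigned comparison of the UTF-8 encodings. */
    static int compareUtf8Bytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(x[i] & 0xFF, y[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.length, y.length);
    }

    public static void main(String[] args) {
        for (ChunkKeyCompareModeSketch mode : values()) {
            // Prints the sign of comparing "B" with "a" under each mode.
            System.out.println(mode + ": " + Integer.signum(mode.compare("B", "a")));
        }
    }
}
```

Under these assumptions, `DEFAULT` and `BINARY` both order `"B"` before `"a"`, while `CASE_INSENSITIVE` orders it after. MySQL collations can also differ from Java in how they treat accents and trailing spaces, so `compareToIgnoreCase()` is best read as an approximation rather than an exact match for any particular collation.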

Verifying this change

This change is already covered by existing tests (`MySqlTableSourceFactoryTest`) and has been verified in a production Pipeline job (MySQL -> Paimon) with `CHAR(36)` UUID primary key and `utf8mb4_general_ci` collation.

Does this pull request potentially affect one of the following parts

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): yes (affects chunk key comparison in snapshot/binlog phases)
  • Anything that affects deployment or recovery: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs

@github-actions bot added the docs, mysql-cdc-connector, and mysql-pipeline-connector labels on May 25, 2026
@ziyanTOP changed the title from "[FLINK-39749][mysql-cdc] Add scan.incremental.snapshot.string-key.compare-mode option" to "[FLINK-39749][mysql-cdc] Support configurable string chunk key comparison mode to align with MySQL collation" on May 25, 2026
@ziyanTOP force-pushed the flink-cdc-string-key-compare-mode branch from 1c15142 to 9e4bea6 on May 25, 2026, 10:25
…pare-mode option

This commit introduces a new configuration option `scan.incremental.snapshot.string-key.compare-mode`
to fix chunk splitting and binlog event routing issues when MySQL collation differs from Java's
natural String ordering.

Problem:
- Java String.compareTo() is case-sensitive (it compares UTF-16 code unit values).
- MySQL collations like utf8mb4_general_ci are case-insensitive.
- This mismatch causes chunk boundaries computed by Java to diverge from actual MySQL row ordering,
  leading to premature unbounded chunks, overlapping splits, or lost binlog events.

Solution:
- Introduce ChunkKeyCompareMode enum: DEFAULT, CASE_INSENSITIVE, BINARY.
- DEFAULT: preserves existing behavior (String.compareTo()).
- CASE_INSENSITIVE: uses String.compareToIgnoreCase() for Java-side comparisons.
- BINARY: injects BINARY keyword in SQL predicates and uses byte-level comparison in Java.

Changes cover all three API layers:
- DataStream API (MySqlSourceBuilder)
- Flink SQL (MySqlTableSourceFactory)
- Pipeline YAML (MySqlDataSourceFactory)

Also updates documentation (EN + ZH) and adds test coverage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ziyanTOP force-pushed the flink-cdc-string-key-compare-mode branch from 9e4bea6 to 9a43277 on May 25, 2026, 10:29