[FLINK-39749][mysql-cdc] Support configurable string chunk key comparison mode to align with MySQL collation #4413
Open
ziyanTOP wants to merge 1 commit into
1c15142 to 9e4bea6
…pare-mode option

This commit introduces a new configuration option, `scan.incremental.snapshot.string-key.compare-mode`, to fix chunk splitting and binlog event routing issues when the MySQL collation differs from Java's natural String ordering.

Problem:
- Java `String.compareTo()` is case-sensitive (it compares UTF-16 `char` values lexicographically).
- MySQL collations such as `utf8mb4_general_ci` are case-insensitive.
- This mismatch makes chunk boundaries computed in Java diverge from the actual MySQL row order, leading to premature unbounded chunks, overlapping splits, or lost binlog events.

Solution:
- Introduce a `ChunkKeyCompareMode` enum with three values: `DEFAULT`, `CASE_INSENSITIVE`, `BINARY`.
- `DEFAULT`: preserves the existing behavior (`String.compareTo()`).
- `CASE_INSENSITIVE`: uses `String.compareToIgnoreCase()` for Java-side comparisons.
- `BINARY`: injects the `BINARY` keyword into SQL predicates and uses byte-level comparison in Java.

Changes cover all three API layers:
- DataStream API (`MySqlSourceBuilder`)
- Flink SQL (`MySqlTableSourceFactory`)
- Pipeline YAML (`MySqlDataSourceFactory`)

Also updates the documentation (EN + ZH) and adds test coverage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
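For illustration only, the three modes can be modeled as Java comparators plus a predicate renderer. This is a minimal sketch, not the PR's code: `ChunkKeyCompareModeSketch`, `Mode`, `comparator()`, `predicate()` and `compareUtf8Bytes()` are hypothetical names, and the `BINARY` branch assumes a `utf8mb4` column, so that MySQL's byte-wise comparison corresponds to an unsigned comparison of the UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

/** Hypothetical sketch of the three compare modes; not the PR's ChunkKeyCompareMode class. */
public class ChunkKeyCompareModeSketch {

    public enum Mode {
        DEFAULT, CASE_INSENSITIVE, BINARY;

        /** Java-side comparator used when comparing string chunk keys. */
        public Comparator<String> comparator() {
            switch (this) {
                case CASE_INSENSITIVE:
                    return String::compareToIgnoreCase;
                case BINARY:
                    // Assumes a utf8mb4 column: compare the UTF-8 encodings byte by byte.
                    return ChunkKeyCompareModeSketch::compareUtf8Bytes;
                default:
                    // Existing behavior: lexicographic order of UTF-16 char values.
                    return String::compareTo;
            }
        }

        /** Renders a range predicate; BINARY mode forces a byte-wise comparison in MySQL. */
        public String predicate(String column, String operator) {
            String lhs = this == BINARY ? "BINARY " + column : column;
            return lhs + " " + operator + " ?";
        }
    }

    /** Unsigned lexicographic comparison of the UTF-8 encodings of a and b. */
    public static int compareUtf8Bytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            // Mask to 0..255 so bytes >= 0x80 are not treated as negative.
            int cmp = Integer.compare(x[i] & 0xFF, y[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        // A proper prefix sorts first.
        return Integer.compare(x.length, y.length);
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values()) {
            System.out.println(m + ": sign(compare(\"apple\", \"Banana\")) = "
                    + Integer.signum(m.comparator().compare("apple", "Banana"))
                    + ", predicate = " + m.predicate("id", ">="));
        }
    }
}
```

Running `main` shows `DEFAULT` and `BINARY` placing `"apple"` after `"Banana"`, while `CASE_INSENSITIVE` places it before, and only `BINARY` prefixes the column with `BINARY` in the rendered predicate. One consequence of the byte-level approach is that unsigned UTF-8 byte order matches Unicode code point order, so it can disagree with `String.compareTo()` for characters outside the BMP: `compareUtf8Bytes("\uFF61", "\uD83D\uDE00")` is negative, while `String.compareTo()` returns a positive value for the same pair.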
9e4bea6 to 9a43277
What is the purpose of the change
Fix chunk splitting and binlog event routing issues when MySQL collation differs from Java's natural String ordering.
Java `String.compareTo()` is case-sensitive (it compares UTF-16 `char` values lexicographically), while MySQL collations such as `utf8mb4_general_ci` are case-insensitive. This mismatch causes chunk boundaries computed in Java to diverge from the actual MySQL row order, leading to premature unbounded chunks, overlapping splits, or lost binlog events.
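The routing failure can be reproduced in a few lines of Java. The sketch below is hypothetical (`CollationMismatchDemo`, `UPPER_BOUNDS` and `route()` are not connector APIs) and assumes a simplified routing rule: a key belongs to the first chunk whose exclusive upper bound is greater than it, and keys above every bound fall into the final, unbounded chunk. The bounds `"Banana"` and `"Date"` are taken in `utf8mb4_general_ci` order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of Java-side routing against MySQL-ordered chunk bounds. */
public class CollationMismatchDemo {

    // Chunk upper bounds as MySQL (utf8mb4_general_ci) orders the rows:
    // chunk 0 = (-inf, "Banana"), chunk 1 = ["Banana", "Date"), chunk 2 = ["Date", +inf)
    static final List<String> UPPER_BOUNDS = Arrays.asList("Banana", "Date");

    /** Returns the index of the first chunk whose exclusive upper bound exceeds key. */
    static int route(String key, Comparator<String> cmp) {
        for (int i = 0; i < UPPER_BOUNDS.size(); i++) {
            if (cmp.compare(key, UPPER_BOUNDS.get(i)) < 0) {
                return i;
            }
        }
        return UPPER_BOUNDS.size(); // the last chunk has no upper bound
    }

    public static void main(String[] args) {
        Comparator<String> javaOrder = String::compareTo;
        Comparator<String> caseInsensitive = String::compareToIgnoreCase;
        for (String key : Arrays.asList("apple", "cherry")) {
            System.out.println(key + ": compareTo -> chunk " + route(key, javaOrder)
                    + ", compareToIgnoreCase -> chunk " + route(key, caseInsensitive));
        }
    }
}
```

With `String.compareTo()`, both `apple` and `cherry` are routed to chunk 2, the unbounded tail chunk, because uppercase ASCII letters sort before all lowercase ones. With `String.compareToIgnoreCase()`, `apple` goes to chunk 0 and `cherry` to chunk 1, matching where a case-insensitive MySQL collation places those rows.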
See FLINK-39749 for details.
Brief change log
Verifying this change
This change is already covered by existing tests (`MySqlTableSourceFactoryTest`) and has been verified in a production Pipeline job (MySQL -> Paimon) with a `CHAR(36)` UUID primary key and `utf8mb4_general_ci` collation.
Does this pull request potentially affect one of the following parts
Documentation