[FLINK-39777] Support configurable HashFunction strategies in PrePartitionOperator by haruki-830 · Pull Request #4423 · apache/flink-cdc

haruki-830 · 2026-06-03T01:45:26Z

Summary

This PR adds a configurable partitioning strategy to Flink CDC pipelines. Users can switch to table-id hashing for small tables via YAML configuration, avoiding unnecessary primary-key hashing overhead.

Key Changes

New HashFunctionStrategy Enum

Introduced HashFunctionStrategy enum with two options: PRIMARY_KEY (hash by TableId + primary keys) and TABLE_ID (hash by TableId only).
Designed with @PublicEvolving annotation, allowing future strategies like ROUND_ROBIN or COLUMNS.
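
A minimal sketch of how the two strategies differ, not the code from this PR. The names `HashFunctionStrategySketch` and `StrategyHashSketch` are hypothetical, the table id is simplified to a string, and the `@PublicEvolving` annotation is omitted so the snippet compiles without Flink on the classpath.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the HashFunctionStrategy enum described above.
enum HashFunctionStrategySketch {
    PRIMARY_KEY, // hash by table id + primary key values
    TABLE_ID     // hash by table id only
}

final class StrategyHashSketch {
    private StrategyHashSketch() {}

    // Returns a non-negative hash for an event under the chosen strategy.
    static int hash(HashFunctionStrategySketch strategy, String tableId, List<?> primaryKeyValues) {
        switch (strategy) {
            case TABLE_ID:
                // Primary key values are ignored: every event of a table hashes alike.
                return tableId.hashCode() & 0x7FFFFFFF;
            case PRIMARY_KEY:
                // Table id and key values both contribute to the hash.
                return Objects.hash(tableId, primaryKeyValues) & 0x7FFFFFFF;
            default:
                throw new IllegalArgumentException("Unsupported strategy: " + strategy);
        }
    }
}
```

Under `TABLE_ID`, two rows of the same table with different keys produce the same hash, which is what lets all events of a table land on one subtask.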

New TableIdHashFunctionProvider

Added TableIdHashFunctionProvider, which computes the hash based solely on TableId.
Uses a singleton HashFunction, since it is stateless.
Suitable for small tables, or for tables with no primary key or a changing primary key.
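
The idea can be sketched roughly as follows. This is an illustration under assumptions, not the provider added in this PR: `SketchTableId`, `SketchEvent`, and `TableIdHashSketch` are hypothetical stand-ins, and choosing the subtask as `hash % parallelism` is an assumption about how the operator consumes the hash.

```java
import java.util.Objects;
import java.util.function.Function;

// Hypothetical stand-in for a table identifier; namespace and schema may be null.
final class SketchTableId {
    final String namespace;
    final String schemaName;
    final String tableName;

    SketchTableId(String namespace, String schemaName, String tableName) {
        this.namespace = namespace;
        this.schemaName = schemaName;
        this.tableName = Objects.requireNonNull(tableName);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SketchTableId)) return false;
        SketchTableId other = (SketchTableId) o;
        return Objects.equals(namespace, other.namespace)
                && Objects.equals(schemaName, other.schemaName)
                && tableName.equals(other.tableName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(namespace, schemaName, tableName);
    }
}

// Hypothetical stand-in for a data change event: a table id plus row values.
final class SketchEvent {
    final SketchTableId tableId;
    final Object[] row;

    SketchEvent(SketchTableId tableId, Object... row) {
        this.tableId = tableId;
        this.row = row;
    }
}

// Stateless, so a single shared instance is enough.
final class TableIdHashSketch implements Function<SketchEvent, Integer> {
    static final TableIdHashSketch INSTANCE = new TableIdHashSketch();

    private TableIdHashSketch() {}

    @Override
    public Integer apply(SketchEvent event) {
        // Only the table id contributes; row contents are ignored.
        return event.tableId.hashCode() & 0x7FFFFFFF;
    }

    // Assumed routing rule: subtask index = hash modulo downstream parallelism.
    static int subtaskFor(SketchEvent event, int parallelism) {
        return INSTANCE.apply(event) % parallelism;
    }
}
```

Because row values never enter the hash, a table without a primary key, or one whose key changes, still routes consistently.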

Pipeline Configuration Option

Added pipeline option "partitioning.strategy" to allow switching strategies via YAML.
When the option is unset, behavior is identical to the current behavior, ensuring full backward compatibility.
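
As a rough illustration of the fallback, the sketch below resolves the strategy from a plain map of pipeline options. It does not use the Flink CDC `ConfigOption` API. The class names are hypothetical, the `PRIMARY_KEY` default is an assumption inferred from the backward-compatibility note above, and the case-insensitive parsing is an illustrative choice.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical mirror of the two strategies named in this PR.
enum PartitioningStrategySketch {
    PRIMARY_KEY,
    TABLE_ID
}

final class PartitioningOptionSketch {
    static final String KEY = "partitioning.strategy";

    private PartitioningOptionSketch() {}

    // Resolves the strategy; falls back to PRIMARY_KEY when the key is absent or blank.
    static PartitioningStrategySketch resolve(Map<String, String> pipelineOptions) {
        String raw = pipelineOptions.get(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return PartitioningStrategySketch.PRIMARY_KEY; // assumed default
        }
        // valueOf throws IllegalArgumentException for unknown names.
        return PartitioningStrategySketch.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }
}
```

Failing fast on an unknown value surfaces a typo in the YAML at job submission rather than silently using the default.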

Comprehensive Testing

Added TableIdHashFunctionProviderTest with 7 tests.
Added PrePartitionOperatorTest case verifying TABLE_ID strategy.
Added FlinkPipelineComposerTest configurations for PRIMARY_KEY and TABLE_ID strategy validation.

Configuration Example

Route same-table events to a single subtask

pipeline:
  name: my-cdc-job
  partitioning.strategy: TABLE_ID

Force load-balanced distribution by primary keys

pipeline:
  name: my-cdc-job
  partitioning.strategy: PRIMARY_KEY

JIRA Reference

https://issues.apache.org/jira/browse/FLINK-39777

haruki-830 · 2026-06-03T03:41:02Z

@lvyanquan could you please help review this PR?

lvyanquan

Thanks @haruki-830 for the feature, left some comments.

lvyanquan · 2026-06-03T05:34:06Z

                            "The timeout time for SchemaOperator to wait downstream SchemaChangeEvent applying finished, the default value is 3 minutes.");

+    public static final ConfigOption<HashFunctionStrategy> PIPELINE_PARTITIONING_STRATEGY =
+            ConfigOptions.key("partitioning.strategy")


Please update the document to guide users on how to use it.

lvyanquan · 2026-06-03T06:34:36Z

Please also update https://github.com/apache/flink-cdc/blob/master/flink-cdc-cli/src/test/resources/definitions/pipeline-definition-full.yaml#L56 / https://github.com/apache/flink-cdc/blob/master/flink-cdc-cli/src/test/resources/definitions/pipeline-definition-full-with-repsym.yaml#L59 and related tests.

lvyanquan · 2026-06-03T12:25:18Z

+            Optional.ofNullable(tableId.getNamespace()).ifPresent(objectsToHash::add);
+            Optional.ofNullable(tableId.getSchemaName()).ifPresent(objectsToHash::add);
+            objectsToHash.add(tableId.getTableName());
+            this.cachedHash = (Objects.hash(objectsToHash.toArray()) * 31) & 0x7FFFFFFF;


Using TableId#hashCode directly instead of Objects.hash(objectsToHash.toArray()) would be simpler.
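
To make the suggestion concrete, here is a sketch that puts the two variants side by side. `ReviewTableId` is a hypothetical stand-in whose `hashCode` is assumed to cover namespace, schema name, and table name. The real `TableId#hashCode` may be implemented differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for TableId with value-based equals/hashCode.
final class ReviewTableId {
    final String namespace;
    final String schemaName;
    final String tableName;

    ReviewTableId(String namespace, String schemaName, String tableName) {
        this.namespace = namespace;
        this.schemaName = schemaName;
        this.tableName = tableName;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ReviewTableId)) return false;
        ReviewTableId other = (ReviewTableId) o;
        return Objects.equals(namespace, other.namespace)
                && Objects.equals(schemaName, other.schemaName)
                && Objects.equals(tableName, other.tableName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(namespace, schemaName, tableName);
    }
}

final class TableHashVariants {
    private TableHashVariants() {}

    // Variant from the diff: collect the non-null parts, then hash the array.
    static int listBased(ReviewTableId id) {
        List<Object> objectsToHash = new ArrayList<>();
        Optional.ofNullable(id.namespace).ifPresent(objectsToHash::add);
        Optional.ofNullable(id.schemaName).ifPresent(objectsToHash::add);
        objectsToHash.add(id.tableName);
        return (Objects.hash(objectsToHash.toArray()) * 31) & 0x7FFFFFFF;
    }

    // Suggested variant: rely on the id's own hashCode.
    static int direct(ReviewTableId id) {
        return id.hashCode() & 0x7FFFFFFF;
    }
}
```

Both variants are deterministic for equal ids and non-negative thanks to the `0x7FFFFFFF` mask. They generally produce different numeric values, which is harmless for a strategy that has not shipped yet.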

[FLINK-39777] Support configurable HashFunction strategies in PrePartitionOperator

75c60f7

github-actions Bot added composer common runtime labels Jun 3, 2026

haruki-830 marked this pull request as ready for review June 3, 2026 02:17

lvyanquan reviewed Jun 3, 2026

Address review comments

dd95108

github-actions Bot added the cli label Jun 3, 2026

lvyanquan reviewed Jun 3, 2026

haruki-830 wants to merge 2 commits into apache:master from haruki-830:FLINK-39777
