Skip to content

[FLINK-39232] Loosen transformed schema merging validation check#4315

Merged
yuxiqian merged 1 commit intoapache:masterfrom
yuxiqian:FLINK-39232
Mar 16, 2026
Merged

[FLINK-39232] Loosen transformed schema merging validation check#4315
yuxiqian merged 1 commit intoapache:masterfrom
yuxiqian:FLINK-39232

Conversation

@yuxiqian
Copy link
Member

@yuxiqian yuxiqian commented Mar 16, 2026

This closes FLINK-39232.

Currently, transform module supports a bizarre feature: one may define multiple transform rules like this:

transform:
  - source-table: foo.bar
    projection: \*, proj_1
    filter: PREDICATE
  - source-table: foo.bar
    projection: \*, proj_2

Data rows in mydb.web_order that matches PREDICATE will be handled in rule #1, while others should be handled in rule #2.

Such behavior also requires partially applied rules must emit compatible schemas, sometimes unwanted. A common idiom would be like this:

transform:
  - source-table: foo.bar
    filter: id > 11037
  - source-table: \.*.\.*
    projection: \*, 'fallback_branch' AS extras

This isn't possible currently, since records in foo.bar that fall through the filter branch will be merged with the second transform rule, forcing their schemas to be identical.

@yuxiqian
Copy link
Member Author

@leonardBang @lvyanquan Would you like to take a look?

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR simplifies the transform module by enforcing strict first-match semantics: only the first matching transform rule (by source-table selector) is applied per table, eliminating the previous behavior where multiple rules with filters could cascade. This removes the need for schema merging validation across multiple matching rules.

Changes:

  • PostTransformOperator now selects a single Optional<PostTransformer> per TableId (with a Guava LoadingCache), removing the multi-rule iteration and SchemaMergingUtils.strictlyMergeSchemas usage.
  • SchemaMergingUtils.strictlyMergeSchemas and related helper methods are deleted as they are no longer needed.
  • Tests are updated: multi-rule dispatch tests are either removed or rewritten to use CASE WHEN expressions within a single rule.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pom.xml Removes <scope>test</scope> from commons-codec (unrelated change)
PostTransformOperator.java Core logic change: single-transformer first-match with LoadingCache
SchemaMergingUtils.java Removes strictlyMergeSchemas and strictlyMergeDataTypes methods
PostTransformOperatorTest.java Removes/rewrites multi-rule tests to single-rule CASE WHEN
FlinkPipelineTransformITCase.java Rewrites multi-dispatch tests, adds fallback rule tests
FlinkPipelineComposerITCase.java Removes testTransformTwice test
FlinkPipelineComposerLenientITCase.java Removes testTransformTwice test
FlinkPipelineBatchComposerITCase.java Removes testTransformTwiceInBatchMode test
TransformE2eITCase.java Rewrites multi-rule E2E tests to single CASE WHEN rules
SchemaEvolvingTransformE2eITCase.java Rewrites multi-rule to single CASE WHEN rule

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -366,7 +366,6 @@ limitations under the License.
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
<version>1.15</version>
Copy link
Member Author

@yuxiqian yuxiqian Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's used in Calcite because java.lang.NoClassDefFoundError: org/apache/commons/codec/language/Soundex pops out after the modification.

Copy link
Contributor

@leonardBang leonardBang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yuxiqian for the improvement, +1

@yuxiqian yuxiqian merged commit 5e21883 into apache:master Mar 16, 2026
38 of 40 checks passed
@yuxiqian yuxiqian deleted the FLINK-39232 branch March 16, 2026 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants