[FLINK-39232] Loosen transformed schema merging validation check#4315
[FLINK-39232] Loosen transformed schema merging validation check#4315yuxiqian merged 1 commit intoapache:masterfrom
Conversation
|
@leonardBang @lvyanquan Would you like to take a look? |
There was a problem hiding this comment.
Pull request overview
This PR simplifies the transform module by enforcing strict first-match semantics: only the first matching transform rule (by source-table selector) is applied per table, eliminating the previous behavior where multiple rules with filters could cascade. This removes the need for schema merging validation across multiple matching rules.
Changes:
PostTransformOperatornow selects a singleOptional<PostTransformer>perTableId(with a GuavaLoadingCache), removing the multi-rule iteration andSchemaMergingUtils.strictlyMergeSchemasusage.SchemaMergingUtils.strictlyMergeSchemasand related helper methods are deleted as they are no longer needed.- Tests are updated: multi-rule dispatch tests are either removed or rewritten to use
CASE WHENexpressions within a single rule.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pom.xml | Removes <scope>test</scope> from commons-codec (unrelated change) |
| PostTransformOperator.java | Core logic change: single-transformer first-match with LoadingCache |
| SchemaMergingUtils.java | Removes strictlyMergeSchemas and strictlyMergeDataTypes methods |
| PostTransformOperatorTest.java | Removes/rewrites multi-rule tests to single-rule CASE WHEN |
| FlinkPipelineTransformITCase.java | Rewrites multi-dispatch tests, adds fallback rule tests |
| FlinkPipelineComposerITCase.java | Removes testTransformTwice test |
| FlinkPipelineComposerLenientITCase.java | Removes testTransformTwice test |
| FlinkPipelineBatchComposerITCase.java | Removes testTransformTwiceInBatchMode test |
| TransformE2eITCase.java | Rewrites multi-rule E2E tests to single CASE WHEN rules |
| SchemaEvolvingTransformE2eITCase.java | Rewrites multi-rule to single CASE WHEN rule |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -366,7 +366,6 @@ limitations under the License. | |||
| <groupId>commons-codec</groupId> | |||
| <artifactId>commons-codec</artifactId> | |||
| <version>1.15</version> | |||
There was a problem hiding this comment.
I believe it's used in Calcite because java.lang.NoClassDefFoundError: org/apache/commons/codec/language/Soundex pops out after the modification.
leonardBang
left a comment
There was a problem hiding this comment.
Thanks @yuxiqian for the improvement, +1
This closes FLINK-39232.
Currently, transform module supports a bizarre feature: one may define multiple transform rules like this:
Data rows in
mydb.web_orderthat matchesPREDICATEwill be handled in rule #1, while others should be handled in rule #2.Such behavior also requires partially applied rules must emit compatible schemas, sometimes unwanted. A common idiom would be like this:
This isn't possible currently, since records in
foo.barthat fall through the filter branch will be merged with the second transform rule, forcing their schemas to be identical.