
[FLINK-32001][table] Row-level update should support returning partial columns #22525

Merged · 5 commits · May 19, 2023

Conversation

@luoyuxia (Contributor) commented May 5, 2023

What is the purpose of the change

To make row-level update support returning partial columns. Without this PR, an ArrayIndexOutOfBoundsException may be thrown in the ConstraintEnforcer operator, as reported in FLINK-32001.

Brief change log

  • While converting the update and delete RelNodes, record the required physical columns.
  • Use the recorded required physical columns for row-level update and delete to prune the physical fields in CommonExecSink, so that ConstraintEnforcer won't check columns that don't exist among the columns to be written.
  • Modify the logic of TestUpdateDeleteTableFactory so that it can delete or update rows with partial columns.
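The pruning idea described above can be sketched in plain Java. This is an illustrative sketch only; `ColumnPruneSketch` and `pruneByIndices` are hypothetical names, not Flink API — the real change prunes the sink's physical row type inside CommonExecSink.

```java
import java.util.Arrays;

public class ColumnPruneSketch {
    // Keep only the fields at the required physical column indices, so a
    // downstream check (like ConstraintEnforcer) never touches columns that
    // the connector does not write.
    static Object[] pruneByIndices(Object[] row, int[] requiredIndices) {
        Object[] pruned = new Object[requiredIndices.length];
        for (int i = 0; i < requiredIndices.length; i++) {
            pruned[i] = row[requiredIndices[i]];
        }
        return pruned;
    }

    public static void main(String[] args) {
        Object[] fullRow = {1, "bob", 3.14}; // columns a, b, c
        int[] required = {0, 2};             // the connector only needs a and c
        System.out.println(Arrays.toString(pruneByIndices(fullRow, required)));
    }
}
```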

Verifying this change

Added tests in UpdateTableITCase#testPartialUpdate and DeleteTableITCase#testRowLevelDeleteWithPartitionColumn.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot (Collaborator) commented May 5, 2023

CI report:

Bot commands — the @flinkbot bot supports the following commands:
  • @flinkbot run azure — re-run the last Azure build

@fsk119 (Member) left a comment

Thanks for your contribution. I left some comments.

Comment on lines 674 to 680

    if (sinkAbilitySpec instanceof RowLevelUpdateSpec) {
        RowLevelUpdateSpec rowLevelUpdateSpec = (RowLevelUpdateSpec) sinkAbilitySpec;
        return getPhysicalRowType(schema, rowLevelUpdateSpec.getRequireColumnIndices());
    } else if (sinkAbilitySpec instanceof RowLevelDeleteSpec) {
        RowLevelDeleteSpec rowLevelDeleteSpec = (RowLevelDeleteSpec) sinkAbilitySpec;
        return getPhysicalRowType(
                schema, rowLevelDeleteSpec.getRequiredPhysicalColumnIndices());
fsk119 (Member):
I just wonder whether it's better to introduce a method like Optional<RowType> getConsumedType for this sink ability spec. Actually, we already have getProducedType in the source ability spec.

luoyuxia (Contributor, author):
There are many places that use getProducedType in the source ability spec, but this seems to be the only place that would use something like Optional<RowType> getConsumedType. I think we can keep it local here to avoid over-designing too early, and consider exposing it in SinkAbilitySpec if we find it's needed in many places in the future.

fsk119 (Member):

Fine. We can modify this if we need it in the future.
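The deferred idea from this thread could look roughly like the following. This is a hypothetical sketch, not Flink code: the interface and class names are invented, and a plain String stands in for Flink's RowType.

```java
import java.util.Optional;

// Hypothetical: a default getConsumedType on the sink ability spec,
// mirroring getProducedType on the source ability spec side.
interface SinkAbilitySpecSketch {
    // Most specs place no constraint on the consumed row type.
    default Optional<String> getConsumedType() {
        return Optional.empty();
    }
}

class RowLevelDeleteSpecSketch implements SinkAbilitySpecSketch {
    @Override
    public Optional<String> getConsumedType() {
        // A row-level delete spec would report only its required columns.
        return Optional.of("ROW<a INT, c DOUBLE>");
    }
}
```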

*/
@JsonIgnoreProperties(ignoreUnknown = true)
@JsonTypeName("RowLevelDelete")
public class RowLevelDeleteSpec implements SinkAbilitySpec {
    public static final String FIELD_NAME_ROW_LEVEL_DELETE_MODE = "rowLevelDeleteMode";
    public static final String FIELD_NAME_REQUIRED_PHYSICAL_COLUMN_INDICES =
            "requiredPhysicalColumnIndices";
fsk119 (Member):

ProjectPushDownSpec also contains an array to mark the projection. I think requiredPhysicalColumn is enough.

luoyuxia (Contributor, author):

What do you mean by saying requiredPhysicalColumn is enough? As far as I can see, ProjectPushDownSpec also contains an array to mark the projection, so we can likewise make RowLevelDeleteSpec contain an array to mark the required physical columns.

fsk119 (Member):

Oh, I meant it's better to rename it to requiredPhysicalColumn to align with the naming convention. It's fine to use requiredPhysicalColumnIndices.

@@ -78,6 +87,11 @@ public SupportsRowLevelDelete.RowLevelDeleteMode getRowLevelDeleteMode() {
        return rowLevelDeleteMode;
    }

    @Nonnull
fsk119 (Member):

Remove the annotation.

luoyuxia (Contributor, author):

Why remove it? There's no harm in keeping it. I think we can keep the annotation, just like we do for the other methods.

fsk119 (Member):

Actually, the code style guide suggests it's not necessary to mark return values as not null. Besides, it gets very verbose if we mark every method's return type as not null [1].

[1] https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#nullability-of-the-mutable-parts

luoyuxia (Contributor, author):

Makes sense to me. I'll remove it.

    public int[] getRequiredPhysicalColumnIndices() {
        return requiredPhysicalColumnIndices;
    }

    @Override
    public boolean equals(Object o) {
fsk119 (Member):

Don't forget to modify the equals method.

Comment on lines 658 to 661

    int fieldIndex = sinkRowType.getFieldIndex(uniqueConstraint.getColumns().get(i));
    if (fieldIndex == -1) {
        return new int[0];
    }
fsk119 (Member):

I think we should try our best to validate all available pk columns.

@luoyuxia (Contributor, author) commented May 16, 2023:

Here, we do validate all available pk columns.
The logic tries to find all the pk columns in the ResolvedSchema; if any pk column can't be found in the ResolvedSchema, the pk is missing some columns, in which case we treat it as having no pk columns at all.
Note: this case only happens for update statements, where the required columns returned by the connector may not contain all pk columns.
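The all-or-nothing rule described above can be sketched in plain Java. This is an illustrative sketch only; `PrimaryKeySketch` and `pkIndicesOrEmpty` are hypothetical names, not the actual Flink code.

```java
import java.util.List;

public class PrimaryKeySketch {
    // Map every primary-key column to its index among the columns actually
    // written to the sink. If any pk column is missing, an incomplete pk is
    // no pk: return an empty array instead of a partial one.
    static int[] pkIndicesOrEmpty(List<String> pkColumns, List<String> sinkColumns) {
        int[] indices = new int[pkColumns.size()];
        for (int i = 0; i < pkColumns.size(); i++) {
            int idx = sinkColumns.indexOf(pkColumns.get(i));
            if (idx == -1) {
                return new int[0]; // a pk column is not written: treat as no pk
            }
            indices[i] = idx;
        }
        return indices;
    }
}
```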

fsk119 (Member):

I mean, why don't we return all available positions here?

    uniqueConstraint.getColumns().stream()
            .mapToInt(sinkRowType::getFieldIndex)
            .filter(i -> i != -1)
            .toArray();

I find the indices are used as the shuffle key; in the update statement we don't need a shuffle?

luoyuxia (Contributor, author):

We use primary keys as shuffle keys, but they are still primary keys, which may be used for other purposes in CommonExecSink. If not all primary key columns are available, we should treat it as having no primary key, which matches the semantics of a primary key.
Btw, I have moved this logic to BatchExecSink to make everything clear, since only the update statement, which is only supported in batch, needs to consider it.

@fsk119 (Member) left a comment

Thanks for your contribution. I left some comments.

@luoyuxia (Contributor, author)

@fsk119 Thanks for your review. Addressed your comments in 9b4b58f.

@luoyuxia (Contributor, author) commented May 17, 2023

@fsk119 Thanks for reviewing again. Addressed your comments in 19c680d.

@lincoln-lil (Contributor) left a comment

@luoyuxia thanks for fixing this! Overall looks good to me; I'd recommend adding some more cases to cover partition columns, metadata columns (both virtual and non-virtual), and computed columns.

@luoyuxia (Contributor, author)

@flinkbot run azure

@@ -138,7 +181,7 @@ public void testMixDelete() throws Exception {
    @Test
    public void testStatementSetContainDeleteAndInsert() throws Exception {
        tEnv().executeSql(
-               "CREATE TABLE t (a int, b string, c double) WITH"
+               "CREATE TABLE t (a int , b string, c double) WITH"
lincoln-lil (Contributor):
nit: rm extra space

@luoyuxia (Contributor, author)

> @luoyuxia thanks for fixing this! Overall looks good to me, and it is recommended to add some more cases to cover partition, metadata (both virtual and non-virtual) and computed columns.

Agreed. I created a separate Jira, FLINK-32117, to track it, as it seems like a separate issue to me. I expect to finish it in that Jira.

@luoyuxia (Contributor, author)

@lincoln-lil @fsk119 Thanks for your review. I have addressed your comments.

@lincoln-lil (Contributor) left a comment

@luoyuxia It seems e2e test 2 always fails; could you contact the release manager to see what's happening?

@luoyuxia (Contributor, author)

Others have also encountered this problem, as reported in FLINK-32121. The cause is explained in FLINK-32123.
