
[FLINK-25171] Validation of duplicate fields in ddl sql #18017

Closed
wants to merge 13 commits

Conversation


@jelly-1203 jelly-1203 commented Dec 6, 2021

What is the purpose of the change

The purpose of this pull request is to add validation that the fields of a derived table are not duplicated.

Brief change log

  • Added verification that the fields of a derived table are not duplicated

Verifying this change

  • Added tests to verify the exception is thrown in the correct scenarios
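To make the intent concrete, here is a minimal, hypothetical sketch of this kind of check (the class and method names are invented for illustration, not the actual Flink internals): registering a column fails if the name was already seen.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Flink's actual code: reject a duplicate
// column name while collecting the fields declared in a DDL statement.
public class DuplicateColumnCheck {

    static void addColumn(Map<String, String> fieldTypes, String name, String type) {
        // Map.put returns the previous value mapped to the key, or null.
        String oldType = fieldTypes.put(name, type);
        if (oldType != null) {
            throw new IllegalArgumentException(
                    String.format("A column named '%s' already exists in the table.", name));
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new HashMap<>();
        addColumn(fields, "id", "INT");
        addColumn(fields, "name", "STRING");
        try {
            addColumn(fields, "name", "VARCHAR(32)"); // duplicate: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```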

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@jelly-1203 jelly-1203 changed the title [FLINK-25171] Added validation for duplicate fields in derived tables [FLINK-25171] Validation of duplicate fields in derived tables Dec 6, 2021
@flinkbot
Collaborator

flinkbot commented Dec 6, 2021

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@flinkbot
Collaborator

flinkbot commented Dec 6, 2021

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 99e0300 (Mon Dec 06 07:16:08 UTC 2021)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!
  • This pull request references an unassigned Jira ticket. According to the code contribution guide, tickets need to be assigned before starting with the implementation work.

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@wenlong88
Contributor

Thanks for the contribution, @jelly-1203, you may need to add some tests for the change.
By the way, you could clean up the description of the PR. The template is guidance to help you fill in the information that is important for reviewers.

@jelly-1203
Author


@wenlong88 Thanks for your comments. I will add tests for this change later and tidy up the description of this PR.

@jelly-1203 jelly-1203 closed this Dec 7, 2021
@jelly-1203 jelly-1203 reopened this Dec 7, 2021
@jelly-1203
Author


@wenlong88 Hello, I have added tests for this change and modified the description of this PR. Please review it again.

@xuyangzhong
Contributor

Hi, thanks for your contribution, @jelly-1203. It seems there are code-style violations in your new code; you can see them in the CI report above. You can follow this doc to set up automatic code formatting in IDEA: https://nightlies.apache.org/flink/flink-docs-master/docs/flinkdev/ide_setup/#code-formatting

@jelly-1203
Author

jelly-1203 commented Dec 8, 2021


Hi @xuyangzhong @wenlong88, thanks for your comments. I have adjusted the code style, but compilation still does not pass. Looking at the error message, I found that org.apache.flink.table.planner.plan.stream.sql.UnionTest executes an SQL statement with repeated fields, so the compilation failed. Could you tell me whether I need to open an issue and modify that SQL?

The code causing the error:

util.tableEnv.executeSql(
  s"""
     |CREATE TABLE t1 (
     |  id int,
     |  ts bigint,
     |  name string,
     |  timestamp_col timestamp(3),
     |  val bigint,
     |  name varchar(32),
     |  timestamp_ltz_col as TO_TIMESTAMP_LTZ(ts, 3),
     |  watermark for timestamp_col as timestamp_col
     |) WITH (
     |  'connector' = 'values',
     |  'bounded' = 'false'
     |)
   """.stripMargin)

@xuyangzhong
Contributor

xuyangzhong commented Dec 9, 2021

Hi, @jelly-1203. You can rename one of the "name" columns for convenience. You don't need to mention the issue here because Git records this change; if someone is confused by the modification, they can find your issue. So just modify it directly, and code review will verify the correctness of the modification. BTW, you may also need to update the generated plan in UnionTest.xml.

@jelly-1203
Author


Hi, @xuyangzhong, thanks for your comment. I will modify it and update the matching logical plan in UnionTest.xml.

Contributor

@xuyangzhong xuyangzhong left a comment


I have left some comments following.

@@ -40,10 +40,9 @@ class UnionTest extends TableTestBase {
|CREATE TABLE t1 (
| id int,
| ts bigint,
| name string,
| name varchar(32),
Contributor


If you want to delete one of them, I think it's better to keep the one that has the same type as the other tables.

if (oldType != null) {
throw new ValidationException(
String.format(
"A column named '%s' already exists in the derived table.",
Contributor


The exception message is confusing (where is the derived table?), because users may only use "create table ..." instead of "create table ... like ...". The previous message has the same problem. IMO you can delete the word 'derived'.

@jelly-1203 jelly-1203 changed the title [FLINK-25171] Validation of duplicate fields in derived tables [FLINK-25171] Validation of duplicate fields in ddl sql Dec 9, 2021
@jelly-1203
Author


Hi @xuyangzhong, thanks for your advice. I will make the adjustments as soon as possible.

Contributor

@wenlong88 wenlong88 left a comment


Thanks for the contribution, I left a comment. I don't think the validation is currently added in the right place.

@@ -494,7 +494,13 @@ private void collectPhysicalFieldsTypes(List<SqlNode> derivedColumns) {
boolean nullable = type.getNullable() == null ? true : type.getNullable();
RelDataType relType = type.deriveType(sqlValidator, nullable);
// add field name and field type to physical field list
physicalFieldNamesToTypes.put(name, relType);
RelDataType oldType = physicalFieldNamesToTypes.put(name, relType);
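The hunk above relies on the value returned by Map.put, which is the previous mapping for the key (or null if there was none), so a single call can both register the field and detect a duplicate. A minimal standalone illustration of that idiom:

```java
import java.util.HashMap;
import java.util.Map;

// Map.put returns the previous value for the key, or null if the key
// was absent, so a non-null return signals a duplicate field name.
public class PutReturnsPrevious {
    public static void main(String[] args) {
        Map<String, String> types = new HashMap<>();
        String first = types.put("ts", "BIGINT");     // null: no previous mapping
        String second = types.put("ts", "TIMESTAMP"); // "BIGINT": duplicate key
        System.out.println(first + " / " + second);   // prints "null / BIGINT"
    }
}
```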
Contributor


I think it is not enough to add the check here: when there are name conflicts involving computed columns or metadata columns, the check here would not work well. You can try to add the validation in appendDerivedColumns.

Author



Hi, @wenlong88 Thanks for your advice; I think it is very meaningful. I will verify it first and make adjustments if there is any problem.

Author



Hi, @wenlong88 Thanks for your advice, which is of great help to me. I found several problems during testing:

  1. Duplicate columns are overwritten when the computed column comes first and the regular column comes last.
  2. With a metadata column first, a computed column in the middle, and a regular column last, if the metadata column has the same name as the regular column, then while generating the computed column the accessibleFieldNamesToTypes.putAll call overwrites the repeated fields.

I have adjusted the code accordingly and added tests for the adjustment. Please help to review it.

if (!result.isEmpty()) {
throw new ValidationException(
"A field name conflict exists between a field of the regular type and a field of the Metadata type.");
}
Contributor


Maybe we can just check for duplication when putting the new Column into columns, at the end of this function?

Author



hi @wenlong88
Thanks for your review and comment. I do not think duplication can be checked when putting the new Column into columns at the end of this function, for the following reasons:

  1. If computed or metadata columns use the overwriting merge strategy, duplicate fields of the same type are allowed.
  2. When adding physical columns and metadata columns to accessibleFieldNamesToTypes, if the metadata column comes first and duplicates a physical column, putAll makes the metadata column overwrite the duplicate physical column, which can result in a generated computed column that is not as expected.
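The putAll behavior described in point 2 can be reproduced with a plain HashMap: duplicate keys are overwritten silently, with no error raised.

```java
import java.util.HashMap;
import java.util.Map;

// putAll silently overwrites duplicate keys: a metadata column with the
// same name replaces the physical column without any error.
public class PutAllOverwrites {
    public static void main(String[] args) {
        Map<String, String> accessible = new HashMap<>();
        accessible.put("name", "STRING");        // physical column

        Map<String, String> metadata = new HashMap<>();
        metadata.put("name", "VARCHAR(32)");     // metadata column, same name

        accessible.putAll(metadata);             // no error, value replaced
        System.out.println(accessible.get("name")); // prints "VARCHAR(32)"
    }
}
```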

Author



hi @wenlong88, do you think my view is correct?

Contributor


hi, @jelly-1203 thanks for the update and analysis. I agree that it is not possible to add a unified check at the end, but I still think the check here can be improved a bit:

  1. it seems that the check here is not relevant to the newly added computed column; if you want to check the duplication, it is better to add it when updating metadataFieldNamesToTypes.
  2. according to the current implementation, regular columns have top priority (we call collectPhysicalFieldsTypes at the beginning), so we may also need to check whether there is a duplicated name in physicalFieldNamesToTypes when trying to add a metadata column or computed column. If we add such a check, the check in 1 is not necessary any more.

what do you think?
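A rough sketch of suggestion 2 above, with invented names (not the actual Flink code): because regular columns are collected first, a metadata or computed column whose name already appears among the physical fields can be rejected at the point where it is appended.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the suggestion above (names are illustrative,
// not the actual Flink internals): regular columns are collected first,
// so a metadata or computed column whose name already exists among the
// physical fields is rejected when it is appended.
public class AppendDerivedColumnsSketch {

    private final Map<String, String> physicalFieldNamesToTypes = new HashMap<>();

    void addPhysicalColumn(String name, String type) {
        physicalFieldNamesToTypes.put(name, type);
    }

    void appendDerivedColumn(String name, String type) {
        if (physicalFieldNamesToTypes.containsKey(name)) {
            throw new IllegalArgumentException(
                    String.format(
                            "A column named '%s' already exists in the regular columns.", name));
        }
        // ... otherwise register the metadata/computed column ...
    }

    public static void main(String[] args) {
        AppendDerivedColumnsSketch sketch = new AppendDerivedColumnsSketch();
        sketch.addPhysicalColumn("two", "STRING");
        try {
            sketch.appendDerivedColumn("two", "TIMESTAMP(3)"); // conflicts with regular column
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```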

Author



hi, @wenlong88 I think your suggestion is reasonable; I will try to modify it.

… columns to put a new column into columns at the end of function.
regularColumn("four", DataTypes.STRING()));

thrown.expect(ValidationException.class);
thrown.expectMessage(
Contributor


in this case, I think it would be better to throw an error stating that there is a duplicate column named 'two' between the metadata columns and the regular columns?

Author


Ok, I'll make this error message more explicit.


@jelly-1203
Author

Hi, @wenlong88 I have adjusted the position of the validation logic and made the error message clearer. Could you please help to review it and see what needs to be improved? Thank you

@jelly-1203
Author

Hi, @wenlong88 Could you please review it and see what needs to be improved?

@wenlong88
Contributor

LGTM, cc @godfreyhe to do the final check

@jelly-1203
Author

Hi, @godfreyhe
Please find time to do the final check. If there is any deficiency, I will continue to improve it.

@jelly-1203
Author

Hi @wenlong88, could you please ping @godfreyhe again? There has been no progress on this issue.

@jelly-1203
Author

anyone?

@wenlong88
Contributor

@jelly-1203 thanks for following up; I will ping @godfreyhe offline to follow up on the PR.

Contributor

@godfreyhe godfreyhe left a comment


Sorry for the late response, LGTM, I will merge it

@godfreyhe godfreyhe closed this in 34de398 Jan 25, 2022
godfreyhe pushed a commit that referenced this pull request Jan 25, 2022
niklassemmler pushed a commit to niklassemmler/flink that referenced this pull request Feb 3, 2022