Skip to content

Conversation

@luoyuxia
Copy link
Contributor

@luoyuxia luoyuxia commented Jan 17, 2023

What is the purpose of the change

To make planner supports row-level update.

Brief change log

  • Convert the SqUpdate to LogicalTableModify, and then rewrite the LogicalTableModify to a RelNode that insert into the sink with rows

    • If the update mode is ALL_ROWS, assuming the filter is a, convert the update expression to IF(a, xx, yyy).
    • Get the required columns by calling method SupportsRowLevelUpdate#applyRowLevelUpdate. We may rewrite the
      LogicaTableScan to add extra meta cols to the underlying table. Then create a Project node that only select the required
      columns.
  • If update mode is UPDATED_ROWS, add a operator to set the RowKind to RowKind#UPDATE_AFTER in CommonExecSink

  • Move the method SqlToOperationConverterTest#prepareTable to a new class to avoid SqlToOperationConverterTest is too long to fail to pass check-style.

Verifying this change

  • The test in SqlToOperationConverterTest#testUpdate to verify update statement will be converted to SinkModifyOperation.
  • The test in RowLevelDeleteUpdate to verify the plan for row-level update.
  • The IT case in UpdateTableITCase to verify update can be performed normally.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Copy link
Collaborator

flinkbot commented Jan 17, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@luoyuxia luoyuxia force-pushed the FLINK-30665 branch 5 times, most recently from a4db6c0 to a3f1c61 Compare January 28, 2023 09:03
@luoyuxia
Copy link
Contributor Author

@lincoln-lil The pr is ready to review. Could you please help review when you're free?

@luoyuxia luoyuxia force-pushed the FLINK-30665 branch 2 times, most recently from b25d2be to 3dd9fa1 Compare January 28, 2023 12:16
… a new separate SqlToOperationConverterTestUtils to avoid SqlToOperationConverterTest is too long to make check-style fail
Copy link
Contributor

@lincoln-lil lincoln-lil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luoyuxia thanks for contributing this! Looks good to me overall, I've left some comments.

import java.util.Map;

/** Utils for {@link org.apache.flink.table.planner.operations.SqlToOperationConverterTest} . */
public class SqlToOperationConverterTestUtils {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the SqlToOperationConverterTest has exceeded the maximum row limit, we might as well split it by sql type, such as ddl, dml and other command, extract a SqlToOperationConverterTestBase to keep all the helper methods, and split the test into three pieces: SqlDDLToOperationConverterTest, SqlDMLToOperationConverterTest, and SqlCommandToOperationConverterTest , WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good suggestion.

RelRoot updateRelational = flinkPlanner.rel(sqlUpdate);
// get target sink table
LogicalTableModify tableModify = (LogicalTableModify) updateRelational.rel;
List<String> targetTablePath = tableModify.getTable().getQualifiedName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use UnresolvedIdentifier directly to keep consistent with the delete operation


/**
* A sub-class of {@link SinkAbilitySpec} that can not only serialize/deserialize the row-level
* update mode & updated columns to/from JSON, but also can update existing data for {@link
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'update mode & updated columns' -> 'update mode & columns'

List<Column> updatedColumns = getUpdatedColumns(tableModify, resolvedSchema);
SupportsRowLevelUpdate.RowLevelUpdateInfo updateInfo =
supportsRowLevelUpdate.applyRowLevelUpdate(
updatedColumns, RowLevelModificationContextUtils.getScanContext());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a local variable for RowLevelModificationContextUtils.getScanContext()

.withDescription("The name for the required columns in row-level update");

private static final ConfigOption<Boolean> ONLY_REQUIRE_UPDATED_COLUMNS_FOR_UPDATE =
ConfigOptions.key("only_require_updated_columns_for_update")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'only_require_updated_columns_for_update' -> 'only-require-updated-columns-for-update'

.booleanType()
.defaultValue(false)
.withDescription(
"Whether to only require the updated columns for update statement. ");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add additional comments: ', require all columns by default'

}

@Test
public void testUpdateWithOnlyRequireUpdatedCols() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a new case which not contains all columns in update clause, e.g., a source table with more columns here 'TABLE t (f0, f1,..., a int, b string, c double, ...)'

@luoyuxia
Copy link
Contributor Author

@lincoln-lil Thanks for reviewing. I append two commits to fix your comments. The first one is to address the comments for row-level update, the second one is to split the test SqlToOperationConverterTest.

…three different test classes to avoid it exceed the maximum row limit
Copy link
Contributor

@lincoln-lil lincoln-lil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luoyuxia thanks for your update! I have a minor comment for the test splitting: although there is no strict guideline on how to classify various sql commands, it is clearly inappropriate to classify commands such as show, load, use, etc. into ddl or dml, should we adjust it to refer to mysql's definition of ddl, dml, and put the remaining into sqlOtherOperation (probably called sqlCommand is also easy to cause ambiguity).
Btw, since 'begin statement set' is used to assist multiple dml statements, we can also put it into dml test, WDYT?

@luoyuxia
Copy link
Contributor Author

luoyuxia commented Jan 30, 2023

@luoyuxia thanks for your update! I have a minor comment for the test splitting: although there is no strict guideline on how to classify various sql commands, it is clearly inappropriate to classify commands such as show, load, use, etc. into ddl or dml, should we adjust it to refer to mysql's definition of ddl, dml, and put the remaining into sqlOtherOperation (probably called sqlCommand is also easy to cause ambiguity).

Agree to it. The reason I put show/use into ddl is from the Hive's language manual ddl wiki. But now, I think the hive's is not correct and suitable for all systems. So i like the idea that put them into sqlOtherOperation.

Btw, since 'begin statement set' is used to assist multiple dml statements, we can also put it into dml test, WDYT?

Agree to it

@luoyuxia
Copy link
Contributor Author

@lincoln-lil Thanks for reviewing. Now I move the test for the statements that neither belong to DDL nor DML to SqlOtherOperationConverterTest

@lincoln-lil
Copy link
Contributor

@flinkbot run azure

lincoln-lil pushed a commit to lincoln-lil/flink that referenced this pull request Jan 30, 2023
Copy link
Contributor

@lincoln-lil lincoln-lil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

chucheng92 pushed a commit to chucheng92/flink that referenced this pull request Feb 3, 2023
akkinenivijay pushed a commit to krisnaru/flink that referenced this pull request Feb 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants