
Conversation

dawidwys (Contributor)

What is the purpose of the change

This PR adds methods for interacting with temporary objects as described in FLIP-64:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
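
For illustration, a minimal sketch of the kind of API FLIP-64 describes (method names follow the FLIP; assume a permanent table cat1.db1.orders already exists, and details may differ from the merged code):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class TemporaryObjectsSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().build());

        // Derive a Table and register it as a temporary view. The view lives
        // in the namespace of the current catalog/database but is not
        // physically stored in any catalog.
        Table bigOrders = tEnv.from("cat1.db1.orders").filter("amount > 100");
        tEnv.createTemporaryView("bigOrders", bigOrders);

        // Temporary objects can be listed and must be dropped explicitly
        // through the dedicated temporary-object methods.
        String[] temporaryTables = tEnv.listTemporaryTables();
        boolean dropped = tEnv.dropTemporaryView("bigOrders");
    }
}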

Verifying this change

This change added tests.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

@dawidwys dawidwys changed the title [FLINK-14490] Add methods for interacting with temporary objects [FLINK-14490][table] Add methods for interacting with temporary objects Oct 22, 2019
@flinkbot (Collaborator) commented Oct 22, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 87124ec (Wed Dec 04 14:50:47 UTC 2019)

Warnings:

  • 2 pom.xml files were touched: Check for build and licensing issues.
  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot (Collaborator) commented Oct 22, 2019

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build

@twalthr (Contributor) left a comment

+1 for 3424926

@twalthr (Contributor) left a comment

Some feedback for eab35b0.

" 'connector' = 'kafka', \n" +
" 'kafka.topic' = 'log.test'\n" +
")\n";
final FlinkPlannerImpl planner = getPlannerBySqlDialect(SqlDialect.DEFAULT);
Contributor

Is the planner by dialect still necessary if we also create a parser by dialect?

Contributor Author

It uses the dialect to create a parser for view expansion (though we do not support view expansion yet).

@twalthr (Contributor) left a comment

Some comments for b4030de

@twalthr (Contributor) left a comment

Minor comments for 4e9ee4e.


class ParserImpl(
catalogManager: CatalogManager,
validatorProvider: () => FlinkPlannerImpl,
Contributor

plannerProvider

Contributor Author

In the context of the parser it serves only as a validator. We do not use the toRel conversion part. Ideally it would be best to further split the FlinkPlanner class into subclasses, but I did not have enough time.

@twalthr (Contributor) left a comment

Thanks @dawidwys. 4122516 looks great; I added rather minor comments.

// wrapper contains only sink (not source)
ConnectorCatalogTable sourceAndSink = ConnectorCatalogTable
.sourceAndSink(sourceSinkTable.getTableSource().get(), tableSink, !IS_STREAM_TABLE);
catalogManager.alterTable(sourceAndSink, getTemporaryObjectIdentifier(name), false);
Contributor

alterTable is not used anymore, right?

Contributor Author

Correct, alterTable works with permanent tables. We do not have an API for that.

Catalog catalog = tEnv.getCatalog(tEnv.getCurrentCatalog()).orElse(null);
assertNotNull(catalog);
catalog.createTable(
ObjectPath.fromString("default_database.T1"),
Contributor

Side comment: we should get rid of this method; it doesn't even support identifier escaping.

@twalthr (Contributor) left a comment

We should improve the docs a bit further in 7d6721c.

@twalthr (Contributor) left a comment

One little comment for 90bf886

@twalthr (Contributor) left a comment

some suggestions for 64d8ca6

@twalthr (Contributor) left a comment

Some feedback for 754bf46.

@twalthr (Contributor) commented Oct 23, 2019

Thanks for this important API refactoring. The code changes look mostly good but we should definitely further improve the JavaDocs to avoid any confusion for our existing and future users. We should also update the website documentation and add good release notes to JIRA when closing this issue. This change will annoy existing users. We need to communicate the reasons for these changes.

@dawidwys (Contributor Author)

Thank you @bowenli86 @twalthr @zjuwangg for the reviews.

I did a first pass over your comments. I still need to address a few remaining ones.

@dawidwys dawidwys force-pushed the flip-64-2 branch 2 times, most recently from ce946ec to 0fbaa98 Compare October 25, 2019 13:59
@dawidwys (Contributor Author)

Hey @bowenli86, could you have another pass over the PR? I think @twalthr will not be able to do it any time soon. I tried to address all of his comments as well as yours and @zjuwangg's.

private Optional<TableLookupResult> getPermanentTable(ObjectIdentifier objectIdentifier)
throws TableNotExistException {
Catalog currentCatalog = catalogs.get(objectIdentifier.getCatalogName());
ObjectPath objectPath = new ObjectPath(
Member

this call can be saved by calling objectIdentifier.toObjectPath()
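
For illustration, the two constructions are equivalent; a small sketch (identifier values are made up):

import org.apache.flink.table.catalog.ObjectIdentifier;
import org.apache.flink.table.catalog.ObjectPath;

public class ToObjectPathSketch {
    public static void main(String[] args) {
        ObjectIdentifier id = ObjectIdentifier.of("cat1", "db1", "t1");
        // Manual construction, as in the snippet above:
        ObjectPath manual = new ObjectPath(id.getDatabaseName(), id.getObjectName());
        // Suggested shortcut:
        ObjectPath viaHelper = id.toObjectPath();
        System.out.println(manual.equals(viaHelper)); // prints: true
    }
}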

* <p>The field names of the {@link Table} are automatically derived
* from the type of the {@link DataSet}.
*
* <p>The view is registered in the current catalog and database. To register the view in
Member

This comment can be a bit misleading, as the view is actually registered only in the namespace of the current catalog and db, but isn't physically located in the catalog/db, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change it to: "The view is registered in the namespace of the current catalog and database."

return doDropTemporaryTable(identifier, (table) -> table instanceof CatalogView);
}

private boolean doDropTemporaryTable(
Member

The API naming of "doXxx()" feels a bit weird. Rename to dropTemporaryTableInternal()?

* does not exist.
*/
public void dropTable(ObjectIdentifier objectIdentifier, boolean ignoreIfNotExists) {
if (temporaryTables.containsKey(objectIdentifier)) {
@bowenli86 (Member) Oct 25, 2019

This behavior feels a bit abnormal to me. Why can't users drop a catalog table when a temp table with the same name exists?

Table and temp table creation and deletion should be independent of each other; e.g., from a SQL perspective, users should be able to run "CREATE/DROP TABLE" and "CREATE/DROP TEMP TABLE" independently.

Contributor

Just curious about this, I took a look at Spark:

   * If a database is specified in `name`, this will drop the table from that database.
   * If no database is specified, this will first attempt to drop a temporary view with
   * the same name, then, if that does not exist, drop the table from the current database.

Curious about other databases, I don't know how they behave.

Contributor Author

Spark does not have 3-part qualified temporary objects. We followed MySQL on that topic.

This behavior is introduced to prevent unintentional drops of temporary tables, which might have very costly undesirable effects. Users are usually accustomed to DROP TABLE, but not necessarily to DROP TEMPORARY TABLE. If a user issues a DROP TABLE with the intent of dropping the temporary table, it will actually result in a drop of the permanent one.

Dropping tables is a very invasive operation. I tried to design the behaviour in a way that forces the user to make a really conscious decision about which table should be dropped.

BTW, as a side note, this behaviour is described in FLIP-64. There were no objections to that behaviour during the discussion.
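
To make the described behaviour concrete, a rough sketch (assuming a permanent table T already exists in the current catalog/database; the failure on the generic drop is an illustration of the intent, not the exact exception):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class DropSemanticsSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().build());

        // Register a temporary view that shadows the permanent table `T`.
        Table view = tEnv.from("T").filter("amount > 100");
        tEnv.createTemporaryView("T", view);

        // While the shadow exists, a generic drop of `T` is rejected, so the
        // permanent table cannot be removed by accident:
        // tEnv.sqlUpdate("DROP TABLE T"); // would fail (illustrative)

        // The temporary object has to be dropped consciously first:
        tEnv.dropTemporaryView("T");
    }
}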

@bowenli86 (Member)

Hi Dawid, I was only able to review the last five commits, as I don't really know much about the planner.

This PR, as well as several individual commits, is quite huge, and that makes it really hard to review and follow. It would be nice to break such a large amount of work down into several smaller PRs next time.

It would be good to have your reviews if you have time: @KurtYoung @JingsongLi @xuefuz

@KurtYoung (Contributor) left a comment

Some minor comments for 988bd85

@KurtYoung (Contributor) left a comment

one question for b426a97

case e: CSqlParseException =>
throw new SqlParserException(s"SQL parse failed. ${e.getMessage}", e)
}
val sqlNode: SqlNode = parser.parse(queryString)
Contributor

Is it intended to re-use the parser here? I noticed in PlannerBase::parser you pointed out that you didn't want to reuse the parser since the config might change.

Contributor Author

Yes, because the whole FlinkPlannerImpl is used only once per planning session.

@JingsongLi (Contributor) left a comment

Thanks Dawid, left some comments.

.toJava(scanInternal(Array(name)).map(t => new TableReferenceExpression(name, t)))
.toJava(
Try({
val unresolvedIdentifier = UnresolvedIdentifier.of(name)
Contributor

Why use Try? If it is an illegal name, should we throw an exception?

Contributor Author

This gives positive results when resolving an IN expression, but it will be invoked for all UnresolvedExpressions; that's why it might not necessarily be a table identifier. The contract of the ExpressionResolver is that it actually never throws exceptions.

As a side note, the way TableIdentifier is currently implemented is actually wrong. This method is primarily for resolving SELECT * FROM t WHERE f IN (tableName), which is actually an incorrect SQL statement; it is not allowed to use identifiers in an IN clause. Therefore, once we introduce the Java Expression DSL, we should get rid of this catalog lookup for a TableReference.

Contributor

Got it. You can put these comments into the code.

val unresolvedIdentifier = UnresolvedIdentifier.of(parser.parseIdentifier(path).names: _*)
scanInternal(unresolvedIdentifier) match {
case Some(table) => createTable(table)
case None => throw new TableException(s"Table '$path' was not found.")
Contributor

Can we have a unified exception message? Use unresolvedIdentifier?

Map<String, String> properties = new HashMap<>(toProperties());
schemaProperties.keySet().forEach(properties::remove);

CatalogTableImpl catalogTable = new CatalogTableImpl(
Contributor

CatalogTable has partitionKeys too; consider adding the partition keys in CatalogTableImpl.toProperties and parsing them back here?
(We can add this later too.)

@dawidwys (Contributor Author) Oct 28, 2019

Yes, I agree we should do that. I think we can do it in a follow-up? If I understand correctly, it will still use the ProjectableTableSource if we do not expose the partitions through the CatalogTable, right?

WDYT?

Contributor

OK.
After https://issues.apache.org/jira/browse/FLINK-14381, we removed partition support for temporary tables to clean up PartitionableTableSource/Sink.
I created a JIRA to track it: https://issues.apache.org/jira/browse/FLINK-14543; it will be fixed in 1.10.
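
For illustration, the follow-up could serialize the partition keys next to the schema, roughly like this sketch (helper names assume DescriptorProperties' API; treat this as an assumption, not the final fix):

import java.util.List;
import java.util.Map;

import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.descriptors.DescriptorProperties;

public class PartitionKeysSketch {
    // Serialize schema and partition keys together so both survive the
    // round-trip through the string properties of a CatalogTable.
    public static Map<String, String> toProperties(TableSchema schema, List<String> partitionKeys) {
        DescriptorProperties descriptorProperties = new DescriptorProperties();
        descriptorProperties.putTableSchema("schema", schema);
        descriptorProperties.putPartitionKeys(partitionKeys);
        return descriptorProperties.asMap();
    }
}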

Map<String, String> properties = new HashMap<>(toProperties());
schemaProperties.keySet().forEach(properties::remove);

CatalogTableImpl catalogTable = new CatalogTableImpl(
Contributor

Maybe we can use CatalogTableBuilder here?
Looks like there are some bugs in CatalogTableBuilder; it does not remove schemaProperties from properties?

Contributor Author

We could use the CatalogTableBuilder, yes; I wasn't aware of the class. I am not sure, though, if we need the CatalogTableBuilder class in the long run. I think the functionality actually duplicates the one in ConnectTableDescriptor.


@KurtYoung (Contributor) left a comment

comment for 77880fc

@KurtYoung (Contributor) left a comment

Some comments for the Parser interface: dfc8f07

@dawidwys (Contributor Author)

I tried to address all your comments @KurtYoung @JingsongLi @bowenli86

Could you have another look?

@KurtYoung (Contributor) left a comment

Thanks @dawidwys for this big effort; it indeed cleaned up lots of APIs. The changes LGTM now, and +1 to merge.

@dawidwys (Contributor Author)

Thanks for the reviews. Merging.

@dawidwys dawidwys merged commit fe966d4 into apache:master Oct 29, 2019