Skip to content

Conversation

@JingsongLi
Copy link
Contributor

@JingsongLi JingsongLi commented Feb 10, 2020

What is the purpose of the change

Discussion in: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-TableFactory-td36647.html

Vote in: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Improve-TableFactory-to-add-Context-td37211.html

Motivation:
Now the main needs and problems are:

Connector can't get TableConfig[1], and some behaviors really need to be
controlled by the user's table configuration. In the era of catalog, we
can't put these config in connector properties, which is too inconvenient.
A context class also allows for future modifications without touching the TableFactory interface again.

Brief change log

  1. Interface modification for table factories
  2. Apply new interface in blink planner.

Verifying this change

SinkTest.testTableSourceSinkFactory

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Copy link
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 86c4939 (Mon Feb 10 08:03:11 UTC 2020)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Copy link
Collaborator

flinkbot commented Feb 10, 2020

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

CatalogTable getTable();

/**
* @return readable config of this table environment.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a description that the configuration gives the factory instance the ability to access TableConfig#getConfiguration() which holds the current TableEnvironment session configurations.

* @param context context of this table sink.
* @return the configured table sink.
*/
default TableSink<T> createTableSink(Context context) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we deprecate the other createTableSink and createTableSource interfaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know... we can deprecate createTableSink(ObjectPath, CatalogTable). CC: @twalthr

return Optional.empty();
}

/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make sure TableSinkFactory#createTableSink(context) is the only entry to be invoked by the planner. However, the above createTableSinkForCatalogTable method still calling the old createTableSink method in old planner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll push another commit to Support create table source/sink by context in legacy planner

def testTableSourceSinkFactory(): Unit = {
val factory = new TestContextTableFactory
util.tableEnv.getConfig.getConfiguration.setBoolean(factory.needContain, true)
util.tableEnv.registerCatalog("cat", new GenericInMemoryCatalog("default") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only guarantees one of the paths, i.e. via TableFactoryUtil#createTableSinkForCatalogTable, which is only used by Hive. However, the another path is not covered, i.e. via CatalogSourceTable#findAndCreateTableSource which is used by most users.

Maybe you can upgrade the existing TestCollectionTableFactory to use the new interface and expose a session configuration, e.g. collection.is-bounded, the flag will be accessed via Context#getConfiguration and pass to the created CollectionTableSource.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right, but I want to modify this step by step.
The previous method createTableSink(ObjectPath tablePath, CatalogTable table) is only work in createTableSinkForCatalogTable too.
after these commits looks good to you, I will create following commit to fix this.
What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But without a test to cover the new interface, we have no idea whether it works as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? Is this not a test cover?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned above, this test doesn't cover another path, i.e. CatalogSourceTable#findAndCreateTableSource.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll modify these things by multi commits.

@JingsongLi
Copy link
Contributor Author

JingsongLi commented Feb 11, 2020

We can wait this one: #11055 , to be more cleaner.

@JingsongLi JingsongLi force-pushed the tablefactory branch 5 times, most recently from 8c7c673 to b1f4018 Compare February 15, 2020 13:38
Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JingsongLi . It looks good in general. I left some minor comments.

*/
@Override
default TableSource<T> createTableSource(Context context) {
return createStreamTableSource(context);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should validate the return value of createStreamTableSource is not null. If is null, we should throw an exception to indicate users to implement createTableSource(context), because createStreamTableSource(Map) is default implemented now, and users may not override it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indicate users to implement createStreamTableSource(context)?

* @param context context of this table source.
* @return the configured table source.
*/
default StreamTableSource<T> createStreamTableSource(Context context) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somebody implements StreamTableSourceFactory looks like need this method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But he can still use the TableSource<T> createTableSource(context) interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if user use TableSource, there is no need to use StreamTableSourceFactory..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. StreamTableSourceFactory should be deprecated. It is confused why there is a StreamTableSourceFactory and TableSourceFactory. But for compatibility, users can still use StreamTableSourceFactory but indicate users to use TableSource<T> createTableSource(context).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will remove this method.

@JingsongLi JingsongLi force-pushed the tablefactory branch 2 times, most recently from 2db8738 to daf7fda Compare February 18, 2020 13:49
Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Will merge it after tests passed in my branch.

@wuchong wuchong closed this in f6895da Feb 22, 2020
@JingsongLi JingsongLi deleted the tablefactory branch April 26, 2020 05:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants