
[FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables #8549

Closed
wants to merge 9 commits

Conversation

dawidwys
Contributor

What is the purpose of the change

This is the next step in decoupling TableEnvironment from Calcite. It introduces registration of TableSource/Sink as CatalogTables.

Brief change log

This is based on #8521.

  • added org.apache.flink.table.catalog.ConnectorCatalogTable, which wraps a TableSource/Sink, and used it for registration in TableEnvironment (see the sketch after this list)
  • added org.apache.flink.table.operations.TableSourceTableOperation for reading from an inline TableSource. This is used only when creating a Table via TableEnvironment#fromTableSource
  • removed unnecessary code duplication from classes such as TableSourceSinkTable, TableSourceTable, TableSinkTable, etc.
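
For context, registration now roughly takes this shape. A minimal sketch: the ConnectorCatalogTable.source factory and Catalog#createTable follow this change, but the helper method wrapping them is illustrative, not code from the PR.

import org.apache.flink.table.catalog.Catalog;
import org.apache.flink.table.catalog.ConnectorCatalogTable;
import org.apache.flink.table.catalog.ObjectPath;
import org.apache.flink.table.sources.TableSource;

// Registers a TableSource by wrapping it in a ConnectorCatalogTable,
// instead of registering it as a Calcite table directly.
static <T> void registerSource(
        Catalog catalog, String database, String name, TableSource<T> source) throws Exception {
    ConnectorCatalogTable<T, ?> table =
        ConnectorCatalogTable.source(source, /* isBatch= */ false);
    catalog.createTable(new ObjectPath(database, name), table, /* ignoreIfExists= */ false);
}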

Verifying this change

  • This change is already covered by existing tests.
  • Added a test to verify TableEnvironment#fromTableSource behavior: org.apache.flink.table.runtime.stream.table.TableSourceITCase#testInlineCsvTableSource

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@dawidwys dawidwys requested a review from twalthr May 27, 2019 06:59
@flinkbot
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

TableOperations.

We do not store the DataStream & DataSet as Calcite's Tables anymore. We
treat them as inline operations. When converting from TableOperations to
RelNodes, we directly create a special kind of DataStreamScan/DataSetScan
that does not access the catalog.
Contributor

@twalthr twalthr left a comment

+1 for 7192f70

Contributor

@twalthr twalthr left a comment

Awesome work @dawidwys. The code reads like a novel :D

I only had minor comments for 1de257b.

* <ol>
* <li>{@code [current-catalog].[current-database].[tablePath]}</li>
* <li>{@code [current-catalog].[tablePath]}</li>
* <li>{@code [tablePath]}</li>
* </ol>
*
* @param tablePath table path to look for
* @return {@link CatalogTableOperation} containing both fully qualified table identifier and its
* {@link TableSchema}.
* @return {@link ResolvedTable} wrapping original table with additional iformation about table path and
Contributor

nit: information

private final TableSink<T2> tableSink;
private final boolean isBatch;

private static final String COMMENT = "A table sink or source backed table.";
Contributor

nit: baked

Contributor Author

I meant "to back": this Table is backed by a table sink or source.

Shall I maybe just remove the comment, the same way as we discussed here: #8521 (comment)

}

@Override
public CatalogBaseTable copy() {
Contributor

This violates the contract of the method. Is it actually required?

Contributor Author

Unfortunately it is. All methods in the Catalog (add, get, alter etc.) call this method. I also don't like it.
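
For reference, the override in question is essentially the following. A sketch, assuming the wrapped source/sink is immutable so that returning the same instance is safe; the exact body in the PR may differ.

@Override
public CatalogBaseTable copy() {
    // Knowingly bends the copy() contract: the wrapped TableSource/TableSink
    // cannot be deep-copied, so the same instance is returned.
    return this;
}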

tableSource,
!connectorTable.isBatch(),
FlinkStatistic.UNKNOWN()))
.orElseThrow(() -> new TableException("Querying sink only table unsupported."));
Contributor

rephrase: "Catalog table does only support sink operations."?

@@ -425,20 +424,128 @@ abstract class TableEnvImpl(
"Only tables that belong to this TableEnvironment can be registered.")
}

checkValidTableName(name)
Contributor

Should we at least add some checks for non-empty strings that do not consist only of whitespace? Right now registerTableInternal fails in the argument checks of ObjectPath.
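
A check along the suggested lines might look like this. A sketch only: the method name matches the call site above, but the exception type (org.apache.flink.table.api.ValidationException) and message are assumptions.

private static void checkValidTableName(String name) {
    // Fail fast with a descriptive error instead of letting ObjectPath's
    // argument checks reject the name later.
    if (name == null || name.trim().isEmpty()) {
        throw new ValidationException(
            "A table name must not be null or consist only of whitespace.");
    }
}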

val selectedFields: Option[Array[Int]])
extends TableScan(cluster, traitSet, table) {


Contributor

nit: empty line

@@ -156,17 +156,6 @@ public void testTableRegister() throws Exception {
compareResultAsText(results, expected);
}

@Test(expected = TableException.class)
public void testIllegalName() throws Exception {
Contributor

As mentioned earlier, let's keep this test but modify it with other illegal names.
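
A possible shape for the modified test. Hypothetical: someTable stands in for any Table created elsewhere in the test, and the expected exception depends on the validation that gets added.

@Test(expected = ValidationException.class)
public void testIllegalName() throws Exception {
    // A name consisting only of whitespace should be rejected before
    // it ever reaches ObjectPath's argument checks.
    tableEnv.registerTable("   ", someTable);
}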

FlinkStatistic.UNKNOWN());

CatalogReader catalogReader = (CatalogReader) relBuilder.getRelOptSchema();
String refId = Integer.toString(System.identityHashCode(tableSourceTable.getTableSource()));
Contributor

Can you add an explanation of what we are doing here? Btw, should we use only the refId, or should we prefix it for readability, e.g. unregistered_456789?
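
The prefixed variant would be a one-line change (a sketch of the suggestion above, not the PR's final code):

// Identify the unregistered, inline TableSource by its identity hash code,
// prefixed so the id is recognizable in generated plans.
String refId = "unregistered_" + System.identityHashCode(tableSourceTable.getTableSource());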

/**
* A {@link CatalogTable} that wraps a {@link TableSource} and/or {@link TableSink}.
* This allows registering those in a {@link Catalog}. It can not be persisted as the
* source and/or sink might be inline implementations and not be representable in a
Member

A bit confused by the "might be" here.

Are some ConnectorCatalogTables inline implementations, while others are not and can be converted to properties? The exception thrown in toProperties() indicates that no ConnectorCatalogTable can be converted to properties.

Contributor Author

It says the TableSource might be an inline implementation. If you use TableEnvironment#connect, then theoretically the table source is property-serializable. The other possibility is to use a TableSource explicitly: TableEnvironment#fromTableSource.

The ConnectorCatalogTable is therefore always treated as inline, as we don't know which case we are handling.
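
Which is why toProperties() can reject the conversion unconditionally. A sketch; the exact exception message is illustrative (returns java.util.Map):

@Override
public Map<String, String> toProperties() {
    // A ConnectorCatalogTable always wraps inline source/sink instances,
    // so it can never be serialized to a property map.
    throw new UnsupportedOperationException(
        "ConnectorCatalogTable cannot be converted to properties.");
}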

public class ConnectorCatalogTable<T1, T2> extends AbstractCatalogTable {
private final TableSource<T1> tableSource;
private final TableSink<T2> tableSink;
private final boolean isBatch;
Member

Just want to double-check: unlike a persistent catalog table that can be both batch and streaming, a ConnectorCatalogTable can only be either batch or streaming, right?

Contributor

In the long term this property will not be necessary anymore. However, it is required by how the table environments (batch and streaming ones) currently work.

* {@link TableSchema} to the {@link org.apache.flink.api.common.typeutils.CompositeType}.
*/
@Internal
public class DataSetTableOperation<E> extends TableOperation {
Contributor

javadoc for <E>?

@Internal
public class DataSetTableOperation<E> extends TableOperation {

private final DataSet<E> dataStream;
Contributor

Is there a special reason why we name the variable dataStream instead of dataSet?

* {@link TableSchema} to the {@link org.apache.flink.api.common.typeutils.CompositeType}.
*/
@Internal
public class DataStreamTableOperation<E> extends TableOperation {
Contributor

javadoc for <E>?

Contributor

@twalthr twalthr left a comment

Thanks for the update @dawidwys. +1 from my side.

@dawidwys dawidwys closed this in 88c7d82 Jun 4, 2019