Skip to content

Conversation

@dianfu
Copy link
Contributor

@dianfu dianfu commented Feb 27, 2019

What is the purpose of the change

This pull request ported external catalog to flink-table-common

Brief change log

  • Ported the following classes to flink-table-common: ExternalCatalog, ExternalCatalogTable, CatalogNotExistException, TableNotExistException
  • Remove method ExternalCatalogTable.builder()

Verifying this change

This change is a code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@flinkbot
Copy link
Collaborator

flinkbot commented Feb 27, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ✅ 1. The [description] looks good.
  • ✅ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ✅ 4. The change fits into the overall [architecture].
  • ✅ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@dianfu
Copy link
Contributor Author

dianfu commented Feb 27, 2019

@twalthr Have created a PR to port external catalogs to flink-table-common. Could you help to take a look when it's convenient for you? Thanks in advance. :)

Copy link
Member

@bowenli86 bowenli86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this PR!

Just share my thought on the annotation. I don't think we should add annotations to existing catalog related classes, given 1) they are not annotated before, 2) they will soon be replaced by new catalog APIs

* <p>It provides information about catalogs, databases and tables such as names, schema,
* statistics, and access information.
*/
@PublicEvolving
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove annotation

*
* <p>Use {@link ExternalCatalogTableBuilder} to integrate with the normalized descriptor-based API.
*/
@PublicEvolving
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove annotation

@dianfu dianfu force-pushed the port_ExternalCatalog branch from 3dcfe1f to 7d51f47 Compare February 28, 2019 04:45
@dianfu
Copy link
Contributor Author

dianfu commented Feb 28, 2019

@bowenli86 Thanks a lot for the suggestions. I'm fine to remove the annotation of
ExternalCatalog and ExternalCatalogTable if they will be replaced with new Catalog API as these classes may be refactored or removed at that time. Regarding to the annotation itself, I think it's needed regardless whether it exists before. Porting classes is a good opportunity to correct the annotation which may be not marked correctly before. @twalthr What's your thought?

@sunjincheng121
Copy link
Member

@flinkbot approve description
@flinkbot approve consensus

Should be rebased after all commits of FLINK-11449 be merged.

@dianfu
Copy link
Contributor Author

dianfu commented Mar 1, 2019

@sunjincheng121 Thanks a lot for the review. I will rebase the PR after
all the commits of FLINK-11449 merged. :)

@dianfu dianfu force-pushed the port_ExternalCatalog branch from 7d51f47 to 423f83a Compare March 14, 2019 08:28
@dianfu
Copy link
Contributor Author

dianfu commented Mar 15, 2019

@sunjincheng121 @twalthr I have rebased the PR since FLINK-11449 has been merged. Could you help to take a look at this PR? Thanks in advance.

@dianfu dianfu force-pushed the port_ExternalCatalog branch from f966ea9 to 683b836 Compare March 15, 2019 13:32
@twalthr
Copy link
Contributor

twalthr commented Mar 15, 2019

@bowenli86 thanks for your suggestion. I agree that the catalog API will change in the near future but this doesn't mean that we can't annotate it. The annotation allows users to know what is intended to be used or not. The current catalog APIs are still intended to be used. The Evolving tells people that it might change again. Additionally, the annotation tells contributors when to think twice about changing a class.

@bowenli86
Copy link
Member

@twalthr re: The current catalog APIs are still intended to be used. What my team is thinking of rolling out unified catalog APIs is to build them separately and not touch existing ExternalCatalog APIs, and, upon it having full capabilities of existing catalog APIs, we can just do a switch. That's how we did it internally, and of course we can discuss if you have a different rollout plan. Anyway, the above was the background I had in mind then. With that, I was concerned that japicmp may report errors when changing a class's annotation from PublicEvolving to Deprecated, thus I did some testing locally and it seems to be fine.

I think it's ok to mark them as PublicEvolving. It's a bit hard to say as everything is moving around so fast in Flink table/SQL and our planned timeline.

@twalthr
Copy link
Contributor

twalthr commented Mar 18, 2019

@bowenli86 Having a completely new interface for the unified catalog APIs and performing the switch later is completely fine. This is also what Xuefu and I had discussed offline. As far as I know, japicmp is not in place for the Table & SQL API.

Copy link
Contributor

@twalthr twalthr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @dianfu. I added a couple of comments. My biggest concern right now is that this PR does much more than just porting external catalog interfaces but also changes rowtime descriptors and field computer interfaces. If we would skip the ExternalCatalogBuilder as discussed initially we could have avoided porting a lot of classes that might change very soon anyways due to the unified catalog API.

* <p>The following example shows how to read from a connector using a JSON format and
* declaring it as a table source:
*
* <p>{@code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use <pre>{@code …}</pre> instead. See https://reflectoring.io/howto-format-code-snippets-in-javadoc/.

// add cast to requested type and convert expression to RexNode
// TODO we cast to planner expressions as a temporary solution to keep the old interfaces
val rexExpression = Cast(expression.asInstanceOf[PlannerExpression], resultType)
val rexExpression = ApiExpressionUtils
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave a TODO here as it reminds us that this is still not the final solution.

</dependency>

<dependency>
<groupId>org.apache.flink</groupId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove dependencies to flink-table-api-java-bridge or flink-table-planner already now? I think the elasticsearch module should be able to work with flink-table-common classes only, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still depends on flink-table-api-java-bridge and flink-table-planner as Elasticsearch6UpsertTableSinkFactory implements StreamTableSinkFactory.

* @param extractor The {@link TimestampExtractor} to extract the rowtime attribute
* from the physical type.
*/
public Rowtime timestampsFromExtractor(TimestampExtractor extractor) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the original logic. Before we were only serializing as the last option. Serializing is dangerous as we might not be able to restore the base64 string. Please reintroduce the original logic.

* Sets a custom watermark strategy to be used for the rowtime attribute.
*/
public Rowtime watermarksFromStrategy(WatermarkStrategy strategy) {
internalProperties.putString(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the original logic. Before we were only serializing as the last option. Serializing is dangerous as we might not be able to restore the base64 string. Please reintroduce the original logic.

private final DescriptorProperties internalProperties = new DescriptorProperties(true);

/**
* These constants will be removed once RowtimeValidator is ported to flink-table-common.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it too much effort to port the RowtimeValidator now?

* Specifies the origin of the previously defined field. The origin field is defined by a
* connector or format.
*
* <p>E.g. field("myString", Types.STRING).from("CSV_MY_STRING")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use {@code field(...} for all code exmaples in those methods.

* @param fieldAccesses Field access expressions for the argument fields.
* @return The expression to extract the timestamp from the {@link TableSource} return type.
*/
public abstract Expression getExpression(FieldReferenceExpression[] fieldAccesses);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this discussion with @sunjincheng121 before that this breaks the current interfaces and also changes the serialization format of serialized timestamp extractors. I'm not happy with this changes but I also see that we need progress here.

In any case, I would like to get such breaking changes communicated through the description of the PR.

If we would simply postpone the ExternalCatalogBuilder porting, most of the work of this PR would not have been necessary.

"50YWJsZS5zb3VyY2VzLndtc3RyYXRlZ2llcy5QdW5jdHVhdGVkV2F0ZXJtYXJrQXNzaWduZXKBUc57oaWu9A" +
"IAAHhyAD1vcmcuYXBhY2hlLmZsaW5rLnRhYmxlLnNvdXJjZXMud21zdHJhdGVnaWVzLldhdGVybWFya1N0cm" +
"F0ZWd5mB_uSxDZ8-MCAAB4cA")
"F0ZWd57RqMYVyVWhUCAAB4cA")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not break tests in "porting" pull requests.

@dianfu
Copy link
Contributor Author

dianfu commented Mar 18, 2019

@twalthr Thanks a lot for the review. Considering that this change will break the API of FieldComputer and the serialization of Rowtime, I'd like to take the original solution and avoid the changes of ExternalCatalogBuilder. What do you think? I intended to avoid removing the method ExternalCatalogTable.builder() if possible (as done in the current PR), but it seems that removing it would be a nicer solution.

@dianfu dianfu force-pushed the port_ExternalCatalog branch from 683b836 to 1f5e941 Compare March 18, 2019 14:02
@twalthr
Copy link
Contributor

twalthr commented Mar 18, 2019

@dianfu thanks for your feedback. Yes, I think we should always aim for small and less invasive steps. The external catalog interfaces will be deprecated anyway soon. If we change the rowtime descriptor and field computer interfaces, we should do it as part of a big table source refactoring. The most important goal is to unblock Table and TableEnvironment refactoring tasks.

@dianfu
Copy link
Contributor Author

dianfu commented Mar 18, 2019

@twalthr Makes much sense to me. I have updated the PR and limit the changes to a minimal. Looking forward to your feedback. :)

Copy link
Contributor

@twalthr twalthr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @dianfu. The changes look good to me. Will merge...
@flinkbot approve all

@asfgit asfgit closed this in 1eb818b Mar 19, 2019
HuangZhenQiu pushed a commit to HuangZhenQiu/flink that referenced this pull request Mar 20, 2019
HuangZhenQiu pushed a commit to HuangZhenQiu/flink that referenced this pull request Apr 22, 2019
sunhaibotb pushed a commit to sunhaibotb/flink that referenced this pull request May 8, 2019
@dianfu dianfu deleted the port_ExternalCatalog branch June 10, 2020 02:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants