
[FLINK-15479]Override explainSource method for JDBCTableSource #10769

Merged
8 commits merged on Jan 19, 2020

Conversation

wangxlong
Contributor

What is the purpose of the change

Override explainSource method for JDBCTableSource.

Brief change log

Add explainSource for JDBCTableSource.
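The point of the override is that after projection push-down the source should report its actual (possibly reduced) field names in the query plan. As a rough, self-contained sketch of the description format involved (the real code path goes through Flink's `TableConnectorUtils.generateRuntimeName`; the `describeSource` helper below is a hypothetical stand-in for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class ExplainSourceSketch {

    // Hypothetical stand-in for TableConnectorUtils.generateRuntimeName:
    // produces "ClassName(field1, field2, ...)".
    static String describeSource(Class<?> clazz, List<String> fieldNames) {
        return clazz.getSimpleName() + "(" + String.join(", ", fieldNames) + ")";
    }

    public static void main(String[] args) {
        // After projection push-down only the selected fields remain,
        // so the plan shows e.g. "ExplainSourceSketch(id, name)".
        System.out.println(describeSource(ExplainSourceSketch.class, Arrays.asList("id", "name")));
    }
}
```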

Verifying this change

This change is trivial work without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@flinkbot
Collaborator

flinkbot commented Jan 5, 2020

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 7fe9e7e (Sun Jan 05 14:45:13 UTC 2020)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!
  • This pull request references an unassigned Jira ticket. According to the code contribution guide, tickets need to be assigned before starting with the implementation work.

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Jan 5, 2020

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@wuchong
Member

wuchong commented Jan 6, 2020

@wangxlong could you add a test for JDBC projection push-down?

@godfreyhe
Contributor

I think we should fix the default implementation of explainSource in TableSource so that it returns the field names from getProducedDataType instead of getTableSchema. The table schema is a logical description of a table and should not be part of the physical TableSource, while the output of explainSource should change with optimization (e.g. projection push-down).

Contributor

@JingsongLi JingsongLi left a comment


Thanks @wangxlong. As @wuchong suggested, can you add tests to verify?

I think we should fix the default implementation of explainSource in TableSource

+1, but can we fix it in 1.11? It is a non-trivial change, so it is better to do it in 1.11; we can create a JIRA to discuss it further.

@wangxlong
Contributor Author

Thank you for your advice @wuchong @godfreyhe @JingsongLi. I will add a JDBC projection push-down test to verify. I am OK with fixing it in 1.11.
From my side, there are two aspects that need to be discussed:

1. How to fix it. I agree with @godfreyhe: we should fix the default implementation of explainSource in TableSource, like the following:

```java
List<String> fieldNames = ((RowType) getProducedDataType().getLogicalType()).getFieldNames();
return TableConnectorUtils.generateRuntimeName(getClass(), fieldNames.stream().toArray(String[]::new));
```

2. How to add a test to verify it. There is an existing test for JDBC push-down, JDBCTableSourceSinkFactoryTest#testJDBCWithFilter. The method name may not be ideal; we could rename it to testJDBCFieldsProjection. On top of that test, we can verify the source description like the following:

```java
List<String> fieldNames = ((RowType) actual.getProducedDataType().getLogicalType()).getFieldNames();
String expectedSourceDescription = actual.getClass().getSimpleName()
    + "(" + String.join(", ", fieldNames.stream().toArray(String[]::new)) + ")";
assertEquals(expectedSourceDescription, actual.explainSource());
```

Thank you all. I have also commented in the related JIRA. Looking forward to your reply.

@JingsongLi
Contributor

Hi @wangxlong, what I mean is:

  1. If we fix it in JDBCTableSource, it should go into 1.10 and 1.9.
  2. If we fix it in TableSource, it should go into 1.11.

Options 1 and 2 can be done separately. (We can do option 1 in 1.10 and 1.9, and option 2 in master.)

@wangxlong
Contributor Author

wangxlong commented Jan 6, 2020

Hi @wangxlong, what I mean is:

  1. If we fix it in JDBCTableSource, it should go into 1.10 and 1.9.
  2. If we fix it in TableSource, it should go into 1.11.

Options 1 and 2 can be done separately. (We can do option 1 in 1.10 and 1.9, and option 2 in master.)

@JingsongLi I am sorry, that was my misunderstanding. I have added a test.

Should I close this PR and open two pull requests against release-1.9 and release-1.10? Or we could also do option 1 on master; that should not conflict with option 2.

BTW, should I open a JIRA to discuss option 2?

@JingsongLi
Contributor

@wangxlong Yes, you can.
LGTM +1
CC: @wuchong

```java
// test jdbc table source description
List<String> fieldNames = ((RowType) actual.getProducedDataType().getLogicalType()).getFieldNames();
String expectedSourceDescription = actual.getClass().getSimpleName()
    + "(" + String.join(", ", fieldNames.stream().toArray(String[]::new)) + ")";
```
Member


I think we should add an integration test to verify that the JDBC source can work when projection is pushed down.

Contributor Author


@wuchong Good idea. I have updated.

@JingsongLi
Contributor

ping @wuchong

Member

@wuchong wuchong left a comment


Thanks for the great effort @wangxlong.
Regarding the integration tests, I have some suggestions:

  1. Could you use the Blink planner instead of the old planner? The old planner is deprecated, and all new features are added to the Blink planner.
  2. Could you use DDL to create the JDBC tables for better coverage? registerTableSource is deprecated.
  3. Could you move the instantiation of tEnv to a static member field? This avoids repetitive SPI discovery.
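Taken together, the suggestions roughly amount to a DDL-driven setup built once per test class. A minimal, self-contained sketch of the DDL involved (the table name, schema, and URL are illustrative placeholders; the `'connector.*'` property keys follow the pre-FLIP-122 style visible in this PR's test code):

```java
public class JdbcItCaseSketch {

    // In the real IT case this DDL would be executed once against a static
    // TableEnvironment (e.g. in a @BeforeClass method) to avoid repeated
    // SPI discovery; here we only build the statement string.
    static String createTableDdl(String tableName, String url) {
        return "CREATE TABLE " + tableName + " (\n"
            + "  id BIGINT,\n"
            + "  name VARCHAR\n"
            + ") WITH (\n"
            + "  'connector.type' = 'jdbc',\n"
            + "  'connector.url' = '" + url + "',\n"
            + "  'connector.table' = '" + tableName + "',\n"
            + "  'connector.driver' = 'org.apache.derby.jdbc.EmbeddedDriver'\n"
            + ")";
    }

    public static void main(String[] args) {
        System.out.println(createTableDdl("books", "jdbc:derby:memory:test"));
    }
}
```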

```java
/**
 * IT case for {@link JDBCTableSource}.
 */
```

Member


remove empty line.

@wangxlong
Contributor Author

Thanks @wuchong, I have updated.

```java
" 'connector.driver' = 'org.apache.derby.jdbc.EmbeddedDriver' " +
")";

static {
```
Member


Please use @BeforeClass instead of a static code block.

Contributor Author


Thanks, Done.

```xml
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
```
Member


Can we set the scope only for test?

Contributor Author

@wangxlong wangxlong Jan 14, 2020


Yes, it is OK to set the scope to test. But I don't know why it was set to provided in the old planner before.
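The change under discussion is only the scope element of the dependency declaration. A minimal sketch of the test-scoped form (assuming the same pom fragment quoted above):

```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
  <version>${project.version}</version>
  <!-- test scope: the planner is only needed by the IT cases, not at compile or runtime -->
  <scope>test</scope>
</dependency>
```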

@wangxlong
Contributor Author

ping @wuchong

Member

@wuchong wuchong left a comment


Thanks @wangxlong for the effort, LGTM.

@wuchong wuchong merged commit 7ccc5b3 into apache:master Jan 19, 2020
wuchong pushed a commit that referenced this pull request Jan 19, 2020