
[FLINK-25990][table] Expose uid generator for DataStream/Transformation providers #18667

Closed
wants to merge 8 commits

Conversation

slinkydeveloper
Contributor

What is the purpose of the change

Expose uid generator for DataStream/Transformation providers. Also update collect, print, kafka, hive and files connectors to properly set up uids for generated DataStreams/Transformations
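The idea can be pictured with a small self-contained sketch: the planner hands the provider a context that generates stable uids for the transformations the connector creates, instead of each connector inventing global uids itself. The stub below only mirrors the concept; the method shape, the `Optional` return, and the uid format are assumptions, not the exact Flink API.

```java
import java.util.Optional;

public class UidSketch {
    // Stub mirroring the ProviderContext idea from this PR
    // (name and signature are assumptions, not the real Flink interface).
    interface ProviderContext {
        Optional<String> generateUid(String name);
    }

    public static void main(String[] args) {
        // Hypothetical planner-side implementation: derive uids from the
        // ExecNode that owns the provider, so they stay stable across restores.
        ProviderContext context = name -> Optional.of("13_stream-exec-sink_" + name);

        // A connector names its internal transformations ("writer", "committer")
        // and asks the context for the final uid instead of hardcoding one.
        String writerUid = context.generateUid("writer").orElseThrow(IllegalStateException::new);
        System.out.println(writerUid);
    }
}
```

This keeps uid assignment in one place (the planner), which is what lets the table planner guarantee uniqueness across all generated transformations.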

@flinkbot
Collaborator

flinkbot commented Feb 8, 2022

CI report:

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

@flinkbot
Collaborator

flinkbot commented Feb 8, 2022

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 8926e35 (Tue Feb 08 16:39:17 UTC 2022)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@slinkydeveloper
Contributor Author

@flinkbot run azure

Contributor

@twalthr twalthr left a comment

Thanks for the PR @slinkydeveloper. So far we have not added a single test to the uid business. I'm wondering whether it makes sense to have a temporary solution by adapting org.apache.flink.table.planner.utils.JsonPlanTestBase#compileSqlAndExecutePlan and traverse the transformations tree to check if all transformations have a uid assigned. What do you think?
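The traversal suggested here can be sketched as follows. `Transformation` below is a minimal stand-in for `org.apache.flink.api.dag.Transformation` (only a uid plus input edges), not the real class; the helper name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CheckUids {
    // Minimal stand-in for org.apache.flink.api.dag.Transformation:
    // just a uid and the list of input transformations.
    static class Transformation {
        final String uid;
        final List<Transformation> inputs;
        Transformation(String uid, Transformation... inputs) {
            this.uid = uid;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // Depth-first walk over the transformation DAG, collecting every node
    // that has no uid assigned -- the kind of check that could be added to
    // a test base after compiling a plan.
    static void collectMissingUids(Transformation t, Set<Transformation> seen, List<Transformation> missing) {
        if (!seen.add(t)) {
            return; // already visited (inputs may be shared in a DAG)
        }
        if (t.uid == null) {
            missing.add(t);
        }
        for (Transformation input : t.inputs) {
            collectMissingUids(input, seen, missing);
        }
    }

    public static void main(String[] args) {
        Transformation source = new Transformation("source");
        Transformation calc = new Transformation(null, source); // uid forgotten
        Transformation sink = new Transformation("sink", calc);

        List<Transformation> missing = new ArrayList<>();
        collectMissingUids(sink, new HashSet<>(), missing);
        System.out.println(missing.size()); // 1: the calc node
    }
}
```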

  Transformation<?> createTransformation(Context context);

  /** Context for {@link #createTransformation(Context)}. */
- interface Context {
+ interface Context extends ProviderContext {
Contributor

Rename to TransformationProviderContext?

Contributor Author

isn't this already implied by the fact that this context is within TransformationSinkProvider?

Contributor

true, but if you implement a connector, you have a couple of Context classes and the code could become quite messy.

Contributor Author

But then the solution is just to not have a static import for Context?

@@ -93,39 +98,40 @@ public ChangelogMode getChangelogMode(ChangelogMode requestedMode) {

@Override
public SinkRuntimeProvider getSinkRuntimeProvider(Context context) {
return (DataStreamSinkProvider)
inputStream -> {
Contributor

did we break the API for lambdas now?

Contributor Author

yes 😢 the only way to not break it is to not provide a default implementation for the new method

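The breakage discussed above is the standard functional-interface pitfall: once an interface has a second abstract method without a default body, it is no longer a single-abstract-method type, so lambda implementations stop compiling. A minimal sketch with stand-in types (these are not the real Flink interfaces; names and signatures are simplified assumptions):

```java
public class LambdaBreakSketch {
    // Before the change: one abstract method, so a lambda works.
    interface OldProvider {
        String consumeDataStream(String inputStream);
    }

    // After the change: a second abstract method taking the new context.
    // With no default implementation, every existing lambda breaks.
    interface NewProvider {
        String consumeDataStream(String inputStream);
        String consumeDataStream(String providerContext, String inputStream);
    }

    public static void main(String[] args) {
        OldProvider old = stream -> "sink(" + stream + ")"; // compiles: single abstract method

        // NewProvider broken = stream -> ...; // would NOT compile: two abstract methods

        // Migration path: use an anonymous (or named) class instead of a lambda.
        NewProvider migrated = new NewProvider() {
            @Override
            public String consumeDataStream(String inputStream) {
                return consumeDataStream("defaultContext", inputStream);
            }
            @Override
            public String consumeDataStream(String providerContext, String inputStream) {
                return "sink(" + providerContext + ", " + inputStream + ")";
            }
        };

        System.out.println(old.consumeDataStream("rows"));
        System.out.println(migrated.consumeDataStream("rows"));
    }
}
```

This is also why the final commit message notes that the provider interfaces were never annotated with @FunctionalInterface: implementing them as lambdas was possible but never a documented guarantee.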

…xt for generating uid and use it in DataStreamScanProvider, DataStreamSinkProvider, TransformationScanProvider, TransformationSinkProvider

Signed-off-by: slinkydeveloper <francescoguard@gmail.com>
…ormation uids

Signed-off-by: slinkydeveloper <francescoguard@gmail.com>
Contributor

@twalthr twalthr left a comment

Thanks for the update @slinkydeveloper. I added the last comments. Should be good in the next iteration.

@@ -71,6 +71,7 @@ public void testDeduplication() throws Exception {
tableEnv.getConfig()
.getConfiguration()
.setInteger(ExecutionConfigOptions.TABLE_EXEC_RESOURCE_DEFAULT_PARALLELISM, 1);
checkTransformationUids(compiledPlan);
Contributor

Is there a reason not to use compileSqlAndExecutePlan? I think the configuration can also be set beforehand without side effects?

Contributor Author

I honestly didn't know if this affects the plan or not, so I defaulted to just keeping the code as it is. I can change it if you want.

}

@Test
public void testBatchTransformationScanProvider() {
Contributor

The tests are important; they test the propagation of boundedness. Please re-add them.

Contributor Author

as discussed offline, we can remove them as they're only testing mocks now


Contributor

@twalthr twalthr left a comment

LGTM

@twalthr twalthr closed this in 3adca15 Feb 11, 2022
@slinkydeveloper slinkydeveloper deleted the FLINK-25990 branch February 11, 2022 15:22
MrWhiteSike pushed a commit to MrWhiteSike/flink that referenced this pull request Mar 3, 2022
…ng uids in DataStream/Transformation Scan/SinkProvider

This might break existing implementations if the provider was implemented as a Java lambda. Please note that
the interface was not annotated as a @FunctionalInterface.

This closes apache#18667.
jnh5y pushed a commit to jnh5y/flink that referenced this pull request Dec 18, 2023