Conversation

@dannycranmer
Contributor

What is the purpose of the change

Add an uber-jar client library for the Kinesis connector, to be used with the SQL client.

Brief change log

  • Added new maven module flink-sql-connector-kinesis
  • Updated E2E test to verify the shading of client library

Verifying this change

  • Verified manually using SQL client
    • ./sql-client.sh embedded --jar flink-sql-connector-kinesis_2.11-1.12-SNAPSHOT.jar
  • Updated flink-sql-client-test to include Kinesis client
  • Updated test_sql_client.sh to verify the shading of Kinesis jar
    • The legacy Table API is excluded for Kinesis since we do not support it
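The kind of shading check that test_sql_client.sh performs can be sketched as follows. This is a minimal illustration in Python, not the actual E2E script logic; the allowed prefixes and entry names are assumptions for demonstration only.

```python
import io
import zipfile

# Illustrative allow-list: every class in the SQL uber jar should live
# under the Flink namespace after relocation.
ALLOWED_PREFIXES = ("org/apache/flink/", "META-INF/")

def unshaded_entries(jar_bytes):
    """Return .class entries that escaped relocation into the Flink namespace."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return [
            name
            for name in jar.namelist()
            if name.endswith(".class") and not name.startswith(ALLOWED_PREFIXES)
        ]

# Build a tiny in-memory "jar" to demonstrate the check:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr(
        "org/apache/flink/kinesis/shaded/org/reactivestreams/Publisher.class", b""
    )
    jar.writestr("org/reactivestreams/Subscriber.class", b"")  # leaked, unshaded

print(unshaded_entries(buf.getvalue()))  # prints ['org/reactivestreams/Subscriber.class']
```

A non-empty result would correspond to the E2E check failing because an unrelocated class leaked into the bundle.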

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): yes
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs

<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>codegen-resources/**</exclude>
@dannycranmer
Contributor Author

dannycranmer commented Nov 12, 2020

These 4 excludes are bundled into flink-connector-kinesis and cause the shading E2E check to fail. I have removed them from here rather than from the source connector because I am not 100% sure whether they are used at runtime. For instance, it looks like mime.types is used by S3, but I am not sure whether the connector actually uses it. I think it is safer to leave them in the connector and remove them from the SQL bundle, without performing a more thorough investigation. Thoughts?

Other options are:

  • Exclude from flink-connector-kinesis
  • Leave them in the shaded jar and allow them in the E2E test
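For context, resource excludes like these are typically declared in the maven-shade-plugin filter shown in the diff above. The sketch below closes out that truncated snippet; codegen-resources/** comes from the diff and mime.types is mentioned in this discussion, while the remaining excludes are intentionally omitted since they are not named here.

```xml
<!-- Sketch of a maven-shade-plugin resource filter. Only the two excludes
     named in this PR discussion are shown; the other excludes are omitted. -->
<filters>
  <filter>
    <artifact>*:*</artifact>
    <excludes>
      <exclude>codegen-resources/**</exclude>
      <exclude>mime.types</exclude>
    </excludes>
  </filter>
</filters>
```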

Contributor

If they are used at runtime, I assume that you will

  • avoid runtime errors when depending on flink-connector-kinesis.
  • hit runtime errors when depending on flink-sql-connector-kinesis.

Is there a quick way to make the test pass without excluding those? If not, maybe we can remove them from both and re-include them if people start complaining (that seems more consistent).

@dannycranmer
Contributor Author

dannycranmer commented Nov 12, 2020

Is there a quick way to make the test pass without excluding those?

Yes

Maybe we can remove them from both and re-include if people start complaining

Or this

Either option is a quick change; let's see what @twalthr's preference is.

Contributor

I would keep them in flink-connector-kinesis and fully rely on Maven + manual dependency management there. For the SQL jars we do best effort dependency management to help less skilled users. But they always have the fallback to go to the regular flink-connector-kinesis.

@flinkbot
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 0a068c0 (Thu Nov 12 13:43:54 UTC 2020)

Warnings:

  • 4 pom.xml files were touched: Check for build and licensing issues.
  • Documentation files were touched, but no .zh.md files: Update Chinese documentation or file Jira ticket.

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details

The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@dannycranmer force-pushed the FLINK-20043 branch 2 times, most recently from d299a79 to a79cc4c on November 12, 2020 at 13:56.
@flinkbot
Collaborator

flinkbot commented Nov 12, 2020

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@aalexandrov
Contributor

aalexandrov left a comment

Looks good. Changes to the *.md files should be mirrored in the corresponding (currently untranslated) *.zh.md files of the Chinese documentation.


@twalthr
Contributor

twalthr commented Nov 12, 2020

@rmetzger do you think it is ok to still merge this for 1.12? We have merged the Kinesis connector but it seems the JAR itself is not ready to be used for the SQL Client. One could argue that this is kind of a feature but it also fixes dependencies and prevents user questions around how to use the Kinesis table sources/sinks. We would avoid confusion if we merge this.

@rmetzger
Contributor

I'm okay with making a feature freeze exception here for the following reasons:

  • the change is easy to revert
  • it is unlikely to cause any overall system stability issues
  • the biggest risk is licensing issues, but they are checked automatically and also manually at a later stage in the release process.

@aalexandrov
Contributor

aalexandrov commented Nov 12, 2020

Thanks, @rmetzger

the biggest risk are licensing issues, but they are checked automatically and also manually at a later stage in the release process.

FYI, I already did one manual check of the licenses in the corresponding FLINK-20043 Jira.

@dannycranmer
Contributor Author

@flinkbot run azure

@sjwiesman
Contributor

@dannycranmer I just merged in #14036 which adds an autogenerated downloads page for sql connectors. Please rebase your PR and make two changes to your docs:

  1. Please add kinesis to the list of connectors on the downloads page

  2. Replace your dependencies table in kinesis.md with the following liquid tags. This will autogenerate the table based on the above yaml.

{% assign connector = site.data.sql-connectors['kinesis'] %} 
{% include sql-connector-download-table.html 
    connector=connector
%}
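For the first change, an entry in the connectors data file might look roughly like the following. This is a sketch only: the file path docs/_data/sql-connectors.yml and all field names are guesses inferred from the liquid include above, not the actual schema introduced by #14036.

```yaml
# Hypothetical entry; field names are inferred from the liquid include
# (site.data.sql-connectors['kinesis']), not taken from the real schema.
kinesis:
  name: Kinesis
  category: connector
  maven: flink-connector-kinesis
  sql-jar: flink-sql-connector-kinesis
```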

@dannycranmer
Contributor Author

Rebased and updated as per @sjwiesman comment.

@twalthr
Contributor

twalthr left a comment

One last comment from my side. I will merge this on Monday if there are no further objections.

@twalthr
Contributor

twalthr commented Nov 16, 2020

@dannycranmer I was about to merge this PR but running the SQL client test script locally failed. It seems the JAR still contains the dependencies for org.reactivestreams?

@dannycranmer
Contributor Author

@twalthr that is odd, it passes on my machine and I can see the reactive streams lib shaded and relocated correctly:

unzip -l flink-sql-connector-kinesis_2.11-1.12-SNAPSHOT.jar | grep reactivestreams
        0  11-12-2020 13:20   org/apache/flink/kinesis/shaded/org/reactivestreams/
      510  11-12-2020 13:20   org/apache/flink/kinesis/shaded/org/reactivestreams/Processor.class
      388  11-12-2020 13:20   org/apache/flink/kinesis/shaded/org/reactivestreams/Publisher.class
      440  11-12-2020 13:20   org/apache/flink/kinesis/shaded/org/reactivestreams/Subscriber.class
      205  11-12-2020 13:20   org/apache/flink/kinesis/shaded/org/reactivestreams/Subscription.class

The shade pattern looks correct on the connector project. Maybe you need to rebuild the connector jar locally?
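For reference, a maven-shade-plugin relocation producing the org/apache/flink/kinesis/shaded/... paths in the listing above would look roughly like this. This is a sketch, not the connector's verbatim pom; the actual configuration may contain additional relocations.

```xml
<!-- Sketch of the relocation implied by the jar listing above;
     the connector's real shade configuration may differ in detail. -->
<relocations>
  <relocation>
    <pattern>org.reactivestreams</pattern>
    <shadedPattern>org.apache.flink.kinesis.shaded.org.reactivestreams</shadedPattern>
  </relocation>
</relocations>
```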

@twalthr
Contributor

twalthr commented Nov 16, 2020

I will recheck. But actually I built the entire Flink dist on my machine.

@dannycranmer
Contributor Author

As discussed, the issue was that when building from the parent project (flink-connectors), the build seemed to use the full dependency graph rather than the reduced (shaded) dependency graph. I have changed the shading configuration to explicitly list the required shaded dependencies, re-ran the E2E test, and re-verified within the SQL client. I also checked that the shade plugin includes the same set of dependencies.
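Explicitly enumerating the shaded artifacts as described above can be done with an artifactSet include list in the maven-shade-plugin. The sketch below illustrates the shape of such a configuration; the specific artifact coordinates are assumptions for illustration, not the final list in the PR.

```xml
<!-- Sketch of an explicit artifactSet, as opposed to inheriting the
     parent's full dependency graph. The artifact coordinates shown
     here are illustrative, not the PR's actual configuration. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <artifactSet>
          <includes>
            <include>org.apache.flink:flink-connector-kinesis_${scala.binary.version}</include>
            <include>com.amazonaws:*</include>
            <include>org.reactivestreams:*</include>
          </includes>
        </artifactSet>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Listing the includes explicitly makes the bundle's contents independent of whether the module is built standalone or from the parent aggregator.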

