[FLINK-20628][connectors/rabbitmq2] RabbitMQ connector using new connector API #15140

Closed
wants to merge 2 commits into from

@pscls pscls commented Mar 10, 2021

What is the purpose of the change

This pull request ports the RabbitMQ connector implementation to the new connector API described in FLIP-27 and FLIP-143. It includes both a source and a sink, each supporting at-most-once, at-least-once, and exactly-once behavior.

This pull request closes the following issues (separate RabbitMQ connector Source and Sink tickets): FLINK-20628 and FLINK-21373

Brief change log

  • Source and Sink use RabbitMQ's Java Client API to interact with RabbitMQ
  • The RabbitMQ Source reads messages from a queue
    • At-least-once
      • Messages are acknowledged on checkpoint completion
    • Exactly-once
      • Messages are acknowledged in a transaction
      • The user has to set correlation ids for deduplication
  • The RabbitMQ Sink publishes messages to a queue
    • At-least-once
      • Unacknowledged messages are resent on checkpoints
    • Exactly-once
      • Messages between two checkpoints are published in a transaction
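
The source behavior described in the change log (acknowledge on checkpoint completion, deduplicate by correlation id) could be sketched roughly as follows. This is not the code from the PR. The class `CheckpointAckTracker`, its method names, and the `acker` callback are hypothetical. The callback stands in for a broker acknowledgement such as the RabbitMQ Java client's `Channel#basicAck`. The sketch also ignores the transactional acknowledgement used for exactly-once and never prunes the set of seen correlation ids.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.LongConsumer;

/** Hypothetical bookkeeping: acknowledge messages only after their checkpoint completes. */
class CheckpointAckTracker {
    // Delivery tags received since the last snapshot.
    private List<Long> pendingTags = new ArrayList<>();
    // Delivery tags grouped by the checkpoint that covers them.
    private final TreeMap<Long, List<Long>> tagsPerCheckpoint = new TreeMap<>();
    // Correlation ids already emitted; used to drop redelivered messages.
    private final Set<String> seenCorrelationIds = new HashSet<>();
    // Assumption: stand-in for the broker acknowledgement call.
    private final LongConsumer acker;

    CheckpointAckTracker(LongConsumer acker) {
        this.acker = acker;
    }

    /** Records the delivery tag and returns true if the message should be emitted. */
    boolean onMessage(long deliveryTag, String correlationId) {
        // Every delivery must eventually be acknowledged, even a duplicate.
        pendingTags.add(deliveryTag);
        // Set#add returns false if the correlation id was seen before.
        return seenCorrelationIds.add(correlationId);
    }

    /** Associates all tags received so far with the given checkpoint. */
    void snapshotState(long checkpointId) {
        tagsPerCheckpoint.put(checkpointId, pendingTags);
        pendingTags = new ArrayList<>();
    }

    /** Acknowledges all tags covered by this checkpoint or an earlier one. */
    void notifyCheckpointComplete(long checkpointId) {
        Map<Long, List<Long>> completed = tagsPerCheckpoint.headMap(checkpointId, true);
        for (List<Long> tags : completed.values()) {
            for (long tag : tags) {
                acker.accept(tag);
            }
        }
        // Clearing the view removes the entries from the backing map.
        completed.clear();
    }
}
```

If the job fails before a checkpoint completes, nothing from that interval has been acknowledged, so the broker redelivers those messages, which is the at-least-once guarantee. The correlation-id check is what lets a reader drop such redeliveries.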

Verifying this change

This change added tests and can be verified as follows:

All changes are within the flink-connectors/flink-connector-rabbitmq2/ module.
The added integration tests can be found in the org.apache.flink.connector.rabbitmq2.source and org.apache.flink.connector.rabbitmq2.sink packages in the respective test directories.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (don't know)
  • The runtime per-record code paths (performance sensitive): (don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (don't know)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (JavaDocs)

@flinkbot
Collaborator

flinkbot commented Mar 10, 2021

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 4d34b9b (Fri May 28 09:03:52 UTC 2021)

Warnings:

  • 2 pom.xml files were touched: Check for build and licensing issues.
  • No documentation files were touched! Remember to keep the Flink docs up to date!
  • This pull request references an unassigned Jira ticket. According to the code contribution guide, tickets need to be assigned before starting with the implementation work.

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Mar 10, 2021

CI report:

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure to re-run the last Azure build

@fapaul fapaul left a comment

Thanks for the great work! I did a first pass and a few things need some improvements:

  • Remove the custom failover mechanics from the sink (not your fault, but we decided that Flink should handle this in the future)
  • Exception handling: IOExceptions are usually handled by Flink and can be safely propagated. All other exceptions should use a RuntimeException, but not FlinkRuntimeException, because it is meant to be used in internal components.
  • Threading in the source reader is not safe yet. Please revisit the threading model and check whether objects are accessed by different threads concurrently.
  • Remove any sleeps from the integration tests and use proper synchronization.
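
As an illustration of the last point, one common way to replace a fixed sleep in a test is to block on a `CountDownLatch` that the collecting side counts down. The sketch below is not taken from the PR. `AwaitingCollector` and its methods are hypothetical names, and a real integration test would wire `collect` into a test sink rather than call it from a plain thread.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical test helper: collects messages and lets the test wait for an expected count. */
class AwaitingCollector {
    // Thread-safe list, because producer and test thread access it concurrently.
    private final List<String> received = new CopyOnWriteArrayList<>();
    private final CountDownLatch latch;

    AwaitingCollector(int expectedMessages) {
        this.latch = new CountDownLatch(expectedMessages);
    }

    /** Called by the producing thread for every message. */
    void collect(String message) {
        received.add(message);
        latch.countDown();
    }

    /** Blocks until all expected messages arrived or the timeout elapsed. */
    boolean await(long timeout, TimeUnit unit) {
        try {
            return latch.await(timeout, unit);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report that the wait did not succeed.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    List<String> getReceived() {
        return received;
    }
}
```

The test then returns as soon as the expected messages have arrived, and the timeout only bounds the failure case instead of slowing down every run.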

@pscls pscls force-pushed the rabbitmq-connector-new-api branch 2 times, most recently from 39aa4d5 to 828662a Compare March 16, 2021 14:14
@fapaul fapaul left a comment

You are almost done, great job! A few overarching things you still have to address:

  • Revisit the visibility of classes/interfaces/enums: only make public what should be used by a user (or necessarily needs to be...)
  • Add not-null checks to constructor arguments to prevent unexpected null pointer exceptions
  • Use MiniCluster for testing and do not use the environment directly. This should allow you to simplify a lot of the testing setup and get rid of all static magic.
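
A minimal sketch of the requested constructor null checks is shown below. The class `ConnectionSettings` and its fields are hypothetical and only serve as an illustration. To stay self-contained it uses the JDK's `Objects.requireNonNull`. Flink code typically uses `Preconditions.checkNotNull` from `org.apache.flink.util` for the same purpose.

```java
import java.util.Objects;

/** Hypothetical settings holder; class and field names are illustrative only. */
class ConnectionSettings {
    private final String host;
    private final String queueName;

    ConnectionSettings(String host, String queueName) {
        // Fail fast with a descriptive message instead of a later, harder-to-trace NPE.
        this.host = Objects.requireNonNull(host, "host must not be null");
        this.queueName = Objects.requireNonNull(queueName, "queueName must not be null");
    }

    String getHost() {
        return host;
    }

    String getQueueName() {
        return queueName;
    }
}
```

With the check in the constructor, a misconfiguration surfaces when the object is built rather than when the field is first dereferenced, possibly on a different thread.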

@pscls pscls force-pushed the rabbitmq-connector-new-api branch from 0c76cee to 1feef00 Compare March 17, 2021 15:10
@pscls
Author

pscls commented Mar 17, 2021

@flinkbot run azure

@fapaul fapaul left a comment

The CI failure is unrelated to your changes. It is caused by some infrastructure problems.
I left some minor cleanup comments for the test code but after they are addressed, IMO this PR looks good 👍

@fapaul fapaul left a comment

I forgot one thing... Please have two separate commits for the final submission and distribute the changes of the review commit accordingly.

@pscls pscls force-pushed the rabbitmq-connector-new-api branch from 1feef00 to a5cf90e Compare March 18, 2021 16:43
@fapaul fapaul left a comment

From my side, this PR is good to go % some nitty last feedback. We now have to wait until an official apache committer approves and possibly merges this PR.

Good job, and thanks for all your effort!

P.S. After you did the changes, can you rebase to the latest master? I think the CI failure is already fixed on the master branch and the CI will finally be green :)

@pscls
Author

pscls commented Mar 19, 2021

@flinkbot run azure

@fapaul

fapaul commented Mar 19, 2021

Test failure is unrelated https://issues.apache.org/jira/browse/FLINK-21879

@fapaul

fapaul commented Apr 8, 2021

@flinkbot run azure

@fapaul

fapaul commented Apr 12, 2021

@flinkbot run azure

@RocMarshal
Contributor

Excuse me, @pscls, are you still working on this PR?

@pscls
Author

pscls commented Dec 22, 2021

> Excuse me, @pscls, are you still working on this PR?

@RocMarshal Not really, we finished it 9 months ago with @fapaul's requested changes. Since then nothing happened on our side.

@RocMarshal
Contributor

RocMarshal commented Dec 22, 2021

@pscls Cool! Would you mind rebasing it onto the latest master branch? Thank you.

…Source API

RabbitMQ Connector using new Source API
https://issues.apache.org/jira/browse/FLINK-20628

Co-authored-by: Yannik Schröder <schroeder_yannik@web.de>
Co-authored-by: Pascal Schulze <pascal.schulze@student.hpi.uni-potsdam.de>
@pscls
Author

pscls commented Dec 22, 2021

@flinkbot run azure

Contributor

@RocMarshal RocMarshal left a comment

@pscls Thanks for the update. I left some comments. Please let me know your opinion.

@SteNicholas
Member

@pscls, thanks for your contribution of the new RabbitMQ connector. I left some minor comments; you should also upgrade the amqp-client version to address vulnerabilities.

@SteNicholas
Member

@pscls, thanks for your updates. In general the changes look good to me, and I have left a few more comments. Please take a look.

@SteNicholas
Member

SteNicholas commented Dec 28, 2021

LGTM. @fapaul , could you help to merge this pull request? And @RocMarshal would implement the TableSource and TableSink based on this.

… Sink API

RabbitMQ Connector using new Sink API https://issues.apache.org/jira/browse/FLINK-21373

Co-authored-by: Yannik Schröder <schroeder_yannik@web.de>
Co-authored-by: Jan Westphal <jan.westphal306@gmail.com>
@pscls pscls force-pushed the rabbitmq-connector-new-api branch from 092da6c to 9ef2375 Compare January 2, 2022 20:29
@fapaul

fapaul commented Jan 3, 2022

Sorry for the late reply. Thanks for the additional efforts from all sides.

We are currently finalizing the discussions around the external connector repository and thought this PR would be a good candidate since it is already reviewed and has decent test coverage.

@RocMarshal @SteNicholas Is it okay for you to leave the PR unmerged for now or do you need this connector urgently?

@SteNicholas
Member

@fapaul, @RocMarshal would like to work on the table source and sink for the RabbitMQ connector. IMO, if the PR is left unmerged for now, @RocMarshal should implement the table source and sink based on this branch.

@RocMarshal
Contributor

Thank you @SteNicholas @fapaul .
If it is determined that this PR will be merged, then IMO I would do the table source and sink work based on it, so that the two PRs can proceed together, instead of having to redo intermediate changes if this PR is rejected.
So we only need to settle whether the current PR can be merged in order to decide whether the follow-up work can build on it. Please let me know your opinion.

@MartijnVisser
Contributor

@SteNicholas @RocMarshal I'm a bit hesitant to merge this PR because we are getting close to the release cut of Flink 1.15 and new connectors are quite often a source of flakiness. The CI doesn't check all use cases (like the Java 11 tests), so merging it could result in flakiness which I would rather avoid at this point.

I guess we could either postpone merging this PR until the 1.15 release branch has been cut (in a couple of weeks), or we could consider already moving this entire PR to its own external connector repo, as is the current plan. Something like github.com/apache/flink-connector-rabbitmq - We could use that repo to also test out the testing infrastructure that we need for external connector repositories anyway.

Let me know what you think.

@SteNicholas
Member

@MartijnVisser, the point mentioned above makes sense to me. For now, @RocMarshal could introduce the table source and sink based on the branch of this pull request, so that this pull request and the table implementation would be merged for Flink 1.16.

@MartijnVisser
Contributor

@SteNicholas Sounds good. Keep in mind that we might not want to merge in new connectors in 1.16, depending on how quickly the external connector discussion is progressing and the building blocks will be delivered. It could actually be that we start moving out connectors in 1.16.

@MartijnVisser
Contributor

@pscls Thanks for your patience! We've started this week with our first external connector repo project, which is moving out the Elasticsearch connector from this repository to https://github.com/apache/flink-connector-elasticsearch

I think it would be best to first get that one moved out, so we can understand the actual issues that we might run into. When that one is done, I propose to create a dedicated repo for RabbitMQ and move this code to that repo. What do you think?

CC @SteNicholas @RocMarshal

@pscls
Author

pscls commented Mar 30, 2022

> @pscls Thanks for your patience! We've started this week with our first external connector repo project, which is moving out the Elasticsearch connector from this repository to apache/flink-connector-elasticsearch
>
> I think it would be best to first get that one moved out, so we can understand the actual issues that we might run into. When that one is done, I propose to create a dedicated repo for RabbitMQ and move this code to that repo. What do you think?
>
> CC @SteNicholas @RocMarshal

Sounds good to me. Just ping me if there's anything to do from my side.

@MartijnVisser
Contributor

@pscls There's now https://github.com/apache/flink-connector-rabbitmq - Would you like to move this PR to that repo, so we can merge it there?

@pscls
Author

pscls commented May 23, 2022

> @pscls There's now apache/flink-connector-rabbitmq - Would you like to move this PR to that repo, so we can merge it there?

@MartijnVisser This PR has now been moved to the new apache/flink-connector-rabbitmq repository and can be found here: flink-connector-rabbitmq/pull/1

@MartijnVisser
Contributor

Closing this PR since the connector has been moved to https://github.com/apache/flink-connector-rabbitmq and this PR is now available at apache/flink-connector-rabbitmq#1 - It would be great if we can finish this over there :)
