[FLINK-7474] [Streaming Connectors] AzureEventhubs-connector: support reading from and writing to Azure Event Hubs #4535

Open: wants to merge 134 commits into master
zhuganghuaonnet

Thank you very much for contributing to Apache Flink - we are happy that you want to help us improve Flink. To help the community review your contribution in the best possible way, please go through the checklist below, which will get the contribution into a shape in which it can be best reviewed.

Please understand that we do not do this to make contributions to Flink a hassle. In order to uphold a high standard of quality for code contributions, while at the same time managing a large number of contributions, we need contributors to prepare the contributions well, and give reviewers enough contextual information for the review. Please also understand that contributions that do not follow this guide will take longer to review and thus typically be picked up with lower priority by the community.

Contribution Checklist

  • Make sure that the pull request corresponds to a JIRA issue. Exceptions are made for typos in JavaDoc or documentation files, which need no JIRA issue.

  • Name the pull request in the form "[FLINK-1234] [component] Title of the pull request", where FLINK-1234 should be replaced by the actual issue number. Skip component if you are unsure about which is the best component.
    Typo fixes that have no associated JIRA issue should be named following this pattern: [hotfix] [docs] Fix typo in event time introduction or [hotfix] [javadocs] Expand JavaDoc for PunctuatedWatermarkGenerator.

  • Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review.

  • Make sure that the change passes the automated tests, i.e., mvn clean verify passes. You can set up Travis CI to do that following this guide.

  • Each pull request should address only one issue, not mix up code from multiple issues.

  • Each commit in the pull request has a meaningful commit message (including the JIRA id)

  • Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below.

(The sections below can be removed for hotfixes of typos)

What is the purpose of the change

(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)

Brief change log

(for example:)

  • The TaskInfo is stored in the blob store at job creation time as a persistent artifact
  • Deployment RPCs transmit only the blob storage reference
  • TaskManagers retrieve the TaskInfo from the blob cache

Verifying this change

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (100MB)
  • Extended integration test for recovery after master (JobManager) failure
  • Added test that validates that TaskInfo is transferred only once across recoveries
  • Manually verified the change by running a 4-node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@tzulitai (Contributor)

Hi @zhuganghuaonnet, thanks a lot for this contribution!
However, taking a step back, is there a JIRA ticket for this contribution?
If not, please make sure that one is opened for it first. Also, for bigger contributions like this, you should make sure that the pull request template is properly filled out to ease the effort for reviewers.

Nico Kruber and others added 3 commits August 14, 2017 11:06
[FLINK-7056][blob] refactor the new API for job-related BLOBs

For a cleaner API, instead of having a nullable jobId parameter, use two methods:
one for job-related BLOBs, another for job-unrelated ones.

This closes apache#4237.
Since the generated project is an sbt project, it is
much more straightforward for the user to create it with
the new "sbt new" command than by using giter8 (which
requires installing giter8 just for that purpose).

This closes apache#4531.
The TestingSerialRpcService produces thread interleavings that do not occur
when the code runs on a proper RpcService implementation. As a result, test
cases can spuriously fail or pass. To avoid this problem, this commit
removes the TestingSerialRpcService and adapts all existing tests that used it.

Remove TestingSerialRpcService from MesosResourceManagerTest

Remove TestingSerialRpcService from ResourceManagerJobMasterTest

Remove TestingSerialRpcService from ResourceManagerTaskExecutorTest

Remove TestingSerialRpcService from ResourceManagerTest

Remove TestingSerialRpcService from JobMasterTest

Remove TestingSerialRpcService from TaskExecutorITCase

Remove TestingSerialRpcService from TaskExecutorTest

Remove TestingSerialRpcService from SlotPoolTest

Delete TestingSerialRpcService

This closes apache#4516.
@zhuganghuaonnet (Author)

zhuganghuaonnet commented Aug 14, 2017 via email

@greghogan (Contributor)

@zhuganghuaonnet we should also mention that new connectors are being contributed through the Flink release of Apache Bahir.

@tzulitai (Contributor)

@greghogan I don't think there's actually a consensus that all new connectors should strictly go to Apache Bahir. It is still decided case by case and brought to the mailing lists for discussion if necessary.

We have a growing base of Azure users, and I think it would be nice to have better Azure support. Having a connector for their most commonly used event source would be a nice addition, IMO. What do you think?

bowenli86 and others added 5 commits August 15, 2017 12:38
…Case

This helps tease out races, for example the recently discovered one in
cleanup of incremental state handles at the SharedStateRegistry.

(cherry picked from commit d7683cc)
@greghogan (Contributor)

@tzulitai this does sound like a good candidate for inclusion in Flink. Perhaps a separate flink-connectors repo would work better than Apache Bahir.

@tzulitai (Contributor)

tzulitai commented Aug 16, 2017

@greghogan yes, a separate flink-connectors repository has always been a possible approach in the community, but AFAIK we haven't fully committed to it yet. I can continue to shepherd this contribution to be merged into Flink, and move it to flink-connectors when that eventually happens.

@tzulitai (Contributor)

@zhuganghuaonnet please let me know when you've opened the JIRA, and rebased this pull request onto the latest master. Also please remember to rename the pull request title (you can take a look at the other pull request titles as reference) and fill out the pull request template as necessary :)

Thanks a lot!

tzulitai and others added 5 commits September 7, 2017 12:54
…on schema in FlinkKinesisConsumer

This commit also adds tests for verifying that the FlinkKinesisConsumer
itself is serializable.
…oner is serializable in FlinkKinesisProducer

This commit also adds a test to verify that the FlinkKinesisProducer is
serializable.

This closes apache#4537.
- Improve deprecation message in Javadocs
- Remove usage of ProducerConfigConstants in code wherever possible
- Remove usage of ProducerConfigConstants in documentation code snippets

This closes apache#4473.
@tzulitai (Contributor)

@zhuganghuaonnet it doesn't seem like the PR is correctly rebased. Ideally, you should rebase your development branch onto the latest master, so that the PR contains only your changes and not those of others.

And yes, please include necessary unit tests / integration tests along with the PR!

…into eventhubs

# Conflicts:
#	flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java
…tatus and waiting forever if eventhubproducerthread quits
@yew1eb (Contributor)

yew1eb commented Sep 30, 2017

Hi @zhuganghuaonnet, you should drop the other contributors' commits and squash your own.
🍻
