[SPARK-13252] [KAFKA] Bump up Kafka to 0.9.0.0 #11143
Conversation
Jenkins, test this please.
Code LGTM.
Thanks for reviewing, @vanzin. Much appreciated!
@markgrover What is the compatibility story of Kafka 0.8 and 0.9 with existing Kafka installations? If the Kafka 0.9 client API is not backward compatible with Kafka 0.8 installations, then upgrading Spark Streaming to use 0.9 is not a good idea.
Test build #51000 has finished for PR 11143 at commit
Hi @tdas! Thanks for reviewing. I talk about the compatibility story at length in the related PR #10953 (in particular, here). TL;DR from that comment: we have two options:

Option 1. Support only Apache Kafka 0.9+.
Pros:
Cons:

Option 2. Support both Apache Kafka 0.8 and 0.9+.
Pros:
Cons:
Given that we are doing Spark 2.0, where we have the liberty of breaking away from old versions of Kafka, I proposed that we go with Option 1 and only support the Kafka 0.9 release. Does that sound reasonable?
Test build #51001 has finished for PR 11143 at commit
Test build #51004 has finished for PR 11143 at commit
FWIW, the IBM Cloud Message Hub service, which is based on Kafka, has already moved to 0.9.0, so I support option 1 that @markgrover suggests.
Thanks, Mario. Do let us know what you think, @tdas.
Test build #52012 has finished for PR 11143 at commit
Hi @tdas, I'd still appreciate it if you could review this. Thanks!
@koeninger may be a better reviewer at this point.
Kafka isn't just a library dependency that a user can try out a new version of on a particular job and see if it works. It's an infrastructure component that the entire business may depend on, and upgrading running brokers is not risk-free. From a user point of view, I think it's better to have an artifact that works with Spark 2.0 and Kafka 0.8 brokers, even if that means moving the existing code to a spark-streaming-kafka08 subproject and doing new development on spark-streaming-kafka for Kafka 0.9 (or, for that matter, Kafka 0.10, which may be out by the time Spark 2.0 is ready).
Thanks @srowen and @koeninger for your thoughts. I did take a look at what other projects are doing. In particular, I looked at Storm and Flume, since they both depend on Kafka in ways similar to Spark Streaming. Storm has a PR for bumping to Kafka 0.9 which does the same thing I am proposing here: bumping Kafka from 0.8 to 0.9 without keeping support for 0.8. Flume is doing the same thing in FLUME-2855, moving to 0.9 without keeping 0.8 support. For both Storm and Flume, it's not even happening in a major release, as is the case for us.

Long story short, I want to do the right thing for Spark, and I am more than happy to implement the two-subproject approach with Maven profiles, one supporting Kafka 0.8 and the other supporting Kafka 0.9. I also realize that Kafka is not a simple library dependency and there is a good chunk of operational overhead involved in upgrading Kafka. However, the way things are looking right now (and perhaps that's how the Kafka community, cc @jkreps @gwenshap, wanted them to look), folks would essentially have to move to Kafka 0.9 because one of their other Kafka apps may need it. The reason I raised this discussion of getting rid of Kafka 0.8 support is that we will have to carry the burden of supporting the Kafka 0.8.x line for the entire Spark 2.x line, and now is our chance to think this through and drop support for Kafka 0.8, if we decide to.
IMHO option 1 is cleaner. When Kafka went from 0.8 to 0.9, they did not leave the old high- and low-level consumer APIs around just because there are users that use them. If you want to upgrade to Kafka 0.9, you'll have to upgrade brokers and refactor your consumer clients to work with the 0.9 API, simple as that.
"When Kafka went from 0.8 to 0.9, they did not leave the old high and low level consumer APIs around just because there are users that use them." Kafka 0.9 still has the old high- and low-level consumers (and the old producers too).
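For readers following along, the new consumer API under discussion can be sketched roughly as follows. This is an illustrative sketch against the standard Kafka 0.9 client API, not code from this PR; the broker address, group id, and topic name are made up. The old high-level consumer (kafka.consumer.Consumer) connects via ZooKeeper, whereas the new consumer talks to brokers directly.

```scala
import java.util.{Arrays, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

// New consumer API, only available starting with Kafka 0.9.0.0.
// Note: it connects to brokers directly; no zookeeper.connect needed.
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")   // broker address (made up)
props.put("group.id", "example-group")             // consumer group (made up)
props.put("key.deserializer",
  "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer",
  "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Arrays.asList("example-topic")) // topic name (made up)
val records = consumer.poll(100)                   // poll timeout in ms
```

The compatibility point above is that this 0.9 client (and even the old consumer classes shipped in the 0.9 jars) cannot talk to 0.8 brokers, which is why bundling the 0.9 client affects users who have not upgraded their clusters.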
Correct, but they are not compatible when using the 0.9 client with 0.8 brokers. So, your Kafka client app may need no code changes if you are using the old consumer API, but you'd still have to upgrade your brokers before you upgrade your clients. Anyway, since @koeninger and @tdas don't feel comfortable with my proposal of upgrading to Kafka 0.9 (and dropping support for 0.8), I am working on updating this PR to support both 0.8 and 0.9. It's not just a simple Maven profile, because we use some non-public APIs (in AdminUtils, ZkClient, etc.) which have changed between 0.8 and 0.9, so I have to either a) write a shim layer, or b) change the code so it doesn't depend on those APIs. Will keep you all posted here.
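To make the profile idea concrete, a version-selecting build might look roughly like the sketch below. This is a hypothetical fragment for a parent POM; the profile ids, module paths, and version properties are illustrative assumptions, not taken from the actual Spark build.

```xml
<!-- Hypothetical sketch: two Maven profiles selecting which Kafka
     subproject (and client version) gets built. -->
<profiles>
  <profile>
    <id>kafka-0.8</id>
    <properties>
      <kafka.version>0.8.2.1</kafka.version>
    </properties>
    <modules>
      <module>external/kafka-0-8</module>
    </modules>
  </profile>
  <profile>
    <id>kafka-0.9</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <properties>
      <kafka.version>0.9.0.0</kafka.version>
    </properties>
    <modules>
      <module>external/kafka-0-9</module>
    </modules>
  </profile>
</profiles>
```

As noted in the comment above, profiles alone don't solve the problem: source code that calls APIs whose signatures changed between 0.8 and 0.9 still needs either a shim layer or separate source trees per module.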
That's right @markgrover, the current approach used by Kafka preserves compatibility for users, but makes it a bit complicated for libraries/systems that want to support newer client features while still supporting older Kafka brokers.
Agreeing with @mbonaci. Users who delay upgrading their Kafka brokers can continue using older versions of Spark. @markgrover, if you have code that depends on AdminUtils, perhaps you will want to delay the version upgrade until Grant Henke's patches adding Admin APIs are in; then at least you'll have a path for moving off the semi-private APIs :)
I really don't think it makes sense to discuss this PR outside the context of SPARK-12177 and the approach taken for supporting the new consumer. Merging it as is would break things for end users without giving them any new features as a benefit, and nothing in Spark 2.0 that I'm aware of otherwise requires the Kafka version bump. If SPARK-12177 ends up being implemented as a separate artifact, this PR isn't even necessary. If it ends up being implemented in the same artifact and breaking 0.8 compatibility, this PR is a small drop in the bucket compared to the overall code change and could be handled at that time. Regarding the semi-private APIs, those are either in testing code or in the receiver-based implementation, which (so far) no one has shown a lot of interest in updating for the new consumer.
@ijuma my bad, I was looking for
This discussion is very closely tied to SPARK-12177, so I agree with @koeninger that it makes sense for this to be decided in relation to SPARK-12177. For those interested, please follow that JIRA and its related PR #10953.
Apache Kafka release 0.9.0.0 came out some time ago, and we should add support for it. This JIRA is related to SPARK-12177, which covers adding support for the new consumer API, only available starting with v0.9.0.0.
However, we should upgrade Kafka to 0.9.0.0 regardless of when (and before) support for the new consumer API gets added.
We also use some non-public APIs from Kafka which have changed in the 0.9.0.0 release, so this change should also take care of updating those usages.
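As an illustration of the kind of non-public API change involved, the admin/ZooKeeper helpers moved from taking a raw ZkClient in 0.8 to the new ZkUtils wrapper in 0.9. This is a hedged sketch with approximate signatures and a made-up topic name and ZooKeeper address; check against the actual Kafka source before relying on it.

```scala
// Kafka 0.8.x (approximate): AdminUtils takes a raw ZkClient.
//   import kafka.admin.AdminUtils
//   import org.I0Itec.zkclient.ZkClient
//   val zkClient = new ZkClient("localhost:2181", 6000, 6000)
//   AdminUtils.createTopic(zkClient, "example-topic", 1, 1)

// Kafka 0.9.x (approximate): the same call goes through ZkUtils.
import kafka.admin.AdminUtils
import kafka.utils.ZkUtils

val zkUtils = ZkUtils(
  "localhost:2181",              // ZooKeeper connect string (made up)
  6000,                          // session timeout, ms
  6000,                          // connection timeout, ms
  isZkSecurityEnabled = false)
AdminUtils.createTopic(zkUtils, "example-topic", 1, 1)
```

Because these helpers are internal to Kafka and not covered by any compatibility guarantee, code built against one version will not compile (or may fail at runtime) against the other, which is what forces the shim-layer or separate-subproject approach discussed above.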