
[SPARK-13252] [KAFKA] Bump up Kafka to 0.9.0.0 #11143

Closed

Conversation

markgrover
Member

Apache Kafka 0.9.0.0 was released some time ago, and we should add support for it. This JIRA is related to SPARK-12177, which covers adding support for the new consumer API that is only available starting with v0.9.0.0.
However, we should upgrade to Kafka 0.9.0.0 regardless of when the new consumer API support gets added, and before it does.
We also use some non-public APIs from Kafka which have changed in the 0.9.0.0 release, so this change should also take care of updating those usages.
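For context, one concrete example of such a change: in Kafka 0.8.2 the kafka.admin.AdminUtils methods took a raw org.I0Itec.zkclient.ZkClient, while 0.9.0.0 expects the new kafka.utils.ZkUtils wrapper. A minimal sketch of the two call shapes (the topic name and ZooKeeper settings here are illustrative, not taken from the patch):

```scala
import java.util.Properties

import kafka.admin.AdminUtils
import kafka.utils.ZkUtils

object CreateTopicExample {
  def main(args: Array[String]): Unit = {
    // Kafka 0.8.2.x: AdminUtils took a raw org.I0Itec.zkclient.ZkClient, e.g.
    //   AdminUtils.createTopic(zkClient, "test-topic", 1, 1, new Properties())

    // Kafka 0.9.0.0: the same call now goes through the ZkUtils wrapper.
    val zkUtils = ZkUtils("localhost:2181", 6000, 6000, isZkSecurityEnabled = false)
    try {
      AdminUtils.createTopic(zkUtils, "test-topic", 1, 1, new Properties())
    } finally {
      zkUtils.close()
    }
  }
}
```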

@markgrover markgrover changed the title SPARK-13252: Bump up Kafka to 0.9.0.0 [SPARK-13252] Bump up Kafka to 0.9.0.0 Feb 9, 2016
@markgrover
Member Author

Jenkins, test this please.

@markgrover markgrover changed the title [SPARK-13252] Bump up Kafka to 0.9.0.0 [SPARK-13252] [KAFKA] Bump up Kafka to 0.9.0.0 Feb 9, 2016
@vanzin
Contributor

vanzin commented Feb 9, 2016

Code LGTM.

@markgrover
Member Author

Thanks for reviewing, @vanzin. Much appreciated!

@tdas
Contributor

tdas commented Feb 9, 2016

@markgrover What is the compatibility story of Kafka 0.8 and 0.9 with existing Kafka installations? If the Kafka 0.9 client API is not supposed to be backward compatible with Kafka 0.8 installations, then upgrading Spark Streaming to use 0.9 is not a good idea.

@SparkQA

SparkQA commented Feb 10, 2016

Test build #51000 has finished for PR 11143 at commit f46b5c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@markgrover
Member Author

Hi @tdas! Thanks for reviewing.

I talk about the compatibility story at length in the related PR #10953 (in particular, here)

TL;DR from that comment:
The status quo is supporting only 0.8, which works with 0.8 or later brokers. There is definitely a ton of interest in 0.9's new consumer API, so we will have to support that imminently (#10953). So the question that really remains is: for which Kafka version(s) do we support the old consumer API?

We have two options:
Option 1. Support Apache Kafka 0.9 and later ONLY.
Pros:

  • Easy to manage and easy to support, since there is only one version of Kafka that's supported.
  • No changes required from users to their Spark apps.
  • Kafka community is pushing all their users to move to 0.9.0.

Cons:

  • Users will have to upgrade Kafka brokers to Kafka 0.9

Option 2. Support both Apache Kafka 0.8 and 0.9+.

Pros:

  • Support for both 0.8 and 0.9 brokers

Cons:

  • Build management: we'll likely need two Maven profiles, two assemblies, two builds, and actual code duplication to publish two flavors of the Kafka integration artifacts, plus a decision on which will be our default.
  • We'll have to keep this around for at least another major release, i.e. until Spark 3.0.

Given that we are doing Spark 2.0, where we have the liberty of breaking away from old versions of Kafka, I propose that we go with Option 1 and support only the Kafka 0.9 release. Does that sound reasonable?

@SparkQA

SparkQA commented Feb 10, 2016

Test build #51001 has finished for PR 11143 at commit f46b5c8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 10, 2016

Test build #51004 has finished for PR 11143 at commit a1cf44e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mariobriggs
Contributor

FWIW, the IBM Cloud Message Hub service, which is Kafka, has already moved to 0.9.0, so I support option 1 that @markgrover suggests.

@markgrover
Member Author

Thanks, Mario. Do let us know what you think, @tdas.

@SparkQA

SparkQA commented Feb 26, 2016

Test build #52012 has finished for PR 11143 at commit d3951e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@markgrover
Member Author

Hi @tdas, I'd still appreciate it if you could review this. Thanks!

@srowen
Member

srowen commented Feb 26, 2016

@koeninger may be a better reviewer at this point.
I suspect that 2.0.0 is the only opportunity in the near future to move to 0.9, so I think it's a good idea. Any objections?

@koeninger
Contributor

Kafka isn't just a library dependency that a user can try out a new version of on a particular job and see if it works.

It's an infrastructure component that the entire business may depend on. Upgrading running brokers is not risk-free.

From a user point of view, I think it's better to have an artifact that works with Spark 2.0 and Kafka 0.8 brokers, even if that means moving the existing code to a spark-streaming-kafka08 subproject and doing new development on spark-streaming-kafka for Kafka 0.9 (or, for that matter, Kafka 0.10, which may be out by the time Spark 2.0 is ready).

@markgrover
Member Author

Thanks @srowen and @koeninger for your thoughts.

I did take a look at what other projects are doing. In particular, I looked at Storm and Flume since they both depend on Kafka in ways similar to Spark streaming.

Storm has this PR for bumping to Kafka 0.9 which is doing the same thing I am proposing in this PR - bumping Kafka from 0.8 to 0.9 without keeping support for 0.8.

Flume is doing the same thing in FLUME-2855 - moving to support 0.9 without keeping 0.8 support.

For both Storm and Flume, this change isn't even landing in a major release, as it is in our case.

Long story short, I want to do the right thing for Spark, and I am more than happy to implement the two-subproject approach with Maven profiles: one supporting Kafka 0.8 and the other supporting Kafka 0.9. I also realize that Kafka is not a simple library dependency and there is a good chunk of operational overhead involved in upgrading it. However, the way things are looking right now (perhaps the way the Kafka community wanted them to look; cc @jkreps @gwenshap), folks will essentially have to move to Kafka 0.9 anyway, because one of their other Kafka apps may need it. The reason I raised the discussion of getting rid of Kafka 0.8 support is that we will have to carry the burden of supporting the Kafka 0.8.x line for the entire Spark 2.x line, and now is our chance to think this through and drop support for Kafka 0.8, if we decide to.
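For reference, a hypothetical sketch of that two-subproject layout, written sbt-style (Spark builds with both Maven and sbt; the module names, directories, and versions here are illustrative, not a final layout):

```scala
// Hypothetical build.sbt sketch of the two-subproject approach.
// Module names, paths, and Kafka versions are illustrative only.
lazy val streamingKafka08 = (project in file("external/kafka-0-8"))
  .settings(
    name := "spark-streaming-kafka-0-8",
    libraryDependencies += "org.apache.kafka" %% "kafka" % "0.8.2.1"
  )

lazy val streamingKafka09 = (project in file("external/kafka-0-9"))
  .settings(
    name := "spark-streaming-kafka-0-9",
    libraryDependencies += "org.apache.kafka" %% "kafka" % "0.9.0.0"
  )
```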

@mbonaci
Contributor

mbonaci commented Mar 1, 2016

IMHO option 1 is cleaner.
Users who have to postpone upgrading Kafka brokers have the option of using older versions of Spark.

When Kafka went from 0.8 to 0.9, they did not leave the old high- and low-level consumer APIs around just because there are users who use them. If you want to upgrade to Kafka 0.9, you'll have to upgrade brokers and refactor your consumer clients to work with the 0.9 API, simple as that.

@ijuma
Contributor

ijuma commented Mar 1, 2016

"When Kafka went from 0.8 to 0.9, they did not leave the old high and low level consumer APIs around just because there are users that use them."

Kafka 0.9 still has the old high and low level consumers (and the old producers too).

@markgrover
Member Author

Kafka 0.9 still has the old high and low level consumers (and the old producers too).

Correct, but they are not compatible when using the 0.9 client with 0.8 brokers. So your Kafka client app may need no code changes if you are using the old consumer API, but you'd still have to upgrade your brokers before you upgrade your clients.

Anyways, since @koeninger and @tdas don't feel comfortable with my proposal to upgrade to Kafka 0.9 (and drop support for 0.8), I am working on updating this PR to support both 0.8 and 0.9. It's not just a simple Maven profile, because we use some non-public APIs (in AdminUtils, ZkClient, etc.) which have changed between 0.8 and 0.9, so I have to either (a) write a shim layer, or (b) change the code so it doesn't depend on those APIs. Will keep you all posted here.
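To illustrate option (a), a minimal sketch of what such a shim could look like (the trait and class names are hypothetical, and the real change may end up dropping the dependency on these APIs instead):

```scala
import java.util.Properties

// Hypothetical shim isolating the AdminUtils/ZkUtils differences between
// Kafka 0.8 and 0.9; each build profile would compile exactly one
// concrete implementation against its Kafka version.
trait KafkaAdminShim {
  def createTopic(topic: String, partitions: Int, replicationFactor: Int): Unit
}

// Compiled only under the Kafka 0.9 profile.
class Kafka09AdminShim(zkConnect: String) extends KafkaAdminShim {
  import kafka.admin.AdminUtils
  import kafka.utils.ZkUtils

  override def createTopic(topic: String, partitions: Int,
      replicationFactor: Int): Unit = {
    val zkUtils = ZkUtils(zkConnect, 6000, 6000, isZkSecurityEnabled = false)
    try {
      AdminUtils.createTopic(zkUtils, topic, partitions, replicationFactor,
        new Properties())
    } finally {
      zkUtils.close()
    }
  }
}
```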

@ijuma
Contributor

ijuma commented Mar 1, 2016

That's right @markgrover, the current approach used by Kafka preserves compatibility for users, but makes it a bit complicated for libraries/systems that want to support newer client features while still supporting older Kafka brokers.

@gwenshap

gwenshap commented Mar 1, 2016

Agreeing with @mbonaci. Users who delay upgrading their Kafka brokers can continue using older versions of Spark.

@markgrover If you have code that depends on AdminUtils, perhaps you will want to delay the version upgrade until Grant Henke's patches adding Admin APIs are in; then at least you'll have a path for moving off the semi-private APIs :)

@koeninger
Contributor

I really don't think it makes sense to discuss this PR outside of the context of SPARK-12177 and the approach taken for supporting the new consumer. Merging it as-is would break things for end users without giving them any new features as a benefit, and nothing in Spark 2.0 that I'm aware of otherwise requires the Kafka version bump.

If SPARK-12177 ends up being implemented as a separate artifact, this PR isn't even necessary. If it ends up being implemented in the same artifact and breaking 0.8 compatibility, this PR is a small drop in the bucket compared to the overall code change and could be handled at that time.

Regarding the semi-private APIs, those are either in testing code or in the receiver-based implementation, which (so far) no one has shown much interest in updating for the new consumer.

@mbonaci
Contributor

mbonaci commented Mar 1, 2016

@ijuma My bad, I was looking for SimpleConsumer in the client API, totally forgetting that it's part of the core.

@markgrover markgrover closed this Mar 2, 2016
@markgrover
Member Author

This discussion is very closely tied to SPARK-12177, so I agree with @koeninger that it makes sense for this to be decided in relation to that JIRA. For those interested, please follow SPARK-12177 and its related PR #10953.
