
Consider switching default Kafka version #1787

Closed
stanislavkozlovski opened this issue Aug 21, 2020 · 6 comments · Fixed by #1791
Comments

@stanislavkozlovski
Contributor

It seems like Sarama defaults to using Kafka version 0.8.2.

While I do realize that removing the dependence on Config.Version is on the roadmap for v2, I believe it also makes sense to bump the default Kafka version. 0.8.2 is a five-year-old release at this point.

The problem with defaulting to older versions when the Kafka cluster is a newer version is manifold:

  • it requires Kafka to downconvert consumer fetch responses (e.g. if a newly-configured client produced messages and a default Sarama client is reading them). This hurts Kafka's performance, because downconversion costs additional CPU — especially when compression is enabled, at which point the broker must decompress, downconvert, and recompress each batch. Downconversion also defeats zero-copy transfers, because the fetch response now has to be copied into memory
  • it requires Kafka to upconvert the messages in produce requests
  • legacy versions are, in general, hard to support. While the latest Kafka still supports the oldest protocol versions, it is in the project's best interest to deprecate and eventually drop support for them. Otherwise maintenance gets harder, the test matrix grows, and new features must work around old-version limitations (no idempotency, no exactly-once semantics). It is easier for Kafka to deprecate and drop support when its ecosystem has already done so

Other Kafka clients have been dropping support for legacy versions, mainly due to the lack of certain protocol features, such as the CreateTopics request or request headers. For example, Kafka Streams has required a Kafka 0.11 broker as its minimum ever since version 2.2.1.

I notice that the README says:

Sarama provides a "2 releases + 2 months" compatibility guarantee: we support the two latest stable releases of Kafka and Go, and we provide a two month grace period for older releases.

So bumping to a newer version, with 0.11 at the minimum, seems like a reasonable approach to me.
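For context, overriding the default today looks roughly like this — a minimal sketch against Sarama's public Config type; the choice of sarama.V0_11_0_0 as the target reflects this proposal, not current library behavior:

```go
package main

import "github.com/IBM/sarama"

func main() {
	config := sarama.NewConfig()
	// If Config.Version is left unset, Sarama currently falls back to
	// its 0.8.2 default, triggering the up/downconversion costs above.
	config.Version = sarama.V0_11_0_0 // the minimum proposed in this issue

	// config would then be passed to sarama.NewSyncProducer,
	// sarama.NewConsumerGroup, etc.
	_ = config
}
```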

@d1egoaz
Contributor

d1egoaz commented Aug 25, 2020

This makes a lot of sense @stanislavkozlovski

For this same reason, we internally use a wrapper on top of Sarama that sets the client version to match our current Kafka cluster version.

0.11.0 looks like a good general version for Sarama, but since Sarama doesn't need to care about Kafka Streams, even 0.10.x might be good enough.

@bai @dnwe @varun06 thoughts?

@ijuma

ijuma commented Aug 25, 2020

0.11.0 is important for the reasons mentioned above. It enables message format 2 which avoids expensive upconversion/downconversion. Message format 2 is also required for correctness (see KIP-101).

@ijuma

ijuma commented Aug 25, 2020

It's worth noting that 1.0 is nearly 3 years old at this point, so it could be a good default and one that's easy to explain.

@varun06
Contributor

varun06 commented Aug 25, 2020

I agree, it's about time to bump up the default version. @stanislavkozlovski can you please do a PR for this? If not, I will get one out ASAP.

@twmb

twmb commented Aug 25, 2020

IMO, Sarama should default to at least 0.11.0 and use ApiVersions to negotiate up. Users should put in a max version they want to allow (to avoid issues around rolling kafka upgrades).

1.0.0 may be a good default. Sarama currently requires a maximum of one concurrent request when idempotency is enabled, but that restriction only applies on 0.11.0. On 1.0.0+, up to 5 concurrent produce requests are allowed, which increases throughput when idempotency is enabled.
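Sketching that in Sarama's config (these field names exist on Sarama's Config struct; the pairing of values simply mirrors this comment, so treat it as illustrative rather than required):

```go
package main

import "github.com/IBM/sarama"

func main() {
	config := sarama.NewConfig()
	config.Producer.Idempotent = true
	config.Producer.RequiredAcks = sarama.WaitForAll // idempotency needs acks=all

	// Against an 0.11.0 broker, the idempotent producer must serialize
	// its requests:
	config.Version = sarama.V0_11_0_0
	config.Net.MaxOpenRequests = 1

	// Against a 1.0.0+ broker, up to 5 in-flight produce requests
	// would be safe, which is why 1.0.0 makes a better default here.
	_ = config
}
```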

Actually, if an option for the maximum version to support is added, then Sarama may as well always default to the latest version it supports. Users can override it downwards if they need to, e.g. to avoid rolling-upgrade issues.
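The negotiation idea can be sketched in plain Go — note this is a hypothetical helper illustrating the logic, not Sarama's API: take the highest version both sides support, never exceeding the user-configured cap.

```go
package main

import "fmt"

// negotiate returns the highest protocol version supported by both the
// client and the broker, capped by the user-configured maximum.
// Illustrative only; not part of Sarama.
func negotiate(clientMax, brokerMax, userCap int) int {
	v := clientMax
	if brokerMax < v {
		v = brokerMax
	}
	if userCap < v {
		v = userCap
	}
	return v
}

func main() {
	// During a rolling upgrade, an older broker limits the version:
	fmt.Println(negotiate(11, 8, 10)) // 8
	// Otherwise the user's cap keeps the client conservative:
	fmt.Println(negotiate(11, 11, 9)) // 9
}
```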

@stanislavkozlovski
Contributor Author

stanislavkozlovski commented Aug 26, 2020

Sarama may as well always default to the latest version it supports

@twmb
I like that idea. This change may have wider implications, so it may be worth the Sarama team considering which release is the right place to introduce it (a minor release at least, presumably).

@stanislavkozlovski can you please do a PR for this, if not, i will get one out asap?

Will do! Thanks.

rkruze added a commit to rkruze/cockroach that referenced this issue May 11, 2021
Previously the Kafka sink didn't specify a version to sarama, so it
defaulted to the minimum version of 0.8.2. This behavior has been
changed in the latest version of sarama to use version 1.0 of the
Kafka API. Information on why this change was made by sarama can be
found here:
IBM/sarama#1787

Resolves cockroachdb#64842

Release note (enterprise change): Set the Kafka CDC sink to use the 1.0
version of the Kafka API.