[SPARK-36837][BUILD] Upgrade Kafka to 3.1.0 #34089
Conversation
Personally, I'm in favor of holding off on a major-version upgrade until a couple of bugfix releases based on that major version are out. There are around six months until Spark 3.3.0 is released, and we can let early adopters experiment with Kafka 3.0.0 (or even 3.0.x) clients in the meantime.
Yep, I'm also considering waiting for Kafka 3.0.1 due to KAFKA-13322, @HeartSaVioR.
This PR is a draft because we are waiting for the Apache Kafka 3.0.1 release.
This PR is ready for review. Could you review this, please, @viirya, @HeartSaVioR, @HyukjinKwon?
Also, cc @LuciferYang for Java 17.
props.put("host.name", "127.0.0.1")
props.put("advertised.host.name", "127.0.0.1")
props.put("port", brokerPort.toString)
KAFKA-12945, right?
Yes, right!
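For reference, KAFKA-12945 removed the deprecated `host.name`, `advertised.host.name`, `port`, and `advertised.port` broker properties in Kafka 3.0, so test-broker setup has to move to the `listeners` form. A minimal sketch of the replacement keys (the class and method names here are illustrative, not Spark's actual code):

```java
import java.util.Properties;

public class BrokerListenerConfig {
    // Kafka 3.0 (KAFKA-12945) dropped host.name / advertised.host.name /
    // port / advertised.port; the listeners form carries the same
    // host and port information in a single property.
    static Properties brokerProps(int brokerPort) {
        Properties props = new Properties();
        props.put("listeners", "PLAINTEXT://127.0.0.1:" + brokerPort);
        props.put("advertised.listeners", "PLAINTEXT://127.0.0.1:" + brokerPort);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(brokerProps(9092).getProperty("listeners"));
    }
}
```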
Looks okay to me. Maybe @HeartSaVioR or @HyukjinKwon can take a look too.
Thank you, @viirya. Sure, I'll keep this PR open for a while.
Also, cc @LuciferYang for Java 17.
Thanks for pinging me. I manually ran the Maven tests for `kafka-0-10-token-provider`, `kafka-0-10`, and `kafka-0-10-sql` using Java 17, and all the tests passed.
LGTM +1
Thank you, @LuciferYang.
Merged to master for Apache Spark 3.3.
As you suggested here, could you initiate a discussion email on the dev mailing list including your suggestions (#34089 (comment)), @HeartSaVioR?
I don't think this is the right process. I shouldn't be the one who has to prove the possible risks of someone else's proposal. If someone introduces a change (a PR is basically a "proposal"), it's up to that person to explain the benefits and risks, and to persuade the community to accept the change despite the possible risks. I (as a committer) can ask questions, and reject the proposal if it doesn't make sense. What I have been asking about should have been figured out by ourselves and addressed prior to merging this. Does that make sense? That said, I can initiate the discussion thread, but if I do, it's on behalf of the authors and approvers of this PR, not a duty of my own.
@ijuma
Again, I have been saying that we haven't constructed a good process for upgrading dependencies (especially across major versions). I'm not inclined to blame anyone, since the process for merging this PR was entirely valid in terms of the bylaws. I'm blaming the process.
I think this discussion has been a bit confusing because it started as a stability concern and then moved to compatibility, so let's address them separately. On the stability front, 3.0.1 and 3.1.1 should both be stable, and likely more stable than older releases. We haven't made any significant architectural changes in 3.x, and a bunch of bugs have been fixed. On the compatibility front, there are two potential issues I can think of:
If the expectation for Spark users is that a minor release is a drop-in replacement and no action is expected from users, then I agree that the above poses a problem. The approach to these things varies from project to project. Since it takes a long time to go from one major version to another, the bar is usually a little lower. For example, Kafka dropped support for Scala 2.11 and added support for ZooKeeper 3.5.7 in Apache Kafka 2.5. We did a KIP for the former (the Scala 2.11 change), but not for the latter (the ZK 3.5.x upgrade), as our analysis indicated that it was a compatible upgrade given how Kafka uses ZK. Even though it was a .7 release, that ZK version still had a bunch of critical bugs in it, but they would probably not have been found and fixed if we had delayed adoption.
Thank you for your patience, @ijuma. I'm also still confused here. To @HeartSaVioR, I'm not sure what you are referring to in the following.
To be clear,
For the configuration changes, we can handle them as a documentation-level effort.
Thanks for the detailed input, @ijuma.
It seems this is not a drop-in replacement, which remains my concern about the upgrade. We are releasing a new minor version, NOT a new major version, and end users can reasonably expect upgrading to be a drop-in replacement. From what I understand, the Spark community tries very hard NOT to bring behavioral changes into minor versions, to respect semantic versioning. That is why, whenever we bring in a change, we add a safeguard config whose default value keeps the existing behavior in many cases, unless we are releasing a major version or the change fixes a correctness issue. When we evaluate bumping a dependency to a new major version, we must be fully aware of the breaking changes the bump will bring. The breaking changes in configuration are one of the things we totally missed; we didn't know about them until @ijuma explained them to us. This is a huge hole in the process. We say we can fix that, but what if I hadn't been here and we had just let it go? After this PR, we force our users onto Kafka client 3.1 (end users won't know about the change until they look into their new dependency tree), without knowing the details of what benefits they will gain and what risks they will also take on. This is a non-trivial problem.
This is a good example: the ZK 3.5.x upgrade didn't go through a KIP (in other words, dependency upgrades tend to go through KIPs) because the analysis indicated it was a compatible upgrade given how Kafka uses ZK. Should we do such an analysis and share the result before moving on, instead of simply checking with existing tests and saying "OK, we are good to go"? The testing in staging/production should have been done before merging, because post-review is NOT a "comfortable" process, and the PR author is also NOT comfortable receiving post-review comments. "Let's do this first and leave the rest to post-review, because we can always revert it" works technically, but not everyone is comfortable with it.
Is that true? I don't think you can expect that with Apache Spark. You need to rebuild your apps with Apache Spark 3.3 artifacts.
Yes, users may have to rebuild their apps, but they don't expect "runtime" issues or "runtime" breaking changes. They simply upgrade the version and try to build, and if the build succeeds, they will say "OK, it's good".
So, if you hit any issue with Apache Kafka 3.1 on the Apache Spark 3.3 (master) branch, could you elaborate specifically?
It is not going to be productive if we are just defending a change that is already done; I'm not in favor of post-reviewing precisely because of this. The fact is, no one knew about the breaking changes. The analysis was done at a very high level, but we soon figured out there was more, thanks to folks in the Kafka community. Whether the breaking changes are minor or not is not the point; we should have tried to find all of them and evaluate the risks before moving on. We would be much more confident if we had a requirement to consult with the Kafka community on version upgrades. Whether we enforce this for minor-version upgrades or only major-version upgrades is another story. This is great in terms of stability, but there is no silver bullet; it's a trade-off between possible data loss and performance. I see the conclusion in the analysis, "We don't understand the behavior of acks=all and acks=1 across different workloads and across the entire latency spectrum. We should leave the default as is.", and yet the default has changed.
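The default change being discussed is KIP-679, which flipped the Kafka 3.0 producer defaults to `acks=all` and `enable.idempotence=true`. As an illustration (not Spark's code), an application that depended on the old defaults can pin them explicitly in its producer configuration:

```java
import java.util.Properties;

public class ProducerDefaultsPin {
    public static void main(String[] args) {
        Properties props = new Properties();
        // KIP-679 (Kafka 3.0) changed the producer defaults to acks=all and
        // enable.idempotence=true. Setting both explicitly preserves the
        // pre-3.0 behavior for applications that depended on acks=1.
        props.put("acks", "1");
        props.put("enable.idempotence", "false");
        System.out.println(props.getProperty("acks") + ","
            + props.getProperty("enable.idempotence"));
    }
}
```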
It seems that you are using
When Apache Spark changes a default configuration, we write a migration guide entry for it; we can add this to our migration guide. In addition, we can override it from the Spark side, like we did for Hadoop with
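One existing hook for such overrides is that Spark's Kafka connector forwards any source/sink option prefixed with `kafka.` to the underlying client with the prefix stripped. A rough sketch of that pass-through behavior (this is an illustration of the mechanism, not Spark's actual implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaOptionPassThrough {
    // Illustrative sketch: options prefixed with "kafka." are forwarded
    // to the Kafka client with the prefix removed; other options are
    // consumed by Spark itself.
    static Map<String, String> toKafkaParams(Map<String, String> options) {
        Map<String, String> kafkaParams = new HashMap<>();
        for (Map.Entry<String, String> e : options.entrySet()) {
            if (e.getKey().startsWith("kafka.")) {
                kafkaParams.put(e.getKey().substring("kafka.".length()), e.getValue());
            }
        }
        return kafkaParams;
    }

    public static void main(String[] args) {
        Map<String, String> opts = new HashMap<>();
        opts.put("kafka.acks", "1");  // forwarded to the client as acks=1
        opts.put("topic", "events");  // Spark-level option, not forwarded
        System.out.println(toKafkaParams(opts));
    }
}
```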
Anything else you want to add?
I'm sorry, but again, this isn't my job, because I'm in favor of leaving the version as it is. Does Kafka have a migration guide from Kafka 2 to 3? If they provide one, could you please go through it with a thoughtful review? If they don't, could we please work out how we can ensure we catch all the breaking changes?
Well, when I read them (3.0/3.1) before, they were mostly about the Kafka server side. Please don't assume that I didn't read them.
Again, please don't underestimate or look down on other community members' efforts. We may make mistakes, but we work together to make progress instead of sitting on old versions forever.
I don't have enough background on Kafka, but here is a summary of what I understood from reading the comments here:
For the former, this PR was reviewed by multiple committers when it was merged, so I don't think there is a particular problem. Since there were a couple of post-reviews and concerns here (#34089 (comment) and #34089 (comment)), it might be great to have some feedback from Kafka maintainer(s) about the upgrade, stability, and safety.
Thank you, @HyukjinKwon. Yes, we asked here (#34089 (comment)) and got some answers (#34089 (comment) and #34089 (comment)) from Kafka maintainers.
I'm expecting more documentation from the Spark side and Kafka 3.1.1 from the Kafka side (in early March).
I don't think the Kafka community would have been pulled in to ask about the migration details if I hadn't raised the concern, but it was addressed either way, so fair enough. I noticed that KIP-679 is not mentioned at all in the "Notable changes" section for Kafka 3.0, which makes me less comfortable relying solely on the notable changes; but we would never have figured this out if we hadn't heard from the Kafka community and I hadn't looked into the details, so I'll chalk this up as "I wasn't aware of it". It's OK with me not to mention it, since the Kafka community didn't list it as one of the major changes; I'll consider myself wrong about its importance. As a general comment, since Kafka 3.0 brings breaking changes, I'd document the version change in the SS migration guide (we didn't), link to the notable changes in the Kafka docs, and explicitly say "please contact the Kafka community for details of the changes". At the very least, this helps us route end users' concerns to the Kafka community, keeps us transparent about Kafka's changes, and positions us as just one consumer of the Kafka client. I don't see anyone actively participating in both communities, so this seems to be the only valid strategy we can take right now. Another general comment: I'd make sure we guarantee downgrading to Kafka 2.8.1 (at runtime, or even with a different set of artifacts) and give end users the freedom to choose between Kafka 2 and 3. We simply assume there is no demand for end users to stick with Kafka 2.8.1. If we were providing separate Kafka data source artifacts for Kafka 3, I wouldn't be concerned at all. It's up to everyone else to take my general comments or ignore them; I wouldn't mind at all.
This oversight was fixed recently; the website will be updated soon. Like software, documentation can have bugs on occasion. Generally, it's fine to rely on the notable changes section. I am also happy to provide input directly (as I did here) whenever it's helpful for the Spark community.
Thank you as always, @ijuma. As a member of the Apache Spark PMC, I've always been grateful for your help in JIRA and PRs.
@ijuma Thanks for the update. Your active feedback is much appreciated.
EDIT: I had to look "above" the line. My bad.
@dongjoon-hyun FYI, @tombentley has volunteered to be the release manager for Apache Kafka 3.1.1: https://lists.apache.org/thread/zw0g8ksxhvwtv1jjcv0c33rxs0l8qs81
Thank you for informing us, @ijuma!
…nding configuration issue

### What changes were proposed in this pull request?

This PR addresses a test flakiness issue in Kafka connector RDD suites. #34089 (review) (Spark 3.4.0) upgraded Spark to Kafka 3.1.0, which requires a different configuration key for configuring the broker listening port. That PR updated the `KafkaTestUtils.scala` used in SQL tests, but there's a near-duplicate of that code in a different `KafkaTestUtils.scala` used by the RDD API suites, which wasn't updated. As a result, the RDD suites began using Kafka's default port 9092, and this results in flakiness as multiple concurrent suites hit port conflicts when trying to bind to that default port. This PR fixes that by simply copying the updated configuration from the SQL copy of `KafkaTestUtils.scala`.

### Why are the changes needed?

Fix test flakiness due to port conflicts.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Ran 20 concurrent copies of `org.apache.spark.streaming.kafka010.JavaKafkaRDDSuite` in my CI environment and confirmed that this PR's changes resolve the test flakiness.

Closes #41095 from JoshRosen/update-kafka-test-utils-to-fix-port-binding-flakiness.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
This PR aims to upgrade Apache Kafka client library from 2.8.1 to 3.1.0 to support Java 17 officially.
Why are the changes needed?
Apache Kafka 3.1.0 has the following improvements and bug fixes, including on the client side.
The following are the notable accumulated breaking changes in Apache Kafka 3.0+.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Pass the CIs.