
[SPARK-36837][BUILD] Upgrade Kafka to 3.1.0 #34089

Closed

wants to merge 1 commit into from

Conversation

@dongjoon-hyun (Member) commented Sep 24, 2021

What changes were proposed in this pull request?

This PR aims to upgrade the Apache Kafka client library from 2.8.1 to 3.1.0 in order to officially support Java 17.

Why are the changes needed?

Apache Kafka 3.1.0 ships the following improvements and bug fixes, including on the client side.

The following are the notable breaking changes accumulated in Apache Kafka 3.0+.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.


@HeartSaVioR (Contributor)

Personally, I'm in favor of holding off on a major-version upgrade until a couple of bugfix releases based on that major version are out. There are around 6 months until Spark 3.3.0 is released, and we can let early adopters experiment with Kafka 3.0.0 (or even 3.0.x) clients in the meanwhile.

@dongjoon-hyun (Member, Author)

Yep, I'm also considering waiting for Kafka 3.0.1 due to KAFKA-13322, @HeartSaVioR.
(Thanks, @dongjinleekr)

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-36837][BUILD] Upgrade Kafka to 3.0.0 [SPARK-36837][BUILD] Upgrade Kafka to 3.0.1 Sep 25, 2021

@dongjoon-hyun (Member, Author)

This PR is a draft because we are waiting for Apache Kafka 3.0.1 release.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-36837][BUILD] Upgrade Kafka to 3.0.1 [SPARK-36837][BUILD] Upgrade Kafka to 3.1.0 Jan 25, 2022
@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review January 27, 2022 00:59
@dongjoon-hyun dongjoon-hyun marked this pull request as draft January 27, 2022 05:08
@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review February 4, 2022 03:52
@dongjoon-hyun (Member, Author)

This PR is ready for review. Could you review this, please, @viirya, @HeartSaVioR, @HyukjinKwon?

@dongjoon-hyun (Member, Author)

Also, cc @LuciferYang for Java 17.

Comment on lines -491 to -493
props.put("host.name", "127.0.0.1")
props.put("advertised.host.name", "127.0.0.1")
props.put("port", brokerPort.toString)
Member

KAFKA-12945, right?

Member Author

Yes, right!
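
For context, KAFKA-12945 removed the long-deprecated `host.name`, `advertised.host.name`, and `port` broker properties, which is why these lines had to go. Below is a minimal sketch of the listener-style keys that replace them; this is an illustration rather than the PR's exact diff, and `brokerPort` stands in for whatever port the test suite picks.

```scala
import java.util.Properties

// Hypothetical stand-in for the port chosen by the test suite.
val brokerPort: Int = 9093

val props = new Properties()
// Kafka 3.0 dropped host.name / advertised.host.name / port (KAFKA-12945);
// the listener-based keys express the same bind address and advertised address.
props.put("listeners", s"PLAINTEXT://127.0.0.1:$brokerPort")
props.put("advertised.listeners", s"PLAINTEXT://127.0.0.1:$brokerPort")
```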

@viirya (Member) left a comment

Looks okay to me. Maybe @HeartSaVioR or @HyukjinKwon can take a look too.

@dongjoon-hyun (Member, Author)

Thank you, @viirya. Sure, I'll keep this PR open for a while.

@LuciferYang (Contributor) left a comment

> Also, cc @LuciferYang for Java 17.

Thanks for pinging me. I manually ran the mvn tests for kafka-0-10-token-provider, kafka-0-10, and kafka-0-10-sql using Java 17, and all the tests passed.

LGTM +1

@dongjoon-hyun (Member, Author)

Thank you, @LuciferYang .

@dongjoon-hyun (Member, Author)

Merged to master for Apache Spark 3.3.

@dongjoon-hyun (Member, Author)

As you suggested here, could you initiate a discussion email on the dev mailing list including your suggestions (#34089 (comment)), @HeartSaVioR?

@HeartSaVioR (Contributor) commented Feb 10, 2022

I don't think this is the right process. I shouldn't be the one who has to prove the possible risks of someone else's proposal. If someone introduces a change (a PR is basically a "proposal"), it's up to that person to explain the benefits and risks and to persuade the community to accept the change despite the possible risks. I (as a committer) can ask questions, and reject the proposal if it doesn't make sense.

What I have been asking about should have been figured out by ourselves and addressed prior to merging this. Does that make sense?

That said, I can initiate the discussion thread, but if I do, it is on behalf of the authors and approvers of this PR, not a duty of my own.

@HeartSaVioR (Contributor) commented Feb 10, 2022

@ijuma
An easier way for you to understand this situation: suppose ZooKeeper released its 4.0 version a few weeks ago (let's pretend ZK 4.0 is compatible with 3.x in terms of client communication), and Kafka is going to adopt it in Kafka 3.2. What would the process be in the Kafka community for this change? Would it require a KIP, a discussion thread, or nothing but a PR?

@HeartSaVioR (Contributor) commented Feb 10, 2022

Again, I have been saying that we haven't constructed a good process for upgrading dependencies (especially across a "major version"). I am not inclined to blame anyone, since merging this PR was done in a totally valid way in terms of the BYLAWS. I'm blaming the process.

@ijuma (Contributor) commented Feb 10, 2022

I think this discussion has been a bit confusing because it started as a stability concern and then moved to compatibility. So, let's address them separately.

On the stability front, 3.0.1 and 3.1.1 should both be stable, and likely more stable than older releases. We haven't made any significant architectural changes in 3.x, and a bunch of bugs have been fixed.

On the compatibility front, there are two potential issues I can think of:

  1. Default configuration changes. This KIP has the details of the potential compatibility issues. Some of them are edge cases, but the issue with old Kafka clusters may not be so uncommon. There is an easy way to avoid this issue, but it does require the customer to set a config.

  2. Binary compatibility issues if an application uses both Spark and Kafka clients directly and they use one of the removed methods. This is somewhat unlikely, but including for completeness.

If the expectation for Spark users is that a minor release is a drop-in replacement and no action is expected from users, then I agree that the above poses a problem. The approach when it comes to these things varies from project to project. Since it takes a long time to go from one major version to another, the bar is usually a little lower.

For example, Kafka dropped support for Scala 2.11 and added support for ZK 3.5.7 in Apache Kafka 2.5. We did a KIP for the former (dropping Scala 2.11) but did not do a KIP for the latter (the ZK 3.5.x upgrade), as our analysis indicated that it was a compatible upgrade given how Kafka uses ZK. Even though it was a .7 release, that ZK version still had a bunch of critical bugs in it, but they would probably not have been found and fixed if we had delayed adoption.

@dongjoon-hyun (Member, Author) commented Feb 10, 2022

Thank you for your patience, @ijuma. I'm also still confused here.

To @HeartSaVioR: I'm not sure what you mean by the following.

> Again, I have been saying that we haven't constructed a good process for upgrading dependencies (especially across a "major version").

To be clear,

  • I've been working on this since September 2021 and have been trying to respect all community opinions, including yours, so far.
  • During that period, of course I also tested this PR with Apache Kafka 2.8.1, and from my perspective there was no user-facing API breaking change. So, I merged this after getting the official approval.
  • Also, as I mentioned here, there is no harm in us testing Apache Kafka 3.1, and this is not a one-way decision because we can still revert it in branch-3.3. I'd love to receive more constructive feedback, such as specific bug reports via JIRA, so that we can fix them together and move forward.

For the configuration changes, we can handle them as a documentation-level effort.
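
For illustration, here is a hedged sketch of what such documentation could suggest: pinning the pre-3.0 producer defaults from the application side through Spark's `kafka.`-prefixed options. The broker address, topic, and checkpoint path below are hypothetical; the two overridden keys are the standard Kafka producer configs whose defaults KIP-679 changed.

```scala
import org.apache.spark.sql.SparkSession

object PinPre30ProducerDefaults {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-sink-sketch").getOrCreate()

    val query = spark.readStream
      .format("rate")                                // toy source, just to have a stream
      .load()
      .selectExpr("CAST(value AS STRING) AS value")  // the Kafka sink expects a string/binary `value` column
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")    // hypothetical broker
      .option("topic", "events")                               // hypothetical topic
      // Kafka 3.0 (KIP-679) flipped the producer defaults to enable.idempotence=true / acks=all.
      // Setting them explicitly restores the 2.8.x behavior; kafka.-prefixed options are
      // passed straight through to the underlying producer.
      .option("kafka.enable.idempotence", "false")
      .option("kafka.acks", "1")
      .option("checkpointLocation", "/tmp/kafka-sink-sketch")  // hypothetical path
      .start()

    query.awaitTermination()
  }
}
```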

@HeartSaVioR (Contributor) commented Feb 11, 2022

Thanks for the detailed input, @ijuma.

> If the expectation for Spark users is that a minor release is a drop-in replacement and no action is expected from users, then I agree that the above poses a problem.

It seems like it is not a drop-in replacement, which remains my concern about the upgrade.

@dongjoon-hyun

We are releasing a new minor version, NOT a new major version, and end users will easily expect that upgrading is a "drop-in replacement". From what I understand, the Spark community works very hard NOT to introduce behavioral changes in a minor version, in order to respect semantic versioning. That is why, whenever we bring in such a change, we guard it behind a config whose default value keeps the existing behavior in many cases, unless we are releasing a major version or the change fixes a correctness issue.

When we evaluate bumping a dependency to a new major version, we must be fully aware of the breaking changes the bump will bring us. The breaking changes to configs are one of the things we totally missed. We didn't know about this until @ijuma explained it to us. This is a huge hole in the process. We are saying we can fix that, but what if I hadn't been here and we had just let it go?

After this PR, we force our users to use the Kafka 3.1 client (end users won't know about the change until they look into their new dependency tree), without knowing the details of what benefits they gain and what possible risks they also take on. This is a non-trivial problem.

> For example, Kafka dropped support for Scala 2.11 and added support for ZK 3.5.7 in Apache Kafka 2.5. We did a KIP for the former (dropping Scala 2.11) but did not do a KIP for the latter (the ZK 3.5.x upgrade), as our analysis indicated that it was a compatible upgrade given how Kafka uses ZK. Even though it was a .7 release, that ZK version still had a bunch of critical bugs in it, but they would probably not have been found and fixed if we had delayed adoption.

This is a good example: the ZK 3.5.x upgrade didn't go through a KIP (in other words, a dependency upgrade tends to go through a KIP) because the analysis indicated that it was a compatible upgrade given how Kafka uses ZK. Shouldn't we do such an analysis and share the result before moving on, instead of simply checking with the existing tests and saying "OK, we are good to go"?

The testing in staging/production should have been done before merging, because post-review is NOT a "comfortable" activity, and the PR author is also NOT that comfortable receiving post-review comments. "Let us do this first and leave the rest to post-review because we can always revert it" works technically, but not everyone is comfortable with this.

@dongjoon-hyun (Member, Author)

Is that true? I don't think you can expect that with Apache Spark. You need to rebuild your apps with the Apache Spark 3.3 artifacts.

> We are releasing a new minor version, NOT a new major version, and end users will easily expect that upgrading is a "drop-in replacement".

@HeartSaVioR (Contributor)

Yes, users may have to rebuild their apps, but they don't expect "runtime" issues or "runtime" breaking changes. They simply upgrade the version and try to build, and if the build succeeds, they will say "OK, it's good".

@dongjoon-hyun (Member, Author) commented Feb 11, 2022

So, if you hit any issue with Apache Kafka 3.1 on the Apache Spark 3.3 (master) branch, could you elaborate specifically?

> They don't expect "runtime" issues or "runtime" breaking changes.

@HeartSaVioR (Contributor) commented Feb 11, 2022

It is not going to be productive if we just defend a change that has already been made. This is exactly why I'm not in favor of post-review.

The fact is, no one knew about the breaking changes. The analysis was done at a very high level, but we soon figured out there are more, thanks to folks in the Kafka community. It is not important whether the breaking changes are minor or not. We had to try to find everything and evaluate the risks before moving on. We would be much more confident if we had a requirement to consult with the Kafka community when upgrading versions. Whether we enforce this for minor version upgrades or only for major version upgrades is another story.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-679%3A+Producer+will+enable+the+strongest+delivery+guarantee+by+default

This is great in terms of stability, but there is no silver bullet; it is a trade-off between possible data loss and performance.

https://cwiki.apache.org/confluence/display/KAFKA/An+analysis+of+the+impact+of+max.in.flight.requests.per.connection+and+acks+on+Producer+performance

I see the conclusion in the analysis, "We don't understand the behavior of acks=all and acks=1 across different workloads and across the entire latency spectrum. We should leave the default as is.", and yet the default has changed.

@dongjoon-hyun (Member, Author) commented Feb 11, 2022

It seems that you are using the term "breaking change" in a broader sense.

When Apache Spark changes a default configuration, we write a migration guide entry for it. We can add this to our migration guide. In addition, we can override the default from the Spark side, as we did for Hadoop with spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.

> I see the conclusion in the analysis, "We don't understand the behavior of acks=all and acks=1 across different workloads and across the entire latency spectrum. We should leave the default as is.", and yet the default has changed.

Anything else you want to add?

@HeartSaVioR (Contributor) commented Feb 11, 2022

I'm sorry, but again, this is not my responsibility, because I'm in favor of leaving the version as it is.

Does Kafka have a migration guide from Kafka 2 to 3? If they provide one, could you please go through it with a thoughtful review? If they don't, could we please establish a way to ensure we surface everything about the breaking changes?

@dongjoon-hyun (Member, Author) commented Feb 11, 2022

Well, when I read them (3.0/3.1) before, they were mostly about the Kafka server side. Please don't assume that I didn't read them.

> I'm sorry, but again, this is not my responsibility, because I'm in favor of leaving the version as it is.
>
> Does Kafka have a migration guide from Kafka 2 to 3? If they provide one, could you please go through it with a thoughtful review? If they don't, could we please establish a way to ensure we surface everything about the breaking changes?

  1. If you didn't read it, you should read it instead of ignoring it.
  2. If you are saying that I mistakenly missed something, I can catch up if you give me some evidence.

Again, please don't underestimate or look down on other community members' efforts. We may make mistakes, but we work together to make progress instead of sitting on old versions forever.

@HyukjinKwon (Member)

I don't have enough background on Kafka, but here is a summary of what I understood from reading the comments here:

  • This PR was reviewed by multiple committers.
  • Kafka is one of the main dependencies in Structured Streaming, so it might need some extra care and review. For example, issues such as KAFKA-13322 might have a non-trivial impact.

For the former, this PR was reviewed by multiple committers when it was merged, so I don't think there is a particular problem.
For the latter, I agree that we might need some extra care for changes to major dependencies such as Parquet. In my experience in the Spark community, we take extra care with a major version upgrade when a library has had many issues over the last few upgrades. It might be good to document and clarify this somewhere.

Since there were a couple of post-reviews and concerns here (#34089 (comment) and #34089 (comment)), it might be great to have some feedback from Kafka maintainer(s) about the upgrade, its stability, and its safety.

@dongjoon-hyun (Member, Author)

Thank you, @HyukjinKwon. Yes, we asked here (#34089 (comment)) and got some answers (#34089 (comment) and #34089 (comment)) from Kafka maintainers.

@dongjoon-hyun (Member, Author)

I'm expecting more documentation on the Spark side and Kafka 3.1.1 from the Kafka side (in early March).

@HeartSaVioR (Contributor) commented Feb 14, 2022

I don't think we would have pulled in the Kafka community to ask for details on the migration if I hadn't raised the concern, but it has been addressed either way, so fair enough.

I found that KIP-679 is not mentioned at all in the "Notable changes" section for Kafka 3.0, which makes me less comfortable simply relying on "Notable changes", but we would never have found this out if we hadn't heard it from the Kafka community and I hadn't looked into the details, so I'll file this under "I wasn't aware of it". It is OK with me not to mention this, since the Kafka community didn't list it as one of the major changes. I'll consider myself wrong in judging the importance of this.

As a general comment, since Kafka 3.0 brings breaking changes, I'd document the version change in the SS migration guide (we didn't), link to the "Notable changes" section in the Kafka docs, and explicitly say "please contact the Kafka community for details of the changes". At least it helps us route end users' concerns to the Kafka community, keeps us transparent about Kafka's changes, and positions us as just one consumer of the Kafka client. I don't see anyone actively participating in both communities, so to me this seems to be the only valid strategy we can take as of now.

Another general comment: I'd make sure we guarantee the ability to downgrade to Kafka 2.8.1 (at runtime, or even with a different set of artifacts) and give end users the freedom to choose between Kafka 2 and 3. We simply assume there is no demand for end users to stick with Kafka 2.8.1. If we were providing separate Kafka data source artifacts for Kafka 3, I wouldn't be concerned at all.

It is up to everyone else whether to take my general comments or ignore them. I wouldn't mind at all.
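
For what it's worth, one hedged sketch of the "freedom to choose" idea from the application side is an sbt build that forces the Kafka client back to 2.8.1 while using the Spark 3.3 connector. All version numbers below are assumptions, and whether this combination actually works is exactly the open question in this thread, so treat it as an experiment rather than guidance.

```scala
// build.sbt (sketch)
ThisBuild / scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"            % "3.3.0" % "provided",
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.3.0"
)

// The connector pulls in kafka-clients 3.1.0 transitively; pin it back to 2.8.1 at the user's own risk.
dependencyOverrides += "org.apache.kafka" % "kafka-clients" % "2.8.1"
```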

@ijuma (Contributor) commented Feb 14, 2022

> I found that KIP-679 is not mentioned at all in the "Notable changes" section for Kafka 3.0, which makes me less comfortable simply relying on "Notable changes"

This oversight was fixed recently; the website will be updated soon. Like software, documentation can have bugs on occasion. Generally, it's fine to rely on the notable changes section. I am also happy to provide input directly (as I did here) whenever it's helpful for the Spark community.

@dongjoon-hyun (Member, Author)

Thank you as always, @ijuma. As a member of the Apache Spark PMC, I've always been grateful for your help in JIRA and PRs.

@HeartSaVioR (Contributor) commented Feb 14, 2022

@ijuma Thanks for the update. Your active feedback is much appreciated.

Btw, I see the mention is about a bug where the new default value for idempotence was not properly accounted for. It doesn't mention the performance implications of changing the default values of acks/idempotence. (That is still not mentioned.)

If it is the Kafka community's call to consider it minor/trivial, then I'll rely on that (since we will transparently route such issues to the Kafka community), but I do wonder why the action taken is the opposite of the analysis's conclusion. If it is just a missing spot, it would be really helpful if the Kafka community addressed it.

EDIT: I had to look "above" the line. My bad.

@ijuma (Contributor) commented Mar 12, 2022

@dongjoon-hyun FYI, @tombentley has volunteered to be the release manager for Apache Kafka 3.1.1 https://lists.apache.org/thread/zw0g8ksxhvwtv1jjcv0c33rxs0l8qs81

@dongjoon-hyun (Member, Author)

Thank you for informing that, @ijuma !

dongjoon-hyun pushed a commit that referenced this pull request May 8, 2023
…nding configuration issue

### What changes were proposed in this pull request?

This PR addresses a test flakiness issue in the Kafka connector RDD suites.

#34089 (review) (Spark 3.4.0) upgraded Spark to Kafka 3.1.0, which requires a different configuration key for configuring the broker listening port. That PR updated the `KafkaTestUtils.scala` used in SQL tests, but there is a near-duplicate of that code in a different `KafkaTestUtils.scala` used by the RDD API suites which wasn't updated. As a result, the RDD suites began using Kafka's default port 9092, and this results in flakiness as multiple concurrent suites hit port conflicts when trying to bind to that default port.

This PR fixes that by simply copying the updated configuration from the SQL copy of `KafkaTestUtils.scala`.

### Why are the changes needed?

Fix test flakiness due to port conflicts.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Ran 20 concurrent copies of `org.apache.spark.streaming.kafka010.JavaKafkaRDDSuite` in my CI environment and confirmed that this PR's changes resolve the test flakiness.

Closes #41095 from JoshRosen/update-kafka-test-utils-to-fix-port-binding-flakiness.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request May 8, 2023
…nding configuration issue

LuciferYang pushed a commit to LuciferYang/spark that referenced this pull request May 10, 2023
…nding configuration issue

snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
…nding configuration issue

GladwinLee pushed a commit to lyft/spark that referenced this pull request Oct 10, 2023
…nding configuration issue

catalinii pushed a commit to lyft/spark that referenced this pull request Oct 10, 2023
…nding configuration issue
