[SPARK-20603][SS][Test]Set default number of topic partitions to 1 to reduce the load #17863

Closed
zsxwing wants to merge 1 commit into apache:master from zsxwing:fix-kafka-flaky-test

Conversation

Member

@zsxwing zsxwing commented May 4, 2017

What changes were proposed in this pull request?

I checked the logs of https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.2-test-maven-hadoop-2.7/47/ and found it took several seconds to create the Kafka internal topic `__consumer_offsets`. As Kafka creates this topic lazily, the creation happens in the first test, `deserialization of initial offset with Spark 2.1.0`, and causes it to time out.

This PR changes `offsets.topic.num.partitions` from the default value 50 to 1 so that creating `__consumer_offsets` (50 partitions -> 1 partition) is much faster.
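
For reference, a minimal sketch of how such a broker override can be applied when bringing up an embedded Kafka broker for tests. Only `offsets.topic.num.partitions` is the setting this PR is about; the surrounding method and property values are illustrative test-broker defaults, not the exact code in this PR.

```scala
import java.util.Properties

// Illustrative configuration for an embedded test Kafka broker.
def brokerConfiguration(brokerPort: Int, zkConnect: String): Properties = {
  val props = new Properties()
  props.put("broker.id", "0")
  props.put("host.name", "127.0.0.1")
  props.put("port", brokerPort.toString)
  props.put("zookeeper.connect", zkConnect)
  // Create `__consumer_offsets` with a single partition instead of the
  // default 50, so its lazy creation does not delay the first test.
  props.put("offsets.topic.num.partitions", "1")
  props
}
```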

How was this patch tested?

Jenkins

SparkQA commented May 4, 2017

Test build #76465 has finished for PR 17863 at commit 83dfc74.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

zsxwing commented May 4, 2017

This does help. The test now takes about 1 second; http://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.kafka010.KafkaSourceSuite&test_name=deserialization+of+initial+offset+with+Spark+2.1.0 shows it usually took 3-4 seconds before.

[info] KafkaSourceSuite:
[info] - deserialization of initial offset with Spark 2.1.0 (1 second, 169 milliseconds)

Member Author

zsxwing commented May 5, 2017

@brkyvz Could you take a look? Thanks!

Contributor

brkyvz commented May 5, 2017

LGTM

Member Author

zsxwing commented May 5, 2017

Thanks! Merging to master, 2.2 and 2.1.

asfgit pushed a commit that referenced this pull request May 5, 2017
[SPARK-20603][SS][Test]Set default number of topic partitions to 1 to reduce the load

## What changes were proposed in this pull request?

I checked the logs of https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.2-test-maven-hadoop-2.7/47/ and found it took several seconds to create the Kafka internal topic `__consumer_offsets`. As Kafka creates this topic lazily, the creation happens in the first test, `deserialization of initial offset with Spark 2.1.0`, and causes it to time out.

This PR changes `offsets.topic.num.partitions` from the default value 50 to 1 so that creating `__consumer_offsets` (50 partitions -> 1 partition) is much faster.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #17863 from zsxwing/fix-kafka-flaky-test.

(cherry picked from commit bd57882)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
@asfgit asfgit closed this in bd57882 May 5, 2017
asfgit pushed a commit that referenced this pull request May 5, 2017
@zsxwing zsxwing deleted the fix-kafka-flaky-test branch May 5, 2017 18:13