[SPARK-20603][SS][Test]Set default number of topic partitions to 1 to reduce the load #17863

Closed
zsxwing wants to merge 1 commit into apache:master from zsxwing:fix-kafka-flaky-test

Conversation

Member

@zsxwing zsxwing commented May 4, 2017

What changes were proposed in this pull request?

I checked the logs of https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.2-test-maven-hadoop-2.7/47/ and found it took several seconds to create the Kafka internal topic `__consumer_offsets`. As Kafka creates this topic lazily, the creation happens in the first test, `deserialization of initial offset with Spark 2.1.0`, and causes it to time out.

This PR changes `offsets.topic.num.partitions` from the default value 50 to 1 so that creating `__consumer_offsets` (50 partitions -> 1 partition) is much faster.
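
For reference, a minimal sketch of how such a broker override can be applied when bringing up an embedded Kafka broker for tests. Only `offsets.topic.num.partitions` is the setting this PR is about; the surrounding method and property values are illustrative test-broker defaults, not the exact code in this PR.

```scala
import java.util.Properties

// Illustrative configuration for an embedded test Kafka broker.
def brokerConfiguration(brokerPort: Int, zkConnect: String): Properties = {
  val props = new Properties()
  props.put("broker.id", "0")
  props.put("host.name", "127.0.0.1")
  props.put("port", brokerPort.toString)
  props.put("zookeeper.connect", zkConnect)
  // Create `__consumer_offsets` with a single partition instead of the
  // default 50, so its lazy creation does not delay the first test.
  props.put("offsets.topic.num.partitions", "1")
  props
}
```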

How was this patch tested?

Jenkins

SparkQA commented May 4, 2017

Test build #76465 has finished for PR 17863 at commit 83dfc74.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

zsxwing commented May 4, 2017

This does help. The test now takes about 1 second; http://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.kafka010.KafkaSourceSuite&test_name=deserialization+of+initial+offset+with+Spark+2.1.0 shows it usually took 3-4 seconds before.

[info] KafkaSourceSuite:
[info] - deserialization of initial offset with Spark 2.1.0 (1 second, 169 milliseconds)

Member Author

zsxwing commented May 5, 2017

@brkyvz Could you take a look? Thanks!

Contributor

brkyvz commented May 5, 2017

LGTM

Member Author

zsxwing commented May 5, 2017

Thanks! Merging to master, 2.2 and 2.1.

asfgit pushed a commit that referenced this pull request May 5, 2017
[SPARK-20603][SS][Test]Set default number of topic partitions to 1 to reduce the load

## What changes were proposed in this pull request?

I checked the logs of https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.2-test-maven-hadoop-2.7/47/ and found it took several seconds to create the Kafka internal topic `__consumer_offsets`. As Kafka creates this topic lazily, the creation happens in the first test, `deserialization of initial offset with Spark 2.1.0`, and causes it to time out.

This PR changes `offsets.topic.num.partitions` from the default value 50 to 1 so that creating `__consumer_offsets` (50 partitions -> 1 partition) is much faster.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #17863 from zsxwing/fix-kafka-flaky-test.

(cherry picked from commit bd57882)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
@asfgit asfgit closed this in bd57882 May 5, 2017
asfgit pushed a commit that referenced this pull request May 5, 2017
@zsxwing zsxwing deleted the fix-kafka-flaky-test branch May 5, 2017 18:13