[SPARK-3613] Record only average block size in MapStatus for large stages #2470
Conversation
@rxin my understanding is that MapStatus is used to check whether a map output file contains data for a certain reducer. Why do we use the actual sizes instead of a boolean flag? Thanks!
It's more than that. We use the estimated sizes to track the total size of outstanding fetches, and try to bound that to a certain size so that an executor doesn't send too many requests at once and run out of memory.
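To make that concrete, here is a minimal sketch of how per-block size estimates let a reducer cap the bytes of fetch requests it has in flight. This is not the actual Spark fetcher code; the names `BlockRef`, `nextBatch`, and `maxBytesInFlight` are illustrative only.

```scala
// Illustrative only: per-block size estimates from MapStatus let a reducer
// stop issuing fetch requests once the estimated bytes in flight reach a
// limit, so it does not run out of memory fetching everything at once.
object FetchThrottleSketch {
  final case class BlockRef(blockId: String, estimatedSize: Long)

  def nextBatch(pending: Iterator[BlockRef], maxBytesInFlight: Long): Seq[BlockRef] = {
    val batch = scala.collection.mutable.ArrayBuffer.empty[BlockRef]
    var bytesInFlight = 0L
    // Keep adding blocks until the estimated in-flight bytes reach the limit.
    while (pending.hasNext && bytesInFlight < maxBytesInFlight) {
      val block = pending.next()
      batch += block
      bytesInFlight += block.estimatedSize
    }
    batch.toSeq
  }
}
```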
Thanks for the reply. Another question: in hash shuffle write, the data may be skewed across the different map output files. In some cases, the reducer may try to fetch many files that do not contain any of its data. What overhead does this introduce?
It really depends on the number of zero-sized blocks. One thing we can possibly do is to create a compressed bitmap to track zero-sized blocks, as discussed here: http://apache-spark-developers-list.1001551.n3.nabble.com/Eliminate-copy-while-sending-data-any-Akka-experts-here-td7127.html#a7146 Maybe we can use the EWAH implementation by @lemire
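As a rough illustration of that idea (assuming the `org.roaringbitmap` library is on the classpath; the class and method names below are hypothetical, not the eventual Spark implementation), a compressed bitmap could record which blocks are empty so a reducer skips fetches that would return nothing:

```scala
import org.roaringbitmap.RoaringBitmap

// Hypothetical sketch: mark the indices of zero-sized blocks in a compressed
// bitmap so a reducer can skip fetching map outputs that hold no data for it.
class EmptyBlockTracker(blockSizes: Array[Long]) {
  private val emptyBlocks = new RoaringBitmap()
  blockSizes.zipWithIndex.foreach { case (size, i) =>
    if (size == 0L) emptyBlocks.add(i)
  }

  /** True if the map output for this reducer id contains no data. */
  def isEmpty(reduceId: Int): Boolean = emptyBlocks.contains(reduceId)
}
```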
@Ishiihara let me know if you are interested in working on adding a compressed bitmap to this.
@lemire our requirements here are very simple. We just need a bitmap to track the positions of zero-sized blocks in Spark shuffle. Things we need from the bitmap implementation are:
So unlike databases, we don't need updates or intersections. I saw that you have published a new arXiv paper on Roaring bitmaps too. Which one would you recommend for this workload?
@rxin I am definitely interested in working on adding a compressed bitmap. What is the first step? Thanks.
@rxin We are currently working with the Druid.io guys to integrate Roaring (http://roaringbitmap.org). We get good results and even support memory-mapped bitmaps (with ByteBuffer). At this point, I would recommend you try out Roaring. I am available to help.
@Ishiihara Get in touch if you have questions.
test this please
QA tests have started for PR 2470 at commit
QA tests have finished for PR 2470 at commit
Test FAILed.
Jenkins, test this please.
QA tests have started for PR 2470 at commit
QA tests have started for PR 2470 at commit
QA tests have finished for PR 2470 at commit
QA tests have finished for PR 2470 at commit
Test FAILed.
Hm, MiMa failed even though MapStatus is private[spark]. I will add an exclude.
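For context, this is roughly what such an exclude looks like in Spark's project/MimaExcludes.scala. The problem type and qualified name below are assumptions for illustration, not copied from the actual patch.

```scala
import com.typesafe.tools.mima.core._

object AssumedMimaExcludes {
  // Sketch of a MiMa exclude entry (assumed names; the real entry for this PR
  // may differ). Excludes tell the binary-compatibility checker to ignore
  // changes to classes that are not part of the public API.
  val excludes = Seq(
    ProblemFilters.exclude[IncompatibleMethTypeProblem](
      "org.apache.spark.scheduler.MapStatus.this")
  )
}
```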
QA tests have started for PR 2470 at commit
QA tests have finished for PR 2470 at commit
Test PASSed.
 */
private[spark] class DetailedMapStatus(
    private[this] var loc: BlockManagerId,
    private[this] var compressedSizes: Array[Byte])
Why private[this]? Is this for performance reasons?
Yes. I really have no need for an accessor here.
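For readers unfamiliar with the distinction, here is a small example of my own (not from this PR) showing why `private[this]` avoids the accessor methods that a plain `private var` still gets:

```scala
// Illustration (not from the PR): `private var` still generates accessor
// methods and is visible from other instances of the same class, while
// `private[this]` is object-private and compiles down to a plain field,
// which is why it is sometimes chosen for performance-sensitive fields.
class Counts {
  private var shared = 0        // other Counts instances can read this
  private[this] var local = 0   // only accessible through `this`; no accessors

  def bump(): Unit = { shared += 1; local += 1 }

  def readOther(other: Counts): Int = other.shared
  // def readOtherLocal(other: Counts): Int = other.local  // does not compile
}
```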
QA tests have started for PR 2470 at commit
Great, LPGTM
I'm glad it is P!
QA tests have finished for PR 2470 at commit
Test FAILed.
QA tests have started for PR 2470 at commit
QA tests have finished for PR 2470 at commit
Merging in master.
I also filed a new JIRA for the compressed bitmap idea: https://issues.apache.org/jira/browse/SPARK-3740
@rxin I looked through Roaring bitmap; it is highly compressed compared with other bitmap implementations. I will start working on this and keep you updated with progress and any issues that come up during implementation. Thanks!
This changes the way we send MapStatus from executors back to the driver for large stages (>2000 tasks). For large stages, we no longer send one byte per block; instead, we just send the average block size.
This makes large jobs (tens of thousands of tasks) much more reliable, since the driver no longer has to send a huge amount of data.
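A minimal sketch of the idea described above (the class and method names are illustrative, not necessarily the ones in the merged patch): for stages above the threshold, keep only one average size rather than one compressed size per block.

```scala
// Illustrative sketch, not the actual patch: instead of one compressed size
// per block, a map status for a large stage stores a single average size and
// reports that same estimate for every block.
class AveragedMapStatus(
    val location: String,     // stand-in for BlockManagerId in this sketch
    private val avgSize: Long) {

  /** Every block is estimated at the same average size. */
  def getSizeForBlock(reduceId: Int): Long = avgSize
}

object AveragedMapStatus {
  def apply(location: String, uncompressedSizes: Array[Long]): AveragedMapStatus = {
    val avg =
      if (uncompressedSizes.nonEmpty) uncompressedSizes.sum / uncompressedSizes.length
      else 0L
    new AveragedMapStatus(location, avg)
  }
}
```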