[SPARK-3613] Record only average block size in MapStatus for large stages #2470

Closed
wants to merge 4 commits into from

rxin
Contributor

@rxin rxin commented Sep 20, 2014

This changes the way we send MapStatus from executors back to the driver for large stages (>2000 tasks). For large stages, we no longer send one byte per block. Instead, we just send the average block size.

This makes large jobs (tens of thousands of tasks) much more reliable since the driver no longer sends a huge amount of data.
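
For readers who want to see the idea in code, here is a minimal sketch of the two representations. It is not the code from this patch: the names `MapStatusSketch`, `DetailedSketch`, `AverageSketch` and `MapStatusFactory` are hypothetical, sizes are kept as plain `Long`s rather than one compressed byte each, and the 2000 cutoff is applied to the number of blocks as an assumption.

```scala
// Hypothetical names throughout; this is an illustration, not the patch itself.
sealed trait MapStatusSketch {
  def sizeForBlock(reduceId: Int): Long
}

// Small stages: keep one size per reduce block.
final class DetailedSketch(sizes: Array[Long]) extends MapStatusSketch {
  def sizeForBlock(reduceId: Int): Long = sizes(reduceId)
}

// Large stages: report the same average for every block.
final class AverageSketch(numBlocks: Int, avgSize: Long) extends MapStatusSketch {
  def sizeForBlock(reduceId: Int): Long = {
    require(reduceId >= 0 && reduceId < numBlocks, s"bad reduceId $reduceId")
    avgSize
  }
}

object MapStatusFactory {
  // Assumption: the ">2000" cutoff from the description, applied to the block count.
  val LargeStageThreshold = 2000

  def apply(sizes: Array[Long]): MapStatusSketch =
    if (sizes.length > LargeStageThreshold) {
      // Integer average; the per-block detail is deliberately thrown away.
      new AverageSketch(sizes.length, sizes.sum / sizes.length)
    } else {
      new DetailedSketch(sizes.clone())
    }
}
```

The trade-off is visible in `AverageSketch`: the status shrinks to a constant size, but an empty block and a very large block become indistinguishable.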

@Ishiihara
Contributor

@rxin my understanding is that MapStatus is used to check whether a map output file contains data for a certain reducer. Why do we use the actual size instead of a boolean flag? Thanks!

@rxin
Contributor Author

rxin commented Sep 20, 2014

It's more than that. We use estimated sizes to track the total size of outstanding fetches, and try to bound that to a certain size in case an executor sends too many requests and runs out of memory.
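
A rough sketch of that kind of bound, assuming a simple FIFO queue of requests and a fixed byte budget. `FetchRequestSketch`, `FetchThrottleSketch` and `maxBytesInFlight` are hypothetical names for illustration, not Spark's actual fetch logic.

```scala
import scala.collection.mutable

// Hypothetical request type: a block id plus its estimated size in bytes.
final case class FetchRequestSketch(blockId: String, estimatedSize: Long)

// Releases queued requests only while the estimated bytes in flight stay under a cap.
final class FetchThrottleSketch(maxBytesInFlight: Long) {
  private val pending = mutable.Queue.empty[FetchRequestSketch]
  private var bytesInFlight = 0L

  def enqueue(req: FetchRequestSketch): Unit = pending.enqueue(req)

  // Returns the requests that may be sent now, in FIFO order. A request is always
  // allowed when nothing is in flight, so one oversized block cannot stall the queue.
  def drain(): List[FetchRequestSketch] = {
    val sent = List.newBuilder[FetchRequestSketch]
    while (pending.nonEmpty &&
        (bytesInFlight == 0 ||
          bytesInFlight + pending.head.estimatedSize <= maxBytesInFlight)) {
      val req = pending.dequeue()
      bytesInFlight += req.estimatedSize
      sent += req
    }
    sent.result()
  }

  // Call when a response arrives to free up budget.
  def completed(req: FetchRequestSketch): Unit = bytesInFlight -= req.estimatedSize
}
```

Because the budget is charged with estimates, a boolean "has data" flag would not be enough here: the throttle needs some notion of how large each block is.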

@Ishiihara
Contributor

Thanks for the reply. Another question: in hash shuffle write, the data may be skewed across different map output files. In some cases, the reducer may try to fetch many files that do not contain its data. What overhead does this introduce?

@rxin
Contributor Author

rxin commented Sep 21, 2014

It really depends on the number of zero-sized blocks. One thing we can possibly do is create a compressed bitmap to track zero-sized blocks, as discussed here: http://apache-spark-developers-list.1001551.n3.nabble.com/Eliminate-copy-while-sending-data-any-Akka-experts-here-td7127.html#a7146

Maybe we can use the ewah by @lemire

@rxin
Contributor Author

rxin commented Sep 21, 2014

@Ishiihara let me know if you are interested in working on adding compressed bitmap to this.

@rxin
Contributor Author

rxin commented Sep 21, 2014

@lemire our requirements here are very simple. We just need to have a bitmap to track the position of zero-sized blocks in Spark shuffle. Things we need from the bitmap implementation are:

  1. Fast serialization / deserialization (if there is a byte array that we can write out directly, perfect)
  2. Fast sequential access (just give us the non-zero sized blocks one by one).

So unlike databases, we don't need updates or intersection. I saw that you have published a new arXiv paper on roaring bitmaps too. Which one would you recommend for this workload?
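
To make the two requirements concrete, here is a small sketch that uses the JDK's uncompressed `java.util.BitSet` as a stand-in. It is not a compressed bitmap like EWAH or Roaring; it only illustrates the access pattern asked for (dump to a byte array, then walk the non-zero-sized blocks in order). `ZeroBlockBitmapSketch`, `encode` and `nonEmptyBlocks` are hypothetical names.

```scala
import java.util.BitSet

object ZeroBlockBitmapSketch {
  // Sets one bit per zero-sized block and returns the bitmap as a byte array.
  def encode(sizes: Array[Long]): Array[Byte] = {
    val bits = new BitSet(sizes.length)
    for (i <- sizes.indices if sizes(i) == 0L) bits.set(i)
    bits.toByteArray()
  }

  // Rebuilds the bitmap and lists the non-zero-sized blocks one by one, in order.
  // numBlocks is needed because the byte array drops trailing clear bits.
  def nonEmptyBlocks(bytes: Array[Byte], numBlocks: Int): Seq[Int] = {
    val bits = BitSet.valueOf(bytes)
    Iterator
      .iterate(bits.nextClearBit(0))(i => bits.nextClearBit(i + 1))
      .takeWhile(_ < numBlocks)
      .toList
  }
}
```

A compressed implementation would presumably keep the same two entry points and only change how the bits are stored.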

@Ishiihara
Contributor

@rxin I am definitely interested in working on adding compressed bitmap. What is the first step? Thanks.

@lemire
Contributor

lemire commented Sep 21, 2014

@rxin We are currently working with the Druid.io guys to integrate Roaring (http://roaringbitmap.org). We get good results and even support memory mapped bitmaps (with ByteBuffer).

At this point, I would recommend you try out roaring. I am available to help.

@Ishiihara
Contributor

@rxin @lemire Started looking at Roaring.

@lemire
Contributor

lemire commented Sep 22, 2014

@Ishiihara Get in touch if you have questions.

@andrewor14
Contributor

test this please

@SparkQA

SparkQA commented Sep 27, 2014

QA tests have started for PR 2470 at commit d139abe.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 27, 2014

QA tests have finished for PR 2470 at commit d139abe.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20895/

@rxin
Contributor Author

rxin commented Sep 27, 2014

Jenkins, test this please.

@SparkQA

SparkQA commented Sep 27, 2014

QA tests have started for PR 2470 at commit d139abe.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 27, 2014

QA tests have started for PR 2470 at commit 11f5319.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 27, 2014

QA tests have finished for PR 2470 at commit d139abe.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 27, 2014

QA tests have finished for PR 2470 at commit 11f5319.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20900/

@rxin
Contributor Author

rxin commented Sep 27, 2014

Hm, MiMa failed even though MapStatus is private[spark]. I will add an exclude.

@SparkQA

SparkQA commented Sep 27, 2014

QA tests have started for PR 2470 at commit f7e720a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 27, 2014

QA tests have finished for PR 2470 at commit f7e720a.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20902/

*/
private[spark] class DetailedMapStatus(
    private[this] var loc: BlockManagerId,
    private[this] var compressedSizes: Array[Byte])
Contributor

Why private[this]? Is this for performance reasons?

Contributor Author

yes. i really have no need for an accessor here
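
For context on the exchange above: `private[this]` makes a member object-private, so only the instance itself can reach it. To the best of my understanding, the Scala 2 compiler then reads the field directly instead of going through a generated accessor, which is the usual performance argument. The class below is a hypothetical illustration (`StatusSketch`, with a `String` standing in for `BlockManagerId`), not the class from the diff.

```scala
// Hypothetical class, loosely modeled on the constructor under review.
class StatusSketch(
    private[this] var loc: String, // object-private: only `this` can touch it
    private[this] var compressedSizes: Array[Byte]) {

  def location: String = loc
  def numBlocks: Int = compressedSizes.length

  // With plain `private`, `other.loc` would compile. With `private[this]` it does
  // not, so comparing two instances has to go through the public method.
  def sameLocation(other: StatusSketch): Boolean = location == other.location
}
```

The cost of the stricter modifier is that even other instances of the same class lose access, which is fine when no accessor is needed at all.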

@SparkQA

SparkQA commented Sep 29, 2014

QA tests have started for PR 2470 at commit 822ff54.

  • This patch merges cleanly.

@andrewor14
Contributor

Great, LPGTM

@rxin
Contributor Author

rxin commented Sep 29, 2014

I'm glad it is P!

@SparkQA

SparkQA commented Sep 29, 2014

QA tests have finished for PR 2470 at commit 822ff54.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20991/

@SparkQA

SparkQA commented Sep 30, 2014

QA tests have started for PR 2470 at commit 822ff54.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 30, 2014

QA tests have finished for PR 2470 at commit 822ff54.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor Author

rxin commented Sep 30, 2014

Merging in master.

@asfgit asfgit closed this in 6b79bfb Sep 30, 2014
@rxin
Contributor Author

rxin commented Sep 30, 2014

I also filed a new jira for the compressed bitmap thing: https://issues.apache.org/jira/browse/SPARK-3740

@Ishiihara
Contributor

@rxin I looked through Roaring bitmap, and it is highly compressed compared with other bitmap implementations. I will start working on this and keep you updated on progress and any issues that come up during implementation. Thanks!

@rxin rxin deleted the mapstatus branch September 30, 2014 06:54
7 participants