Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-2856] Decrease initial buffer size for Kryo to 64KB. #1780

Closed
wants to merge 1 commit into from

Conversation

rxin
Copy link
Contributor

@rxin rxin commented Aug 5, 2014

No description provided.

@SparkQA
Copy link

SparkQA commented Aug 5, 2014

QA tests have started for PR 1780. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17927/consoleFull

@pwendell
Copy link
Contributor

pwendell commented Aug 5, 2014

LGTM

@@ -47,7 +47,9 @@ class KryoSerializer(conf: SparkConf)
with Logging
with Serializable {

private val bufferSize = conf.getInt("spark.kryoserializer.buffer.mb", 2) * 1024 * 1024
private val bufferSize =
(conf.getDouble("spark.kryoserializer.buffer.mb", 0.064) * 1024 * 1024).toInt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a comment // 64KB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pretty obvious, isn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well for lazy people like me it's nice if we don't have to spend extra cycles in our head

@SparkQA
Copy link

SparkQA commented Aug 5, 2014

QA results for PR 1780:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17927/consoleFull

@rxin
Copy link
Contributor Author

rxin commented Aug 5, 2014

Thanks. Merging in master. @andrewor14 if you feel strongly about it, I can push a commit to add an one-line comment.

@mridulm
Copy link
Contributor

mridulm commented Aug 5, 2014

IIRC if kryo cant host entire serialized object in the buffer, it throws up : we saw issues with it being as high as 256 kb for some of our jobs : though we were using a different api (and it was slightly more complicated usecase).
Might be a good idea to verify this before changing default.

@asfgit asfgit closed this in 184048f Aug 5, 2014
@andrewor14
Copy link
Contributor

Nah that's fine.

@pwendell
Copy link
Contributor

pwendell commented Aug 5, 2014

@mridulm our kryo configuration has been changed from the version you are running - this value is only the initial size of the buffer, not the maximum size. Checkout the master docs:

https://github.com/apache/spark/blob/master/docs/configuration.md

Look for "spark.kryoserializer.buffer.max.mb"

@mridulm
Copy link
Contributor

mridulm commented Aug 5, 2014

Hi @pwendell, my observation about buffer size was not in context of spark ... we saw issues which "looked like" buffer overflow when the serialized object graph was large, and it was not handling the buffer growth properly.
Fortunately, this was due to a bug in our code to begin with (object being serialized was holding unrequired reference to a large graph of objects - running into an mb or so) : and so did not need to pursue it much.
But having seen something which should have been handled anyway, I want to make sure that changing the default does not cause surprises to our users.

If there are issues with buffer growth, and we lower the limit, a lot of jobs will start failing on release.

Given some of the past bugs we have fixed @pwendell (the flush issue comes to mind for example !), I am very wary of kryo - when it works, it is great, rest is suspicious :-)

asfgit pushed a commit that referenced this pull request Aug 6, 2014
Author: Reynold Xin <rxin@apache.org>

Closes #1780 from rxin/kryo-init-size and squashes the following commits:

551b935 [Reynold Xin] [SPARK-2856] Decrease initial buffer size for Kryo to 64KB.

(cherry picked from commit 184048f)
Signed-off-by: Reynold Xin <rxin@apache.org>
@pwendell
Copy link
Contributor

pwendell commented Aug 7, 2014

@mridulm ah I misunderstood! Anyways, thanks for filling it in - I agree we need to be defensive and cautious about our use here :)

@rxin rxin deleted the kryo-init-size branch August 13, 2014 08:02
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Author: Reynold Xin <rxin@apache.org>

Closes apache#1780 from rxin/kryo-init-size and squashes the following commits:

551b935 [Reynold Xin] [SPARK-2856] Decrease initial buffer size for Kryo to 64KB.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants