@vlyubin
Contributor

@vlyubin vlyubin commented Apr 9, 2015

@SparkQA
SparkQA commented Apr 9, 2015

Test build #29920 has started for PR 5433 at commit 527eac6.

@SparkQA
SparkQA commented Apr 9, 2015

Test build #29920 has finished for PR 5433 at commit 527eac6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29920/

Contributor

Make this private; we don't want to expose a mutable field publicly.

@aarondav
Contributor

aarondav commented Apr 9, 2015

LGTM
cc @marmbrus

@chenghao-intel
Contributor

I must be missing some context: why do we need to serialize and deserialize HashedRelation? Isn't it supposed to be loaded only once, from the partition data iterator?

Contributor

The output of SparkSqlSerializer.serialize(hashTable) is probably very large: in ShuffledHashJoin we load the whole partition's data into the hashTable, which means we would probably keep two copies of the data in memory. Is that the intention?

@aarondav
Contributor

aarondav commented Apr 9, 2015

Note that this is only used for BroadcastHashJoins, and this is the broadcasted table. It is thus expected to be relatively small (< 10 MB). For Spark users with default configuration, the broadcast is serialized using Java serialization, which it turns out is much slower than Kryo in this case (see the benchmark).

This turns out to be a pretty significant win for very short O(seconds) queries, where a large portion of time may be spent in performing the broadcast.
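
For readers following along, here is a minimal sketch of the general pattern being discussed, not the code in this patch. It assumes a hypothetical class `SketchHashedRelation` that implements `Externalizable`, so that even when the broadcast goes through Java serialization, the table itself is written as one length-prefixed byte payload produced by a custom encoder. The hand-written `encode`/`decode` pair is only a stand-in for the place where a Kryo-based serializer such as SparkSqlSerializer would be called; all names below are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a broadcast hashed relation; not Spark's class.
final class SketchHashedRelation implements Externalizable {
    // Non-final only so readExternal can repopulate it; kept private.
    private Map<Integer, String[]> hashTable = new HashMap<>();

    // Externalizable requires a public no-arg constructor for deserialization.
    public SketchHashedRelation() {}

    SketchHashedRelation(Map<Integer, String[]> hashTable) {
        this.hashTable = hashTable;
    }

    String[] get(int key) {
        return hashTable.get(key);
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Assumption: a real implementation would call a faster serializer here.
        byte[] payload = encode(hashTable);
        out.writeInt(payload.length);
        out.write(payload);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        hashTable = decode(payload);
    }

    // Stand-in encoder: entry count, then key, row length and fields per entry.
    private static byte[] encode(Map<Integer, String[]> table) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(bytes);
        data.writeInt(table.size());
        for (Map.Entry<Integer, String[]> entry : table.entrySet()) {
            data.writeInt(entry.getKey());
            data.writeInt(entry.getValue().length);
            for (String field : entry.getValue()) {
                data.writeUTF(field);
            }
        }
        data.flush();
        return bytes.toByteArray();
    }

    // Inverse of encode.
    private static Map<Integer, String[]> decode(byte[] payload) throws IOException {
        DataInputStream data = new DataInputStream(new ByteArrayInputStream(payload));
        int size = data.readInt();
        Map<Integer, String[]> table = new HashMap<>();
        for (int i = 0; i < size; i++) {
            int key = data.readInt();
            String[] row = new String[data.readInt()];
            for (int j = 0; j < row.length; j++) {
                row[j] = data.readUTF();
            }
            table.put(key, row);
        }
        return table;
    }

    // Round trip through Java serialization, which then delegates to the methods above.
    static SketchHashedRelation roundTrip(SketchHashedRelation relation) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(relation);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchHashedRelation) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that `writeExternal` briefly holds both the table and its encoded bytes, which is the two-copies concern raised above; that is tolerable here only because the broadcast table is expected to be small.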

@SparkQA
SparkQA commented Apr 9, 2015

Test build #29966 has started for PR 5433 at commit d70c829.

@chenghao-intel
Contributor

OK, I see, this is for BroadcastHashJoin. Thanks for the explanation.

@SparkQA
SparkQA commented Apr 9, 2015

Test build #29966 has finished for PR 5433 at commit d70c829.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29966/

@marmbrus
Contributor

Merging to master. Thanks!

@asfgit asfgit closed this in b9baa4c Apr 10, 2015