[SQL] [SPARK-6794] Use kryo-based SparkSqlSerializer for GeneralHashedRelation #5433
Test build #29920 has started for PR 5433 at commit
Test build #29920 has finished for PR 5433 at commit
Test PASSed.
Make this private; we don't want to expose a mutable field publicly.
LGTM
I must be lost in some context, why do we need to serde |
The output of SparkSqlSerializer.serialize(hashTable) is probably very large: in ShuffledHashJoin we load the whole partition's data into the hashTable, which means we probably keep two copies of the data in memory. Is that the intention?
Note that this is only used for BroadcastHashJoins, and this is the broadcast table, so it is expected to be relatively small (< 10 MB). For Spark users with the default configuration, the broadcast is serialized using Java serialization, which turns out to be much slower than Kryo in this case (see the benchmark). This is a pretty significant win for very short queries, on the order of seconds, where a large portion of the time may be spent performing the broadcast.
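As a rough illustration of why the serializer choice matters for a small broadcast table, here is a minimal Java sketch. It does not use Spark or Kryo. `javaSerialize` is the default Java serialization path; `compactSerialize` is a hand-written stand-in that only mimics one property of a Kryo-style serializer with registered classes, namely writing raw field values without per-object class metadata. The class name, the `Integer -> String` table shape, and all method names are hypothetical and are not taken from the PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

class BroadcastSerdeSketch {
    // Hypothetical stand-in for a small broadcast hash table: join key -> row payload.
    static HashMap<Integer, String> sampleTable(int rows) {
        HashMap<Integer, String> table = new HashMap<>();
        for (int i = 0; i < rows; i++) {
            table.put(i, "row-" + i);
        }
        return table;
    }

    // Baseline: default Java serialization of the whole table.
    static byte[] javaSerialize(HashMap<Integer, String> table) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(table);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static HashMap<Integer, String> javaDeserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (HashMap<Integer, String>) in.readObject();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        } catch (ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Assumption: a stand-in for a Kryo-style encoding, NOT Kryo itself.
    // Writes the entry count, then each key and value as raw data.
    static byte[] compactSerialize(Map<Integer, String> table) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(table.size());
            for (Map.Entry<Integer, String> entry : table.entrySet()) {
                out.writeInt(entry.getKey());
                out.writeUTF(entry.getValue());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    static HashMap<Integer, String> compactDeserialize(byte[] data) {
        HashMap<Integer, String> table = new HashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            for (int i = 0; i < size; i++) {
                int key = in.readInt();
                table.put(key, in.readUTF());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return table;
    }

    public static void main(String[] args) {
        HashMap<Integer, String> table = sampleTable(100_000);

        long t0 = System.nanoTime();
        byte[] javaBytes = javaSerialize(table);
        long t1 = System.nanoTime();
        byte[] compactBytes = compactSerialize(table);
        long t2 = System.nanoTime();

        // Single-shot timings are noisy (no JIT warm-up); treat them as indicative only.
        System.out.println("java serialization: " + javaBytes.length + " bytes, "
                + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("compact encoding:   " + compactBytes.length + " bytes, "
                + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

The sketch only shows the shape of the trade-off. The numbers it prints say nothing about Kryo or Spark themselves; the benchmark linked from this PR is the relevant measurement.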
Test build #29966 has started for PR 5433 at commit
OK, I see, for |
Test build #29966 has finished for PR 5433 at commit
Test PASSed.
Merging to master. Thanks!
Benchmarking results: http://pastie.org/private/1dneo1mta5zpsw6gmsoeq