-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SQL] [SPARK-2826] Reduce the memory copy while building the hashmap for HashOuterJoin #1765
[SQL] [SPARK-2826] Reduce the memory copy while building the hashmap for HashOuterJoin #1765
Conversation
QA tests have started for PR 1765. This patch merges cleanly. |
QA results for PR 1765: |
@@ -170,6 +164,9 @@ case class HashOuterJoin( | |||
|
|||
def output = left.output ++ right.output | |||
|
|||
@transient private[this] lazy val DUMMY_LIST = Seq[Row](null) | |||
@transient private[this] lazy val EMPTY_LIST = Seq.empty[Row] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fairly minor nit but defining EMPTY_LIST
instead of just using Seq.empty[Row]
- Takes up extra space in the class
- Requires double checked locking / lazy initialization logic for each access
- Requires the developer to find the variable and figure out what it means, instead of
Seq.empty[Row]
which is standard scala.
All to save 4 characters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @marmbrus , I will fix that as you suggested.
I went ahead and merged this since it improves perf and my only comments were cosmetic. Thanks! |
…for HashOuterJoin This is a follow up for #1147 , this PR will improve the performance about 10% - 15% in my local tests. ``` Before: LeftOuterJoin: took 16750 ms ([3000000] records) LeftOuterJoin: took 15179 ms ([3000000] records) RightOuterJoin: took 15515 ms ([3000000] records) RightOuterJoin: took 15276 ms ([3000000] records) FullOuterJoin: took 19150 ms ([6000000] records) FullOuterJoin: took 18935 ms ([6000000] records) After: LeftOuterJoin: took 15218 ms ([3000000] records) LeftOuterJoin: took 13503 ms ([3000000] records) RightOuterJoin: took 13663 ms ([3000000] records) RightOuterJoin: took 14025 ms ([3000000] records) FullOuterJoin: took 16624 ms ([6000000] records) FullOuterJoin: took 16578 ms ([6000000] records) ``` Besides the performance improvement, I also do some clean up as suggested in #1147 Author: Cheng Hao <hao.cheng@intel.com> Closes #1765 from chenghao-intel/hash_outer_join_fixing and squashes the following commits: ab1f9e0 [Cheng Hao] Reduce the memory copy while building the hashmap (cherry picked from commit 5d54d71) Signed-off-by: Michael Armbrust <michael@databricks.com>
…for HashOuterJoin This is a follow up for apache#1147 , this PR will improve the performance about 10% - 15% in my local tests. ``` Before: LeftOuterJoin: took 16750 ms ([3000000] records) LeftOuterJoin: took 15179 ms ([3000000] records) RightOuterJoin: took 15515 ms ([3000000] records) RightOuterJoin: took 15276 ms ([3000000] records) FullOuterJoin: took 19150 ms ([6000000] records) FullOuterJoin: took 18935 ms ([6000000] records) After: LeftOuterJoin: took 15218 ms ([3000000] records) LeftOuterJoin: took 13503 ms ([3000000] records) RightOuterJoin: took 13663 ms ([3000000] records) RightOuterJoin: took 14025 ms ([3000000] records) FullOuterJoin: took 16624 ms ([6000000] records) FullOuterJoin: took 16578 ms ([6000000] records) ``` Besides the performance improvement, I also do some clean up as suggested in apache#1147 Author: Cheng Hao <hao.cheng@intel.com> Closes apache#1765 from chenghao-intel/hash_outer_join_fixing and squashes the following commits: ab1f9e0 [Cheng Hao] Reduce the memory copy while building the hashmap
This is a follow up for #1147 , this PR will improve the performance about 10% - 15% in my local tests.
Besides the performance improvement, I also do some clean up as suggested in #1147