rowexec: add merge hash join processor #40393

Closed
wants to merge 2 commits into from

Conversation

yuzefovich
Member

@yuzefovich yuzefovich commented Aug 31, 2019

sql: fix some typos and move some functions

This commit fixes several typos and slightly refactors the code in a
few places in order to expose some functions for reuse.

rowexec: add merge hash join processor

This commit adds a new merge hash join processor which can be used
when the inputs are ordered on a subset of the equality columns. It
first applies the merging logic on only the ordering columns to find
merge groups, and then performs a hash join within each of those merge
groups. At the moment, only INNER join is supported, and the processor
is not being planned.

Release note: None
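
The idea described above can be illustrated with a minimal sketch. This is not the processor from this PR: the `row` type, the `mergeHashJoin` function, and the two-column layout are hypothetical, and it assumes both inputs fit in memory and are sorted on a single ordering column.

```go
package main

import "fmt"

// row is a hypothetical two-column row: ord is the equality column that both
// inputs are ordered on, key is the equality column with no ordering.
type row struct{ ord, key int }

// mergeHashJoin performs an INNER join on (ord, key). Both inputs must be
// sorted by ord in non-decreasing order.
func mergeHashJoin(left, right []row) [][2]row {
	var out [][2]row
	i, j := 0, 0
	for i < len(left) && j < len(right) {
		switch {
		case left[i].ord < right[j].ord:
			i++
		case left[i].ord > right[j].ord:
			j++
		default:
			// Merge step: find the end of the current merge group on each side.
			ord := left[i].ord
			iEnd, jEnd := i, j
			for iEnd < len(left) && left[iEnd].ord == ord {
				iEnd++
			}
			for jEnd < len(right) && right[jEnd].ord == ord {
				jEnd++
			}
			// Hash step: build a table on the right group only, then probe it
			// with the rows of the left group.
			table := make(map[int][]row, jEnd-j)
			for _, r := range right[j:jEnd] {
				table[r.key] = append(table[r.key], r)
			}
			for _, l := range left[i:iEnd] {
				for _, r := range table[l.key] {
					out = append(out, [2]row{l, r})
				}
			}
			i, j = iEnd, jEnd
		}
	}
	return out
}

func main() {
	left := []row{{0, 7}, {0, 3}, {1, 5}, {2, 9}}
	right := []row{{0, 3}, {0, 3}, {1, 4}, {2, 9}}
	for _, pair := range mergeHashJoin(left, right) {
		fmt.Println(pair[0], pair[1])
	}
}
```

The hash table in this sketch only ever holds one merge group of the right input, which is the intended difference from a plain hash join that builds a table over a whole input.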

@cockroach-teamcity
Member

This change is Reviewable

@yuzefovich
Member Author

Here are the benchmarks:

BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=0/MergeHash-12         	  500000	      3496 ns/op
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=0/Hash-12              	 1000000	      2226 ns/op
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=4/MergeHash-12         	  200000	      8741 ns/op	  14.64 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=4/Hash-12              	  200000	      7365 ns/op	  17.38 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=16/MergeHash-12        	  100000	     17345 ns/op	  29.52 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=16/Hash-12             	  100000	     15318 ns/op	  33.42 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=256/MergeHash-12       	   10000	    184424 ns/op	  44.42 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=256/Hash-12            	   10000	    175717 ns/op	  46.62 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=4096/MergeHash-12      	     500	   2895067 ns/op	  45.27 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=4096/Hash-12           	     500	   2751885 ns/op	  47.63 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=65536/MergeHash-12     	      30	  48350694 ns/op	  43.37 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=65536/Hash-12          	      30	  50898554 ns/op	  41.20 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=0/MergeHash-12         	  500000	      3599 ns/op
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=0/Hash-12              	 1000000	      2267 ns/op
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=4/MergeHash-12         	  200000	      7877 ns/op	  16.25 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=4/Hash-12              	  200000	      7463 ns/op	  17.15 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=16/MergeHash-12        	  200000	     11789 ns/op	  43.43 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=16/Hash-12             	  100000	     15512 ns/op	  33.01 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=256/MergeHash-12       	   20000	     87181 ns/op	  93.96 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=256/Hash-12            	   10000	    174991 ns/op	  46.81 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=4096/MergeHash-12      	    1000	   1334836 ns/op	  98.19 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=4096/Hash-12           	     500	   2703277 ns/op	  48.49 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=65536/MergeHash-12     	      50	  24402907 ns/op	  85.94 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=65536/Hash-12          	      30	  49160955 ns/op	  42.66 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=0/MergeHash-12        	  500000	      3587 ns/op
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=0/Hash-12             	 1000000	      2299 ns/op
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=4/MergeHash-12        	  200000	      8088 ns/op	  15.82 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=4/Hash-12             	  200000	      7387 ns/op	  17.33 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=16/MergeHash-12       	  100000	     13118 ns/op	  39.03 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=16/Hash-12            	  100000	     13932 ns/op	  36.75 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=256/MergeHash-12      	   10000	    107095 ns/op	  76.49 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=256/Hash-12           	   10000	    130576 ns/op	  62.74 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=4096/MergeHash-12     	    1000	   1576680 ns/op	  83.13 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=4096/Hash-12          	    1000	   1992915 ns/op	  65.77 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=65536/MergeHash-12    	      50	  29762031 ns/op	  70.46 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=65536/Hash-12         	      50	  33199760 ns/op	  63.17 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=0/MergeHash-12         	  500000	      3556 ns/op
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=0/Hash-12              	 1000000	      2400 ns/op
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=4/MergeHash-12         	  200000	      9499 ns/op	  13.47 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=4/Hash-12              	  200000	      8358 ns/op	  15.31 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=16/MergeHash-12        	  100000	     18313 ns/op	  27.96 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=16/Hash-12             	  100000	     20149 ns/op	  25.41 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=256/MergeHash-12       	    5000	    374204 ns/op	  21.89 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=256/Hash-12            	    2000	    616406 ns/op	  13.29 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=4096/MergeHash-12      	     100	  19120332 ns/op	   6.86 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=4096/Hash-12           	      50	  33479559 ns/op	   3.91 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=65536/MergeHash-12     	       1	1104999095 ns/op	   1.90 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=65536/Hash-12          	       1	1961126187 ns/op	   1.07 MB/s

When RepeatSide=none, this processor is essentially a hash joiner (every merge group has length 1), so there is a minor performance hit from the additional overhead. The same overhead shows up on small inputs.
In the other cases, MergeHashJoiner wins (for example, it is about 2x faster than Hash for RepeatSide=left at InputSize=256 and above). I definitely do not expect this processor to be planned in actual queries for 19.2, but I think it might be useful as a basis for future work.

@RaduBerinde
Member

I would also benchmark against a segmented sort + merge join (this is what I expect the best plan to be currently).

@yuzefovich yuzefovich force-pushed the mergehashjoin branch 2 times, most recently from c53519d to 213d897 Compare September 3, 2019 22:28
@yuzefovich
Member Author

Added the benchmark for Sort Chunks + Merge Joiner:

BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=0/Hash-12         	  500000	      2381 ns/op
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=0/MergeHash-12    	  300000	      4219 ns/op
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=0/SortMerge-12    	  200000	      6921 ns/op
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=4/Hash-12         	  200000	      7877 ns/op	  16.25 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=4/MergeHash-12    	  200000	      9930 ns/op	  12.89 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=4/SortMerge-12    	  100000	     15191 ns/op	   8.43 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=16/Hash-12        	  100000	     16801 ns/op	  30.47 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=16/MergeHash-12   	  100000	     19307 ns/op	  26.52 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=16/SortMerge-12   	   50000	     28235 ns/op	  18.13 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=256/Hash-12       	   10000	    192985 ns/op	  42.45 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=256/MergeHash-12  	   10000	    202445 ns/op	  40.47 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=256/SortMerge-12  	    5000	    267703 ns/op	  30.60 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=4096/Hash-12      	     500	   2957025 ns/op	  44.33 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=4096/MergeHash-12 	     500	   3119846 ns/op	  42.01 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=4096/SortMerge-12 	     300	   4341501 ns/op	  30.19 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=65536/Hash-12     	              20	  53003884 ns/op	  39.57 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=65536/MergeHash-12         	      30	  51256630 ns/op	  40.91 MB/s
BenchmarkMergeHashJoiner/RepeatSide=none/InputSize=65536/SortMerge-12         	      20	  73997835 ns/op	  28.34 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=0/Hash-12                  	  500000	      2409 ns/op
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=0/MergeHash-12             	  300000	      4296 ns/op
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=0/SortMerge-12             	  200000	      6952 ns/op
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=4/Hash-12                  	  200000	      7947 ns/op	  16.11 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=4/MergeHash-12             	  200000	      8892 ns/op	  14.39 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=4/SortMerge-12             	  100000	     15150 ns/op	   8.45 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=16/Hash-12                 	  100000	     16554 ns/op	  30.93 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=16/MergeHash-12            	  100000	     13201 ns/op	  38.78 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=16/SortMerge-12            	   50000	     27163 ns/op	  18.85 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=256/Hash-12                	   10000	    186044 ns/op	  44.03 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=256/MergeHash-12           	   20000	     97618 ns/op	  83.92 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=256/SortMerge-12           	    5000	    255295 ns/op	  32.09 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=4096/Hash-12               	     500	   2909163 ns/op	  45.05 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=4096/MergeHash-12          	    1000	   1484636 ns/op	  88.29 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=4096/SortMerge-12          	     300	   4239914 ns/op	  30.91 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=65536/Hash-12              	      30	  49430330 ns/op	  42.43 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=65536/MergeHash-12         	      50	  26527365 ns/op	  79.06 MB/s
BenchmarkMergeHashJoiner/RepeatSide=left/InputSize=65536/SortMerge-12         	      20	  74207956 ns/op	  28.26 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=0/Hash-12                 	 1000000	      2477 ns/op
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=0/MergeHash-12            	  300000	      4551 ns/op
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=0/SortMerge-12            	  200000	      7402 ns/op
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=4/Hash-12                 	  200000	      8283 ns/op	  15.45 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=4/MergeHash-12            	  200000	      9299 ns/op	  13.76 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=4/SortMerge-12            	  100000	     15388 ns/op	   8.32 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=16/Hash-12                	  100000	     14258 ns/op	  35.91 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=16/MergeHash-12           	  100000	     14369 ns/op	  35.63 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=16/SortMerge-12           	   50000	     27064 ns/op	  18.92 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=256/Hash-12               	   10000	    139231 ns/op	  58.84 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=256/MergeHash-12          	   10000	    113500 ns/op	  72.18 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=256/SortMerge-12          	    5000	    255496 ns/op	  32.06 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=4096/Hash-12              	    1000	   2114705 ns/op	  61.98 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=4096/MergeHash-12         	    1000	   1728603 ns/op	  75.83 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=4096/SortMerge-12         	     300	   4140965 ns/op	  31.65 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=65536/Hash-12             	      50	  34080473 ns/op	  61.54 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=65536/MergeHash-12        	      50	  31785368 ns/op	  65.98 MB/s
BenchmarkMergeHashJoiner/RepeatSide=right/InputSize=65536/SortMerge-12        	      20	  74589548 ns/op	  28.12 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=0/Hash-12                  	 1000000	      2438 ns/op
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=0/MergeHash-12             	  300000	      4536 ns/op
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=0/SortMerge-12             	  200000	      7318 ns/op
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=4/Hash-12                  	  200000	      8915 ns/op	  14.36 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=4/MergeHash-12             	  200000	      9865 ns/op	  12.97 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=4/SortMerge-12             	  100000	     15261 ns/op	   8.39 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=16/Hash-12                 	  100000	     21141 ns/op	  24.22 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=16/MergeHash-12            	  100000	     20010 ns/op	  25.59 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=16/SortMerge-12            	   50000	     31547 ns/op	  16.23 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=256/Hash-12                	    2000	    653659 ns/op	  12.53 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=256/MergeHash-12           	    3000	    418450 ns/op	  19.58 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=256/SortMerge-12           	    5000	    374647 ns/op	  21.87 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=4096/Hash-12               	      50	  33334068 ns/op	   3.93 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=4096/MergeHash-12          	     100	  21438993 ns/op	   6.11 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=4096/SortMerge-12          	     100	  12902043 ns/op	  10.16 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=65536/Hash-12              	       1	2163552192 ns/op	   0.97 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=65536/MergeHash-12         	       1	1180946596 ns/op	   1.78 MB/s
BenchmarkMergeHashJoiner/RepeatSide=both/InputSize=65536/SortMerge-12         	       2	 685544849 ns/op	   3.06 MB/s

It seems that the Sort Chunks + Merge Joiner combination is faster only when there are groups of significant size on both sides. I wonder whether I'm doing something wrong here.

@yuzefovich yuzefovich changed the title distsqlrun: add merge hash join processor rowexec: add merge hash join processor Oct 11, 2019
@yuzefovich
Member Author

Rebased it on top of the current master. PTAL.

@RaduBerinde
Member

Can you explain what RepeatSide and InputSize mean in the benchmarks?

@yuzefovich
Member Author

yuzefovich commented Oct 11, 2019

Sure.

If a source is not repeated, then its rows are constructed as rows[i][j] = i + j. For example, for numRows = 4, numCols = 4, we'll get:

0 1 2 3
1 2 3 4
2 3 4 5
3 4 5 6

If a source is repeated, then its rows are constructed as rows[i][j] = i/numRepeats + j. For example, for numRows = 4, numCols = 4, numRepeats=2, we'll get:

0 1 2 3
0 1 2 3
1 2 3 4
1 2 3 4

Looking into this carefully made me realize that currently I ran the benchmarks on four configs:

RepeatSide=none:
0 1 2 3                       with                          0 1 2 3
1 2 3 4                                                     1 2 3 4
2 3 4 5                                                     2 3 4 5
3 4 5 6                                                     3 4 5 6

RepeatSide=left:
0 1 2 3                       with                          0 1 2 3
0 1 2 3                                                     1 2 3 4
0 1 2 3                                                     2 3 4 5
0 1 2 3                                                     3 4 5 6

RepeatSide=right:
0 1 2 3                       with                          0 1 2 3
1 2 3 4                                                     0 1 2 3
2 3 4 5                                                     0 1 2 3
3 4 5 6                                                     0 1 2 3

RepeatSide=both:
0 1 2 3                       with                          0 1 2 3
0 1 2 3                                                     0 1 2 3
0 1 2 3                                                     0 1 2 3
0 1 2 3                                                     0 1 2 3

@RaduBerinde
Member

Can you also explain how we are generating rows for the benchmark? There should be a "number of chunks" (or "chunk size") parameter; it seems to be hardcoded to "1 chunk" for RepeatSide=left or right, and to "sqrt(inputSize)" for RepeatSide=both. Why? Also, MakeRepeatedIntRows makes identical rows within each group; we would want the rows to have random order on the second column.

@yuzefovich
Member Author

Yes, you're right. I thought I was doing something wrong in the benchmarks.

@RaduBerinde
Member

RaduBerinde commented Oct 11, 2019

Note that in the left or right variants, where one side has identical rows, the merge-hash join would finish early: we only have to look at the first group on the other side. Also, we shouldn't be chunk-sorting the side that is already sorted.

I think it would be better to generate a "group index" as the first column and a random second column. The number of groups should be a parameter so we can benchmark small-group and large-group cases. (I'd expect SortMerge to perform well on many small groups and badly on large groups).

@yuzefovich
Member Author

I updated the benchmark as you suggested:

  • the first column is a "group index": non-decreasing indices, with GroupSize determining how many consecutive rows share the same index
  • the second column is a random int

Here are the results:
BenchmarkMergeHashJoiner/InputSize=4/GroupSize=2/Hash-12         	  200000	      7006 ns/op	  18.27 MB/s
BenchmarkMergeHashJoiner/InputSize=4/GroupSize=2/MergeHash-12    	  200000	      8488 ns/op	  15.08 MB/s
BenchmarkMergeHashJoiner/InputSize=4/GroupSize=2/SortMerge-12    	  100000	     14454 ns/op	   8.86 MB/s
BenchmarkMergeHashJoiner/InputSize=16/GroupSize=2/Hash-12        	  100000	     13723 ns/op	  37.31 MB/s
BenchmarkMergeHashJoiner/InputSize=16/GroupSize=2/MergeHash-12   	  100000	     15393 ns/op	  33.26 MB/s
BenchmarkMergeHashJoiner/InputSize=16/GroupSize=2/SortMerge-12   	   50000	     26886 ns/op	  19.04 MB/s
BenchmarkMergeHashJoiner/InputSize=16/GroupSize=8/Hash-12        	  100000	     13826 ns/op	  37.03 MB/s
BenchmarkMergeHashJoiner/InputSize=16/GroupSize=8/MergeHash-12   	  100000	     14642 ns/op	  34.97 MB/s
BenchmarkMergeHashJoiner/InputSize=16/GroupSize=8/SortMerge-12   	   50000	     28120 ns/op	  18.21 MB/s
BenchmarkMergeHashJoiner/InputSize=256/GroupSize=2/Hash-12       	   10000	    148325 ns/op	  55.23 MB/s
BenchmarkMergeHashJoiner/InputSize=256/GroupSize=2/MergeHash-12  	   10000	    144208 ns/op	  56.81 MB/s
BenchmarkMergeHashJoiner/InputSize=256/GroupSize=2/SortMerge-12  	    5000	    265236 ns/op	  30.89 MB/s
BenchmarkMergeHashJoiner/InputSize=256/GroupSize=32/Hash-12      	   10000	    150382 ns/op	  54.47 MB/s
BenchmarkMergeHashJoiner/InputSize=256/GroupSize=32/MergeHash-12 	   10000	    147035 ns/op	  55.71 MB/s
BenchmarkMergeHashJoiner/InputSize=256/GroupSize=32/SortMerge-12 	    5000	    310277 ns/op	  26.40 MB/s
BenchmarkMergeHashJoiner/InputSize=256/GroupSize=128/Hash-12     	   10000	    149364 ns/op	  54.85 MB/s
BenchmarkMergeHashJoiner/InputSize=256/GroupSize=128/MergeHash-12         	   10000	    148422 ns/op	  55.19 MB/s
BenchmarkMergeHashJoiner/InputSize=256/GroupSize=128/SortMerge-12         	    5000	    353365 ns/op	  23.18 MB/s
BenchmarkMergeHashJoiner/InputSize=4096/GroupSize=2/Hash-12               	    1000	   2395674 ns/op	  54.71 MB/s
BenchmarkMergeHashJoiner/InputSize=4096/GroupSize=2/MergeHash-12          	    1000	   2227406 ns/op	  58.85 MB/s
BenchmarkMergeHashJoiner/InputSize=4096/GroupSize=2/SortMerge-12          	     300	   4304859 ns/op	  30.45 MB/s
BenchmarkMergeHashJoiner/InputSize=4096/GroupSize=32/Hash-12              	    1000	   2341546 ns/op	  55.98 MB/s
BenchmarkMergeHashJoiner/InputSize=4096/GroupSize=32/MergeHash-12         	    1000	   2256149 ns/op	  58.10 MB/s
BenchmarkMergeHashJoiner/InputSize=4096/GroupSize=32/SortMerge-12         	     300	   5183977 ns/op	  25.28 MB/s
BenchmarkMergeHashJoiner/InputSize=4096/GroupSize=512/Hash-12             	    1000	   2360870 ns/op	  55.52 MB/s
BenchmarkMergeHashJoiner/InputSize=4096/GroupSize=512/MergeHash-12        	    1000	   2219146 ns/op	  59.06 MB/s
BenchmarkMergeHashJoiner/InputSize=4096/GroupSize=512/SortMerge-12        	     200	   6348137 ns/op	  20.65 MB/s
BenchmarkMergeHashJoiner/InputSize=4096/GroupSize=2048/Hash-12            	    1000	   2348396 ns/op	  55.81 MB/s
BenchmarkMergeHashJoiner/InputSize=4096/GroupSize=2048/MergeHash-12       	    1000	   2326982 ns/op	  56.33 MB/s
BenchmarkMergeHashJoiner/InputSize=4096/GroupSize=2048/SortMerge-12       	     200	   7121504 ns/op	  18.41 MB/s
BenchmarkMergeHashJoiner/InputSize=65536/GroupSize=2/Hash-12              	      30	  44558252 ns/op	  47.07 MB/s
BenchmarkMergeHashJoiner/InputSize=65536/GroupSize=2/MergeHash-12         	      30	  41658269 ns/op	  50.34 MB/s
BenchmarkMergeHashJoiner/InputSize=65536/GroupSize=2/SortMerge-12         	      20	  80144087 ns/op	  26.17 MB/s
BenchmarkMergeHashJoiner/InputSize=65536/GroupSize=32/Hash-12             	      30	  46164422 ns/op	  45.43 MB/s
BenchmarkMergeHashJoiner/InputSize=65536/GroupSize=32/MergeHash-12        	      30	  41495468 ns/op	  50.54 MB/s
BenchmarkMergeHashJoiner/InputSize=65536/GroupSize=32/SortMerge-12        	      20	  93896364 ns/op	  22.33 MB/s
BenchmarkMergeHashJoiner/InputSize=65536/GroupSize=512/Hash-12            	      30	  44805495 ns/op	  46.81 MB/s
BenchmarkMergeHashJoiner/InputSize=65536/GroupSize=512/MergeHash-12       	      30	  40971730 ns/op	  51.19 MB/s
BenchmarkMergeHashJoiner/InputSize=65536/GroupSize=512/SortMerge-12       	      10	 114042780 ns/op	  18.39 MB/s
BenchmarkMergeHashJoiner/InputSize=65536/GroupSize=8192/Hash-12           	      30	  44339910 ns/op	  47.30 MB/s
BenchmarkMergeHashJoiner/InputSize=65536/GroupSize=8192/MergeHash-12      	      30	  42413336 ns/op	  49.45 MB/s
BenchmarkMergeHashJoiner/InputSize=65536/GroupSize=8192/SortMerge-12      	      10	 134064709 ns/op	  15.64 MB/s
BenchmarkMergeHashJoiner/InputSize=65536/GroupSize=32768/Hash-12          	      30	  44476056 ns/op	  47.15 MB/s
BenchmarkMergeHashJoiner/InputSize=65536/GroupSize=32768/MergeHash-12     	      30	  46534679 ns/op	  45.07 MB/s
BenchmarkMergeHashJoiner/InputSize=65536/GroupSize=32768/SortMerge-12     	      10	 143607875 ns/op	  14.60 MB/s

Hash and MergeHash perform comparably, but the SortMerge strategy clearly loses (at InputSize=65536 it is roughly 2x to 3x slower than MergeHash, getting worse as GroupSize grows). I think this is because it actually has two stages of processors with a row buffer in between.
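
To make the comparison concrete, the ns/op figures can be turned into ratios. The snippet below is only a convenience sketch with a hypothetical `slowdown` helper; the numbers are copied from the InputSize=65536 rows above.

```go
package main

import "fmt"

// slowdown reports how many times slower a is than b, given ns/op values.
func slowdown(aNsPerOp, bNsPerOp float64) float64 {
	return aNsPerOp / bNsPerOp
}

func main() {
	// ns/op values from the InputSize=65536 benchmark rows.
	fmt.Printf("GroupSize=2:     SortMerge vs MergeHash = %.2fx\n", slowdown(80144087, 41658269))
	fmt.Printf("GroupSize=32768: SortMerge vs MergeHash = %.2fx\n", slowdown(143607875, 46534679))
	fmt.Printf("GroupSize=512:   Hash vs MergeHash      = %.2fx\n", slowdown(44805495, 40971730))
}
```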

Here is the pprof of MergeHash:
[Image: pprof profile of MergeHash]

And of SortMerge:
[Image: pprof profile of SortMerge]

@yuzefovich
Member Author

The latest benchmarks showed that there was not much of an improvement for MergeHash when comparing against just Hash, so I'll close this PR.

@yuzefovich yuzefovich closed this Dec 20, 2019