HAWQ-1607. This commit implements applying Bloom filter during Scan outer table #1360

linwen · 2018-05-06T13:21:19Z

1. Push down the Bloom filter structure to the outer table scan (only Parquet is supported).
2. Check whether each tuple from the outer table is found in the Bloom filter structure.
3. Add a GUC, hawq_hashjoin_bloomfilter_sampling_number. This GUC controls the Bloom filter sampling number: while scanning the outer table, for the first N tuples of the outer table, if the ratio is larger than hawq_hashjoin_bloomfilter_ratio, the remaining tuples will not be checked against the Bloom filter.
4. The Bloom filter won't be created in two cases: (1) there is any expression on the outer join keys other than T_Var (projection), such as fact.c1 + 1 = dim.c1; (2) there are multiple join keys, e.g. fact.c1 = dim.c1 and fact.c2 = dim.c2. These cases involve pushing expression and projection information down to the scan, which will be implemented later.
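The sampling behaviour in point 3 can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: every `Sketch*`/`sketch_*` name, the toy hash, and the fixed filter size are hypothetical. It also assumes that "the ratio" means the fraction of sampled outer tuples that pass the Bloom filter, which the description does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOOM_BITS 1024   /* hypothetical fixed filter size */

typedef struct SketchBloom {
    uint8_t bits[SKETCH_BLOOM_BITS / 8];
} SketchBloom;

typedef struct SketchRuntimeFilter {
    SketchBloom bloom;
    int    sampling_number;  /* stands in for hawq_hashjoin_bloomfilter_sampling_number */
    double ratio_threshold;  /* stands in for hawq_hashjoin_bloomfilter_ratio */
    int    checked;          /* sampled tuples so far */
    int    passed;           /* sampled tuples found in the filter */
    bool   stop_checking;    /* set once the filter is judged not selective enough */
} SketchRuntimeFilter;

/* Toy hash for illustration only; maps a key to a bit position. */
uint32_t sketch_hash(uint32_t key, uint32_t seed)
{
    uint32_t h = key * 2654435761u + seed;
    h ^= h >> 16;
    h *= 0x45d9f3bu;
    h ^= h >> 16;
    return h % SKETCH_BLOOM_BITS;
}

void sketch_bloom_add(SketchBloom *bf, uint32_t key)
{
    assert(bf != NULL);
    for (uint32_t seed = 0; seed < 2; seed++) {
        uint32_t bit = sketch_hash(key, seed);
        bf->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

bool sketch_bloom_contains(const SketchBloom *bf, uint32_t key)
{
    for (uint32_t seed = 0; seed < 2; seed++) {
        uint32_t bit = sketch_hash(key, seed);
        if (!(bf->bits[bit / 8] & (1u << (bit % 8))))
            return false;   /* definitely not in the inner table */
    }
    return true;            /* possibly in the inner table */
}

/* Returns true if the outer tuple should be handed on to the join. */
bool sketch_filter_tuple(SketchRuntimeFilter *rf, uint32_t join_key)
{
    if (rf->stop_checking)
        return true;        /* filter switched off: pass everything through */

    bool found = sketch_bloom_contains(&rf->bloom, join_key);
    if (rf->checked < rf->sampling_number) {
        rf->checked++;
        if (found)
            rf->passed++;
        /* After the first N tuples, decide whether checking is worthwhile. */
        if (rf->checked == rf->sampling_number &&
            (double) rf->passed / rf->checked > rf->ratio_threshold)
            rf->stop_checking = true;
    }
    return found;
}
```

Under this reading, a filter that lets most sampled tuples through saves little work, so the remaining tuples skip the check entirely.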

Please review, thanks!

linwen · 2018-05-07T02:32:13Z

This commit doesn't contain test cases; they will be added with HAWQ-1608. Once HAWQ-1608 is finished, users can use the "explain analyze" statement to see whether the Bloom filter is used for a hash join.

wengyanqing · 2018-05-07T07:33:43Z

src/include/nodes/execnodes.h

@@ -1522,6 +1540,9 @@ typedef struct ScanState
 	/* The type of the table that is being scanned */
 	TableType tableType;

+	/* Runtime filter */
+	struct RuntimeFilterState runtimeFilter;


Since ScanState needs to be copied sometimes, it's better to use a pointer to RuntimeFilterState in the struct and allocate the memory dynamically.
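A rough sketch of that suggestion, using hypothetical stand-in types (`SketchScanState`, `SketchRuntimeFilterState`) rather than the real HAWQ structs, and plain `calloc`/`free` where the executor would presumably use its own allocator:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for RuntimeFilterState. */
typedef struct SketchRuntimeFilterState {
    bool  enabled;
    void *bloomfilter;
    long  checked_tuples;
} SketchRuntimeFilterState;

/* Hypothetical stand-in for ScanState: it holds only a pointer, so copying
 * the struct copies eight bytes or so instead of the whole filter state. */
typedef struct SketchScanState {
    int tableType;
    SketchRuntimeFilterState *runtimeFilter;  /* NULL until a filter is pushed down */
} SketchScanState;

/* Allocate the filter state lazily, zero-initialised. */
SketchRuntimeFilterState *sketch_get_runtime_filter(SketchScanState *ss)
{
    assert(ss != NULL);
    if (ss->runtimeFilter == NULL)
        ss->runtimeFilter = calloc(1, sizeof(SketchRuntimeFilterState));
    return ss->runtimeFilter;
}

/* Release the filter state; only one owner of the pointer should call this. */
void sketch_free_runtime_filter(SketchScanState *ss)
{
    assert(ss != NULL);
    free(ss->runtimeFilter);
    ss->runtimeFilter = NULL;
}
```

One consequence worth noting: copies of the struct then share a single filter state, so updates made through one copy are visible through the others, and exactly one of them must free it.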

wengyanqing · 2018-05-07T07:37:59Z

src/backend/executor/nodeHashjoin.c

+	memcpy(rf->hashfunctions, hjstate->hj_HashTable->hashfunctions, i*sizeof(FmgrInfo));
+	size_t size = offsetof(BloomFilterData, data) + hjstate->hj_HashTable->bloomfilter->data_size;
+	rf->bloomfilter = palloc0(size);
+	memcpy(rf->bloomfilter, hjstate->hj_HashTable->bloomfilter, size);


Why not just assign hjstate->hj_HashTable->bloomfilter to rf->bloomfilter?
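For reference, the two options differ as sketched below. `SketchBloomFilterData` is a hypothetical struct that mirrors only what the hunk shows (a `data_size` field followed by a trailing `data` array), with `malloc`/`calloc` standing in for `palloc0`. Which option is appropriate depends on how long the hash table's filter lives relative to the scan, which this thread does not settle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: a header followed by a variable-length bit array. */
typedef struct SketchBloomFilterData {
    size_t  data_size;  /* number of bytes in data[] */
    uint8_t data[];     /* flexible array member */
} SketchBloomFilterData;

/* Allocate a zeroed filter with room for data_size bytes of bits. */
SketchBloomFilterData *sketch_bloom_create(size_t data_size)
{
    size_t size = offsetof(SketchBloomFilterData, data) + data_size;
    SketchBloomFilterData *bf = calloc(1, size);
    if (bf != NULL)
        bf->data_size = data_size;
    return bf;
}

/* Deep copy, as in the hunk: header and bit array are duplicated, so the
 * copy stays valid and unchanged even if the source is modified or freed. */
SketchBloomFilterData *sketch_bloom_copy(const SketchBloomFilterData *src)
{
    assert(src != NULL);
    size_t size = offsetof(SketchBloomFilterData, data) + src->data_size;
    SketchBloomFilterData *dst = malloc(size);
    if (dst != NULL)
        memcpy(dst, src, size);
    return dst;
}
```

Plain pointer assignment (`shared = src;`) avoids the allocation and the `memcpy`, but both sides then refer to the same memory and must agree on who frees it.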

wengyanqing · 2018-05-07T07:41:47Z

src/backend/cdb/cdbparquetrowgroup.c


-		if(hawqAttrToParquetColNum[i] == 1)
+		int colReaderIndex = 0;
+		int16 proj[128];


It's better to use natts instead of 128.
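A minimal sketch of sizing the array from the attribute count instead of a fixed 128. The function name, its parameters, and the meaning given to `proj` here (the indexes of attributes whose flag is 1) are assumptions for illustration; the real code may populate it differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Collect the indexes of the attributes flagged with 1 into an array sized
 * by natts, so tables with more than 128 columns cannot overflow it.
 * The caller frees the result; *nproj receives the number of entries. */
int16_t *sketch_build_projection(const int *attr_needed, int natts, int *nproj)
{
    assert(attr_needed != NULL && nproj != NULL && natts >= 0);

    /* Allocate at least one element so a zero-column input stays valid. */
    int16_t *proj = malloc((size_t)(natts > 0 ? natts : 1) * sizeof(int16_t));
    *nproj = 0;
    if (proj == NULL)
        return NULL;

    for (int i = 0; i < natts; i++) {
        if (attr_needed[i] == 1)
            proj[(*nproj)++] = (int16_t) i;
    }
    return proj;
}
```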

interma

LGTM

wengyanqing

LGTM.

linwen · 2018-05-09T09:21:23Z

merged into master.

Wen Lin added 4 commits May 6, 2018 21:19

Add comments.

72e5bbe

Fix code format.

29163db

Fix code format.

275f210

wengyanqing reviewed May 7, 2018

Wen Lin added 2 commits May 7, 2018 17:52

fix review comments.

694ff5a

Fix a bug.

1a90a24

interma approved these changes May 9, 2018

wengyanqing approved these changes May 9, 2018

linwen closed this May 9, 2018
