HAWQ-1607. This commit implements applying Bloom filter during Scan outer table #1360

Closed
wants to merge 6 commits into from

@linwen linwen commented May 6, 2018

    1. Push down the Bloom filter structure to the outer table scan (only Parquet is supported);
    2. Check whether each tuple from the outer table is found in the Bloom filter structure.
    3. Add a GUC, hawq_hashjoin_bloomfilter_sampling_number. This GUC controls the Bloom filter sampling number: while scanning the outer table, the first N tuples are sampled, and if the ratio is larger than hawq_hashjoin_bloomfilter_ratio, the remaining tuples will not be checked against the Bloom filter.
    4. The Bloom filter won't be created in two cases: (1) there is any expression on the outer join keys other than a T_Var (projection), such as fact.c1 + 1 = dim.c1; (2) there are multiple join keys, e.g. fact.c1 = dim.c1 and fact.c2 = dim.c2. These cases involve pushing down expression and projection information to the scan, which will be implemented later.

Please review, thanks!

Wen Lin added 4 commits May 6, 2018 21:19
…uter table, test cases will be added with HAWQ-1608.


linwen commented May 7, 2018

This commit doesn't contain test cases; they will be added with HAWQ-1608. Once HAWQ-1608 is finished, users can run an "explain analyze" statement to see whether the Bloom filter is used for a hash join.

@@ -1522,6 +1540,9 @@ typedef struct ScanState
/* The type of the table that is being scanned */
TableType tableType;

/* Runtime filter */
struct RuntimeFilterState runtimeFilter;

Since ScanState sometimes needs to be copied, it's better to keep a pointer to RuntimeFilterState in the struct and allocate the memory dynamically.
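
A rough sketch of what this suggestion could look like, using simplified, hypothetical struct definitions rather than the real ScanState and RuntimeFilterState, and plain calloc/free where HAWQ code would presumably use palloc0/pfree:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical versions of the structs under discussion. */
typedef struct RuntimeFilterStateSketch {
    int  joinkeys[8];
    char payload[4096]; /* pretend the filter state is large */
} RuntimeFilterStateSketch;

typedef struct ScanStateSketch {
    int tableType;
    /* Pointer instead of an embedded struct; NULL means "no runtime filter". */
    RuntimeFilterStateSketch *runtimeFilter;
} ScanStateSketch;

/* Allocate the filter state on demand. Returns 0 on success, -1 on failure. */
static int
scan_state_attach_filter(ScanStateSketch *ss)
{
    assert(ss != NULL);
    ss->runtimeFilter = calloc(1, sizeof(RuntimeFilterStateSketch));
    return ss->runtimeFilter != NULL ? 0 : -1;
}

/* Release the filter state and reset the pointer. */
static void
scan_state_release_filter(ScanStateSketch *ss)
{
    assert(ss != NULL);
    free(ss->runtimeFilter);
    ss->runtimeFilter = NULL;
}
```

With this layout a struct copy of the scan state copies only the pointer, so copies stay small and scans without a runtime filter pay nothing for it. The trade-off is that copies share one filter state, so ownership has to be clear.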

memcpy(rf->hashfunctions, hjstate->hj_HashTable->hashfunctions, i*sizeof(FmgrInfo));
size_t size = offsetof(BloomFilterData, data) + hjstate->hj_HashTable->bloomfilter->data_size;
rf->bloomfilter = palloc0(size);
memcpy(rf->bloomfilter, hjstate->hj_HashTable->bloomfilter, size);

Why not just assign hjstate->hj_HashTable->bloomfilter to rf->bloomfilter?
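
To make the two options in this exchange concrete, here is a minimal sketch with a hypothetical BloomFilterSketch type. It assumes only what the diff shows, namely a header followed by a variable-length data array whose total size is computed with offsetof:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for BloomFilterData: header plus variable-length bits. */
typedef struct BloomFilterSketch {
    size_t        data_size; /* number of bytes in data[] */
    unsigned char data[];    /* flexible array member */
} BloomFilterSketch;

/* Allocate a zeroed filter with data_size bytes of bit storage. */
static BloomFilterSketch *
bloom_alloc(size_t data_size)
{
    BloomFilterSketch *bf = calloc(1, offsetof(BloomFilterSketch, data) + data_size);
    if (bf != NULL)
        bf->data_size = data_size;
    return bf;
}

/* Deep copy, mirroring the size computation and memcpy in the diff. */
static BloomFilterSketch *
bloom_deep_copy(const BloomFilterSketch *src)
{
    assert(src != NULL);
    size_t size = offsetof(BloomFilterSketch, data) + src->data_size;
    BloomFilterSketch *dst = malloc(size);
    if (dst != NULL)
        memcpy(dst, src, size);
    return dst;
}
```

The alternative raised in the comment is a plain pointer assignment (`shared = src;`), which avoids the allocation and copy but makes the scan see every later change to the original filter. Whether that is safe depends on the filter's lifetime and on whether the hash join still modifies it after the pointer is handed over, which this thread does not state.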


if(hawqAttrToParquetColNum[i] == 1)
int colReaderIndex = 0;
int16 proj[128];

It's better to use natts instead of 128.
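
One way to follow this advice is to size the array from the attribute count instead of a fixed 128. The sketch below is an assumption-laden illustration: the function name, the `projected` flag array, and the meaning given to `proj` (indexes of the attributes the scan needs) are hypothetical and not taken from the PR, and malloc stands in for whatever allocator the real code would use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Collect the indexes of the projected attributes.
 * projected[i] != 0 means attribute i is needed by the scan (hypothetical input).
 * Returns a heap array with room for natts entries (caller frees) and stores
 * the number of entries actually used in *nproj. Returns NULL on allocation failure.
 */
static int16_t *
build_projection(const unsigned char *projected, int natts, int *nproj)
{
    assert(projected != NULL && nproj != NULL && natts >= 0);

    /* Sized by natts, so tables with more than 128 columns cannot overflow it. */
    int16_t *proj = malloc(sizeof(int16_t) * (size_t) (natts > 0 ? natts : 1));
    if (proj == NULL)
        return NULL;

    int n = 0;
    for (int i = 0; i < natts; i++) {
        if (projected[i])
            proj[n++] = (int16_t) i;
    }
    *nproj = n;
    return proj;
}
```

A fixed `proj[128]` silently assumes no table has more than 128 attributes, while an array sized by natts removes that assumption at the cost of one allocation.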

@interma interma left a comment

LGTM

@wengyanqing wengyanqing left a comment

LGTM.

linwen commented May 9, 2018

merged into master.

@linwen linwen closed this May 9, 2018