-
Notifications
You must be signed in to change notification settings - Fork 2k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Generate In-Clause filters from hash joins (#14864)
Follow-up from #12908 This PR extends the join filters that get generated by hash joins to include an `IN` filter when the hash table is small enough. Rather than generating only a `min-max` filter, we generate an `IN` filter with **all** values in the hash table. This can greatly improve performance over the `min-max` filter when there are few values that are far apart. For example, if the hash table contains the values `1` and `100000`, the `IN` filter might be much more effective as we can prune many more row groups. The threshold for which we generate an `IN` filter is determined by the `dynamic_or_filter_threshold` setting, and defaults to 50 rows. The `IN` filter is pushed as an `OPTIONAL_FILTER`, which is currently only evaluated for zone-map pruning. As a result, the performance impact of pushing this filter is minimal, while the performance improvement from extra zone-map pruning can be significant. ### Benchmark Below is a benchmark that we run over TPC-H SF10 in which we run a join that is generated through an `IN` clause - we join on the `min` and `max` values of `l_orderkey` (meaning the generated min/max filters will not be effective in pruning rows). ```sql SELECT * FROM lineitem WHERE l_orderkey IN ( SELECT UNNEST([MIN(l_orderkey), MAX(l_orderkey)]) FROM lineitem) ORDER BY ALL; ``` | v1.1 | New | |-------|-------| | 0.22s | 0.04s |
- Loading branch information
Showing
31 changed files
with
476 additions
and
186 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# name: benchmark/tpch/join/join_or_filter_pushdown.benchmark | ||
# description: Join filter pushdown | ||
# group: [join] | ||
|
||
name Join Or Filter Pushdown | ||
group join | ||
subgroup tpch | ||
|
||
require tpch | ||
|
||
cache tpch_sf1.duckdb | ||
|
||
load | ||
CALL dbgen(sf=1); | ||
|
||
run | ||
SELECT * from lineitem WHERE l_orderkey IN (SELECT UNNEST([MIN(l_orderkey), MAX(l_orderkey)]) FROM lineitem) ORDER BY ALL | ||
|
||
result IIIIIIIIIIIIIIII | ||
1 2132 4633 4 28.00 28955.64 0.09 0.06 N O 1996-04-21 1996-03-30 1996-05-16 NONE AIR s cajole busily above t | ||
1 15635 638 6 32.00 49620.16 0.07 0.02 N O 1996-01-30 1996-02-07 1996-02-03 DELIVER IN PERSON MAIL rouches. special | ||
1 24027 1534 5 24.00 22824.48 0.10 0.04 N O 1996-03-30 1996-03-14 1996-04-01 NONE FOB the regular, regular pa | ||
1 63700 3701 3 8.00 13309.60 0.10 0.02 N O 1996-01-29 1996-03-05 1996-01-31 TAKE BACK RETURN REG AIR ourts cajole above the furiou | ||
1 67310 7311 2 36.00 45983.16 0.09 0.06 N O 1996-04-12 1996-02-28 1996-04-20 TAKE BACK RETURN MAIL according to the final foxes. qui | ||
1 155190 7706 1 17.00 21168.23 0.04 0.02 N O 1996-03-13 1996-02-12 1996-03-22 DELIVER IN PERSON TRUCK to beans x-ray carefull | ||
6000000 32255 2256 1 5.00 5936.25 0.04 0.03 N O 1996-11-02 1996-11-19 1996-12-01 TAKE BACK RETURN MAIL riously pe | ||
6000000 96127 6128 2 28.00 31447.36 0.01 0.02 N O 1996-09-22 1996-10-01 1996-10-21 NONE AIR pecial excuses nag evenly f |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
# name: benchmark/tpch/join/join_or_filter_range.benchmark | ||
# description: Join filter pushdown | ||
# group: [join] | ||
|
||
name Join Or Filter Pushdown | ||
group join | ||
subgroup tpch | ||
|
||
require tpch | ||
|
||
cache tpch_sf1.duckdb | ||
|
||
load | ||
CALL dbgen(sf=1); | ||
|
||
run | ||
SELECT * from lineitem WHERE l_orderkey IN (SELECT * FROM range(50)) ORDER BY ALL | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
//===----------------------------------------------------------------------===// | ||
// DuckDB | ||
// | ||
// duckdb/planner/filter/in_filter.hpp | ||
// | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#pragma once | ||
|
||
#include "duckdb/planner/table_filter.hpp" | ||
#include "duckdb/common/types/value.hpp" | ||
|
||
namespace duckdb { | ||
|
||
class InFilter : public TableFilter { | ||
public: | ||
static constexpr const TableFilterType TYPE = TableFilterType::IN_FILTER; | ||
|
||
public: | ||
explicit InFilter(vector<Value> values); | ||
|
||
vector<Value> values; | ||
|
||
public: | ||
FilterPropagateResult CheckStatistics(BaseStatistics &stats) override; | ||
string ToString(const string &column_name) override; | ||
bool Equals(const TableFilter &other) const override; | ||
unique_ptr<TableFilter> Copy() const override; | ||
unique_ptr<Expression> ToExpression(const Expression &column) const override; | ||
void Serialize(Serializer &serializer) const override; | ||
static unique_ptr<TableFilter> Deserialize(Deserializer &deserializer); | ||
}; | ||
|
||
} // namespace duckdb |
Oops, something went wrong.