refactor: runtime filter #13842

xudong963 · 2023-11-29T04:40:03Z

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Intro:
Runtime filtering adaptively derives new predicates at runtime and uses them to filter the probe side of a join.

The predicates generated at runtime are pushed down through the processors to the table scan on the probe side, where they are used for pruning. This can improve performance significantly.
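
To make the idea concrete, here is a minimal sketch of an inlist filter that is built from the build-side join keys and then used on the probe side, both per row and per block via min/max statistics. The type `InListFilter` and its methods are hypothetical names chosen for illustration and keys are simplified to `i64`; this is not the code in this PR.

```rust
use std::collections::BTreeSet;

/// Hypothetical inlist runtime filter: the distinct join keys seen on the build side.
#[derive(Debug)]
pub struct InListFilter {
    keys: BTreeSet<i64>,
}

impl InListFilter {
    /// Collect the build-side keys; the set deduplicates them.
    pub fn from_build_keys(keys: impl IntoIterator<Item = i64>) -> Self {
        Self {
            keys: keys.into_iter().collect(),
        }
    }

    /// Row-level check: can a probe row with this key possibly find a match?
    pub fn contains(&self, key: i64) -> bool {
        self.keys.contains(&key)
    }

    /// Block-level check: returns true when no build key falls inside the
    /// block's [min, max] range, so the block can be skipped without reading it.
    pub fn can_prune(&self, min: i64, max: i64) -> bool {
        if min > max {
            // Inconsistent statistics: be conservative and keep the block.
            return false;
        }
        self.keys.range(min..=max).next().is_none()
    }
}
```

With build keys `[5, 7, 7, 100]`, a probe block whose key column spans `8..=99` can be pruned, while a block spanning `0..=5` must still be read.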

Simple benchmark

A simple example that is well suited to runtime filtering:

select * from t1 join t2 on t1.a = t2.b;

t1: 1_000_000_000 rows
t2: 9999 rows (an inlist runtime filter is not generated above 10k)

cluster:

before: 0.04 s
runtime filter: 0.007 s

single node:

before: 0.025 s
runtime filter: 0.0046 s

Adaptive:

Currently only an inlist filter is generated, and only if the total data size of the build side is less than 10k. A very large inlist is not efficient, so we may consider supporting a bloom filter for data larger than 10k in the future.
In cluster mode, only broadcast join supports runtime filters, because the data size gap between the build and probe sides of a broadcast join is relatively large, so the filter is more likely to be effective.
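
The two conditions above can be sketched as a single decision function. The names `INLIST_THRESHOLD`, `JoinDistribution`, and `should_build_inlist` are assumptions made for this example, as is the `Shuffle` variant standing in for any non-broadcast cluster join; the actual code in this PR may be organized differently.

```rust
/// Assumed threshold, taken from the "less than 10k" rule in the description.
pub const INLIST_THRESHOLD: usize = 10_000;

/// Hypothetical classification of how a join is executed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinDistribution {
    /// The query runs on a single node.
    SingleNode,
    /// Cluster mode, build side broadcast to every node.
    Broadcast,
    /// Cluster mode, any non-broadcast join (placeholder name).
    Shuffle,
}

/// Decide whether an inlist runtime filter should be generated.
pub fn should_build_inlist(build_size: usize, distribution: JoinDistribution) -> bool {
    // In a cluster, only broadcast joins get a runtime filter.
    let distribution_ok = matches!(
        distribution,
        JoinDistribution::SingleNode | JoinDistribution::Broadcast
    );
    // The build side must stay below the threshold.
    distribution_ok && build_size < INLIST_THRESHOLD
}
```

Under these assumptions, a build side of 9999 (as in the benchmark) qualifies, while a build side of 10_000 or a non-broadcast cluster join does not.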

Others:
Runtime filters are saved in QueryCtx in a HashMap: the key is the table index, and the value is the list of filters for that table. Before a table starts reading data, it gets its filters from the ctx by table index and uses them to prune.
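
A rough sketch of that storage scheme follows. `QueryContextSketch`, `RuntimeFilter`, `TableIndex`, and both method names are hypothetical stand-ins, not the real QueryCtx API; the `RwLock` is an assumption so that the join build side can write while table scans read.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical alias for the planner's table index.
pub type TableIndex = usize;

/// Simplified stand-in for a runtime filter: an inlist on one column.
#[derive(Debug, Clone, PartialEq)]
pub struct RuntimeFilter {
    pub column: String,
    pub values: Vec<i64>,
}

/// Stand-in for the query context: table index -> filters for that table.
#[derive(Default)]
pub struct QueryContextSketch {
    runtime_filters: RwLock<HashMap<TableIndex, Vec<RuntimeFilter>>>,
}

impl QueryContextSketch {
    /// Called by the join build side once a filter is ready.
    pub fn set_runtime_filter(&self, table_index: TableIndex, filter: RuntimeFilter) {
        self.runtime_filters
            .write()
            .unwrap()
            .entry(table_index)
            .or_default()
            .push(filter);
    }

    /// Called by the table scan before it starts reading data.
    /// Returns an empty list when no filter was registered for the table.
    pub fn get_runtime_filters(&self, table_index: TableIndex) -> Vec<RuntimeFilter> {
        self.runtime_filters
            .read()
            .unwrap()
            .get(&table_index)
            .cloned()
            .unwrap_or_default()
    }
}
```

Keying by table index means a scan only sees the filters derived for its own table, and a table with no registered filter simply reads everything.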

Closes #issue

github-actions · 2023-12-05T07:43:00Z

Docker Image for PR

tag: pr-13842-0b7053e

note: this image tag is only available for internal use,
please check the internal doc for more details.

github-actions · 2023-12-05T08:13:34Z

ClickBench Report

Dousir9 · 2023-12-08T03:39:48Z

How did you generate the test data for t1 and t2? Is it numbers(1_000_000_000)?

xudong963 · 2023-12-08T03:49:13Z

> How did you generate the test data for t1 and t2? Is it numbers(1_000_000_000)?

Yeah.

github-actions · 2023-12-08T11:39:25Z

Docker Image for PR

tag: pr-13842-faef3fd

note: this image tag is only available for internal use,
please check the internal doc for more details.

github-actions · 2023-12-09T09:43:33Z

Docker Image for PR

tag: pr-13842-61e0906

note: this image tag is only available for internal use,
please check the internal doc for more details.

github-actions · 2023-12-09T10:15:18Z

ClickBench Report

BohuTANG · 2023-12-10T00:54:39Z

src/query/storages/fuse/src/operations/read/runtime_filter_prunner.rs

+    part: &PartInfoPtr,
+    filters: &Vec<Expr<String>>,
+    func_ctx: &FunctionContext,
+) -> Result<bool> {


Do we need to add the runtime filter stats to EXPLAIN? The current stats:

├── pruning stats: [segments: <range pruning: 1 to 1>, blocks: <range pruning: 755 to 755, bloom pruning: 0 to 0>]

xudong963 · Dec 10, 2023

The current stats are collected before the pipeline runs, but runtime filter stats are generated while the pipeline is running. Maybe we can add runtime filter stats to the query profile later.

xudong963 · Dec 12, 2023

How about adding runtime_filter-related stats to the query log? @BohuTANG

Dousir9 · 2023-12-11T04:57:58Z

Rest LGTM!

xudong963 marked this pull request as draft November 29, 2023 04:40

github-actions bot added the pr-refactor this PR changes the code base without new features or bugfix label Nov 29, 2023

xudong963 force-pushed the refactor_runtime_filter branch from 45edbc1 to e34ba31 Compare December 1, 2023 03:09

xudong963 mentioned this pull request Dec 1, 2023

chore: remove old runtime filter #13896

Merged

BohuTANG reviewed Dec 1, 2023

src/query/service/src/pipelines/executor/executor_graph.rs (outdated, resolved)

xudong963 force-pushed the refactor_runtime_filter branch 2 times, most recently from f616257 to b23bf8a Compare December 1, 2023 16:02

xudong963 added the ci-benchmark Benchmark: run all test label Dec 5, 2023

xudong963 force-pushed the refactor_runtime_filter branch 7 times, most recently from 06ac33c to ae6e2ca Compare December 7, 2023 16:13

xudong963 marked this pull request as ready for review December 8, 2023 03:23

xudong963 requested review from sundy-li, zhang2014, leiysky and Dousir9 December 8, 2023 03:29

BohuTANG added ci-benchmark Benchmark: run all test and removed ci-benchmark Benchmark: run all test labels Dec 8, 2023

zhang2014 reviewed Dec 8, 2023

src/query/pipeline/core/src/processors/duplicate_processor.rs (outdated, resolved)

zhang2014 reviewed Dec 8, 2023

src/query/service/src/pipelines/executor/executor_graph.rs (outdated, resolved)

xudong963 marked this pull request as draft December 8, 2023 14:49

xudong963 added 9 commits December 9, 2023 17:27

dedup inlist

b970e7e

fix cluster

5de6372

define a RuntimeFilter trait to reduce invade processor core

f8ce002

fix other source

b47c2d2

fix

455de60

broadcast join

697a6c7

fix ut

672ffaf

remove executor logic

e17659a

redesign

b197006

BohuTANG reviewed Dec 10, 2023

Dousir9 reviewed Dec 10, 2023

src/query/service/src/pipelines/processors/transforms/hash_join/hash_join_build_state.rs (outdated, resolved)

zhang2014 reviewed Dec 11, 2023

src/query/service/src/pipelines/executor/executor_graph.rs (resolved)

zhang2014 approved these changes Dec 11, 2023

sundy-li reviewed Dec 11, 2023

src/query/service/src/pipelines/processors/transforms/hash_join/util.rs (outdated, resolved)

resolve comments

06562a8

xudong963 force-pushed the refactor_runtime_filter branch from 3f8a04b to 06562a8 Compare December 11, 2023 02:48

sundy-li reviewed Dec 11, 2023

src/query/storages/fuse/src/operations/read/runtime_filter_prunner.rs (resolved)

sundy-li approved these changes Dec 11, 2023

xudong963 added this pull request to the merge queue Dec 11, 2023

Dousir9 reviewed Dec 11, 2023

src/query/service/src/pipelines/processors/transforms/hash_join/hash_join_build_state.rs (outdated, resolved)

Dousir9 removed this pull request from the merge queue due to a manual request Dec 11, 2023

Dousir9 approved these changes Dec 11, 2023

add check

6791197

Dousir9 added this pull request to the merge queue Dec 11, 2023

BohuTANG removed this pull request from the merge queue due to a manual request Dec 11, 2023

BohuTANG merged commit 4b94823 into datafuselabs:main Dec 11, 2023
68 checks passed

xudong963 deleted the refactor_runtime_filter branch December 11, 2023 05:39
