Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(query): refine hash join bitmap and support fast path for datablock take #13213

Merged
merged 3 commits into from
Oct 12, 2023

Conversation

Dousir9
Copy link
Member

@Dousir9 Dousir9 commented Oct 11, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

After jorgecarleitao/arrow2#1574, If a Bitmap is all true or all false, Bitmap.slice() has almost no overhead, we can benefit a lot from it, for example, we can keep an all true bitmap true_validity in hash join ProbeState, when we need to wrap a Column to a NullableColumn, we can just slice the true_validity to avoid creating a new bitmap, it can also save a lot of memory(all bitmaps is refer to the true_validity), besides, we can introduce a fast path in take.rs and take_chunks.rs to avoid iterating column to generate a new bitmap when a bitmap is all true or all false.

  • Closes #issue

This change is Reviewable

@vercel
Copy link

vercel bot commented Oct 11, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
databend ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 12, 2023 2:07am

@github-actions github-actions bot added the pr-chore this PR only has small changes that no need to record, like coding styles. label Oct 11, 2023
@Dousir9 Dousir9 added the ci-benchmark Benchmark: run all test label Oct 11, 2023
@github-actions
Copy link
Contributor

Docker Image for PR

  • tag: pr-13213-1ee5742

note: this image tag is only available for internal use,
please check the internal doc for more details.

@github-actions
Copy link
Contributor

@BohuTANG
Copy link
Member

Dev / linux / sqllogic_cluster_tpch:

[Diff] (-expected|+actual)
-   HashJoin: INNER (rows: 25224.93)
+   HashJoin: INNER (rows: 25247.99)
    ├── Build
    │   └── Scan: default.tpch.customer (read rows: 15000)
    └── Probe
-       └── HashJoin: INNER (rows: 25224.93)
+       └── HashJoin: INNER (rows: 25247.99)
            ├── Build
            │   └── Scan: default.tpch.orders (read rows: 150000)
            └── Probe
                └── HashJoin: INNER (rows: 120114.40)
                    ├── Build
                    │   └── HashJoin: INNER (rows: 200.00)
                    │       ├── Build
                    │       │   └── HashJoin: INNER (rows: 5.00)
                    │       │       ├── Build
                    │       │       │   └── Scan: default.tpch.region (read rows: 5)
                    │       │       └── Probe
                    │       │           └── Scan: default.tpch.nation (read rows: 25)
                    │       └── Probe
                    │           └── Scan: default.tpch.supplier (read rows: 1000)
                    └── Probe
                        └── Scan: default.tpch.lineitem (read rows: 600572)
at tests/sqllogictests/suites/tpch/queries.test:1325

@BohuTANG
Copy link
Member

@Dousir9
Copy link
Member Author

Dousir9 commented Oct 12, 2023

There is a same issue here in another PR: https://github.com/datafuselabs/databend/actions/runs/6483645024/job/17606440850?pr=13213#step:4:165

This issue has been addressed by #13212.

@BohuTANG BohuTANG merged commit 598a73a into datafuselabs:main Oct 12, 2023
63 checks passed
andylokandy pushed a commit to andylokandy/databend that referenced this pull request Nov 27, 2023
…lock take (datafuselabs#13213)

* add set_true_validity

* fast path for take

---------

Co-authored-by: sundyli <543950155@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-benchmark Benchmark: run all test pr-chore this PR only has small changes that no need to record, like coding styles.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants