Add bloom filter for single column equivalent expression #3887

Merged (7 commits) on Jan 21, 2022

Conversation

junli1026
Contributor

@junli1026 junli1026 commented Jan 18, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Add a bloom filter for single-column equality expressions such as "name = 'batman'".
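
As a rough illustration of the idea (not the code in this PR), a bloom filter built over one column's values can answer an equality predicate with either "definitely absent" or "possibly present". The sketch below is a minimal version under stated assumptions: the names `BloomFilter`, `might_contain`, and `can_skip_block` are hypothetical, values are plain strings, and the standard library's `DefaultHasher` stands in for whatever hash function the real index uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical fixed-size bloom filter over the string values of one column.
pub struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    pub fn new(num_bits: usize, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        Self {
            bits: vec![false; num_bits],
            num_hashes,
        }
    }

    /// Assumption: DefaultHasher is only a stand-in hash. The seed is fed
    /// in first so each seed behaves like a different hash function.
    fn hash_with_seed(value: &str, seed: u64) -> u64 {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        value.hash(&mut hasher);
        hasher.finish()
    }

    /// Bit positions for a value, one per seed in 0..num_hashes.
    fn indexes(&self, value: &str) -> Vec<usize> {
        let len = self.bits.len() as u64;
        (0..self.num_hashes)
            .map(|seed| (Self::hash_with_seed(value, seed) % len) as usize)
            .collect()
    }

    /// Record a column value by setting all of its bit positions.
    pub fn insert(&mut self, value: &str) {
        for i in self.indexes(value) {
            self.bits[i] = true;
        }
    }

    /// false: the value was never inserted. true: it may have been.
    pub fn might_contain(&self, value: &str) -> bool {
        self.indexes(value).into_iter().all(|i| self.bits[i])
    }
}

/// For a predicate `column = literal`, a block can be skipped only when
/// the filter proves the literal is absent from that block.
pub fn can_skip_block(filter: &BloomFilter, literal: &str) -> bool {
    !filter.might_contain(literal)
}
```

For `name = 'batman'`, a filter built from a block's `name` column would be queried with `can_skip_block(&filter, "batman")`. False positives only cost an unnecessary block read; a false negative cannot occur, so skipping never drops matching rows.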

Changelog

  • New Feature

Related Issues

Related #3832

Test Plan

Unit Tests

Stateless Tests

@databend-bot added the pr-feature label (this PR introduces a new feature to the codebase) on Jan 18, 2022
@databend-bot
Member

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@vercel

vercel bot commented Jan 18, 2022

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/databend/databend/H7j4v1R6uftW3pak8Rsdvgn5yDQ2
✅ Preview: https://databend-git-fork-junli1026-jun-dev-databend.vercel.app

@junli1026
Contributor Author

junli1026 commented Jan 18, 2022

It seems the fasthash library failed the aarch64 build:

flier/rust-fasthash#13

@PsiACE
Member

PsiACE commented Jan 18, 2022

Perhaps we could implement the bloom filter with another hashing algorithm, such as seahash. Or later I can add a 64-bit CityHash v102 implementation to naive-cityhash.

@PsiACE
Member

PsiACE commented Jan 18, 2022

I have just released naive-cityhash v0.2.0, supporting cityhash64, cityhash64_with_seed and cityhash64_with_seeds. The only problem is that City64Hasher and City128Hasher have not been implemented yet :(

@junli1026
Contributor Author

> I have just released naive-cityhash v0.2.0, supporting cityhash64, cityhash64_with_seed and cityhash64_with_seeds. The only problem is that City64Hasher and City128Hasher have not been implemented yet :(

That is fine, we can use SeaHash for now. Thanks for following up.
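
One way to keep the hash algorithm swappable, as discussed above, is to have the filter depend only on a small "hash with seed" interface. The sketch below is an assumption-laden illustration, not the PR's design: the trait `SeededHash64` and the function `bit_positions` are hypothetical names, and a hand-written FNV-1a variant (seed XORed into the offset basis) stands in for SeaHash or CityHash so the example needs no external crates.

```rust
/// Hypothetical abstraction: any 64-bit hash function that accepts a seed.
pub trait SeededHash64 {
    fn hash_with_seed(&self, data: &[u8], seed: u64) -> u64;
}

/// Stand-in implementation: 64-bit FNV-1a. Folding the seed into the
/// offset basis is an assumption made for this sketch only.
pub struct Fnv1a;

impl SeededHash64 for Fnv1a {
    fn hash_with_seed(&self, data: &[u8], seed: u64) -> u64 {
        const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
        const PRIME: u64 = 0x0000_0100_0000_01b3;
        let mut hash = OFFSET_BASIS ^ seed;
        for byte in data {
            hash ^= u64::from(*byte);
            hash = hash.wrapping_mul(PRIME);
        }
        hash
    }
}

/// Derive `num_hashes` bit positions in `0..num_bits` for one value,
/// using seeds 0, 1, 2, ... with whichever hash is plugged in.
pub fn bit_positions<H: SeededHash64>(
    hasher: &H,
    data: &[u8],
    num_hashes: u64,
    num_bits: u64,
) -> Vec<u64> {
    assert!(num_bits > 0);
    (0..num_hashes)
        .map(|seed| hasher.hash_with_seed(data, seed) % num_bits)
        .collect()
}
```

With this shape, moving from one algorithm to another means adding a new `SeededHash64` implementation. Filters already written with the old hash would still need to be rebuilt, since bit positions depend on the hash.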

@codecov-commenter

Codecov Report

Merging #3887 (ab86e7c) into main (a479e82) will increase coverage by 0%.
The diff coverage is 83%.

@@          Coverage Diff           @@
##            main   #3887    +/-   ##
======================================
  Coverage     57%     57%            
======================================
  Files        770     774     +4     
  Lines      41103   41430   +327     
======================================
+ Hits       23728   23950   +222     
- Misses     17375   17480   +105     
| Impacted Files | Coverage Δ |
|---|---|
| common/datavalues/src/types/data_type.rs | 73% <ø> (ø) |
| query/src/storages/index/mod.rs | 50% <ø> (+50%) ⬆️ |
| common/datavalues/src/data_hasher.rs | 71% <66%> (+<1%) ⬆️ |
| query/src/storages/index/bloom_filter.rs | 84% <84%> (ø) |
| common/ast/src/parser/ast/mod.rs | 21% <0%> (-3%) ⬇️ |
| common/streams/src/stream.rs | 66% <0%> (-3%) ⬇️ |
| common/management/src/user/user_mgr.rs | 95% <0%> (-3%) ⬇️ |
| common/ast/src/parser/expr/expr_visitor.rs | 58% <0%> (-2%) ⬇️ |
| common/meta/sled-store/src/sled_tree.rs | 86% <0%> (-2%) ⬇️ |
| common/ast/src/parser/ast/query.rs | 47% <0%> (-2%) ⬇️ |
| ... and 20 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@sundy-li
Member

Docs about the cityhash in ClickHouse:

https://go-faster.org/docs/clickhouse/hash

@junli1026 changed the title from "Add bloom filter for datablock" to "Add bloom filter for single column equivalent expression" on Jan 21, 2022
@junli1026 junli1026 marked this pull request as ready for review January 21, 2022 03:33
Comment on lines +25 to 28
/// We should have our custom none-state hash functions
/// Tracked work item: https://github.com/datafuselabs/databend/issues/3897
#[derive(Clone)]
pub enum DFHasher {
Member

👍

Labels
need-review, pr-feature (this PR introduces a new feature to the codebase)
6 participants