
chore: add bloom filter class #2791

Merged: 2 commits, Mar 29, 2024


@romange (Collaborator) commented on Mar 29, 2024:

Based on the https://github.com/jvirkki/libbloom implementation.

Unlike the original, our implementation uses the XXH3 hash function to seed bit index generation. In addition, it assumes the mi_malloc interface for dynamic allocation.
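A minimal, self-contained sketch of the idea described above, under several assumptions that are not taken from the PR: the bit count is rounded up to a power of two, the k bit indices are derived from one item by Kirsch-Mitzenmacher double hashing (`h1 + i * h2`), a plain FNV-1a 64-bit hash stands in for XXH3 so the example has no dependencies, and `std::vector<uint8_t>` stands in for mimalloc-allocated memory. `MiniBloom`, `Fnv1a64` and all method names are hypothetical, not Dragonfly's API.

```cpp
#include <cmath>
#include <cstdint>
#include <string_view>
#include <vector>

// FNV-1a 64-bit hash: a dependency-free stand-in for XXH3 (assumption).
inline uint64_t Fnv1a64(std::string_view s, uint64_t seed = 14695981039346656037ULL) {
  uint64_t h = seed;
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Hypothetical minimal Bloom filter with a power-of-two bit array.
class MiniBloom {
 public:
  // Sizes the filter for `capacity` items at false-positive rate `fp_prob`.
  MiniBloom(uint64_t capacity, double fp_prob) {
    const double ln2 = std::log(2.0);
    const double bits_per_item = -std::log(fp_prob) / (ln2 * ln2);
    num_hashes_ = static_cast<unsigned>(std::ceil(ln2 * bits_per_item));
    const uint64_t min_bits = static_cast<uint64_t>(std::ceil(capacity * bits_per_item));
    bit_log_ = 6;  // at least 64 bits
    while ((1ULL << bit_log_) < min_bits) ++bit_log_;
    bits_.assign((1ULL << bit_log_) / 8, uint8_t{0});
  }

  // Returns true if at least one bit changed, i.e. the item was definitely new.
  bool Add(std::string_view item) {
    bool changed = false;
    ForEachIndex(item, [&](uint64_t idx) {
      uint8_t& byte = bits_[idx / 8];
      const uint8_t before = byte;
      byte |= uint8_t(1u << (idx % 8));
      changed |= (byte != before);
    });
    return changed;
  }

  // false means "definitely absent"; true means "probably present".
  bool Exists(std::string_view item) const {
    bool all_set = true;
    ForEachIndex(item, [&](uint64_t idx) {
      if ((bits_[idx / 8] & (1u << (idx % 8))) == 0) all_set = false;
    });
    return all_set;
  }

  unsigned num_hashes() const { return num_hashes_; }
  uint64_t num_bits() const { return 1ULL << bit_log_; }

 private:
  // Calls f(index) for each of the k probe positions of `item`.
  template <typename F>
  void ForEachIndex(std::string_view item, F&& f) const {
    const uint64_t mask = (1ULL << bit_log_) - 1;  // bit count is a power of two
    const uint64_t h1 = Fnv1a64(item);
    const uint64_t h2 = Fnv1a64(item, h1) | 1;  // odd step
    for (unsigned i = 0; i < num_hashes_; ++i) f((h1 + i * h2) & mask);
  }

  unsigned num_hashes_ = 0;
  unsigned bit_log_ = 0;
  std::vector<uint8_t> bits_;
};
```

Sizing uses the standard formulas m/n = -ln p / (ln 2)^2 and k = ceil((m/n) ln 2); for 1000 items at p = 0.01 that gives k = 7 and 9586 bits, rounded up to 16384. The power-of-two size lets indices be reduced with a mask instead of a modulo, at the cost of up to about 2x memory, and the odd second hash keeps the k probes distinct within the table.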

@romange requested a review from @dranikpg on March 29, 2024 13:00

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
@dranikpg (Contributor) previously approved these changes on Mar 29, 2024, and left a comment:

LGTM, but did you consider `vector<bool>`? It would save you half (most) of the code 🙂

`src/core/bloom.h` (outdated, resolved)
Comment on lines +52 to +54

```cpp
bf_ = (uint8_t*)mi_heap_calloc(heap, length, 1);
```

@dranikpg (Contributor):

Just a thought: a utility that returns a `unique_ptr` allocated on the mi-heap with a custom deleter would be helpful. We currently do this for some pointers in the connection code.

@romange (Collaborator, author):

Maybe, but then we would increase the class size.
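A sketch of the suggested utility, assuming a stateless deleter. With mimalloc one would presumably allocate with `mi_heap_calloc(heap, length, 1)` and release with `mi_free(p)`, which does not need the heap; here `std::calloc`/`std::free` stand in so the example compiles without mimalloc. `FreeDeleter`, `ByteBuffer`, `CallocBytes` and `HeapAwareDeleter` are hypothetical names. The class-size concern depends on the deleter: an empty deleter typically adds no storage, while one that must remember the heap does.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Stateless deleter: releases memory from a calloc-style allocator.
// With mimalloc this would call mi_free(p) instead of std::free(p) (assumption).
struct FreeDeleter {
  void operator()(uint8_t* p) const noexcept { std::free(p); }
};

using ByteBuffer = std::unique_ptr<uint8_t[], FreeDeleter>;

// Allocates `length` zeroed bytes; stands in for mi_heap_calloc(heap, length, 1).
inline ByteBuffer CallocBytes(std::size_t length) {
  return ByteBuffer(static_cast<uint8_t*>(std::calloc(length, 1)));
}

// A deleter that has to remember the heap carries state, so it occupies space
// inside every unique_ptr that uses it.
struct HeapAwareDeleter {
  void* heap;  // hypothetical: e.g. a heap handle kept for the deallocation call
  void operator()(uint8_t* p) const noexcept { std::free(p); }
};
```

On mainstream standard libraries (libstdc++, libc++, MSVC) the empty-base optimization makes `sizeof(ByteBuffer)` equal to a raw pointer, whereas a `unique_ptr` with `HeapAwareDeleter` is larger; this is not guaranteed by the standard.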

`src/core/bloom.h` (outdated)
Comment on lines 33 to 36

```cpp
uint64_t GetMask() const {
  return (1ULL << bit_log_) - 1;
}
```
@dranikpg (Contributor):

It's only called to pass its result into `BitIndex()`.

@romange (Collaborator, author):

Yes. Do you want me to move it into the .cc file?
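For context on what the mask does: because the filter holds 2^`bit_log_` bits, `hash & GetMask()` equals `hash % (1 << bit_log_)` for unsigned values, but avoids a division. The sketch below assumes a double-hashing index function; the free-function signatures `GetMask(bit_log)` and `BitIndex(h1, h2, i, mask)` are hypothetical, since the PR's actual `BitIndex()` is not shown in this thread.

```cpp
#include <cstdint>

// Mask selecting the low `bit_log` bits; valid for bit_log < 64.
constexpr uint64_t GetMask(unsigned bit_log) {
  return (uint64_t{1} << bit_log) - 1;
}

// Hypothetical index function: the i-th probe of a double-hashing scheme,
// reduced into [0, 2^bit_log) with a mask rather than a modulo.
constexpr uint64_t BitIndex(uint64_t h1, uint64_t h2, unsigned i, uint64_t mask) {
  return (h1 + i * h2) & mask;
}

static_assert(GetMask(3) == 0b111);
static_assert(GetMask(10) == 1023);
// For a power-of-two modulus m, `x & (m - 1)` equals `x % m` for unsigned x.
static_assert((12345u & GetMask(8)) == 12345u % 256u);
static_assert(BitIndex(5, 7, 2, GetMask(4)) == (5 + 2 * 7) % 16);
```

Since the mask is a pure function of `bit_log_`, computing it at the single call site in the .cc file would work equally well.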

Comment on lines +89 to +103
```cpp
inline bool Bloom::IsSet(size_t bit_idx) const {
  uint64_t byte_idx = bit_idx / 8;
  bit_idx %= 8;  // index within the byte
  uint8_t b = bf_[byte_idx];
  return (b & (1 << bit_idx)) != 0;
}

// Sets the bit and returns true if it was previously clear.
inline bool Bloom::Set(size_t bit_idx) {
  uint64_t byte_idx = bit_idx / 8;
  bit_idx %= 8;  // index within the byte

  uint8_t b = bf_[byte_idx];
  bf_[byte_idx] |= (1 << bit_idx);
  return bf_[byte_idx] != b;
}
```
@dranikpg (Contributor), Mar 29, 2024:

You actually don't need all of this if you just use `vector<bool>` 🤔 It has proxy references that let you access a specific bit in its internal storage.

@romange (Collaborator, author), Mar 29, 2024:

You are right, but then I would need to pass a mimalloc memory resource, etc. It's not a lot of code, and I won't object to reducing it in the future if memory usage with `vector<bool>` doesn't increase.
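For comparison, a sketch of the `vector<bool>` alternative, assuming C++17 `<memory_resource>`. Supplying a custom allocator is what `std::pmr::vector<bool>` plus a `std::pmr::memory_resource*` provides; a mimalloc-backed resource would be a subclass of `std::pmr::memory_resource` (not shown), and `std::pmr::new_delete_resource()` stands in for it here. `PmrBits` is a hypothetical name.

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Bit storage backed by std::pmr::vector<bool>; the memory resource decides
// where the packed words live (e.g. a mimalloc-backed resource, assumed).
class PmrBits {
 public:
  PmrBits(std::size_t num_bits, std::pmr::memory_resource* mr)
      : bits_(num_bits, false, std::pmr::polymorphic_allocator<bool>(mr)) {}

  bool IsSet(std::size_t bit_idx) const { return bits_[bit_idx]; }

  // Returns true if the bit was previously clear, mirroring Bloom::Set().
  bool Set(std::size_t bit_idx) {
    auto ref = bits_[bit_idx];  // proxy reference to a single bit
    const bool was_set = ref;
    ref = true;
    return !was_set;
  }

 private:
  std::pmr::vector<bool> bits_;
};
```

The proxy removes the manual byte/bit arithmetic, but the container object itself typically spans several machine words (begin, end, capacity and the allocator pointer) rather than one raw pointer, which bears on the memory and class-size trade-off discussed in this thread.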

@romange merged commit 64abe26 into main on Mar 29, 2024
10 checks passed
@romange deleted the Bloom branch on March 29, 2024 15:45
szinn pushed a commit to szinn/k8s-homelab that referenced this pull request Apr 3, 2024
…nfly ( v1.15.1 → v1.16.0 ) (#3354)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [docker.dragonflydb.io/dragonflydb/dragonfly](https://togithub.com/dragonflydb/dragonfly) | minor | `v1.15.1` -> `v1.16.0` |

---

### Release Notes

<details>
<summary>dragonflydb/dragonfly
(docker.dragonflydb.io/dragonflydb/dragonfly)</summary>

### [`v1.16.0`](https://togithub.com/dragonflydb/dragonfly/releases/tag/v1.16.0)

[Compare Source](https://togithub.com/dragonflydb/dragonfly/compare/v1.15.1...v1.16.0)

##### Dragonfly v1.16.0

Our spring release. We are getting closer to 2.0 with some very exciting
features ahead. Stay tuned!

Some prominent changes include:

- Improved memory accounting of client connections ([#2710](https://togithub.com/dragonflydb/dragonfly/issues/2710), [#2755](https://togithub.com/dragonflydb/dragonfly/issues/2755) and [#2692](https://togithub.com/dragonflydb/dragonfly/issues/2692))
- FT.AGGREGATE call ([#2413](https://togithub.com/dragonflydb/dragonfly/issues/2413))
- Properly handle and replicate Memcache flags ([#2787](https://togithub.com/dragonflydb/dragonfly/issues/2787), [#2807](https://togithub.com/dragonflydb/dragonfly/issues/2807))
- Introduce BF.(M)ADD and BF.(M)EXISTS commands ([#2801](https://togithub.com/dragonflydb/dragonfly/issues/2801)). Note that they do not work with snapshots and replication yet.
- Dragonfly builds natively on macOS. We would love some help with extending the release pipeline to create a proper macOS binary.
- Following requests from the Edge developer community, we added basic HTTP API support! Try running Dragonfly with the `--expose_http_api` flag and then call `curl -X POST -d '["ping"]' localhost:6379/api`. We will follow up with more extensive docs later this month.
- Lots of stability fixes, especially around Sidekiq and BullMQ workloads.

##### What's Changed

- chore: make usan asan optional and enable them on CI by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2631
- fix: missing manual trigger for daily sanitizers by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2682
- bug(tiering): fix overflow in page offset calculation and wrong hash offset calculation by [@theyueli](https://togithub.com/theyueli) in dragonflydb/dragonfly#2683
- Chore: Fixed Docker Health Check by [@manojks1999](https://togithub.com/manojks1999) in dragonflydb/dragonfly#2659
- chore: Increase disk space in the coverage runs by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2684
- fix(flushall): Decommit memory after releasing tables. by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2691
- feat(server): Account for serializer's temporary buffer size by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2689
- refactor: remove FULL-SYNC-CUT cmd [#2687](https://togithub.com/dragonflydb/dragonfly/issues/2687) by [@BorysTheDev](https://togithub.com/BorysTheDev) in dragonflydb/dragonfly#2688
- chore: add malloc-based stats and decommit by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2692
- feat(cluster): automatic slot migration finalization [#2697](https://togithub.com/dragonflydb/dragonfly/issues/2697) by [@BorysTheDev](https://togithub.com/BorysTheDev) in dragonflydb/dragonfly#2698
- Basic FT.AGGREGATE by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2413
- chore: little transaction cleanup by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2608
- fix(channel store): add acquire/release pair in fast update path by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2704
- chore: add ubuntu22 devcontainer by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2700
- feat(cluster): Add `--cluster_id` flag by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2695
- feat(server): Use mimalloc in SSL calls by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2710
- chore: Pull helio with new BlockingCounter by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2711
- chore(transaction): Simplify PollExecution by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2712
- chore(transaction): Don't call GetLocalMask from blocking controller by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2715
- chore: improve compatibility of EXPIRE functions with Redis by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2696
- chore: disable flaky fuzzy migration test by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2716
- chore: update sanitizers workflow by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2686
- chore: Use c-ares for resolving hosts in `ProtocolClient` by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2719
- Remove check-fail in ExpireIfNeeded and introduce DFLY LOAD by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2699
- chore: Record cmd stat from invoke by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2720
- fix(transaction): nullptr access on non transactional commands by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2724
- fix(BgSave): async from sync by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2702
- chore: remove core/fibers by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2723
- fix(transaction): Replace with armed sync point by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2708
- feat(json): Deserialize ReJSON format by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2725
- feat: add flag masteruser by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2693
- refactor: block list refactoring [#2580](https://togithub.com/dragonflydb/dragonfly/issues/2580) by [@BorysTheDev](https://togithub.com/BorysTheDev) in dragonflydb/dragonfly#2732
- chore: fix DeduceExecMode by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2733
- fix(cluster): Reply with correct `\n` / `\r\n` from `CLUSTER` sub cmd by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2731
- chore: Introduce fiber stack allocator by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2730
- fix(cluster): Save replica ID per replica by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2735
- fix(ssl): Proper cleanup by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2742
- chore: add skeleton files for flat_dfs code by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2738
- chore: better error reporting when connecting to tls with plain socket by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2740
- chore: Support json paths without root selector by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2747
- chore: journal cleanup by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2749
- refactor: remove start-slot-migration cmd [#2727](https://togithub.com/dragonflydb/dragonfly/issues/2727) by [@BorysTheDev](https://togithub.com/BorysTheDev) in dragonflydb/dragonfly#2728
- feat(server): Add TLS usage to /metrics and `INFO MEMORY` by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2755
- chore: preparations for adding flat json support by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2752
- chore(transaction): Introduce RunCallback by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2760
- feat(replication): Do not auto replicate different master by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2753
- Improve Helm chart to be rendered locally and on machines where is not the application target by [@fafg](https://togithub.com/fafg) in dragonflydb/dragonfly#2706
- chore: preparation for basic http api by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2764
- feat(server): Add metric for RDB save duration. by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2768
- chore: fix flat_dfs read tests. by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2772
- fix(ci): do not overwrite last_log_file among tests by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2759
- feat(server): support cluster replication by [@adiholden](https://togithub.com/adiholden) in dragonflydb/dragonfly#2748
- fix: fiber preempts on read path and OnCbFinish() clears fetched_items\_ by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2763
- chore(ci): open last_log_file in append mode by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2776
- doc(README): fix outdated expiry ranges description by [@enjoy-binbin](https://togithub.com/enjoy-binbin) in dragonflydb/dragonfly#2779
- Benchmark runner by [@adiholden](https://togithub.com/adiholden) in dragonflydb/dragonfly#2780
- chore(replication-tests): add cache_mode on test replication all by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2685
- feat(tiering): DiskStorage by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2770
- chore: add a boilerplate for bloom filter family by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2782
- chore: introduce conversion routines between JsonType and FlatJson by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2785
- chore: Fix memcached flags not updated by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2787
- chore: remove duplicate code from dash and simplify by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2765
- chore: disable test_cluster_slot_migration by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2788
- fix: new\[] delete\[] missmatch in disk_storage by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2792
- fix: sanitizers clang build and clean up some warnings by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2793
- chore: add bloom filter class by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2791
- chore: add SBF data structure by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2795
- chore(tiering): Disable compilation for MacOs by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2799
- chore: fix daily build by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2798
- chore: expose SBF via compact_object by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2797
- fix(ci): malloc trim on sanitizers workflow by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2794
- fix(cluster): Don't miss updates in FLUSHSLOTS by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2783
- feat: add bf.(m)add and bf.(m)exists commands by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2801
- fix: SBF memory leaks by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2803
- chore: refactor StringFamily::Set to use CmdArgParser by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2800
- fix: propagate memcached flags to replica by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2807
- DFLYMIGRATE ACK refactoring by [@BorysTheDev](https://togithub.com/BorysTheDev) in dragonflydb/dragonfly#2790
- feat: add master lsn and journal_executed dcheck in replica via ping by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2778
- fix: correct json response for errors by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2813
- chore: bloom test - cover corner cases by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2806
- bug(server): do not write lsn opcode to journal by [@adiholden](https://togithub.com/adiholden) in dragonflydb/dragonfly#2814
- chore: Fix build by disabling the tests. by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2821
- fix(replication): replication with multi shard sync enabled lagging by [@adiholden](https://togithub.com/adiholden) in dragonflydb/dragonfly#2823
- fix: io_uring/fibers bug in DnsResolve by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2825

##### New Contributors

- [@manojks1999](https://togithub.com/manojks1999) made their first contribution in dragonflydb/dragonfly#2659
- [@fafg](https://togithub.com/fafg) made their first contribution in dragonflydb/dragonfly#2706
- [@enjoy-binbin](https://togithub.com/enjoy-binbin) made their first contribution in dragonflydb/dragonfly#2779

##### Huge thanks to all the contributors! ❤️

🇮🇱  🇺🇦

**Full Changelog**:
dragonflydb/dragonfly@v1.15.0...v1.16.0

</details>


Co-authored-by: repo-jeeves[bot] <106431701+repo-jeeves[bot]@users.noreply.github.com>