chore: add SBF data structure #2795

Merged
romange merged 2 commits into main from Bloom on Mar 29, 2024

@romange (Collaborator) commented Mar 29, 2024

Based on https://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf

The data structure is a growing list of Bloom filters, where each subsequent filter has exponentially larger capacity and an exponentially tighter error bound.

Exist() goes over all the filters, and it is enough that at least one of them returns a positive result. For Add(), we ensure that none of the existing filters already contains the element and that the last filter, the one currently being filled, does not exceed its designated maximum capacity.
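
The description above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `SimpleBloom`, `ScalableBloom`, the FNV-1a hashing, the growth factor of 2 and the tightening ratio of 0.5 are all assumptions chosen for illustration. It shows the two rules: `Exists()` succeeds if any filter reports the item, and `Add()` first checks every filter, then appends a larger, tighter filter once the last one has reached its capacity.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical fixed-capacity Bloom filter (not Dragonfly's Bloom class).
class SimpleBloom {
 public:
  SimpleBloom(std::size_t capacity, double fp_prob) : capacity_(capacity) {
    assert(capacity > 0 && fp_prob > 0.0 && fp_prob < 1.0);
    // Textbook sizing: m = -n * ln(p) / ln(2)^2 bits, k = (m / n) * ln(2) probes.
    const double ln2 = std::log(2.0);
    const double m = std::ceil(-static_cast<double>(capacity) * std::log(fp_prob) / (ln2 * ln2));
    bits_.assign(std::max<std::size_t>(64, static_cast<std::size_t>(m)), false);
    const double k = static_cast<double>(bits_.size()) / static_cast<double>(capacity) * ln2;
    num_probes_ = std::max(1u, static_cast<unsigned>(std::lround(k)));
  }

  bool Exists(const std::uint64_t fp[2]) const {
    for (unsigned i = 0; i < num_probes_; ++i)
      if (!bits_[Index(fp, i)]) return false;
    return true;
  }

  // Sets all probe bits; counts the item only if at least one bit was new.
  bool Add(const std::uint64_t fp[2]) {
    bool changed = false;
    for (unsigned i = 0; i < num_probes_; ++i) {
      const std::size_t idx = Index(fp, i);
      if (!bits_[idx]) {
        bits_[idx] = true;
        changed = true;
      }
    }
    if (changed) ++size_;
    return changed;
  }

  bool Full() const { return size_ >= capacity_; }

 private:
  // Double hashing: the i-th probe is fp[0] + i * fp[1] (mod number of bits).
  std::size_t Index(const std::uint64_t fp[2], unsigned i) const {
    return static_cast<std::size_t>((fp[0] + i * fp[1]) % bits_.size());
  }

  std::vector<bool> bits_;
  std::size_t capacity_;
  std::size_t size_ = 0;
  unsigned num_probes_ = 1;
};

// Hypothetical scalable Bloom filter: a list of filters, smallest first.
class ScalableBloom {
 public:
  ScalableBloom(std::size_t initial_capacity, double fp_prob)
      : next_capacity_(initial_capacity), next_fp_prob_(fp_prob) {
    Grow();
  }

  bool Exists(std::string_view item) const {
    std::uint64_t fp[2];
    Hash(item, fp);  // hashed once, reused for every filter
    return ExistsFp(fp);
  }

  // Returns false if the item (or a colliding one) is already present.
  bool Add(std::string_view item) {
    std::uint64_t fp[2];
    Hash(item, fp);
    if (ExistsFp(fp)) return false;
    if (filters_.back().Full()) Grow();
    filters_.back().Add(fp);
    return true;
  }

  std::size_t NumFilters() const { return filters_.size(); }

 private:
  static std::uint64_t Fnv1a(std::string_view s, std::uint64_t seed) {
    std::uint64_t h = 14695981039346656037ULL ^ seed;
    for (unsigned char c : s) {
      h ^= c;
      h *= 1099511628211ULL;
    }
    return h;
  }

  static void Hash(std::string_view s, std::uint64_t fp[2]) {
    fp[0] = Fnv1a(s, 0);
    fp[1] = Fnv1a(s, 0x9E3779B97F4A7C15ULL) | 1;  // odd stride, never zero
  }

  bool ExistsFp(const std::uint64_t fp[2]) const {
    for (const SimpleBloom& f : filters_)
      if (f.Exists(fp)) return true;
    return false;
  }

  // Assumed growth policy: double the capacity, halve the error bound.
  void Grow() {
    filters_.emplace_back(next_capacity_, next_fp_prob_);
    next_capacity_ *= 2;
    next_fp_prob_ *= 0.5;
  }

  std::vector<SimpleBloom> filters_;
  std::size_t next_capacity_;
  double next_fp_prob_;
};
```

With a tightening ratio of 0.5, the per-filter error bounds form a geometric series, so (assuming each filter meets its nominal bound) the combined false-positive probability stays below twice the initial `fp_prob` no matter how many filters are appended.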

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
@romange requested a review from dranikpg on March 29, 2024 18:53
dranikpg previously approved these changes on Mar 29, 2024
src/core/bloom.cc (resolved)
src/core/bloom.h (outdated)
Comment on lines 23 to 28
/// @brief Destroy must be called before calling the d'tor
~Bloom();

/**
* @brief Construct a new Bloom object
* @brief Initializes a new Bloom object
*
* @param entries - entries are silently rounded up to the minimum capacity.
* @param error must be in (0, 1) range.
* @param fp_prob - False-positive probability of collision. Must be in (0, 1) range.
* @param heap
*/
Bloom(uint32_t entries, double error, mi_heap_t* heap);
~Bloom();
Contributor

why do we have so many different comment styles 😢

Collaborator Author

Where? I use the same doxygen for multiline and /// for single line

Contributor

But we use just // everywhere


private:
// multiple filters from the smallest to the largest.
std::vector<Bloom, PMR_NS::polymorphic_allocator<Bloom>> filters_;
Contributor

nit: there is a typedef as PMR_NS::vector

Collaborator Author

No, there is not. We must compile for older versions of C++ that do not have the pmr::vector alias.
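
For readers unfamiliar with the two spellings, here is a small sketch of the point under discussion. It assumes a C++17 standard library that ships `<memory_resource>`; `PmrVector` and `MakeSquares` are hypothetical names, and `std::pmr` stands in for whatever `PMR_NS` expands to in the project. Writing the allocator out explicitly works whether or not a `pmr::vector` alias is provided.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <type_traits>
#include <vector>

// Hypothetical alias: spelled out in full, so it does not rely on a
// ready-made pmr::vector alias being available.
template <typename T>
using PmrVector = std::vector<T, std::pmr::polymorphic_allocator<T>>;

// Where the standard alias does exist, both spellings name the same type.
static_assert(std::is_same_v<PmrVector<int>, std::pmr::vector<int>>);

// Builds a vector whose storage comes from the given memory resource.
inline PmrVector<int> MakeSquares(int n, std::pmr::memory_resource* mr) {
  assert(n >= 0 && mr != nullptr);
  PmrVector<int> out(mr);
  for (int i = 0; i < n; ++i) out.push_back(i * i);
  return out;
}
```

Calling `MakeSquares(4, &pool)` with a `std::pmr::monotonic_buffer_resource pool` yields a vector whose allocator reports `&pool` as its resource.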

src/core/bloom.cc (outdated, resolved)
src/core/bloom.h (outdated)
/**
* @brief Adds an item to the bloom filter.
* @param str -
* @return true if element was not present and was added,
* @return false - if element (or a collision) had already been added previously.
*/
bool Add(std::string_view str);
bool Add(uint64_t fp[2]);
Contributor

so now we need it only for tests 🤔 A TestHash: string -> fp[2] would solve all of it

Collaborator Author (@romange, Mar 29, 2024)

No, we do not use it for tests. We use it inside SBF. We compute hash once for all the filters

Contributor

I mean the string version

Collaborator Author

It's true that we won't use Bloom directly in prod code, but it still seems weird to omit this API.

Signed-off-by: Roman Gershman <roman@dragonflydb.io>
@romange merged commit 5d998d0 into main on Mar 29, 2024
10 checks passed
@romange deleted the Bloom branch on March 29, 2024 23:22
szinn pushed a commit to szinn/k8s-homelab that referenced this pull request Apr 3, 2024
…nfly ( v1.15.1 → v1.16.0 ) (#3354)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [docker.dragonflydb.io/dragonflydb/dragonfly](https://togithub.com/dragonflydb/dragonfly) | minor | `v1.15.1` -> `v1.16.0` |

---

### Release Notes

<details>
<summary>dragonflydb/dragonfly (docker.dragonflydb.io/dragonflydb/dragonfly)</summary>

### [`v1.16.0`](https://togithub.com/dragonflydb/dragonfly/releases/tag/v1.16.0)

[Compare Source](https://togithub.com/dragonflydb/dragonfly/compare/v1.15.1...v1.16.0)

##### Dragonfly v1.16.0

Our spring release. We are getting closer to 2.0 with some very exciting
features ahead. Stay tuned!

Some prominent changes include:

- Improved memory accounting of client connections ([#2710](https://togithub.com/dragonflydb/dragonfly/issues/2710), [#2755](https://togithub.com/dragonflydb/dragonfly/issues/2755) and [#2692](https://togithub.com/dragonflydb/dragonfly/issues/2692))
- FT.AGGREGATE call ([#2413](https://togithub.com/dragonflydb/dragonfly/issues/2413))
- Properly handle and replicate Memcache flags ([#2787](https://togithub.com/dragonflydb/dragonfly/issues/2787), [#2807](https://togithub.com/dragonflydb/dragonfly/issues/2807))
- Introduce BF.AGGREGATE, BF.(M)ADD and BF.(M)EXISTS methods ([#2801](https://togithub.com/dragonflydb/dragonfly/issues/2801)). Note that this does not work with snapshots and replication yet.
- Dragonfly builds natively on macOS. We would love some help with extending the release pipeline to create a proper macOS binary.
- Following the requests from the Edge developers community, we added basic HTTP API support! Try running Dragonfly with the `--expose_http_api` flag and then call `curl -X POST -d '["ping"]' localhost:6379/api`. We will follow up with more extensive docs later this month.
- Lots of stability fixes, especially around Sidekiq and BullMQ workloads.

##### What's Changed

- chore: make usan asan optional and enable them on CI by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2631
- fix: missing manual trigger for daily sanitizers by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2682
- bug(tiering): fix overflow in page offset calculation and wrong hash offset calculation by [@theyueli](https://togithub.com/theyueli) in dragonflydb/dragonfly#2683
- Chore: Fixed Docker Health Check by [@manojks1999](https://togithub.com/manojks1999) in dragonflydb/dragonfly#2659
- chore: Increase disk space in the coverage runs by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2684
- fix(flushall): Decommit memory after releasing tables. by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2691
- feat(server): Account for serializer's temporary buffer size by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2689
- refactor: remove FULL-SYNC-CUT cmd [#2687](https://togithub.com/dragonflydb/dragonfly/issues/2687) by [@BorysTheDev](https://togithub.com/BorysTheDev) in dragonflydb/dragonfly#2688
- chore: add malloc-based stats and decommit by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2692
- feat(cluster): automatic slot migration finalization [#2697](https://togithub.com/dragonflydb/dragonfly/issues/2697) by [@BorysTheDev](https://togithub.com/BorysTheDev) in dragonflydb/dragonfly#2698
- Basic FT.AGGREGATE by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2413
- chore: little transaction cleanup by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2608
- fix(channel store): add acquire/release pair in fast update path by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2704
- chore: add ubuntu22 devcontainer by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2700
- feat(cluster): Add `--cluster_id` flag by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2695
- feat(server): Use mimalloc in SSL calls by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2710
- chore: Pull helio with new BlockingCounter by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2711
- chore(transaction): Simplify PollExecution by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2712
- chore(transaction): Don't call GetLocalMask from blocking controller by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2715
- chore: improve compatibility of EXPIRE functions with Redis by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2696
- chore: disable flaky fuzzy migration test by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2716
- chore: update sanitizers workflow by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2686
- chore: Use c-ares for resolving hosts in `ProtocolClient` by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2719
- Remove check-fail in ExpireIfNeeded and introduce DFLY LOAD by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2699
- chore: Record cmd stat from invoke by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2720
- fix(transaction): nullptr access on non transactional commands by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2724
- fix(BgSave): async from sync by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2702
- chore: remove core/fibers by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2723
- fix(transaction): Replace with armed sync point by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2708
- feat(json): Deserialize ReJSON format by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2725
- feat: add flag masteruser by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2693
- refactor: block list refactoring [#2580](https://togithub.com/dragonflydb/dragonfly/issues/2580) by [@BorysTheDev](https://togithub.com/BorysTheDev) in dragonflydb/dragonfly#2732
- chore: fix DeduceExecMode by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2733
- fix(cluster): Reply with correct `\n` / `\r\n` from `CLUSTER` sub cmd by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2731
- chore: Introduce fiber stack allocator by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2730
- fix(cluster): Save replica ID per replica by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2735
- fix(ssl): Proper cleanup by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2742
- chore: add skeleton files for flat_dfs code by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2738
- chore: better error reporting when connecting to tls with plain socket by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2740
- chore: Support json paths without root selector by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2747
- chore: journal cleanup by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2749
- refactor: remove start-slot-migration cmd [#2727](https://togithub.com/dragonflydb/dragonfly/issues/2727) by [@BorysTheDev](https://togithub.com/BorysTheDev) in dragonflydb/dragonfly#2728
- feat(server): Add TLS usage to /metrics and `INFO MEMORY` by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2755
- chore: preparations for adding flat json support by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2752
- chore(transaction): Introduce RunCallback by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2760
- feat(replication): Do not auto replicate different master by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2753
- Improve Helm chart to be rendered locally and on machines where is not the application target by [@fafg](https://togithub.com/fafg) in dragonflydb/dragonfly#2706
- chore: preparation for basic http api by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2764
- feat(server): Add metric for RDB save duration. by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2768
- chore: fix flat_dfs read tests. by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2772
- fix(ci): do not overwrite last_log_file among tests by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2759
- feat(server): support cluster replication by [@adiholden](https://togithub.com/adiholden) in dragonflydb/dragonfly#2748
- fix: fiber preempts on read path and OnCbFinish() clears fetched_items\_ by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2763
- chore(ci): open last_log_file in append mode by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2776
- doc(README): fix outdated expiry ranges description by [@enjoy-binbin](https://togithub.com/enjoy-binbin) in dragonflydb/dragonfly#2779
- Benchmark runner by [@adiholden](https://togithub.com/adiholden) in dragonflydb/dragonfly#2780
- chore(replication-tests): add cache_mode on test replication all by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2685
- feat(tiering): DiskStorage by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2770
- chore: add a boilerplate for bloom filter family by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2782
- chore: introduce conversion routines between JsonType and FlatJson by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2785
- chore: Fix memcached flags not updated by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2787
- chore: remove duplicate code from dash and simplify by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2765
- chore: disable test_cluster_slot_migration by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2788
- fix: new\[] delete\[] missmatch in disk_storage by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2792
- fix: sanitizers clang build and clean up some warnings by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2793
- chore: add bloom filter class by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2791
- chore: add SBF data structure by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2795
- chore(tiering): Disable compilation for MacOs by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2799
- chore: fix daily build by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2798
- chore: expose SBF via compact_object by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2797
- fix(ci): malloc trim on sanitizers workflow by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2794
- fix(cluster): Don't miss updates in FLUSHSLOTS by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2783
- feat: add bf.(m)add and bf.(m)exists commands by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2801
- fix: SBF memory leaks by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2803
- chore: refactor StringFamily::Set to use CmdArgParser by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2800
- fix: propagate memcached flags to replica by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2807
- DFLYMIGRATE ACK refactoring by [@BorysTheDev](https://togithub.com/BorysTheDev) in dragonflydb/dragonfly#2790
- feat: add master lsn and journal_executed dcheck in replica via ping by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2778
- fix: correct json response for errors by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2813
- chore: bloom test - cover corner cases by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2806
- bug(server): do not write lsn opcode to journal by [@adiholden](https://togithub.com/adiholden) in dragonflydb/dragonfly#2814
- chore: Fix build by disabling the tests. by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2821
- fix(replication): replication with multi shard sync enabled lagging by [@adiholden](https://togithub.com/adiholden) in dragonflydb/dragonfly#2823
- fix: io_uring/fibers bug in DnsResolve by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2825

##### New Contributors

- [@manojks1999](https://togithub.com/manojks1999) made their first contribution in dragonflydb/dragonfly#2659
- [@fafg](https://togithub.com/fafg) made their first contribution in dragonflydb/dragonfly#2706
- [@enjoy-binbin](https://togithub.com/enjoy-binbin) made their first contribution in dragonflydb/dragonfly#2779

##### Huge thanks to all the contributors! ❤️

🇮🇱  🇺🇦

**Full Changelog**:
dragonflydb/dragonfly@v1.15.0...v1.16.0

</details>


Co-authored-by: repo-jeeves[bot] <106431701+repo-jeeves[bot]@users.noreply.github.com>