Optimise ShardAwareDeduplicateFilter #11819

Merged
pracucci merged 7 commits into main from optimise-ShardAwareDeduplicateFilter on Jun 26, 2025

Conversation

pracucci
Collaborator

@pracucci pracucci commented Jun 23, 2025

What this PR does

This PR has been authored by @colega and myself.

We had an issue where a tenant's compaction was lagging behind, and their number of blocks reached a monstrous 250K (from a baseline of 10K). When that happened, we found that compactor planning was extremely slow (3 hours), and that the slowdown came from ShardAwareDeduplicateFilter.

In this PR we introduce some small optimizations to ShardAwareDeduplicateFilter for tenants with a very large number of blocks. The performance gain is modest, but in the real-world scenario planning time dropped by roughly a third (from 3h30m to 2h15m).

Existing benchmark result:


goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/compactor
cpu: Apple M3 Pro
                                            │  before.txt   │ after-with-bloom-filter-and-without-list.txt │
                                            │    sec/op     │        sec/op          vs base               │
DeduplicateFilter_Filter/Block-10/#00-11       17.64µ ± 14%             25.23µ ± 3%  +43.04% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10/#01-11       53.33µ ±  1%             76.36µ ± 1%  +43.18% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-100/#00-11      383.6µ ±  4%             294.2µ ± 1%  -23.30% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-100/#01-11     1153.2µ ±  1%             882.1µ ± 1%  -23.51% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-1000/#00-11    28.346m ±  1%             7.578m ± 6%  -73.26% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-1000/#01-11     90.15m ±  1%             23.02m ± 1%  -74.46% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10000/#00-11     6.820 ±  5%              1.237 ± 1%  -81.86% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10000/#01-11    20.710 ±  2%              3.723 ± 1%  -82.02% (p=0.002 n=6)
geomean                                        10.52m                   5.018m       -52.31%

                                            │  before.txt  │ after-with-bloom-filter-and-without-list.txt │
                                            │     B/op     │         B/op           vs base               │
DeduplicateFilter_Filter/Block-10/#00-11      24.87Ki ± 0%            30.59Ki ± 0%  +23.00% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10/#01-11      74.61Ki ± 0%            91.76Ki ± 0%  +23.00% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-100/#00-11     247.4Ki ± 0%            299.5Ki ± 0%  +21.07% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-100/#01-11     742.6Ki ± 0%            898.9Ki ± 0%  +21.05% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-1000/#00-11    2.509Mi ± 0%            2.971Mi ± 0%  +18.40% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-1000/#01-11    7.955Mi ± 0%            9.033Mi ± 0%  +13.54% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10000/#00-11   48.80Mi ± 0%            58.87Mi ± 0%  +20.64% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10000/#01-11   146.4Mi ± 0%            176.6Mi ± 0%  +20.64% (p=0.002 n=6)
geomean                                       1.606Mi                 1.929Mi       +20.13%

                                            │ before.txt  │ after-with-bloom-filter-and-without-list.txt │
                                            │  allocs/op  │       allocs/op         vs base              │
DeduplicateFilter_Filter/Block-10/#00-11       63.00 ± 0%               64.00 ± 0%  +1.59% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10/#01-11       189.0 ± 0%               192.0 ± 0%  +1.59% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-100/#00-11      522.0 ± 0%               523.0 ± 0%  +0.19% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-100/#01-11     1.567k ± 0%              1.570k ± 0%  +0.19% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-1000/#00-11    5.162k ± 0%              5.069k ± 0%  -1.80% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-1000/#01-11    16.37k ± 0%              15.41k ± 0%  -5.83% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10000/#00-11   100.2k ± 0%              100.2k ± 0%  +0.00% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10000/#01-11   300.6k ± 0%              300.6k ± 0%  +0.00% (p=0.002 n=6)
geomean                                       3.542k                   3.523k       -0.53%

Which issue(s) this PR fixes or relates to

N/A

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features.

@pracucci pracucci changed the title from "Optimise shard aware deduplicate filter" to "Optimise ShardAwareDeduplicateFilter" Jun 23, 2025
@aknuds1 aknuds1 requested a review from Copilot June 23, 2025 13:25
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR refactors the deduplication filter to run per-resolution checks in parallel, enhances blockWithSuccessors to maintain both a map and a sorted list of sources, and updates benchmarks and tests accordingly.

  • Parallelize duplicate-finding per resolution using concurrency.ForEachJob
  • Split findDuplicates into standalone findDuplicatedBlocks and update Filter
  • Store sources in both a map and a sorted list, optimize isIncludedIn, and add a test for newBlockWithSuccessors
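The first and third bullets can be illustrated with a minimal sketch. This is not the code from the PR: the `block` type, the `set` helper, and `findDuplicatesPerResolution` are hypothetical names, shard IDs are ignored, and plain goroutines with a `sync.WaitGroup` stand in for `concurrency.ForEachJob` so the example only needs the standard library.

```go
package main

import (
	"fmt"
	"sync"
)

// block is a hypothetical, simplified stand-in for a compacted block: an ID,
// a downsampling resolution, and the set of source block IDs it was built from.
type block struct {
	id         string
	resolution int64
	sources    map[string]struct{}
}

// set builds a source set from a list of IDs (helper for the example only).
func set(ids ...string) map[string]struct{} {
	m := make(map[string]struct{}, len(ids))
	for _, id := range ids {
		m[id] = struct{}{}
	}
	return m
}

// isIncludedIn reports whether every source of b is also a source of other.
// A block with more sources than other cannot be included in it, so the size
// comparison exits early before any map lookups.
func (b block) isIncludedIn(other block) bool {
	if len(b.sources) > len(other.sources) {
		return false
	}
	for s := range b.sources {
		if _, ok := other.sources[s]; !ok {
			return false
		}
	}
	return true
}

// findDuplicates returns the IDs of blocks whose sources are a strict subset
// of another block's sources. Quadratic on purpose, for illustration only.
func findDuplicates(blocks []block) []string {
	var dups []string
	for i, b := range blocks {
		for j, other := range blocks {
			if i != j && len(b.sources) < len(other.sources) && b.isIncludedIn(other) {
				dups = append(dups, b.id)
				break
			}
		}
	}
	return dups
}

// findDuplicatesPerResolution groups blocks by resolution and checks each
// group in its own goroutine, collecting results under a mutex.
func findDuplicatesPerResolution(blocks []block) map[int64][]string {
	groups := map[int64][]block{}
	for _, b := range blocks {
		groups[b.resolution] = append(groups[b.resolution], b)
	}
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = map[int64][]string{}
	)
	for res, group := range groups {
		wg.Add(1)
		go func(res int64, group []block) {
			defer wg.Done()
			dups := findDuplicates(group)
			mu.Lock()
			out[res] = dups
			mu.Unlock()
		}(res, group)
	}
	wg.Wait()
	return out
}

func main() {
	blocks := []block{
		{id: "A", resolution: 0, sources: set("A")},
		{id: "B", resolution: 0, sources: set("B")},
		{id: "AB", resolution: 0, sources: set("A", "B")},
		{id: "C", resolution: 300000, sources: set("C")},
	}
	fmt.Println(findDuplicatesPerResolution(blocks))
}
```

In this example, blocks A and B are reported as duplicates because AB already covers their sources, while C is alone in its resolution group and is kept.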

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pkg/compactor/shard_aware_deduplicate_filter.go | Parallelize per-resolution work, rename/refactor duplicate logic, add sorted source list and optimized inclusion check |
| pkg/compactor/shard_aware_deduplicate_filter_test.go | Rename benchmark, import assert, and add TestNewBlockWithSuccessors |
Comments suppressed due to low confidence (2)

pkg/compactor/shard_aware_deduplicate_filter.go:37

  • If the same ShardAwareDeduplicateFilter instance is reused, f.duplicateIDs may grow across multiple calls; reset f.duplicateIDs (e.g., to nil) at the start of Filter.
	f.duplicateIDs = f.duplicateIDs[:0]
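
To show why the reset matters, here is a small sketch, assuming a hypothetical `dedupFilter` type that is unrelated to the real filter's fields and logic. It truncates the result slice at the start of each call so that a reused instance reports only the duplicates from the latest input.

```go
package main

import "fmt"

// dedupFilter is a hypothetical stand-in for a filter that records duplicate IDs.
type dedupFilter struct {
	duplicateIDs []string
}

// Filter appends an ID to duplicateIDs each time it is seen again in ids.
// The result slice is truncated first so that a reused instance does not
// accumulate entries from earlier calls.
func (f *dedupFilter) Filter(ids []string) {
	f.duplicateIDs = f.duplicateIDs[:0] // keeps the backing array; assign nil to drop it
	seen := make(map[string]struct{}, len(ids))
	for _, id := range ids {
		if _, ok := seen[id]; ok {
			f.duplicateIDs = append(f.duplicateIDs, id)
			continue
		}
		seen[id] = struct{}{}
	}
}

func main() {
	f := &dedupFilter{}
	f.Filter([]string{"a", "a", "b"})
	fmt.Println(f.duplicateIDs) // [a]
	f.Filter([]string{"c", "d", "d"})
	fmt.Println(f.duplicateIDs) // [d]
}
```

One trade-off: truncating with `[:0]` reuses the backing array, so a slice handed out by an earlier call can be overwritten by the next one. Assigning `nil` avoids that at the cost of a fresh allocation.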

pkg/compactor/shard_aware_deduplicate_filter_test.go:463

  • [nitpick] Consider adding tests for blockWithSuccessors.isIncludedIn to verify the optimized early-exit path when the candidate 'other' block has fewer sources.
func TestNewBlockWithSuccessors(t *testing.T) {

Contributor

@aknuds1 aknuds1 left a comment

Looks good AFAICT.

@pracucci pracucci force-pushed the optimise-ShardAwareDeduplicateFilter branch from 05db42f to a41d828 Compare June 23, 2025 13:50
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci force-pushed the optimise-ShardAwareDeduplicateFilter branch from a41d828 to 7b64f43 Compare June 23, 2025 14:34
#### What this PR does

This adds a bloom filter optimization on top of
#11819
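
As a rough illustration of the idea, a bloom-style bit set can reject most "is this block's source set included in that one?" checks before any map lookup. The sketch below is an assumption about the general technique, not the PR's code: the `bloom` type, the FNV hash, and the one-bit-per-ID layout are all hypothetical, and `buckets = 16` only mirrors the "16 buckets" variant named in the benchmarks.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// buckets is the number of 64-bit words in the filter (assumed layout).
const buckets = 16

// bloom is a fixed-size bit set of buckets*64 bits.
type bloom [buckets]uint64

// add sets the single bit selected by hashing the ID.
func (b *bloom) add(id string) {
	h := fnv.New64a()
	h.Write([]byte(id))
	bit := h.Sum64() % (buckets * 64)
	b[bit/64] |= uint64(1) << (bit % 64)
}

// mayBeSubsetOf returns false when b has a bit that other lacks, which proves
// that b holds an ID missing from other. A true result is only a "maybe":
// different IDs can share a bit, so an exact check is still required.
func (b *bloom) mayBeSubsetOf(other *bloom) bool {
	for i := range b {
		if b[i]&^other[i] != 0 {
			return false
		}
	}
	return true
}

func main() {
	var sub, sup bloom
	for _, id := range []string{"A", "B"} {
		sub.add(id)
		sup.add(id)
	}
	sup.add("C")

	// Always true: every bit set in sub is also set in sup.
	fmt.Println(sub.mayBeSubsetOf(&sup))
	// Usually false, unless the bit for "C" collides with "A" or "B".
	fmt.Println(sup.mayBeSubsetOf(&sub))
}
```

The filter never produces a false "not included", so it is safe as a pre-check in front of the exact comparison. Fewer buckets make the check cheaper but increase collisions, which is the trade-off the bucket-count benchmarks explore.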

Benchmark compared to that branch:
```
$ benchstat -ignore .name shortcuts bloom-16-buckets
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/compactor
cpu: Apple M3 Pro
                     │  shortcuts   │          bloom-16-buckets          │
                     │    sec/op    │   sec/op     vs base               │
*/Block-10/#00-12       26.85µ ± 6%   28.56µ ± 2%   +6.36% (p=0.015 n=6)
*/Block-10/#1-12       81.48µ ± 4%   84.76µ ± 0%   +4.03% (p=0.002 n=6)
*/Block-100/#00-12      365.6µ ± 2%   317.9µ ± 0%  -13.05% (p=0.002 n=6)
*/Block-100/#1-12     1091.5µ ± 1%   958.5µ ± 5%  -12.19% (p=0.002 n=6)
*/Block-1000/#00-12    15.122m ± 8%   6.804m ± 2%  -55.01% (p=0.002 n=6)
*/Block-1000/#1-12     46.47m ± 4%   20.80m ± 5%  -55.23% (p=0.002 n=6)
*/Block-10000/#00-12   2925.2m ± 5%   638.9m ± 1%  -78.16% (p=0.002 n=6)
*/Block-10000/#1-12     8.722 ± 2%    2.576 ± 2%  -70.47% (p=0.002 n=6)
geomean                 7.931m        4.512m       -43.11%
```

Benchmark compared to main:
```
$ benchstat -ignore .name main shortcuts bloom
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/compactor
cpu: Apple M3 Pro
                     │     main      │              shortcuts              │               bloom                │
                     │    sec/op     │    sec/op     vs base               │   sec/op     vs base               │
*/Block-10/#00-12       17.30µ ±  2%    26.85µ ± 6%  +55.22% (p=0.002 n=6)   30.99µ ± 3%  +79.14% (p=0.002 n=6)
*/Block-10/#1-12       50.17µ ±  1%    81.48µ ± 4%  +62.41% (p=0.002 n=6)   90.49µ ± 2%  +80.36% (p=0.002 n=6)
*/Block-100/#00-12      378.8µ ±  0%    365.6µ ± 2%   -3.47% (p=0.002 n=6)   360.7µ ± 2%   -4.77% (p=0.002 n=6)
*/Block-100/#1-12      1.133m ±  1%    1.092m ± 1%   -3.67% (p=0.002 n=6)   1.086m ± 1%   -4.20% (p=0.002 n=6)
*/Block-1000/#00-12    28.244m ±  1%   15.122m ± 8%  -46.46% (p=0.002 n=6)   7.974m ± 2%  -71.77% (p=0.002 n=6)
*/Block-1000/#1-12     90.46m ± 13%    46.47m ± 4%  -48.64% (p=0.002 n=6)   24.95m ± 9%  -72.42% (p=0.002 n=6)
*/Block-10000/#00-12     7.936 ±  7%     2.925 ± 5%  -63.14% (p=0.002 n=6)    1.393 ± 3%  -82.44% (p=0.002 n=6)
*/Block-10000/#1-12    24.905 ±  9%     8.722 ± 2%  -64.98% (p=0.002 n=6)    4.256 ± 3%  -82.91% (p=0.002 n=6)
geomean                 10.82m          7.931m       -26.71%                 5.808m       -46.33%
```

<details>

<summary>Benchmark results with different buckets counts</summary>

I first tried with 256 buckets, and that was slower than 16:

```
$ benchstat -ignore .name shortcuts bloom-16-buckets bloom-256-buckets
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/compactor
cpu: Apple M3 Pro
                     │  shortcuts   │          bloom-16-buckets          │          bloom-256-buckets          │
                     │    sec/op    │   sec/op     vs base               │    sec/op     vs base               │
*/Block-10/#00-12       26.85µ ± 6%   28.56µ ± 2%   +6.36% (p=0.015 n=6)    30.99µ ± 3%  +15.41% (p=0.002 n=6)
*/Block-10/#1-12       81.48µ ± 4%   84.76µ ± 0%   +4.03% (p=0.002 n=6)    90.49µ ± 2%  +11.05% (p=0.002 n=6)
*/Block-100/#00-12      365.6µ ± 2%   317.9µ ± 0%  -13.05% (p=0.002 n=6)    360.7µ ± 2%   -1.35% (p=0.015 n=6)
*/Block-100/#1-12     1091.5µ ± 1%   958.5µ ± 5%  -12.19% (p=0.002 n=6)   1085.5µ ± 1%        ~ (p=0.180 n=6)
*/Block-1000/#00-12    15.122m ± 8%   6.804m ± 2%  -55.01% (p=0.002 n=6)    7.974m ± 2%  -47.27% (p=0.002 n=6)
*/Block-1000/#1-12     46.47m ± 4%   20.80m ± 5%  -55.23% (p=0.002 n=6)    24.95m ± 9%  -46.31% (p=0.002 n=6)
*/Block-10000/#00-12   2925.2m ± 5%   638.9m ± 1%  -78.16% (p=0.002 n=6)   1393.5m ± 3%  -52.36% (p=0.002 n=6)
*/Block-10000/#1-12     8.722 ± 2%    2.576 ± 2%  -70.47% (p=0.002 n=6)     4.256 ± 3%  -51.21% (p=0.002 n=6)
geomean                 7.931m        4.512m       -43.11%                  5.808m       -26.76%
```

I also tried with a single uint, which gave me surprisingly similar
numbers:

```
$ benchstat -ignore .name shortcuts bloom-16-buckets bloom-single
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/compactor
cpu: Apple M3 Pro
                     │  shortcuts   │          bloom-16-buckets          │            bloom-single            │
                     │    sec/op    │   sec/op     vs base               │   sec/op     vs base               │
*/Block-10/#00-12       26.85µ ± 6%   28.56µ ± 2%   +6.36% (p=0.015 n=6)   27.50µ ± 1%        ~ (p=0.180 n=6)
*/Block-10/#1-12       81.48µ ± 4%   84.76µ ± 0%   +4.03% (p=0.002 n=6)   82.25µ ± 4%        ~ (p=0.394 n=6)
*/Block-100/#00-12      365.6µ ± 2%   317.9µ ± 0%  -13.05% (p=0.002 n=6)   313.4µ ± 4%  -14.28% (p=0.002 n=6)
*/Block-100/#1-12     1091.5µ ± 1%   958.5µ ± 5%  -12.19% (p=0.002 n=6)   947.3µ ± 6%  -13.22% (p=0.002 n=6)
*/Block-1000/#00-12    15.122m ± 8%   6.804m ± 2%  -55.01% (p=0.002 n=6)   6.712m ± 2%  -55.61% (p=0.002 n=6)
*/Block-1000/#1-12     46.47m ± 4%   20.80m ± 5%  -55.23% (p=0.002 n=6)   20.34m ± 2%  -56.23% (p=0.002 n=6)
*/Block-10000/#00-12   2925.2m ± 5%   638.9m ± 1%  -78.16% (p=0.002 n=6)   621.9m ± 0%  -78.74% (p=0.002 n=6)
*/Block-10000/#1-12     8.722 ± 2%    2.576 ± 2%  -70.47% (p=0.002 n=6)    2.504 ± 2%  -71.29% (p=0.002 n=6)
geomean                 7.931m        4.512m       -43.11%                 4.409m       -44.41%
```

It seems that using 4 or 8 buckets doesn't speed up the result, but 32
slows it down, so I've chosen 16 buckets as the magic number for our
fake microbenchmark 🤷

```
$ benchstat -ignore .name bloom-16-buckets bloom-4-buckets bloom-8-buckets bloom-32-buckets
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/compactor
cpu: Apple M3 Pro
                     │ bloom-16-buckets │          bloom-4-buckets          │          bloom-8-buckets          │          bloom-32-buckets           │
                     │      sec/op      │   sec/op     vs base              │   sec/op     vs base              │    sec/op     vs base               │
*/Block-10/#00-12           28.56µ ± 2%   28.38µ ± 7%       ~ (p=0.699 n=6)   28.54µ ± 1%       ~ (p=0.937 n=6)   28.47µ ±  2%        ~ (p=0.240 n=6)
*/Block-10/#1-12           84.76µ ± 0%   84.02µ ± 4%       ~ (p=0.132 n=6)   85.43µ ± 3%  +0.78% (p=0.041 n=6)   84.68µ ±  1%        ~ (p=0.937 n=6)
*/Block-100/#00-12          317.9µ ± 0%   321.5µ ± 3%       ~ (p=0.589 n=6)   319.2µ ± 4%       ~ (p=0.132 n=6)   319.1µ ±  4%   +0.37% (p=0.041 n=6)
*/Block-100/#1-12          958.5µ ± 5%   953.7µ ± 7%       ~ (p=0.394 n=6)   955.2µ ± 2%       ~ (p=0.310 n=6)   959.1µ ±  1%        ~ (p=0.818 n=6)
*/Block-1000/#00-12         6.804m ± 2%   6.721m ± 1%  -1.22% (p=0.002 n=6)   6.762m ± 1%  -0.61% (p=0.026 n=6)   7.022m ±  4%   +3.21% (p=0.002 n=6)
*/Block-1000/#1-12         20.80m ± 5%   20.54m ± 0%  -1.25% (p=0.002 n=6)   20.70m ± 1%       ~ (p=0.093 n=6)   21.72m ±  2%   +4.43% (p=0.041 n=6)
*/Block-10000/#00-12        638.9m ± 1%   633.8m ± 2%       ~ (p=0.093 n=6)   631.3m ± 1%  -1.18% (p=0.009 n=6)   703.4m ± 44%  +10.09% (p=0.002 n=6)
*/Block-10000/#1-12         2.576 ± 2%    2.543 ± 2%       ~ (p=0.180 n=6)    2.562 ± 2%       ~ (p=0.065 n=6)    2.832 ± 23%   +9.95% (p=0.002 n=6)
geomean                     4.512m        4.481m       -0.68%                 4.501m       -0.26%                 4.665m         +3.38%
```


</details>

#### Which issue(s) this PR fixes or relates to

Ref: #11819

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
pracucci added a commit that referenced this pull request Jun 23, 2025
pracucci added 3 commits June 24, 2025 17:47
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
…oom filter

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci requested review from colega and aknuds1 June 24, 2025 16:06
@pracucci pracucci marked this pull request as ready for review June 24, 2025 16:06
@pracucci pracucci requested a review from a team as a code owner June 24, 2025 16:06
Co-authored-by: Taylor C <41653732+tacole02@users.noreply.github.com>
@pracucci pracucci enabled auto-merge (squash) June 26, 2025 08:46
@pracucci pracucci merged commit 8bf6bd8 into main Jun 26, 2025
31 checks passed
@pracucci pracucci deleted the optimise-ShardAwareDeduplicateFilter branch June 26, 2025 09:42