Optimise ShardAwareDeduplicateFilter #11819

pracucci · 2025-06-23T13:06:41Z

What this PR does

This PR has been be authored by @colega and myself.

We had an issue where a tenant compaction was lagging behind, and their number blocks reached the monster number of 250K (from a baseline of 10K). When that happened, we found out that the compactor planning was extremely slow (3 hours), and that the slow down came from ShardAwareDeduplicateFilter.

In this PR we introduce some small optimizations to ShardAwareDeduplicateFilter for tenants with a very large number of blocks. The impact on performance is not impressive, but we got a -33% reduction (from 3h30m to 2h15m) for the real world scenario.

Existing benchmark result:


goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/compactor
cpu: Apple M3 Pro
                                            │  before.txt   │ after-with-bloom-filter-and-without-list.txt │
                                            │    sec/op     │        sec/op          vs base               │
DeduplicateFilter_Filter/Block-10/#00-11       17.64µ ± 14%             25.23µ ± 3%  +43.04% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10/#01-11       53.33µ ±  1%             76.36µ ± 1%  +43.18% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-100/#00-11      383.6µ ±  4%             294.2µ ± 1%  -23.30% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-100/#01-11     1153.2µ ±  1%             882.1µ ± 1%  -23.51% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-1000/#00-11    28.346m ±  1%             7.578m ± 6%  -73.26% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-1000/#01-11     90.15m ±  1%             23.02m ± 1%  -74.46% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10000/#00-11     6.820 ±  5%              1.237 ± 1%  -81.86% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10000/#01-11    20.710 ±  2%              3.723 ± 1%  -82.02% (p=0.002 n=6)
geomean                                        10.52m                   5.018m       -52.31%

                                            │  before.txt  │ after-with-bloom-filter-and-without-list.txt │
                                            │     B/op     │         B/op           vs base               │
DeduplicateFilter_Filter/Block-10/#00-11      24.87Ki ± 0%            30.59Ki ± 0%  +23.00% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10/#01-11      74.61Ki ± 0%            91.76Ki ± 0%  +23.00% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-100/#00-11     247.4Ki ± 0%            299.5Ki ± 0%  +21.07% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-100/#01-11     742.6Ki ± 0%            898.9Ki ± 0%  +21.05% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-1000/#00-11    2.509Mi ± 0%            2.971Mi ± 0%  +18.40% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-1000/#01-11    7.955Mi ± 0%            9.033Mi ± 0%  +13.54% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10000/#00-11   48.80Mi ± 0%            58.87Mi ± 0%  +20.64% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10000/#01-11   146.4Mi ± 0%            176.6Mi ± 0%  +20.64% (p=0.002 n=6)
geomean                                       1.606Mi                 1.929Mi       +20.13%

                                            │ before.txt  │ after-with-bloom-filter-and-without-list.txt │
                                            │  allocs/op  │       allocs/op         vs base              │
DeduplicateFilter_Filter/Block-10/#00-11       63.00 ± 0%               64.00 ± 0%  +1.59% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10/#01-11       189.0 ± 0%               192.0 ± 0%  +1.59% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-100/#00-11      522.0 ± 0%               523.0 ± 0%  +0.19% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-100/#01-11     1.567k ± 0%              1.570k ± 0%  +0.19% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-1000/#00-11    5.162k ± 0%              5.069k ± 0%  -1.80% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-1000/#01-11    16.37k ± 0%              15.41k ± 0%  -5.83% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10000/#00-11   100.2k ± 0%              100.2k ± 0%  +0.00% (p=0.002 n=6)
DeduplicateFilter_Filter/Block-10000/#01-11   300.6k ± 0%              300.6k ± 0%  +0.00% (p=0.002 n=6)
geomean                                       3.542k                   3.523k       -0.53%

Which issue(s) this PR fixes or relates to

N/A

Checklist

Tests updated.
Documentation added.
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
about-versioning.md updated with experimental features.

Copilot

Pull Request Overview

This PR refactors the deduplication filter to run per-resolution checks in parallel, enhances blockWithSuccessors to maintain both a map and a sorted list of sources, and updates benchmarks and tests accordingly.

Parallelize duplicate-finding per resolution using concurrency.ForEachJob
Split findDuplicates into standalone findDuplicatedBlocks and update Filter
Store sources in both a map and a sorted list, optimize isIncludedIn, and add a test for newBlockWithSuccessors

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
pkg/compactor/shard_aware_deduplicate_filter.go	Parallelize per-resolution work, rename/refactor duplicate logic, add sorted source list and optimized inclusion check
pkg/compactor/shard_aware_deduplicate_filter_test.go	Rename benchmark, import `assert`, and add `TestNewBlockWithSuccessors`

Comments suppressed due to low confidence (2)

pkg/compactor/shard_aware_deduplicate_filter.go:37

If the same ShardAwareDeduplicateFilter instance is reused, f.duplicateIDs may grow across multiple calls; reset f.duplicateIDs (e.g., to nil) at the start of Filter.

	f.duplicateIDs = f.duplicateIDs[:0]

pkg/compactor/shard_aware_deduplicate_filter_test.go:463

[nitpick] Consider adding tests for blockWithSuccessors.isIncludedIn to verify the optimized early-exit path when the candidate 'other' block has fewer sources.

func TestNewBlockWithSuccessors(t *testing.T) {

pkg/compactor/shard_aware_deduplicate_filter.go

aknuds1

Looks good AFAICT.

pkg/compactor/shard_aware_deduplicate_filter.go

Signed-off-by: Marco Pracucci <marco@pracucci.com>

pkg/compactor/shard_aware_deduplicate_filter_test.go

#### What this PR does This adds a bloom filter optimization on top of #11819 Benchmark compared to that branch: ``` $ benchstat -ignore .name shortcuts bloom-16-buckets goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/compactor cpu: Apple M3 Pro │ shortcuts │ bloom-16-buckets │ │ sec/op │ sec/op vs base │ */Block-10/#00-12 26.85µ ± 6% 28.56µ ± 2% +6.36% (p=0.015 n=6) */Block-10/#1-12 81.48µ ± 4% 84.76µ ± 0% +4.03% (p=0.002 n=6) */Block-100/#00-12 365.6µ ± 2% 317.9µ ± 0% -13.05% (p=0.002 n=6) */Block-100/#1-12 1091.5µ ± 1% 958.5µ ± 5% -12.19% (p=0.002 n=6) */Block-1000/#00-12 15.122m ± 8% 6.804m ± 2% -55.01% (p=0.002 n=6) */Block-1000/#1-12 46.47m ± 4% 20.80m ± 5% -55.23% (p=0.002 n=6) */Block-10000/#00-12 2925.2m ± 5% 638.9m ± 1% -78.16% (p=0.002 n=6) */Block-10000/#1-12 8.722 ± 2% 2.576 ± 2% -70.47% (p=0.002 n=6) geomean 7.931m 4.512m -43.11% ``` Benchmark compared to main: ``` $ benchstat -ignore .name main shortcuts bloom goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/compactor cpu: Apple M3 Pro │ main │ shortcuts │ bloom │ │ sec/op │ sec/op vs base │ sec/op vs base │ */Block-10/#00-12 17.30µ ± 2% 26.85µ ± 6% +55.22% (p=0.002 n=6) 30.99µ ± 3% +79.14% (p=0.002 n=6) */Block-10/#1-12 50.17µ ± 1% 81.48µ ± 4% +62.41% (p=0.002 n=6) 90.49µ ± 2% +80.36% (p=0.002 n=6) */Block-100/#00-12 378.8µ ± 0% 365.6µ ± 2% -3.47% (p=0.002 n=6) 360.7µ ± 2% -4.77% (p=0.002 n=6) */Block-100/#1-12 1.133m ± 1% 1.092m ± 1% -3.67% (p=0.002 n=6) 1.086m ± 1% -4.20% (p=0.002 n=6) */Block-1000/#00-12 28.244m ± 1% 15.122m ± 8% -46.46% (p=0.002 n=6) 7.974m ± 2% -71.77% (p=0.002 n=6) */Block-1000/#1-12 90.46m ± 13% 46.47m ± 4% -48.64% (p=0.002 n=6) 24.95m ± 9% -72.42% (p=0.002 n=6) */Block-10000/#00-12 7.936 ± 7% 2.925 ± 5% -63.14% (p=0.002 n=6) 1.393 ± 3% -82.44% (p=0.002 n=6) */Block-10000/#1-12 24.905 ± 9% 8.722 ± 2% -64.98% (p=0.002 n=6) 4.256 ± 3% -82.91% (p=0.002 n=6) geomean 10.82m 7.931m -26.71% 5.808m -46.33% ``` <details> <summary>Benchmark results with different buckets counts</summary> I first tried with 256 buckets, and that was slower than 16: ``` $ benchstat -ignore .name shortcuts bloom-16-buckets bloom-256-buckets goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/compactor cpu: Apple M3 Pro │ shortcuts │ bloom-16-buckets │ bloom-256-buckets │ │ sec/op │ sec/op vs base │ sec/op vs base │ */Block-10/#00-12 26.85µ ± 6% 28.56µ ± 2% +6.36% (p=0.015 n=6) 30.99µ ± 3% +15.41% (p=0.002 n=6) */Block-10/#1-12 81.48µ ± 4% 84.76µ ± 0% +4.03% (p=0.002 n=6) 90.49µ ± 2% +11.05% (p=0.002 n=6) */Block-100/#00-12 365.6µ ± 2% 317.9µ ± 0% -13.05% (p=0.002 n=6) 360.7µ ± 2% -1.35% (p=0.015 n=6) */Block-100/#1-12 1091.5µ ± 1% 958.5µ ± 5% -12.19% (p=0.002 n=6) 1085.5µ ± 1% ~ (p=0.180 n=6) */Block-1000/#00-12 15.122m ± 8% 6.804m ± 2% -55.01% (p=0.002 n=6) 7.974m ± 2% -47.27% (p=0.002 n=6) */Block-1000/#1-12 46.47m ± 4% 20.80m ± 5% -55.23% (p=0.002 n=6) 24.95m ± 9% -46.31% (p=0.002 n=6) */Block-10000/#00-12 2925.2m ± 5% 638.9m ± 1% -78.16% (p=0.002 n=6) 1393.5m ± 3% -52.36% (p=0.002 n=6) */Block-10000/#1-12 8.722 ± 2% 2.576 ± 2% -70.47% (p=0.002 n=6) 4.256 ± 3% -51.21% (p=0.002 n=6) geomean 7.931m 4.512m -43.11% 5.808m -26.76% ``` I also tried with a single uint, which gave me suprisingly similar number: ``` $ benchstat -ignore .name shortcuts bloom-16-buckets bloom-single goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/compactor cpu: Apple M3 Pro │ shortcuts │ bloom-16-buckets │ bloom-single │ │ sec/op │ sec/op vs base │ sec/op vs base │ */Block-10/#00-12 26.85µ ± 6% 28.56µ ± 2% +6.36% (p=0.015 n=6) 27.50µ ± 1% ~ (p=0.180 n=6) */Block-10/#1-12 81.48µ ± 4% 84.76µ ± 0% +4.03% (p=0.002 n=6) 82.25µ ± 4% ~ (p=0.394 n=6) */Block-100/#00-12 365.6µ ± 2% 317.9µ ± 0% -13.05% (p=0.002 n=6) 313.4µ ± 4% -14.28% (p=0.002 n=6) */Block-100/#1-12 1091.5µ ± 1% 958.5µ ± 5% -12.19% (p=0.002 n=6) 947.3µ ± 6% -13.22% (p=0.002 n=6) */Block-1000/#00-12 15.122m ± 8% 6.804m ± 2% -55.01% (p=0.002 n=6) 6.712m ± 2% -55.61% (p=0.002 n=6) */Block-1000/#1-12 46.47m ± 4% 20.80m ± 5% -55.23% (p=0.002 n=6) 20.34m ± 2% -56.23% (p=0.002 n=6) */Block-10000/#00-12 2925.2m ± 5% 638.9m ± 1% -78.16% (p=0.002 n=6) 621.9m ± 0% -78.74% (p=0.002 n=6) */Block-10000/#1-12 8.722 ± 2% 2.576 ± 2% -70.47% (p=0.002 n=6) 2.504 ± 2% -71.29% (p=0.002 n=6) geomean 7.931m 4.512m -43.11% 4.409m -44.41% ``` It seems that using 4 or 8 buckets doesn't speed up the result, but 32 slows it down, so I've chosen 16 buckets as the magic number for our fake microbenchmark 🤷 ``` $ benchstat -ignore .name bloom-16-buckets bloom-4-buckets bloom-8-buckets bloom-32-buckets goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/compactor cpu: Apple M3 Pro │ bloom-16-buckets │ bloom-4-buckets │ bloom-8-buckets │ bloom-32-buckets │ │ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │ */Block-10/#00-12 28.56µ ± 2% 28.38µ ± 7% ~ (p=0.699 n=6) 28.54µ ± 1% ~ (p=0.937 n=6) 28.47µ ± 2% ~ (p=0.240 n=6) */Block-10/#1-12 84.76µ ± 0% 84.02µ ± 4% ~ (p=0.132 n=6) 85.43µ ± 3% +0.78% (p=0.041 n=6) 84.68µ ± 1% ~ (p=0.937 n=6) */Block-100/#00-12 317.9µ ± 0% 321.5µ ± 3% ~ (p=0.589 n=6) 319.2µ ± 4% ~ (p=0.132 n=6) 319.1µ ± 4% +0.37% (p=0.041 n=6) */Block-100/#1-12 958.5µ ± 5% 953.7µ ± 7% ~ (p=0.394 n=6) 955.2µ ± 2% ~ (p=0.310 n=6) 959.1µ ± 1% ~ (p=0.818 n=6) */Block-1000/#00-12 6.804m ± 2% 6.721m ± 1% -1.22% (p=0.002 n=6) 6.762m ± 1% -0.61% (p=0.026 n=6) 7.022m ± 4% +3.21% (p=0.002 n=6) */Block-1000/#1-12 20.80m ± 5% 20.54m ± 0% -1.25% (p=0.002 n=6) 20.70m ± 1% ~ (p=0.093 n=6) 21.72m ± 2% +4.43% (p=0.041 n=6) */Block-10000/#00-12 638.9m ± 1% 633.8m ± 2% ~ (p=0.093 n=6) 631.3m ± 1% -1.18% (p=0.009 n=6) 703.4m ± 44% +10.09% (p=0.002 n=6) */Block-10000/#1-12 2.576 ± 2% 2.543 ± 2% ~ (p=0.180 n=6) 2.562 ± 2% ~ (p=0.065 n=6) 2.832 ± 23% +9.95% (p=0.002 n=6) geomean 4.512m 4.481m -0.68% 4.501m -0.26% 4.665m +3.38% ``` </details> #### Which issue(s) this PR fixes or relates to Ref: #11819 --------- Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Marco Pracucci <marco@pracucci.com>

Signed-off-by: Marco Pracucci <marco@pracucci.com>

…oom filter Signed-off-by: Marco Pracucci <marco@pracucci.com>

CHANGELOG.md

Co-authored-by: Taylor C <41653732+tacole02@users.noreply.github.com>

pracucci changed the title ~~Optimise shard aware deduplicate filter~~ Optimise ShardAwareDeduplicateFilter Jun 23, 2025

aknuds1 requested a review from Copilot June 23, 2025 13:25

Copilot AI reviewed Jun 23, 2025

View reviewed changes

pkg/compactor/shard_aware_deduplicate_filter.go Outdated Show resolved Hide resolved

pkg/compactor/shard_aware_deduplicate_filter.go Outdated Show resolved Hide resolved

aknuds1 approved these changes Jun 23, 2025

View reviewed changes

pkg/compactor/shard_aware_deduplicate_filter.go Outdated Show resolved Hide resolved

pracucci force-pushed the optimise-ShardAwareDeduplicateFilter branch from 05db42f to a41d828 Compare June 23, 2025 13:50

colega reviewed Jun 23, 2025

View reviewed changes

pkg/compactor/shard_aware_deduplicate_filter.go Outdated Show resolved Hide resolved

Optimise ShardAwareDeduplicateFilter

7b64f43

Signed-off-by: Marco Pracucci <marco@pracucci.com>

pracucci force-pushed the optimise-ShardAwareDeduplicateFilter branch from a41d828 to 7b64f43 Compare June 23, 2025 14:34

colega approved these changes Jun 23, 2025

View reviewed changes

colega reviewed Jun 23, 2025

View reviewed changes

pkg/compactor/shard_aware_deduplicate_filter_test.go Outdated Show resolved Hide resolved

colega mentioned this pull request Jun 23, 2025

Use a bloom filter in ShardAwareDeduplicateFilter #11827

Merged

pracucci added 3 commits June 24, 2025 17:47

Do not rename the benchmark

6868a11

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Update CHANGELOG

bf1cc53

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Removed sources list because doesn't given a real benefit with the bl…

9a84c0f

…oom filter Signed-off-by: Marco Pracucci <marco@pracucci.com>

pracucci requested review from colega and aknuds1 June 24, 2025 16:06

pracucci marked this pull request as ready for review June 24, 2025 16:06

pracucci requested a review from a team as a code owner June 24, 2025 16:06

tacole02 approved these changes Jun 24, 2025

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

Update CHANGELOG.md

749951a

Co-authored-by: Taylor C <41653732+tacole02@users.noreply.github.com>

colega approved these changes Jun 25, 2025

View reviewed changes

Merge branch 'main' into optimise-ShardAwareDeduplicateFilter

252825e

pracucci enabled auto-merge (squash) June 26, 2025 08:46

pracucci merged commit 8bf6bd8 into main Jun 26, 2025
31 checks passed

pracucci deleted the optimise-ShardAwareDeduplicateFilter branch June 26, 2025 09:42

philk mentioned this pull request Jul 14, 2025

store-gateway: Fix goroutine leak in LabelValues when using errgroup … #12048

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimise ShardAwareDeduplicateFilter #11819

Optimise ShardAwareDeduplicateFilter #11819

Uh oh!

pracucci commented Jun 23, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

aknuds1 left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Optimise ShardAwareDeduplicateFilter #11819

Optimise ShardAwareDeduplicateFilter #11819

Uh oh!

Conversation

pracucci commented Jun 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does

Which issue(s) this PR fixes or relates to

Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

aknuds1 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pracucci commented Jun 23, 2025 •

edited

Loading