-
Notifications
You must be signed in to change notification settings - Fork 612
Optimise ShardAwareDeduplicateFilter #11819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the deduplication filter to run per-resolution checks in parallel, enhances blockWithSuccessors
to maintain both a map and a sorted list of sources, and updates benchmarks and tests accordingly.
- Parallelize duplicate-finding per resolution using
concurrency.ForEachJob
- Split
findDuplicates
into standalonefindDuplicatedBlocks
and updateFilter
- Store sources in both a map and a sorted list, optimize
isIncludedIn
, and add a test fornewBlockWithSuccessors
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
pkg/compactor/shard_aware_deduplicate_filter.go | Parallelize per-resolution work, rename/refactor duplicate logic, add sorted source list and optimized inclusion check |
pkg/compactor/shard_aware_deduplicate_filter_test.go | Rename benchmark, import assert , and add TestNewBlockWithSuccessors |
Comments suppressed due to low confidence (2)
pkg/compactor/shard_aware_deduplicate_filter.go:37
- If the same
ShardAwareDeduplicateFilter
instance is reused,f.duplicateIDs
may grow across multiple calls; resetf.duplicateIDs
(e.g., tonil
) at the start ofFilter
.
f.duplicateIDs = f.duplicateIDs[:0]
pkg/compactor/shard_aware_deduplicate_filter_test.go:463
- [nitpick] Consider adding tests for
blockWithSuccessors.isIncludedIn
to verify the optimized early-exit path when the candidate 'other' block has fewer sources.
func TestNewBlockWithSuccessors(t *testing.T) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good AFAICT.
05db42f
to
a41d828
Compare
Signed-off-by: Marco Pracucci <marco@pracucci.com>
a41d828
to
7b64f43
Compare
#### What this PR does This adds a bloom filter optimization on top of #11819 Benchmark compared to that branch: ``` $ benchstat -ignore .name shortcuts bloom-16-buckets goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/compactor cpu: Apple M3 Pro │ shortcuts │ bloom-16-buckets │ │ sec/op │ sec/op vs base │ */Block-10/#00-12 26.85µ ± 6% 28.56µ ± 2% +6.36% (p=0.015 n=6) */Block-10/#1-12 81.48µ ± 4% 84.76µ ± 0% +4.03% (p=0.002 n=6) */Block-100/#00-12 365.6µ ± 2% 317.9µ ± 0% -13.05% (p=0.002 n=6) */Block-100/#1-12 1091.5µ ± 1% 958.5µ ± 5% -12.19% (p=0.002 n=6) */Block-1000/#00-12 15.122m ± 8% 6.804m ± 2% -55.01% (p=0.002 n=6) */Block-1000/#1-12 46.47m ± 4% 20.80m ± 5% -55.23% (p=0.002 n=6) */Block-10000/#00-12 2925.2m ± 5% 638.9m ± 1% -78.16% (p=0.002 n=6) */Block-10000/#1-12 8.722 ± 2% 2.576 ± 2% -70.47% (p=0.002 n=6) geomean 7.931m 4.512m -43.11% ``` Benchmark compared to main: ``` $ benchstat -ignore .name main shortcuts bloom goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/compactor cpu: Apple M3 Pro │ main │ shortcuts │ bloom │ │ sec/op │ sec/op vs base │ sec/op vs base │ */Block-10/#00-12 17.30µ ± 2% 26.85µ ± 6% +55.22% (p=0.002 n=6) 30.99µ ± 3% +79.14% (p=0.002 n=6) */Block-10/#1-12 50.17µ ± 1% 81.48µ ± 4% +62.41% (p=0.002 n=6) 90.49µ ± 2% +80.36% (p=0.002 n=6) */Block-100/#00-12 378.8µ ± 0% 365.6µ ± 2% -3.47% (p=0.002 n=6) 360.7µ ± 2% -4.77% (p=0.002 n=6) */Block-100/#1-12 1.133m ± 1% 1.092m ± 1% -3.67% (p=0.002 n=6) 1.086m ± 1% -4.20% (p=0.002 n=6) */Block-1000/#00-12 28.244m ± 1% 15.122m ± 8% -46.46% (p=0.002 n=6) 7.974m ± 2% -71.77% (p=0.002 n=6) */Block-1000/#1-12 90.46m ± 13% 46.47m ± 4% -48.64% (p=0.002 n=6) 24.95m ± 9% -72.42% (p=0.002 n=6) */Block-10000/#00-12 7.936 ± 7% 2.925 ± 5% -63.14% (p=0.002 n=6) 1.393 ± 3% -82.44% (p=0.002 n=6) */Block-10000/#1-12 24.905 ± 9% 8.722 ± 2% -64.98% (p=0.002 n=6) 4.256 ± 3% -82.91% (p=0.002 n=6) geomean 10.82m 7.931m -26.71% 5.808m -46.33% ``` <details> <summary>Benchmark results with different buckets counts</summary> I first tried with 256 buckets, and that was slower than 16: ``` $ benchstat -ignore .name shortcuts bloom-16-buckets bloom-256-buckets goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/compactor cpu: Apple M3 Pro │ shortcuts │ bloom-16-buckets │ bloom-256-buckets │ │ sec/op │ sec/op vs base │ sec/op vs base │ */Block-10/#00-12 26.85µ ± 6% 28.56µ ± 2% +6.36% (p=0.015 n=6) 30.99µ ± 3% +15.41% (p=0.002 n=6) */Block-10/#1-12 81.48µ ± 4% 84.76µ ± 0% +4.03% (p=0.002 n=6) 90.49µ ± 2% +11.05% (p=0.002 n=6) */Block-100/#00-12 365.6µ ± 2% 317.9µ ± 0% -13.05% (p=0.002 n=6) 360.7µ ± 2% -1.35% (p=0.015 n=6) */Block-100/#1-12 1091.5µ ± 1% 958.5µ ± 5% -12.19% (p=0.002 n=6) 1085.5µ ± 1% ~ (p=0.180 n=6) */Block-1000/#00-12 15.122m ± 8% 6.804m ± 2% -55.01% (p=0.002 n=6) 7.974m ± 2% -47.27% (p=0.002 n=6) */Block-1000/#1-12 46.47m ± 4% 20.80m ± 5% -55.23% (p=0.002 n=6) 24.95m ± 9% -46.31% (p=0.002 n=6) */Block-10000/#00-12 2925.2m ± 5% 638.9m ± 1% -78.16% (p=0.002 n=6) 1393.5m ± 3% -52.36% (p=0.002 n=6) */Block-10000/#1-12 8.722 ± 2% 2.576 ± 2% -70.47% (p=0.002 n=6) 4.256 ± 3% -51.21% (p=0.002 n=6) geomean 7.931m 4.512m -43.11% 5.808m -26.76% ``` I also tried with a single uint, which gave me suprisingly similar number: ``` $ benchstat -ignore .name shortcuts bloom-16-buckets bloom-single goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/compactor cpu: Apple M3 Pro │ shortcuts │ bloom-16-buckets │ bloom-single │ │ sec/op │ sec/op vs base │ sec/op vs base │ */Block-10/#00-12 26.85µ ± 6% 28.56µ ± 2% +6.36% (p=0.015 n=6) 27.50µ ± 1% ~ (p=0.180 n=6) */Block-10/#1-12 81.48µ ± 4% 84.76µ ± 0% +4.03% (p=0.002 n=6) 82.25µ ± 4% ~ (p=0.394 n=6) */Block-100/#00-12 365.6µ ± 2% 317.9µ ± 0% -13.05% (p=0.002 n=6) 313.4µ ± 4% -14.28% (p=0.002 n=6) */Block-100/#1-12 1091.5µ ± 1% 958.5µ ± 5% -12.19% (p=0.002 n=6) 947.3µ ± 6% -13.22% (p=0.002 n=6) */Block-1000/#00-12 15.122m ± 8% 6.804m ± 2% -55.01% (p=0.002 n=6) 6.712m ± 2% -55.61% (p=0.002 n=6) */Block-1000/#1-12 46.47m ± 4% 20.80m ± 5% -55.23% (p=0.002 n=6) 20.34m ± 2% -56.23% (p=0.002 n=6) */Block-10000/#00-12 2925.2m ± 5% 638.9m ± 1% -78.16% (p=0.002 n=6) 621.9m ± 0% -78.74% (p=0.002 n=6) */Block-10000/#1-12 8.722 ± 2% 2.576 ± 2% -70.47% (p=0.002 n=6) 2.504 ± 2% -71.29% (p=0.002 n=6) geomean 7.931m 4.512m -43.11% 4.409m -44.41% ``` It seems that using 4 or 8 buckets doesn't speed up the result, but 32 slows it down, so I've chosen 16 buckets as the magic number for our fake microbenchmark 🤷 ``` $ benchstat -ignore .name bloom-16-buckets bloom-4-buckets bloom-8-buckets bloom-32-buckets goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/compactor cpu: Apple M3 Pro │ bloom-16-buckets │ bloom-4-buckets │ bloom-8-buckets │ bloom-32-buckets │ │ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │ */Block-10/#00-12 28.56µ ± 2% 28.38µ ± 7% ~ (p=0.699 n=6) 28.54µ ± 1% ~ (p=0.937 n=6) 28.47µ ± 2% ~ (p=0.240 n=6) */Block-10/#1-12 84.76µ ± 0% 84.02µ ± 4% ~ (p=0.132 n=6) 85.43µ ± 3% +0.78% (p=0.041 n=6) 84.68µ ± 1% ~ (p=0.937 n=6) */Block-100/#00-12 317.9µ ± 0% 321.5µ ± 3% ~ (p=0.589 n=6) 319.2µ ± 4% ~ (p=0.132 n=6) 319.1µ ± 4% +0.37% (p=0.041 n=6) */Block-100/#1-12 958.5µ ± 5% 953.7µ ± 7% ~ (p=0.394 n=6) 955.2µ ± 2% ~ (p=0.310 n=6) 959.1µ ± 1% ~ (p=0.818 n=6) */Block-1000/#00-12 6.804m ± 2% 6.721m ± 1% -1.22% (p=0.002 n=6) 6.762m ± 1% -0.61% (p=0.026 n=6) 7.022m ± 4% +3.21% (p=0.002 n=6) */Block-1000/#1-12 20.80m ± 5% 20.54m ± 0% -1.25% (p=0.002 n=6) 20.70m ± 1% ~ (p=0.093 n=6) 21.72m ± 2% +4.43% (p=0.041 n=6) */Block-10000/#00-12 638.9m ± 1% 633.8m ± 2% ~ (p=0.093 n=6) 631.3m ± 1% -1.18% (p=0.009 n=6) 703.4m ± 44% +10.09% (p=0.002 n=6) */Block-10000/#1-12 2.576 ± 2% 2.543 ± 2% ~ (p=0.180 n=6) 2.562 ± 2% ~ (p=0.065 n=6) 2.832 ± 23% +9.95% (p=0.002 n=6) geomean 4.512m 4.481m -0.68% 4.501m -0.26% 4.665m +3.38% ``` </details> #### Which issue(s) this PR fixes or relates to Ref: #11819 --------- Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Marco Pracucci <marco@pracucci.com>
#### What this PR does This adds a bloom filter optimization on top of #11819 Benchmark compared to that branch: ``` $ benchstat -ignore .name shortcuts bloom-16-buckets goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/compactor cpu: Apple M3 Pro │ shortcuts │ bloom-16-buckets │ │ sec/op │ sec/op vs base │ */Block-10/#00-12 26.85µ ± 6% 28.56µ ± 2% +6.36% (p=0.015 n=6) */Block-10/#1-12 81.48µ ± 4% 84.76µ ± 0% +4.03% (p=0.002 n=6) */Block-100/#00-12 365.6µ ± 2% 317.9µ ± 0% -13.05% (p=0.002 n=6) */Block-100/#1-12 1091.5µ ± 1% 958.5µ ± 5% -12.19% (p=0.002 n=6) */Block-1000/#00-12 15.122m ± 8% 6.804m ± 2% -55.01% (p=0.002 n=6) */Block-1000/#1-12 46.47m ± 4% 20.80m ± 5% -55.23% (p=0.002 n=6) */Block-10000/#00-12 2925.2m ± 5% 638.9m ± 1% -78.16% (p=0.002 n=6) */Block-10000/#1-12 8.722 ± 2% 2.576 ± 2% -70.47% (p=0.002 n=6) geomean 7.931m 4.512m -43.11% ``` Benchmark compared to main: ``` $ benchstat -ignore .name main shortcuts bloom goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/compactor cpu: Apple M3 Pro │ main │ shortcuts │ bloom │ │ sec/op │ sec/op vs base │ sec/op vs base │ */Block-10/#00-12 17.30µ ± 2% 26.85µ ± 6% +55.22% (p=0.002 n=6) 30.99µ ± 3% +79.14% (p=0.002 n=6) */Block-10/#1-12 50.17µ ± 1% 81.48µ ± 4% +62.41% (p=0.002 n=6) 90.49µ ± 2% +80.36% (p=0.002 n=6) */Block-100/#00-12 378.8µ ± 0% 365.6µ ± 2% -3.47% (p=0.002 n=6) 360.7µ ± 2% -4.77% (p=0.002 n=6) */Block-100/#1-12 1.133m ± 1% 1.092m ± 1% -3.67% (p=0.002 n=6) 1.086m ± 1% -4.20% (p=0.002 n=6) */Block-1000/#00-12 28.244m ± 1% 15.122m ± 8% -46.46% (p=0.002 n=6) 7.974m ± 2% -71.77% (p=0.002 n=6) */Block-1000/#1-12 90.46m ± 13% 46.47m ± 4% -48.64% (p=0.002 n=6) 24.95m ± 9% -72.42% (p=0.002 n=6) */Block-10000/#00-12 7.936 ± 7% 2.925 ± 5% -63.14% (p=0.002 n=6) 1.393 ± 3% -82.44% (p=0.002 n=6) */Block-10000/#1-12 24.905 ± 9% 8.722 ± 2% -64.98% (p=0.002 n=6) 4.256 ± 3% -82.91% (p=0.002 n=6) geomean 10.82m 7.931m -26.71% 5.808m -46.33% ``` <details> <summary>Benchmark results with different buckets counts</summary> I first tried with 256 buckets, and that was slower than 16: ``` $ benchstat -ignore .name shortcuts bloom-16-buckets bloom-256-buckets goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/compactor cpu: Apple M3 Pro │ shortcuts │ bloom-16-buckets │ bloom-256-buckets │ │ sec/op │ sec/op vs base │ sec/op vs base │ */Block-10/#00-12 26.85µ ± 6% 28.56µ ± 2% +6.36% (p=0.015 n=6) 30.99µ ± 3% +15.41% (p=0.002 n=6) */Block-10/#1-12 81.48µ ± 4% 84.76µ ± 0% +4.03% (p=0.002 n=6) 90.49µ ± 2% +11.05% (p=0.002 n=6) */Block-100/#00-12 365.6µ ± 2% 317.9µ ± 0% -13.05% (p=0.002 n=6) 360.7µ ± 2% -1.35% (p=0.015 n=6) */Block-100/#1-12 1091.5µ ± 1% 958.5µ ± 5% -12.19% (p=0.002 n=6) 1085.5µ ± 1% ~ (p=0.180 n=6) */Block-1000/#00-12 15.122m ± 8% 6.804m ± 2% -55.01% (p=0.002 n=6) 7.974m ± 2% -47.27% (p=0.002 n=6) */Block-1000/#1-12 46.47m ± 4% 20.80m ± 5% -55.23% (p=0.002 n=6) 24.95m ± 9% -46.31% (p=0.002 n=6) */Block-10000/#00-12 2925.2m ± 5% 638.9m ± 1% -78.16% (p=0.002 n=6) 1393.5m ± 3% -52.36% (p=0.002 n=6) */Block-10000/#1-12 8.722 ± 2% 2.576 ± 2% -70.47% (p=0.002 n=6) 4.256 ± 3% -51.21% (p=0.002 n=6) geomean 7.931m 4.512m -43.11% 5.808m -26.76% ``` I also tried with a single uint, which gave me suprisingly similar number: ``` $ benchstat -ignore .name shortcuts bloom-16-buckets bloom-single goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/compactor cpu: Apple M3 Pro │ shortcuts │ bloom-16-buckets │ bloom-single │ │ sec/op │ sec/op vs base │ sec/op vs base │ */Block-10/#00-12 26.85µ ± 6% 28.56µ ± 2% +6.36% (p=0.015 n=6) 27.50µ ± 1% ~ (p=0.180 n=6) */Block-10/#1-12 81.48µ ± 4% 84.76µ ± 0% +4.03% (p=0.002 n=6) 82.25µ ± 4% ~ (p=0.394 n=6) */Block-100/#00-12 365.6µ ± 2% 317.9µ ± 0% -13.05% (p=0.002 n=6) 313.4µ ± 4% -14.28% (p=0.002 n=6) */Block-100/#1-12 1091.5µ ± 1% 958.5µ ± 5% -12.19% (p=0.002 n=6) 947.3µ ± 6% -13.22% (p=0.002 n=6) */Block-1000/#00-12 15.122m ± 8% 6.804m ± 2% -55.01% (p=0.002 n=6) 6.712m ± 2% -55.61% (p=0.002 n=6) */Block-1000/#1-12 46.47m ± 4% 20.80m ± 5% -55.23% (p=0.002 n=6) 20.34m ± 2% -56.23% (p=0.002 n=6) */Block-10000/#00-12 2925.2m ± 5% 638.9m ± 1% -78.16% (p=0.002 n=6) 621.9m ± 0% -78.74% (p=0.002 n=6) */Block-10000/#1-12 8.722 ± 2% 2.576 ± 2% -70.47% (p=0.002 n=6) 2.504 ± 2% -71.29% (p=0.002 n=6) geomean 7.931m 4.512m -43.11% 4.409m -44.41% ``` It seems that using 4 or 8 buckets doesn't speed up the result, but 32 slows it down, so I've chosen 16 buckets as the magic number for our fake microbenchmark 🤷 ``` $ benchstat -ignore .name bloom-16-buckets bloom-4-buckets bloom-8-buckets bloom-32-buckets goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/compactor cpu: Apple M3 Pro │ bloom-16-buckets │ bloom-4-buckets │ bloom-8-buckets │ bloom-32-buckets │ │ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │ */Block-10/#00-12 28.56µ ± 2% 28.38µ ± 7% ~ (p=0.699 n=6) 28.54µ ± 1% ~ (p=0.937 n=6) 28.47µ ± 2% ~ (p=0.240 n=6) */Block-10/#1-12 84.76µ ± 0% 84.02µ ± 4% ~ (p=0.132 n=6) 85.43µ ± 3% +0.78% (p=0.041 n=6) 84.68µ ± 1% ~ (p=0.937 n=6) */Block-100/#00-12 317.9µ ± 0% 321.5µ ± 3% ~ (p=0.589 n=6) 319.2µ ± 4% ~ (p=0.132 n=6) 319.1µ ± 4% +0.37% (p=0.041 n=6) */Block-100/#1-12 958.5µ ± 5% 953.7µ ± 7% ~ (p=0.394 n=6) 955.2µ ± 2% ~ (p=0.310 n=6) 959.1µ ± 1% ~ (p=0.818 n=6) */Block-1000/#00-12 6.804m ± 2% 6.721m ± 1% -1.22% (p=0.002 n=6) 6.762m ± 1% -0.61% (p=0.026 n=6) 7.022m ± 4% +3.21% (p=0.002 n=6) */Block-1000/#1-12 20.80m ± 5% 20.54m ± 0% -1.25% (p=0.002 n=6) 20.70m ± 1% ~ (p=0.093 n=6) 21.72m ± 2% +4.43% (p=0.041 n=6) */Block-10000/#00-12 638.9m ± 1% 633.8m ± 2% ~ (p=0.093 n=6) 631.3m ± 1% -1.18% (p=0.009 n=6) 703.4m ± 44% +10.09% (p=0.002 n=6) */Block-10000/#1-12 2.576 ± 2% 2.543 ± 2% ~ (p=0.180 n=6) 2.562 ± 2% ~ (p=0.065 n=6) 2.832 ± 23% +9.95% (p=0.002 n=6) geomean 4.512m 4.481m -0.68% 4.501m -0.26% 4.665m +3.38% ``` </details> #### Which issue(s) this PR fixes or relates to Ref: #11819 --------- Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
…oom filter Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Taylor C <41653732+tacole02@users.noreply.github.com>
What this PR does
This PR has been be authored by @colega and myself.
We had an issue where a tenant compaction was lagging behind, and their number blocks reached the monster number of 250K (from a baseline of 10K). When that happened, we found out that the compactor planning was extremely slow (3 hours), and that the slow down came from
ShardAwareDeduplicateFilter
.In this PR we introduce some small optimizations to
ShardAwareDeduplicateFilter
for tenants with a very large number of blocks. The impact on performance is not impressive, but we got a -33% reduction (from 3h30m to 2h15m) for the real world scenario.Existing benchmark result:
Which issue(s) this PR fixes or relates to
N/A
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
. If changelog entry is not needed, please add thechangelog-not-needed
label to the PR.about-versioning.md
updated with experimental features.