[SPARK-56548][CORE] Replace modulo with bitmask in BloomFilter hot paths (#55423)
Closed
LuciferYang wants to merge 8 commits into apache:master
Conversation
BitArray now rounds numWords up to the next power of 2 for bitmask optimization, so requesting 64*5=320 bits allocates 8 words (512 bits). Update test expectations accordingly.
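A minimal sketch of this rounding arithmetic, assuming a `roundUpToPowerOfTwo` helper shaped like the one named in the PR (the exact signature is a guess, not the actual Spark code):

```java
public class RoundingSketch {
    // Next power of two >= n, for 1 <= n <= 2^30 (the PR leaves larger
    // word counts unrounded so that doubling cannot overflow int).
    static int roundUpToPowerOfTwo(int n) {
        return 1 << (32 - Integer.numberOfLeadingZeros(n - 1));
    }

    // Requested 64 * 5 = 320 bits -> 5 words -> rounded to 8 words = 512 bits.
    static long allocatedBits(long numBits) {
        int numWords = (int) ((numBits + 63) / 64);   // ceil(numBits / 64)
        return (long) roundUpToPowerOfTwo(numWords) * 64L;
    }

    public static void main(String[] args) {
        System.out.println(allocatedBits(64 * 5));    // 512
    }
}
```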
LuciferYang commented on Apr 20, 2026:
```diff
 BloomFilter filter3 = df.stat().bloomFilter("id", 1000, 64 * 5);
-Assertions.assertEquals(64 * 5, filter3.bitSize());
+Assertions.assertEquals(64 * 8, filter3.bitSize());
```
This is a breaking change to a public API, so I need to reconsider the feasibility of this PR.
### What changes were proposed in this pull request?
Replace `combinedHash % bitSize` with `combinedHash & (bitSize - 1)` in the `BloomFilter` put/mightContain hot path. The bitmask trick requires `bitSize` to be a power of two, so `BitArray`'s public constructor now rounds the word count up to the next power of two. Concretely:

- `BitArray` constructor: rounds `numWords` up to the next power of two via `roundUpToPowerOfTwo`. A precomputed `bitMask` field (`bitSize - 1` for a power-of-two size, `-1L` otherwise) lets `indexFor(long hash)` pick the fast path (`hash & bitMask`) or fall back to modulo.
- `BloomFilterImpl` / `BloomFilterImplV2`: call `bits.indexFor(combinedHash)` instead of the inline `% bitSize`. The existing `combinedHash < 0 ? ~combinedHash : combinedHash` sign normalization stays at the call site so that old readers still compute the same bit indices.
- Legacy deserialization: `BitArray(long[])` (used by `readFrom`) does not re-round the word count, so deserialized filters keep their original size and fall back to modulo via `bitMask == -1L`.
- Overflow guard: word counts above `2^30` are left unrounded so that doubling cannot overflow `int`.

### Why are the changes needed?
On x86-64, a 64-bit integer division takes roughly 20-35 cycles; a bitwise AND takes one. With the default FPP the filter uses 7 hash functions, so every `put` or `mightContain` call pays for 7 modulos. That cost adds up on the probe side of runtime bloom filter joins, where `mightContain` runs once per row.

`SparkBloomFilterBenchmark` results on GHA (AMD EPYC 7763, Linux 6.17, JDK 17, ns/row) are included in the PR; the JDK 21 and JDK 25 runs show the same pattern. Gains are larger on smaller filters, where modulo dominates per-item cost, and taper off at 1M items, where cache misses take over, but are still 13-23% there.
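The hot-path change above can be illustrated with a self-contained sketch; the class and method names here are hypothetical stand-ins for the real `BitArray.indexFor` and the `BloomFilterImpl` probe loop:

```java
public class ProbeSketch {
    // Bit-index selection as described in the PR: fast path when bitSize is
    // a power of two, modulo fallback otherwise (e.g. a legacy deserialized
    // array whose word count was never rounded).
    static long indexFor(long hash, long bitSize) {
        long bitMask = Long.bitCount(bitSize) == 1 ? bitSize - 1 : -1L;
        return bitMask != -1L ? (hash & bitMask) : (hash % bitSize);
    }

    // Double-hashing probe: k combined hashes per put/mightContain call.
    // The old code paid one 64-bit division per hash function; the new code
    // pays one AND. The sign normalization matches the call site quoted above.
    static long[] bitIndices(long h1, long h2, int numHashFunctions, long bitSize) {
        long[] indices = new long[numHashFunctions];
        for (int i = 1; i <= numHashFunctions; i++) {
            long combinedHash = h1 + i * h2;
            if (combinedHash < 0) combinedHash = ~combinedHash;
            indices[i - 1] = indexFor(combinedHash, bitSize);
        }
        return indices;
    }

    public static void main(String[] args) {
        long[] idx = bitIndices(123456789L, 987654321L, 7, 512L);
        System.out.println(java.util.Arrays.toString(idx));
    }
}
```

Note that for non-negative hashes and a power-of-two `bitSize`, the masked and modulo results are identical bit indices, which is what keeps old and new serialized filters mutually readable.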
### Does this PR introduce any user-facing change?
Yes. `BloomFilter.bitSize()` (public abstract in `o.a.s.util.sketch.BloomFilter`) may now return a value larger than the `numBits` passed to `BloomFilter.create(expectedNumItems, numBits)`, because the underlying word count is rounded up to the next power of two.

Example: `BloomFilter.create(1000, 320)` used to give `bitSize() == 320` (5 words × 64); it now gives `bitSize() == 512` (8 words × 64, since 5 rounds up to 8).

What this means in practice:

- `numHashFunctions`: still computed from the caller-supplied `numBits`, not the actual `bitSize`. Slightly sub-optimal for the rounded-up array, but the filter stays correct and the difference is marginal.
- `mergeInPlace` / `intersectInPlace`: `isCompatible()` requires `bitSize()` equality, so a filter built by the new code (512 bits) cannot merge with one built by the old code (320 bits) using the same parameters. Filters built by the same Spark version are always compatible; this only matters if filters are exchanged across Spark versions at runtime, which is not a typical use case.
- `writeTo` stores the actual (rounded) word count; `readFrom` restores it verbatim without re-rounding. Old Spark can read new filters and vice versa, since for non-negative hashes on a power-of-two size, `hash & (bitSize - 1)` and `hash % bitSize` produce the same result.

### How was this patch tested?
- `BitArraySuite`: `roundUpToPowerOfTwo` edge cases (including the overflow guard at `2^30 + 1` and `Integer.MAX_VALUE`), fast-path vs. fallback `indexFor`, and a serialize-deserialize round trip for a legacy non-power-of-two array.
- Updated `bitSize` assertions in `DataFrameStatSuite`, `JavaDataFrameSuite`, and `ClientDataFrameStatSuite` to expect the rounded-up value.
- `SparkBloomFilterBenchmark` re-run on GHA for JDK 17 / 21 / 25; updated result files included.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7