
[SPARK-56548][CORE] Replace modulo with bitmask in BloomFilter hot paths #55423

Closed
LuciferYang wants to merge 8 commits into apache:master from LuciferYang:sketch-bloomfilter-bitmask-opt

Conversation


@LuciferYang (Contributor) commented Apr 20, 2026

What changes were proposed in this pull request?

Replace combinedHash % bitSize with combinedHash & (bitSize - 1) in the BloomFilter put/mightContain hot path. The bitmask trick requires bitSize to be a power of two, so BitArray's public constructor now rounds the word count up to the next power of two.

Concretely:

  1. BitArray constructor — rounds numWords up to the next power of two via roundUpToPowerOfTwo. A precomputed bitMask field (bitSize - 1 for power-of-two, -1L otherwise) lets indexFor(long hash) pick the fast path (hash & bitMask) or fall back to modulo.

  2. BloomFilterImpl / BloomFilterImplV2 — call bits.indexFor(combinedHash) instead of inline % bitSize. The existing combinedHash < 0 ? ~combinedHash : combinedHash sign-normalization stays at the call site so that old readers still compute the same bit indices.

  3. Legacy deserialization — BitArray(long[]) (used by readFrom) does not re-round the word count, so deserialized filters keep their original size and fall back to modulo via bitMask == -1L.

Overflow guard: word counts above 2^30 are left unrounded so that doubling cannot overflow int.
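
To make the mechanics concrete, here is a minimal Java sketch of the scheme described above. The names (roundUpToPowerOfTwo, bitMask, indexFor) follow this PR description, but the code is illustrative only and differs from the actual BitArray in org.apache.spark.util.sketch:

```java
// Minimal sketch of the rounding + bitmask scheme, not the real Spark source.
final class BitArraySketch {
  private final long[] data;
  private final long bitSize;
  private final long bitMask;   // bitSize - 1 when bitSize is a power of two, else -1L

  // Construction path for new filters: round the word count up so the
  // fast bitmask path applies.
  BitArraySketch(long numBits) {
    int numWords = (int) Math.ceil(numBits / 64.0);
    if (numWords <= (1 << 30)) {          // overflow guard: doubling stays within int
      numWords = roundUpToPowerOfTwo(numWords);
    }
    this.data = new long[numWords];
    this.bitSize = (long) numWords * 64;
    this.bitMask = Long.bitCount(bitSize) == 1 ? bitSize - 1 : -1L;
  }

  // Legacy deserialization path (readFrom): keep the serialized word count as-is;
  // non-power-of-two sizes fall back to modulo via bitMask == -1L.
  BitArraySketch(long[] words) {
    this.data = words;
    this.bitSize = (long) words.length * 64;
    this.bitMask = Long.bitCount(bitSize) == 1 ? bitSize - 1 : -1L;
  }

  static int roundUpToPowerOfTwo(int n) {
    return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
  }

  // Callers pass an already sign-normalized (non-negative) combined hash.
  long indexFor(long hash) {
    return bitMask != -1L ? (hash & bitMask) : (hash % bitSize);
  }
}
```

A single indexFor covers both paths, so freshly built filters take the AND while legacy-deserialized, non-power-of-two filters keep their original modulo behavior.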

Why are the changes needed?

On x86-64 a 64-bit integer division takes ~20-35 cycles; a bitwise AND takes one. With the default FPP the filter uses 7 hash functions, so every put or mightContain call pays for 7 modulos. That cost adds up on the probe side of runtime bloom filter joins, where mightContain runs once per row.

SparkBloomFilterBenchmark on GHA (AMD EPYC 7763, Linux 6.17, JDK 17, ns/row):

| Workload | V1 before → after | V1 Δ | V2 before → after | V2 Δ |
|---|---|---|---|---|
| Put — 10K | 46.6 → 31.3 | −33% | 52.9 → 34.8 | −34% |
| Put — 100K | 49.1 → 37.7 | −23% | 57.0 → 40.1 | −30% |
| Put — 1M | 54.7 → 46.2 | −16% | 63.1 → 48.6 | −23% |
| MightContain — 10K, 50% hit | 28.1 → 19.6 | −30% | 29.8 → 22.6 | −24% |
| MightContain — 100K, 50% hit | 30.8 → 23.5 | −24% | 34.3 → 26.8 | −22% |
| MightContain — 1M, 50% hit | 35.4 → 30.3 | −14% | 39.1 → 33.8 | −14% |

JDK 21 and JDK 25 results (included in the PR) show the same pattern. Gains are larger on smaller filters where modulo dominates per-item cost, and taper off at 1M items where cache misses take over — still 13-23% there.

Does this PR introduce any user-facing change?

Yes — BloomFilter.bitSize() (public abstract in o.a.s.util.sketch.BloomFilter) may now return a value larger than the numBits passed to BloomFilter.create(expectedNumItems, numBits), because the underlying word count is rounded up to the next power of two.

Example: BloomFilter.create(1000, 320) used to give bitSize() == 320 (5 words × 64), now gives bitSize() == 512 (8 words × 64, since 5 rounds up to 8).
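
For illustration, a caller can observe the change directly through the public API; the 512 result assumes the rounding behavior described above:

```java
import org.apache.spark.util.sketch.BloomFilter;

public class BitSizeExample {
  public static void main(String[] args) {
    // expectedNumItems = 1000, numBits = 320 (5 words of 64 bits)
    BloomFilter filter = BloomFilter.create(1000, 320);
    // Before this PR: 320. After: 512, because 5 words round up to 8.
    System.out.println(filter.bitSize());
  }
}
```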

What this means in practice:

  • FPP — slightly better (lower) than requested, because there are more bits for the same number of insertions. Correctness is preserved.
  • numHashFunctions — still computed from the caller-supplied numBits, not the actual bitSize. Slightly sub-optimal for the rounded-up array, but the filter stays correct and the difference is marginal.
  • Cross-version mergeInPlace / intersectInPlace — isCompatible() requires bitSize() equality, so a filter built by the new code (512 bits) cannot merge with one built by the old code (320 bits) using the same parameters. Filters built by the same Spark version are always compatible. This only matters if filters are exchanged across Spark versions at runtime, which is not a typical use case.
  • Serialization — format is unchanged. writeTo stores the actual (rounded) word count; readFrom restores it verbatim without re-rounding. Old Spark can read new filters and vice versa, since for non-negative hashes on a power-of-two size, hash & (bitSize - 1) and hash % bitSize produce the same result (see the quick check after this list).
  • Memory — worst case ~2× for small filters (e.g., 3 words → 4). Negligible at typical filter sizes.
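
The serialization bullet leans on a simple identity. The following throwaway check (not part of the PR) confirms that, for non-negative hashes and a power-of-two size, the mask and the modulo pick the same bit:

```java
public class MaskVsModulo {
  public static void main(String[] args) {
    long bitSize = 512;           // any power of two works
    long mask = bitSize - 1;
    for (long hash = 0; hash < 1_000_000; hash++) {
      // Holds for every non-negative hash, which is why filters written by the
      // new bitmask code can be read by the old modulo-based code and vice versa.
      if ((hash & mask) != (hash % bitSize)) {
        throw new AssertionError("mismatch at hash = " + hash);
      }
    }
    System.out.println("mask and modulo agree on all tested hashes");
  }
}
```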

How was this patch tested?

  • Five new cases in BitArraySuite: roundUpToPowerOfTwo edge cases (including overflow guard at 2^30 + 1 and Integer.MAX_VALUE), fast-path vs. fallback indexFor, and a serialize-deserialize round-trip for a legacy non-power-of-2 array.
  • Updated bitSize assertions in DataFrameStatSuite, JavaDataFrameSuite, and ClientDataFrameStatSuite to expect the rounded-up value.
  • SparkBloomFilterBenchmark re-run on GHA for JDK 17 / 21 / 25; updated result files included.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

@LuciferYang marked this pull request as draft April 20, 2026 07:35
BitArray now rounds numWords up to the next power of 2 for bitmask
optimization, so requesting 64*5=320 bits allocates 8 words (512 bits).
Update test expectations accordingly.

```diff
 BloomFilter filter3 = df.stat().bloomFilter("id", 1000, 64 * 5);
-Assertions.assertEquals(64 * 5, filter3.bitSize());
+Assertions.assertEquals(64 * 8, filter3.bitSize());
```
LuciferYang (Contributor Author) commented:

This is a breaking change to a public API, so I need to reconsider the feasibility of this PR.
