[SPARK-55959][SQL][FOLLOWUP] Gate map-lookup hash optimization on foldable input by cloud-fan · Pull Request #55498 · apache/spark

cloud-fan · 2026-04-23T02:39:48Z

What changes were proposed in this pull request?

Follow-up to #54748 (SPARK-55959). That PR added a hash-based lookup path to
GetMapValueUtil gated by spark.sql.optimizer.mapLookupHashThreshold, but the
hash table was rebuilt per row for non-foldable map columns because the
reference-identity cache ($keys != $lastKeyArray / lastMap ne map) only
hits when the MapData instance is reused across rows. UnsafeRow.getMap()
allocates a fresh UnsafeMapData on every call, so the cache never hit in the
common production case. The benchmark used typedLit(map) (a Literal whose
eval returns the same Scala object every row), which masked the miss.

This PR:

Gates the hash path on child.foldable. Non-foldable maps always use the
linear scan - their hash index cannot be reused across rows, so the per-row
build cost would exceed the scan it replaces.
Restructures the two execution paths as a strategy ADT on the trait:
a MapLookupExecutor sealed inner trait with LinearExecutor (stateless
path-dependent object) and PrebuiltHashExecutor (holds the pre-built
lookup state for a constant map). getValueEval and doGetValueGenCode
become one-line delegators; the choice is made once in chooseExecutor.
This replaces the null-sentinel foldableMapIndex field and the
if (index != null) branches that were duplicated between eval and codegen.
Simplifies codegen: no runtime threshold branch, no mutable state, no
reference-identity check. For the foldable path, the int[] buckets open-
addressing table, the keyArray / valueArray, and the generic
HashMap[Any, Int] (for interpreted) are built once on the driver and
embedded in the generated class via addReferenceObj. The generated per-row
codegen still uses SPARK-55959's inline primitive hash over the bucket array
- no autoboxing in the hot loop.
Unifies the type predicate on TypeUtils.typeWithProperEquals; removes
the codegen-only supportsHashLookup whitelist. The two predicates agreed
on all reachable map-key types in practice (Variant is prohibited by
checkForMapKeyType; Geography/Geometry with inherited Object equals
already have this constraint on SPARK-55959's interpretive path). Unifying
on one predicate removes a drift hazard.
Adds usesFoldableHashLookup (visible for testing) on GetMapValueUtil
so tests can assert strategy choice directly, plus updates the config doc
and minor grammar fixes.

Why are the changes needed?

For realistic queries like SELECT my_map_col['k'] FROM t with maps above
the default threshold (1000 entries), the hash path was rebuilding the table
per row. Per-row hash build (O(n) hash computes + array writes) has higher
constant factors than the O(n) linear scan it replaces for a single lookup
per row, so the original optimization could regress this shape. Gating on
foldability ensures the index is only built when it can actually be reused
across rows - which is the regime where it wins.

Local benchmark evidence confirming both that (a) the fix eliminates the
non-foldable regression and (b) foldable-map performance is unchanged vs
SPARK-55959 is posted below in the PR conversation.

Does this PR introduce any user-facing change?

No. Behavior for foldable map lookups (constants / literals) is unchanged.
Non-foldable map lookups revert to the pre-SPARK-55959 linear-scan path,
which fixes the latent regression.

How was this patch tested?

Existing parameterized tests in ComplexTypeSuite and
CollectionExpressionsSuite continue to pass with both threshold=0
(hash path taken for foldable maps) and threshold=Int.MaxValue (linear
path taken).
New test GetMapValue - non-foldable map always uses linear scan:
behavior + usesFoldableHashLookup assertion - a BoundReference-backed
map column (non-foldable) evaluates correctly for both GetMapValue and
ElementAt, and is confirmed to land on LinearExecutor even with
threshold=0.
New test GetMapValue - strategy choice for foldable maps: asserts that a
foldable map above threshold lands on PrebuiltHashExecutor, below
threshold falls back to LinearExecutor, and that unsupported key types
(e.g. BinaryType) always land on LinearExecutor regardless of
threshold.
Benchmark code adds a runMapCol variant so the non-foldable shape is
measured alongside the existing foldable-literal cases (local results
posted below; *-results.txt should be regenerated on CI for consistent
hardware).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

LuciferYang · 2026-04-23T03:01:33Z

Thank you for the fix. @cloud-fan

LuciferYang · 2026-04-23T03:07:40Z

Better refresh the microbenchmark results before merging.

LuciferYang · 2026-04-23T03:30:08Z

-        map.valueArray().get(idx, dataType)
+  private def chooseExecutor(): MapLookupExecutor = left.dataType match {
+    case MapType(keyType, _, _)
+        if left.foldable && TypeUtils.typeWithProperEquals(keyType) =>


nit: this isn't strictly equivalent to the old supportsHashLookup whitelist — typeWithProperEquals seems accepts VariantType / GeometryType / GeographyType (all AtomicType), which the old whitelist excluded. Either:

Mention the (intentional) widening in the PR body so reviewers know it's not a mechanical refactor, or

Tighten the predicate to keep behavior bit-for-bit identical, e.g.:

if left.foldable && TypeUtils.typeWithProperEquals(keyType) && !keyType.isInstanceOf[VariantType] && !keyType.isInstanceOf[GeometryType] && !keyType.isInstanceOf[GeographyType] =>

Good catch, thanks. Keeping the predicate on typeWithProperEquals (rather than tightening) so the interpreted and codegen paths stay unified on one predicate - which is what the refactor is aiming for. Added a block comment on chooseExecutor in the latest push explaining why: VariantType is structurally unreachable (checkForMapKeyType rejects it), GeographyType/GeometryType have identity-inherited equals but that's already the state of SPARK-55959's interpreted hash path (same predicate), and non-binary-equality StringType / BinaryType still fall through to linear. Tightening would split the two paths' predicates again, which is the drift hazard we're resolving.

LuciferYang · 2026-04-23T03:35:10Z

-      ""
-    }
+  /** Linear scan over the map's key array. Stateless; no pre-computation. */
+  protected object LinearExecutor extends MapLookupExecutor {


Tiny nit on the PR body: "LinearExecutor (stateless singleton)" — strictly speaking, a protected object inside a trait is per-outer-instance (path-dependent), not a JVM singleton.

Fair, thanks - fixed in the updated PR description ("stateless path-dependent object" instead of "stateless singleton"). The in-code doc already says "Stateless; no pre-computation" so no change there.

LuciferYang · 2026-04-23T03:48:06Z

+        // HashMap.get(Object) autoboxes primitive keys to their wrapper type (Integer, Long,
+        // ...) whose equals/hashCode match the values stored during index build.
+        s"""
+           |Integer $idxBoxed = (Integer) $indexRef.get($eval2);


JIT escape-analysis can sometimes elide the boxing if the boxed key doesn't escape HashMap.get. In practice this works for monomorphic, hot call sites but is unreliable for megamorphic ones (and HashMap.get is megamorphic by design).

Good point - this comment was about the earlier HashMap.get(Object) + autobox codegen. I force-pushed a rewrite (v2) after your review that replaced that codegen body with the same int[] buckets / inline primitive hash shape SPARK-55959 used (just built once on the driver instead of per row), so there is no autoboxing in the hot loop anymore. Measured in a local comparison (posted below): foldable-map hash-lookup perf matches SPARK-55959 within noise. The previous line 630 (the autobox comment) has been removed.

LuciferYang · 2026-04-23T03:51:24Z

+    // With threshold=0, a foldable map would take the hash path. A non-foldable map (backed by
+    // a row column) must still fall back to linear scan, because its hash index cannot be
+    // reused across rows (building it per row is a perf regression vs. linear).
+    withSQLConf(SQLConf.MAP_LOOKUP_HASH_THRESHOLD.key -> "0") {


Could we tighten this test to assert the strategy choice directly? Something like:

val expr = GetMapValue(mapRef, Literal(1)).asInstanceOf[GetMapValueUtil] // reflectively read the private `executor` field val executor = { val f = classOf[GetMapValueUtil].getDeclaredField( "org$apache$spark$sql$catalyst$expressions$GetMapValueUtil$$executor") f.setAccessible(true) f.get(expr) } assert(executor.isInstanceOf[expr.LinearExecutor.type])

Behavior-only tests are fine but they won't catch a future refactor that flips every map to one strategy.

A symmetric test that a foldable Literal(MapData) above threshold lands on PrebuiltHashExecutor would also be valuable, plus one with a collated StringType key confirming it stays on the linear path.

Great suggestion - done. Added private[expressions] def usesFoldableHashLookup on GetMapValueUtil so suites can assert strategy choice without reflection on path-dependent inner types. The existing non-foldable test now asserts !usesFoldableHashLookup, and there's a new GetMapValue - strategy choice for foldable maps test that locks in three cases:

foldable + above threshold + hashable key type --> PrebuiltHashExecutor

foldable but below threshold --> LinearExecutor

foldable + unsupported key type (used BinaryType - simpler than a collated string) --> LinearExecutor even with threshold=0

If you'd prefer an explicit collated-StringType case too, happy to add it; I went with BinaryType because it hits the same typeWithProperEquals = false branch with less ceremony.

cloud-fan · 2026-04-23T04:56:38Z

Posting local benchmark evidence that validates the PR's two claims (not meant to replace CI-regenerated *-results.txt; local hardware differs). Hardware: Intel(R) Xeon(R) Platinum 8375C @ 2.90GHz, JDK 17.0.14, local[1].

Setup. Ran the benchmark on two source trees, both on this branch's code except for complexTypeExtractors.scala:

WITHOUT fix — complexTypeExtractors.scala from upstream/master (SPARK-55959 as merged).
WITH fix — this PR (HEAD).

Reduced to size=10000 for the foldable-literal case and size=1000 for the non-foldable map-column case (10K rows per case, 3 iterations). numRows * mapSize exceeds local heap at larger sizes.

Claim 1: map-col regresses WITHOUT the fix

Non-foldable map column (col('m') backed by row data), threshold=0 (most aggressive), size=1000, hit=1.0:

Case	WITHOUT fix (best, ms)	WITH fix (best, ms)	Delta
GetMapValue interpreted	2144	1153	WITHOUT 1.86x slower
GetMapValue codegen	260	215	WITHOUT 1.21x slower
ElementAt interpreted	2240	1137	WITHOUT 1.97x slower
ElementAt codegen	260	218	WITHOUT 1.19x slower

Without the foldable gate, every row rebuilds the hash index (the reference-identity cache lastMap ne map / $keys != $lastKeyArray never hits for UnsafeRow.getMap(), which allocates a fresh UnsafeMapData per call). With the fix, non-foldable maps short-circuit to linear scan, which is cheaper for single-lookup-per-row shapes.

Claim 2: map-lit perf is unchanged WITH the fix

Foldable map literal (typedLit(map)), size=10000, hit=1.0:

Case	WITHOUT fix (best, ms)	WITH fix (best, ms)	Delta
GetMapValue interp Linear	114	119	~noise
GetMapValue interp Hash	38	36	~noise
GetMapValue codegen Linear	88	89	0%
GetMapValue codegen Hash	38	40	~noise (+5%)
ElementAt interp Linear	100	122	~noise
ElementAt interp Hash	34	38	~noise
ElementAt codegen Linear	85	99	~noise
ElementAt codegen Hash	36	36	0%

For the codegen hash path, the PR's PrebuiltHashExecutor now builds the int[] buckets table on the driver and uses the same inline primitive hash codegen SPARK-55959 used (just without the per-row rebuild branch) — so the best-case hot loop is unchanged.

(An earlier iteration of this PR simplified the codegen to a HashMap.get + autobox lookup and showed a ~25-34% slowdown on this case; that was reverted to preserve SPARK-55959's performance for foldable maps.)

TL;DR

Non-foldable maps: the fix eliminates the per-row hash rebuild regression (1.2x-2x speedup depending on path).
Foldable maps: performance matches SPARK-55959. The optimization is preserved in the regime where it was designed to win.

cloud-fan · 2026-04-23T05:09:15Z

Summary of the force-pushes since the initial approval + review round (heads-up @LuciferYang that the HEAD has changed substantially):

v2 (between your review and the benchmark-evidence comment): replaced the HashMap.get(Object) + autobox codegen body in PrebuiltHashExecutor with driver-built int[] buckets + SPARK-55959's inline primitive hash codegen. Local comparison (in the benchmark comment below) showed the v1 shape regressed foldable-map hash lookup 25-34% vs SPARK-55959; v2 matches SPARK-55959 within noise.
v3 (this push, 94e80a9d553): addresses your four review comments - predicate rationale block comment, usesFoldableHashLookup test helper, strategy choice for foldable maps test, plus the PR description wording fix. See individual replies on each comment thread.

Flagging so you can decide whether the approval should carry over or if you want another round.

LuciferYang

v2/v3 LGTM, approval carries over. Spot-checked hashKeyOnDriver against genHash per keyType and the cases match (including Byte/Short implicit int promotion). Strategy tests cover what I was after.

One optional nit: buildHashBuckets clamps cap at 2^30, so len ≥ ~2^29 would spin the fill loop. Seems unreachable in practice (would OOM first), but a require would fail-fast. Skip if you'd rather not.

…dable input ### What changes were proposed in this pull request? Follow-up to apache#54748 (SPARK-55959). That PR added a hash-based lookup path to `GetMapValueUtil` gated by `spark.sql.optimizer.mapLookupHashThreshold`, but the hash table was rebuilt per row for non-foldable map columns because the reference-identity cache (`$keys != $lastKeyArray` / `lastMap ne map`) only hits when the `MapData` instance is reused across rows. `UnsafeRow.getMap()` allocates a fresh `UnsafeMapData` on every call, so the cache never hit in the common production case. The benchmark used `typedLit(map)` (a `Literal` whose `eval` returns the same Scala object every row), which masked the miss. This PR: 1. Gates the hash path on `child.foldable`. Non-foldable maps always use the linear scan — their hash index cannot be reused across rows, so the per-row build cost would exceed the scan it replaces. 2. Restructures the two execution paths as a strategy ADT on the trait: a `MapLookupExecutor` sealed inner trait with `LinearExecutor` (stateless singleton) and `PrebuiltHashExecutor` (holds the pre-built `HashMap` and value array for a constant map). `getValueEval` and `doGetValueGenCode` become one-line delegators; the choice is made once in `chooseExecutor`. This replaces the null-sentinel `foldableMapIndex` field and the `if (index != null)` branches that were duplicated between eval and codegen. 3. Simplifies codegen accordingly: no runtime threshold branch, no mutable state, no reference-identity check. The `java.util.HashMap` and the value array are built once on the driver and embedded in the generated class via `addReferenceObj`; the generated per-row code is a single boxed `HashMap.get` + null check + value read. 4. Unifies the type predicate on `TypeUtils.typeWithProperEquals`; removes the codegen-only `supportsHashLookup` whitelist. The two predicates agreed on all supported types in practice; keeping them in two places was a drift hazard. 5. Updates the config doc to reflect that the threshold applies only to foldable maps; fixes a minor grammar issue. ### Why are the changes needed? For realistic queries like `SELECT my_map_col['k'] FROM t` with maps above the default threshold (1000 entries), the hash path was rebuilding the table per row. Per-row hash build (O(n) hash computes + array writes) has higher constant factors than the O(n) linear scan it replaces for a single lookup per row, so the original optimization could regress this shape. Gating on foldability ensures the index is only built when it can actually be reused across rows — which is the regime where it wins. The strategy-ADT restructure groups each execution path's eval and codegen together, removes the duplicated dispatch branching, and replaces the `null-means-linear` convention with a sealed type. ### Does this PR introduce _any_ user-facing change? No. Behavior for foldable map lookups (constants / literals) is unchanged. Non-foldable map lookups revert to the pre-SPARK-55959 linear-scan path, which fixes the latent regression. ### How was this patch tested? - Existing parameterized tests in `ComplexTypeSuite` and `CollectionExpressionsSuite` continue to pass with both `threshold=0` (hash path taken for foldable maps) and `threshold=Int.MaxValue` (linear path taken). - New test `GetMapValue - non-foldable map always uses linear scan`: with `threshold=0`, a `BoundReference`-backed map column (non-foldable) evaluates correctly for `GetMapValue` and `ElementAt`, including null-map handling. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Opus 4.7 Co-authored-by: Isaac

LuciferYang · 2026-04-23T13:27:17Z

+    // threshold (hash wins clearly), 1000 is at the threshold, and 10 is well below it
+    // (hash overhead dominates -- justifies the threshold default). Upper bound is capped
+    // because task serialization of a 1M-entry literal exceeds sbt's default 8g heap.
+    for (size <- Seq(10000, 1000, 10)) {


fine for me

…kupBenchmark (JDK 17, Scala 2.13, split 1 of 1)

…kupBenchmark (JDK 21, Scala 2.13, split 1 of 1)

…kupBenchmark (JDK 25, Scala 2.13, split 1 of 1)

cloud-fan · 2026-04-23T15:28:12Z

thanks for the review, merging to master!

LuciferYang approved these changes Apr 23, 2026

View reviewed changes

cloud-fan force-pushed the map-lookup-foldable-gate branch from 60821a0 to 468f66a Compare April 23, 2026 03:02

LuciferYang reviewed Apr 23, 2026

View reviewed changes

cloud-fan force-pushed the map-lookup-foldable-gate branch 2 times, most recently from e15d138 to d3fe4e8 Compare April 23, 2026 04:55

cloud-fan force-pushed the map-lookup-foldable-gate branch from d3fe4e8 to 94e80a9 Compare April 23, 2026 05:07

LuciferYang approved these changes Apr 23, 2026

View reviewed changes

cloud-fan force-pushed the map-lookup-foldable-gate branch from 94e80a9 to ad0da30 Compare April 23, 2026 10:52

cloud-fan force-pushed the map-lookup-foldable-gate branch from ad0da30 to 33d8351 Compare April 23, 2026 13:23

LuciferYang reviewed Apr 23, 2026

View reviewed changes

Benchmark results for org.apache.spark.sql.execution.benchmark.MapLoo…

88b3108

…kupBenchmark (JDK 17, Scala 2.13, split 1 of 1)

LuciferYang approved these changes Apr 23, 2026

View reviewed changes

cloud-fan added 2 commits April 23, 2026 14:22

Benchmark results for org.apache.spark.sql.execution.benchmark.MapLoo…

5a61619

…kupBenchmark (JDK 21, Scala 2.13, split 1 of 1)

Benchmark results for org.apache.spark.sql.execution.benchmark.MapLoo…

46961a3

…kupBenchmark (JDK 25, Scala 2.13, split 1 of 1)

cloud-fan closed this in 54ee1a9 Apr 23, 2026

Conversation

cloud-fan commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

LuciferYang commented Apr 23, 2026

Uh oh!

LuciferYang commented Apr 23, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cloud-fan commented Apr 23, 2026

Claim 1: map-col regresses WITHOUT the fix

Claim 2: map-lit perf is unchanged WITH the fix

TL;DR

Uh oh!

cloud-fan commented Apr 23, 2026

Uh oh!

LuciferYang left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cloud-fan commented Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cloud-fan commented Apr 23, 2026 •

edited

Loading

LuciferYang left a comment •

edited

Loading