Skip to content

[SPARK-55959][SQL][FOLLOWUP] Gate map-lookup hash optimization on foldable input#55498

Closed
cloud-fan wants to merge 4 commits into
apache:masterfrom
cloud-fan:map-lookup-foldable-gate
Closed

[SPARK-55959][SQL][FOLLOWUP] Gate map-lookup hash optimization on foldable input#55498
cloud-fan wants to merge 4 commits into
apache:masterfrom
cloud-fan:map-lookup-foldable-gate

Conversation

@cloud-fan
Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan commented Apr 23, 2026

What changes were proposed in this pull request?

Follow-up to #54748 (SPARK-55959). That PR added a hash-based lookup path to
GetMapValueUtil gated by spark.sql.optimizer.mapLookupHashThreshold, but the
hash table was rebuilt per row for non-foldable map columns because the
reference-identity cache ($keys != $lastKeyArray / lastMap ne map) only
hits when the MapData instance is reused across rows. UnsafeRow.getMap()
allocates a fresh UnsafeMapData on every call, so the cache never hit in the
common production case. The benchmark used typedLit(map) (a Literal whose
eval returns the same Scala object every row), which masked the miss.

This PR:

  1. Gates the hash path on child.foldable. Non-foldable maps always use the
    linear scan - their hash index cannot be reused across rows, so the per-row
    build cost would exceed the scan it replaces.
  2. Restructures the two execution paths as a strategy ADT on the trait:
    a MapLookupExecutor sealed inner trait with LinearExecutor (stateless
    path-dependent object) and PrebuiltHashExecutor (holds the pre-built
    lookup state for a constant map). getValueEval and doGetValueGenCode
    become one-line delegators; the choice is made once in chooseExecutor.
    This replaces the null-sentinel foldableMapIndex field and the
    if (index != null) branches that were duplicated between eval and codegen.
  3. Simplifies codegen: no runtime threshold branch, no mutable state, no
    reference-identity check. For the foldable path, the int[] buckets open-
    addressing table, the keyArray / valueArray, and the generic
    HashMap[Any, Int] (for interpreted) are built once on the driver and
    embedded in the generated class via addReferenceObj. The generated per-row
    codegen still uses SPARK-55959's inline primitive hash over the bucket array
    • no autoboxing in the hot loop.
  4. Unifies the type predicate on TypeUtils.typeWithProperEquals; removes
    the codegen-only supportsHashLookup whitelist. The two predicates agreed
    on all reachable map-key types in practice (Variant is prohibited by
    checkForMapKeyType; Geography/Geometry with inherited Object equals
    already have this constraint on SPARK-55959's interpretive path). Unifying
    on one predicate removes a drift hazard.
  5. Adds usesFoldableHashLookup (visible for testing) on GetMapValueUtil
    so tests can assert strategy choice directly, plus updates the config doc
    and minor grammar fixes.

Why are the changes needed?

For realistic queries like SELECT my_map_col['k'] FROM t with maps above
the default threshold (1000 entries), the hash path was rebuilding the table
per row. Per-row hash build (O(n) hash computes + array writes) has higher
constant factors than the O(n) linear scan it replaces for a single lookup
per row, so the original optimization could regress this shape. Gating on
foldability ensures the index is only built when it can actually be reused
across rows - which is the regime where it wins.

Local benchmark evidence confirming both that (a) the fix eliminates the
non-foldable regression and (b) foldable-map performance is unchanged vs
SPARK-55959 is posted below in the PR conversation.

Does this PR introduce any user-facing change?

No. Behavior for foldable map lookups (constants / literals) is unchanged.
Non-foldable map lookups revert to the pre-SPARK-55959 linear-scan path,
which fixes the latent regression.

How was this patch tested?

  • Existing parameterized tests in ComplexTypeSuite and
    CollectionExpressionsSuite continue to pass with both threshold=0
    (hash path taken for foldable maps) and threshold=Int.MaxValue (linear
    path taken).
  • New test GetMapValue - non-foldable map always uses linear scan:
    behavior + usesFoldableHashLookup assertion - a BoundReference-backed
    map column (non-foldable) evaluates correctly for both GetMapValue and
    ElementAt, and is confirmed to land on LinearExecutor even with
    threshold=0.
  • New test GetMapValue - strategy choice for foldable maps: asserts that a
    foldable map above threshold lands on PrebuiltHashExecutor, below
    threshold falls back to LinearExecutor, and that unsupported key types
    (e.g. BinaryType) always land on LinearExecutor regardless of
    threshold.
  • Benchmark code adds a runMapCol variant so the non-foldable shape is
    measured alongside the existing foldable-literal cases (local results
    posted below; *-results.txt should be regenerated on CI for consistent
    hardware).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

@LuciferYang
Copy link
Copy Markdown
Contributor

Thank you for the fix. @cloud-fan

@cloud-fan cloud-fan force-pushed the map-lookup-foldable-gate branch from 60821a0 to 468f66a Compare April 23, 2026 03:02
@LuciferYang
Copy link
Copy Markdown
Contributor

Better refresh the microbenchmark results before merging.

map.valueArray().get(idx, dataType)
private def chooseExecutor(): MapLookupExecutor = left.dataType match {
case MapType(keyType, _, _)
if left.foldable && TypeUtils.typeWithProperEquals(keyType) =>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this isn't strictly equivalent to the old supportsHashLookup whitelist — typeWithProperEquals seems accepts VariantType / GeometryType / GeographyType (all AtomicType), which the old whitelist excluded. Either:

  1. Mention the (intentional) widening in the PR body so reviewers know it's not a mechanical refactor, or
  2. Tighten the predicate to keep behavior bit-for-bit identical, e.g.:
  if left.foldable
    && TypeUtils.typeWithProperEquals(keyType)
    && !keyType.isInstanceOf[VariantType]
    && !keyType.isInstanceOf[GeometryType]
    && !keyType.isInstanceOf[GeographyType] =>

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks. Keeping the predicate on typeWithProperEquals (rather than tightening) so the interpreted and codegen paths stay unified on one predicate - which is what the refactor is aiming for. Added a block comment on chooseExecutor in the latest push explaining why: VariantType is structurally unreachable (checkForMapKeyType rejects it), GeographyType/GeometryType have identity-inherited equals but that's already the state of SPARK-55959's interpreted hash path (same predicate), and non-binary-equality StringType / BinaryType still fall through to linear. Tightening would split the two paths' predicates again, which is the drift hazard we're resolving.

""
}
/** Linear scan over the map's key array. Stateless; no pre-computation. */
protected object LinearExecutor extends MapLookupExecutor {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nit on the PR body: "LinearExecutor (stateless singleton)" — strictly speaking, a protected object inside a trait is per-outer-instance (path-dependent), not a JVM singleton.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, thanks - fixed in the updated PR description ("stateless path-dependent object" instead of "stateless singleton"). The in-code doc already says "Stateless; no pre-computation" so no change there.

// HashMap.get(Object) autoboxes primitive keys to their wrapper type (Integer, Long,
// ...) whose equals/hashCode match the values stored during index build.
s"""
|Integer $idxBoxed = (Integer) $indexRef.get($eval2);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JIT escape-analysis can sometimes elide the boxing if the boxed key doesn't escape HashMap.get. In practice this works for monomorphic, hot call sites but is unreliable for megamorphic ones (and HashMap.get is megamorphic by design).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - this comment was about the earlier HashMap.get(Object) + autobox codegen. I force-pushed a rewrite (v2) after your review that replaced that codegen body with the same int[] buckets / inline primitive hash shape SPARK-55959 used (just built once on the driver instead of per row), so there is no autoboxing in the hot loop anymore. Measured in a local comparison (posted below): foldable-map hash-lookup perf matches SPARK-55959 within noise. The previous line 630 (the autobox comment) has been removed.

// With threshold=0, a foldable map would take the hash path. A non-foldable map (backed by
// a row column) must still fall back to linear scan, because its hash index cannot be
// reused across rows (building it per row is a perf regression vs. linear).
withSQLConf(SQLConf.MAP_LOOKUP_HASH_THRESHOLD.key -> "0") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we tighten this test to assert the strategy choice directly? Something like:

val expr = GetMapValue(mapRef, Literal(1)).asInstanceOf[GetMapValueUtil]
// reflectively read the private `executor` field
val executor = {
  val f = classOf[GetMapValueUtil].getDeclaredField(
    "org$apache$spark$sql$catalyst$expressions$GetMapValueUtil$$executor")
  f.setAccessible(true)
  f.get(expr)
}
assert(executor.isInstanceOf[expr.LinearExecutor.type])

Behavior-only tests are fine but they won't catch a future refactor that flips every map to one strategy.

A symmetric test that a foldable Literal(MapData) above threshold lands on PrebuiltHashExecutor would also be valuable, plus one with a collated StringType key confirming it stays on the linear path.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion - done. Added private[expressions] def usesFoldableHashLookup on GetMapValueUtil so suites can assert strategy choice without reflection on path-dependent inner types. The existing non-foldable test now asserts !usesFoldableHashLookup, and there's a new GetMapValue - strategy choice for foldable maps test that locks in three cases:

  • foldable + above threshold + hashable key type --> PrebuiltHashExecutor
  • foldable but below threshold --> LinearExecutor
  • foldable + unsupported key type (used BinaryType - simpler than a collated string) --> LinearExecutor even with threshold=0

If you'd prefer an explicit collated-StringType case too, happy to add it; I went with BinaryType because it hits the same typeWithProperEquals = false branch with less ceremony.

@cloud-fan cloud-fan force-pushed the map-lookup-foldable-gate branch 2 times, most recently from e15d138 to d3fe4e8 Compare April 23, 2026 04:55
@cloud-fan
Copy link
Copy Markdown
Contributor Author

Posting local benchmark evidence that validates the PR's two claims (not meant to replace CI-regenerated *-results.txt; local hardware differs). Hardware: Intel(R) Xeon(R) Platinum 8375C @ 2.90GHz, JDK 17.0.14, local[1].

Setup. Ran the benchmark on two source trees, both on this branch's code except for complexTypeExtractors.scala:

  • WITHOUT fixcomplexTypeExtractors.scala from upstream/master (SPARK-55959 as merged).
  • WITH fix — this PR (HEAD).

Reduced to size=10000 for the foldable-literal case and size=1000 for the non-foldable map-column case (10K rows per case, 3 iterations). numRows * mapSize exceeds local heap at larger sizes.

Claim 1: map-col regresses WITHOUT the fix

Non-foldable map column (col('m') backed by row data), threshold=0 (most aggressive), size=1000, hit=1.0:

Case WITHOUT fix (best, ms) WITH fix (best, ms) Delta
GetMapValue interpreted 2144 1153 WITHOUT 1.86x slower
GetMapValue codegen 260 215 WITHOUT 1.21x slower
ElementAt interpreted 2240 1137 WITHOUT 1.97x slower
ElementAt codegen 260 218 WITHOUT 1.19x slower

Without the foldable gate, every row rebuilds the hash index (the reference-identity cache lastMap ne map / $keys != $lastKeyArray never hits for UnsafeRow.getMap(), which allocates a fresh UnsafeMapData per call). With the fix, non-foldable maps short-circuit to linear scan, which is cheaper for single-lookup-per-row shapes.

Claim 2: map-lit perf is unchanged WITH the fix

Foldable map literal (typedLit(map)), size=10000, hit=1.0:

Case WITHOUT fix (best, ms) WITH fix (best, ms) Delta
GetMapValue interp Linear 114 119 ~noise
GetMapValue interp Hash 38 36 ~noise
GetMapValue codegen Linear 88 89 0%
GetMapValue codegen Hash 38 40 ~noise (+5%)
ElementAt interp Linear 100 122 ~noise
ElementAt interp Hash 34 38 ~noise
ElementAt codegen Linear 85 99 ~noise
ElementAt codegen Hash 36 36 0%

For the codegen hash path, the PR's PrebuiltHashExecutor now builds the int[] buckets table on the driver and uses the same inline primitive hash codegen SPARK-55959 used (just without the per-row rebuild branch) — so the best-case hot loop is unchanged.

(An earlier iteration of this PR simplified the codegen to a HashMap.get + autobox lookup and showed a ~25-34% slowdown on this case; that was reverted to preserve SPARK-55959's performance for foldable maps.)

TL;DR

  • Non-foldable maps: the fix eliminates the per-row hash rebuild regression (1.2x-2x speedup depending on path).
  • Foldable maps: performance matches SPARK-55959. The optimization is preserved in the regime where it was designed to win.

@cloud-fan cloud-fan force-pushed the map-lookup-foldable-gate branch from d3fe4e8 to 94e80a9 Compare April 23, 2026 05:07
@cloud-fan
Copy link
Copy Markdown
Contributor Author

Summary of the force-pushes since the initial approval + review round (heads-up @LuciferYang that the HEAD has changed substantially):

  • v2 (between your review and the benchmark-evidence comment): replaced the HashMap.get(Object) + autobox codegen body in PrebuiltHashExecutor with driver-built int[] buckets + SPARK-55959's inline primitive hash codegen. Local comparison (in the benchmark comment below) showed the v1 shape regressed foldable-map hash lookup 25-34% vs SPARK-55959; v2 matches SPARK-55959 within noise.
  • v3 (this push, 94e80a9d553): addresses your four review comments - predicate rationale block comment, usesFoldableHashLookup test helper, strategy choice for foldable maps test, plus the PR description wording fix. See individual replies on each comment thread.

Flagging so you can decide whether the approval should carry over or if you want another round.

Copy link
Copy Markdown
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v2/v3 LGTM, approval carries over. Spot-checked hashKeyOnDriver against genHash per keyType and the cases match (including Byte/Short implicit int promotion). Strategy tests cover what I was after.

One optional nit: buildHashBuckets clamps cap at 2^30, so len ≥ ~2^29 would spin the fill loop. Seems unreachable in practice (would OOM first), but a require would fail-fast. Skip if you'd rather not.

@cloud-fan cloud-fan force-pushed the map-lookup-foldable-gate branch from 94e80a9 to ad0da30 Compare April 23, 2026 10:52
…dable input

### What changes were proposed in this pull request?

Follow-up to apache#54748 (SPARK-55959). That PR added a hash-based lookup path to
`GetMapValueUtil` gated by `spark.sql.optimizer.mapLookupHashThreshold`, but the
hash table was rebuilt per row for non-foldable map columns because the
reference-identity cache (`$keys != $lastKeyArray` / `lastMap ne map`) only
hits when the `MapData` instance is reused across rows. `UnsafeRow.getMap()`
allocates a fresh `UnsafeMapData` on every call, so the cache never hit in the
common production case. The benchmark used `typedLit(map)` (a `Literal` whose
`eval` returns the same Scala object every row), which masked the miss.

This PR:
1. Gates the hash path on `child.foldable`. Non-foldable maps always use the
   linear scan — their hash index cannot be reused across rows, so the per-row
   build cost would exceed the scan it replaces.
2. Restructures the two execution paths as a strategy ADT on the trait:
   a `MapLookupExecutor` sealed inner trait with `LinearExecutor` (stateless
   singleton) and `PrebuiltHashExecutor` (holds the pre-built `HashMap` and
   value array for a constant map). `getValueEval` and `doGetValueGenCode`
   become one-line delegators; the choice is made once in `chooseExecutor`.
   This replaces the null-sentinel `foldableMapIndex` field and the
   `if (index != null)` branches that were duplicated between eval and codegen.
3. Simplifies codegen accordingly: no runtime threshold branch, no mutable
   state, no reference-identity check. The `java.util.HashMap` and the value
   array are built once on the driver and embedded in the generated class via
   `addReferenceObj`; the generated per-row code is a single boxed
   `HashMap.get` + null check + value read.
4. Unifies the type predicate on `TypeUtils.typeWithProperEquals`; removes
   the codegen-only `supportsHashLookup` whitelist. The two predicates agreed
   on all supported types in practice; keeping them in two places was a
   drift hazard.
5. Updates the config doc to reflect that the threshold applies only to
   foldable maps; fixes a minor grammar issue.

### Why are the changes needed?

For realistic queries like `SELECT my_map_col['k'] FROM t` with maps above
the default threshold (1000 entries), the hash path was rebuilding the table
per row. Per-row hash build (O(n) hash computes + array writes) has higher
constant factors than the O(n) linear scan it replaces for a single lookup
per row, so the original optimization could regress this shape. Gating on
foldability ensures the index is only built when it can actually be reused
across rows — which is the regime where it wins.

The strategy-ADT restructure groups each execution path's eval and codegen
together, removes the duplicated dispatch branching, and replaces the
`null-means-linear` convention with a sealed type.

### Does this PR introduce _any_ user-facing change?

No. Behavior for foldable map lookups (constants / literals) is unchanged.
Non-foldable map lookups revert to the pre-SPARK-55959 linear-scan path,
which fixes the latent regression.

### How was this patch tested?

- Existing parameterized tests in `ComplexTypeSuite` and
  `CollectionExpressionsSuite` continue to pass with both `threshold=0`
  (hash path taken for foldable maps) and `threshold=Int.MaxValue` (linear
  path taken).
- New test `GetMapValue - non-foldable map always uses linear scan`: with
  `threshold=0`, a `BoundReference`-backed map column (non-foldable)
  evaluates correctly for `GetMapValue` and `ElementAt`, including null-map
  handling.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

Co-authored-by: Isaac
@cloud-fan cloud-fan force-pushed the map-lookup-foldable-gate branch from ad0da30 to 33d8351 Compare April 23, 2026 13:23
// threshold (hash wins clearly), 1000 is at the threshold, and 10 is well below it
// (hash overhead dominates -- justifies the threshold default). Upper bound is capped
// because task serialization of a 1M-entry literal exceeds sbt's default 8g heap.
for (size <- Seq(10000, 1000, 10)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine for me

…kupBenchmark (JDK 17, Scala 2.13, split 1 of 1)
…kupBenchmark (JDK 21, Scala 2.13, split 1 of 1)
…kupBenchmark (JDK 25, Scala 2.13, split 1 of 1)
@cloud-fan
Copy link
Copy Markdown
Contributor Author

thanks for the review, merging to master!

@cloud-fan cloud-fan closed this in 54ee1a9 Apr 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants