[SPARK-55959][SQL][FOLLOWUP] Gate map-lookup hash optimization on foldable input#55498
[SPARK-55959][SQL][FOLLOWUP] Gate map-lookup hash optimization on foldable input#55498cloud-fan wants to merge 4 commits into
Conversation
|
Thank you for the fix. @cloud-fan |
60821a0 to
468f66a
Compare
|
Better refresh the microbenchmark results before merging. |
| map.valueArray().get(idx, dataType) | ||
| private def chooseExecutor(): MapLookupExecutor = left.dataType match { | ||
| case MapType(keyType, _, _) | ||
| if left.foldable && TypeUtils.typeWithProperEquals(keyType) => |
There was a problem hiding this comment.
nit: this isn't strictly equivalent to the old supportsHashLookup whitelist — typeWithProperEquals seems accepts VariantType / GeometryType / GeographyType (all AtomicType), which the old whitelist excluded. Either:
- Mention the (intentional) widening in the PR body so reviewers know it's not a mechanical refactor, or
- Tighten the predicate to keep behavior bit-for-bit identical, e.g.:
if left.foldable
&& TypeUtils.typeWithProperEquals(keyType)
&& !keyType.isInstanceOf[VariantType]
&& !keyType.isInstanceOf[GeometryType]
&& !keyType.isInstanceOf[GeographyType] =>There was a problem hiding this comment.
Good catch, thanks. Keeping the predicate on typeWithProperEquals (rather than tightening) so the interpreted and codegen paths stay unified on one predicate - which is what the refactor is aiming for. Added a block comment on chooseExecutor in the latest push explaining why: VariantType is structurally unreachable (checkForMapKeyType rejects it), GeographyType/GeometryType have identity-inherited equals but that's already the state of SPARK-55959's interpreted hash path (same predicate), and non-binary-equality StringType / BinaryType still fall through to linear. Tightening would split the two paths' predicates again, which is the drift hazard we're resolving.
| "" | ||
| } | ||
| /** Linear scan over the map's key array. Stateless; no pre-computation. */ | ||
| protected object LinearExecutor extends MapLookupExecutor { |
There was a problem hiding this comment.
Tiny nit on the PR body: "LinearExecutor (stateless singleton)" — strictly speaking, a protected object inside a trait is per-outer-instance (path-dependent), not a JVM singleton.
There was a problem hiding this comment.
Fair, thanks - fixed in the updated PR description ("stateless path-dependent object" instead of "stateless singleton"). The in-code doc already says "Stateless; no pre-computation" so no change there.
| // HashMap.get(Object) autoboxes primitive keys to their wrapper type (Integer, Long, | ||
| // ...) whose equals/hashCode match the values stored during index build. | ||
| s""" | ||
| |Integer $idxBoxed = (Integer) $indexRef.get($eval2); |
There was a problem hiding this comment.
JIT escape-analysis can sometimes elide the boxing if the boxed key doesn't escape HashMap.get. In practice this works for monomorphic, hot call sites but is unreliable for megamorphic ones (and HashMap.get is megamorphic by design).
There was a problem hiding this comment.
Good point - this comment was about the earlier HashMap.get(Object) + autobox codegen. I force-pushed a rewrite (v2) after your review that replaced that codegen body with the same int[] buckets / inline primitive hash shape SPARK-55959 used (just built once on the driver instead of per row), so there is no autoboxing in the hot loop anymore. Measured in a local comparison (posted below): foldable-map hash-lookup perf matches SPARK-55959 within noise. The previous line 630 (the autobox comment) has been removed.
| // With threshold=0, a foldable map would take the hash path. A non-foldable map (backed by | ||
| // a row column) must still fall back to linear scan, because its hash index cannot be | ||
| // reused across rows (building it per row is a perf regression vs. linear). | ||
| withSQLConf(SQLConf.MAP_LOOKUP_HASH_THRESHOLD.key -> "0") { |
There was a problem hiding this comment.
Could we tighten this test to assert the strategy choice directly? Something like:
val expr = GetMapValue(mapRef, Literal(1)).asInstanceOf[GetMapValueUtil]
// reflectively read the private `executor` field
val executor = {
val f = classOf[GetMapValueUtil].getDeclaredField(
"org$apache$spark$sql$catalyst$expressions$GetMapValueUtil$$executor")
f.setAccessible(true)
f.get(expr)
}
assert(executor.isInstanceOf[expr.LinearExecutor.type])Behavior-only tests are fine but they won't catch a future refactor that flips every map to one strategy.
A symmetric test that a foldable Literal(MapData) above threshold lands on PrebuiltHashExecutor would also be valuable, plus one with a collated StringType key confirming it stays on the linear path.
There was a problem hiding this comment.
Great suggestion - done. Added private[expressions] def usesFoldableHashLookup on GetMapValueUtil so suites can assert strategy choice without reflection on path-dependent inner types. The existing non-foldable test now asserts !usesFoldableHashLookup, and there's a new GetMapValue - strategy choice for foldable maps test that locks in three cases:
- foldable + above threshold + hashable key type -->
PrebuiltHashExecutor - foldable but below threshold -->
LinearExecutor - foldable + unsupported key type (used
BinaryType- simpler than a collated string) -->LinearExecutoreven withthreshold=0
If you'd prefer an explicit collated-StringType case too, happy to add it; I went with BinaryType because it hits the same typeWithProperEquals = false branch with less ceremony.
e15d138 to
d3fe4e8
Compare
|
Posting local benchmark evidence that validates the PR's two claims (not meant to replace CI-regenerated Setup. Ran the benchmark on two source trees, both on this branch's code except for
Reduced to Claim 1: map-col regresses WITHOUT the fixNon-foldable map column (
Without the foldable gate, every row rebuilds the hash index (the reference-identity cache Claim 2: map-lit perf is unchanged WITH the fixFoldable map literal (
For the codegen hash path, the PR's (An earlier iteration of this PR simplified the codegen to a TL;DR
|
d3fe4e8 to
94e80a9
Compare
|
Summary of the force-pushes since the initial approval + review round (heads-up @LuciferYang that the HEAD has changed substantially):
Flagging so you can decide whether the approval should carry over or if you want another round. |
There was a problem hiding this comment.
v2/v3 LGTM, approval carries over. Spot-checked hashKeyOnDriver against genHash per keyType and the cases match (including Byte/Short implicit int promotion). Strategy tests cover what I was after.
One optional nit: buildHashBuckets clamps cap at 2^30, so len ≥ ~2^29 would spin the fill loop. Seems unreachable in practice (would OOM first), but a require would fail-fast. Skip if you'd rather not.
94e80a9 to
ad0da30
Compare
…dable input ### What changes were proposed in this pull request? Follow-up to apache#54748 (SPARK-55959). That PR added a hash-based lookup path to `GetMapValueUtil` gated by `spark.sql.optimizer.mapLookupHashThreshold`, but the hash table was rebuilt per row for non-foldable map columns because the reference-identity cache (`$keys != $lastKeyArray` / `lastMap ne map`) only hits when the `MapData` instance is reused across rows. `UnsafeRow.getMap()` allocates a fresh `UnsafeMapData` on every call, so the cache never hit in the common production case. The benchmark used `typedLit(map)` (a `Literal` whose `eval` returns the same Scala object every row), which masked the miss. This PR: 1. Gates the hash path on `child.foldable`. Non-foldable maps always use the linear scan — their hash index cannot be reused across rows, so the per-row build cost would exceed the scan it replaces. 2. Restructures the two execution paths as a strategy ADT on the trait: a `MapLookupExecutor` sealed inner trait with `LinearExecutor` (stateless singleton) and `PrebuiltHashExecutor` (holds the pre-built `HashMap` and value array for a constant map). `getValueEval` and `doGetValueGenCode` become one-line delegators; the choice is made once in `chooseExecutor`. This replaces the null-sentinel `foldableMapIndex` field and the `if (index != null)` branches that were duplicated between eval and codegen. 3. Simplifies codegen accordingly: no runtime threshold branch, no mutable state, no reference-identity check. The `java.util.HashMap` and the value array are built once on the driver and embedded in the generated class via `addReferenceObj`; the generated per-row code is a single boxed `HashMap.get` + null check + value read. 4. Unifies the type predicate on `TypeUtils.typeWithProperEquals`; removes the codegen-only `supportsHashLookup` whitelist. The two predicates agreed on all supported types in practice; keeping them in two places was a drift hazard. 5. Updates the config doc to reflect that the threshold applies only to foldable maps; fixes a minor grammar issue. ### Why are the changes needed? For realistic queries like `SELECT my_map_col['k'] FROM t` with maps above the default threshold (1000 entries), the hash path was rebuilding the table per row. Per-row hash build (O(n) hash computes + array writes) has higher constant factors than the O(n) linear scan it replaces for a single lookup per row, so the original optimization could regress this shape. Gating on foldability ensures the index is only built when it can actually be reused across rows — which is the regime where it wins. The strategy-ADT restructure groups each execution path's eval and codegen together, removes the duplicated dispatch branching, and replaces the `null-means-linear` convention with a sealed type. ### Does this PR introduce _any_ user-facing change? No. Behavior for foldable map lookups (constants / literals) is unchanged. Non-foldable map lookups revert to the pre-SPARK-55959 linear-scan path, which fixes the latent regression. ### How was this patch tested? - Existing parameterized tests in `ComplexTypeSuite` and `CollectionExpressionsSuite` continue to pass with both `threshold=0` (hash path taken for foldable maps) and `threshold=Int.MaxValue` (linear path taken). - New test `GetMapValue - non-foldable map always uses linear scan`: with `threshold=0`, a `BoundReference`-backed map column (non-foldable) evaluates correctly for `GetMapValue` and `ElementAt`, including null-map handling. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Opus 4.7 Co-authored-by: Isaac
ad0da30 to
33d8351
Compare
| // threshold (hash wins clearly), 1000 is at the threshold, and 10 is well below it | ||
| // (hash overhead dominates -- justifies the threshold default). Upper bound is capped | ||
| // because task serialization of a 1M-entry literal exceeds sbt's default 8g heap. | ||
| for (size <- Seq(10000, 1000, 10)) { |
…kupBenchmark (JDK 17, Scala 2.13, split 1 of 1)
…kupBenchmark (JDK 21, Scala 2.13, split 1 of 1)
…kupBenchmark (JDK 25, Scala 2.13, split 1 of 1)
|
thanks for the review, merging to master! |
What changes were proposed in this pull request?
Follow-up to #54748 (SPARK-55959). That PR added a hash-based lookup path to
GetMapValueUtilgated byspark.sql.optimizer.mapLookupHashThreshold, but thehash table was rebuilt per row for non-foldable map columns because the
reference-identity cache (
$keys != $lastKeyArray/lastMap ne map) onlyhits when the
MapDatainstance is reused across rows.UnsafeRow.getMap()allocates a fresh
UnsafeMapDataon every call, so the cache never hit in thecommon production case. The benchmark used
typedLit(map)(aLiteralwhoseevalreturns the same Scala object every row), which masked the miss.This PR:
child.foldable. Non-foldable maps always use thelinear scan - their hash index cannot be reused across rows, so the per-row
build cost would exceed the scan it replaces.
a
MapLookupExecutorsealed inner trait withLinearExecutor(statelesspath-dependent
object) andPrebuiltHashExecutor(holds the pre-builtlookup state for a constant map).
getValueEvalanddoGetValueGenCodebecome one-line delegators; the choice is made once in
chooseExecutor.This replaces the null-sentinel
foldableMapIndexfield and theif (index != null)branches that were duplicated between eval and codegen.reference-identity check. For the foldable path, the
int[] bucketsopen-addressing table, the
keyArray/valueArray, and the genericHashMap[Any, Int](for interpreted) are built once on the driver andembedded in the generated class via
addReferenceObj. The generated per-rowcodegen still uses SPARK-55959's inline primitive hash over the bucket array
TypeUtils.typeWithProperEquals; removesthe codegen-only
supportsHashLookupwhitelist. The two predicates agreedon all reachable map-key types in practice (Variant is prohibited by
checkForMapKeyType; Geography/Geometry with inherited Object equalsalready have this constraint on SPARK-55959's interpretive path). Unifying
on one predicate removes a drift hazard.
usesFoldableHashLookup(visible for testing) onGetMapValueUtilso tests can assert strategy choice directly, plus updates the config doc
and minor grammar fixes.
Why are the changes needed?
For realistic queries like
SELECT my_map_col['k'] FROM twith maps abovethe default threshold (1000 entries), the hash path was rebuilding the table
per row. Per-row hash build (O(n) hash computes + array writes) has higher
constant factors than the O(n) linear scan it replaces for a single lookup
per row, so the original optimization could regress this shape. Gating on
foldability ensures the index is only built when it can actually be reused
across rows - which is the regime where it wins.
Local benchmark evidence confirming both that (a) the fix eliminates the
non-foldable regression and (b) foldable-map performance is unchanged vs
SPARK-55959 is posted below in the PR conversation.
Does this PR introduce any user-facing change?
No. Behavior for foldable map lookups (constants / literals) is unchanged.
Non-foldable map lookups revert to the pre-SPARK-55959 linear-scan path,
which fixes the latent regression.
How was this patch tested?
ComplexTypeSuiteandCollectionExpressionsSuitecontinue to pass with boththreshold=0(hash path taken for foldable maps) and
threshold=Int.MaxValue(linearpath taken).
GetMapValue - non-foldable map always uses linear scan:behavior +
usesFoldableHashLookupassertion - aBoundReference-backedmap column (non-foldable) evaluates correctly for both
GetMapValueandElementAt, and is confirmed to land onLinearExecutoreven withthreshold=0.GetMapValue - strategy choice for foldable maps: asserts that afoldable map above threshold lands on
PrebuiltHashExecutor, belowthreshold falls back to
LinearExecutor, and that unsupported key types(e.g.
BinaryType) always land onLinearExecutorregardless ofthreshold.
runMapColvariant so the non-foldable shape ismeasured alongside the existing foldable-literal cases (local results
posted below;
*-results.txtshould be regenerated on CI for consistenthardware).
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7