[optimize] Type.subTypeOf: avoid redundant union-types map lookups (closes #6322)#6331
Conversation
|
Please rebase |
Type.subTypeOf is on the hot path of every atomic comparison via ValueComparison.compareAtomic, and a JFR profile of issue eXist-db#3406's n-by-n FLWOR reproducer found Int2ObjectArrayMap.findKey at 20% of self time in that workload. Two localized changes cut the overhead: 1. Switch the unionTypes map from Int2ObjectArrayMap to Int2ObjectOpenHashMap. The fastutil ArrayMap is a linear-scan implementation intended for very small maps; the HashMap variant gives O(1) lookup with the same Int2ObjectMap API and is now preferred for any non-trivial call volume. 2. Replace containsKey(...) + a subsequent get(...) inside subTypeOfUnion / unionMembersHaveSuperType with a single get() in the caller. The result is null-checked and forwarded to a new package-private overload that takes the pre-fetched members. The public no-args variants are unchanged for external callers. An isEmpty() guard short-circuits both lookups when no union types are registered. The static initialiser registers NUMERIC and ERROR so the guarded branch is taken in current builds; the guard protects against future evolution where union registration becomes optional, e.g. for XQuery 4.0 user-defined unions (see eXist-db#6322). JMH (TypeSubTypeOfBenchmark, JDK 21, 1 fork, 3x1s warm-up, 5x1s measurement, ns/op): shape before after delta identical 0.374 0.375 0.0% (early-return, untouched) directSuper 3.661 2.993 -18.2% deepSuper 13.817 11.817 -14.5% unionMember 3.575 2.661 -25.6% unionSubtype 20.095 14.136 -29.7% notSubType 7.433 5.171 -30.4% Out of scope: a precomputed boolean[][] subtype-of table (option 3 in eXist-db#6322). Per the issue thread, XQuery 4.0 will allow user-defined union types per query context, so a static lookup table would either need to be invalidated and rebuilt per-context (re-introducing the allocation cost) or restricted to built-in types (limiting the gain). Deferred until the XQ 4.0 user-defined-types story is settled. Closes eXist-db#6322 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The unionTypes map is populated by the static initialiser (NUMERIC and ERROR are registered as part of the spec-mandated type system) and is never empty at runtime. The guard was protecting against a future "union registration becomes optional" evolution that won't happen — those registrations are spec-required. Removing dead-code overhead from a hot path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e30cdad to
1f2c018
Compare
|
[This response was co-authored with Claude Code. -Joe] Rebased onto current Note: Force-pushed |
line-o
left a comment
There was a problem hiding this comment.
Rebase didn't go completely right. All lines removed from exist-core-jmh/pom.xml need to be added again.
The rebase onto develop accidentally dropped exist-index-lucene, lucene-core, junit-jupiter-api deps, the jsr305 annotation-processor path, and the ServicesResourceTransformer entry. Reset the pom.xml to develop's content; the consolidated annotation-processor-paths config from eXist-db#6296 + eXist-db#6314 is sufficient. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| } | ||
| if (unionTypes.containsKey(subtype)) { | ||
| return unionMembersHaveSuperType(subtype, supertype); | ||
| final IntArraySet subtypeMembers = unionTypes.get(subtype); |
There was a problem hiding this comment.
Looking at this now for the umpteenth time:
When we try to determine if union type A is a subtype of type B I would argue we need a separate check.
One that checks that all members of A are subtypes of B.
It boils down to the example: Is numeric a subtype of xs:decimal or xs:double or both? And why is that the case?
What happens with a uniontype of xs:integer and xs:string?
There was a problem hiding this comment.
And just to be clear: This is a separate issue that I will open once we agree it is one. Not blocking this PR!
Summary
Type.subTypeOfis on the hot path of every atomic comparison viaValueComparison.compareAtomic. A JFR profile of issue #3406's n×n FLWOR reproducer foundInt2ObjectArrayMap.findKeyat 20% of self-time in that workload (see project_3406_diagnosis_2026_05_09). This PR cuts that overhead with two localised changes plus one redundancy fix, all confined toType.java.Closes #6322.
What changed
exist-core/src/main/java/org/exist/xquery/value/Type.javaunionTypesfromInt2ObjectArrayMaptoInt2ObjectOpenHashMap. The fastutilArrayMapis a linear-scan implementation intended for very small maps; the hash-backed variant gives O(1) lookup with the sameInt2ObjectMap<...>API. Field is now declaredInt2ObjectOpenHashMap<IntArraySet>so callers benefit from the hash-backed path without a witness cast.isEmpty()guard before the union dispatch. The static initialiser registersNUMERICandERRORso the guarded branch is taken in current builds; the guard protects against future evolution where union registration becomes conditional, e.g. for XQuery 4.0 user-defined unions.containsKey(...)+ a subsequentget(...)into a singleget(...)per union check. The two consultations are nowIntArraySet supertypeMembers = unionTypes.get(supertype); if (supertypeMembers != null) ..., with the result forwarded to a new package-private overload ofsubTypeOfUnion/unionMembersHaveSuperTypethat takes the pre-fetched members. The public no-args variants are unchanged for external callers (e.g.getCommonSuperTypeinType.javaitself, plus call sites elsewhere in the engine).exist-core-jmh/src/main/java/org/exist/xquery/value/TypeSubTypeOfBenchmark.javaexist-core-jmh/pom.xmlannotationProcessorPathsfix from [test] Add JMH axis-evaluator benchmark (PR #6302 follow-up) #6324 (jmh-generator-annprocess otherwise gets shadowed by the parent pom, leavingMETA-INF/BenchmarkListempty). [test] Add JMH axis-evaluator benchmark (PR #6302 follow-up) #6324 is still open at the time of writing; this PR pulls the same one-block override so the benchmark is runnable today. If [test] Add JMH axis-evaluator benchmark (PR #6302 follow-up) #6324 lands first, the merge will collapse the duplicate config cleanly.Spec / scope notes
The original #6322 inventory proposed a third optimisation — a precomputed
boolean[][]subtype-of lookup table. That fix is deferred and not in this PR. Per @line-o's 2026-05-10 reply, XQuery 4.0 will allow user-defined union types per query context, so a static lookup table would either need per-context invalidation (re-introducing the allocation cost) or restriction to built-in types (limiting the gain). Best left until the XQ 4.0 user-defined-types story is settled.JMH numbers (
TypeSubTypeOfBenchmark)JMH 1.37, JDK 21.0.5 (Zulu), 1 fork, 3×1s warmup, 5×1s measurement, ns/op.
identicaldirectSuperdeepSuperunionMemberunionSubtypenotSubTypeThe combined effect of the OpenHashMap swap and the
containsKey+getcollapse is biggest on the union-dispatch and walk paths, where every recursive step previously paid two linear-scan lookups.Test plan
mvn test -pl exist-core -Dtest='OpNumericTest,XPathQueryTest,xquery.xquery3.XQuery3Tests,UnknownAtomicTypeTest'— 1,181 tests, 0 failures, 0 errors.mvn test -pl exist-core(under cross-session lock) — 6,592 tests, 0 failures, 0 errors, 106 skipped.TypeSubTypeOfBenchmarkbefore/after, table above.subTypeOf(int,int)NPath rises from 240 (already > 200) to 300 (still in the "moderate" 200–10,000 band that CLAUDE.md reserves for reviewer discretion). Happy to extract a helper or suppress with rationale if preferred.exist-core(so a run against this branch's JAR is testing the shaded copy, not the modifiedType.class), and a freshsbt assemblyagainst develop fails because the runner now depends onorg.exist-db:exist-expath, an artifact that doesn't exist yet on this branch. Given the change is a strict semantic preservation (collapsingcontainsKey + getinto onegetwith the same null-check, plus a swap to a same-API map), and the JUnit suite covers the type-comparison surface, I think landing on JMH + JUnit is acceptable; happy to run XQTS againstnext-v3(which hasexist-expath) before merging if a reviewer wants a cross-check there.Risks
Int2ObjectMap<IntArraySet>→Int2ObjectOpenHashMap<IntArraySet>. The map isprivate, mutated only by the static initialiser, and read only viaget/containsKey/isEmpty— no caller relies onArrayMap's insertion-order iteration.defineUnionType(ERROR, new int[0])registersERRORwith an empty member set. The new private overload ofunionMembersHaveSuperTypekeeps the pre-existingmembers.isEmpty() → falseshort-circuit so the empty-ERRORcase behaves identically.subTypeOfUnion(int,int)/unionMembersHaveSuperType(int,int)signatures are untouched. The new overloads are package-private.🤖 Generated with Claude Code