
[optimize] Type.subTypeOf: avoid redundant union-types map lookups (closes #6322) #6331

Merged
duncdrum merged 3 commits into eXist-db:develop from joewiz:perf/6322-type-subtypeof-hotpath on May 11, 2026


@joewiz
Member

@joewiz joewiz commented May 10, 2026

Summary

Type.subTypeOf is on the hot path of every atomic comparison via ValueComparison.compareAtomic. A JFR profile of issue #3406's n×n FLWOR reproducer found Int2ObjectArrayMap.findKey at 20% of self-time in that workload (see project_3406_diagnosis_2026_05_09). This PR cuts that overhead with two localised changes plus one redundancy fix, all confined to Type.java.

Closes #6322.

What changed

exist-core/src/main/java/org/exist/xquery/value/Type.java

  1. Switch unionTypes from Int2ObjectArrayMap to Int2ObjectOpenHashMap. The fastutil ArrayMap is a linear-scan implementation intended for very small maps; the hash-backed variant gives expected O(1) lookup with the same Int2ObjectMap<...> API. The field is now declared Int2ObjectOpenHashMap<IntArraySet> so callers benefit from the hash-backed path without a witness cast.
  2. Add an isEmpty() guard before the union dispatch. The static initialiser registers NUMERIC and ERROR so the guarded branch is taken in current builds; the guard protects against future evolution where union registration becomes conditional, e.g. for XQuery 4.0 user-defined unions.
  3. Collapse containsKey(...) followed by get(...) into a single get(...) per union check. Each check now has the form IntArraySet supertypeMembers = unionTypes.get(supertype); if (supertypeMembers != null) ..., and the fetched members are forwarded to a new package-private overload of subTypeOfUnion / unionMembersHaveSuperType. The existing public (int, int) variants are unchanged for external callers (e.g. getCommonSuperType in Type.java itself, plus call sites elsewhere in the engine).
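The single-lookup dispatch in item 3 can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not eXist's Type.java: the type codes and parent table are hypothetical, java.util.HashMap<Integer, Set<Integer>> stands in for fastutil's Int2ObjectOpenHashMap<IntArraySet>, and the subtype-is-union branch uses an every-member rule purely for illustration (which rule is appropriate is questioned in the review thread further down). The empty ERROR union short-circuits to false, mirroring the members.isEmpty() guard described under Risks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model of single-lookup union dispatch. Hypothetical type codes and hierarchy;
// java.util.HashMap stands in for fastutil's Int2ObjectOpenHashMap<IntArraySet>.
final class UnionDispatchSketch {
    static final int ITEM = 0, DOUBLE = 1, DECIMAL = 2, INTEGER = 3, STRING = 4, NUMERIC = 5, ERROR = 6;

    // PARENT[t] is the direct supertype of t; -1 marks the root.
    static final int[] PARENT = {-1, ITEM, ITEM, DECIMAL, ITEM, ITEM, ITEM};

    // Union type code -> member type codes, filled once by the static initialiser.
    static final Map<Integer, Set<Integer>> UNION_TYPES = new HashMap<>();

    static {
        UNION_TYPES.put(NUMERIC, Set.of(DOUBLE, DECIMAL)); // simplified member set
        UNION_TYPES.put(ERROR, Set.of());                  // empty union, like defineUnionType(ERROR, new int[0])
    }

    static boolean subTypeOf(final int subtype, final int supertype) {
        if (subtype == supertype) {
            return true;
        }
        // One get() per union check; a null result means "not a union".
        final Set<Integer> supertypeMembers = UNION_TYPES.get(supertype);
        if (supertypeMembers != null) {
            return subTypeOfUnion(subtype, supertypeMembers);
        }
        final Set<Integer> subtypeMembers = UNION_TYPES.get(subtype);
        if (subtypeMembers != null) {
            return unionMembersHaveSuperType(subtypeMembers, supertype);
        }
        // Recurse up the supertype chain; each step repeats the two lookups above.
        final int parent = PARENT[subtype];
        return parent != -1 && subTypeOf(parent, supertype);
    }

    // Public two-int variant for external callers: performs its own lookup.
    public static boolean subTypeOfUnion(final int subtype, final int unionType) {
        final Set<Integer> members = UNION_TYPES.get(unionType);
        return members != null && subTypeOfUnion(subtype, members);
    }

    // Package-private overload reusing members already fetched by subTypeOf.
    static boolean subTypeOfUnion(final int subtype, final Set<Integer> members) {
        for (final int member : members) {
            if (subTypeOf(subtype, member)) {
                return true;
            }
        }
        return false;
    }

    // Every-member rule (an assumption for this sketch); an empty union short-circuits to false.
    static boolean unionMembersHaveSuperType(final Set<Integer> members, final int supertype) {
        if (members.isEmpty()) {
            return false;
        }
        for (final int member : members) {
            if (!subTypeOf(member, supertype)) {
                return false;
            }
        }
        return true;
    }
}
```

The design point is that the map is consulted once per check and the result is passed along, so the overload never repeats the lookup its caller already did.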

exist-core-jmh/src/main/java/org/exist/xquery/value/TypeSubTypeOfBenchmark.java

  • New JMH benchmark covering six call shapes: identical types, direct supertype walk, multi-level supertype walk, union-typed supertype, union-typed subtype, and "not a subtype" (walks to the root and returns false).

exist-core-jmh/pom.xml

Spec / scope notes

The original #6322 inventory proposed a third optimisation — a precomputed boolean[][] subtype-of lookup table. That fix is deferred and not in this PR. Per @line-o's 2026-05-10 reply, XQuery 4.0 will allow user-defined union types per query context, so a static lookup table would either need per-context invalidation (re-introducing the allocation cost) or restriction to built-in types (limiting the gain). Best left until the XQ 4.0 user-defined-types story is settled.
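For contrast, the deferred boolean[][] option can be sketched as follows. This assumes built-in types are identified by small integer codes in a single-parent hierarchy (the parent array is hypothetical) and deliberately leaves unions out. The table answers subTypeOf with one array read, but only for the types known when it was built, so a per-context user-defined union would force a rebuild; it also costs n × n booleans of memory.

```java
// Sketch of the deferred option: a precomputed subtype table for a fixed set of
// built-in types. Type codes and the parent array are hypothetical; unions are not covered.
final class SubtypeTableSketch {
    private final boolean[][] table;

    // parent[t] is the direct supertype of type t, or -1 for the root.
    SubtypeTableSketch(final int[] parent) {
        final int n = parent.length;
        table = new boolean[n][n];
        for (int t = 0; t < n; t++) {
            // Mark t itself and every ancestor of t as a supertype of t.
            for (int s = t; s != -1; s = parent[s]) {
                table[t][s] = true;
            }
        }
    }

    // One array read replaces the recursive walk and the map lookups.
    boolean subTypeOf(final int subtype, final int supertype) {
        return table[subtype][supertype];
    }
}
```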

JMH numbers (TypeSubTypeOfBenchmark)

JMH 1.37, JDK 21.0.5 (Zulu), 1 fork, 3×1s warmup, 5×1s measurement, ns/op.

  Shape          Before    After       Δ    Notes
  identical       0.374    0.375     0.0%   early-return at the equality check, untouched
  directSuper     3.661    2.993   −18.2%   one supertype walk, two map consultations
  deepSuper      13.817   11.817   −14.5%   multi-level walk
  unionMember     3.575    2.661   −25.6%   supertype is a registered union (NUMERIC)
  unionSubtype   20.095   14.136   −29.7%   subtype is a registered union (NUMERIC)
  notSubType      7.433    5.171   −30.4%   walks to the root, returns false

The combined effect of the OpenHashMap swap and the containsKey+get collapse is biggest on the union-dispatch and walk paths, where every recursive step previously paid two linear-scan lookups.
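Why an array-backed map showed up in the profile can be illustrated with a toy linear-scan map. This sketch is loosely modelled on a map that locates keys by scanning (it is not fastutil's source); it counts key comparisons to show that each lookup costs up to one comparison per registered entry, and that containsKey followed by get on a hit pays for two scans. A hash-backed map such as Int2ObjectOpenHashMap instead does expected constant work per lookup, independent of the number of entries.

```java
// Toy array-backed map that finds keys by linear scan, loosely modelled on the
// behaviour described for Int2ObjectArrayMap (not its actual source).
final class LinearScanMapSketch<V> {
    private final int[] keys;
    private final Object[] values;
    private int size;
    long comparisons; // counts key comparisons, for illustration only

    LinearScanMapSketch(final int capacity) {
        keys = new int[capacity];
        values = new Object[capacity];
    }

    // Assumes distinct keys and enough capacity; no resizing in this sketch.
    void put(final int key, final V value) {
        keys[size] = key;
        values[size] = value;
        size++;
    }

    // Cost grows with the number of entries: up to `size` comparisons per call.
    private int findKey(final int key) {
        for (int i = 0; i < size; i++) {
            comparisons++;
            if (keys[i] == key) {
                return i;
            }
        }
        return -1;
    }

    boolean containsKey(final int key) {
        return findKey(key) >= 0;
    }

    @SuppressWarnings("unchecked")
    V get(final int key) {
        final int i = findKey(key);
        return i < 0 ? null : (V) values[i];
    }
}
```

Each recursive subTypeOf step performs two lookups (one for the supertype, one for the subtype), so even short scans are repeated at every level of the walk.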

Test plan

  • Targeted JUnit: mvn test -pl exist-core -Dtest='OpNumericTest,XPathQueryTest,xquery.xquery3.XQuery3Tests,UnknownAtomicTypeTest' → 1,181 tests, 0 failures, 0 errors.
  • Full module: mvn test -pl exist-core (under cross-session lock) — 6,592 tests, 0 failures, 0 errors, 106 skipped.
  • JMH TypeSubTypeOfBenchmark before/after, table above.
  • Codacy PMD on changed files. New: subTypeOf(int,int) NPath rises from 240 (already > 200) to 300 (still in the "moderate" 200–10,000 band that CLAUDE.md reserves for reviewer discretion). Happy to extract a helper or suppress with rationale if preferred.
  • XQTS QT4 / XQ 3.1 / FTTS deferred. The locally cached runner JAR shades a snapshot of exist-core, so a run against this branch's JAR would test the shaded copy rather than the modified Type.class. A fresh sbt assembly against develop fails because the runner now depends on org.exist-db:exist-expath, an artifact that doesn't exist yet on this branch. Given that the change is strictly semantics-preserving (one get with the same null check replaces containsKey + get, and the map is swapped for one with the same API), and that the JUnit suite covers the type-comparison surface, I think landing on JMH + JUnit is acceptable; happy to run XQTS against next-v3 (which has exist-expath) before merging if a reviewer wants a cross-check there.

Risks

  • Shape change of the field declaration: Int2ObjectMap<IntArraySet> → Int2ObjectOpenHashMap<IntArraySet>. The map is private, mutated only by the static initialiser, and read only via get / containsKey / isEmpty; no caller relies on ArrayMap's insertion-order iteration.
  • Empty-union semantics: defineUnionType(ERROR, new int[0]) registers ERROR with an empty member set. The new package-private overload of unionMembersHaveSuperType keeps the pre-existing members.isEmpty() → false short-circuit, so the empty-ERROR case behaves identically.
  • Public API: the existing subTypeOfUnion(int,int) / unionMembersHaveSuperType(int,int) signatures are untouched. The new overloads are package-private.

🤖 Generated with Claude Code

Comment thread exist-core/src/main/java/org/exist/xquery/value/Type.java Outdated
@line-o line-o requested a review from a team May 10, 2026 21:25
@line-o line-o added the performance bottlenecks, opportunities for rewriting, optimization label May 10, 2026
@line-o line-o added this to v7.0.0 and Wave 2 May 10, 2026
@github-project-automation github-project-automation Bot moved this to Todo in Wave 2 May 10, 2026
@line-o line-o added this to the eXist-7.0.0 milestone May 10, 2026
@line-o
Member

line-o commented May 10, 2026

Please rebase

joewiz and others added 2 commits May 10, 2026 20:30
Type.subTypeOf is on the hot path of every atomic comparison via
ValueComparison.compareAtomic, and a JFR profile of issue eXist-db#3406's
n-by-n FLWOR reproducer found Int2ObjectArrayMap.findKey at 20% of
self time in that workload. Two localized changes cut the overhead:

1. Switch the unionTypes map from Int2ObjectArrayMap to
   Int2ObjectOpenHashMap. The fastutil ArrayMap is a linear-scan
   implementation intended for very small maps; the HashMap variant
   gives O(1) lookup with the same Int2ObjectMap API and is now
   preferred for any non-trivial call volume.

2. Replace containsKey(...) + a subsequent get(...) inside
   subTypeOfUnion / unionMembersHaveSuperType with a single get()
   in the caller. The result is null-checked and forwarded to a
   new package-private overload that takes the pre-fetched members.
   The public no-args variants are unchanged for external callers.

An isEmpty() guard short-circuits both lookups when no union types
are registered. The static initialiser registers NUMERIC and ERROR
so the guarded branch is taken in current builds; the guard
protects against future evolution where union registration becomes
optional, e.g. for XQuery 4.0 user-defined unions (see eXist-db#6322).

JMH (TypeSubTypeOfBenchmark, JDK 21, 1 fork, 3x1s warm-up, 5x1s
measurement, ns/op):

  shape          before   after   delta
  identical       0.374   0.375    0.0%   (early-return, untouched)
  directSuper     3.661   2.993  -18.2%
  deepSuper      13.817  11.817  -14.5%
  unionMember     3.575   2.661  -25.6%
  unionSubtype   20.095  14.136  -29.7%
  notSubType      7.433   5.171  -30.4%

Out of scope: a precomputed boolean[][] subtype-of table (option 3
in eXist-db#6322). Per the issue thread, XQuery 4.0 will allow user-defined
union types per query context, so a static lookup table would
either need to be invalidated and rebuilt per-context (re-introducing
the allocation cost) or restricted to built-in types (limiting the
gain). Deferred until the XQ 4.0 user-defined-types story is settled.

Closes eXist-db#6322

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The unionTypes map is populated by the static initialiser (NUMERIC and ERROR
are registered as part of the spec-mandated type system) and is never empty
at runtime. The guard was protecting against a future "union registration
becomes optional" evolution that won't happen — those registrations are
spec-required. Removing dead-code overhead from a hot path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joewiz joewiz force-pushed the perf/6322-type-subtypeof-hotpath branch from e30cdad to 1f2c018 Compare May 11, 2026 00:31
@joewiz
Member Author

joewiz commented May 11, 2026

[This response was co-authored with Claude Code. -Joe]

Rebased onto current develop (4f09d0accc). One conflict resolved cleanly in exist-core-jmh/pom.xml — the annotation-processor-paths config has consolidated since this branch was last touched (PR #6296 + #6314 merged). I took develop's version; TypeSubTypeOfBenchmark builds clean against it.

Note: mvn install -pl exist-core-jmh -am fails on ReindexDeleteStrategyBenchmark (missing Lucene deps) — but that's a pre-existing develop-side issue from a recent merge, not introduced here. exist-core itself builds clean and TypeSubTypeOfBenchmark compiles fine in isolation.

Force-pushed 1f2c018d9f. Ready for re-review.

Member

@line-o line-o left a comment

Rebase didn't go completely right. All lines removed from exist-core-jmh/pom.xml need to be added again.

The rebase onto develop accidentally dropped exist-index-lucene,
lucene-core, junit-jupiter-api deps, the jsr305 annotation-processor
path, and the ServicesResourceTransformer entry. Reset the pom.xml
to develop's content; the consolidated annotation-processor-paths
config from eXist-db#6296 + eXist-db#6314 is sufficient.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joewiz
Member Author

joewiz commented May 11, 2026

[This response was co-authored with Claude Code. -Joe]

Re-added the dropped pom.xml lines by resetting the file to develop's content rather than re-deriving it. New tip: b8e2f65a71. exist-core-jmh now builds clean against the consolidated annotation-processor-paths config from #6296 + #6314.

@line-o line-o self-requested a review May 11, 2026 15:36
@line-o line-o added the xquery issue is related to xquery implementation label May 11, 2026
   }
-  if (unionTypes.containsKey(subtype)) {
-      return unionMembersHaveSuperType(subtype, supertype);
+  final IntArraySet subtypeMembers = unionTypes.get(subtype);
Member

Looking at this now for the umpteenth time:
When we try to determine whether union type A is a subtype of type B, I would argue we need a separate check:
one that checks that all members of A are subtypes of B.

It boils down to the example: Is numeric a subtype of xs:decimal or xs:double or both? And why is that the case?

What happens with a union type of xs:integer and xs:string?

Member

And just to be clear: This is a separate issue that I will open once we agree it is one. Not blocking this PR!
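The union-subtype question can be made concrete with a small, hypothetical model that evaluates two candidate rules for "is union A a subtype of B": any member is a subtype of B, or every member is. The string type names and single-parent hierarchy are simplified assumptions, numeric is modelled as xs:double | xs:float | xs:decimal, and the sketch does not show which rule eXist's unionMembersHaveSuperType currently applies.

```java
import java.util.List;
import java.util.Map;

// Contrasts two candidate rules for "is union type A a subtype of B?".
// Simplified, hypothetical hierarchy; not eXist's Type class.
final class UnionSubtypeRuleSketch {
    // Direct supertype of each atomic type in this toy hierarchy.
    static final Map<String, String> PARENT = Map.of(
            "xs:integer", "xs:decimal",
            "xs:decimal", "xs:anyAtomicType",
            "xs:double", "xs:anyAtomicType",
            "xs:float", "xs:anyAtomicType",
            "xs:string", "xs:anyAtomicType");

    // Walks the parent chain; the root has no entry, so get() returns null and the loop ends.
    static boolean atomicSubtypeOf(final String a, final String b) {
        for (String t = a; t != null; t = PARENT.get(t)) {
            if (t.equals(b)) {
                return true;
            }
        }
        return false;
    }

    // Rule 1: the union is a subtype of B if ANY member is.
    static boolean anyMember(final List<String> members, final String b) {
        return members.stream().anyMatch(m -> atomicSubtypeOf(m, b));
    }

    // Rule 2 (the check proposed in the review): EVERY member must be a subtype of B.
    static boolean everyMember(final List<String> members, final String b) {
        return !members.isEmpty() && members.stream().allMatch(m -> atomicSubtypeOf(m, b));
    }
}
```

Under the any-member rule, numeric counts as a subtype of both xs:decimal and xs:double, and xs:integer | xs:string counts as a subtype of xs:decimal. Under the every-member rule none of these hold, while numeric remains a subtype of xs:anyAtomicType.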

@line-o line-o self-requested a review May 11, 2026 15:52
Member

@line-o line-o left a comment

👍🏼

@line-o line-o requested a review from a team May 11, 2026 15:53
@duncdrum duncdrum moved this from Todo to In progress in Wave 2 May 11, 2026
Contributor

@duncdrum duncdrum left a comment

It's been a long day, but yolo looks good. Thx @joewiz

@duncdrum duncdrum merged commit 4af972d into eXist-db:develop May 11, 2026
9 checks passed
@github-project-automation github-project-automation Bot moved this to Done in v7.0.0 May 11, 2026
@github-project-automation github-project-automation Bot moved this from In progress to Done in Wave 2 May 11, 2026
@joewiz joewiz deleted the perf/6322-type-subtypeof-hotpath branch May 11, 2026 20:35

Labels

performance bottlenecks, opportunities for rewriting, optimization xquery issue is related to xquery implementation

Projects

v7.0.0 Status: Done
Wave 2 Status: Done

Development

Successfully merging this pull request may close these issues.

[perf] Type.subTypeOf calls Int2ObjectArrayMap.containsKey as linear scan on every comparison (~20% self-time)

3 participants