
feat(resolver): symbol-resolver SPI + Java backend (sub-project 1, Phases 1-4)#101

Merged
aksOps merged 23 commits into main from feat/sub-project-1-resolver-spi-and-java-pilot
Apr 28, 2026

aksOps commented Apr 27, 2026

Summary

Foundation for moving the codeiq graph from regex-level correctness to AST- and symbol-resolution-level correctness. Spec: docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md. Plan: docs/plans/2026-04-27-sub-project-1-resolver-spi-and-java-pilot.md.

This PR ships Phases 1–4 of the 8-phase plan in 18 atomic commits (~290 new tests):

  • Phase 1 — Schema foundation. New Confidence enum (LEXICAL/SYNTACTIC/RESOLVED with stable score() mapping), plus a source field on every CodeNode and CodeEdge. Round-trip through Neo4j (bare confidence/source properties on nodes and RELATES_TO relationships) and the H2 analysis cache (CACHE_VERSION 4→5; legacy v4 caches drop and rebuild). Read paths are non-throwing — legacy data without these fields reads back as LEXICAL/null, never NPEs.
  • Phase 2 — SPI scaffolding under intelligence/resolver/: Resolved interface + EmptyResolved singleton sentinel, SymbolResolver per-language backend, ResolutionException, ResolverRegistry Spring @Service (deterministic alphabetical bootstrap, case-insensitive lookup, per-resolver failure isolation, NOOP fallback), DetectorContext.resolved() opt-in accessor (Optional<Resolved>, Optional.empty() for every detector that doesn't care — fully backward compatible).
  • Phase 3 — Java backend. JavaSourceRootDiscovery walks Maven/Gradle/plain layouts under a project root (skipping target/, build/, node_modules/, .git/, etc.; symlink-loop-safe because the walk never follows symbolic links). JavaResolved record carries the parsed CompilationUnit and the JavaSymbolSolver. JavaSymbolResolver @Component builds a CombinedTypeSolver (ReflectionTypeSolver + per-source-root JavaParserTypeSolver) without mutating StaticJavaParser (doing so would conflict with the thread-local JavaParser pools used under virtual-thread concurrency). Adds javaparser-symbol-solver-core 3.28.0 (Apache-2.0, same release train as javaparser-core).
  • Phase 4 — Stamping wiring. Detector.defaultConfidence() declares the per-base floor (LEXICAL for regex bases, SYNTACTIC for AST/structured/JavaParser/JavaMessaging bases — TS/Python/PythonDb chains inherit from Antlr). DetectorEmissionDefaults.applyDefaults is wired into all three detector.detect() call sites in Analyzer.java. Every emission whose source is null gets stamped with source = detector.getClass().getSimpleName() and the per-base default confidence. Detectors that explicitly stamp (e.g. setConfidence(RESOLVED)) survive untouched — applyDefaults keys off source == null.

Deferred to follow-up PR (sub-project 1 Phase 5+):

  • ResolverRegistry.bootstrap(repoPath) call at the start of Analyzer.run() and per-file ctx.withResolved(resolver.resolve(file, ast)) threading — these pair with detector migrations.
  • Migrate 4–6 Java detectors (SpringServiceDetector, JpaEntityDetector, etc.) to consume ctx.resolved() and emit RESOLVED-tier INJECTS/MAPS_TO edges.
  • Aggressive testing layers 3–9 (concurrency stress, pathological, adversarial, jqwik property, PIT mutation, E2E petclinic).

Test plan

  • Unit tests for every new class (Confidence, ConfidenceConverter, Resolved, EmptyResolved, ResolutionException, SymbolResolver contract, ResolverRegistry, JavaSourceRootDiscovery, JavaResolved, JavaSymbolResolver, DetectorEmissionDefaults, DetectorContext.resolved())
  • Round-trip tests for Neo4j and H2 cache including legacy-data fallback, malformed values, mixed-case enum strings, and idempotency
  • mvn test full suite locally: 3555 tests, 0 failures, 0 errors, 31 skipped (E2E petclinic suite skipped — needs cloned external repo)
  • CI green (build + Trivy + Semgrep + OSV-Scanner + Gitleaks + Scorecard)
  • Verify Pinned-Dependencies, Token-Permissions, etc. still pass on this branch (no workflow changes in this PR, but the new dep is checked by Trivy)
  • Manual smoke test: codeiq index . && codeiq enrich . && codeiq serve . on a representative repo, confirm graph loads and confidence/source show up on nodes and edges via /api/nodes/{id}/detail
  • Backward compat: open an existing v4 H2 cache → confirm it drops cleanly and re-indexes

🤖 Generated with Claude Code

aksOps and others added 23 commits April 26, 2026 05:51
…nfidence schema

Brainstormed the first of 8 sub-projects in the "robust graph" decomposition.
Sub-project 1 introduces a symbol-resolution stage between parse and detect,
defines the per-language SymbolResolver SPI, ships a Java backend wrapping
JavaParser's JavaSymbolSolver, and adds Confidence (LEXICAL/SYNTACTIC/RESOLVED)
+ source fields on every CodeNode/CodeEdge with Neo4j round-trip and an
H2 cache version bump. Migrates 4-6 Java detectors as proof of value;
existing detectors compile and run unchanged via opt-in Optional<Resolved>.

Aggressive testing baked in: 9 layers (unit, detector x resolver, concurrency
stress, memory/pathological, adversarial, determinism, E2E petclinic
regression, property-based via jqwik, mutation testing via PIT). Backward
compatibility scoped to logical-content equality with explicit one-time
snapshot refreshes for the additive confidence/source fields.

Spec lives under docs/specs/ (alongside docs/project/) since
docs/superpowers/ is gitignored except for baselines/.

Awaiting maintainer review before writing the implementation plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The spec's §18 References used 3-level-up paths (../../../X) targeting
docs/superpowers/specs/ as the spec home. After relocating to docs/specs/
to respect the existing .gitignore policy, these paths resolved one level
above the repo root. Adjust to the correct depth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s/specs/

Doc-drift fixes (all from the same session audit):

  - 97 → 99 detectors (CLAUDE.md, README.md prose + mermaid)
  - NodeKind javadoc 32 → 34 (model/NodeKind.java; was stale by two)
  - EdgeKind javadoc 27 → 28 (model/EdgeKind.java; was stale by one)
  - Test count 3,219 → 3,270 across 236 files (README.md)
  - All counts now in sync across CLAUDE.md, README.md, PROJECT_SUMMARY.md,
    docs/project/data-model.md, docs/project/conventions.md, and the source
    javadocs.

New entries:

  - PROJECT_SUMMARY.md "Where to look next" gains docs/specs/ — pointer for
    in-flight architectural designs.
  - CHANGELOG.md [Unreleased] notes PROJECT_SUMMARY.md + docs/project/
    deep-dives, the docs/specs/ directory, and the doc-drift fix.
  - docs/project/data-model.md NodeKind/EdgeKind enum lists are now exact
    (no truncation, no stale "still claims 32" caveat).

Pre-existing IDE-detected warnings (unused imports in detector tests,
deprecated Notification import in GraphStoreTopologyAndStatsTest, dead
locals in GraphBuilder/GraphStore/PythonStructuresDetector etc.) are
out of scope for this commit — separate cleanup PR territory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
42 tasks across 8 phases (schema → SPI → Java backend → pipeline wiring →
configuration → detector migration → aggressive testing → docs+PR). Each
task is TDD-shaped: failing test → run → minimal impl → run → commit.

Lives at docs/plans/ alongside docs/specs/ (docs/superpowers/* is gitignored
on this repo).

Per the maintainer's "keep running" directive, execution starts immediately
with subagent-driven-development for parallelizable phases and inline for
foundational sequential ones.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per sub-project 1 spec §5.3. The numeric score() mapping is stable
(LEXICAL 0.6, SYNTACTIC 0.8, RESOLVED 0.95). Values compare by natural
(declaration) order. fromString() is case-insensitive and rejects null
and unknown values.
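For illustration, a minimal sketch of such an enum, assuming the three scores map to LEXICAL/SYNTACTIC/RESOLVED in declaration order. Only the names Confidence, score(), and fromString() come from the commit; the method bodies are guesses, not the PR's source. Because Java enums compare by declaration order, compareTo() ranks LEXICAL below SYNTACTIC below RESOLVED without extra code.

```java
import java.util.Locale;

// Sketch of the three confidence tiers; declaration order doubles as the natural order.
enum Confidence {
    LEXICAL(0.6),    // regex / text-pattern evidence
    SYNTACTIC(0.8),  // parse-tree evidence (ANTLR, JavaParser, YAML/JSON)
    RESOLVED(0.95);  // symbol-resolved evidence

    private final double score;

    Confidence(double score) {
        this.score = score;
    }

    /** Stable numeric mapping; must not change between releases. */
    double score() {
        return score;
    }

    /** Case-insensitive parse that rejects null and unknown names. */
    static Confidence fromString(String value) {
        if (value == null) {
            throw new IllegalArgumentException("confidence must not be null");
        }
        try {
            return Confidence.valueOf(value.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("unknown confidence: " + value, e);
        }
    }
}
```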

Plan task 1/42.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per sub-project 1 spec §5.2. Both fields are additive:

  - confidence: Confidence (default LEXICAL, never null after setter).
    Round-trips through Neo4j via ConfidenceConverter (mirrors
    NodeKindConverter — stored as enum.name() so Cypher filters like
    WHERE n.confidence = 'RESOLVED' work without case folding).
  - source: String (default null on bare construction; stamped by
    detector base classes during emission in a later task).

CodeNode is an SDN @Node entity with a no-arg constructor and setters, so
this task adapts to that shape rather than introducing a builder. The
plan's builder-based test was rewritten to use the existing API.

Plan task 2/42.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the CodeNode change: confidence defaults to LEXICAL and is never
null at rest (setter normalizes); source is the detector's simple class
name, stamped by the detector base classes during emission.

Per sub-project 1 spec §5.3 (Confidence/source on every edge) and plan
Task 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stores Confidence enum names ('LEXICAL'/'SYNTACTIC'/'RESOLVED') and the
detector source string as bare Neo4j properties on both nodes and
relationships, alongside layer/kind/module — not under the prop_* dynamic
properties prefix, since they are typed first-class fields on CodeNode and
CodeEdge.

Read path is non-throwing: missing or malformed values fall back to
LEXICAL (least committal), so legacy data persisted before these fields
existed reads back cleanly without a schema migration on the Neo4j side.
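A sketch of that lenient read path, assuming the stored properties arrive as a plain Map<String, Object>. ConfidenceReader, readConfidence, and readSource are hypothetical names, not GraphStore's actual methods; the point is that a missing, non-string, or unknown value degrades to LEXICAL instead of throwing, while source is read verbatim.

```java
import java.util.Locale;
import java.util.Map;

enum Confidence { LEXICAL, SYNTACTIC, RESOLVED }

// Hypothetical helper; the real hydration code lives elsewhere.
final class ConfidenceReader {
    private ConfidenceReader() {}

    /** Missing, non-string, or malformed values fall back to LEXICAL (least committal). */
    static Confidence readConfidence(Map<String, Object> props) {
        Object raw = props.get("confidence");
        if (!(raw instanceof String s)) {
            return Confidence.LEXICAL; // legacy data written before the field existed
        }
        try {
            return Confidence.valueOf(s.toUpperCase(Locale.ROOT)); // accepts 'ReSoLvEd'
        } catch (IllegalArgumentException malformed) {
            return Confidence.LEXICAL; // e.g. 'PERFECT'
        }
    }

    /** Source is read as-is: absent means null, empty stays empty. */
    static String readSource(Map<String, Object> props) {
        Object raw = props.get("source");
        return raw instanceof String s ? s : null;
    }
}
```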

Test coverage (11 new tests in GraphStoreConfidenceRoundTripTest):
- All three Confidence values round-trip on nodes (parameterized)
- Legacy nodes missing both fields fall back to LEXICAL + null
- Legacy nodes with source but missing confidence
- Malformed confidence strings (e.g. 'PERFECT') fall back without throw
- Mixed-case confidence ('ReSoLvEd') parses correctly
- Empty source preserved as empty (no silent normalization)
- Edge confidence + source round-trip via hydrateEdgesForNode
- Legacy edges with missing confidence/source fall back cleanly
- Malformed edge confidence does not throw

Forward-compat updates:
- ProvenanceNeo4jRoundTripTest now stubs the new keys (it uses strict Mockito stubbing)
- GraphStoreExtendedTest helper stubs them too

Per sub-project 1 spec §5 + plan Task 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Confidence + detector source now serialize through the H2 analysis cache
alongside the rest of the node/edge JSON blob. CACHE_VERSION is bumped
4→5 so any existing v4 caches drop and re-populate on next open.

Deviation from plan §Task 5: the plan suggested adding `confidence` and
`source` SQL columns. Skipped — they would only matter for SQL-level
filtering, which we don't do today, and the JSON `data` blob is the
authoritative shape on read. We can add columns later if a query layer
needs them. YAGNI for now; the version bump alone guarantees no stale
v4 rows leak through with the old shape.

Test coverage (12 new tests in AnalysisCacheConfidenceTest):
- All three Confidence values round-trip on nodes (parameterized)
- Bare nodes (no setter calls) round-trip as LEXICAL + null source
- Upsert overwrites confidence (no silent decay to older value)
- Clear → re-store preserves confidence
- All three Confidence values round-trip on edges (parameterized)
- Bare edges round-trip as LEXICAL + null
- setConfidence(null) is normalized to LEXICAL (never-null invariant)
- CACHE_VERSION reflection assertion guards against accidental rollback

Per sub-project 1 spec §5 + plan Task 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Detector.defaultConfidence() with a default-method floor of LEXICAL.
Each base class overrides where the floor differs:
  - AbstractRegexDetector → LEXICAL (regex patterns only)
  - AbstractAntlrDetector → SYNTACTIC (ANTLR parse trees)
  - AbstractStructuredDetector → SYNTACTIC (parsed YAML/JSON/TOML)
  - AbstractJavaParserDetector → SYNTACTIC (JavaParser AST)
  - AbstractJavaMessagingDetector → SYNTACTIC (java-aware regex)

AbstractTypeScriptDetector, AbstractPythonAntlrDetector, and
AbstractPythonDbDetector inherit SYNTACTIC via AbstractAntlrDetector — no
explicit override needed; tests verify the inherited values.

DetectorEmissionDefaults.applyDefaults(result, detector) is the new
stamping pass for the orchestrator. It writes source +
defaultConfidence() onto every node/edge whose getSource() is null —
the "detector didn't think about it" sentinel. Explicit stamps survive
the pass; e.g. a detector emitting RESOLVED is never down-graded back to
the base default.
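A minimal sketch of the stamping pass under stated assumptions: Emission, DetectorResult, and ExampleAstDetector are hypothetical stand-ins for CodeNode/CodeEdge, the detector result type, and an AST-based detector. The essential rule is that stamping keys off getSource() == null, which is what lets explicit stamps survive and makes the first stamp win.

```java
import java.util.ArrayList;
import java.util.List;

enum Confidence { LEXICAL, SYNTACTIC, RESOLVED }

// Hypothetical stand-in for CodeNode / CodeEdge: both carry confidence + source.
class Emission {
    private Confidence confidence = Confidence.LEXICAL;
    private String source;

    Confidence getConfidence() { return confidence; }
    void setConfidence(Confidence c) { this.confidence = (c == null) ? Confidence.LEXICAL : c; } // never null at rest
    String getSource() { return source; }
    void setSource(String s) { this.source = s; }
}

interface Detector {
    default Confidence defaultConfidence() { return Confidence.LEXICAL; } // floor for regex bases
}

// Hypothetical AST-based detector that raises its floor to SYNTACTIC.
class ExampleAstDetector implements Detector {
    @Override
    public Confidence defaultConfidence() { return Confidence.SYNTACTIC; }
}

record DetectorResult(List<Emission> nodes, List<Emission> edges) {}

final class DetectorEmissionDefaults {
    private DetectorEmissionDefaults() {}

    /** Stamps source + default confidence onto every emission whose source is still null. */
    static void applyDefaults(DetectorResult result, Detector detector) {
        if (result == null || detector == null) {
            return; // defensive no-op
        }
        String name = detector.getClass().getSimpleName();
        Confidence floor = detector.defaultConfidence();
        List<Emission> all = new ArrayList<>(result.nodes());
        all.addAll(result.edges());
        for (Emission e : all) {
            if (e.getSource() == null) { // the "detector didn't think about it" sentinel
                e.setSource(name);
                e.setConfidence(floor);
            }
        }
    }
}
```

Running applyDefaults again with a different detector changes nothing, because every emission already carries a non-null source after the first pass.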

Wiring this helper into Analyzer + IndexCommand is deferred to plan Task
19 (pipeline wiring). This commit only ships the building blocks.

Test coverage (19 new tests in DetectorEmissionDefaultsTest):
- Per-base default confidence (parameterized across 9 base/sub combos)
- Stamping fills source + confidence on null-source nodes and edges
- Explicit (RESOLVED + custom source) emissions survive the pass
- Mixed result (some explicit, some bare) handled per-emission
- null result → no-op (no NPE)
- null detector → no-op (defensive)
- Empty result → no-op
- Idempotent on repeat call with same detector
- Second pass with different detector does NOT relabel (first stamp wins)

Per sub-project 1 spec §5 + plan Task 6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…utionException

Foundation for the resolver pass that sits between parse and detect. Per-
language backends implement SymbolResolver; per-file results carry
language-specific resolution state via a Resolved subclass. EmptyResolved
is the singleton sentinel returned when resolution didn't happen — its
isAvailable() returns false so detectors short-circuit to syntactic
detection.

ResolutionException is checked, by design — symbol resolution has a long
tail of file-specific failures (corrupted source, classpath holes,
dependency cycles) and the orchestrator must explicitly decide whether
to skip the file or abort the pass. It carries (file, language) for
useful logs.
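Sketched below under assumptions: Resolved is reduced to isAvailable(), the exception's file is assumed to be a java.nio.file.Path, and ResolvePass.resolveOrSkip is a hypothetical orchestrator helper. It shows the "skip the file" choice that a checked ResolutionException forces the caller to make explicitly, with EmptyResolved.INSTANCE as the fallback so detectors never see null.

```java
import java.nio.file.Path;

// Reduced SPI shape: only isAvailable() is modeled here.
interface Resolved {
    boolean isAvailable();
}

/** Singleton sentinel meaning "resolution didn't happen"; detectors short-circuit on it. */
final class EmptyResolved implements Resolved {
    static final EmptyResolved INSTANCE = new EmptyResolved();

    private EmptyResolved() {}

    @Override
    public boolean isAvailable() {
        return false;
    }
}

/** Checked by design so the caller must choose between skipping the file and aborting. */
class ResolutionException extends Exception {
    private final Path file;       // assumption: the real class may use a different type
    private final String language;

    ResolutionException(String message, Path file, String language, Throwable cause) {
        super(message + " [file=" + file + ", language=" + language + "]", cause);
        this.file = file;
        this.language = language;
    }

    Path file() { return file; }
    String language() { return language; }
}

/** Hypothetical orchestrator helper showing the "skip the file" policy. */
final class ResolvePass {
    interface FileResolver {
        Resolved resolve(Path file) throws ResolutionException;
    }

    private ResolvePass() {}

    static Resolved resolveOrSkip(FileResolver resolver, Path file) {
        try {
            Resolved r = resolver.resolve(file);
            return r == null ? EmptyResolved.INSTANCE : r; // never hand null to detectors
        } catch (ResolutionException e) {
            System.err.println("WARN skipping resolution: " + e.getMessage());
            return EmptyResolved.INSTANCE; // detector falls back to syntactic detection
        }
    }
}
```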

No wiring yet — the orchestrator picks these up in Task 11
(ResolverRegistry) + Task 19 (pipeline wiring).

Test coverage (16 new tests):
- ResolvedContractTest (6): EmptyResolved singleton + reflection guards
- ResolutionExceptionTest (4): file/language/cause + checked-ness
- SymbolResolverContractTest (6): supportedLanguages non-empty,
  bootstrap-before-resolve, resolve never returns null (uses
  EmptyResolved for unsupported language / null AST), default shutdown
  no-op

Per sub-project 1 spec §6.1 + plan Tasks 8-10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spring @Service that mirrors DetectorRegistry: every @Component
implementing SymbolResolver is auto-injected via the constructor. Resolvers
are sorted alphabetically by class simple name for determinism; when two
resolvers claim the same language, the first in sort order wins.

bootstrap(projectRoot) iterates in deterministic order and is resilient
— per-resolver ResolutionException (or rogue RuntimeException) is logged
at WARN and swallowed so one broken resolver can't take down the pass.

resolverFor(language) is case-insensitive, null-safe, and never returns
null — unknown languages get the NOOP resolver that always returns
EmptyResolved.INSTANCE.
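A cut-down sketch of the registry behavior described above. The SymbolResolver interface here is reduced to supportedLanguages() and bootstrap(Path) (resolve() is omitted), and Spring's constructor injection is replaced by a plain constructor taking a List. Names match the commit, but the bodies are illustrative, not the PR's code.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class ResolutionException extends Exception {
    ResolutionException(String message) { super(message); }
}

// Reduced SPI: resolve() omitted for brevity.
interface SymbolResolver {
    Set<String> supportedLanguages();
    void bootstrap(Path projectRoot) throws ResolutionException;
}

final class ResolverRegistry {
    /** Fallback for unknown languages; a real NOOP would also return EmptyResolved.INSTANCE from resolve(). */
    static final SymbolResolver NOOP = new SymbolResolver() {
        @Override public Set<String> supportedLanguages() { return Set.of(); }
        @Override public void bootstrap(Path projectRoot) { }
    };

    private final List<SymbolResolver> sorted;
    private final Map<String, SymbolResolver> byLanguage = new HashMap<>();

    ResolverRegistry(List<SymbolResolver> resolvers) {
        sorted = new ArrayList<>(resolvers);
        // Deterministic order: alphabetical by class simple name.
        sorted.sort(Comparator.comparing(r -> r.getClass().getSimpleName()));
        for (SymbolResolver r : sorted) {
            for (String lang : r.supportedLanguages()) {
                if (lang == null || lang.isBlank()) continue;            // skip blank identifiers
                byLanguage.putIfAbsent(lang.toLowerCase(Locale.ROOT), r); // first in sort order wins
            }
        }
    }

    /** Case-insensitive, null-safe, never returns null. */
    SymbolResolver resolverFor(String language) {
        if (language == null) return NOOP;
        return byLanguage.getOrDefault(language.toLowerCase(Locale.ROOT), NOOP);
    }

    List<SymbolResolver> all() { return List.copyOf(sorted); }

    /** One broken resolver must not take down the pass: log and keep going. */
    void bootstrap(Path projectRoot) {
        for (SymbolResolver r : sorted) {
            try {
                r.bootstrap(projectRoot);
            } catch (ResolutionException | RuntimeException e) {
                System.err.println("WARN resolver " + r.getClass().getSimpleName() + " failed: " + e);
            }
        }
    }
}
```

Using putIfAbsent while iterating the sorted list is what yields "first in sort order wins"; lowercasing at both registration and lookup yields case-insensitive resolution.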

Test coverage (13 new tests in ResolverRegistryTest):
- Empty registry returns NOOP for any language
- Single resolver returned for its declared language
- Unknown language returns NOOP
- Case-insensitive lookup (java, Java, JAVA, jAvA)
- Null language returns NOOP without NPE
- resolverFor() never returns null (probed with empty/whitespace input)
- Blank language identifiers from a resolver are skipped
- Duplicate-language conflict: alphabetical-first wins
- all() returns sorted list
- bootstrap iterates alphabetically (verified via callback ordering)
- bootstrap continues past RuntimeException
- bootstrap continues past ResolutionException
- bootstrap empty registry is a no-op

Per sub-project 1 spec §6.1 + plan Task 11.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DetectorContext now carries an Optional<Resolved> as its 7th field. The
field is the opt-in entry point for the resolver pass — detectors that
want to upgrade emissions to RESOLVED check
ctx.resolved().filter(Resolved::isAvailable) before downcasting to a
language-specific Resolved subclass; detectors that don't care simply
ignore it.

Backward compat: all existing constructors (3-arg, 5-arg, 6-arg with
registry) still compile and work — they delegate to the canonical 7-arg
constructor with Optional.empty() for resolution. The compact
constructor normalizes a null Optional to Optional.empty() so the field
is never null at rest.

withResolved(Resolved) is the orchestrator's hook to attach per-file
resolution after the resolver pass.
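A reduced sketch of the record and the canonical detector check. The real DetectorContext has seven components; filePath, language, and content here are placeholders, and DemoResolved is a hypothetical stand-in for a language-specific subtype such as JavaResolved. ExampleDetector.describe is likewise illustrative.

```java
import java.util.Optional;

interface Resolved { boolean isAvailable(); }

// Hypothetical language-specific subtype (stands in for e.g. JavaResolved).
record DemoResolved(String payload) implements Resolved {
    public boolean isAvailable() { return true; }
}

/** Cut-down DetectorContext: three placeholder components plus the resolved slot. */
record DetectorContext(String filePath, String language, String content, Optional<Resolved> resolved) {

    // Compact constructor: a null Optional is normalized so the field is never null at rest.
    DetectorContext {
        resolved = (resolved == null) ? Optional.empty() : resolved;
    }

    // Legacy-style constructor: delegates to the canonical one with no resolution attached.
    DetectorContext(String filePath, String language, String content) {
        this(filePath, language, content, Optional.empty());
    }

    /** Returns a copy carrying r; null clears the resolution back to empty. */
    DetectorContext withResolved(Resolved r) {
        return new DetectorContext(filePath, language, content, Optional.ofNullable(r));
    }
}

final class ExampleDetector {
    private ExampleDetector() {}

    /** Canonical opt-in check: only downcast once resolution is actually available. */
    static String describe(DetectorContext ctx) {
        return ctx.resolved()
                .filter(Resolved::isAvailable)
                .filter(DemoResolved.class::isInstance)
                .map(DemoResolved.class::cast)
                .map(r -> "RESOLVED via " + r.payload())
                .orElse("SYNTACTIC fallback");
    }
}
```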

Test coverage (10 new tests in DetectorContextResolvedTest):
- 3-arg / 5-arg / 6-arg constructors all default resolved to empty
- 7-arg canonical constructor carries the attached Resolved
- Compact constructor normalizes null → Optional.empty()
- withResolved(r) returns a copy with Optional.of(r); base untouched
- withResolved(null) clears the resolution back to empty
- EmptyResolved attached: present but isAvailable()==false
- withResolved preserves all other fields
- Documents the canonical detector check pattern

Verified no regressions: 1970 detector tests all pass.

Per sub-project 1 spec §6.1 + plan Task 12.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same release train as the existing javaparser-core 3.28.0 — required by
the upcoming JavaSymbolResolver (sub-project 1, plan Task 17). Pulls in
JavaSymbolSolver, CombinedTypeSolver, ReflectionTypeSolver, and
JavaParserTypeSolver. Apache-2.0 license, no transitive surprises:
'mvn dependency:tree -Dincludes=com.github.javaparser' shows only the
two artifacts at 3.28.0.

Per sub-project 1 plan Task 14.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walks a project root for the canonical src/(main|test)/java directories
that Maven and Gradle both standardize on. Multi-module projects work
naturally — every nested src/main/java is a separate root. Plain
projects (no build file) fall back to top-level src/ if it has any
*.java.

Determinism: results sorted alphabetically by absolute path. Same tree →
same root list → same CombinedTypeSolver → same resolution.

Symlink safety: Files.walkFileTree runs with FOLLOW_LINKS disabled, so
loops cannot form. The trade-off — source roots reachable only via
symlink are skipped — is the right call for resolution where double-
counting via symlink would be worse.

Skip directories: target, build, out, bin, dist, .git, .gradle, .idea,
.vscode, .m2, .cache, node_modules, .codeiq — phantom src/main/java
inside any of these is ignored.
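A sketch of the discovery walk, assuming the plain-layout fallback applies whenever no src/(main|test)/java root is found (the commit says discovery does not read build files). The class and method names mirror the commit, but the code is illustrative, not the PR's implementation. Passing no FileVisitOption to Files.walkFileTree is what keeps symlinked directories from being traversed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

final class JavaSourceRootDiscovery {
    private static final Set<String> SKIP = Set.of("target", "build", "out", "bin", "dist", ".git",
            ".gradle", ".idea", ".vscode", ".m2", ".cache", "node_modules", ".codeiq");

    private JavaSourceRootDiscovery() {}

    static List<Path> discover(Path projectRoot) {
        if (projectRoot == null || !Files.isDirectory(projectRoot)) {
            return List.of(); // null, missing, or file-not-dir: empty, no exceptions
        }
        List<Path> roots = new ArrayList<>();
        try {
            // No FOLLOW_LINKS option is passed, so symlinked directories are never entered.
            Files.walkFileTree(projectRoot, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                    if (!dir.equals(projectRoot) && SKIP.contains(dir.getFileName().toString())) {
                        return FileVisitResult.SKIP_SUBTREE; // ignores phantom roots in skip-dirs
                    }
                    if (isMavenStyleJavaRoot(dir)) {
                        roots.add(dir.toAbsolutePath());
                        return FileVisitResult.SKIP_SUBTREE; // no roots inside a root
                    }
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFileFailed(Path file, IOException exc) {
                    return FileVisitResult.CONTINUE; // unreadable entries are ignored
                }
            });
        } catch (IOException e) {
            return List.of();
        }
        if (roots.isEmpty()) {
            Path src = projectRoot.resolve("src");
            if (containsJava(src)) roots.add(src.toAbsolutePath()); // plain-layout fallback
        }
        roots.sort(Comparator.comparing(Path::toString)); // deterministic: sorted by absolute path
        return roots;
    }

    // Matches .../src/main/java and .../src/test/java (not src/main/kotlin).
    private static boolean isMavenStyleJavaRoot(Path dir) {
        int n = dir.getNameCount();
        if (n < 3) return false;
        String kind = dir.getName(n - 2).toString();
        return dir.getName(n - 1).toString().equals("java")
                && (kind.equals("main") || kind.equals("test"))
                && dir.getName(n - 3).toString().equals("src");
    }

    private static boolean containsJava(Path dir) {
        if (!Files.isDirectory(dir)) return false;
        try (Stream<Path> s = Files.walk(dir)) {
            return s.anyMatch(p -> p.toString().endsWith(".java"));
        } catch (IOException | UncheckedIOException e) {
            return false;
        }
    }
}
```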

Test coverage (18 new tests in JavaSourceRootDiscoveryTest):
- Maven single-module returns [src/main/java, src/test/java]
- Maven main-only returns just main
- Maven multi-module aggregates all submodules (sorted)
- Gradle layout matches Maven (discovery doesn't read build files)
- Plain layout fallback: src/ with .java becomes the root
- Plain layout without .java returns empty
- Empty / non-existent / null / file-not-dir all return empty (no exceptions)
- target/, build/, node_modules/, .git/, .gradle/, .idea/ are skipped
- Phantom src/main/java inside skip-dirs is NOT picked up
- Results sorted alphabetically (verified across 3 modules)
- Discovery is idempotent
- Symlink loop terminates without exception (POSIX only — DisabledOnOs Windows)
- Deeply-nested modules found
- src/main/kotlin is NOT mistaken for a Java root

Per sub-project 1 plan Task 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JavaResolved is a record carrying the parsed CompilationUnit + the
configured JavaSymbolSolver. isAvailable() == true and sourceConfidence()
== RESOLVED — detectors that downcast to it can stamp emissions at the
RESOLVED tier.

JavaSymbolResolver is the @Component that bootstraps a CombinedTypeSolver
(ReflectionTypeSolver + per-source-root JavaParserTypeSolver) using
JavaSourceRootDiscovery for the root list. Source roots are sorted
alphabetically → deterministic solver wiring → same resolution every
run.

Deliberately NOT mutating StaticJavaParser.getParserConfiguration() —
that would conflict with AbstractJavaParserDetector's thread-local
JavaParser pool under virtual-thread concurrency. Detectors that want
the solver attached to their own JavaParser get it via
JavaSymbolResolver.symbolSolver() and configure their own
ParserConfiguration.

Test coverage (23 new tests):

JavaResolvedTest (6):
- isAvailable() == true
- sourceConfidence() == RESOLVED
- cu() / solver() accessors
- implements Resolved
- distinct from EmptyResolved.INSTANCE

JavaSymbolResolverTest (17, Layer 1 unit):
- supports "java" only
- bootstrap empty project still builds ReflectionTypeSolver
- bootstrap with source roots adds JavaParserTypeSolver per root
- bootstrap is repeatable (fresh CTS each call)
- combinedTypeSolver() null before bootstrap
- resolve before bootstrap → EmptyResolved (graceful)
- resolve null file → EmptyResolved
- resolve non-Java file → EmptyResolved
- resolve null AST → EmptyResolved
- resolve String AST (wrong type) → EmptyResolved (no ClassCastException)
- language match is case-insensitive ("Java")
- resolve valid CU → JavaResolved
- JavaResolved carries the input cu and the bootstrapped solver
- Solver smoke test: resolves java.lang.String via ReflectionTypeSolver
- Solver smoke test: resolves project class from source root
- resolve() doesn't cache — distinct JavaResolved per call

Per sub-project 1 plan Tasks 16-18.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the orchestrator stamping pass into all three detect() call sites
in Analyzer.java (the main pipeline, the cache-aware runBatchedIndex
path, and the regex-fallback path). Every emission whose source is null
now gets stamped with:
  - source = detector.getClass().getSimpleName()
  - confidence = detector.defaultConfidence() (LEXICAL for regex bases,
    SYNTACTIC for AST/structured bases)

Detectors that stamp explicitly (e.g. setConfidence(RESOLVED) once a
detector migrates to ctx.resolved()) are left alone — applyDefaults
keys off source==null.

Deferred from this commit (will land with Phase 5 detector migration):
- ResolverRegistry.bootstrap(repoPath) call at the start of run() —
  pointless without detectors that consume ctx.resolved()
- Per-file ctx = ctx.withResolved(resolver.resolve(file, ast)) — same

This commit is purely additive: 2417 tests in analyzer + cli + detector
packages all pass, no regressions. The full 3555-test suite is green
post-stamping, confirming existing detector behavior is unchanged
(detectors don't stamp confidence/source today, so the stamping floor
applies uniformly).

IndexCommand also benefits transparently — it calls
analyzer.runSmartIndex() which routes through one of the wired detect
sites.

Per sub-project 1 plan Tasks 19-20 (stamping portion).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the cross-cutting Confidence + source schema change, the
intelligence/resolver/ SPI surface, the Java backend
(javaparser-symbol-solver-core), DetectorContext.resolved() opt-in, and
the per-base confidence floor wired through Analyzer's emission path.
Detector migrations to consume ctx.resolved() are explicitly called out
as Phase 5 follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
socket-security commented:

Review the following changes in direct dependencies.

Added maven/com.github.javaparser/javaparser-symbol-solver-core@3.28.0
  Supply Chain Security: 36, Vulnerability: 100, Quality: 90, Maintenance: 100, License: 100

aksOps merged commit 5a46598 into main Apr 28, 2026
13 checks passed
aksOps deleted the feat/sub-project-1-resolver-spi-and-java-pilot branch April 28, 2026 05:20