…nfidence schema Brainstormed the first of 8 sub-projects in the "robust graph" decomposition. Sub-project 1 introduces a symbol-resolution stage between parse and detect, defines the per-language SymbolResolver SPI, ships a Java backend wrapping JavaParser's JavaSymbolSolver, and adds Confidence (LEXICAL/SYNTACTIC/RESOLVED) + source fields on every CodeNode/CodeEdge with Neo4j round-trip and an H2 cache version bump. Migrates 4-6 Java detectors as proof of value; existing detectors compile and run unchanged via opt-in Optional<Resolved>. Aggressive testing baked in: 9 layers (unit, detector x resolver, concurrency stress, memory/pathological, adversarial, determinism, E2E petclinic regression, property-based via jqwik, mutation testing via PIT). Backward compatibility scoped to logical-content equality with explicit one-time snapshot refreshes for the additive confidence/source fields. Spec lives under docs/specs/ (alongside docs/project/) since docs/superpowers/ is gitignored except for baselines/. Awaiting maintainer review before writing the implementation plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The spec's §18 References used 3-level-up paths (../../../X) targeting docs/superpowers/specs/ as the spec home. After relocating to docs/specs/ to respect the existing .gitignore policy, these paths resolved one level above the repo root. Adjust to the correct depth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s/specs/
Doc-drift fixes (all from same session audit):
- 97 → 99 detectors (CLAUDE.md, README.md prose + mermaid)
- NodeKind javadoc 32 → 34 (model/NodeKind.java; was stale by two)
- EdgeKind javadoc 27 → 28 (model/EdgeKind.java; was stale by one)
- Test count 3,219 → 3,270 across 236 files (README.md)
- All counts now in sync across CLAUDE.md, README.md, PROJECT_SUMMARY.md,
docs/project/data-model.md, docs/project/conventions.md, and the source
javadocs.
New entries:
- PROJECT_SUMMARY.md "Where to look next" gains docs/specs/ — pointer for
in-flight architectural designs.
- CHANGELOG.md [Unreleased] notes PROJECT_SUMMARY.md + docs/project/
deep-dives, the docs/specs/ directory, and the doc-drift fix.
- docs/project/data-model.md NodeKind/EdgeKind enum lists are now exact
(no truncation, no stale "still claims 32" caveat).
Pre-existing IDE-detected warnings (unused imports in detector tests,
deprecated Notification import in GraphStoreTopologyAndStatsTest, dead
locals in GraphBuilder/GraphStore/PythonStructuresDetector etc.) are
out of scope for this commit — separate cleanup PR territory.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
42 tasks across 8 phases (schema → SPI → Java backend → pipeline wiring → configuration → detector migration → aggressive testing → docs+PR). Each task is TDD-shaped: failing test → run → minimal impl → run → commit. Lives at docs/plans/ alongside docs/specs/ (docs/superpowers/* is gitignored on this repo). Per the maintainer's "keep running" directive, execution starts immediately with subagent-driven-development for parallelizable phases and inline for foundational sequential ones. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per sub-project 1 spec §5.3. Numeric score() mapping stable (0.6/0.8/0.95). Comparable by natural order. fromString() is case-insensitive and rejects null + unknown values. Plan task 1/42. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
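A minimal sketch of what an enum with these properties could look like. The scores and the three constants come from the commit message; the exception type thrown by fromString() is an assumption, since the message only says null and unknown values are rejected.

```java
import java.util.Locale;

/** Illustrative sketch only; not the project's actual source. */
enum Confidence {
    // Declaration order gives the natural ordering: LEXICAL < SYNTACTIC < RESOLVED.
    LEXICAL(0.6),
    SYNTACTIC(0.8),
    RESOLVED(0.95);

    private final double score;

    Confidence(double score) {
        this.score = score;
    }

    /** Stable numeric mapping quoted in the commit message. */
    double score() {
        return score;
    }

    /** Case-insensitive parse; rejects null and unknown names (exception type assumed). */
    static Confidence fromString(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("confidence must not be null");
        }
        try {
            return valueOf(raw.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("unknown confidence: " + raw, e);
        }
    }
}
```

Because enums are Comparable by declaration order, "comparable by natural order" falls out of listing the tiers from least to most committal.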
Per sub-project 1 spec §5.2. Both fields are additive:
- confidence: Confidence (default LEXICAL, never null after setter).
Round-trips through Neo4j via ConfidenceConverter (mirrors
NodeKindConverter — stored as enum.name() so Cypher filters like
WHERE n.confidence = 'RESOLVED' work without case folding).
- source: String (default null on bare construction; stamped by
detector base classes during emission in a later task).
CodeNode is an SDN @Node entity with no-arg constructor + setters, so
this task adapts to that shape rather than introducing a builder. The
plan's builder-based test was rewritten to use the existing API.
Plan task 2/42.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
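The never-null invariant described above can be sketched as follows. NodeSketch is a hypothetical stand-in: the real CodeNode carries many more fields and its SDN mapping annotations, none of which are reproduced here.

```java
enum Confidence { LEXICAL, SYNTACTIC, RESOLVED }

/** Hypothetical stand-in for the entity shape: no-arg constructor plus setters. */
class NodeSketch {
    private Confidence confidence = Confidence.LEXICAL; // default floor on bare construction
    private String source;                              // stays null until a detector stamps it

    NodeSketch() {}

    Confidence getConfidence() {
        return confidence;
    }

    /** Normalizes null to LEXICAL so the field is never null after the setter runs. */
    void setConfidence(Confidence confidence) {
        this.confidence = (confidence == null) ? Confidence.LEXICAL : confidence;
    }

    String getSource() {
        return source;
    }

    void setSource(String source) {
        this.source = source;
    }
}
```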
Mirrors the CodeNode change: confidence defaults to LEXICAL and is never null at rest (setter normalizes); source is the detector's simple class name, stamped by the detector base classes during emission. Per sub-project 1 spec §5.3 (Confidence/source on every edge) and plan Task 3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stores Confidence enum names ('LEXICAL'/'SYNTACTIC'/'RESOLVED') and the
detector source string as bare Neo4j properties on both nodes and
relationships, alongside layer/kind/module — not under the prop_* dynamic
properties prefix, since they are typed first-class fields on CodeNode and
CodeEdge.
Read path is non-throwing: missing or malformed values fall back to
LEXICAL (least committal), so legacy data persisted before these fields
existed reads back cleanly without a schema migration on the Neo4j side.
Test coverage (11 new tests in GraphStoreConfidenceRoundTripTest):
- All three Confidence values round-trip on nodes (parameterized)
- Legacy nodes missing both fields fall back to LEXICAL + null
- Legacy nodes with source but missing confidence
- Malformed confidence strings (e.g. 'PERFECT') fall back without throw
- Mixed-case confidence ('ReSoLvEd') parses correctly
- Empty source preserved as empty (no silent normalization)
- Edge confidence + source round-trip via hydrateEdgesForNode
- Legacy edges with missing confidence/source fall back cleanly
- Malformed edge confidence does not throw
Forward-compat updates:
- ProvenanceNeo4jRoundTripTest stubs the new keys (was strict-Mockito)
- GraphStoreExtendedTest helper stubs them too
Per sub-project 1 spec §5 + plan Task 4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
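A sketch of the non-throwing read path, assuming the stored properties arrive as a plain Map. LenientConfidenceReader and its method names are hypothetical; the actual hydration code in GraphStore is not shown in this PR text.

```java
import java.util.Locale;
import java.util.Map;

enum Confidence { LEXICAL, SYNTACTIC, RESOLVED }

/** Hypothetical helper illustrating the lenient fallback; not the project's GraphStore code. */
final class LenientConfidenceReader {
    private LenientConfidenceReader() {}

    /** Missing, non-string, or malformed values fall back to LEXICAL instead of throwing. */
    static Confidence readConfidence(Map<String, Object> props) {
        Object raw = props.get("confidence");
        if (!(raw instanceof String)) {
            return Confidence.LEXICAL;
        }
        try {
            return Confidence.valueOf(((String) raw).toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return Confidence.LEXICAL; // e.g. 'PERFECT'
        }
    }

    /** Source is passed through as stored: absent stays null, empty stays empty. */
    static String readSource(Map<String, Object> props) {
        Object raw = props.get("source");
        return (raw instanceof String) ? (String) raw : null;
    }
}
```

Falling back to the least committal tier means legacy rows can never be read back as more trustworthy than they were when written.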
Confidence + detector source now serialize through the H2 analysis cache
alongside the rest of the node/edge JSON blob. CACHE_VERSION is bumped 4→5
so any existing v4 caches drop and re-populate on next open.
Deviation from plan §Task 5: the plan suggested adding `confidence` and
`source` SQL columns. Skipped: they would only matter for SQL-level
filtering, which we don't do today, and the JSON `data` blob is the
authoritative shape on read. We can add columns later if a query layer
needs them. YAGNI for now; the version bump alone guarantees no stale v4
rows leak through with the old shape.
Test coverage (12 new tests in AnalysisCacheConfidenceTest):
- All three Confidence values round-trip on nodes (parameterized)
- Bare nodes (no setter calls) round-trip as LEXICAL + null source
- Upsert overwrites confidence (no silent decay to older value)
- Clear → re-store preserves confidence
- All three Confidence values round-trip on edges (parameterized)
- Bare edges round-trip as LEXICAL + null
- setConfidence(null) is normalized to LEXICAL (never-null invariant)
- CACHE_VERSION reflection assertion guards against accidental rollback
Per sub-project 1 spec §5 + plan Task 5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Detector.defaultConfidence() with a default-method floor of LEXICAL.
Each base class overrides where the floor differs:
- AbstractRegexDetector → LEXICAL (regex patterns only)
- AbstractAntlrDetector → SYNTACTIC (ANTLR parse trees)
- AbstractStructuredDetector → SYNTACTIC (parsed YAML/JSON/TOML)
- AbstractJavaParserDetector → SYNTACTIC (JavaParser AST)
- AbstractJavaMessagingDetector → SYNTACTIC (java-aware regex)
AbstractTypeScriptDetector, AbstractPythonAntlrDetector, and
AbstractPythonDbDetector inherit SYNTACTIC via AbstractAntlrDetector, so no
explicit override is needed; tests verify the inherited values.
DetectorEmissionDefaults.applyDefaults(result, detector) is the new
stamping pass for the orchestrator. It writes source + defaultConfidence()
onto every node/edge whose getSource() is null (the "detector didn't think
about it" sentinel). Explicit stamps survive the pass; e.g. a detector
emitting RESOLVED is never down-graded back to the base default.
Wiring this helper into Analyzer + IndexCommand is deferred to plan Task 19
(pipeline wiring). This commit only ships the building blocks.
Test coverage (19 new tests in DetectorEmissionDefaultsTest):
- Per-base default confidence (parameterized across 9 base/sub combos)
- Stamping fills source + confidence on null-source nodes and edges
- Explicit (RESOLVED + custom source) emissions survive the pass
- Mixed result (some explicit, some bare) handled per-emission
- null result → no-op (no NPE)
- null detector → no-op (defensive)
- Empty result → no-op
- Idempotent on repeat call with same detector
- Second pass with different detector does NOT relabel (first stamp wins)
Per sub-project 1 spec §5 + plan Task 6.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
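The stamping rule can be sketched roughly as below. Emission, StampingSketch, and the two detector classes are hypothetical simplifications: the real pass operates on a detector result holding CodeNode and CodeEdge objects, whose shape is not shown here.

```java
import java.util.List;

enum Confidence { LEXICAL, SYNTACTIC, RESOLVED }

/** Hypothetical minimal emission; stands in for both CodeNode and CodeEdge. */
class Emission {
    Confidence confidence = Confidence.LEXICAL;
    String source; // null is the "detector didn't think about it" sentinel
}

interface Detector {
    /** Default-method floor; base classes override where the floor differs. */
    default Confidence defaultConfidence() {
        return Confidence.LEXICAL;
    }
}

class RegexDetectorSketch implements Detector {}

class AstDetectorSketch implements Detector {
    @Override
    public Confidence defaultConfidence() {
        return Confidence.SYNTACTIC;
    }
}

final class StampingSketch {
    private StampingSketch() {}

    /** Stamps only emissions whose source is null; explicit stamps are left alone. */
    static void applyDefaults(List<Emission> result, Detector detector) {
        if (result == null || detector == null) {
            return; // defensive no-op
        }
        for (Emission e : result) {
            if (e.source == null) {
                e.source = detector.getClass().getSimpleName();
                e.confidence = detector.defaultConfidence();
            }
        }
    }
}
```

Keying off source == null is what makes the pass idempotent: once an emission has a source, a second pass (even with a different detector) leaves it unchanged.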
…utionException
Foundation for the resolver pass that sits between parse and detect.
Per-language backends implement SymbolResolver; per-file results carry
language-specific resolution state via a Resolved subclass. EmptyResolved
is the singleton sentinel returned when resolution didn't happen; its
isAvailable() returns false so detectors short-circuit to syntactic
detection.
ResolutionException is checked, by design: symbol resolution has a long
tail of file-specific failures (corrupted source, classpath holes,
dependency cycles) and the orchestrator must explicitly decide whether to
skip the file or abort the pass. It carries (file, language) for useful
logs.
No wiring yet; the orchestrator picks these up in Task 11
(ResolverRegistry) + Task 19 (pipeline wiring).
Test coverage (16 new tests):
- ResolvedContractTest (6): EmptyResolved singleton + reflection guards
- ResolutionExceptionTest (4): file/language/cause + checked-ness
- SymbolResolverContractTest (6): supportedLanguages non-empty,
bootstrap-before-resolve, resolve never returns null (uses EmptyResolved
for unsupported language / null AST), default shutdown no-op
Per sub-project 1 spec §6.1 + plan Tasks 8-10.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
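One way the SPI surface described above might be shaped. The type names Resolved, EmptyResolved, SymbolResolver, and ResolutionException come from the commit message; the method signatures, the SourceFile record, and ToyResolver are assumptions made for illustration.

```java
import java.nio.file.Path;
import java.util.Locale;
import java.util.Set;

/** Per-file resolution result; language backends would provide their own subtype. */
interface Resolved {
    boolean isAvailable();
}

/** Singleton sentinel for "resolution did not happen". */
final class EmptyResolved implements Resolved {
    static final EmptyResolved INSTANCE = new EmptyResolved();
    private EmptyResolved() {}
    @Override public boolean isAvailable() { return false; }
}

/** Checked on purpose: callers must choose between skipping the file and aborting. */
class ResolutionException extends Exception {
    private final Path file;
    private final String language;

    ResolutionException(String message, Path file, String language, Throwable cause) {
        super(message, cause);
        this.file = file;
        this.language = language;
    }

    Path file() { return file; }
    String language() { return language; }
}

/** Hypothetical file handle; the pipeline's real file type is not shown in this PR. */
record SourceFile(Path path, String language) {}

interface SymbolResolver {
    Set<String> supportedLanguages();
    void bootstrap(Path projectRoot) throws ResolutionException;
    Resolved resolve(SourceFile file, Object parsedAst) throws ResolutionException;
    default void shutdown() {} // no-op unless a backend holds resources
}

/** Toy backend used only to exercise the contract. */
final class ToyResolver implements SymbolResolver {
    private boolean bootstrapped;

    @Override public Set<String> supportedLanguages() { return Set.of("java"); }

    @Override public void bootstrap(Path projectRoot) { bootstrapped = true; }

    @Override public Resolved resolve(SourceFile file, Object parsedAst) {
        boolean supported = file != null && file.language() != null
                && supportedLanguages().contains(file.language().toLowerCase(Locale.ROOT));
        if (!bootstrapped || !supported || parsedAst == null) {
            return EmptyResolved.INSTANCE; // contract: never return null
        }
        return () -> true; // stand-in for a language-specific Resolved
    }
}
```

Returning the sentinel rather than null lets callers write a single isAvailable() check instead of guarding against both null and "not resolved".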
Spring @Service that mirrors DetectorRegistry: every @Component
implementing SymbolResolver is auto-injected via constructor. Resolvers are
sorted alphabetically by class simple name for determinism; when two
resolvers declare the same language, the first in sort order wins.
bootstrap(projectRoot) iterates in deterministic order and is resilient:
a per-resolver ResolutionException (or rogue RuntimeException) is logged at
WARN and swallowed so one broken resolver can't take down the pass.
resolverFor(language) is case-insensitive, null-safe, and never returns
null; unknown languages get the NOOP resolver that always returns
EmptyResolved.INSTANCE.
Test coverage (13 new tests in ResolverRegistryTest):
- Empty registry returns NOOP for any language
- Single resolver returned for its declared language
- Unknown language returns NOOP
- Case-insensitive lookup (java, Java, JAVA, jAvA)
- Null language returns NOOP without NPE
- resolverFor() never returns null (probed with empty/whitespace input)
- Blank language identifiers from a resolver are skipped
- Duplicate-language conflict: alphabetical-first wins
- all() returns sorted list
- bootstrap iterates alphabetically (verified via callback ordering)
- bootstrap continues past RuntimeException
- bootstrap continues past ResolutionException
- bootstrap empty registry is a no-op
Per sub-project 1 spec §6.1 + plan Task 11.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
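The lookup and bootstrap behaviour can be sketched in plain Java as below. This is an illustration under assumptions: Spring wiring and logging are left out (a System.err line stands in for the WARN log), the SymbolResolver interface is cut down to what the sketch needs, and RegistrySketch, DemoResolver, AlphaResolver, and BetaResolver are hypothetical names.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

interface Resolved { boolean isAvailable(); }

enum EmptyResolved implements Resolved {
    INSTANCE;
    @Override public boolean isAvailable() { return false; }
}

class ResolutionException extends Exception {
    ResolutionException(String message) { super(message); }
}

/** Simplified SPI for this sketch only. */
interface SymbolResolver {
    Set<String> supportedLanguages();
    void bootstrap(Path projectRoot) throws ResolutionException;
    Resolved resolve(Object file, Object ast);
}

final class RegistrySketch {
    /** Fallback for unknown languages: always answers with the empty sentinel. */
    static final SymbolResolver NOOP = new SymbolResolver() {
        @Override public Set<String> supportedLanguages() { return Set.of(); }
        @Override public void bootstrap(Path projectRoot) {}
        @Override public Resolved resolve(Object file, Object ast) { return EmptyResolved.INSTANCE; }
    };

    private final List<SymbolResolver> sorted;
    private final Map<String, SymbolResolver> byLanguage = new HashMap<>();

    RegistrySketch(List<SymbolResolver> resolvers) {
        sorted = new ArrayList<>(resolvers);
        sorted.sort(Comparator.comparing((SymbolResolver r) -> r.getClass().getSimpleName()));
        for (SymbolResolver r : sorted) {
            for (String lang : r.supportedLanguages()) {
                if (lang == null || lang.isBlank()) continue;              // skip blank identifiers
                byLanguage.putIfAbsent(lang.toLowerCase(Locale.ROOT), r);  // first in sort order wins
            }
        }
    }

    SymbolResolver resolverFor(String language) {
        if (language == null) return NOOP;
        return byLanguage.getOrDefault(language.trim().toLowerCase(Locale.ROOT), NOOP);
    }

    /** One failing resolver is reported and skipped; the rest still bootstrap. */
    void bootstrap(Path projectRoot) {
        for (SymbolResolver r : sorted) {
            try {
                r.bootstrap(projectRoot);
            } catch (ResolutionException | RuntimeException e) {
                System.err.println("WARN " + r.getClass().getSimpleName() + " failed: " + e.getMessage());
            }
        }
    }
}

/** Demo resolvers that record bootstrap order; AlphaResolver fails on purpose. */
abstract class DemoResolver implements SymbolResolver {
    static final List<String> BOOTSTRAP_LOG = new ArrayList<>();
    @Override public Set<String> supportedLanguages() { return Set.of("java"); }
    @Override public Resolved resolve(Object file, Object ast) { return EmptyResolved.INSTANCE; }
}

class AlphaResolver extends DemoResolver {
    @Override public void bootstrap(Path root) {
        BOOTSTRAP_LOG.add("alpha");
        throw new IllegalStateException("broken resolver");
    }
}

class BetaResolver extends DemoResolver {
    @Override public void bootstrap(Path root) { BOOTSTRAP_LOG.add("beta"); }
}
```

Sorting once in the constructor means both the conflict rule and the bootstrap order derive from the same list, so injection order cannot change the outcome.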
DetectorContext now carries an Optional<Resolved> as its 7th field. The
field is the opt-in entry point for the resolver pass: detectors that want
to upgrade emissions to RESOLVED check
ctx.resolved().filter(Resolved::isAvailable) before downcasting to a
language-specific Resolved subclass; detectors that don't care simply
ignore it.
Backward compat: all existing constructors (3-arg, 5-arg, 6-arg with
registry) still compile and work; they delegate to the canonical 7-arg
constructor with Optional.empty() for resolution. The compact constructor
normalizes a null Optional to Optional.empty() so the field is never null
at rest.
withResolved(Resolved) is the orchestrator's hook to attach per-file
resolution after the resolver pass.
Test coverage (10 new tests in DetectorContextResolvedTest):
- 3-arg / 5-arg / 6-arg constructors all default resolved to empty
- 7-arg canonical constructor carries the attached Resolved
- Compact constructor normalizes null → Optional.empty()
- withResolved(r) returns a copy with Optional.of(r); base untouched
- withResolved(null) clears the resolution back to empty
- EmptyResolved attached: present but isAvailable()==false
- withResolved preserves all other fields
- Documents the canonical detector check pattern
Verified no regressions: 1970 detector tests all pass.
Per sub-project 1 spec §6.1 + plan Task 12.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
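A sketch of the opt-in pattern, assuming a cut-down context record. ContextSketch has four fields rather than the real seven, and JavaLikeResolved and DetectorCheckPattern are hypothetical names used only to show the filter-then-downcast check.

```java
import java.util.Optional;

interface Resolved { boolean isAvailable(); }

enum EmptyResolved implements Resolved {
    INSTANCE;
    @Override public boolean isAvailable() { return false; }
}

/** Hypothetical language-specific result carrying some resolver payload. */
record JavaLikeResolved(String payload) implements Resolved {
    @Override public boolean isAvailable() { return true; }
}

/** Cut-down stand-in for DetectorContext; field names are illustrative. */
record ContextSketch(String filePath, String language, String content, Optional<Resolved> resolved) {

    /** Compact constructor: a null Optional is normalized to Optional.empty(). */
    ContextSketch {
        if (resolved == null) {
            resolved = Optional.empty();
        }
    }

    /** Legacy-style constructor delegating to the canonical one. */
    ContextSketch(String filePath, String language, String content) {
        this(filePath, language, content, Optional.empty());
    }

    /** Returns a copy with the resolution attached; null clears it back to empty. */
    ContextSketch withResolved(Resolved r) {
        return new ContextSketch(filePath, language, content, Optional.ofNullable(r));
    }
}

final class DetectorCheckPattern {
    private DetectorCheckPattern() {}

    /** Uses resolver data only when it is present, available, and of the expected type. */
    static String describe(ContextSketch ctx) {
        return ctx.resolved()
                .filter(Resolved::isAvailable)
                .filter(JavaLikeResolved.class::isInstance)
                .map(JavaLikeResolved.class::cast)
                .map(r -> "RESOLVED:" + r.payload())
                .orElse("SYNTACTIC");
    }
}
```

A detector that never calls resolved() behaves exactly as before, which is why existing detectors need no changes.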
Same release train as the existing javaparser-core 3.28.0 — required by the upcoming JavaSymbolResolver (sub-project 1, plan Task 17). Pulls in JavaSymbolSolver, CombinedTypeSolver, ReflectionTypeSolver, and JavaParserTypeSolver. Apache-2.0 license, no transitive surprises: 'mvn dependency:tree -Dincludes=com.github.javaparser' shows only the two artifacts at 3.28.0. Per sub-project 1 plan Task 14. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walks a project root for the canonical src/(main|test)/java directories
that Maven and Gradle both standardize on. Multi-module projects work
naturally: every nested src/main/java is a separate root. Plain projects
(no build file) fall back to top-level src/ if it has any *.java.
Determinism: results sorted alphabetically by absolute path. Same tree →
same root list → same CombinedTypeSolver → same resolution.
Symlink safety: Files.walkFileTree runs with FOLLOW_LINKS disabled, so
loops cannot form. The trade-off (source roots reachable only via symlink
are skipped) is the right call for resolution, where double-counting via
symlink would be worse.
Skip directories: target, build, out, bin, dist, .git, .gradle, .idea,
.vscode, .m2, .cache, node_modules, .codeiq. A phantom src/main/java
inside any of these is ignored.
Test coverage (18 new tests in JavaSourceRootDiscoveryTest):
- Maven single-module returns [src/main/java, src/test/java]
- Maven main-only returns just main
- Maven multi-module aggregates all submodules (sorted)
- Gradle layout matches Maven (discovery doesn't read build files)
- Plain layout fallback: src/ with .java becomes the root
- Plain layout without .java returns empty
- Empty / non-existent / null / file-not-dir all return empty (no exceptions)
- target/, build/, node_modules/, .git/, .gradle/, .idea/ are skipped
- Phantom src/main/java inside skip-dirs is NOT picked up
- Results sorted alphabetically (verified across 3 modules)
- Discovery is idempotent
- Symlink loop terminates without exception (POSIX only; DisabledOnOs Windows)
- Deeply-nested modules found
- src/main/kotlin is NOT mistaken for a Java root
Per sub-project 1 plan Task 15.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
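A rough sketch of such a walk using java.nio.file.Files.walkFileTree, which does not follow symbolic links unless FileVisitOption.FOLLOW_LINKS is passed. SourceRootDiscoverySketch is a hypothetical name, and the plain-layout fallback (top-level src/ containing *.java) is deliberately omitted to keep the example short.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

/** Illustrative only; covers the src/(main|test)/java case, not the plain-layout fallback. */
final class SourceRootDiscoverySketch {
    private static final Set<String> SKIP = Set.of(
            "target", "build", "out", "bin", "dist", ".git", ".gradle",
            ".idea", ".vscode", ".m2", ".cache", "node_modules", ".codeiq");

    private SourceRootDiscoverySketch() {}

    static List<Path> discover(Path projectRoot) {
        if (projectRoot == null || !Files.isDirectory(projectRoot)) {
            return List.of(); // null, missing, or not a directory: empty, no exception
        }
        Path root = projectRoot.toAbsolutePath().normalize();
        List<Path> roots = new ArrayList<>();
        try {
            // Two-argument walkFileTree: symlinks are not followed, so loops cannot form.
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                    Path name = dir.getFileName();
                    if (!dir.equals(root) && name != null && SKIP.contains(name.toString())) {
                        return FileVisitResult.SKIP_SUBTREE; // ignore build output and tool dirs
                    }
                    if (isJavaRoot(dir)) {
                        roots.add(dir);
                        return FileVisitResult.SKIP_SUBTREE; // packages below are not roots
                    }
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFileFailed(Path file, IOException exc) {
                    return FileVisitResult.CONTINUE; // unreadable entries do not abort the walk
                }
            });
        } catch (IOException e) {
            return List.of();
        }
        roots.sort(Comparator.comparing(Path::toString)); // deterministic order
        return roots;
    }

    /** True for paths ending in src/main/java or src/test/java. */
    private static boolean isJavaRoot(Path dir) {
        int n = dir.getNameCount();
        if (n < 3) {
            return false;
        }
        String last = dir.getName(n - 1).toString();
        String middle = dir.getName(n - 2).toString();
        String first = dir.getName(n - 3).toString();
        return first.equals("src")
                && (middle.equals("main") || middle.equals("test"))
                && last.equals("java");
    }
}
```

Sorting by the absolute path string keeps the result independent of the file system's directory iteration order.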
JavaResolved is a record carrying the parsed CompilationUnit + the
configured JavaSymbolSolver. isAvailable() == true and sourceConfidence()
== RESOLVED, so detectors that downcast to it can stamp emissions at the
RESOLVED tier.
JavaSymbolResolver is the @Component that bootstraps a CombinedTypeSolver
(ReflectionTypeSolver + per-source-root JavaParserTypeSolver) using
JavaSourceRootDiscovery for the root list. Source roots are sorted
alphabetically → deterministic solver wiring → same resolution every run.
Deliberately NOT mutating StaticJavaParser.getParserConfiguration(): that
would conflict with AbstractJavaParserDetector's thread-local JavaParser
pool under virtual-thread concurrency. Detectors that want the solver
attached to their own JavaParser get it via
JavaSymbolResolver.symbolSolver() and configure their own
ParserConfiguration.
Test coverage (23 new tests):
JavaResolvedTest (6):
- isAvailable() == true
- sourceConfidence() == RESOLVED
- cu() / solver() accessors
- implements Resolved
- distinct from EmptyResolved.INSTANCE
JavaSymbolResolverTest (17, Layer 1 unit):
- supports "java" only
- bootstrap empty project still builds ReflectionTypeSolver
- bootstrap with source roots adds JavaParserTypeSolver per root
- bootstrap is repeatable (fresh CTS each call)
- combinedTypeSolver() null before bootstrap
- resolve before bootstrap → EmptyResolved (graceful)
- resolve null file → EmptyResolved
- resolve non-Java file → EmptyResolved
- resolve null AST → EmptyResolved
- resolve String AST (wrong type) → EmptyResolved (no ClassCastException)
- language match is case-insensitive ("Java")
- resolve valid CU → JavaResolved
- JavaResolved carries the input cu and the bootstrapped solver
- Solver smoke test: resolves java.lang.String via ReflectionTypeSolver
- Solver smoke test: resolves project class from source root
- resolve() doesn't cache: distinct JavaResolved per call
Per sub-project 1 plan Tasks 16-18.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the orchestrator stamping pass into all three detect() call sites
in Analyzer.java (the main pipeline, the cache-aware runBatchedIndex
path, and the regex-fallback path). Every emission whose source is null
now gets stamped with:
- source = detector.getClass().getSimpleName()
- confidence = detector.defaultConfidence() (LEXICAL for regex bases,
SYNTACTIC for AST/structured bases)
Detectors that stamp explicitly (e.g. setConfidence(RESOLVED) once a
detector migrates to ctx.resolved()) are left alone — applyDefaults
keys off source==null.
Deferred from this commit (will land with Phase 5 detector migration):
- ResolverRegistry.bootstrap(repoPath) call at the start of run() —
pointless without detectors that consume ctx.resolved()
- Per-file ctx = ctx.withResolved(resolver.resolve(file, ast)) — same
This commit is purely additive: 2417 tests in analyzer + cli + detector
packages all pass, no regressions. The full 3555-test suite is green
post-stamping, confirming existing detector behavior is unchanged
(detectors don't stamp confidence/source today, so the stamping floor
applies uniformly).
IndexCommand also benefits transparently — it calls
analyzer.runSmartIndex() which routes through one of the wired detect
sites.
Per sub-project 1 plan Tasks 19-20 (stamping portion).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the cross-cutting Confidence + source schema change, the intelligence/resolver/ SPI surface, the Java backend (javaparser-symbol-solver-core), DetectorContext.resolved() opt-in, and the per-base confidence floor wired through Analyzer's emission path. Detector migrations to consume ctx.resolved() are explicitly called out as Phase 5 follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 28, 2026
5 tasks
Summary
Foundation for moving the codeiq graph from regex-class-of-correctness to AST-and-symbol-resolution-class-of-correctness. Spec: `docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md`. Plan: `docs/plans/2026-04-27-sub-project-1-resolver-spi-and-java-pilot.md`.
This PR ships Phases 1–4 of the 8-phase plan in 18 atomic commits (~290 new tests):
- `Confidence` enum (`LEXICAL`/`SYNTACTIC`/`RESOLVED` with stable `score()` mapping), plus a `source` field on every `CodeNode` and `CodeEdge`. Round-trip through Neo4j (bare `confidence`/`source` properties on nodes and `RELATES_TO` relationships) and the H2 analysis cache (`CACHE_VERSION` 4→5; legacy v4 caches drop and rebuild). Read paths are non-throwing: legacy data without these fields reads back as `LEXICAL`/`null`, never NPEs.
- `intelligence/resolver/`: `Resolved` interface + `EmptyResolved` singleton sentinel, `SymbolResolver` per-language backend, `ResolutionException`, `ResolverRegistry` Spring `@Service` (deterministic alphabetical bootstrap, case-insensitive lookup, per-resolver failure isolation, NOOP fallback), `DetectorContext.resolved()` opt-in accessor (`Optional<Resolved>`, `Optional.empty()` for every detector that doesn't care; fully backward compatible).
- `JavaSourceRootDiscovery` walks Maven/Gradle/plain layouts under a project root (skipping `target/`, `build/`, `node_modules/`, `.git/`, etc.; symlink-loop-safe via `NOFOLLOW_LINKS`). `JavaResolved` record carries the parsed `CompilationUnit` and the `JavaSymbolSolver`. `JavaSymbolResolver` `@Component` builds a `CombinedTypeSolver` (`ReflectionTypeSolver` + per-source-root `JavaParserTypeSolver`) without mutating `StaticJavaParser` (would conflict with virtual-thread JavaParser pools). Adds `javaparser-symbol-solver-core` 3.28.0 (Apache-2.0, same release train as `javaparser-core`).
- `Detector.defaultConfidence()` declares the per-base floor (`LEXICAL` for regex bases, `SYNTACTIC` for AST/structured/JavaParser/JavaMessaging bases; TS/Python/PythonDb chains inherit from Antlr). `DetectorEmissionDefaults.applyDefaults` is wired into all three `detector.detect()` call sites in `Analyzer.java`. Every emission whose `source` is null gets stamped with `source = detector.getClass().getSimpleName()` and the per-base default `confidence`. Detectors that explicitly stamp (e.g. `setConfidence(RESOLVED)`) survive untouched: `applyDefaults` keys off `source == null`.
Deferred to follow-up PR (sub-project 1 Phase 5+):
- `ResolverRegistry.bootstrap(repoPath)` call at the start of `Analyzer.run()` and per-file `ctx.withResolved(resolver.resolve(file, ast))` threading; these pair with detector migrations.
- Detector migrations (`SpringServiceDetector`, `JpaEntityDetector`, etc.) to consume `ctx.resolved()` and emit `RESOLVED`-tier `INJECTS`/`MAPS_TO` edges.
Test plan
- `mvn test` full suite locally: 3555 tests, 0 failures, 0 errors, 31 skipped (E2E petclinic suite skipped; needs cloned external repo)
- `Pinned-Dependencies`, `Token-Permissions`, etc. still pass on this branch (no workflow changes in this PR, but the new dep is checked by Trivy)
- `codeiq index . && codeiq enrich . && codeiq serve .` on a representative repo, confirm graph loads and `confidence`/`source` show up on nodes and edges via `/api/nodes/{id}/detail`
🤖 Generated with Claude Code