feat: B2b HTTP_CALLS + ASYNC_CALLS extractor (PR-D1) by HumanBean17 · Pull Request #12 · HumanBean17/java-codebase-rag

HumanBean17 · 2026-05-05T08:38:53Z

Scope statement

Implements PR-D1 from plans/PLAN-TIER1B-COMPLETION.md only: B2b core HTTP/async caller extraction, pass5_imperative_edges, new edge tables/writers, _string_value_atoms rename, ontology bump to 7, graph meta call-edge counters, and PR-D1 test/fixture additions.

Summary

Renamed _route_value_atoms to _string_value_atoms, added OutgoingCallDecl, and populated MethodDecl.outgoing_calls via new _collect_outgoing_calls for Feign-method, RestTemplate, KafkaTemplate, WebClient (unresolved), and StreamBridge (unresolved).
Added HTTP_CALLS / ASYNC_CALLS schema, HttpCallRow / AsyncCallRow / CallEdgeStats, and pass5_imperative_edges wired immediately after pass4_routes to emit caller edges with match='unresolved' and phantom Route targets where needed.
Extended graph_meta with HTTP/async call totals, strategy JSON blobs, and resolved percentages; added Kuzu meta decoding for new JSON fields; updated README route/edge section and added PR-D1 fixture/tests (cases 1-19).
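As a rough illustration of what defensive decoding of the new JSON meta fields could look like, here is a minimal sketch. The function name `decode_strategy_map` is hypothetical and is not taken from `kuzu_queries.py`; the sketch only assumes that the strategy counters are stored as JSON strings and that a missing or malformed value should degrade to an empty dict.

```python
import json
from typing import Any


def decode_strategy_map(raw: Any) -> dict:
    """Decode a JSON-encoded counter column such as http_calls_by_strategy.

    Hypothetical helper: returns {} for missing, malformed, or non-object
    values so that reading metadata from an older database does not raise.
    """
    if not isinstance(raw, str) or not raw:
        return {}
    try:
        decoded = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return {}
    # Valid JSON that is not an object (e.g. a list) is treated as absent.
    return decoded if isinstance(decoded, dict) else {}
```

With this shape, a column that does not exist yet (read back as `None`) and a column holding `{"rest_template": 2}` both go through the same code path.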

Test count

python3 -m pytest tests -q -> 229 passed, 4 skipped

Manual evidence

$ python3 build_ast_graph.py --source-root tests/bank-chat-system --kuzu-path /tmp/check_d1 --verbose 2>&1 | grep -E "^\[pass[45]\]"
[pass4] Route extraction: emitted=11, exposes=11, skipped_unresolved=0, routes_resolved_pct=81.8, routes_from_brownfield_pct=0.0, by_framework={'spring_mvc': 9, 'kafka': 2}
[pass5] HTTP_CALLS: 2 edges, ASYNC_CALLS: 5 edges

http_calls_total=2, async_calls_total=5
http_calls_by_strategy={'rest_template': 2}
ontology_version=7

Made with Cursor

Implement PR-D1 core by adding outgoing-call extraction, pass5 edge emission, and graph metadata counters so caller-side HTTP/async edges are materialized with unresolved match semantics. Co-authored-by: Cursor <cursoragent@cursor.com>

HumanBean17 · 2026-05-05T11:18:21Z

Review: PR-D1 — B2b HTTP_CALLS + ASYNC_CALLS extractor

Verdict: Approved ✅

PR-D1 ships exactly what plans/PLAN-TIER1B-COMPLETION.md § PR-D1 specifies — _string_value_atoms rename, OutgoingCallDecl + _collect_outgoing_calls, pass5_imperative_edges, HTTP_CALLS / ASYNC_CALLS schema, ontology bump 6→7, graph_meta extension, and all 19 tests named per the plan. Scope discipline is clean: zero PR-D2 brownfield surface, zero PR-D3 cross-service / match-breakdown surface. Manual evidence reproduces bit-for-bit on tests/bank-chat-system.

Scope discipline (out-of-scope checks)

| Sentinel (PR-D2 / PR-D3 territory) | Status |
| --- | --- |
| `CodebaseClient`, `CodebaseProducer` | ✅ 0 occurrences |
| `HttpClientHint`, `AsyncProducerHint` | ✅ 0 occurrences |
| `annotation_to_http_client_hint`, `fqn_to_http_client_hint` | ✅ 0 occurrences |
| `annotation_to_async_producer_hint`, `fqn_to_async_producer_hint` | ✅ 0 occurrences |
| `resolve_http_client_for_method`, `resolve_async_producer_for_method` | ✅ 0 occurrences |
| `http_client_overrides`, `async_producer_overrides` (YAML keys) | ✅ 0 occurrences |
| `match_breakdown`, `_match_factor` (PR-D3) | ✅ 0 occurrences |
| `cross_service`, `intra_service` | ✅ Only as `VALID_HTTP_CALL_MATCHES` constants per plan §2; no code paths consume them yet |
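A sentinel check of this kind can be reproduced with a small script. The sketch below is an assumption about how such a scan might be done, not the command actually used for this review: `count_sentinels` is a hypothetical helper, the sentinel list is only a subset of the identifiers above, and which directories and file types to scan is left to the caller.

```python
from pathlib import Path

# Subset of the out-of-scope identifiers listed in the review (illustrative).
SENTINELS = ("CodebaseClient", "HttpClientHint", "match_breakdown")


def count_sentinels(root, sentinels=SENTINELS, suffixes=(".py",)):
    """Count plain-text occurrences of each sentinel under `root`.

    Only files whose suffix is in `suffixes` are read; unreadable bytes
    are ignored rather than raising.
    """
    counts = {name: 0 for name in sentinels}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for name in sentinels:
            counts[name] += text.count(name)
    return counts
```

A result of all zeros corresponds to the "0 occurrences" rows; any non-zero count points at a file that needs a closer look.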

Plan compliance

| # | Step from plan §"PR-D1 implementation step list" | Verified |
| --- | --- | --- |
| 1 | Rename `_route_value_atoms` → `_string_value_atoms`, update 4 call sites | ✅ `grep -rn "_route_value_atoms"` returns 0; all call sites in `ast_java.py` (lines 1084, 1094, 1153, 1156, 1159, 1250, 1255, 1264, 1269) use the new name |
| 2 | Add `OutgoingCallDecl`, `MethodDecl.outgoing_calls` | ✅ Dataclass added; field populated by `_collect_outgoing_calls` |
| 3 | Implement `_collect_outgoing_calls` for Feign / RestTemplate / Kafka | ✅ Tests 2–10, 13 pass |
| 4 | WebClient + StreamBridge unresolved branches | ✅ Tests 11, 12 pass |
| 5 | `VALID_CLIENT_KINDS`, `VALID_HTTP_CALL_`, `VALID_ASYNC_CALL_` | ✅ Added to `java_ontology.py` (frozenset, exported via `__all__`) |
| 6 | `_SCHEMA_HTTP_CALLS`, `_SCHEMA_ASYNC_CALLS` in create + drop lists | ✅ `(FROM Symbol TO Route, ...)` exact match to plan §3.2 |
| 7 | `HttpCallRow`, `AsyncCallRow`, `CallEdgeStats`, `GraphTables` fields | ✅ Dataclasses present |
| 8 | `pass5_imperative_edges` wired after `pass4_routes` | ✅ Manual evidence reproduces |
| 9 | HTTP_CALLS + ASYNC_CALLS writers + phantom-route dedup | ✅ Tests 14, 15, 17 green |
| 10 | `graph_meta` extended with 6 new columns | ✅ All 6 columns present + `kuzu_queries.meta()` decodes JSON blobs defensively (pattern mirrors `routes_by_framework`) |
| 11 | Bump `ONTOLOGY_VERSION` 6 → 7 | ✅ `meta()` reports 7 |
| 12 | README schema/edge section update | ✅ `HTTP_CALLS` / `ASYNC_CALLS` row added; old "remaining work" bullet removed |

Tests

229 passed, 4 skipped in 51.09s

Master baseline: 214 collected. PR-D1 branch: 233 collected → +19 tests, exactly per the plan. All 19 test names in tests/test_outgoing_call_extraction.py (12 cases), tests/test_call_edges_e2e.py (6 cases), and tests/test_string_value_atoms.py (1 case) match the plan §4 table verbatim.

Manual evidence reproduced

$ rm -rf /tmp/check_d1 && python build_ast_graph.py --source-root tests/bank-chat-system \
    --kuzu-path /tmp/check_d1 --verbose 2>&1 | grep -E "^\[pass[45]\]"
[pass4] Route extraction: emitted=11, exposes=11, skipped_unresolved=0, routes_resolved_pct=81.8, routes_from_brownfield_pct=0.0, by_framework={'spring_mvc': 9, 'kafka': 2}
[pass5] HTTP_CALLS: 2 edges, ASYNC_CALLS: 5 edges

ontology_version       = 7
http_calls_total       = 2
async_calls_total      = 5
http_calls_by_strategy = {'rest_template': 2}
async_calls_by_strategy= {'kafka_template': 5}
http_calls_resolved_pct= 1.0
async_calls_resolved_pct= 1.0

✅ Identical to the PR description. Sampling actual edges:

HTTP_CALLS: rest_template / unresolved / POST / 0.21       (×2 — bank-chat postForEntity sites)
ASYNC_CALLS: kafka_template / unresolved / producer / "topic" | "ChatTopics.OPERATOR_NOTIFICATIONS" | "ChatTopics.ESCALATION" | "ChatTopics.COMPLIANCE_REVIEW" | "ChatTopics.INCOMING"

Confidence 0.21 = 0.7 (concat-tail base) × 0.3 (PR-D1 fixed match_factor) × 1.0 (caller_microservice='', so micro_factor=1.0) — matches the plan §3.4 PR-D1 formula. Every edge has match='unresolved' as the plan mandates (PR-D3 will overwrite this column).
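The arithmetic can be checked with a few lines of Python. This is only a worked example of the product quoted above; the function and parameter names are hypothetical, and the defaults simply restate the PR-D1 values from the review (match factor fixed at 0.3, micro factor 1.0 when `caller_microservice` is empty).

```python
def call_edge_confidence(base: float,
                         match_factor: float = 0.3,
                         micro_factor: float = 1.0) -> float:
    """Multiply the three factors and round to two decimals.

    Rounding is an assumption made here to match the two-decimal value
    shown in the sampled edges.
    """
    return round(base * match_factor * micro_factor, 2)


# Concat-tail base of 0.7 with the PR-D1 defaults.
print(call_edge_confidence(0.7))  # 0.21
```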

Notes that earned my trust

Symmetric defensive JSON decoder for *_by_strategy MAP-as-STRING fields in kuzu_queries.py:399–416. Mirrors the routes_by_framework pattern (try-except → empty dict, then isinstance check). Re-running meta() against an old DB would degrade gracefully.
Phantom-route dedup by id is implemented as "compute synthetic id, append to tables.routes_rows, re-call existing inserter with idempotent semantics" — i.e. it reuses B2a's writer rather than inventing a parallel one. Test 17 (test_phantom_routes_dedup_across_call_sites) locks this behaviour.
framework='' and microservice='' for caller-side synthetic ids (per plan §3.4) — guarantees no collision with B2a's exposer-side ids. Future PR-D3 can match by (http_method, path_template) cleanly because the phantom rows are uniquely keyed on caller-context-free attributes.
Test naming hygiene: every test name from the plan §4 table appears verbatim in code. No drive-by additions. No skipped/xfailed tests in the PR-D1 set.
WebClient + StreamBridge tests assert strategy='unresolved' explicitly (tests 11, 12), locking in the v2 deferral. If a future PR sneaks in resolution support for either, those tests will fail loudly and force a plan amendment.
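The phantom-route idea described in these notes (a synthetic id computed from caller-context-free attributes, with `framework=''` and `microservice=''`, deduplicated by id) can be sketched as follows. Everything here is illustrative: `RouteRow`, `phantom_route_id`, `ensure_phantom_route`, the SHA-1 based id format, and the dict used for dedup are assumptions, not the project's actual row type, id scheme, or writer.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteRow:
    """Hypothetical, simplified stand-in for a Route node row."""
    id: str
    framework: str
    microservice: str
    http_method: str
    path_template: str


def phantom_route_id(http_method: str, path_template: str) -> str:
    # framework and microservice are deliberately empty for caller-side ids,
    # so the key depends only on (http_method, path_template).
    key = "|".join(("", "", http_method.upper(), path_template))
    return "route:" + hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]


def ensure_phantom_route(routes: dict, http_method: str, path_template: str) -> str:
    """Insert the phantom route once; repeated call sites reuse the same id."""
    rid = phantom_route_id(http_method, path_template)
    routes.setdefault(
        rid,
        RouteRow(rid, "", "", http_method.upper(), path_template),
    )
    return rid
```

Two call sites that target the same method and path template resolve to the same id, so only one phantom row is kept regardless of how many callers reference it.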

Observations (non-blocking)

pass5 verbose log is total-only. build_ast_graph.py:1506–1511 prints just [pass5] HTTP_CALLS: N edges, ASYNC_CALLS: M edges. The plan §5 DoD bullet 3 says: "verbose output reports per-client_kind and per-strategy counts." The data exists in tables.call_edge_stats — just not surfaced to stderr. Trivial to extend in PR-D2 (or as a one-line follow-up); the durable counters land in graph_meta.*_by_strategy either way, so this isn't a blocker.
Second copy of the strategy ladder still lives in graph_enrich.py:720–724 (annotation/spel/constant_ref ladder for brownfield route hints, pre-existing from PR-A3). PR-D1 doesn't touch it (correctly — out of scope). DoD bullet 1 says "no duplicate three-strategy ladder anywhere", but that wording is best read as "no duplicate of _string_value_atoms introduced by PR-D1" — verified clean. The second copy in graph_enrich.py is a known consolidation candidate for a future cleanup PR.
http_calls_resolved_pct=1.0 on bank-chat despite all match='unresolved'. This is actually correct per plan §3.6 — the metric is % of edges where strategy != 'unresolved', not match != 'unresolved'. Both rest_template and kafka_template are concrete strategies. Worth a one-line docstring on the metric in build_ast_graph.py to prevent confusion later, but the semantics are exactly what the plan specifies.
http_caller_smoke fixture exercises 5 client_kinds in one fixture (Feign interface + caller, RestTemplate exchange, KafkaTemplate.send, WebClient chain, StreamBridge.send). Future PR-D2 brownfield tests will be a perfect add-on against the same fixture by dropping in @CodebaseClient annotations and asserting replacement-rule behaviour from PR-D2's plan §3.5.
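To make the strategy-based reading of the resolved metric concrete, here is a minimal sketch. The helper name `resolved_pct` is hypothetical, and returning a fraction in [0, 1] is an assumption chosen to agree with the reported value of 1.0; the only rule taken from the review is that an edge counts as resolved when its strategy is not 'unresolved', regardless of its match column.

```python
def resolved_pct(strategies):
    """Share of edges whose extraction strategy is concrete.

    `strategies` is a sequence of per-edge strategy strings. The match
    column is intentionally not consulted.
    """
    strategies = list(strategies)
    if not strategies:
        return 0.0  # assumption: an empty edge set reports 0.0
    resolved = sum(1 for s in strategies if s != "unresolved")
    return resolved / len(strategies)


# bank-chat sample: 2 HTTP edges and 5 async edges, all concrete strategies.
http_pct = resolved_pct(["rest_template"] * 2)
async_pct = resolved_pct(["kafka_template"] * 5)
```

Both values come out as 1.0 even though every edge still carries match='unresolved', which is the behaviour the observation describes.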

Plan deltas needed

None. Plan §3.6, §4, §5 all hold as written.

Ready to merge. Next: PR-D2 (B2b brownfield: caller-side overrides + @CodebaseClient / @CodebaseProducer). Plan §"Caller-side composition divergence" (option b — brownfield replaces built-in) and tests 27 / 31a / 31b will be the headline verification points.

Report per-client-kind and per-strategy counts in pass5 verbose output, and document that call-edge resolved percentages are strategy-based in PR-D1. Co-authored-by: Cursor <cursoragent@cursor.com>

…-E2 plan (#16)

Catches from PR-D1, PR-D2, PR-D3 reviews that were intentionally deferred until Tier 1B landed are gathered into one document to prevent them from getting lost across review threads.

PR-E1 (small, 1-day): risk_score [0,1] re-normalisation, VALID_HTTP_CALL_MATCHES rename, two inline comments, two doc fixes.
PR-E2 (refactor): consolidate the second three-strategy ladder in graph_enrich.py:720-724 onto the canonical resolver.

Refs:
- PR-D1 #12 obs 2 (strategy-ladder duplicate)
- PR-D2 #13 post-D3 follow-ups (anchor-fills-from-builtin doc, channel field)
- PR-D3 #15 obs 1-3, 5 (risk-score contract, VALID_HTTP_CALL_MATCHES rename, two reader comments)

improve pass5 verbose breakdown and metric clarity

9911f3c

HumanBean17 merged commit 7d193dd into master May 5, 2026

This was referenced May 5, 2026

chore: refresh AGENTS.md and .cursor/rules to current Tier 1B state #14

Merged

plan: post-Tier-1B follow-ups (PR-E1 + PR-E2) #16

Merged

HumanBean17 mentioned this pull request May 8, 2026

propose: agent skills and commands — Layer 3 over the 4-tool MCP #59

Merged

HumanBean17 deleted the feat/b2b-http-async-edges branch May 10, 2026 21:18

This was referenced May 14, 2026

propose: hints field as machine-readable road signs on MCP V2 outputs #120

Merged

propose: MCP filter frame — typed query language with one named carve-out #128

Merged

HumanBean17 merged 2 commits into master from feat/b2b-http-async-edges
