Skip to content

Feat/postgres multi step chains#87

Merged
smunini merged 5 commits into
mainfrom
feat/postgres-multi-step-chains
Apr 27, 2026
Merged

Feat/postgres multi step chains#87
smunini merged 5 commits into
mainfrom
feat/postgres-multi-step-chains

Conversation

@smunini
Copy link
Copy Markdown
Contributor

@smunini smunini commented Apr 27, 2026

No description provided.

smunini added 5 commits April 27, 2026 12:58
The REST extractor's hardcoded "well-known param name → SearchParamType"
match silently misclassified anything outside the list (e.g. Goal.target-date
fell through to a value-shape heuristic, which mongodb's strict typed query
builder then rejected with HTTP 400). Three other call sites carried similar
duplicate tables.

Replace the lot with a single deterministic resolver:

  resolve_param_type(registry, resource_type, name, values)
    1. registry.get_param(resource_type, name)
    2. registry.get_param("Resource", name)            ← global params
    3. value-shape heuristic (unregistered custom only)

Plumbed via a new SearchProvider::search_param_registry() trait method that
every backend already had as a concrete accessor; REST handlers acquire it
from state.storage(). Hardcoded match tables in mongodb's build_search_params
and sqlite's chain_builder are gone; sqlite's dead-code infer_target_type
also removed.

Effect on the failing inferno mongodb test: Goal?patient&target-date now
resolves target-date as Date deterministically, build_date_filter parses the
value correctly, returns 200. The two DiagnosticReport patient&status 400s
were never caused by type inference (status was already Token) and need
separate triage.

Net: −150 lines of redundant hardcoded mappings, +30 lines of resolver,
1 trait method, 8 new unit tests covering registry hit / Resource-base
fallback / value heuristic / target lookup. All 642 persistence and 184
rest lib tests pass; clippy clean with project CI flags.
Postgres' ChainedSearchProvider was returning Ok(Vec::new()) for any forward
chain longer than 2 segments and for nested _has reverse chains, silently
producing empty result sets where SQLite returned correct matches. Backend-
behavior divergence with no error signal — the kind of "simplified for now"
stub PR #80 set out to find.

Port the proven SQLite ChainQueryBuilder pattern to Postgres: nested SELECTs
over search_index, registry-driven target-type resolution via
resolve_param_targets(), the same Patient-default for ambiguous multi-target
references SQLite uses. Postgres-specific syntax: $N placeholders, ILIKE for
case-insensitive match, SUBSTRING(... FROM POSITION('/' IN ...) + 1) in
place of SUBSTR + INSTR for the reverse-chain reference-id extraction.

Verification:
- 6 new unit tests in chain_builder covering parsing, SQL shape, explicit
  type modifier, ambiguous-target inference, empty-chain error, and
  reverse-chain SUBSTRING/POSITION emission
- 2 new testcontainers-backed integration tests porting the SQLite three-
  level chain and reverse-chain terminal cases — both pass against a real
  Postgres instance
- All 609 helios-persistence lib tests still green; clippy clean
Composite's resolve_chain was delegating to a stub (resolve_chain_iterative)
that returned Ok(Vec::new()) for any forward chain — silent wrong-empty
results regardless of backend capability. Same anti-pattern as PR #80
("simplified implementation...for now, just return empty").

Replace with iterative SearchQuery resolution against composite's regular
search routing (the analog of how resolve_reverse_chain already works):

  1. Parse the chain via the registry to resolve target type per link.
     Falls back to the same hardcoded inference table SQLite/Postgres use
     for ambiguous multi-target references — keeps behaviour aligned.
  2. Search the deepest target type for the terminal param.
  3. Walk back: at each step, search the parent type for resources whose
     reference param matches the accumulated refs (multi-value OR).
  4. Outermost step returns raw IDs.

Cost is O(chain_depth) queries; the inner backends apply OR semantics on
multi-value reference params natively.

Verification:
- 3 new unit tests using MockStorage (with search provider registered) prove
  the implementation runs to completion for 3-segment, 2-segment, and
  invalid (single-part) chains. End-to-end correctness comes from inferno
  against real backends.
- 612 helios-persistence lib tests pass; clippy clean.

Drops the resolve_chain_iterative stub.
The merge from main dropped the SearchParamType import in
search_query_builder.rs and left an unformatted let binding in
composite/storage.rs, both flagged by CI lint.
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 66.91176% with 45 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
crates/persistence/src/composite/storage.rs 66.91% 45 Missing ⚠️

📢 Thoughts on this report? Let us know!

@smunini smunini merged commit 6c0691e into main Apr 27, 2026
14 of 15 checks passed
mauripunzueta added a commit that referenced this pull request May 5, 2026
…expand hot path

Reverts the two spawn_blocking additions to process_expand that caused the
run #86 regression (264K → vs 368K wRPS baseline):

  - populate_concept_flags(): per-system concept flag lookup (abstract/inactive)
  - used-codesystem emission: code_system_version_for_url call per system

The SystemVersionCache / ConceptFlagsCache infrastructure added in run #87
to partially mitigate the pool pressure is also removed from mod.rs since
it is no longer needed.

The trait methods (concept_expansion_flags, code_system_version_for_url)
and their SQLite implementations are retained but simplified back to
direct spawn_blocking form (no caching) for future use.

Restores run #84 performance baseline (~368K wRPS).
mauripunzueta added a commit that referenced this pull request May 9, 2026
Two parallel fixes to recover bench parity with run #87 (commit 6492389):

1. SNOMED implicit ValueSet preflight (VC01-03, EX01-03, EX08):
   When packages.fhir.org ships a CodeSystem stub for SNOMED CT
   (content=not-present) before the RF2 import lands a second row at
   (url, version, content=complete), the old resolver could pick the
   stub and every ?fhir_vs / ?fhir_vs=isa query keyed on its empty
   system_id silently returned zero rows.

   - resolve_system_id_with_version_cached / resolve_compose_system_id:
     primary sort by content tier (complete/supplement < fragment <
     not-present), then EXISTS(concepts), then version DESC, id ASC.
   - validate_fhir_vs, bfs_expand_page, bfs_isa_page,
     ensure_implicit_cache: JOIN concepts/concept_closure/concepts_fts
     through code_systems.url so the lookup is robust even if the
     resolver still picks a stub row.
   - ensure_concepts_fts_for_url: build the FTS5 index for every row
     matching the URL so URL-keyed FTS joins always have rows.

2. LK01-04 \$lookup perf (was -77 to -86% RPS @50VU):
   - operations/lookup.rs: replaced search(url, count=1) blob-load and
     full serde_json::from_str just to read .language with a dedicated
     trait method code_system_language() that runs ONE json_extract
     query and is memoised in a process-wide cache (invalidated
     alongside cs_id_cache on CRUD).
   - backends/sqlite/code_system.rs: gated fetch_synthesised_properties
     behind needs_synth (only runs when caller asked for parent/child/
     inactive or *), and folded its 3 queries into one prepare_cached
     UNION ALL with a kind_ord sort key. Same wire output, 3x fewer
     round-trips per \$lookup.
   - traits + postgres backend: parallel code_system_language() impl
     for trait uniformity.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant