Skip to content

feat(hts): Postgres backend parity → 100% + performance iterations#112

Open
mauripunzueta wants to merge 474 commits into
mainfrom
integration/hts-pg-parity-perf
Open

feat(hts): Postgres backend parity → 100% + performance iterations#112
mauripunzueta wants to merge 474 commits into
mainfrom
integration/hts-pg-parity-perf

Conversation

@mauripunzueta
Copy link
Copy Markdown
Contributor

Summary

Lands the post-#93 work from feat/hts-terminology-service onto current main: the helios-hts Postgres backend reaching full parity with the SQLite backend on the HL7 Tx Ecosystem IG bench, plus the performance-iteration series that closed the PG-vs-SQLite throughput gap.

Net delta vs main: 13 files, +6,959 / −377 — confined to crates/hts/** and the two PG CI workflows. No re-introduction of the already-merged terminology server.

Why this is a fresh integration branch (not a direct PR from the feature branch)

#93 squash-merged the Helios Terminology Server into main. Work then continued on feat/hts-terminology-service using its pre-squash history, so that branch diverged from main — a direct PR would have shown ~37k lines / 473 commits, re-adding the entire merged server.

This branch is main + a single merge of feat/hts-terminology-service, with conflicts resolved so the diff is only the real new work.

Conflict resolution (9 files)

  • crates/hts/src/** (7 files) — took the branch's versions. Verified no main commit since #93 touched these files, so the branch versions are strictly newer and lose nothing.
  • .github/workflows/{tx-ecosystem,hts-benchmark}-postgres.yml — took the branch's versions: remote-Docker-daemon wiring, shared_buffers/work_mem tuning for the iter-7j concept_closure, and the DB-dump debug artifact — all validated in the perf-iteration CI runs. main's copies were the older #105 scaffolding.

main's later non-conflicting additions are preserved automatically: testcontainer-retry fix, #105 PG workflows, #109/#110 bulk-export workflows, and the cargo update.

What's included

  • PG parity → 100% (587/587) on the Tx Ecosystem IG suite: VS/CS $validate-code shape parity, compose filter operators (=/is-a/regex), inline ValueSet on $expand, force-system-version/system-version handling, ?fhir_vs implicit ValueSets, contained-VS (#fragment) refs, version-mismatch detection.
  • Performance iterations (iter 1–7m): per-instance $lookup/resolve_code_system caches, $subsumes/$translate response caches, single-flight OnceCell expand cache, UNNEST batch insert, concept_closure table + paginated is-a fast path, closure COUNT(*) cache, $lookup negative cache, in-process hierarchy edge cache.
  • The ecl evaluator sqlite-feature gate (latent compile bug under --features postgres).

Known follow-up

The PG CI uses cargo check (not cargo test) under --features postgres: several #[cfg(test)] modules reference SqliteTerminologyBackend without a #[cfg(feature = "sqlite")] gate and don't compile under postgres. End-to-end PG coverage comes from the tx-ecosystem job (validator → HTS over HTTP → PG). Restore cargo test once that gating is fixed.

Test plan

  • tx-ecosystem-postgres.yml reaches 100% pass on this branch
  • hts-benchmark-postgres.yml build + benchmark complete with 0% error rate
  • SQLite workflows (tx-ecosystem.yml, hts-benchmark.yml) untouched, still 100%

History

Supersedes #106 (closed, stale duplicate) and #107 (closed). Both carried the earlier 80.5% milestone from a now-diverged branch.

smunini and others added 30 commits May 4, 2026 20:09
The IG search tests pass `filter` and expect it back in
expansion.parameter[]. We were skipping it as a "discriminator" but
it's a knob like the others — drop it from the skip list.
…expand pool pressure

Every $expand cache miss previously incurred two extra spawn_blocking calls
after the backend returned: code_system_version_for_url (1 pool connection)
and concept_expansion_flags (1 pool connection). At 50VU with pool=20 this
tripled effective pool demand and caused severe contention — the primary driver
of the run #86 regression on all expand tests (EX01 −87%, EX03 −91%, etc.).

Fix 1 — SystemVersionCache (Arc<RwLock<HashMap<String, Option<String>>>>):
  CodeSystem versions never change during a server run. Cache the first DB
  lookup per system URL; all subsequent calls return from memory with no pool
  access, eliminating spawn_blocking #3 on every expand cache miss.

Fix 2 — ConceptFlagsCache (Arc<RwLock<HashMap<String, Arc<HashMap<…>>>>>):
  On the first concept_expansion_flags call for a system, load ALL abstract /
  inactive codes for that system in one query (filtering on notSelectable /
  status properties, returning only the flagged minority). Store the result
  keyed by system URL. Subsequent calls for the same system look up each code
  in the Arc<HashMap> entirely in memory — no spawn_blocking, no pool
  connection, O(1) per code.

Both caches are cleared on import_bundle so fresh data is always reflected.
The IG search fixtures pin filter as valueString in expansion.parameter[]
regardless of whether the request used valueString or valueUri. Echoing
the input verbatim (yesterday's commit) was triggering "unexpected
property valueUri" diffs.

Drop `filter` from the bare echo and emit it explicitly as
{name: filter, valueString: <input>} after the echo block.
The IG parameters fixtures don't echo the `property` input back in
expansion.parameter[] — it's a request-side filter for what
contains[].property entries to surface, not a knob whose value the
response confirms. Echoing it produced "array item count Expected:2
Actual:3" diffs across parameters-expand-*-property tests.
The IG parameters/* and extensions/* fixtures pin a non-existent
useSupplement and expect a 4xx error. We were ignoring useSupplement
and returning 200 with a normal expansion. ~10 single-line tests
failed for this reason.

Walk every useSupplement parameter at the top of process_expand,
look up each via code_system_version_for_url, and return
InvalidRequest if any are missing — surfaces as a 400 OperationOutcome
with text "Required supplement not found: <url>".
Extend the supplement check from $expand to $validate-code and $lookup
so the parameters/parameters-{validate,lookup}-supplement-bad and
extensions/* counterparts also produce 4xx instead of 200. Use
HtsError::NotFound (issue.code=not-found) which matches the IG
fixture's `business-rule | not-found` choice.
The IG parameters-expand-*-property fixtures pass `property=<name>` and
expect each contains[] entry to carry the named property in a
`property` array. Without it 4 single-line tests failed with
"missing property property at .expansion.contains[0]".

Add CodeSystemOperations::concept_property_values(system, codes, props)
— a per-system batched lookup mirroring concept_designations. Add
`properties: Vec<ExpansionContainsProperty>` field to ExpansionContains
(with skip_serializing_if=Vec::is_empty so the default response shape
is unchanged). Default the new field at all 30 backend construction
sites via mechanical sed.

In process_expand, collect every `property` parameter (valueString or
valueCode) and run populate_properties post-expansion. Serializer maps
the internal value_type to the right FHIR `value[x]` field
(valueBoolean / valueInteger / valueCode / etc.).
…roperty

The IG fixtures expect expansion.property[] to declare each requested
property's code (and uri). Without it the R5 test
parameters-expand-enum-property failed with "missing property property
at .expansion".

When the caller passed `property=<name>` parameters, emit a parallel
expansion.property entry per requested name. The uri is synthesised
from the first contributing CodeSystem URL — close enough for the
IG fixture pattern of `<system>/properties#<code>`.

Note: the corresponding R4 test will still fail because R4
expansion.contains.property doesn't exist as a field and the R4
parser drops it; only R5 unlocks here.
The IG fixtures use CS-defined property URIs (e.g.
http://hl7.org/fhir/test/CodeSystem/properties#prop), not URIs derived
from the system URL. My previous synthetic URI didn't match. Drop the
uri field; the validator's $optional$ pattern allows it to be absent.
The IG fixture for parameters-expand-enum-property requires
expansion.property[].uri to match the CS-defined property URI (e.g.
http://hl7.org/fhir/test/CodeSystem/properties#prop), which is
sibling-relative to the CS URL — not derivable from it.

Look up the source CodeSystem via the existing search() API and read
the matching property's uri from its property[] declarations. Falls
back to omitting uri when the CS isn't found or doesn't declare one.
The IG validation/simple-coding-no-system test passes a Coding with
only `code` (no system) and expects 2xx. extract_coding required
system, so we returned 400 (no valid input form). Treat absent system
as empty string and pass None to the backend, which then matches by
code alone scoped to the VS.
…code

Previously a Coding with no `system` either returned 400 (extract_coding
required system) or 2xx with result=true (when I loosened the
requirement). The IG fixture wants 2xx with result=false plus a
message: a Coding without system isn't validatable, but the request
itself is well-formed.

Add an explicit branch in path 2 of process_vs_validate_code: when
extract_coding returns an empty system, short-circuit with
result=false and "Coding has no system - cannot validate".
…n $lookup

The HL7 tx-ecosystem `simple-cases/simple-lookup-1` and `simple-lookup-2`
fixtures send `property=*` and expect $lookup to surface several
"well-known" concept properties that aren't stored directly in
`concept_properties`:

* `parent` — derived from `concept_hierarchy.parent_code WHERE child_code = req.code`
* `child`  — derived from `concept_hierarchy.child_code  WHERE parent_code = req.code`
* `inactive` — boolean derived from a `status` property in the FHIR
  inactive set (retired/deprecated/withdrawn/inactive); skipped when
  the concept already has an explicit `inactive` row to avoid duplicates
* `definition` — top-level Parameters entry sourced from
  `concepts.definition`

Each synthesised parent/child entry carries the related concept's
display in `description`, matching the IG fixtures' optional
`description` parts.

The wildcard handling and explicit-property filter both honour
synthesised entries: `property=*` includes them all alongside stored
properties; `property=parent` returns only the synthesised parent.

Both SQLite and PostgreSQL backends are updated. `LookupResponse`
gains a `definition: Option<String>` field, surfaced by
`process_lookup` as a top-level `definition` parameter.

Adds 8 unit tests (parent/child/inactive synthesis, no-status
fallback, explicit-inactive non-duplication, definition echo,
filtered selection, wildcard inclusion) and 1 HTTP-level integration
test asserting the IG response shape end-to-end. All 499
helios-hts lib tests pass.
…sion

The HL7 tx-ecosystem IG conformance suite passes `excludeNested=false`
on every `parameters/parameters-expand-*` test and expects nested
`contains[]` in the response. Previously we only built the hierarchical
tree when `hierarchical=true` was explicitly set, so 25+ IG tests failed
with "array item count differs" because we returned the flat list.

Map `excludeNested=false` (and the legacy `hierarchical=true` alias) to
the same backend tree-builder. `excludeNested=true` (or absent) preserves
the historical flat behaviour, keeping the `simple/*` IG fixtures green.

Tests cover:
- excludeNested=false → tree (root → child → grandchild + sibling orphan)
- excludeNested=true  → flat list
- both params absent  → flat list (default)
- hierarchical=true and excludeNested=false produce identical contains[]
…expand hot path

Reverts the two spawn_blocking additions to process_expand that caused the
run #86 regression (264K → vs 368K wRPS baseline):

  - populate_concept_flags(): per-system concept flag lookup (abstract/inactive)
  - used-codesystem emission: code_system_version_for_url call per system

The SystemVersionCache / ConceptFlagsCache infrastructure added in run #87
to partially mitigate the pool pressure is also removed from mod.rs since
it is no longer needed.

The trait methods (concept_expansion_flags, code_system_version_for_url)
and their SQLite implementations are retained but simplified back to
direct spawn_blocking form (no caching) for future use.

Restores run #84 performance baseline (~368K wRPS).
…t params)

The tx-ecosystem IG `translate/translate-1` and `translate-reverse` fixtures
post `sourceSystem`+`sourceCode`+`targetSystem` (or `targetCode` for reverse)
without an explicit ConceptMap `url`. We previously rejected those requests
with 400 because $translate required a URL.

Changes:

- operations/translate.rs: accept R5 parameter names alongside R4
  (`sourceCode`/`code`, `sourceSystem`/`system`, `targetCode`, `targetSystem`).
  Reverse mode is now driven implicitly by `targetCode`. Response includes
  `originMap` (canonical with `|version`), both `equivalence` and
  `relationship` (each marked $optional$ in IG fixtures by FHIR version), and
  a `source` Coding for reverse responses identifying the input code.

- backends/{sqlite,postgres}/concept_map.rs: query takes an extra
  `other_side_sys` filter restricting the result-side system, and now
  selects `cm.version` plus the input-side system/code for the response.

- types.rs: TranslateRequest gains `target_code`; TranslationMatch gains
  `map_version`, `source_system`, `source_code`.

- import/bundle_parser.rs: ConceptMap targets accept R5 `relationship` as
  a fallback for R4 `equivalence` so tx-ecosystem fixtures import correctly.

Targets the 2 IG `translate/` failures (translate-1, translate-reverse).
Closure has no IG tests on the bench so it's untouched.
The HL7 tx-ecosystem IG fixtures expect $validate-code failure cases to
surface one OperationOutcome.issue per concern (abstract + not-in-vs,
unknown-system + not-in-vs, inactive + valid-but-not-active + not-in-vs,
display-mismatch, etc.) rather than collapsing every signal into a
single error issue. The previous implementation always emitted a single
issue with code=code-invalid + tx-issue-type=not-in-vs, which made ~40
fixtures fail with "Expected:N Actual:1" issue counts.

Changes:
- types.rs: add ValidationIssue (severity, fhir_code, tx_code, text,
  location, message_id) and ValidateCodeResponse.issues field — backends
  push one issue per detected concern; the operations layer renders
  them as OperationOutcome.issue[] entries and joins their texts
  (alphabetically, "; "-separated) into the top-level message parameter.
- backends/sqlite/value_set.rs::finish_validate_code_response: emit
  separate issues for code-not-in-vs, abstract (business-rule code-rule
  + code-invalid not-in-vs), inactive (warning), display-mismatch
  (error invalid-display), and the IG-specific "valid but not active"
  triple when a code is in the underlying CodeSystem but excluded from
  the VS by activeOnly / compose.inactive=false.
- backends/sqlite/code_system.rs: emit not-found/not-found for unknown
  CodeSystem, code-invalid/invalid-code for unknown code, and
  invalid/invalid-display for display mismatch.
- operations/validate_code.rs::build_validate_response: walk
  resp.issues; synthesise an UNKNOWN_CODESYSTEM not-found issue when
  the operations layer sees an unknown system (surfaces the IG-required
  second issue alongside the backend's not-in-vs); fall back to the
  legacy single-issue path only when no structured issues are present.
- backends/postgres/*: add empty issues field to satisfy the new struct
  shape (postgres backend keeps single-message behaviour for now).

Adds two unit tests (vs_validate_unknown_system_emits_two_issues,
vs_validate_no_system_on_coding_emits_invalid_data_issue) covering the
two simplest multi-issue scenarios. All 622 helios-hts tests pass.

[skip ci]
…mpose

Drop the column-level UNIQUE on `code_systems.url` (both SQLite and PG
schemas) and replace it with a composite UNIQUE index on (url, version).
Two CodeSystems sharing a canonical URL now coexist whenever they
declare distinct `version` values, matching tx-ecosystem fixtures
`version/codesystem-version-1.json` / `-2.json` (both ship `id:"version"`,
url `http://hl7.org/fhir/test/CodeSystem/version`, versions `1.0.0` and
`1.2.0`). Add an idempotent SQLite migration (`migrate_code_systems_drop_url_unique`)
that rebuilds the table without the legacy constraint, plus a PG
DO-block that drops any pre-existing `UNIQUE (url)` constraint.

Storage id is now `<fhir-id>|<version>` (or `<fhir-id>` when version
absent) so per-(url, version) rows have stable, distinct PK values
without colliding when fixtures reuse the FHIR `id` across versions.
Concepts / hierarchy / properties / designations FK to that synthetic
storage id, so each version owns its own normalized data.

`resolve_code_system` (both backends) now picks the highest version
when no `version` parameter is supplied, exact-matches when a literal
version is given, and supports tx-ecosystem version-pattern syntax
(`1.x.x`, `1.x`, bare `1`) — each `x` segment is a wildcard, bare
numeric prefixes match any version starting with that segment, and
the highest matching version wins.

Compose includes that pin a `version` (`compose.include[].version`)
now resolve to the matching `code_systems` row instead of arbitrarily
grabbing the first row by URL. The mixed value set
`tests/version/valueset-version-mixed.json` (which pulls `code1` from
1.0.0 and `code2` from 1.2.0 of the same canonical URL) expands
correctly. CRUD lookups by FHIR id (e.g. `/CodeSystem/version`) fall
back to matching `resource_json.id` and pick the latest version when
the synthetic storage id miss occurs. The hierarchy/concept-map
helpers that look up `system_id` from a URL now `ORDER BY version DESC
LIMIT 1` so subsumption walks the most recent revision.

Targets the largest remaining tx-ecosystem bucket — the 147 failing
`version/` tests including coding-v10-vs1w, vs-expand-v1, code-vnn-*
families, codeableconcept-v10-* and the mixed-version paths.

Adds 8 new unit tests covering: dual-version coexistence after import,
exact / wildcard / short-prefix / unknown-version lookup, and three
compose-version pinning scenarios (pin v1, pin v2, mixed). Updates the
existing `cs_create_indexes_into_hts_tables` and
`cs_delete_removes_hts_normalized_rows` tests to match by URL rather
than the now-synthetic storage id.

cargo test -p helios-hts --lib: 473 passed, 0 failed
…o value

Add HtsError::VsInvalid distinct from InvalidRequest so the FHIR issue code
stays `invalid` (HTTP 400) but the tx-issue-type coding becomes `vs-invalid`,
matching the HL7 tx-ecosystem `errors/broken-filter-{expand,validate}`
fixtures (3 IG tests).

The validation runs at the top of `apply_compose_filters`: every
ValueSet.compose.include.filter[] entry must carry a non-empty `value`.
When missing, return a VsInvalid error with the IG-spec text
"The system <url> filter with property = <p>, op = <o> has no value".

Triggered identically by both `$expand` and `$validate-code` on the
broken-filter ValueSet, since both go through `apply_compose_filters`
during expansion.
The HL7 IG `deprecated/expand-*` fixtures (8 IG tests) require:

1. One `used-codesystem` parameter per distinct contributing CodeSystem,
   formatted as `<url>|<version>` — restored after the perf revert that
   removed it.
2. A parallel `warning-<status>` entry whenever the source ValueSet or a
   contributing CodeSystem carries one of:
     * `extension[].url == structuredefinition-standards-status` with
       `valueCode` of `deprecated`, `withdrawn`, `draft`, etc.
     * `experimental: true` → `warning-experimental`
     * `status: "draft"` → `warning-draft`

A new `standards_statuses()` helper centralises the rule application and
deduplicates when multiple markers point at the same status code.

Performance impact: one CodeSystem search per distinct contributing system
on the cache-miss path. Hot-path requests are still served from
`state.expand_cache` (pre-serialised bytes), so repeated identical
$expand calls pay the cost only once.
…lookup

A CodeSystem with content=supplement plus a supplements=<base> field adds
designations / properties to a base CS without redefining its concepts.
Until now we only rejected unknown supplements (commits 9077590 +
87f40db); valid ones were silently ignored, so the IG fixtures' good /
none / extension-driven cases all failed.

This commit makes useSupplement actually do something:

- New backend trait methods (default-empty so postgres degrades silently):
  supplement_target(url) → SupplementInfo{target_url, supplement_canonical},
  supplement_designations(urls, codes), supplement_property_values(...).
  SQLite reads from the same code_systems / concept_designations /
  concept_properties tables, filtered by content='supplement'.

- $lookup merges supplement designations into the response, tagging each
  one with `source = supplement|version` (emitted as designation.source
  on the Parameters output) and appends one used-supplement parameter
  per applied supplement. Mismatched-target supplements reject 404.

- $validate-code accepts supplement designation values as valid alt
  displays: when the backend returns a display-mismatch failure and the
  caller's display matches a supplement designation, "rescue" the
  response (clear message, set result=true). Restores honoring
  Coding.display in both CodeSystem and ValueSet variants (re-applies
  the logic reverted in 340cf5f but now scoped to supplements). Also
  emits used-supplement and propagates bad-supplement → 404.

- $expand resolves supplements from BOTH the request useSupplement
  parameter AND the source ValueSet's `valueset-supplement` extension.
  Merges supplement designations (when includeDesignations=true) and
  supplement properties (when property= is requested) into
  expansion.contains[]. Emits used-supplement in expansion.parameter.

Adds 9 new unit tests covering the good/none/bad cases for all three
operations. All 500 helios-hts unit tests pass.

Expected to unlock the IG `parameters/parameters-{lookup,validate,
expand}-supplement-{good,none}` fixtures (~6 tests), plus the
extensions/extensions-echo-* family that auto-applies via the VS
extension. Display-aware tests like validation/simple-coding-bad-
language-{vs,vslang} should now also pass since Coding.display is
honored again — without the supplement plumbing they would re-trigger
the issue that caused 340cf5f to revert 670a686.
…n + tx-resource [skip ci]

Inline `$expand` requests previously couldn't resolve `#fragment` ValueSet
references nor `tx-resource` shadow resources, and the existing canonical-URL
ref handling unioned matches per include rather than intersecting them. The
result was 404s on the simple-expand-contained tx-ecosystem fixture and wrong
totals on exclude-{combo,gender,gender2} / include-combo.

This change:

- Adds `ExpandRequest::tx_resources` and threads `tx-resource` parameters
  from the `$expand` handler through to the SQLite backend.
- Promotes `tx-resource` ValueSets to the inline-body path when their `url`
  matches the request's `url` parameter (include-combo / exclude-combo).
- Introduces an `InlineResolutionContext` that resolves `compose.include[].
  valueSet[]` refs in the order: inline `contained[]` -> `tx-resource` ->
  local store. Cycle-safe via a visited-URL set; emits `vs-invalid` warning
  on re-entry.
- Replaces the union-per-include behaviour with intersection-within-include
  semantics (FHIR R5 section 4.9.5): multiple `valueSet[]` entries are
  intersected with each other and with the include's local
  system/concept/filter.
- Mirrors the same intersection logic in `compose.exclude[]` so the
  exclude-combo fixture's `concept[] AND valueSet[]` deny-set works.
- Fixes the inline total-miss guard so partial successes (one ref hits,
  another misses) no longer 404 - guard now requires both empty result and
  every-include-warned.
- Skips the inline-compose disk + memory caches when the body carries
  `contained[]` or `tx-resource` shadows so a later request without those
  supplements can't hit the wrong cached row.

Adds five unit tests in `backends::sqlite::value_set::tests` covering
contained-fragment intersection, missing fragments, tx-resource shadowing,
cycle rejection, and exclude-with-valueSet intersection.

Expected to unlock: simple-expand-contained, exclude-{combo,gender,gender2,
include-combo}, and the expand-indirect-* fixtures whose canonical refs land
on existing ValueSets in the local store. Tests that pin to a specific
`url|version` (e.g. coding-indirect-zero-pinned-wrong) still need the
default-valueset-version parameter and a versioned `value_sets` table key -
explicitly out of scope for this bucket.

Cycle handling: tracks the in-flight ref string (canonical URL or `#fragment`)
in a per-request `BTreeSet<String>`; a re-entry pushes a warning containing
"Cyclic" and returns `vec![]` for that branch rather than recursing. The
existing depth limit (4) is unchanged and still trips on long chains that
don't actually cycle.
…ip ci]

Implements two FHIR R5 §4.9.5 compose filter operators that were previously
treated as unsupported (silently empty result):

- `regex`: full-string match against `code` or a named property value, using
  the `regex` crate's RE2-style linear-time engine wrapped in `\A(?:…)\z`
  for FHIR's anchored semantics.
- `child-of`: direct (one-level) children of the supplied parent code via
  the pre-materialized `concept_hierarchy` table, intentionally excluding
  self and transitive descendants.

Both ops slot into the existing 3-phase compose filter pipeline:
property-equality (Phase 1) → hierarchy / ECL / `child-of` (Phase 2) →
`regex` (Phase 3, last so it can intersect with already-narrowed candidates).

Adds `HtsError::VsInvalid` mapping to HTTP 400 + FHIR issue code `invalid`
with `tx-issue-type=vs-invalid`, used for malformed regex patterns and
`child-of` / `regex` filters that omit a value. Distinguishes
ValueSet-definition errors from generic 400-class request problems for the
HL7 tx-ecosystem IG validator.

Unblocks tx-ecosystem `simple/simple-expand-regex`, `simple-expand-regex2`,
`simple-expand-regex-prop`, `simple-expand-child-of`, plus the `regex-bad`
suite (Rust's RE2 engine handles `(a+)+` / `((a+)+)+` in linear time, so
no timeout path is needed).

Skipped corner cases:
- PCRE-only regex features (backreferences `\1`, lookaround `(?=…)`,
  possessive quantifiers): rejected with `vs-invalid`. None of the IG
  fixtures use them.
- code-regex on SNOMED-scale code systems: still loads all concepts for
  the system before filtering. The IG test data is small (3 concepts in
  regex-bad); a future optimisation could push regex into SQLite via
  a registered function for large-CS workloads.
…yLanguage swap

The HL7 IG `language/expand-xform-*` fixtures (8 IG tests) plus a
follow-on set of `display/validation-*` fixtures expect a richer
displayLanguage swap shape:

  * The matching-language designation value is promoted to top-level
    `display` (existing behavior).
  * The original CodeSystem-language display becomes a designation
    `{language: <cs-lang>, use: {system, code: preferredForLanguage}, value: <orig>}`.
  * The duplicate matched designation is removed from the designation list.

Also handle the Accept-Language style "hard fallback" form
`displayLanguage=de,*; q=0`: when no matching designation exists and the
caller forbids fallback (q=0 on the wildcard), drop top-level `display`
entirely — but still surface the original CS-default display as a
preferredForLanguage designation so consumers can recover it.

Implementation:

  * `parse_display_language()` parses the request value into a
    `DisplayLangSpec { preferred, hard_fallback }`.
  * Designation lookup uses BCP 47 / RFC 4647 prefix matching, so a
    request for `de` matches stored `de-CH` designations.
  * A single per-system CodeSystem search now drives THREE downstream
    blocks (displayLanguage swap, used-codesystem emission, and
    warning-<status> emission) via a shared `cs_by_url` map — net zero
    extra SQL on the cache-miss path.
Resolved conflicts in:
- crates/hts/src/backends/sqlite/schema.rs (kept title/resource_json columns from HEAD; dropped column-level UNIQUE on url per multi-version branch)
- crates/hts/src/backends/sqlite/mod.rs (run both migrate_concept_closure and migrate_code_systems_drop_url_unique)
- crates/hts/src/backends/sqlite/code_system.rs (kept multi-version resolve_code_system + fetch_versions; dropped HEAD's loose-match fallback so unknown-version requests stay 404)
- crates/hts/src/backends/sqlite/value_set.rs (kept HEAD's multi-include fast path + ValueSet ref handling; switched system_id lookup to multi-version's resolve_compose_system_id with (url, version) cache key)
- crates/hts/src/import/fhir_bundle.rs (kept multi-version's storage_id_for + INSERT OR IGNORE + UPDATE pattern; preserved HEAD's UPSERT_CONCEPT_SQL constants for the concepts loop)
Resolved conflicts in:
- crates/hts/src/error.rs (kept #103's tx-issue-type column in doc table; merged duplicated VsInvalid doc/comment text into single canonical wording)
- crates/hts/src/operations/batch.rs (kept HEAD's documentary comment; collapsed InvalidRequest|VsInvalid arms per #103's pattern)
- crates/hts/src/backends/sqlite/value_set.rs (added #103's empty-filter-value VsInvalid validation BEFORE HEAD's 3-way property/regex/hierarchy partition, preserving the regex fast path from #102)
Resolved conflicts in crates/hts/src/operations/expand.rs:
- Combined HEAD's displayLanguage/parse_display_language code with the supplements branch's apply_supplement_designations and apply_supplement_properties helpers.
- Re-added populate_concept_flags helper (referenced by the supplements branch but missing from the merged base after #102 + #103 reordering — restored from supplements branch source).
- Kept HEAD's source-VS warning-* expansion.parameter loop AND appended supplements branch's used-supplement entries (both emit distinct parameter names, no overlap).
- Closed parse_display_language_only_wildcard_returns_none test fn that the conflict cut mid-body, then preserved both test groups (parse_display_language + useSupplement).
Resolved conflict in crates/hts/src/operations/lookup.rs by keeping both supplement and hierarchical test fixtures: split into make_supplement_app() (HEAD) and make_hierarchical_app() (#97), preserved all four supplement-tests + the wildcard property synthesis test as separate tokio::test fns.
mauripunzueta and others added 30 commits May 13, 2026 17:17
Prerequisite for retrying Phase 23. The IG `simple/simple-expand-contained`
fixture pins a contained #vs1 with `concept[code2]` (abstract). My D.2
fix (Phase 4) auto-expands code2 → {code2, code2a, code2b}. The IG
expects #vs1 to expand to exactly {code2}: the contained VS is meant
to be "exactly this set" so intersection with sibling refs is
well-defined.

Add `if depth == 0` guard so D.2 only fires at the top-level include
(parameters/parameters-expand-enum-* still pass — they always run at
depth 0). Mirrors sqlite/value_set.rs:3502 (`if depth == 0`).
Phase 23 — backend-only. Targets the last 2 PG IG fails:
  - simple-cases/simple-expand-contained
  - validation/validation-contained-good

Both tests pass an inline ValueSet body whose
  compose.include[].valueSet[]: ["#fragment-id"]
references a sibling ValueSet inside the body's `contained[]` array.

Previous PG behaviour: pg_expand_vs_reference saw the `#` prefix and
returned an empty contribution with a warning. Now it looks up the
matching contained VS and recurses into its compose — same code path
as a stored VS reference but without a DB lookup.

Implementation:
- New compute_expansion_with_contained entry point that accepts the
  inline VS's contained[] slice. compute_expansion (no contained)
  stays as a thin wrapper passing &[].
- Thread `contained: &[serde_json::Value]` through compute_expansion_inner
  / _body / pg_expand_vs_reference.
- Inline-VS expand call site (req.value_set branch) extracts
  vs.get("contained") and passes it through.
- URL-path expand and validate-code paths still pass &[] (URL-based
  VSes don't typically have inline contained that needs resolution).

Mirrors the SQLite InlineResolutionContext (sqlite/value_set.rs:2748+)
in lighter form — no exclude_chain or visited tracking on the contained
itself; just a slice for the lookup.
Phase 24 — backend-only. Targets the LAST PG fail:
validation/validation-contained-good.

The operations-layer `build_validate_response_async` clones the
existing INACTIVE_CONCEPT_FOUND issue as the template for the
specific-status companion (e.g. "...status of retired..."). PG's
CS-side emission included `location: ["Coding"]`, which propagated
to both issues. The IG fixture pins both WITHOUT location.

URL-based VS-validate-code paths (`inactive-2a-validate` et al.) emit
their own INACTIVE_CONCEPT_FOUND WITH location via
finish_validate_code_response in value_set.rs:3858 — those paths are
unaffected by this change.

If clean, this brings PG to 100%. Verifying via dual PG + SQLite CI.
…ions

Targets the 5 PG hts-benchmark preflight non-passes (EX01/EX03 + EX02/EX04/EX07):

Tier 1 — `?fhir_vs` implicit ValueSet short-circuit (EX01, EX03).
PG's expand only consulted `value_sets.url` and a `code_systems.valueSet`
fallback. URLs of the form `<cs>?fhir_vs` and `<cs>?fhir_vs=isa/<code>`
miss both, returning NotFound → preflight "operation not supported".
SQLite handles them via `implicit_short_circuit` at the top of expand:
intercept `parse_fhir_vs_url`, translate to a synthetic compose, then
run it through compute_expansion. Port the same:
  - `?fhir_vs`            → compose = { include: [{ system: cs_url }] }
  - `?fhir_vs=isa/<code>` → compose = { include: [{ system: cs_url,
        filter: [{ property: "concept", op: "is-a", value: <code> }] }] }
The is-a filter is already implemented in pg_filter_is_a (a recursive
CTE over concept_hierarchy), so EX01 reuses an existing PG code path.

Tier 2 — gate `max_expansion_size` on `req.count.is_none()` (EX02/EX04/EX07).
The cap is a guardrail against unbounded materialisation. SQLite skips
it when the caller has bounded the request via `count` (see
backends/sqlite/value_set.rs — `if req.count.is_none() { if let Some(cap) ... }`).
PG enforced the cap regardless, so a `count=10` page request against a
5,975-code VSAC ValueSet (EX04) or a 97,677-code SNOMED hierarchy
filter (EX02) or a 696,663-code multi-system expansion (EX07) 422'd
with VALUESET_TOO_COSTLY even though the caller only wanted 10–200
codes. Mirror SQLite: keep the cap for unbounded requests, skip it
when count is set.

Conformance fixtures (tx-ecosystem-postgres / tx-ecosystem) pin
specific code sets and never page with count, so they still hit the
cap path unchanged — both backends stay at 100%.
Targets the 5 catastrophically-slow PG bench scenarios (EX01/04/05/07/08 —
6-12 RPS vs SQLite's 21-32K) by mirroring SQLite's iter7 inline_compose_index.

Per-instance cache `inline_compose_cache` on PostgresTerminologyBackend.
Keyed by fnv64 of (canonical compose body | URL | version pins | contained
VSes); the request-level filter / count / offset are post-cache slicing knobs
applied to the cached `Arc<Vec<ExpansionContains>>` on each hit.

Wired into three expand branches:
  1. `?fhir_vs[=isa/X]` short-circuit  (EX01)
  2. URL-resolved cached path          (EX03, EX04)
  3. Inline-VS POST                    (EX02/05/06/07/08)

Cache bypassed for hierarchical mode, multi-version overload composes,
default-valueset-version pins, and inline composes with tx-resources — same
guards SQLite applies. Soft cap PG_COMPOSE_CACHE_MAX=4096 entries; new
entries dropped silently once full.

Cold-miss is unchanged (still computes the full expansion via recursive CTE)
and pays a one-time `Arc::new(codes.clone())` to populate. Warm-hit avoids
all DB I/O: pure-Rust filter + page on the cached Vec.

Conformance fixtures keep distinct cache keys per request and are unaffected
— both backends should stay at 100%.
…r 2)

Closes the LK01-04 and SS01 gap by mirroring SQLite's per-instance response
caches (backends/sqlite/code_system.rs:240+, 1683+):

  - `lookup_response_cache`: \$lookup memo keyed by
    `system|code|version|lang|date|properties`. Skipped when
    `useSupplement` is set. Bound `PG_LOOKUP_RESPONSE_CACHE_MAX = 4096`.
  - `cs_resolved_meta_cache`: `(url, version) -> (system_id, name, version)`
    memo wrapping `resolve_code_system`. Bypassed when `date` is set
    (point-in-time filter can't share keys across requests).

Both caches are evicted on `import_bundle` and `delete_normalized` via the
new `clear_response_caches` helper, plus the existing
`inline_compose_cache` from iter 1.

Wiring:
  - lookup(): cache check at top → return memo on hit; populate at end.
  - validate_code(), subsumes(): route their `resolve_code_system` calls
    through `resolve_code_system_cached`.
  - The free `resolve_code_system` function is preserved for tests; the
    cache lives in a thin wrapper.

Single-tenant safe: all paths use TenantContext::system() and cache keys
fold no tenant identity. Conformance fixtures use distinct request
shapes → cold-miss every time → no behaviour change. Benchmarks
(50 VUs × 60 pool entries) get warm-path hits.
Replaces the per-row INSERT loop (5,975 roundtrips for a typical VSAC VS)
with a single UNNEST query over four parallel text[] arrays. Same DELETE +
single-transaction + ON CONFLICT DO NOTHING semantics; the conflict rule
now evaluates per row inside the multi-row INSERT.

Why this matters: EX04 cold-miss (5,975-code VS) currently spends most of
its 6-15s wall-clock in the populate_cache loop — visible in the
hts-bench-pg-25885467342 server log where backend_expand for the URL
takes 15,539ms cold vs 0.5ms warm. Cutting the populate roundtrips to 1
should bring cold-miss latency into hundreds of ms.

Independent of the upcoming single-flight change (iter 3b): both callers
of populate_cache (expand cold-miss + validate_code cold-miss) get the
win automatically.

Schema unchanged. tokio-postgres encodes Vec<&str> as text[] natively;
explicit ::text[] casts on $2..$5 keep the planner from inferring
unknown[]. validate_code's caller continues to call this unchanged.
Replaces the iter-1 cache value type from
  Arc<RwLock<HashMap<u64, Arc<Vec<ExpansionContains>>>>>
with
  Arc<RwLock<HashMap<u64, Arc<OnceCell<Arc<Vec<ExpansionContains>>>>>>>

so that simultaneous cache misses for the same key share one cold-miss
compute. The iter-1 bench (run 25885467342) showed EX04 at vu=50 stuck at
3 RPS because 50 VUs would each independently call compute_expansion +
populate_cache before any one populated the cache. With OnceCell, the
first VU runs the closure (compute + cap-check + populate_cache + clone
into Arc); the other 49 await the same cell and clone the resulting Arc
in O(1).

Refactor details:
- New `cache_cell(cache, key)` helper replaces `cache_get`+`cache_put`.
  Returns an `Arc<OnceCell<...>>`; double-checked locking under the
  HashMap RwLock ensures only one cell exists per key. Returns `None`
  when the cache is at PG_COMPOSE_CACHE_MAX so callers fall back to an
  unprotected compute (warm-set wins).
- Compute + cap-check + populate_cache (URL branch only) move INSIDE
  the get_or_try_init closure. On error the cell stays empty so the
  next caller retries cleanly — `HtsError` doesn't impl Clone, so the
  success-only OnceCell semantics are correct here.
- `enforce_cap()` and `build_implicit_compose_value()` extracted as
  helpers to keep all 3 cache branches (?fhir_vs, URL-resolved Ok,
  inline-VS) using the same cap-check phrasing and compose-build path.
- Pre-borrow request fields into locals before each `async move`
  closure so `req` stays usable for `serve_from_cached(arc, &req)`
  after the cell init.

Conformance fixtures still use distinct cache keys per request so the
single-flight collapse never fires for them — both backends should stay
at 100% conformance.
Closes the remaining PG-vs-SQLite perf gap for SS01 (17-19% of SQLite
after iter 2) and CM01/CM02 (53-57% of SQLite). Mirrors iter 2's
lookup_response_cache shape for the two ops not yet cached.

- subsumes_response_cache: `Arc<RwLock<HashMap<String, Arc<SubsumesResponse>>>>`
  keyed by `"system|version|codeA|codeB"`. Both ancestor-check directions
  (subsumes / subsumed-by / equivalent / not-subsumed) collapse to a
  single cached outcome. Saves 2 recursive-CTE ancestor walks + 2
  find_concept lookups per warm request.

- translate_response_cache: `Arc<RwLock<HashMap<String, Arc<TranslateResponse>>>>`
  keyed by every TranslateRequest field that affects output (url, system,
  code, source, target, target_system, target_code, reverse, date).

Both caches are evicted alongside the existing iter-1/2 caches in
`clear_response_caches` (mod.rs) on import_bundle + delete_normalized.

Caps `PG_SUBSUMES_RESPONSE_CACHE_MAX = PG_TRANSLATE_RESPONSE_CACHE_MAX
= 4096`. Conformance fixtures use distinct keys per request → cold-miss
every time → no behavioural change. Both backends should stay at 100%
conformance.
LK*/VC* benchmark pools hold 2000 distinct codes × 3-5 systems each,
totalling 6-10K keys across the bench run. With the prior 4096 cap, the
first two systems' keys filled the cache and the rest (notably LK04
RxNorm, LK02 LOINC tail) were silently dropped — visible in iter-2 perf
data where LK04 stayed at 67% of SQLite even though LK01 (cached first)
jumped to 67%.

Mirrors SQLite's iter-9 cap-raise from the prior bench-fix loop, which
took VC03 from 956 → 41,148 RPS once the cache could actually hold its
1,831 keys.

Affects all four PG response caches (compose, lookup, subsumes,
translate). Per-entry size is small (tens of KB max for inline-compose;
hundreds of bytes for $lookup); 16K entries × ~10KB worst case = ~160MB
ceiling for the inline_compose_cache, far below memory headroom.

No behavioural change — caps are advisory.
The iter-4+5 PG bench log revealed that EX01 was at 123 RPS even with the
cache fully populated — 50 VUs were all acquiring a deadpool connection
(pool size 16) at the top of expand() and then immediately returning from
the cache hit, holding the connection through serve_from_cached and
returning. The other 34 VUs blocked at pool.get() waiting on connections
that the hit-VUs were needlessly holding.

Fix: before acquiring the pool connection, build the URL or inline-compose
cache key without touching the DB and probe the in-memory OnceCell map for
an *initialized* entry. On hit, return the cached response and never call
pool.get(). On miss, fall through to the existing pool-acquiring branches,
which then double-check the cache via cache_cell (single-flight collapse
on stampede).

  - New `cache_get_initialized()` helper: read-lock on the HashMap, then
    `OnceCell::get()` — returns `Some(Arc<Vec<...>>)` only when the cell
    is fully initialised. Never creates an empty cell.
  - In `expand()`: compute `warm_cache_key` for URL or inline-VS paths
    that match the cold-miss cache_key build, probe, return early on
    hit.

Expected gains:
  - EX01 (60 distinct SNOMED is-a URLs, cold-miss 0.5-2s each, then warm):
    warm-path throughput limited by warm-hit cost alone (sub-millisecond),
    not pool contention. Should jump from 123 RPS to multi-K RPS.
  - All other scenarios with cache hits: same lift on warm path.

Conformance fixtures don't repeat keys → cold-miss every time → no
behaviour change. The cold-miss branch still acquires the pool exactly
as before.
PG had no precomputed transitive ancestor closure, so `pg_filter_is_a` walked
`concept_hierarchy` with a recursive CTE every request. For SNOMED's
`?fhir_vs=isa/404684003` (root Clinical finding, ~350K descendants), the cold
miss took 1959ms; iter 6's pool-free warm-hit only nudged EX01 from 123 →
218 RPS @ vu=50 (vs SQLite's 9808).

Mirrors SQLite's `concept_closure` table at backends/sqlite/schema.rs:232:

* Schema: new `concept_closure(system_id, ancestor_code, descendant_code)` with
  PK `(system_id, ancestor_code, descendant_code)` and reverse-lookup index
  `(system_id, descendant_code)`. FK CASCADE handles CodeSystem deletion.
* Build: Rust BFS with index-based adjacency lists and a u32 generation
  counter for O(1) visited reset. Pairs stream to PG via UNNEST batches of
  50K rows inside a single transaction (~30-60s for SNOMED CT).
* Migrate: `migrate_concept_closure_pg` finds systems with hierarchy but no
  closure rows and builds them. Called at server startup (idempotent) and at
  end-of-CLI `hts import` so the first request doesn't pay the build cost.
* Invalidation: `write_code_system` deletes the system's closure rows so
  multi-batch imports (SNOMED RF2 ~1300 chunks) don't leave a partial
  closure mistaken for complete; the end-of-CLI migration rebuilds once.

Use sites:
* `pg_filter_is_a` queries the closure (PK index scan, O(N descendants))
  with a recursive-CTE fallback for the pre-migration race.
* `validate_fhir_vs::IsA` does a single PK EXISTS check on the closure.
* `?fhir_vs=isa/X` short-circuit in `expand()` adds a paginated SQL fast
  path when `req.count.is_some() && req.filter.is_none() && !hierarchical`.
  Queries `concept_closure JOIN concepts` with LIMIT/OFFSET, bypassing the
  inline_compose_cache. The big SNOMED roots now serve 10 codes from a
  bounded PK scan instead of materialising the full subtree to slice.

Expected: EX01 lifts from 218 RPS toward parity with SQLite (9808 RPS).
Conformance preserved — fixtures use the un-counted compute path which
still benefits from the closure but populates `total` via the cached Vec.
…iter 7a)

The iter 7a post-commit loop used `query_opt("SELECT id FROM code_systems
WHERE url = $1")` to fetch the system_id for newly-imported CodeSystems.
That bombs with "query returned an unexpected number of rows" when a URL
maps to multiple code_systems rows — which is exactly what happens when the
hl7.terminology@7.0.1 SNOMED stub at http://snomed.info/sct is followed by
the RF2 import that adds a second row at the same URL with a fresh
2026-03-01 version. The SNOMED RF2 import bombed mid-stream.

Switch to `client.query(...)` and iterate every matching system_id. Each
candidate is checked for hierarchy edges before triggering the closure
build, so hierarchy-less stub rows are silently skipped.
…ter 7a)

The CLI import succeeded and built all closures, but the server's startup
migration still ran the search query, which blew up with

  ERROR: could not resize shared memory segment "/PostgreSQL.NNNNNNNN"
  to 16777216 bytes: No space left on device

The previous query did `SELECT DISTINCT h.system_id FROM concept_hierarchy`,
which the PG planner turned into a parallel sequential scan of the 1.2M-row
SNOMED hierarchy and asked for a 16MB dynamic-shared-memory segment — even
when there was no closure work to do. CI containers cap `/dev/shm` at
64 MB and PG's DSM allocation failed.

Drive the search off the small `code_systems` table (~100 rows) with two
EXISTS probes against indexed columns. The planner picks index-only scans,
needs no parallel workers, and finishes in milliseconds whether or not
closure rebuilds are pending.
…e (iter 7a-2)

Iter 7a's paginated SQL fast path made backend_expand fast (p50=31ms, p95=172ms
for ?fhir_vs=isa) but bypassed inline_compose_cache. Every request acquired a
pool connection and re-ran the closure SELECT — net EX01 RPS @ vu=50 was 216,
basically unchanged from iter 6's 218. Overall wRPS regressed 480K → 432K
(93% → 81% of SQLite) because cache fill never happened for the hot is-a URLs.

Revert the fast path. Keep the closure-table changes in `pg_filter_is_a` and
`validate_fhir_vs::IsA`, plus the schema, migration, and import wiring. Cold
miss now goes through `compute_expansion` again, which:

1. Calls closure-backed `pg_filter_is_a` (~200-500ms cold for SNOMED's biggest
   roots vs the 1959ms recursive CTE iter 6 measured).
2. Populates inline_compose_cache with the full expansion.
3. Returns serve_from_cached(arc, &req), slicing by offset+count.

Subsequent requests hit the iter-6 pool-free warm-hit path at ~0.2ms. The
cache fills 4-10× faster than iter 6 because the closure scan replaces the
recursive CTE, so the pool-saturation window closes sooner and warm-hit RPS
takes over.

Also drop the per-call `SELECT EXISTS(... concept_closure ...)` precondition
in `pg_filter_is_a` and `validate_fhir_vs::IsA`. Closure is built by
`PostgresTerminologyBackend::new` before the first request lands, so the
fallback path was hot-path overhead with no real coverage.
Mirror the validate_code_handler_cache / expand_handler_cache pattern for
CodeSystem/\$lookup. The backend has its own lookup_response_cache, but a
warm-hit there still pays for:

* supplement_target round-trip per useSupplement
* code_system_language lookup
* clone-then-mutate of LookupResponse
* FHIR Parameters JSON assembly

The new handler cache short-circuits all of that with one HashMap lookup +
Arc<Value> clone. Benefits both backends — the LK02 gap (PG 19840 vs SQLite
35847 RPS = 53% in iter 6) is partly a per-warm-hit overhead problem above
the backend cache, not in it.

Mechanics:
* `LookupHandlerCache = Arc<RwLock<HashMap<String, Arc<Value>>>>` (mirrors
  ValidateCodeHandlerCache shape; cached value is the assembled Parameters
  JSON).
* `build_lookup_cache_key(params)` produces sorted compact-JSON fragments —
  same canonicalisation strategy as `build_validate_code_cache_key`.
* `process_lookup` becomes a thin wrapper: check cache, miss → call
  `process_lookup_inner` (the original body, unchanged) → cache success.
* Cleared alongside the other handler caches in `clear_expand_cache`, so
  bundle imports / CRUD writes invalidate stale entries.
* Bounded to `LOOKUP_HANDLER_CACHE_MAX = 16384` — LK02 has 2 000 distinct
  LOINC codes, the rest of the LK pools fit easily.
…er 7c)

VSAC ValueSets ship `compose.include[].concept[]` lists of 5 000+ codes with
embedded `display` per concept — pure extensional, no filter[], no nested
valueSet[] refs. PG's `compute_expansion` was materialising the full Vec and
running an UNNEST `populate_cache` transaction on every cold-miss, taking
10-30 s per unique URL. EX04 measured 5.5 RPS @ vu=50 (vs SQLite's 27 K).

Mirror `backends/sqlite/value_set.rs:5285` (`compose_page_fast`): parse the
compose JSON, apply optional `exclude[]` + the request `filter`, slice the
requested page — all in memory, no DB round-trip. Wired into both the
URL-resolved and inline-VS branches of `expand()`, mirroring SQLite's
call-site shape. Falls through to the existing `cache_cell` +
`compute_expansion` path when:

* any include has a non-empty `filter[]`,
* any include has a non-empty `valueSet[]` ref,
* any include lacks `concept[]`,
* a version pin / tx_resource / contained[] would change the result.

Cold-miss for EX04 should drop from 10-30 s to ~1 ms. Bench window of 30 s
× 50 VUs no longer exhausted by 10 cold-misses worth of pool-saturation.
Doesn't help EX01 (`?fhir_vs=isa/X` — implicit hierarchy, no compose JSON
to parse) or EX05 (intensional with is-a filter — bails to compute_expansion).

The fallback path for missing embedded display still queries `concepts` per
unique (system, code), but VSAC URLs always embed display, so the typical
fast path makes zero DB calls.
Loads every (parent_code, child_code) row for a system into a
HashMap<parent, Vec<child>> on first use, then BFSes in process memory.
Replaces the per-request recursive CTE in `pg_filter_is_a` — which the
iter 7c re-run server log shows is the EX01/EX05 cold-miss bottleneck
(p99 = 657 ms, max = 1.8 s) because k6 hits ~100 distinct SNOMED roots
per 30 s window, so the existing `inline_compose_cache` warms ~95 % of
requests but the 5 % cold-misses stampede 49 VUs onto OnceCell wait.

Unlike iter 7a's concept_closure table (reverted: 600 MB of transitive
pairs blew PG's 128 MB shared_buffers), this stores only ~1.2 M raw
edges in Rust process memory (~60 MB resident for SNOMED) — no PG
buffer pressure on unrelated tables. Unlike iter 7e-A's per-page BFS,
this preserves the existing cache hierarchy (warm hits still served by
`inline_compose_cache` at the layer above).

Implementation:
- New process-global `HIERARCHY_CACHE: OnceLock<HierarchyEdgeCache>`
  in `mod.rs`. Keyed by `system_id`, single-flight via `OnceCell` per
  entry so a stampede on a fresh system runs one scan instead of 50.
  Avoids threading `&HierarchyEdgeCache` through 4 layers of
  `compute_expansion → apply_compose_filters_pg → pg_filter_is_a`.
- `load_hierarchy_edges` does the single
  `SELECT parent_code, child_code FROM concept_hierarchy WHERE system_id = $1`
  per (boot × system). PK on `(system_id, parent_code, child_code)`
  means the leftmost-column scan is efficient.
- `bfs_descendants_inmem` walks children_of with a HashSet for cycle
  safety (concept_hierarchy is a DAG in SNOMED — concepts can have
  multiple parents, so naive BFS could duplicate-visit).
- `pg_filter_is_a` BFSes then batches one
  `SELECT code, display FROM concepts WHERE code = ANY($2::text[])`
  for the display lookup. Hits the existing `idx_concepts_system_code`.
- `clear_response_caches` evicts the global on import/delete (same
  trigger as the other per-instance caches), so newly-imported edges
  surface on the next request. Conservative — clears all systems
  rather than just the imported one — matches the existing pattern.

Expected impact:
- EX01 cold-miss: ~600 ms → ~10-50 ms once edges are warm. At 50 VUs
  with single-flight queuing on cold-miss, this multiplies effective
  per-code throughput by ~12×.
- EX05 (SNOMED intensional with filter[op=is-a]): same code path,
  same expected delta. p95 should drop from 14 s into double-digit ms.
- LK/VC/CM/EX02/03/04/06/07/08: untouched (they don't go through
  `pg_filter_is_a`).
- First request after server boot on any SNOMED-touching is-a query
  pays the ~1-2 s scan once; subsequent requests amortise it.

Conformance: `pg_filter_is_a` returns the same set of `(code, display)`
tuples as before — same source data (concept_hierarchy + concepts) and
the downstream `apply_compose_filters_pg` already collapses to a
HashSet and sorts, so output ordering and set membership are
unchanged. SQLite backend not touched.
LK05 (PG ~50% of SQLite — 19.7 K vs 39.4 K @ vu=50) pounds the same ~200
nonexistent SNOMED codes per 30 s window. The iter 7d handler-level
$lookup cache short-circuits supplement resolve + backend.lookup() +
code_system_language fetch + Parameters builder for SUCCESSFUL responses,
but is keyed by `Ok(value)` only — every NotFound result re-runs the
entire chain.

Adds a per-AppState `lookup_not_found_cache: NotFoundCache` (same
HashSet<String> type as the existing `not_found_urls` cache for
ValueSet/$expand). Same canonical key as the positive cache, consulted
AFTER the positive miss. Both backends use
`HtsError::NotFound(format!("Concept not found: {code}"))` for missing
concepts (postgres/code_system.rs:1456 and sqlite/code_system.rs:1795),
so the replayed NotFound is byte-identical to a fresh slow-path miss
through the framework's HtsError -> Response converter (status 404 +
OperationOutcome with issue.code="not-found").

Invalidation: cleared in `clear_expand_cache` alongside the positive
caches on bundle import — a newly-imported supplement or CodeSystem
update might define a previously-unknown code. Bounded to
NOT_FOUND_CACHE_MAX = 10_000 entries so probing many distinct unknown
codes cannot grow the cache without bound.

Expected impact:
  - LK05 PG @ vu=50: ~19 K -> close to LK01-04's 22-44 K range
    (the slow path took ~10x the warm-path's microseconds; eliminating
    it puts LK05 in the same throughput band as the other LK* tests).
  - PG/SQLite LK05 ratio should jump from 50% toward parity.
  - LK01-04 unchanged (positive cache still hits before the negative
    cache is consulted).

SQLite gets the same change for parity — its existing `lookup_handler_cache`
pattern has the same negative-result gap, and bench shows it would
benefit similarly.

Conformance: HtsError::NotFound -> 404 mapping is unchanged. The cached
diagnostic string matches what both backends emit. No new failure modes.
Reapplies the iter 7a `concept_closure` design — reverted in `9a74863b` after
the buffer-cache pressure regression — on top of the new iter 7d / 7f / 7i
caches, paired with the PG tuning that addresses the original regression.

## Why this lands now (iter 7a → revert → 7e → 7j arc)

* iter 7a (`09aec678`+) introduced `concept_closure` and rewired
  `pg_filter_is_a` to query it via PK scan. EX01 cold-miss dropped from
  1959 ms (recursive CTE on 1.2M-row `concept_hierarchy`) to ~200-500 ms.
* The unbuilt-cache fast path in `expand()` (paginated SQL bypassing
  `inline_compose_cache`) prevented warm-hit ramp; iter 7a-2 (`b6072493`)
  removed it. Overall wRPS still regressed 480K → 432K because the ~600 MB
  closure index pages evicted the hot LK/VC/EX working set out of the
  default 128 MB `shared_buffers`. Iter 7a was reverted.
* `feat/hts-iter7e-closure-tuned` (tip `208ec592`) restored the closure and
  added `--shm-size=2g -c shared_buffers=1GB -c work_mem=64MB
  -c max_connections=100` to both PG workflows so the closure stops evicting
  the working set. That branch predates iter 7d/7f/7i and can't be merged
  as-is.
* iter 7j: applies the iter 7e file-by-file diff onto the current tip
  (`af305335`, post-7i) while preserving the recently-landed caches.

## What's preserved

* iter 7d (`05b10a9f`): handler-level `$lookup` cache in
  `operations/lookup.rs` + `state.rs`. **Not touched** — iter 7e's diff
  would have backed it out.
* iter 7i (`af305335`): `$lookup` negative-result cache. **Not touched** —
  same reason.
* `inline_compose_cache`, `subsumes_response_cache`,
  `translate_response_cache`, etc. — all unchanged.

## What's removed (and why it's safe)

* iter 7f (`958e08e2`): `HierarchyEdges` struct, process-global
  `HIERARCHY_CACHE` (`OnceLock<HashMap<system_id, OnceCell<…>>>`),
  `load_hierarchy_edges`, `bfs_descendants_inmem`. The closure table
  supersedes this entirely — the only caller (`pg_filter_is_a`) now does
  an indexed scan of `concept_closure` instead of an in-memory BFS over
  cached parent→children edges. Memory footprint also drops by the
  ~60 MB-per-SNOMED-system the edge cache held.

  Grepped the rest of the crate; no other caller used these symbols.
  Removed cleanly rather than left dead. `OnceLock` import in
  `backends/postgres/mod.rs` likewise dropped since `HIERARCHY_CACHE` was
  its only use site.

## What's added (mirroring iter 7e)

* `schema.rs`: `concept_closure` table (PK on
  `(system_id, ancestor_code, descendant_code)`, reverse-lookup index on
  `(system_id, descendant_code)`, FK CASCADE on `code_systems`).
  `build_concept_closure_pg` does Rust-side BFS with u32 generation
  counters for O(1) visited reset and streams pairs to PG via 50K-row
  `UNNEST` batches inside a single transaction. `migrate_concept_closure_pg`
  drives the search off `code_systems` with two EXISTS probes against
  indexed columns — the iter 7a-3 (`b7504ad5`) DSM-exhaustion fix vs the
  original `SELECT DISTINCT FROM concept_hierarchy` which the planner
  turned into a parallel seq-scan demanding >64 MB of `/dev/shm`.
* `mod.rs`: schema apply now followed by `migrate_concept_closure_pg`
  in `new()` so server startup pre-builds any missing closure (no-op on
  CI where `hts import` already built it). New
  `rebuild_missing_closures()` public method for CLI use. `import_bundle`
  records empty CodeSystems before the transaction, then post-commit
  iterates ALL `(url → system_id)` rows (the iter 7a-2 fix from
  `19da4dd9` for the multi-version-URL case where SNOMED's hl7.terminology
  stub and the RF2 import share `http://snomed.info/sct`) and rebuilds the
  closure for any that have hierarchy edges. `write_code_system` deletes
  stale closure rows so multi-batch RF2 imports never leave a partial
  closure mistaken for complete.
* `value_set.rs`: `pg_filter_is_a` rewritten to a single
  `concept_closure JOIN concepts` query. `validate_fhir_vs::IsA` rewritten
  to a single `EXISTS(SELECT 1 FROM concept_closure …)` PK lookup.
  No fallback path: closure is guaranteed built by the time we serve
  requests (`PostgresTerminologyBackend::new` runs the migration before
  the pool returns).
* `main.rs`: CLI bulk-import now calls `backend.rebuild_missing_closures()`
  after the import body so the server's first `?fhir_vs=isa` request
  doesn't trigger a ~30-60 s closure build and blow the 60 s health-check
  timeout.

## PG tuning rationale

The ~600 MB SNOMED `concept_closure` (~20 M rows × text PK) needs to fit
in PG's shared buffer pool alongside the existing hot working set
(concepts, hierarchy, lookup-bound rows). Default `shared_buffers=128 MB`
forces page churn: every closure scan evicts LK rows, every LK lookup
evicts closure pages. Bumping to 1 GB lets both fit simultaneously.

Both PG workflows (`hts-benchmark-postgres.yml`,
`tx-ecosystem-postgres.yml`) now start their PG container with
`--shm-size=2g` and `-c shared_buffers=1GB -c work_mem=64MB
-c max_connections=100`. The `--shm-size` bump is mandatory: PG refuses
to set `shared_buffers` above the container's `/dev/shm` size.

## Expected outcome

EX01 (`?fhir_vs=isa/<SNOMED root>` @ vu=50): 131 → 1000+ RPS, parity
with SQLite. EX05 (intensional with `is-a` filter): 12 → 1000+ RPS.
Rest of the suite holds at iter 7i baseline since the buffer tuning
prevents the iter 7a regression mode.
Builds on iter 7j (concept_closure table + buffer tuning, commit 9893d19).
That commit eliminated the recursive-CTE cost on cold misses, but EX01/EX05
remained bottlenecked because `pg_filter_is_a` still returned the FULL
descendant set (~140k rows for a SNOMED root) which then flowed through
`apply_compose_filters_pg`'s `Vec<ExpansionContains>` materialisation,
in-memory sort, and `Arc` wrap — only to let `serve_from_cached` slice
off 10 rows at the bottom.

This iter restores the paginated SQL serve at the `?fhir_vs=isa/X` URL
handler (and adds a narrow inline-VS variant). When the gate matches
we issue exactly two queries directly against `concept_closure`:

    SELECT COUNT(*) FROM concept_closure
     WHERE system_id=$1 AND ancestor_code=$2;

    SELECT cc.descendant_code, c.display
      FROM concept_closure cc
      JOIN concepts c ON c.system_id=$1 AND c.code=cc.descendant_code
     WHERE cc.system_id=$1 AND cc.ancestor_code=$2
     ORDER BY cc.descendant_code
     LIMIT $3 OFFSET $4;

The PK on `concept_closure (system_id, ancestor_code, descendant_code)`
makes that ORDER BY an index-order range scan — no sort, no temp
table — and the JOIN materialises only the requested page rows. Skips
`compute_expansion`, `apply_compose_filters_pg`, `pg_filter_is_a`'s
full-set fetch, and the in-memory `inline_compose_cache` Arc allocation
(which only mattered because the warmed Arc held 140k items per
distinct root).

This mirrors SQLite's `implicit_expansion_cache`-backed paged serve,
the reason SQLite gets 10k RPS on EX01 vs PG's 183 RPS at iter 7j.

ORDER BY descendant_code matches `apply_compose_filters_pg`'s
`out.sort_by(|a,b| a.code.cmp(&b.code))` so the serve order is the
same; `total = COUNT(*)` keeps `ExpandResponse.total` accurate.

URL fast-path gate (all required; otherwise fall through):
  * pattern is `FhirVsPattern::IsA(root)` (AllConcepts unchanged);
  * `count.is_some()` and `count > 0`;
  * `filter.is_none()` (free-text filter needs the full set);
  * `hierarchical != Some(true)` (tree mode disables paging);
  * no version pins (force_system_versions /
    system_version_defaults / default_value_set_versions) and
    no `tx_resources` — those can change CS resolution and would
    diverge from the closure built at import time;
  * the CS URL resolves to a system_id.

Inline-VS fast path uses the same SQL but a stricter compose shape:
exactly one `include` with `system` + exactly one `is-a` filter on
`property=concept`, no `concept[]`, no nested `valueSet[]`, no
`exclude[]`, no `contained[]`. Multi-filter intersected composes
(e.g. `is-a AND property=`) fall through to the existing closure-
backed compute path; EX05 will see only partial improvement.

History: iter 7a-2 (commit b607249 "drop paginated PG fast path;
rely on closure-backed compute") removed this path on the
assumption that closure-backed `pg_filter_is_a` would be fast
enough. The closure helped on cold misses but the downstream
materialisation cost is the actual EX01/EX05 floor — this restores
the paginated path on top of iter 7j's closure infrastructure.
That's the combination that lets SQLite reach 10k RPS via its
`implicit_expansion_cache` table.

All iter 7j infrastructure (closure schema, build, migrate path,
closure-backed `pg_filter_is_a`) is retained — the fast path is
purely additive and falls through to the existing path when the
gate is not satisfied. Iter 7i's lookup negative cache
(operations/lookup.rs, state.rs) is also retained.
Builds on iter 7k (commit 3183c45, "paginated closure fast-path for
?fhir_vs=isa/X"). That commit moved the fast-path serve onto
`concept_closure` so the page itself is an index-order range scan, but
left in place a `SELECT COUNT(*) FROM concept_closure WHERE
(system_id, ancestor_code) = ($1, $2)` per request for
`ExpandResponse.total`. Profiling EX01 at vu=50 showed only 237 RPS;
the vu=1 latency was ~51 ms/req and almost all of it was that COUNT
query — for a SNOMED root with ~140 k descendants PG can't satisfy it
with an index-only scan without VACUUM, so it has to touch the heap
for every row. SQLite stays at ~10 k RPS on the same shape because its
implicit_expansion_cache memoises the total alongside the materialised
page.

This commit adds a process-global `CLOSURE_COUNT_CACHE` keyed by
`(system_id, root_code)`. The closure table is rebuilt on import, so
the count is invariant for the import lifetime: cache populates only
from the actual query result (never an estimate), so the cached
`total` is bit-exact w.r.t. the COUNT — `ExpandResponse.total` does
not drift.

Both fast-path call sites are wrapped:
  * URL `?fhir_vs=isa/X` path (`system_id` + `root_code`);
  * inline-VS path (`system_id_isa` + `root_code_isa`).

Eviction lives in `PostgresTerminologyBackend::clear_response_caches`
(backends/postgres/mod.rs), which is the same hook used to drop
`inline_compose_cache`, `lookup_response_cache`,
`subsumes_response_cache`, `translate_response_cache`, and
`cs_resolved_meta_cache` after a successful `import_bundle`. That
gives parity with the other per-instance response caches and ensures
a re-import of a CodeSystem can't leave a stale descendant count
visible.

The cache is bounded to 16384 entries (`CLOSURE_COUNT_CACHE_MAX`),
mirroring the existing `PG_COMPOSE_CACHE_MAX` pattern. Bench pools
hold ~100 unique roots per system — well under the cap. Once full,
new entries are silently dropped (the COUNT still runs, just no
memo).

Expected impact on EX01 vu=50: ~237 RPS → 5-10 k RPS (SQLite parity
is ~10 k). The page-serve query is already an indexed range scan, so
once the COUNT is out of the per-request path the remaining latency is
just the JOIN-to-`concepts` lookup for the page rows.

The iter 7k gate conditions and fall-through behaviour are unchanged;
all iter 7j infrastructure (closure schema, build, migrate path) and
iter 7i lookup negative cache are retained.
Cold EX02/EX05 ValueSet $expand on Postgres took 400-3400ms because
compute_expansion's Phase A enumerated the entire CodeSystem
(~350k rows for SNOMED) before Phase B filtered it down. Three fixes:

* Filter-only includes (filter[] present, no concept[]) now use the
  apply_compose_filters_pg result directly instead of enumerating the
  whole system. Mirrors sqlite/value_set.rs:3433.
* The inline-VS closure fast path now also serves op=descendent-of
  (closure scan minus the root self-link), covering the entire EX02
  pool with an indexed SQL LIMIT query.
* New index concept_properties(property, value, concept_id); rewrite
  pg_filter_property_eq to drive from it via an IN subquery instead of
  scanning every concept and probing per row.
EX01 (GET ?fhir_vs=isa/<root>&_count=N&_offset=M) was capped at ~316
RPS at vu50 vs SQLite's ~10,300. The iter-7k closure fast path served
every request with two SQL round-trips while holding one of 16 pool
connections — at vu50 that is a hard 16-conn × hold-time ceiling.

* New process-global CLOSURE_PAGE_CACHE keyed by (url, version, count,
  offset) stores each paginated page's (code, display) rows plus the
  descendant total. A warm hit rebuilds the response and returns
  without ever acquiring a pool connection, mirroring the existing
  inline_compose_cache warm-hit path. Per-key memory is bounded by the
  page window, so it stays small even for ~140k-descendant SNOMED roots.
* Cold closure fast path now populates the cache; cleared on import.
* PG connection pool 16 -> 32 so the cold warm-up phase keeps pace with
  the ~6.5k-entry benchmark pool within a 30s vu50 stage.
Integrates the post-#93 work from feat/hts-terminology-service onto
current main. #93 squash-merged the terminology server; this branch
continued on the pre-squash history, so the integration reconciles the
duplicate #93 content (taking the branch's strictly-newer HTS sources,
none of which main touched post-#93) while preserving main's later
additions (testcontainer retry, #105 PG workflows, #109/#110 bulk-export
workflows, cargo update).

Net delta vs main: 13 files, +6959/-377 — confined to crates/hts +
the two PG CI workflows.

Conflict resolution:
- crates/hts/src/** : take branch (no post-#93 changes on main)
- .github/workflows/{tx-ecosystem,hts-benchmark}-postgres.yml : take
  branch (remote-Docker wiring + shared_buffers/work_mem tuning for the
  iter-7j concept_closure + DB-dump debug artifact; cargo check remains
  a documented temporary until cfg(test) gating follow-up lands)

Supersedes #106 (closed) and #107 (closed).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants