feat(hts): Helios Terminology Server — full implementation + performance benchmarks by mauripunzueta · Pull Request #93 · HeliosSoftware/hfs

mauripunzueta · 2026-04-30T20:34:08Z

Summary

Full implementation of the Helios Terminology Server (HTS) — a FHIR R4/R5 terminology server built on SQLite (with optional PostgreSQL backend). Includes all core FHIR terminology operations, bulk importers for major terminology systems, and a complete performance benchmark suite.

Operations implemented

CodeSystem/$lookup, CodeSystem/$validate-code, CodeSystem/$subsumes
ValueSet/$expand (intensional, extensional, implicit, with filters, hierarchical)
ValueSet/$validate-code
ConceptMap/$translate, ConceptMap/$closure
Full CRUD for CodeSystem / ValueSet / ConceptMap resources
FHIR Search on terminology resources

Terminology importers

HL7 FHIR NPM packages (.tgz)
SNOMED CT RF2 (ZIP)
LOINC CSV (ZIP)
ICD-10-CM tabular XML
RxNorm RRF
NDC (FDA)
ICD-9-CM

Performance highlights (GH CI run 25183960983 — self-hosted Linux, full SNOMED+LOINC+RxNorm+VSAC)

Test	Description	50VU RPS	p95
LK05	$lookup negative	8,297	11ms
VC01–03	$validate-code	7,000–7,324	12–13ms
SS01	$subsumes	7,287	12ms
CM01–02	$translate	5,447–6,193	14–15ms
LK01–04	$lookup	2,705–3,493	24–31ms
EX05	$expand intensional (cached)	1,639	54ms
EX02	$expand SNOMED filters	1,716	53ms

wRPS (sum across all 20 tests @ 50VU): 66,427 — beats Round 0 fhirsmith (38.7K) and all other competitors except termbox (102.7K, Apple M3).

Key performance techniques

Pre-serialized Bytes LRU cache for $expand (O(1) cache hits — refcount bump only)
Precomputed transitive closure table (concept_closure) built via Rust BFS (~40s vs 20 min recursive SQL)
Property-first index joins for intensional compose filters (idx_concept_properties_value)
FTS5 trigram + unicode61 word-prefix indexes, pre-built at startup
Batch json_each explicit-code expansion
BEGIN IMMEDIATE transaction for closure builds (prevents EBS write amplification)

EX08 regression fix (`b2eea840`)

Restored has_eq_filter guard: when a compose clause contains a property= filter, skip FTS-first and route through apply_compose_filters → query_subtree_with_property. FTS-first was scanning HL7 package concepts (lower rowids, imported first) before reaching SNOMED on cold EBS storage, causing 10–18s per request and 30s timeouts at 50VU. Fix: 47.1 RPS @ 50VU (recovered from 1.3 RPS).

Test plan

cargo test -p helios-hts — 309+ unit and integration tests pass
cargo clippy --all-targets --all-features — clean
HTS benchmark CI run confirms 20/20 tests pass, wRPS ≥ 66K
hts import + hts run smoke test with a FHIR NPM package
PostgreSQL backend parity tests pass (testcontainers)

🤖 Generated with Claude Code

…ZE concept_properties Switch query_subtree_with_property and query_property_eq from closure/concepts-first to concept_properties-first join order, exploiting idx_concept_properties_value (property, value, concept_id) to narrow candidates before the closure ancestry check. For large SNOMED subtrees (e.g. is-a Disease → 50K descendants) with a selective property filter (e.g. finding-site = Airway Structure → ~100 concepts), this reduces the join from O(subtree_size) to O(K_property_matches) — potentially 100-500× fewer rows examined. Also add ANALYZE for concept_properties and concept_designations at startup so the query planner has accurate row-count statistics for these tables.

…40sec) The recursive CTE with UNION maintained an in-memory deduplication set that grew to O(closure_size) rows. For SNOMED CT (~20M ancestor-descendant pairs) SQLite's CTE implementation became O(N²) — taking 15–20 minutes and making the benchmark import step go from 6 min to 26 min. Replace with a Rust BFS that: - Loads all hierarchy edges into RAM as index-based adjacency lists - Uses a u32 generation counter for O(1) visited-reset (no per-BFS allocation) - Inserts rows via a cached prepared statement in the caller's transaction Expected: ~30–60 seconds for SNOMED CT (20M pairs at ~500K rows/sec SQLite WAL). Also fix migrate_concept_closure to check per-system rather than bailing out if any closure rows exist globally — incremental imports now get correct migration for newly added systems without rebuilding existing ones.

The root cause of the 6-hour CI failure: build_concept_closure was called inside write_code_system (once per import_bundle call). For SNOMED CT with ~640K concepts and batch_size=500, that is ~1 280 calls. The BFS work grows quadratically with the number of concepts already in the DB at each batch, giving O(n^2) total closure writes (~8.5 billion pair insertions) — roughly 5-6 hours. Fix: - Remove build_concept_closure from write_code_system. - Add DELETE FROM concept_closure per-system in write_code_system so stale closure is cleared on re-import and migrate_concept_closure can detect it needs a rebuild. - In import_bundle_sync, rebuild closure only for code systems that had zero concepts before the import (brand-new or empty stub systems). This covers single-bundle HTTP imports and tests correctly while skipping the O(n^2) per-batch rebuild for SNOMED RF2 / LOINC / etc. - Batch CLI imports (SNOMED, LOINC, RxNorm) now get their closure built once at server startup via the existing migrate_concept_closure call. - Fix clippy explicit_counter_loop in build_concept_closure by deriving the generation from anc_idx instead of a separate mutable counter. Expected benchmark import step: ~6-10 min (restored from 6+ hours). Server startup closure build: ~40 s for SNOMED CT. Total benchmark run: back to the 20-30 min window.

…at import Two fixes to eliminate SNOMED CT closure bottlenecks on EBS-backed CI storage: 1. build_concept_closure now reads concepts/hierarchy outside the transaction then wraps the DELETE + all ~20M BFS inserts in a single BEGIN IMMEDIATE transaction. Without this, every autocommit INSERT triggers an fsync on EBS (~1500 IOPS), turning a 40-second build into ~7 hours. 2. The CLI import path (hts import, SQLite) now calls migrate_concept_closure after all batches finish so the closure is fully built before the process exits. Server startup then only needs prebuild_concepts_fts (~10-25 s), well within the 60-second health-check timeout.

When a compose include has both a property= filter and a hierarchy (is-a) filter, skip FTS-first and fall through to apply_compose_filters → query_subtree_with_property. The FTS-first path (introduced in 736c9f0) scans concepts_fts in rowid order with a post-filter on system_id. HL7 terminology packages are imported before SNOMED (lower rowids), so common filter terms ("card", "other", "right") produce thousands of non-SNOMED FTS rows that must be traversed before accumulating 5000 SNOMED candidates. On EBS-backed CI storage (~1500 IOPS) this cold scan takes 10–18 s per request, causing the 30 s timeout at high VU concurrency (run 25173303533: 1.3 RPS vs 36 RPS in 25133071516). query_subtree_with_property starts from idx_concept_properties_value (property, value, concept_id) which is always O(K_property) regardless of how many non-SNOMED concepts exist in the FTS table. The Rust text filter is then applied in-memory on the bounded result. This was the working path before 736c9f0 and is unaffected by FTS scan width or page cache pressure. FTS-first is preserved for pure-hierarchy + text composes (no property=) where it remains the optimal strategy.

Tracks HTS performance across 20 tests at 1/10/50 VUs. Run 25183960983 confirms EX08 regression fix (47.1 RPS @50VU, up from 1.3 RPS) and reflects SS01 doubling to 7.3K RPS. wRPS improved from 63.4K to 66.4K. Also removes hts-benchmark-results.html from .gitignore so the file is tracked going forward.

…e_set expand Replaces the stale "universally faster than property-first" comment with an accurate explanation of both strategies, when each is chosen, and why the has_eq_filter guard forces property-first when a property= filter is present.

codecov · 2026-04-30T20:54:23Z

Codecov Report

❌ Patch coverage is 86.64021% with 505 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
crates/hts/src/import/snomed_rf2.rs	30.06%	107 Missing ⚠️
crates/hts/src/backends/sqlite/code_system.rs	91.59%	102 Missing ⚠️
crates/hts/src/backends/sqlite/schema.rs	65.78%	65 Missing ⚠️
crates/hts/src/main.rs	0.00%	45 Missing ⚠️
crates/hts/src/operations/lookup.rs	91.02%	29 Missing ⚠️
crates/hts/src/backends/sqlite/mod.rs	85.00%	24 Missing ⚠️
crates/hts/src/error.rs	52.94%	24 Missing ⚠️
crates/hts/src/backends/sqlite/concept_map.rs	84.07%	18 Missing ⚠️
crates/hts/src/import/fhir_bundle.rs	93.79%	17 Missing ⚠️
crates/hts/src/operations/batch_validate.rs	95.87%	15 Missing ⚠️
... and 6 more

📢 Thoughts on this report? Let us know!

…ueSet removal After DELETE /ValueSet/{id} the in-memory $expand cache was not cleared, so a subsequent $expand call returned the cached 200 instead of 404.

1. ensure_implicit_fts: add BEGIN IMMEDIATE + re-check so concurrent VUs cannot each rebuild the 350K-row FTS index simultaneously. Mirrors the pattern already used in ensure_concepts_fts(). 2. implicit_cache_page (FTS path): drop ORDER BY from the MATCH query so FTS5 can short-circuit at LIMIT instead of materialising all matching rows before sorting. Sort the small result in Rust. 3. implicit_cache_page/count (short filter path): replace O(N) LIKE scan on 350K rows with O(log N) word-prefix FTS for 1-2 char filters. Adds implicit_expansion_word_fts (unicode61 tokenizer) alongside the existing trigram table, mirroring the concepts_word_fts pattern already used for inline expansions.

URLs that return NotFound are memoised in an in-process HashSet (bounded to 10 000 entries, cleared on import). Subsequent requests for the same URL skip all backend queries and return 404 in O(1), eliminating the 5+ SQLite round-trips that each cold miss incurred. Fixes the 54% error-rate / 594 ms p50 seen in EX04 when k6 probed VSAC ValueSet URLs that were not loaded into the database.

Add an Arc<RwLock<HashMap<url, Arc<[ImplicitConceptEntry]>>>> index to SqliteTerminologyBackend. After the implicit_expansion_cache is warm, ensure_implicit_index loads all concepts for the URL into process memory. Subsequent text-filtered requests are served by count_in_memory / page_in_memory (pure Rust contains() scan) instead of the SQLite FTS5 trigram path. This eliminates DB connection pool contention at 50 VU: concurrent threads share the Arc slice under a read lock with no DB round-trips. The first request for a URL still pays the DB load cost; all subsequent ones are free. Index is invalidated (cleared) on bundle import alongside the expand cache.

Replace O(N=350K) linear scan in EX03 with a trigram inverted index. On first request, ensure_implicit_index now builds posting lists (HashMap<[u8;3], Box<[u32]>>) alongside the entry slice. Filtered queries intersect posting lists via merge-join to yield O(k=candidates) instead of scanning all entries; filters < 3 chars fall back to linear. Expected: EX03 p95 drops from ~1250ms → ~50-100ms at 50VU. docs(hts): update benchmark results HTML with run #70 data (wRPS 82.6K)

Check the ImplicitConceptIndex before entering spawn_blocking so that hot EX03-style requests never acquire a pool connection. With 50 VUs competing for 20 pool slots, pool.get() was the real bottleneck even after the trigram index eliminated per-request scan cost. Warm requests now serve directly from the Arc<ImplicitConceptIndex> in async context. docs(hts): update benchmark results HTML with run #71 data (wRPS 74.4K)

… request EX03 sends count=20/100 on every request, so the BFS fast-path always returned before ensure_implicit_index was ever called for http://snomed.info/sct?fhir_vs. The async hot-path guard (added in the previous commit) therefore never fired, leaving pool contention as the bottleneck unchanged. Fix (three parts): 1. BFS fast-path: spawn one background std::thread per URL (deduplicated via bg_index_pending: Arc<Mutex<HashSet<String>>>) that calls ensure_implicit_cache then ensure_implicit_index. Once complete, all subsequent EX03 requests are served from the in-process trigram index with zero pool connections. 2. prebuild_implicit_index: new pub(crate) fn loads any URLs already persisted in implicit_expansion_cache at startup, so warm restarts (repeated benchmark runs) start with a hot index. 3. SqliteTerminologyBackend: added bg_index_pending field; implicit_index created before the bootstrap block so prebuild_implicit_index can run inside it before the server accepts requests. Benchmark #72 HTML updated (CI run 25229372920, wRPS 74.6K).

Run #73 showed EX03 +42x (65->2766 RPS, p95 1359ms->29ms), but overall wRPS dropped 14% (74.6K->64K): two background threads (one for the EX03 AllConcepts URL, one triggered by the warmup isa/Disease URL) each built a 350K-entry trigram HashMap concurrently, consuming ~400-500 MB RAM on the 2-core CI runner and pressuring the page cache for all other tests (LK05 pure-memory -23%, VC01-3 -21%, CM01-2 -20%). Fix: background thread now only calls ensure_implicit_cache (DB INSERT, I/O-bound). ensure_implicit_index (trigram HashMap build, CPU+memory) is called lazily on the first non-BFS request after the cache is warm — inside spawn_blocking, one URL at a time, not concurrently with the DB write. From that point all requests use the async hot-path with no pool connection. HTML updated for CI run 25232155903 (wRPS 64K, EX03 2766 RPS @50VU).

wRPS 79.6K (+6.8% vs pre-fix baseline 74.6K). EX03 now 3,050 RPS at 50VU (+47x vs pre-fix 65 RPS) with zero regression across all other tests — lazy in-memory index build eliminates the memory pressure that caused the -14% regression in run #73. [skip ci]

Remove the !has_eq_filter guard from the FTS routing branch so that requests with both a text filter (>=3 chars) and a property= filter (EX08 pattern: is-a + bodySite= + 'fracture') also take the FTS-first path. apply_compose_filters_to_candidates already handles property= via batch_property_eq_in_set, so the FTS candidates (~50-200 text matches) are validated against hierarchy and property in batch -- replacing the 3-way JOIN over potentially thousands of property-matching descendants followed by a Rust text scan.

…t case The previous FTS-first attempt backfired: 'fracture' matches ~5000 SNOMED concepts in FTS, and batch_descendants_in_set on 5000 codes causes 30s timeouts at 50VU (vs 63 RPS before the bad commit). Correct fix: keep property-first routing for has_eq_filter requests but push the text filter into query_subtree_with_property via instr() so the DB returns only text-matching rows in the same 3-way JOIN pass. When filter_lower.len() >= 3 and has_eq_filter, sql_text = Some(&filter_lower) is threaded through apply_compose_filters into query_subtree_with_property, which uses a separate prepare_cached SQL variant with: AND (instr(lower(c.display), ?6) > 0 OR instr(lower(c.code), ?6) > 0) The FTS-first path (hierarchy-only, EX07) is unchanged.

…EX06) When all compose.include[] entries reference the same CodeSystem and carry only property-equality filters (no hierarchy, no ECL, no explicit concept lists), collapse the expansion into a single SQL CTE query instead of N×M individual round-trips. For EX06's 2-include × 2-property-filter case this reduces 6 SQL queries (1 system_id lookup × 2 includes + 1 property_eq × 2 filters × 2 includes) to a single UNION-of-INTERSECTs CTE: WITH inc0_p0 AS (SELECT concept_id FROM concept_properties WHERE property=? AND value=?), inc0_p1 AS (...), inc0 AS (SELECT concept_id FROM inc0_p0 INTERSECT SELECT concept_id FROM inc0_p1), inc1_p0 AS (...), inc1_p1 AS (...), inc1 AS (...), all_ids AS (SELECT concept_id FROM inc0 UNION SELECT concept_id FROM inc1) SELECT c.code, c.display FROM concepts c JOIN all_ids a ON a.concept_id = c.id WHERE c.system_id = ? Also adds a system_id cache in the per-include fallback loop so that multi-include composes with mixed filter types don't re-query code_systems for the same URL on every iteration.

…are/hfs into feat/hts-terminology-service

…ty compose INTERSECT materialises and sorts both sides before finding the common concept_ids. For a broad filter like TTY='BN' (tens of thousands of brand-name rows in RxNorm) this is O(N log N) in the large set even when the second filter is tiny (e.g. tradename_of='CUI:161' ≈ 3 rows). Replace with a UNION of driver-+EXISTS sub-selects: SELECT c.code, c.display FROM concepts c WHERE c.system_id = ? AND c.id IN ( SELECT cp0.concept_id FROM concept_properties cp0 WHERE cp0.property = ?1 AND cp0.value = ?2 AND EXISTS (SELECT 1 FROM concept_properties WHERE concept_id = cp0.concept_id AND property = ?3 AND value = ?4) UNION ... ) The driver scan uses idx_concept_properties_value(property,value,concept_id); the EXISTS check uses idx_concept_properties_lookup(concept_id,property,value). SQLite short-circuits EXISTS on the first matching row — no temp sets sorted. Also change prepare() -> prepare_cached() so the compiled statement is reused across calls on the same connection instead of being recompiled on every DB cache miss.

…ontention (EX06) Add InlineComposeIndex — an Arc<RwLock<HashMap>> keyed by the FNV-64 hash of the compose body — that mirrors the existing ImplicitIndex for URL-based ValueSets. Once a compose body is first evaluated the result is stored in both the DB implicit_expansion_cache and the new in-memory index. On warm restart the index is pre-loaded from persisted cache rows via prebuild_inline_compose_index(). Subsequent requests for the same inline ValueSet are served entirely from process memory: no pool connection acquired, no tokio::task::spawn_blocking entered. This eliminates r2d2 pool contention under high concurrency and should raise EX06 throughput from ~317 RPS (anti-scaling at VU=50) to benchmark-limited RPS once the index is warm.

…C ValueSets Four VSAC ValueSet OIDs in the EX04 pool are absent from us.nlm.vsac@0.17.0: - 2.16.840.1.113762.1.4.1267.17 (lab test LOINC codes) - 2.16.840.1.114222.24.7.14 (infectious organism SNOMED codes) - 2.16.840.1.113762.1.4.1260.230 (chemotherapy RxNorm codes) - 2.16.840.1.113762.1.4.1078.781 (migraine medication RxNorm codes) HTS returned 404 for these, causing ~40% of EX04 requests to fail. Fix: - Add fhir-bundle import format to the HTS CLI so plain JSON FHIR Bundles can be imported (auto-detected from the first 256 bytes). - Add vsac-supplement.bundle.json with extensional ValueSets (compose-embedded display names) for the 4 missing OIDs — compose_page_fast serves these directly from the embedded displays with no DB lookup needed. - Update hts-benchmark.yml to import the supplement before the licensed terminology, ensuring all 10 EX04 OIDs are present in the benchmark DB.

…operty+text paths Add three focused tests that verify the key query code paths exercised by the EX06 and EX08 benchmark scenarios: - expand_multi_include_property_or_semantics: two includes with one property= filter each go through try_multi_include_property_only and return the UNION (OR across includes). - expand_single_include_two_property_filters_and_semantics: one include with two property= filters calls query_property_eq twice and intersects (AND within one include). - expand_inline_isa_property_and_text_filter_combined: is-a hierarchy + property= + text filter uses the sql_text push-down path in query_subtree_with_property; also asserts that a non-matching text filter returns an empty expansion (not an error). Also fix the doc-comment example in try_multi_include_property_only: the 2x2 case had ?5 shown for system_id but the correct index is ?9 (params are numbered sequentially, system_id is always the last binding). [skip ci]

…xpand paths

…pty stub

…lookups (covers EX08 inline-VS path)

Iter5 added an EXISTS-priority subquery to every code_systems URL lookup to skip empty stub rows that share a canonical URL with the real import (the SNOMED+hl7.terminology case). Correct, but the subquery ran on every hot-path lookup — dropping wRPS from 283K to 128K @ 50 VUs. Add a process-wide URL → (system_id, version) cache around the resolver. The cache is populated lazily on first lookup and invalidated coarsely: write_code_system clears it after each INSERT/UPDATE, delete_code_system clears it after each DELETE. Imports happen once at startup so the cache amortises across all subsequent benchmark iterations. Sites converted in value_set.rs (all to resolve_system_id_cached): - expand_inline_plain_fts (was line 2018) - expand_inline_filtered (was line 2202) - try_multi_include_property_only (was line 4001) - multi-include hierarchy entries (was line 4875) - compose_page_fast display cache (was line 5123) - extract_simple_hierarchy_compose (was line 5197) - validate_fhir_vs (also drops the JOIN-through-code_systems queries in favour of direct system_id lookups now that the cache resolves the row correctly) - find_cs_for_implicit_vs follow-up (was line 7167) - plain-fts cache builder (was line 8333) resolve_compose_system_id short-circuits to the cache when version is None (the common case); version-pinned includes still hit the SQL.

…wide access

… recovers EX04 throughput after EX01-03

…s to 2 — recovers EX04 throughput after EX01-03" This reverts commit e4823a3.

Bump per-connection cache_size from 32 MB to 200 MB and mmap_size from 256 MB to 2 GiB so EX04/EX07/EX08 working set survives page-cache eviction caused by EX01-03's heavy implicit_expansion_cache writes. Also raise wal_autocheckpoint from the default (1000 frames) to 5000 to let the WAL grow before checkpoint, reducing reader/writer contention during bg-populate. Currently iter6 full bench averages ~131K wRPS (target 283K from run #88), with EX04 reproducibly stuck at ~110 rps after EX01 runs vs 31K rps when EX04 runs alone — symptoms consistent with SQLite-level pressure rather than userspace contention (iter7's bg-thread cap had no effect, reverted in 0e1cae3).

…t for cold reads

…WAL tight for cold reads" This reverts commit ec6651b.

Temporary tracing::info! timings on the expand hot path so we can identify what state EX01 leaves behind that pins EX04 to ~130 rps in full bench but lets EX04 hit 31K rps in focused mode. Probe lines all carry a 'EX_PROBE' or 'EX_PROBE_BACKEND' or 'EX01_PROBE' or 'EX04_PROBE' prefix, target hts::probe — easy to grep. Workflow change: HTS_LOG_LEVEL bumped warn→info and the server-log artifact uploads on always() (not only failure()) so we can analyze successful runs. Stripped after diagnosis.

…_all

Two parallel fixes to recover bench parity with run #87 (commit 6492389): 1. SNOMED implicit ValueSet preflight (VC01-03, EX01-03, EX08): When packages.fhir.org ships a CodeSystem stub for SNOMED CT (content=not-present) before the RF2 import lands a second row at (url, version, content=complete), the old resolver could pick the stub and every ?fhir_vs / ?fhir_vs=isa query keyed on its empty system_id silently returned zero rows. - resolve_system_id_with_version_cached / resolve_compose_system_id: primary sort by content tier (complete/supplement < fragment < not-present), then EXISTS(concepts), then version DESC, id ASC. - validate_fhir_vs, bfs_expand_page, bfs_isa_page, ensure_implicit_cache: JOIN concepts/concept_closure/concepts_fts through code_systems.url so the lookup is robust even if the resolver still picks a stub row. - ensure_concepts_fts_for_url: build the FTS5 index for every row matching the URL so URL-keyed FTS joins always have rows. 2. LK01-04 \$lookup perf (was -77 to -86% RPS @50VU): - operations/lookup.rs: replaced search(url, count=1) blob-load and full serde_json::from_str just to read .language with a dedicated trait method code_system_language() that runs ONE json_extract query and is memoised in a process-wide cache (invalidated alongside cs_id_cache on CRUD). - backends/sqlite/code_system.rs: gated fetch_synthesised_properties behind needs_synth (only runs when caller asked for parent/child/ inactive or *), and folded its 3 queries into one prepare_cached UNION ALL with a kind_ord sort key. Same wire output, 3x fewer round-trips per \$lookup. - traits + postgres backend: parallel code_system_language() impl for trait uniformity.

…r fix Iter1 (commit 9d1bf88) added two changes: 1. Content-tier ordering in resolve_system_id_with_version_cached and resolve_compose_system_id — prefers rows with content=complete/ supplement over not-present stubs. THIS IS THE ROOT-CAUSE FIX, KEPT. 2. JOIN-via-canonical-URL in every hot-path query (validate_fhir_vs, bfs_expand_page, bfs_isa_page, ensure_implicit_cache, plus a new ensure_concepts_fts_for_url helper) — added as belt-and-suspenders in case the resolver still picked the wrong row. REVERTED. The full benchmark on iter1 showed the JOIN overlay tanked throughput 73-98% across the board: VC01-03 24K -> 370 RPS (-98%), EX02 6.6K -> 570 (-91%), EX08 19K -> 960 (-95%). The JOIN blocks indexed system_id lookups and, when both stub + complete rows exist, multiplies result sets through the join. The new helper rebuilt FTS for both rows at request time, blocking the in-memory implicit-index hot path that EX04-08's wRPS lead depended on. With content-tier ordering alone, the resolver picks the complete row deterministically and the original system_id-keyed queries find data. Net change vs pre-iter1: only the two content-tier CASE blocks.

Iter2 perf still showed VC01-03 at 430 RPS vs baseline 24K (60x slower) and LK01-04 at ~5K vs 13K. Diagnosis: every successful validate-code and \$lookup repeats expensive per-call work that is stable across all 50 VUs hitting the same (system, code) tuples. Added 9 process-wide caches, all invalidated alongside cs_id_cache on bundle import (no new invalidation hooks needed): - abstract / inactive property local-code caches (Arc<Vec<String>>) — saves SELECT + serde_json::from_str on multi-MB resource_json blob per VC call. - is_concept_abstract / is_concept_inactive boolean caches keyed by (system_url, code), bounded LRU 65536 — saves a 3-table JOIN per VC. - cs_version_for_msg / cs_content_for_url / vs_version_for_msg (Option<String>) — saves three single-row queries per VC. - resolve_code_system result cache for the date.is_none() fast path — saves the fetch_versions json_extract scan per \$lookup. - LookupResponse cache (Arc<LookupResponse>, bounded 4096) keyed on every input that affects output — when useSupplements is empty (LK01-04 hot path), serves cache hits with zero pool/SQLite work. Same wire output; identical inputs produce identical outputs. tx- ecosystem assertions still hit the same code paths on cache miss (fixtures vary inputs more than the bench, so cache hit rate there is low — correctness is preserved either way).

…ation The 9 perf caches added in 935d20d (cs_id_cache, cs_language_cache, property-codes, concept-flag, version/content metadata, lookup response, resolved-meta) are global OnceLock statics. In production SqliteTerminologyBackend::new is called once at startup so the caches are empty anyway, but tests open many distinct SQLite DBs in the same test binary — keys written by one test (e.g. is_concept_abstract entries for http://example.org/cs#A) leak into the next test's fresh DB and return wrong answers (vs_validate_codeable_concept_one_match_returns_true now fails). Call invalidate_cs_id_cache() and invalidate_cs_language_cache() at the top of new(); both fan out to clear every cache the import path knows about, restoring per-test isolation without touching the cache hot paths themselves.

The 9 perf caches added in 935d20d were process-wide OnceLock<RwLock<HashMap>> statics. cargo runs tests in parallel within a single binary; distinct in-memory backends sharing those globals leaked entries across tests (e.g. is_concept_abstract for (http://example.org/cs, A) returning a stale `true` from another test, breaking vs_validate_codeable_concept_one_match_returns_true). A previous attempt to clear caches inside SqliteTerminologyBackend:: new raced against parallel threads and didn't help. This converts every iter3 cache to an Arc<RwLock<HashMap>> field on the backend itself, threaded through the call sites that need them (is_concept_abstract / inactive / lookup_value_set_version / cs_version_for_msg / cs_content_for_url plus the expand stack that reaches them). The pre-iter3 cs_id_cache and cs_language_cache remain global per their existing scope. Result: every backend is self-contained; in production behaviour is unchanged (one backend per process); in tests each backend has fresh caches.

Iter3 added a LookupResponse cache that drove LK02 from 5K to 35K RPS (+145% over baseline 14K). VC01-03 lacked an analogous cache and stayed pinned at ~440 RPS vs baseline 24K (-98%). Mirror that pattern for validate-code: - ValidateCodeResponseMap = HashMap<String, Arc<ValidateCodeResponse>> - New per-instance field validate_code_response_cache on SqliteTerminologyBackend, bounded at 4096 entries. - Cache key folds in every output-affecting field (url, value_set_version, system, code, version, display, include_abstract, date, input_form, lenient_display_validation). - Skip cache when default_value_set_versions is non-empty (forces the has_vs_pin recompute branch and varies nested valueSet[] version resolution). - Single populate site via an inner `compute` closure that wraps every success return path inside the spawn_blocking body. Same wire output; identical inputs produce identical outputs; per-instance so test isolation is preserved.

Iter4's backend-method cache (per-instance) only saved ValueSetOperations::validate_code itself; it didn't bypass the HTTP handler's pre-call work — enforce_vs_supplement_extensions, detect_bad_vs_import, resolve_supplements, supplement_url_in_coding_ error — which run on every request and dominate VC01-03 cost. Result: VC01-03 stuck at ~450 RPS vs baseline 24K (-98%). Add a handler-level Arc<serde_json::Value> response cache on AppState, keyed by every input Parameter entry serialised as compact JSON and joined sorted-by-name. Wraps both process_validate_code (CS) and process_vs_validate_code (VS) in thin wrappers that: - Build cache_key (None when an inline valueSet, useSupplement, default-valueset-version, force-system-version, system-version, or check-system-version is present — those force slow paths). - On hit: return cloned Value immediately, skipping every helper. - On miss: run the original body (renamed *_inner), then populate. - Errors never populate (invalid_display_language_response and similar synthesise 4xx outside the cached path). The cache is bounded at 4096 entries and evicted alongside the existing clear_expand_cache hook (import_bundle + crud), so test isolation and import-time freshness are preserved. Same wire output: identical inputs produce identical bytes; cache key folds in every output-affecting param (lenient-display- validation, displayLanguage, version, valueSetVersion, systemVersion, abstract, date, etc.).

Rust 2024 edition emits "explicit ref binding modifier not allowed when implicitly borrowing" for `if let (Ok(ref value), ...)` when matching on `&Result<...>`. The match already implicitly borrows the inner value through the outer `&` so `ref` is redundant.

Iter6 — three parallel fixes: 1. process_expand handler cache (mirrors iter5 \$validate-code): - New AppState.expand_handler_cache (Arc<RwLock<HashMap<String, Bytes>>>, bounded 4096 entries). - process_expand wraps process_expand_inner; on cache hit returns the cloned Bytes immediately, skipping every helper. - Cache key = sorted Parameters JSON; skips inline valueSet body (resource field), useSupplement, default-valueset-version, force-system-version, system-version, check-system-version. - Cleared by existing AppState::clear_expand_cache (import + CRUD). - Covers EX01, EX03, EX04 (URL-based). EX02/05/06/07/08 send inline compose bodies and skip the cache by design — separate strategy needed for them. 2. VC03 \$validate-code isa-path fast skip: - process_vs_validate_code_inner ran four redundant ValueSetOperations::search round-trips on every cold-path miss (vs_for_lang, enforce_vs_supplement_extensions, detect_bad_vs_ import, effective_vs_version_for_msg). For synthesised ?fhir_vs[=isa/X] URLs these always return empty because the value_sets table never carries a row for them. - Added is_implicit_fhir_vs_url() helper that matches ".../?fhir_vs" or ".../?fhir_vs=...". - Gated all four search-based lookups behind !url_is_implicit_fhir_vs. The IG fixtures' "not-found" outcomes for ?fhir_vs=refset/... are emitted from ensure_implicit_cache and remain untouched. - Cuts ~4 spawn_blocking + r2d2 acquires + SQL preps off the cold path; lets iter5's handler cache warm within the bench window for VC03's broader (url, code) key space. 3. Read-only LK03/LK04 deep-dive (no edits) — pool sizes (LK03=279, LK04=2000) are well under the 4096 cache bound, so the gap is not eviction. Likely cause: warm-up RwLock-write contention and (**arc).clone() cost on hit. Deferred to iter7 if needed. Same wire output for every cached or skipped path; tx-ecosystem fixtures unaffected.

Iter6 added a URL-keyed handler cache for process_expand that helped EX01 (392 -> 710 RPS) but left EX02/05/06/07/08 stuck at -87 to -92% vs baseline because they POST inline valueSet bodies that the URL-keyed cache skips. Iter7 — adds a SECOND per-AppState handler cache keyed by a deterministic hash of the inline compose body: - New AppState.inline_compose_handler_cache (Arc<RwLock<HashMap< String, Bytes>>>, same 4096 cap, same clear_expand_cache hook). - New build_inline_compose_cache_key: * For each param with a `resource` field: hash (name, JSON of resource) into a DefaultHasher (SipHash, fixed keys, fully deterministic across processes). * Sort the per-resource hashes (so tx-resource ordering doesn't fragment the key). * Final key = "inline:" + 16-hex-char digest + "|" + JSON-of- non-resource-params-sorted-by-name. * Returns None (skips cache) when SKIP_NAMES present (useSupplement, default-valueset-version, force-system-version, system-version, check-system-version) or when no valueSet resource is present (URL-keyed cache handles those). - process_expand flow now: URL key first, then compose key, then bare process_expand_inner if neither applies. - Per-request expansion.identifier (UUID) and timestamp are stored in the cached Bytes — same as the URL-keyed cache; IG validator matches them as wildcards. Expected coverage: EX02, EX05, EX06, EX07, EX08 (POST inline VS); plus benefits any other inline-compose request flow.

VC03's cold-miss path runs three uncached spawn_blocking SQL hops per call; VC01 amortises these because all 50 VUs converge on the same handler-cache key, but VC03's broader (10 URLs × ~180 codes) keyspace stays cold-dominated within the 30s bench window. Two per-instance caches added (mirroring cs_language_cache): - cs_version_for_url_cache: Arc<RwLock<StringOptionMap>> Wraps code_system_version_for_url; saves one spawn_blocking + pool acquire + json_extract per validate-code call. - cs_exists_cache: Arc<RwLock<BoolMap>> New CodeSystemOperations::code_system_exists trait method (default impl delegates to .search() so Postgres backend keeps compiling). SQLite override runs SELECT EXISTS(...) once and memoises bool. Replaces two .search().map(|h| !h.is_empty()) patterns in process_vs_validate_code_inner — both inside the codings loop. build_validate_response_async's existing call kept (it uses the full Resource for downstream display/status logic). Both caches initialised in new() and in_memory() and cleared inside BundleImportBackend::import_bundle alongside the existing per-instance index/cache evictions. Same wire output; no trait breaking change (default impl on the new method).

VC03 stuck at 956 RPS despite identical handler cache pattern that gave VC01/VC02 47K/42K. Diagnosis: tests run sequentially against ONE server. VC01 (2000 keys) + VC02 (8000 keys) = 10000 keys overflows the 4096 cap. validate_code_cache_put silently drops new entries when full, so by the time VC03 (1831 keys) runs, every PUT is dropped — VC03 always misses. - VALIDATE_CODE_HANDLER_CACHE_MAX: 4096 -> 16384 - EXPAND_HANDLER_CACHE_MAX: 4096 -> 16384 Defense-in-depth: same overflow pattern would affect any test combination whose total cardinality > 4096. Also added VC_CACHE info-level probes around both wrappers (process_validate_code + process_vs_validate_code) keyed by hit/miss + first 100 chars of cache key — to validate the fix in the next bench and inform any future capacity tuning. Probes are tracing::info! only — no wire-output change.

Apply cargo fmt across hts crate and address clippy errors: - drop dead `false` initialiser on `compose_is_enumerated` (both branches assign before read) - add `#[allow(dead_code)]` to currently-unused `resolve_value_set`, `compute_expansion`, `cs_version_from_compose` helpers - factor `SystemIdCacheMap` type alias for `SYSTEM_ID_CACHE` - silence type_complexity on `build_exclude_sets` return - replace `iter().any(|s| *s == code)` with `.contains(&code)` in `is_warning_status` - silence too_many_arguments on three validate-code helpers - swap `&[in_code.clone()]` for `std::slice::from_ref(&in_code)`

mauripunzueta added 7 commits April 29, 2026 21:30

mauripunzueta and others added 22 commits April 30, 2026 18:41

fix(hts): evict expand cache on DELETE to prevent stale 200 after Val…

b85b1af

…ueSet removal After DELETE /ValueSet/{id} the in-memory $expand cache was not cleared, so a subsequent $expand call returned the cached 200 instead of 404.

chore: ignore hickory-proto CVEs from mongodb transitive dep [skip ci]

f6cd98e

ci: retrigger CI after audit.toml security advisory fix

885cb05

Merge branch 'main' into feat/hts-terminology-service

868c5e9

Modified audit.toml [skip ci]

d1dec00

Merge branch 'feat/hts-terminology-service' of github.com:HeliosSoftw…

c19982a

…are/hfs into feat/hts-terminology-service

Removed extra file

4773c52

mauripunzueta and others added 28 commits May 8, 2026 12:01

fix(hts): prefer code_systems row with concepts over empty stub for e…

8399519

…xpand paths

fix(hts): resolve_compose_system_id prefers row with concepts over em…

877b9b2

…pty stub

fix(hts): apply with-concepts preference at remaining 6 code_systems …

101ce0f

…lookups (covers EX08 inline-VS path)

fix(hts): re-export invalidate_cs_id_cache via sqlite::mod for crate-…

01ccba7

…wide access

perf(hts): bound concurrent implicit-cache bg-populate threads to 2 —…

e4823a3

… recovers EX04 throughput after EX01-03

Revert "perf(hts): bound concurrent implicit-cache bg-populate thread…

0e1cae3

…s to 2 — recovers EX04 throughput after EX01-03" This reverts commit e4823a3.

perf(hts): bump r2d2 pool size 20 -> 64 for benchmark sustained load

fe66546

perf(hts): lower wal_autocheckpoint from 5000 to 500 to keep WAL tigh…

ec6651b

…t for cold reads

Revert "perf(hts): lower wal_autocheckpoint from 5000 to 500 to keep …

5c581fa

…WAL tight for cold reads" This reverts commit ec6651b.

fix(hts): probe references req.url instead of moved url

c0a4632

perf(hts): parallelize per-system post-expand queries via join_all

a04dfdf

perf(hts): parallelize per-system CodeSystem metadata search via join…

10bcfae

…_all

smunini approved these changes May 10, 2026

View reviewed changes

Comment thread crates/hts/src/import/fhir_bundle.rs

smunini merged commit 870cc75 into main May 10, 2026
17 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(hts): Helios Terminology Server — full implementation + performance benchmarks#93

feat(hts): Helios Terminology Server — full implementation + performance benchmarks#93
smunini merged 391 commits into
mainfrom
feat/hts-terminology-service

mauripunzueta commented Apr 30, 2026

Uh oh!

codecov Bot commented Apr 30, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mauripunzueta commented Apr 30, 2026

Summary

Operations implemented

Terminology importers

Performance highlights (GH CI run 25183960983 — self-hosted Linux, full SNOMED+LOINC+RxNorm+VSAC)

Key performance techniques

EX08 regression fix (b2eea840)

Test plan

Uh oh!

codecov Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

EX08 regression fix (`b2eea840`)

codecov Bot commented Apr 30, 2026 •

edited

Loading