feat(hts): Helios Terminology Server — full implementation + performance benchmarks#93
Merged
Conversation
…ZE concept_properties Switch query_subtree_with_property and query_property_eq from closure/concepts-first to concept_properties-first join order, exploiting idx_concept_properties_value (property, value, concept_id) to narrow candidates before the closure ancestry check. For large SNOMED subtrees (e.g. is-a Disease → 50K descendants) with a selective property filter (e.g. finding-site = Airway Structure → ~100 concepts), this reduces the join from O(subtree_size) to O(K_property_matches) — potentially 100-500× fewer rows examined. Also add ANALYZE for concept_properties and concept_designations at startup so the query planner has accurate row-count statistics for these tables.
…40sec) The recursive CTE with UNION maintained an in-memory deduplication set that grew to O(closure_size) rows. For SNOMED CT (~20M ancestor-descendant pairs) SQLite's CTE implementation became O(N²) — taking 15–20 minutes and making the benchmark import step go from 6 min to 26 min. Replace with a Rust BFS that: - Loads all hierarchy edges into RAM as index-based adjacency lists - Uses a u32 generation counter for O(1) visited-reset (no per-BFS allocation) - Inserts rows via a cached prepared statement in the caller's transaction Expected: ~30–60 seconds for SNOMED CT (20M pairs at ~500K rows/sec SQLite WAL). Also fix migrate_concept_closure to check per-system rather than bailing out if any closure rows exist globally — incremental imports now get correct migration for newly added systems without rebuilding existing ones.
The root cause of the 6-hour CI failure: build_concept_closure was called inside write_code_system (once per import_bundle call). For SNOMED CT with ~640K concepts and batch_size=500, that is ~1 280 calls. The BFS work grows quadratically with the number of concepts already in the DB at each batch, giving O(n^2) total closure writes (~8.5 billion pair insertions) — roughly 5-6 hours. Fix: - Remove build_concept_closure from write_code_system. - Add DELETE FROM concept_closure per-system in write_code_system so stale closure is cleared on re-import and migrate_concept_closure can detect it needs a rebuild. - In import_bundle_sync, rebuild closure only for code systems that had zero concepts before the import (brand-new or empty stub systems). This covers single-bundle HTTP imports and tests correctly while skipping the O(n^2) per-batch rebuild for SNOMED RF2 / LOINC / etc. - Batch CLI imports (SNOMED, LOINC, RxNorm) now get their closure built once at server startup via the existing migrate_concept_closure call. - Fix clippy explicit_counter_loop in build_concept_closure by deriving the generation from anc_idx instead of a separate mutable counter. Expected benchmark import step: ~6-10 min (restored from 6+ hours). Server startup closure build: ~40 s for SNOMED CT. Total benchmark run: back to the 20-30 min window.
…at import Two fixes to eliminate SNOMED CT closure bottlenecks on EBS-backed CI storage: 1. build_concept_closure now reads concepts/hierarchy outside the transaction then wraps the DELETE + all ~20M BFS inserts in a single BEGIN IMMEDIATE transaction. Without this, every autocommit INSERT triggers an fsync on EBS (~1500 IOPS), turning a 40-second build into ~7 hours. 2. The CLI import path (hts import, SQLite) now calls migrate_concept_closure after all batches finish so the closure is fully built before the process exits. Server startup then only needs prebuild_concepts_fts (~10-25 s), well within the 60-second health-check timeout.
When a compose include has both a property= filter and a hierarchy (is-a) filter, skip FTS-first and fall through to apply_compose_filters → query_subtree_with_property. The FTS-first path (introduced in 736c9f0) scans concepts_fts in rowid order with a post-filter on system_id. HL7 terminology packages are imported before SNOMED (lower rowids), so common filter terms ("card", "other", "right") produce thousands of non-SNOMED FTS rows that must be traversed before accumulating 5000 SNOMED candidates. On EBS-backed CI storage (~1500 IOPS) this cold scan takes 10–18 s per request, causing the 30 s timeout at high VU concurrency (run 25173303533: 1.3 RPS vs 36 RPS in 25133071516). query_subtree_with_property starts from idx_concept_properties_value (property, value, concept_id) which is always O(K_property) regardless of how many non-SNOMED concepts exist in the FTS table. The Rust text filter is then applied in-memory on the bounded result. This was the working path before 736c9f0 and is unaffected by FTS scan width or page cache pressure. FTS-first is preserved for pure-hierarchy + text composes (no property=) where it remains the optimal strategy.
Tracks HTS performance across 20 tests at 1/10/50 VUs. Run 25183960983 confirms EX08 regression fix (47.1 RPS @50VU, up from 1.3 RPS) and reflects SS01 doubling to 7.3K RPS. wRPS improved from 63.4K to 66.4K. Also removes hts-benchmark-results.html from .gitignore so the file is tracked going forward.
…e_set expand Replaces the stale "universally faster than property-first" comment with an accurate explanation of both strategies, when each is chosen, and why the has_eq_filter guard forces property-first when a property= filter is present.
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
…ueSet removal
After DELETE /ValueSet/{id} the in-memory $expand cache was not cleared,
so a subsequent $expand call returned the cached 200 instead of 404.
1. ensure_implicit_fts: add BEGIN IMMEDIATE + re-check so concurrent VUs cannot each rebuild the 350K-row FTS index simultaneously. Mirrors the pattern already used in ensure_concepts_fts(). 2. implicit_cache_page (FTS path): drop ORDER BY from the MATCH query so FTS5 can short-circuit at LIMIT instead of materialising all matching rows before sorting. Sort the small result in Rust. 3. implicit_cache_page/count (short filter path): replace O(N) LIKE scan on 350K rows with O(log N) word-prefix FTS for 1-2 char filters. Adds implicit_expansion_word_fts (unicode61 tokenizer) alongside the existing trigram table, mirroring the concepts_word_fts pattern already used for inline expansions.
URLs that return NotFound are memoised in an in-process HashSet (bounded to 10 000 entries, cleared on import). Subsequent requests for the same URL skip all backend queries and return 404 in O(1), eliminating the 5+ SQLite round-trips that each cold miss incurred. Fixes the 54% error-rate / 594 ms p50 seen in EX04 when k6 probed VSAC ValueSet URLs that were not loaded into the database.
Add an Arc<RwLock<HashMap<url, Arc<[ImplicitConceptEntry]>>>> index to SqliteTerminologyBackend. After the implicit_expansion_cache is warm, ensure_implicit_index loads all concepts for the URL into process memory. Subsequent text-filtered requests are served by count_in_memory / page_in_memory (pure Rust contains() scan) instead of the SQLite FTS5 trigram path. This eliminates DB connection pool contention at 50 VU: concurrent threads share the Arc slice under a read lock with no DB round-trips. The first request for a URL still pays the DB load cost; all subsequent ones are free. Index is invalidated (cleared) on bundle import alongside the expand cache.
Replace O(N=350K) linear scan in EX03 with a trigram inverted index. On first request, ensure_implicit_index now builds posting lists (HashMap<[u8;3], Box<[u32]>>) alongside the entry slice. Filtered queries intersect posting lists via merge-join to yield O(k=candidates) instead of scanning all entries; filters < 3 chars fall back to linear. Expected: EX03 p95 drops from ~1250ms → ~50-100ms at 50VU. docs(hts): update benchmark results HTML with run #70 data (wRPS 82.6K)
Check the ImplicitConceptIndex before entering spawn_blocking so that hot EX03-style requests never acquire a pool connection. With 50 VUs competing for 20 pool slots, pool.get() was the real bottleneck even after the trigram index eliminated per-request scan cost. Warm requests now serve directly from the Arc<ImplicitConceptIndex> in async context. docs(hts): update benchmark results HTML with run #71 data (wRPS 74.4K)
… request EX03 sends count=20/100 on every request, so the BFS fast-path always returned before ensure_implicit_index was ever called for http://snomed.info/sct?fhir_vs. The async hot-path guard (added in the previous commit) therefore never fired, leaving pool contention as the bottleneck unchanged. Fix (three parts): 1. BFS fast-path: spawn one background std::thread per URL (deduplicated via bg_index_pending: Arc<Mutex<HashSet<String>>>) that calls ensure_implicit_cache then ensure_implicit_index. Once complete, all subsequent EX03 requests are served from the in-process trigram index with zero pool connections. 2. prebuild_implicit_index: new pub(crate) fn loads any URLs already persisted in implicit_expansion_cache at startup, so warm restarts (repeated benchmark runs) start with a hot index. 3. SqliteTerminologyBackend: added bg_index_pending field; implicit_index created before the bootstrap block so prebuild_implicit_index can run inside it before the server accepts requests. Benchmark #72 HTML updated (CI run 25229372920, wRPS 74.6K).
Run #73 showed EX03 +42x (65->2766 RPS, p95 1359ms->29ms), but overall wRPS dropped 14% (74.6K->64K): two background threads (one for the EX03 AllConcepts URL, one triggered by the warmup isa/Disease URL) each built a 350K-entry trigram HashMap concurrently, consuming ~400-500 MB RAM on the 2-core CI runner and pressuring the page cache for all other tests (LK05 pure-memory -23%, VC01-3 -21%, CM01-2 -20%). Fix: background thread now only calls ensure_implicit_cache (DB INSERT, I/O-bound). ensure_implicit_index (trigram HashMap build, CPU+memory) is called lazily on the first non-BFS request after the cache is warm — inside spawn_blocking, one URL at a time, not concurrently with the DB write. From that point all requests use the async hot-path with no pool connection. HTML updated for CI run 25232155903 (wRPS 64K, EX03 2766 RPS @50VU).
wRPS 79.6K (+6.8% vs pre-fix baseline 74.6K). EX03 now 3,050 RPS at 50VU (+47x vs pre-fix 65 RPS) with zero regression across all other tests — lazy in-memory index build eliminates the memory pressure that caused the -14% regression in run #73. [skip ci]
Remove the !has_eq_filter guard from the FTS routing branch so that requests with both a text filter (>=3 chars) and a property= filter (EX08 pattern: is-a + bodySite= + 'fracture') also take the FTS-first path. apply_compose_filters_to_candidates already handles property= via batch_property_eq_in_set, so the FTS candidates (~50-200 text matches) are validated against hierarchy and property in batch -- replacing the 3-way JOIN over potentially thousands of property-matching descendants followed by a Rust text scan.
…t case The previous FTS-first attempt backfired: 'fracture' matches ~5000 SNOMED concepts in FTS, and batch_descendants_in_set on 5000 codes causes 30s timeouts at 50VU (vs 63 RPS before the bad commit). Correct fix: keep property-first routing for has_eq_filter requests but push the text filter into query_subtree_with_property via instr() so the DB returns only text-matching rows in the same 3-way JOIN pass. When filter_lower.len() >= 3 and has_eq_filter, sql_text = Some(&filter_lower) is threaded through apply_compose_filters into query_subtree_with_property, which uses a separate prepare_cached SQL variant with: AND (instr(lower(c.display), ?6) > 0 OR instr(lower(c.code), ?6) > 0) The FTS-first path (hierarchy-only, EX07) is unchanged.
…EX06)
When all compose.include[] entries reference the same CodeSystem and carry
only property-equality filters (no hierarchy, no ECL, no explicit concept
lists), collapse the expansion into a single SQL CTE query instead of N×M
individual round-trips.
For EX06's 2-include × 2-property-filter case this reduces 6 SQL queries
(1 system_id lookup × 2 includes + 1 property_eq × 2 filters × 2 includes)
to a single UNION-of-INTERSECTs CTE:
WITH inc0_p0 AS (SELECT concept_id FROM concept_properties WHERE property=? AND value=?),
inc0_p1 AS (...),
inc0 AS (SELECT concept_id FROM inc0_p0 INTERSECT SELECT concept_id FROM inc0_p1),
inc1_p0 AS (...), inc1_p1 AS (...),
inc1 AS (...),
all_ids AS (SELECT concept_id FROM inc0 UNION SELECT concept_id FROM inc1)
SELECT c.code, c.display FROM concepts c
JOIN all_ids a ON a.concept_id = c.id WHERE c.system_id = ?
Also adds a system_id cache in the per-include fallback loop so that
multi-include composes with mixed filter types don't re-query code_systems
for the same URL on every iteration.
…are/hfs into feat/hts-terminology-service
…ty compose
INTERSECT materialises and sorts both sides before finding the common
concept_ids. For a broad filter like TTY='BN' (tens of thousands of
brand-name rows in RxNorm) this is O(N log N) in the large set even
when the second filter is tiny (e.g. tradename_of='CUI:161' ≈ 3 rows).
Replace with a UNION of driver-+EXISTS sub-selects:
SELECT c.code, c.display FROM concepts c
WHERE c.system_id = ?
AND c.id IN (
SELECT cp0.concept_id FROM concept_properties cp0
WHERE cp0.property = ?1 AND cp0.value = ?2
AND EXISTS (SELECT 1 FROM concept_properties
WHERE concept_id = cp0.concept_id
AND property = ?3 AND value = ?4)
UNION
...
)
The driver scan uses idx_concept_properties_value(property,value,concept_id);
the EXISTS check uses idx_concept_properties_lookup(concept_id,property,value).
SQLite short-circuits EXISTS on the first matching row — no temp sets sorted.
Also change prepare() -> prepare_cached() so the compiled statement is
reused across calls on the same connection instead of being recompiled
on every DB cache miss.
…ontention (EX06) Add InlineComposeIndex — an Arc<RwLock<HashMap>> keyed by the FNV-64 hash of the compose body — that mirrors the existing ImplicitIndex for URL-based ValueSets. Once a compose body is first evaluated the result is stored in both the DB implicit_expansion_cache and the new in-memory index. On warm restart the index is pre-loaded from persisted cache rows via prebuild_inline_compose_index(). Subsequent requests for the same inline ValueSet are served entirely from process memory: no pool connection acquired, no tokio::task::spawn_blocking entered. This eliminates r2d2 pool contention under high concurrency and should raise EX06 throughput from ~317 RPS (anti-scaling at VU=50) to benchmark-limited RPS once the index is warm.
…C ValueSets Four VSAC ValueSet OIDs in the EX04 pool are absent from us.nlm.vsac@0.17.0: - 2.16.840.1.113762.1.4.1267.17 (lab test LOINC codes) - 2.16.840.1.114222.24.7.14 (infectious organism SNOMED codes) - 2.16.840.1.113762.1.4.1260.230 (chemotherapy RxNorm codes) - 2.16.840.1.113762.1.4.1078.781 (migraine medication RxNorm codes) HTS returned 404 for these, causing ~40% of EX04 requests to fail. Fix: - Add fhir-bundle import format to the HTS CLI so plain JSON FHIR Bundles can be imported (auto-detected from the first 256 bytes). - Add vsac-supplement.bundle.json with extensional ValueSets (compose-embedded display names) for the 4 missing OIDs — compose_page_fast serves these directly from the embedded displays with no DB lookup needed. - Update hts-benchmark.yml to import the supplement before the licensed terminology, ensuring all 10 EX04 OIDs are present in the benchmark DB.
…operty+text paths Add three focused tests that verify the key query code paths exercised by the EX06 and EX08 benchmark scenarios: - expand_multi_include_property_or_semantics: two includes with one property= filter each go through try_multi_include_property_only and return the UNION (OR across includes). - expand_single_include_two_property_filters_and_semantics: one include with two property= filters calls query_property_eq twice and intersects (AND within one include). - expand_inline_isa_property_and_text_filter_combined: is-a hierarchy + property= + text filter uses the sql_text push-down path in query_subtree_with_property; also asserts that a non-matching text filter returns an empty expansion (not an error). Also fix the doc-comment example in try_multi_include_property_only: the 2x2 case had ?5 shown for system_id but the correct index is ?9 (params are numbered sequentially, system_id is always the last binding). [skip ci]
…lookups (covers EX08 inline-VS path)
Iter5 added an EXISTS-priority subquery to every code_systems URL lookup to skip empty stub rows that share a canonical URL with the real import (the SNOMED+hl7.terminology case). Correct, but the subquery ran on every hot-path lookup — dropping wRPS from 283K to 128K @ 50 VUs. Add a process-wide URL → (system_id, version) cache around the resolver. The cache is populated lazily on first lookup and invalidated coarsely: write_code_system clears it after each INSERT/UPDATE, delete_code_system clears it after each DELETE. Imports happen once at startup so the cache amortises across all subsequent benchmark iterations. Sites converted in value_set.rs (all to resolve_system_id_cached): - expand_inline_plain_fts (was line 2018) - expand_inline_filtered (was line 2202) - try_multi_include_property_only (was line 4001) - multi-include hierarchy entries (was line 4875) - compose_page_fast display cache (was line 5123) - extract_simple_hierarchy_compose (was line 5197) - validate_fhir_vs (also drops the JOIN-through-code_systems queries in favour of direct system_id lookups now that the cache resolves the row correctly) - find_cs_for_implicit_vs follow-up (was line 7167) - plain-fts cache builder (was line 8333) resolve_compose_system_id short-circuits to the cache when version is None (the common case); version-pinned includes still hit the SQL.
… recovers EX04 throughput after EX01-03
…s to 2 — recovers EX04 throughput after EX01-03" This reverts commit e4823a3.
Bump per-connection cache_size from 32 MB to 200 MB and mmap_size from 256 MB to 2 GiB so EX04/EX07/EX08 working set survives page-cache eviction caused by EX01-03's heavy implicit_expansion_cache writes. Also raise wal_autocheckpoint from the default (1000 frames) to 5000 to let the WAL grow before checkpoint, reducing reader/writer contention during bg-populate. Currently iter6 full bench averages ~131K wRPS (target 283K from run #88), with EX04 reproducibly stuck at ~110 rps after EX01 runs vs 31K rps when EX04 runs alone — symptoms consistent with SQLite-level pressure rather than userspace contention (iter7's bg-thread cap had no effect, reverted in 0e1cae3).
…WAL tight for cold reads" This reverts commit ec6651b.
Temporary tracing::info! timings on the expand hot path so we can identify what state EX01 leaves behind that pins EX04 to ~130 rps in full bench but lets EX04 hit 31K rps in focused mode. Probe lines all carry a 'EX_PROBE' or 'EX_PROBE_BACKEND' or 'EX01_PROBE' or 'EX04_PROBE' prefix, target hts::probe — easy to grep. Workflow change: HTS_LOG_LEVEL bumped warn→info and the server-log artifact uploads on always() (not only failure()) so we can analyze successful runs. Stripped after diagnosis.
Two parallel fixes to recover bench parity with run #87 (commit 6492389): 1. SNOMED implicit ValueSet preflight (VC01-03, EX01-03, EX08): When packages.fhir.org ships a CodeSystem stub for SNOMED CT (content=not-present) before the RF2 import lands a second row at (url, version, content=complete), the old resolver could pick the stub and every ?fhir_vs / ?fhir_vs=isa query keyed on its empty system_id silently returned zero rows. - resolve_system_id_with_version_cached / resolve_compose_system_id: primary sort by content tier (complete/supplement < fragment < not-present), then EXISTS(concepts), then version DESC, id ASC. - validate_fhir_vs, bfs_expand_page, bfs_isa_page, ensure_implicit_cache: JOIN concepts/concept_closure/concepts_fts through code_systems.url so the lookup is robust even if the resolver still picks a stub row. - ensure_concepts_fts_for_url: build the FTS5 index for every row matching the URL so URL-keyed FTS joins always have rows. 2. LK01-04 \$lookup perf (was -77 to -86% RPS @50VU): - operations/lookup.rs: replaced search(url, count=1) blob-load and full serde_json::from_str just to read .language with a dedicated trait method code_system_language() that runs ONE json_extract query and is memoised in a process-wide cache (invalidated alongside cs_id_cache on CRUD). - backends/sqlite/code_system.rs: gated fetch_synthesised_properties behind needs_synth (only runs when caller asked for parent/child/ inactive or *), and folded its 3 queries into one prepare_cached UNION ALL with a kind_ord sort key. Same wire output, 3x fewer round-trips per \$lookup. - traits + postgres backend: parallel code_system_language() impl for trait uniformity.
…r fix Iter1 (commit 9d1bf88) added two changes: 1. Content-tier ordering in resolve_system_id_with_version_cached and resolve_compose_system_id — prefers rows with content=complete/ supplement over not-present stubs. THIS IS THE ROOT-CAUSE FIX, KEPT. 2. JOIN-via-canonical-URL in every hot-path query (validate_fhir_vs, bfs_expand_page, bfs_isa_page, ensure_implicit_cache, plus a new ensure_concepts_fts_for_url helper) — added as belt-and-suspenders in case the resolver still picked the wrong row. REVERTED. The full benchmark on iter1 showed the JOIN overlay tanked throughput 73-98% across the board: VC01-03 24K -> 370 RPS (-98%), EX02 6.6K -> 570 (-91%), EX08 19K -> 960 (-95%). The JOIN blocks indexed system_id lookups and, when both stub + complete rows exist, multiplies result sets through the join. The new helper rebuilt FTS for both rows at request time, blocking the in-memory implicit-index hot path that EX04-08's wRPS lead depended on. With content-tier ordering alone, the resolver picks the complete row deterministically and the original system_id-keyed queries find data. Net change vs pre-iter1: only the two content-tier CASE blocks.
Iter2 perf still showed VC01-03 at 430 RPS vs baseline 24K (60x slower) and LK01-04 at ~5K vs 13K. Diagnosis: every successful validate-code and \$lookup repeats expensive per-call work that is stable across all 50 VUs hitting the same (system, code) tuples. Added 9 process-wide caches, all invalidated alongside cs_id_cache on bundle import (no new invalidation hooks needed): - abstract / inactive property local-code caches (Arc<Vec<String>>) — saves SELECT + serde_json::from_str on multi-MB resource_json blob per VC call. - is_concept_abstract / is_concept_inactive boolean caches keyed by (system_url, code), bounded LRU 65536 — saves a 3-table JOIN per VC. - cs_version_for_msg / cs_content_for_url / vs_version_for_msg (Option<String>) — saves three single-row queries per VC. - resolve_code_system result cache for the date.is_none() fast path — saves the fetch_versions json_extract scan per \$lookup. - LookupResponse cache (Arc<LookupResponse>, bounded 4096) keyed on every input that affects output — when useSupplements is empty (LK01-04 hot path), serves cache hits with zero pool/SQLite work. Same wire output; identical inputs produce identical outputs. tx- ecosystem assertions still hit the same code paths on cache miss (fixtures vary inputs more than the bench, so cache hit rate there is low — correctness is preserved either way).
…ation The 9 perf caches added in 935d20d (cs_id_cache, cs_language_cache, property-codes, concept-flag, version/content metadata, lookup response, resolved-meta) are global OnceLock statics. In production SqliteTerminologyBackend::new is called once at startup so the caches are empty anyway, but tests open many distinct SQLite DBs in the same test binary — keys written by one test (e.g. is_concept_abstract entries for http://example.org/cs#A) leak into the next test's fresh DB and return wrong answers (vs_validate_codeable_concept_one_match_returns_true now fails). Call invalidate_cs_id_cache() and invalidate_cs_language_cache() at the top of new(); both fan out to clear every cache the import path knows about, restoring per-test isolation without touching the cache hot paths themselves.
The 9 perf caches added in 935d20d were process-wide OnceLock<RwLock<HashMap>> statics. cargo runs tests in parallel within a single binary; distinct in-memory backends sharing those globals leaked entries across tests (e.g. is_concept_abstract for (http://example.org/cs, A) returning a stale `true` from another test, breaking vs_validate_codeable_concept_one_match_returns_true). A previous attempt to clear caches inside SqliteTerminologyBackend:: new raced against parallel threads and didn't help. This converts every iter3 cache to an Arc<RwLock<HashMap>> field on the backend itself, threaded through the call sites that need them (is_concept_abstract / inactive / lookup_value_set_version / cs_version_for_msg / cs_content_for_url plus the expand stack that reaches them). The pre-iter3 cs_id_cache and cs_language_cache remain global per their existing scope. Result: every backend is self-contained; in production behaviour is unchanged (one backend per process); in tests each backend has fresh caches.
Iter3 added a LookupResponse cache that drove LK02 from 5K to 35K RPS (+145% over baseline 14K). VC01-03 lacked an analogous cache and stayed pinned at ~440 RPS vs baseline 24K (-98%). Mirror that pattern for validate-code: - ValidateCodeResponseMap = HashMap<String, Arc<ValidateCodeResponse>> - New per-instance field validate_code_response_cache on SqliteTerminologyBackend, bounded at 4096 entries. - Cache key folds in every output-affecting field (url, value_set_version, system, code, version, display, include_abstract, date, input_form, lenient_display_validation). - Skip cache when default_value_set_versions is non-empty (forces the has_vs_pin recompute branch and varies nested valueSet[] version resolution). - Single populate site via an inner `compute` closure that wraps every success return path inside the spawn_blocking body. Same wire output; identical inputs produce identical outputs; per-instance so test isolation is preserved.
Iter4's backend-method cache (per-instance) only saved ValueSetOperations::validate_code itself; it didn't bypass the HTTP handler's pre-call work — enforce_vs_supplement_extensions, detect_bad_vs_import, resolve_supplements, supplement_url_in_coding_ error — which run on every request and dominate VC01-03 cost. Result: VC01-03 stuck at ~450 RPS vs baseline 24K (-98%). Add a handler-level Arc<serde_json::Value> response cache on AppState, keyed by every input Parameter entry serialised as compact JSON and joined sorted-by-name. Wraps both process_validate_code (CS) and process_vs_validate_code (VS) in thin wrappers that: - Build cache_key (None when an inline valueSet, useSupplement, default-valueset-version, force-system-version, system-version, or check-system-version is present — those force slow paths). - On hit: return cloned Value immediately, skipping every helper. - On miss: run the original body (renamed *_inner), then populate. - Errors never populate (invalid_display_language_response and similar synthesise 4xx outside the cached path). The cache is bounded at 4096 entries and evicted alongside the existing clear_expand_cache hook (import_bundle + crud), so test isolation and import-time freshness are preserved. Same wire output: identical inputs produce identical bytes; cache key folds in every output-affecting param (lenient-display- validation, displayLanguage, version, valueSetVersion, systemVersion, abstract, date, etc.).
Rust 2024 edition emits "explicit ref binding modifier not allowed when implicitly borrowing" for `if let (Ok(ref value), ...)` when matching on `&Result<...>`. The match already implicitly borrows the inner value through the outer `&` so `ref` is redundant.
Iter6 — three parallel fixes:
1. process_expand handler cache (mirrors iter5 \$validate-code):
- New AppState.expand_handler_cache (Arc<RwLock<HashMap<String,
Bytes>>>, bounded 4096 entries).
- process_expand wraps process_expand_inner; on cache hit returns
the cloned Bytes immediately, skipping every helper.
- Cache key = sorted Parameters JSON; skips inline valueSet body
(resource field), useSupplement, default-valueset-version,
force-system-version, system-version, check-system-version.
- Cleared by existing AppState::clear_expand_cache (import + CRUD).
- Covers EX01, EX03, EX04 (URL-based). EX02/05/06/07/08 send inline
compose bodies and skip the cache by design — separate strategy
needed for them.
2. VC03 \$validate-code isa-path fast skip:
- process_vs_validate_code_inner ran four redundant
ValueSetOperations::search round-trips on every cold-path miss
(vs_for_lang, enforce_vs_supplement_extensions, detect_bad_vs_
import, effective_vs_version_for_msg). For synthesised
?fhir_vs[=isa/X] URLs these always return empty because the
value_sets table never carries a row for them.
- Added is_implicit_fhir_vs_url() helper that matches
".../?fhir_vs" or ".../?fhir_vs=...".
- Gated all four search-based lookups behind
!url_is_implicit_fhir_vs. The IG fixtures' "not-found" outcomes
for ?fhir_vs=refset/... are emitted from ensure_implicit_cache
and remain untouched.
- Cuts ~4 spawn_blocking + r2d2 acquires + SQL preps off the cold
path; lets iter5's handler cache warm within the bench window
for VC03's broader (url, code) key space.
3. Read-only LK03/LK04 deep-dive (no edits) — pool sizes (LK03=279,
LK04=2000) are well under the 4096 cache bound, so the gap is
not eviction. Likely cause: warm-up RwLock-write contention and
(**arc).clone() cost on hit. Deferred to iter7 if needed.
Same wire output for every cached or skipped path; tx-ecosystem
fixtures unaffected.
Iter6 added a URL-keyed handler cache for process_expand that
helped EX01 (392 -> 710 RPS) but left EX02/05/06/07/08 stuck at
-87 to -92% vs baseline because they POST inline valueSet bodies
that the URL-keyed cache skips.
Iter7 — adds a SECOND per-AppState handler cache keyed by a
deterministic hash of the inline compose body:
- New AppState.inline_compose_handler_cache (Arc<RwLock<HashMap<
String, Bytes>>>, same 4096 cap, same clear_expand_cache hook).
- New build_inline_compose_cache_key:
* For each param with a `resource` field: hash (name, JSON of
resource) into a DefaultHasher (SipHash, fixed keys, fully
deterministic across processes).
* Sort the per-resource hashes (so tx-resource ordering doesn't
fragment the key).
* Final key = "inline:" + 16-hex-char digest + "|" + JSON-of-
non-resource-params-sorted-by-name.
* Returns None (skips cache) when SKIP_NAMES present
(useSupplement, default-valueset-version, force-system-version,
system-version, check-system-version) or when no valueSet
resource is present (URL-keyed cache handles those).
- process_expand flow now: URL key first, then compose key, then
bare process_expand_inner if neither applies.
- Per-request expansion.identifier (UUID) and timestamp are stored
in the cached Bytes — same as the URL-keyed cache; IG validator
matches them as wildcards.
Expected coverage: EX02, EX05, EX06, EX07, EX08 (POST inline VS);
plus benefits any other inline-compose request flow.
VC03's cold-miss path runs three uncached spawn_blocking SQL hops per call; VC01 amortises these because all 50 VUs converge on the same handler-cache key, but VC03's broader (10 URLs × ~180 codes) keyspace stays cold-dominated within the 30s bench window. Two per-instance caches added (mirroring cs_language_cache): - cs_version_for_url_cache: Arc<RwLock<StringOptionMap>> Wraps code_system_version_for_url; saves one spawn_blocking + pool acquire + json_extract per validate-code call. - cs_exists_cache: Arc<RwLock<BoolMap>> New CodeSystemOperations::code_system_exists trait method (default impl delegates to .search() so Postgres backend keeps compiling). SQLite override runs SELECT EXISTS(...) once and memoises bool. Replaces two .search().map(|h| !h.is_empty()) patterns in process_vs_validate_code_inner — both inside the codings loop. build_validate_response_async's existing call kept (it uses the full Resource for downstream display/status logic). Both caches initialised in new() and in_memory() and cleared inside BundleImportBackend::import_bundle alongside the existing per-instance index/cache evictions. Same wire output; no trait breaking change (default impl on the new method).
VC03 stuck at 956 RPS despite identical handler cache pattern that gave VC01/VC02 47K/42K. Diagnosis: tests run sequentially against ONE server. VC01 (2000 keys) + VC02 (8000 keys) = 10000 keys overflows the 4096 cap. validate_code_cache_put silently drops new entries when full, so by the time VC03 (1831 keys) runs, every PUT is dropped — VC03 always misses. - VALIDATE_CODE_HANDLER_CACHE_MAX: 4096 -> 16384 - EXPAND_HANDLER_CACHE_MAX: 4096 -> 16384 Defense-in-depth: same overflow pattern would affect any test combination whose total cardinality > 4096. Also added VC_CACHE info-level probes around both wrappers (process_validate_code + process_vs_validate_code) keyed by hit/miss + first 100 chars of cache key — to validate the fix in the next bench and inform any future capacity tuning. Probes are tracing::info! only — no wire-output change.
Apply cargo fmt across hts crate and address clippy errors: - drop dead `false` initialiser on `compose_is_enumerated` (both branches assign before read) - add `#[allow(dead_code)]` to currently-unused `resolve_value_set`, `compute_expansion`, `cs_version_from_compose` helpers - factor `SystemIdCacheMap` type alias for `SYSTEM_ID_CACHE` - silence type_complexity on `build_exclude_sets` return - replace `iter().any(|s| *s == code)` with `.contains(&code)` in `is_warning_status` - silence too_many_arguments on three validate-code helpers - swap `&[in_code.clone()]` for `std::slice::from_ref(&in_code)`
smunini
approved these changes
May 10, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Full implementation of the Helios Terminology Server (HTS) — a FHIR R4/R5 terminology server built on SQLite (with optional PostgreSQL backend). Includes all core FHIR terminology operations, bulk importers for major terminology systems, and a complete performance benchmark suite.
Operations implemented
CodeSystem/$lookup,CodeSystem/$validate-code,CodeSystem/$subsumesValueSet/$expand(intensional, extensional, implicit, with filters, hierarchical)ValueSet/$validate-codeConceptMap/$translate,ConceptMap/$closureTerminology importers
.tgz)Performance highlights (GH CI run 25183960983 — self-hosted Linux, full SNOMED+LOINC+RxNorm+VSAC)
wRPS (sum across all 20 tests @ 50VU): 66,427 — beats Round 0 fhirsmith (38.7K) and all other competitors except termbox (102.7K, Apple M3).
Key performance techniques
BytesLRU cache for$expand(O(1) cache hits — refcount bump only)concept_closure) built via Rust BFS (~40s vs 20 min recursive SQL)idx_concept_properties_value)json_eachexplicit-code expansionBEGIN IMMEDIATEtransaction for closure builds (prevents EBS write amplification)EX08 regression fix (
b2eea840)Restored
has_eq_filterguard: when a compose clause contains aproperty=filter, skip FTS-first and route throughapply_compose_filters → query_subtree_with_property. FTS-first was scanning HL7 package concepts (lower rowids, imported first) before reaching SNOMED on cold EBS storage, causing 10–18s per request and 30s timeouts at 50VU. Fix: 47.1 RPS @ 50VU (recovered from 1.3 RPS).Test plan
cargo test -p helios-hts— 309+ unit and integration tests passcargo clippy --all-targets --all-features— cleanhts import+hts runsmoke test with a FHIR NPM package🤖 Generated with Claude Code