
feat(subscriptions): implement WebSocket channel — Phase 2#62

Merged
smunini merged 87 commits into main from feature/subcriptions
Apr 21, 2026

Conversation


@aacruzgon aacruzgon commented Apr 12, 2026

Summary

Implements Phase 2 of the FHIR Topic-Based Subscriptions roadmap: the WebSocket channel. Builds on top of the Phase 1 rest-hook infrastructure without any breaking changes.

Clients subscribe with channel.type = "websocket", obtain a short-lived binding token from $get-ws-binding-token, connect to /ws/subscriptions/bind?token=<token>, and receive notification bundles as JSON text frames. The protocol is unidirectional (server → client).

Related: discussion #59, Phase 1 PR #61.

What changed

helios-subscriptions crate

New files:

  • src/channels/ws_token.rs (WsBindingTokenManager): generates UUID v4 tokens with configurable expiry (default 30 s), single-use via atomic DashMap::remove()
  • src/channels/ws_manager.rs (WebSocketManager): tracks (tenant_id, subscription_id) → Vec<(client_id, UnboundedSender)>; broadcasts notifications, prunes closed channels, removes all clients on subscription deletion
  • src/channels/websocket.rs (WebSocketChannel implementing ChannelDispatcher): dispatches by broadcasting to connected clients (best-effort, always Success); handshake is a no-op (activation is immediate)
  • tests/websocket_integration.rs — 8 integration tests: activation, delivery to single/multiple clients, zero-client no-panic, disconnected-client cleanup, binding token lifecycle, subscription-delete closes clients, tenant isolation

Modified files:

  • src/channels/mod.rs — declare the three new modules
  • src/config.rs — add ws_token_lifetime_secs: i64 (default 30)
  • src/engine/mod.rs — add ws_manager, ws_channel, ws_token_manager fields; expose ws_manager() / ws_token_manager() accessors; add ChannelType::Websocket arms to activate_subscription() and dispatch_with_retry(); call ws_manager.remove_all_clients() when a Subscription is deleted
  • src/lib.rs — re-export WebSocketManager and WsBindingTokenManager for use by the REST layer

helios-rest crate

New files:

  • src/handlers/ws.rs (ws_bind_handler): validates the binding token, upgrades the HTTP connection, sends the handshake bundle as the first frame, then loops tokio::select! over the notification channel and the socket (for close detection); cleans up via ws_manager.remove_client() on exit

Modified files:

  • Cargo.toml — add "ws" to axum features (pulls in tokio-tungstenite automatically)
  • src/handlers/subscriptions.rs — add get_ws_binding_token_handler: validates channel type is websocket, generates token, returns Parameters with token / expiration / websocket-url
  • src/handlers/mod.rs — register ws module behind #[cfg(feature = "subscriptions")]
  • src/routing/fhir_routes.rs — add /{resource_type}/{id}/$get-ws-binding-token and /ws/subscriptions/bind routes
  • src/lib.rs — add "websocket" to supported_channel_types in the engine initializer

Design decisions

WebSocket dispatch is best-effort. dispatch() always returns DispatchResult::Success even when zero clients are connected. Treating zero clients as an error would incorrectly drive the subscription into error / off status — a WebSocket subscription is valid before any client connects.
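A minimal sketch of the best-effort broadcast described above, using only the standard library. It assumes std::sync::mpsc channels in place of the tokio UnboundedSender used in the PR, and the names WsManager, add_client, broadcast, and client_count are illustrative rather than the crate's actual API.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Outcome of a dispatch attempt (illustrative; mirrors DispatchResult::Success).
#[derive(Debug, PartialEq)]
pub enum DispatchResult {
    Success,
}

/// Maps (tenant_id, subscription_id) to connected clients as (client_id, sender).
#[derive(Default)]
pub struct WsManager {
    clients: HashMap<(String, String), Vec<(u64, Sender<String>)>>,
    next_client_id: u64,
}

impl WsManager {
    /// Registers a client and returns its id plus the receiving half.
    pub fn add_client(&mut self, tenant: &str, sub: &str) -> (u64, Receiver<String>) {
        let (tx, rx) = channel();
        let id = self.next_client_id;
        self.next_client_id += 1;
        self.clients
            .entry((tenant.to_string(), sub.to_string()))
            .or_default()
            .push((id, tx));
        (id, rx)
    }

    /// Sends the bundle to every live client and prunes senders whose
    /// receiver has been dropped. Zero connected clients is not an error.
    pub fn broadcast(&mut self, tenant: &str, sub: &str, bundle: &str) -> DispatchResult {
        if let Some(list) = self.clients.get_mut(&(tenant.to_string(), sub.to_string())) {
            list.retain(|(_, tx)| tx.send(bundle.to_string()).is_ok());
        }
        DispatchResult::Success
    }

    /// Number of clients currently tracked for a subscription.
    pub fn client_count(&self, tenant: &str, sub: &str) -> usize {
        self.clients
            .get(&(tenant.to_string(), sub.to_string()))
            .map_or(0, Vec::len)
    }
}
```

Because the map key includes the tenant id, a broadcast for one tenant never reaches another tenant's clients, which is the property the tenant-isolation test exercises.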

Handshake is deferred to connect time. Unlike rest-hook (which POSTs a handshake to validate the endpoint), WebSocket subscriptions activate immediately. The handshake bundle is sent as the first WebSocket frame when a client connects and binds.

Binding tokens are single-use. DashMap::remove() is atomic — a token can only be consumed once even under concurrent connection attempts.
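The single-use property can be illustrated with a small standard-library sketch. It assumes a Mutex<HashMap> as a stand-in for the DashMap used in the PR, and it takes the token string from the caller instead of generating a UUID v4; TokenStore, issue, and consume are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// What a token resolves to, plus its expiry deadline.
pub struct TokenEntry {
    pub subscription_id: String,
    pub expires_at: Instant,
}

/// Illustrative token store; a Mutex<HashMap> stands in for DashMap.
pub struct TokenStore {
    lifetime: Duration,
    tokens: Mutex<HashMap<String, TokenEntry>>,
}

impl TokenStore {
    pub fn new(lifetime: Duration) -> Self {
        Self { lifetime, tokens: Mutex::new(HashMap::new()) }
    }

    /// Registers a caller-supplied token (the PR generates UUID v4 values).
    pub fn issue(&self, token: &str, subscription_id: &str) {
        let entry = TokenEntry {
            subscription_id: subscription_id.to_string(),
            expires_at: Instant::now() + self.lifetime,
        };
        self.tokens.lock().unwrap().insert(token.to_string(), entry);
    }

    /// Consumes a token. The entry is removed under the lock, so only one
    /// caller can ever obtain it; expired entries are removed and rejected.
    pub fn consume(&self, token: &str) -> Option<String> {
        let entry = self.tokens.lock().unwrap().remove(token)?;
        (Instant::now() < entry.expires_at).then_some(entry.subscription_id)
    }
}
```

Removing first and checking expiry second means an expired token is also cleaned up on its first use attempt, so a later retry with the same value cannot succeed.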

Subscription deletion closes clients. When a Subscription resource is deleted, ws_manager.remove_all_clients() drops all sender halves. This causes the rx.recv() calls in the WebSocket handlers to return None, triggering graceful close without an explicit close frame.
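The close-by-dropping-senders behavior can be sketched with standard-library channels. This assumes std::sync::mpsc, where recv() returns Err once every sender is dropped (the tokio receiver in the PR returns None instead); Clients, connect, send, and drain_until_closed are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

type Key = (String, String); // (tenant_id, subscription_id)

#[derive(Default)]
pub struct Clients {
    by_subscription: HashMap<Key, Vec<Sender<String>>>,
}

impl Clients {
    /// Registers a client and hands back its receiving half.
    pub fn connect(&mut self, tenant: &str, sub: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.by_subscription
            .entry((tenant.to_string(), sub.to_string()))
            .or_default()
            .push(tx);
        rx
    }

    /// Queues a message for every client of the subscription.
    pub fn send(&self, tenant: &str, sub: &str, msg: &str) {
        if let Some(list) = self.by_subscription.get(&(tenant.to_string(), sub.to_string())) {
            for tx in list {
                let _ = tx.send(msg.to_string());
            }
        }
    }

    /// Drops every sender for the subscription and returns how many were removed.
    pub fn remove_all_clients(&mut self, tenant: &str, sub: &str) -> usize {
        self.by_subscription
            .remove(&(tenant.to_string(), sub.to_string()))
            .map_or(0, |senders| senders.len())
    }
}

/// Stand-in for a handler loop: forwards queued messages, then stops once
/// the last sender has been dropped.
pub fn drain_until_closed(rx: Receiver<String>) -> Vec<String> {
    let mut frames = Vec::new();
    while let Ok(msg) = rx.recv() {
        frames.push(msg);
    }
    frames
}
```

Messages queued before the deletion are still delivered, because the receiver drains its buffer before reporting that the channel is closed.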

No retry for WebSocket. The retry loop in dispatch_with_retry still runs, but since WebSocketChannel.dispatch() always returns Success, it exits on the first call. Retries are meaningless for in-process channel broadcasting.

Test plan

  • cargo test -p helios-subscriptions — 116 tests pass (106 unit + 2 rest-hook integration + 8 WebSocket integration)
  • cargo test -p helios-rest — 170 tests pass
  • cargo clippy --all-targets --all-features -- -D warnings … — zero warnings
  • cargo fmt --all — clean
  • Manual smoke test: HFS_SUBSCRIPTIONS_ENABLED=true cargo run --bin hfs --features subscriptions, create a websocket subscription, call $get-ws-binding-token, connect with websocat "ws://localhost:8080/ws/subscriptions/bind?token=<token>", POST a matching resource, verify notification bundle arrives

  assertions and monotonic event checks
  │ needless lifetimes in websocket integration helpers


aacruzgon commented Apr 13, 2026

Additional changes: this update fixes the subscriptions smoke regression and closes the R4 backport gaps that caused it.

1) R4 backport topic handling (Basic) is now implemented

  • Added strict parsing for R4 backport topics represented as Basic resources.
  • Topic detection requires:
    • resourceType = Basic
    • code.coding includes system=http://hl7.org/fhir/fhir-types + code=SubscriptionTopic
    • required core topic extensions (canonical URL + resource trigger)
  • Added parsing for trigger/filter/notification-shape backport extensions into internal TopicDefinition.

Why:
Our smoke job was failing at POST /SubscriptionTopic on R4 builds because native SubscriptionTopic is not valid in R4.
Backport topics in R4 are represented through Basic, so we now support that path directly and safely (strict markers prevent accidental misclassification of arbitrary Basic resources).
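The strict-marker check can be sketched roughly as follows. This is a simplified model and not the PR's parser: the Resource, Coding, and Extension structs are hypothetical flattened views of the JSON, and the two extension URLs are placeholders, since the comment does not spell out the real ones.

```rust
pub struct Coding {
    pub system: String,
    pub code: String,
}

pub struct Extension {
    pub url: String,
}

/// Hypothetical flattened view of a FHIR resource, just enough for detection.
pub struct Resource {
    pub resource_type: String,
    pub codings: Vec<Coding>, // code.coding
    pub extensions: Vec<Extension>,
}

const FHIR_TYPES_SYSTEM: &str = "http://hl7.org/fhir/fhir-types";
// Placeholder URLs (assumptions): the real backport extension URLs are not given here.
const TOPIC_URL_EXT: &str = "http://example.org/ext/topic-url";
const RESOURCE_TRIGGER_EXT: &str = "http://example.org/ext/resource-trigger";

/// True only when all three markers are present, so arbitrary Basic
/// resources are never treated as topics.
pub fn is_backport_topic(r: &Resource) -> bool {
    let has_marker = r
        .codings
        .iter()
        .any(|c| c.system == FHIR_TYPES_SYSTEM && c.code == "SubscriptionTopic");
    let has_ext = |url: &str| r.extensions.iter().any(|e| e.url == url);
    r.resource_type == "Basic"
        && has_marker
        && has_ext(TOPIC_URL_EXT)
        && has_ext(RESOURCE_TRIGGER_EXT)
}
```

A Basic resource that carries the coding but lacks either required extension fails the check and continues through normal evaluation.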


2) Engine lifecycle now recognizes R4 Basic topic events

  • Extended subscription engine event routing to process R4 Basic topic create/update/delete lifecycle events.
  • Non-topic Basic resources still flow through normal evaluation (no behavior change there).

Why:
Even with parser support, topics were never registered unless event type was native SubscriptionTopic. This blocked end-to-end R4 topic registration.


3) Backport payload-content extraction improved for R4 Subscription

  • R4 channel parsing now prefers channel._payload.extension(backport-payload-content) and falls back to the previous root extension path for compatibility.

Why:
This aligns with backport-style payload extension placement while preserving compatibility with older payload shapes.
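The prefer-then-fall-back lookup might look like the sketch below. The structs are hypothetical simplifications of the R4 Subscription JSON, and matching on the URL suffix backport-payload-content is a shortcut for illustration; a real parser would compare the full extension URL.

```rust
pub struct Extension {
    pub url: String,
    pub value_code: String,
}

pub struct Channel {
    /// Extensions found under channel._payload.extension.
    pub payload_extensions: Vec<Extension>,
}

pub struct Subscription {
    /// Extensions on the resource root (the older placement).
    pub root_extensions: Vec<Extension>,
    pub channel: Channel,
}

// Suffix match only (assumption made for brevity).
const PAYLOAD_CONTENT_SUFFIX: &str = "backport-payload-content";

fn find_payload_content(exts: &[Extension]) -> Option<&str> {
    exts.iter()
        .find(|e| e.url.ends_with(PAYLOAD_CONTENT_SUFFIX))
        .map(|e| e.value_code.as_str())
}

/// Prefers channel._payload, then falls back to the root extension path.
pub fn payload_content(sub: &Subscription) -> Option<&str> {
    find_payload_content(&sub.channel.payload_extensions)
        .or_else(|| find_payload_content(&sub.root_extensions))
}
```

When both placements are present, the channel._payload value wins; older payloads that only carry the root extension keep working through the fallback.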


4) WebSocket binding protocol switched to bind-with-token

  • /ws/subscriptions/bind now requires first WS message: bind-with-token <token>.
  • Token validation and subscription binding happen after upgrade on that message.
  • Enforced single bind per connection; additional bind attempts are rejected.

Why:
This aligns runtime behavior with the backport websocket interaction model and removes ambiguity from query-string token binding.
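A minimal sketch of parsing the bind message and enforcing a single bind per connection. Token validation against the token manager is left out, and parse_bind_message, Connection, and BindError are hypothetical names rather than the handler's actual types.

```rust
#[derive(Debug, PartialEq)]
pub enum BindError {
    /// The frame was not exactly `bind-with-token <token>`.
    Malformed,
    /// The connection has already been bound once.
    AlreadyBound,
}

/// Extracts the token from a `bind-with-token <token>` text frame.
pub fn parse_bind_message(msg: &str) -> Option<&str> {
    let mut parts = msg.split_whitespace();
    match (parts.next(), parts.next(), parts.next()) {
        (Some("bind-with-token"), Some(token), None) => Some(token),
        _ => None,
    }
}

/// Per-connection state: remembers whether a bind has already happened.
#[derive(Default)]
pub struct Connection {
    bound_token: Option<String>,
}

impl Connection {
    /// Handles one incoming text frame (token validation omitted here).
    pub fn handle_message(&mut self, msg: &str) -> Result<(), BindError> {
        let token = parse_bind_message(msg).ok_or(BindError::Malformed)?;
        if self.bound_token.is_some() {
            return Err(BindError::AlreadyBound);
        }
        self.bound_token = Some(token.to_string());
        Ok(())
    }
}
```

Requiring exactly two whitespace-separated parts rejects both a bare bind-with-token and frames with trailing data, so the token is never guessed from an ambiguous message.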


5) External subscriptions smoke test updated for true R4 backport flow

  • Topic creation changed from POST /SubscriptionTopic to POST /Basic with backport topic extensions.
  • R4 Subscription payloads now include backport profile and payload-content extension.
  • WebSocket smoke flow now:
    1. calls $get-ws-binding-token
    2. connects to websocket-url
    3. sends bind-with-token <token> over the socket
  • Kept existing handshake/event assertions (Bundle.type=history, Parameters.type checks).

Why:
The previous smoke test exercised native-topic behavior against an R4 build, causing immediate failure before any notification logic executed. The updated test validates the intended R4 backport contract end-to-end.


6) Documentation updated

  • README websocket binding docs now describe the bind-message flow instead of ?token= URL binding.

Validation Performed

  • cargo test -p helios-subscriptions (full suite passed)
  • Targeted/new tests for:
    • R4 Basic topic parsing
    • R4 Basic topic engine registration
    • R4 payload-content via channel._payload extension
    • websocket bind message parsing
  • cargo check -p helios-rest --features subscriptions
  • smoke script syntax check (bash -n)

Net effect

This makes the R4 path actually backport-compliant in the tested areas, unblocks the external subscriptions smoke workflow, and removes protocol drift between documented and implemented websocket binding behavior.


@smunini smunini left a comment


Overall a good PR. However, I think a build was run without any FHIR version feature enabled, and that led to a number of unnecessary changes. I think we can revert all of the changes to the fhirpath crate, for example. It's OK to assume that at least one FHIR version feature (or all of them) will always be enabled at compile time. Perhaps that is something to add to CLAUDE.md.

Comment thread crates/fhirpath/src/cli.rs Outdated
Comment thread crates/fhirpath/src/models.rs Outdated
Comment thread crates/fhirpath/src/cli.rs Outdated
Comment thread crates/fhirpath/src/handlers.rs Outdated
Comment thread crates/fhirpath/src/models.rs
Comment thread crates/rest/Cargo.toml
Comment thread crates/rest/Cargo.toml
Comment thread crates/subscriptions/Cargo.toml
Comment thread crates/subscriptions/Cargo.toml
Comment thread crates/subscriptions/Cargo.toml
@aacruzgon aacruzgon force-pushed the feature/subcriptions branch from 34c0915 to 13be1ac Compare April 16, 2026 20:45
aacruzgon and others added 20 commits April 16, 2026 22:43
  default-features=false on helios-fhir dep
  default_cli_fhir_version cfg ladder
  restore cfg-gated fallback arm in parse_fhir_resource
   unconditional fallback arm in extract_parameters
  keep helios-fhirpath/R<X> forwarding for cfg-gated arms
  default-features=false on intra-workspace deps; keep axum ws
  helios-subscriptions?/R<X> forwarding and default-features=false
  FhirVersion::notification_bundle_type helper
The match in get_compartment_params_for_version cfg-gates each arm by
FHIR feature, but FhirVersion variants are compiled based on helios-fhir's
own features (which default to R4). Builds that exclude R4 on helios-rest
(e.g. single-R4B) leave FhirVersion::R4 uncovered. A wildcard fallback
previously handled this but was dropped in e10a951.

Return Result<_, String> with an explicit Err for the fallback, matching
the pattern used in responses/format.rs, and map it to RestError::InternalError
at the call site.
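The pattern described in this commit message can be sketched as below. The enum, function name, and return values are simplified stand-ins: the variants are left ungated to mimic a FhirVersion compiled with more versions than the calling crate enables, and the string results are placeholders.

```rust
/// Stand-in for a FhirVersion whose variants exist regardless of which
/// features the calling crate enables.
#[derive(Debug, Clone, Copy)]
pub enum FhirVersion {
    R4,
    R4B,
    R5,
}

/// Each arm is compiled only when its feature is on; the final arm covers
/// any variant left without a match arm and reports it as an error.
pub fn compartment_params_for_version(version: FhirVersion) -> Result<&'static str, String> {
    match version {
        #[cfg(feature = "R4")]
        FhirVersion::R4 => Ok("r4-compartment-params"),
        #[cfg(feature = "R4B")]
        FhirVersion::R4B => Ok("r4b-compartment-params"),
        #[cfg(feature = "R5")]
        FhirVersion::R5 => Ok("r5-compartment-params"),
        // Unreachable only when every feature above is enabled.
        #[allow(unreachable_patterns)]
        other => Err(format!("FHIR version {other:?} is not enabled in this build")),
    }
}
```

Compiled with none of the features enabled, every variant takes the fallback arm and yields Err, which the caller can map to an internal error instead of failing to compile on a non-exhaustive match.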
Internal workspace crates pulled in helios-fhir, helios-persistence,
helios-fhirpath, helios-sof, helios-serde, and helios-subscriptions with
their default features on. Because cargo unifies features across the
build graph, helios-fhir/R4 was forced on by default even in builds
targeting only R4B/R5/R6, producing FhirVersion variants that the
version-gated match arms in callers could not cover.

Add default-features = false to internal consumers. Each parent crate's
R<x>/backend feature table already forwards what it needs, so no
functional change when default features are selected upstream.

Known leak: helios-audit still pulls helios-fhir/R4 unconditionally
because its source hardcodes helios_fhir::r4::AuditEvent. Wildcard
fallback arms (already used in responses/format.rs and fhir_types.rs)
are extended to the four analogous matches in helios-fhirpath so the
leak stays harmless. Fix a pre-existing bug in fhir_types.rs where
get_r5_resource_types / get_r6_resource_types referenced
R4_RESOURCE_TYPES in branches where R4 was not enabled.
smunini previously approved these changes Apr 20, 2026
The hand-maintained R*_RESOURCE_TYPES constants in fhir_types.rs were
gated with `not(feature = "R4")` etc., so any multi-version build that
also enabled R4 dropped the later-version constants entirely. The
extractor's `is_valid_resource_type` then rejected types like
SubscriptionTopic with HTTP 400 whenever the server ran as R4B/R5/R6
on a build that also included R4.

Replace the constants with OnceLock-cached lookups that delegate to
each version's generated Resource enum via FhirResourceTypeProvider.
`is_valid_resource_type` now checks the union across all enabled
versions; `get_resource_type_names_for_version` returns the correct
per-version list. Adds regression tests covering single- and
multi-version feature combinations.
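A rough sketch of the OnceLock-cached union lookup described in this commit message. The two per-version lists are hard-coded, abbreviated stand-ins for what the generated Resource enums would provide through FhirResourceTypeProvider, and the function names are illustrative.

```rust
use std::collections::BTreeSet;
use std::sync::OnceLock;

// Hypothetical, abbreviated stand-ins for each version's generated type list.
fn r4_resource_types() -> &'static [&'static str] {
    &["Basic", "Patient", "Subscription"]
}

fn r5_resource_types() -> &'static [&'static str] {
    &["Basic", "Patient", "Subscription", "SubscriptionTopic"]
}

/// Union of resource type names across all versions, built once on first use.
pub fn all_resource_types() -> &'static BTreeSet<&'static str> {
    static ALL: OnceLock<BTreeSet<&'static str>> = OnceLock::new();
    ALL.get_or_init(|| {
        r4_resource_types()
            .iter()
            .chain(r5_resource_types())
            .copied()
            .collect()
    })
}

/// Accepts a type if any version in the union defines it.
pub fn is_valid_resource_type(name: &str) -> bool {
    all_resource_types().contains(name)
}
```

Because validity is checked against the union, a type such as SubscriptionTopic is accepted in a multi-version build even though the R4 list does not contain it.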
@smunini smunini merged commit 683adf8 into main Apr 21, 2026
152 checks passed
@aacruzgon aacruzgon deleted the feature/subcriptions branch April 24, 2026 17:47