Bulk data access server#2482

Merged
johngrimes merged 453 commits into release/9.2.0 from pathling-8-server
Jan 1, 2026

Conversation

@fhnaumann
Collaborator

@fhnaumann fhnaumann commented Sep 10, 2025

Renamed the branch to reflect the planned changes more accurately.

Relates to #2467, #1987 and #1986.
In the future, this PR may also include #2476 and #1988.

@github-project-automation github-project-automation bot moved this to Backlog in Pathling Sep 15, 2025
@johngrimes johngrimes moved this from Backlog to In progress in Pathling Sep 15, 2025
@johngrimes johngrimes added the fhirpath (Related to fhirpath reference implementation) and new feature (New feature or request) labels Sep 15, 2025
@fhnaumann
Collaborator Author

fhnaumann commented Sep 16, 2025

Demonstration:
An Aidbox server is deployed to Kubernetes. A small script pulls data from it (using bulk export?) and converts the NDJSON files to Parquet files (Delta tables). The pathling-server under development uses that data as input and can be asked to perform a bulk export on it. Requests are made through a web client. Every component is Dockerized, packaged using Helm charts and then deployed to Kubernetes.

In the future, additional technologies such as Databricks may be used. Authentication and authorization may also be performed through the web client against the pathling-server.

Use the auth in the createTag method, though I am unsure what the effects are. Is there something "in" the auth object that stays the same across requests so that caching still works? In any case, some auth information should probably be part of the tag (if parts of it stay the same across requests).

@johngrimes johngrimes self-requested a review September 18, 2025 01:39
@johngrimes
Member

@fhnaumann Could you please merge main into this branch?

@johngrimes
Member

Why have some files been deleted from the test data directory in the library API? There are other tests that rely upon this data.

Member

@johngrimes johngrimes left a comment

This doesn't actually compile for me yet, and it is not passing the tests on CI.

I've added some preliminary comments anyway; I can take another look once we have a green build.

As a general comment, please also take a look at the CONTRIBUTING.md file and make sure that everything is ticked off there.

@Component
@Profile("server")
@Slf4j
public class ConformanceProvider implements IServerConformanceProvider<CapabilityStatement>,
Member

This will need to be updated to accurately reflect the capabilities of the server now.

@fhnaumann
Collaborator Author

fhnaumann commented Oct 15, 2025

Re-implementing the $import operation requires some changes to the API:

  • DataSourceBuilder:
    • Add the ability to load a list of filepaths instead of just an entire directory. Proposed change: overload the current methods that take a directory path with new methods that take Collection<String> filepaths.
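
A minimal sketch of what such an overload could look like. FileListingSketch is a hypothetical stand-in, not the actual DataSourceBuilder, and the ndjson method name and the ".ndjson" filter are assumptions made for illustration only:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for DataSourceBuilder; not the real Pathling API.
final class FileListingSketch {

  // Existing style: take a directory and resolve every NDJSON file inside it.
  static List<String> ndjson(final Path directory) throws IOException {
    try (Stream<Path> files = Files.list(directory)) {
      final List<String> paths = files
          .map(Path::toString)
          .filter(p -> p.endsWith(".ndjson"))
          .sorted()
          .collect(Collectors.toList());
      // Delegate to the overload so both entry points share one code path.
      return ndjson(paths);
    }
  }

  // Proposed overload: take an explicit collection of file paths.
  static List<String> ndjson(final Collection<String> filepaths) {
    if (filepaths.isEmpty()) {
      throw new IllegalArgumentException("At least one file path is required");
    }
    return List.copyOf(filepaths);
  }
}
```

Having the directory variant delegate to the collection variant keeps validation in one place, which may make the change easier to test.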

Change the $import parameters:

  • rename source to input
  • move mode and format to top-level

Align the $import to the bulk-import manifest:

  • only support mandatory fields (except for inputSource)

@johngrimes johngrimes moved this from In progress to Planned in Pathling Oct 26, 2025
@johngrimes johngrimes moved this from Planned to In progress in Pathling Nov 3, 2025
@johngrimes johngrimes assigned johngrimes and unassigned fhnaumann Nov 3, 2025
@johngrimes johngrimes changed the title from "Pathling 8 server" to "Bulk data access server" Nov 3, 2025

The FHIR Bulk Data Export manifest now correctly sets requiresAccessToken
to true when server authorisation is enabled. This ensures that bulk data
clients include the access token when downloading exported files.
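
As a rough illustration of the behaviour described above, the flag can be derived directly from the server's authorisation setting. The ManifestSketch class, the ExportManifest record and its fields are hypothetical and do not mirror the actual server classes:

```java
import java.util.List;

// Hypothetical model of a bulk export manifest; names are illustrative only.
final class ManifestSketch {

  record ExportManifest(String request, boolean requiresAccessToken, List<String> outputUrls) {}

  // requiresAccessToken mirrors whether server authorisation is enabled,
  // so clients know to send their token when downloading output files.
  static ExportManifest build(final String requestUrl, final boolean authEnabled,
      final List<String> outputUrls) {
    return new ExportManifest(requestUrl, authEnabled, List.copyOf(outputUrls));
  }
}
```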

Also fixes Dependencies.java to use PathlingContext.Builder pattern
instead of the non-existent create(SparkSession, EncodingConfiguration,
TerminologyConfiguration) method.

…ent scan

Prevents the deltaLake() bean from being created during test data import,
which was failing because it tried to read Delta tables that hadn't been
generated yet.

Implement patient-level and group-level bulk export per FHIR Bulk Data
Access specification. Key changes:

- Add PatientExportProvider for /Patient/$export and /Patient/[id]/$export
- Add GroupExportProvider for /Group/[id]/$export
- Add PatientCompartmentService to filter resources by patient compartment
- Add ExportOperationHelper to deduplicate export execution logic
- Extend ExportRequest with exportLevel and patientIds fields
- Update ExportOperationValidator with patient-level validation
- Register new providers in FhirServer and ConformanceProvider
- Fix code style issues and add missing Javadoc across modified files

The bulk export manifest was generating incorrect result URLs for
patient-level exports (e.g. /Patient/$result instead of /$result).
This was caused by parsing the request URL to derive the server base,
which included the resource type path segment.

Changes:
- Add serverBaseUrl field to ExportRequest record
- Pass requestDetails.getFhirServerBase() from validator to request
- Update ExportResponse to use serverBaseUrl directly
- Remove backwards-compatible constructors from ExportRequest
- Update test utilities to use canonical constructor
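
The difference between the two approaches can be sketched as follows. This only illustrates one way such a bug can arise; ResultUrlSketch, its method names and the "job" query parameter are assumptions, not the actual ExportResponse code:

```java
// Hypothetical illustration of deriving result URLs; not the actual server code.
final class ResultUrlSketch {

  // Fragile approach: derive the base by cutting the request URL at the last
  // slash. For a patient-level request this keeps the "/Patient" segment.
  static String baseFromRequestUrl(final String requestUrl) {
    return requestUrl.substring(0, requestUrl.lastIndexOf('/'));
  }

  // Safer approach: build the URL from an explicitly supplied server base.
  static String resultUrl(final String serverBaseUrl, final String jobId) {
    final String base = serverBaseUrl.endsWith("/")
        ? serverBaseUrl.substring(0, serverBaseUrl.length() - 1)
        : serverBaseUrl;
    return base + "/$result?job=" + jobId;
  }
}
```

With a request to https://example.org/fhir/Patient/$export, the first method yields a base ending in /Patient, whereas the second depends only on the base it is given.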

Add Javadoc documentation, nullability annotations, and rename ND_JSON
constant to NDJSON to follow Java naming conventions.

Add Javadoc documentation, nullability annotations, and final modifiers.
Fix redundant registry lookup and improve code formatting.

Clarify that this provider handles system-level bulk exports, consistent
with PatientExportProvider and GroupExportProvider naming.

The server module was using unshaded Gson while ConstantDeclarationTypeAdapterFactory
(bundled in library-runtime) uses shaded Gson, causing a type mismatch when
registering the adapter. This prevented valueCode, valueString, and other
constant value types from being parsed in ViewDefinitions.

Removed the direct Gson dependency and switched all server code to use the
shaded Gson from library-runtime.

Errors intentionally raised via raise_error() in Spark SQL (such as
ViewDefinition columns expecting singular values but receiving multiple)
are now returned as HTTP 400 Bad Request instead of 500 Internal Server
Error.
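
A simplified sketch of this kind of mapping, written without a Spark dependency. Matching on the text USER_RAISED_EXCEPTION in the exception message is an assumption made for illustration; the actual interceptor may inspect the Spark exception differently:

```java
// Hypothetical status mapping; the marker string and class name are assumptions.
final class ErrorStatusSketch {

  static final String USER_RAISED_MARKER = "USER_RAISED_EXCEPTION";

  // Walk the cause chain and treat any user-raised error as a client error.
  static int statusFor(final Throwable error) {
    for (Throwable t = error; t != null; t = t.getCause()) {
      final String message = t.getMessage();
      if (message != null && message.contains(USER_RAISED_MARKER)) {
        return 400; // Bad Request: the query itself triggered the error.
      }
    }
    return 500; // Internal Server Error: anything else.
  }
}
```

Checking the whole cause chain matters here because errors raised during query execution are often wrapped by one or more outer exceptions.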

…ource

Previously, DynamicDeltaSource.read() threw IllegalArgumentException when
no Delta table existed for a resource type. This caused ViewDefinition
queries to fail with an error instead of returning 0 rows.

Now returns an empty dataset with the correct schema, allowing view
queries on resource types without data to succeed with empty results.

Navigation links were being cut off on small screens. Added a hamburger
menu that appears on mobile viewports, providing access to all nav items
via a dropdown. Desktop view retains the horizontal navigation layout.

Replace manual Flex-based key-value layout with the Radix UI DataList
component, which provides semantic HTML (dl/dt/dd) and consistent styling
for metadata display.

HAPI FHIR's automatic parameter extraction does not correctly handle
resources nested within part arrays. This caused the $viewdefinition-export
operation to reject valid requests with the error "At least one
view.viewResource parameter is required."

Added extractViewInputsFromRequest() to manually parse the Parameters
resource and extract view inputs from nested parts. Also added integration
tests to verify the fix.
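
The manual extraction can be pictured with a simplified model of a Parameters resource. The Param record below is a hypothetical stand-in for the HAPI FHIR types, and the method only loosely mirrors what extractViewInputsFromRequest() is described as doing; the "view" and "viewResource" names come from the error message quoted above:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of Parameters.parameter; not the HAPI FHIR API.
final class NestedPartsSketch {

  // A parameter has a name, an optional resource payload and nested parts.
  record Param(String name, String resourceJson, List<Param> parts) {}

  // Collect every view.viewResource payload from the nested part arrays.
  static List<String> extractViewResources(final List<Param> parameters) {
    final List<String> views = new ArrayList<>();
    for (final Param parameter : parameters) {
      if (!"view".equals(parameter.name())) {
        continue;
      }
      for (final Param part : parameter.parts()) {
        if ("viewResource".equals(part.name()) && part.resourceJson() != null) {
          views.add(part.resourceJson());
        }
      }
    }
    return views;
  }
}
```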

Replace shaded Gson imports with unshaded com.google.gson imports.
Maven shading still works at package time, but IntelliJ can now compile
directly from source without type mismatch errors.

The view export showed "Completed" status but no output files appeared
for download. The root cause was a type mismatch: the backend returns
a FHIR Parameters resource, but the UI expected a flat JSON structure
with an output array.

Added getViewExportOutputFiles() parser function to extract output
entries from the Parameters resource, following the existing pattern
used for bulk export.

Add ViewDefinitionGson factory class to provide pre-configured Gson
instances for ViewDefinition parsing. Remove Gson from library-runtime
shading since Spark 4.0 already provides it.

Also fix ErrorHandlingInterceptorTest to use getCondition() instead of
getErrorClass() for Spark 4.0 API compatibility.

Implements the write() method in ConstantDeclarationTypeAdapter to
enable serialisation of view definitions containing constants. This is
needed for generating deterministic cache keys for view export
operations.

Also adds disableHtmlEscaping() to ViewDefinitionGson to prevent
unnecessary escaping of characters like '=' in Base64 values.
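
To illustrate why stable serialisation matters for cache keys, here is a minimal sketch that hashes the serialised JSON. The use of SHA-256 and the CacheKeySketch name are assumptions; the commit does not say how the key is actually computed:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical cache key derivation; the hashing scheme is an assumption.
final class CacheKeySketch {

  // Hash the serialised view definition so equal JSON gives an equal key.
  static String cacheKey(final String serialisedView) {
    try {
      final MessageDigest digest = MessageDigest.getInstance("SHA-256");
      final byte[] hash = digest.digest(serialisedView.getBytes(StandardCharsets.UTF_8));
      return HexFormat.of().formatHex(hash);
    } catch (final NoSuchAlgorithmException e) {
      // SHA-256 is required to be available on every Java platform.
      throw new IllegalStateException("SHA-256 not available", e);
    }
  }
}
```

Under a scheme like this, the same Base64 value written with '=' and with an escaped \u003d would hash to different keys, so the serialiser needs to produce one consistent form.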

Hashed assets (JS/CSS in /admin/assets/) now have a 1-year cache
duration since content hashes in filenames make them immutable.
HTML and other files use no-cache to ensure users always get the
latest version with current asset references.
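
The caching policy described above could be expressed roughly as follows. CacheControlSketch is hypothetical and the exact header values the server sends are assumptions; only the /admin/assets/ prefix, the 1-year duration and no-cache come from the commit message:

```java
// Hypothetical header selection; the exact directive strings are assumptions.
final class CacheControlSketch {

  // One year in seconds: 365 * 24 * 60 * 60.
  static final long ONE_YEAR_SECONDS = 31_536_000L;

  static String cacheControlFor(final String path) {
    if (path.startsWith("/admin/assets/")) {
      // File names contain a content hash, so the content never changes.
      return "max-age=" + ONE_YEAR_SECONDS;
    }
    // HTML and other files must be revalidated on every request.
    return "no-cache";
  }
}
```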

Radix UI Themes buttons don't show pointer cursor by default.
Added global CSS rule targeting .rt-Button and .rt-IconButton.

Changed responsive padding breakpoint from md (1024px) to lg (1280px)
so the layout maintains margins until the container is naturally centred.

Adds a copy button to the top-right corner of the expanded JSON code
box in ResourceCard. Includes a useClipboard hook for clipboard
operations, unit tests, and E2E test coverage.

Add a copy-to-clipboard button to the read-only textarea that displays
the selected ViewDefinition JSON. Also reduce the font size for better
readability. Include e2e test for clipboard functionality.

The tests expected different export level option labels than what the
ExportForm component actually renders. Also removed a test for "Select
all" functionality that doesn't exist in the component.

The fhir-auth library is published to Maven Central's new snapshot
repository at central.sonatype.com, which is not configured by default.

@johngrimes johngrimes changed the base branch from main to release/9.2.0 January 1, 2026 08:16
@johngrimes johngrimes merged commit 5c659ce into release/9.2.0 Jan 1, 2026
0 of 2 checks passed
@johngrimes johngrimes deleted the pathling-8-server branch January 1, 2026 08:45
@github-project-automation github-project-automation bot moved this from In progress to Done in Pathling Jan 1, 2026

Labels

fhirpath: Related to fhirpath reference implementation
new feature: New feature or request

Projects

Status: Done

2 participants