# Demonstrating FHIR Profiling to define FAIR datasets

## Approach

### Why FHIR Profiling for FAIR datasets?

FHIR Profiling is a mechanism through which a rulebook/codebook can be defined for healthcare data. Although it was originally intended for localizing base FHIR versions, for example by creating national profiles or profiles for a specific use case, we demonstrate here that the mechanism can also be used to define FAIR datasets and to create data pipelines which validate incoming data in bulk.

### Why use Python

In this demonstrator notebook, we implement the FHIR Profiling mechanism in Python because it is intended as a way to validate FHIR data in bulk (the whole dataset). Python is widely used in data engineering, which is in line with the approach of [FHIR Analytics in the Open Health Stack](https://developers.google.com/open-health-stack/fhir-analytics).

### Logic

[Pydantic v2](https://pydantic.dev/opensource), the most widely used data validation library for Python, does all of the heavy lifting in this demonstrator. To get pydantic to work with FHIR, we first generate FHIR R4B pydantic models using [fhir-py-types](https://github.com/beda-software/fhir-py-types). The output of this conversion is [`resources_r4b.py`](./resources_r4b.py) and has been included in this repository for convenience.

We subsequently use pydantic to subclass the R4B resources, which effectively mimics the profiling mechanism of FHIR. We demonstrate that through this mechanism we can combine three FHIR profiles (R4B, IPS and WHO ANC) and also explicitly integrate [SNOMED IPS Terminology](https://www.snomed.org/international-patient-summary-terminology) into a single, consistent data model (you could even say ontology, since it is based on the full base FHIR R4B specification) which can be used to validate incoming bulk data.
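As a rough illustration of how a subclassed model could validate a whole batch of records, consider the sketch below. The models `PatientSketch` and `PatientProfileSketch` and the helper `validate_bulk` are hypothetical, simplified stand-ins and not the generated R4B classes; only `BaseModel`, `model_validate` and `ValidationError` are actual pydantic v2 API.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class PatientSketch(BaseModel):
    """Hypothetical stand-in for a generated base resource: all fields optional."""

    resourceType: str = "Patient"
    name: Optional[str] = None
    birthDate: Optional[str] = None


class PatientProfileSketch(PatientSketch):
    """Profile: redeclaring the fields without defaults makes them required."""

    name: str
    birthDate: str


def validate_bulk(records, model):
    """Validate every record; return (valid instances, errors keyed by row index)."""
    valid, errors = [], {}
    for index, record in enumerate(records):
        try:
            valid.append(model.model_validate(record))
        except ValidationError as exc:
            errors[index] = exc.errors()
    return valid, errors


records = [
    {"name": "Jane Doe", "birthDate": "1980-01-01"},
    {"name": "John Doe"},  # missing birthDate, so the profile rejects it
]
valid, errors = validate_bulk(records, PatientProfileSketch)
print(len(valid), "valid record(s); rows with errors:", sorted(errors))
```

Collecting the errors per row, instead of stopping at the first failure, is one possible design for a bulk pipeline: the valid rows can be loaded while the rejected rows are reported back to the data provider.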

In the following, we demonstrate how to:

- Put more constrained cardinalities on FHIR Resources by subclassing Pydantic models
- Define which coding systems should be used by implementing specific field validators in pydantic. We choose to use [SNOMED IPS Terminology](https://www.snomed.org/international-patient-summary-terminology) as the default terminology, with mappings to other terminologies, most notably ICD-10, ICD-11 and LOINC.
- Visualize the ontology of the Momcare dataset


## Demonstrator

### Setup R4B with pydantic


In [None]:
from itertools import repeat
from typing import Any

from pprint import pprint
from pydantic import field_validator, ValidationError
from resources import (
    BaseModel,
    CodeableConcept,
    Coding,
    Condition,
    dateType,
    Encounter,
    HumanName,
    Identifier,
    List_,
    Literal_,
    Observation,
    Organization,
    Patient,
    Procedure,
    Reference,
    Questionnaire,
    QuestionnaireResponse,
)


def show_object(obj: BaseModel):
    """Prints obj attributes that are not None"""
    for k, v in obj.__dict__.items():
        # Compare against None explicitly so that falsy values such as False or 0 are still shown
        if v is not None:
            pprint(f"{k}: {v}")

Note that almost all fields in R4B resources are optional.


In [1]:
Patient().__dict__


### FHIR IPS: adding cardinality constraints

We need to add cardinality constraints to `Patient` and `Organization`. Also, we subclass `Reference` into `ReferencePatient` which is used in the IPS profile to refer back to the `Patient` resource.


In [None]:
class PatientIPS(Patient):
    "PatientIPS has constrained 'name' and 'birthdate' as mandatory."

    # No default values, so pydantic treats both fields as required
    name: List_[HumanName]
    birthdate: dateType


class OrganizationIPS(Organization):
    "OrganizationIPS has constrained 'name' as mandatory."

    name: str


class ReferencePatient(Reference):
    "Reference(PatientIPS) has fixed value for type and reference string is mandatory."

    type: str = "http://hl7.org/fhir/uv/ips/StructureDefinition/Patient-uv-ips"
    reference: str


# example PatientIPS. Note given name is a list, and there is only one family name
show_object(
    PatientIPS(name=[HumanName(given=["Jane"], family="Doe")], birthdate="01-01-1980")
)

'resourceType: Patient'
('name: [HumanName(id=None, id__ext=None, use=None, use__ext=None, text=None, '
 "text__ext=None, given=['Jane'], given__ext=None, family='Doe', "
 'family__ext=None, prefix=None, prefix__ext=None, suffix=None, '
 'suffix__ext=None, period=None, extension=None)]')
'birthdate: 01-01-1980'


### FHIR IPS: adding cardinality constraints and constraining terminologies

Besides cardinality constraints, `ObservationIPS`, `ConditionIPS`, and `ProcedureIPS` also add constraints on which value sets/terminologies are to be used. We implement this using Pydantic's `field_validator` decorator.

More specifically, we use _after validators_: Pydantic's internal validation first checks whether the incoming data conforms to the types specified by the FHIR profile, and our extra validation then checks that the value is included in the mandated value set.
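The ordering described above can be made visible with a small, self-contained sketch. `CodingSketch` and `ObservationSketch` are hypothetical, heavily simplified stand-ins for the generated models; the validator records the type it receives, which shows that it only runs after pydantic's own type validation has succeeded.

```python
from pydantic import BaseModel, ValidationError, field_validator

seen_types = []  # records what the after validator actually receives


class CodingSketch(BaseModel):
    """Hypothetical, simplified stand-in for the generated Coding model."""

    system: str
    code: str


class ObservationSketch(BaseModel):
    code: CodingSketch

    @field_validator("code", mode="after")
    @classmethod
    def require_loinc(cls, value: CodingSketch) -> CodingSketch:
        # Runs only once pydantic has parsed the raw input into a CodingSketch
        seen_types.append(type(value).__name__)
        if value.system != "https://loinc.org":
            raise ValueError(f"unexpected code system {value.system!r}")
        return value


def error_types(payload: dict) -> list:
    """Return the pydantic error types raised for payload (empty if valid)."""
    try:
        ObservationSketch.model_validate(payload)
    except ValidationError as exc:
        return [error["type"] for error in exc.errors()]
    return []


valid = error_types({"code": {"system": "https://loinc.org", "code": "11778-8"}})
# The inner 'code' key is missing: type validation fails, the validator is skipped
malformed = error_types({"code": {"system": "https://loinc.org"}})
wrong_system = error_types(
    {"code": {"system": "https://snomed.info/sct", "code": "161714006"}}
)
print(valid, malformed, wrong_system, seen_types)
```

The malformed payload is reported as a structural error by pydantic itself, while the well-formed payload with the wrong code system is reported by the custom validator.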

First, consider the R4B `Observation` resource which we instantiate as follows:


In [None]:
codeEDD = CodeableConcept(
    coding=[
        Coding(
            code="11778-8",
            system="https://loinc.org",
            display="Delivery date Estimated",
        )
    ]
)

# R4B Observation, 'status' and 'code' are mandatory
show_object(
    Observation(
        status="registered",
        code=codeEDD,
    )
)

'resourceType: Observation'
'status: registered'
('code: id=None id__ext=None text=None text__ext=None coding=[Coding(id=None, '
 "id__ext=None, system='https://loinc.org', system__ext=None, version=None, "
 "version__ext=None, display='Delivery date Estimated', display__ext=None, "
 "extension=None, userSelected=None, userSelected__ext=None, code='11778-8', "
 'code__ext=None)] extension=None')


Next, we implement the constraints of the IPS including validation of the codeable concepts:


In [None]:
class ObservationPregnancyEddIPS(Observation):
    code: CodeableConcept
    subject: ReferencePatient

    @field_validator("code", mode="after")
    @classmethod
    def ensure_edd(cls, value: CodeableConcept) -> CodeableConcept:
        # (system, code) pairs allowed for the estimated delivery date
        allowed_codings = list(
            zip(repeat("https://loinc.org", 3), ("11778-8", "11779-6", "11780-4"))
        )
        # 'coding' is optional in R4B, so guard against None
        codings = [(code.system, code.code) for code in value.coding or []]
        if not codings or not all(coding in allowed_codings for coding in codings):
            raise ValueError(
                f"Provided coding(s) {codings} should be one of {allowed_codings}"
            )
        return value


show_object(
    ObservationPregnancyEddIPS(
        status="registered",
        subject=ReferencePatient(
            reference="https://some.fhir/server/jane-doe",
        ),
        code=codeEDD,
    )
)

'resourceType: Observation'
'status: registered'
('subject: id=None id__ext=None '
 "type='http://hl7.org/fhir/uv/ips/StructureDefinition/Patient-uv-ips' "
 'type__ext=None display=None display__ext=None extension=None '
 "reference='https://some.fhir/server/jane-doe' reference__ext=None "
 'identifier=None')
('code: id=None id__ext=None text=None text__ext=None coding=[Coding(id=None, '
 "id__ext=None, system='https://loinc.org', system__ext=None, version=None, "
 "version__ext=None, display='Delivery date Estimated', display__ext=None, "
 "extension=None, userSelected=None, userSelected__ext=None, code='11778-8', "
 'code__ext=None)] extension=None')


It is interesting to note that the FHIR IPS profile only allows the use of LOINC codes here. Two of these codes have a one-to-one mapping to SNOMED CT:

| LOINC                                                       | SNOMED CT                                              |
| ----------------------------------------------------------- | ------------------------------------------------------ |
| 11778-8: Delivery date Estimated                            | 161714006: Estimated date of delivery                  |
| 11779-6: Delivery date Estimated from last menstrual period | 289206005: Estimated date of delivery from last period |

LOINC has an additional code 11780-4: Delivery date Estimated from ovulation date, with no equivalent in SNOMED CT.

SNOMED CT has an additional code 738070007: Estimated date of delivery from antenatal ultrasound scan, with no equivalent in LOINC.

Under the strict implementation of the IPS, the following instance of `ObservationPregnancyEddIPS`, which is coded in SNOMED CT, is therefore not allowed:


In [None]:
try:
    show_object(
        ObservationPregnancyEddIPS(
            status="registered",
            subject=ReferencePatient(
                reference="https://some.fhir/server/dkapitan",
            ),
            code=CodeableConcept(
                coding=[
                    Coding(
                        code="161714006",
                        system="https://snomed.info/sct",
                        display="Estimated date of delivery",
                    )
                ],
            ),
        )
    )
except ValidationError as e:
    print(e)

1 validation error for ObservationPregnancyEddIPS
code
  Value error, Provided coding(s) [('https://snomed.info/sct', '161714006')] should be one of [('https://loinc.org', '11778-8'), ('https://loinc.org', '11779-6'), ('https://loinc.org', '11780-4')] [type=value_error, input_value=CodeableConcept(id=None, ...=None)], extension=None), input_type=CodeableConcept]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error
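One way to handle such a rejected observation is to translate its coding into the mandated terminology before validation. The sketch below is only an illustration of that idea: the lookup table `SNOMED_TO_LOINC` and the helper `to_loinc` are hypothetical names, the single entry is taken from the mapping table above, and the system URIs are written as they appear in this notebook.

```python
from typing import Optional, Tuple

LOINC = "https://loinc.org"
SNOMED = "https://snomed.info/sct"

# (system, code) pairs; further rows of the mapping table can be added the same way
SNOMED_TO_LOINC = {
    (SNOMED, "161714006"): (LOINC, "11778-8"),  # Estimated date of delivery
}


def to_loinc(system: str, code: str) -> Optional[Tuple[str, str]]:
    """Return the LOINC equivalent of a coding, or None if no mapping is known."""
    if system == LOINC:
        return (system, code)  # already in the target terminology
    return SNOMED_TO_LOINC.get((system, code))


print(to_loinc(SNOMED, "161714006"))  # mapped to its LOINC equivalent
print(to_loinc(SNOMED, "738070007"))  # no LOINC equivalent, so None
```

Returning `None` for unmapped codes keeps the decision explicit: the caller can choose to reject the record or to keep the original coding alongside a warning.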


In [None]:
# TO DO: add field_validators for SNOMED IPS
class ConditionIPS(Condition):
    code: CodeableConcept
    subject: ReferencePatient


class ProcedureIPS(Procedure):
    code: CodeableConcept
    subject: ReferencePatient

### Resources not included in IPS

- Encounter
- Questionnaire
- QuestionnaireResponse
