# Demonstrating FHIR Profiling to define FAIR datasets

## Approach

### Why FHIR Profiling for FAIR datasets?

FHIR Profiling is a mechanism through which a rulebook/codebook can be defined for healthcare data. Although originally intended for localizing the FHIR standard, for example by creating national profiles or profiles for a specific use case, we demonstrate here that the mechanism can also be used to define FAIR datasets.

### Why use Python

In this demonstrator notebook we implement the FHIR Profiling mechanism in Python, because the goal is to validate FHIR data in bulk (the whole dataset) and Python is widely used in data engineering (see Open Health Stack FHIR Pipelines).

### Logic

[pydantic v2](https://pydantic.dev/opensource), the most widely used data validation library for Python, does all of the heavy lifting in this demonstrator. To get pydantic to work with FHIR, we first generate FHIR R4B pydantic models using [fhir-py-types](https://github.com/beda-software/fhir-py-types). The output of this conversion is `resources.py`, which has been included in the repository for convenience.

We subsequently use pydantic to subclass the R4B resources, which effectively mimics the profiling mechanism of FHIR. We demonstrate that through this mechanism we can combine three FHIR profiles (R4B, IPS and WHO ANC) and also explicitly integrate [SNOMED IPS Terminology](https://www.snomed.org/international-patient-summary-terminology) into a single, consistent data model (you could even say ontology, since it is based in Python) which can be used to validate incoming bulk data.

In the following, we demonstrate how to:

- Put more constrained cardinalities on FHIR Resources by subclassing pydantic models
- Define which coding systems should be used by implementing specific `CodeableConcept` subclasses
- Show how [SNOMED IPS Terminology](https://www.snomed.org/international-patient-summary-terminology) is used to validate `ProcedureIPS`
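
The subclassing idea in the list above can be sketched without `resources.py`. The snippet below is a minimal illustration, assuming pydantic v2; `HumanNameSketch`, `BasePatient`, `ProfiledPatient` and `is_valid` are hypothetical stand-ins, not the generated R4B classes. It shows that redeclaring an optional field without a default turns a `0..*` element into a `1..*` element.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class HumanNameSketch(BaseModel):
    # Simplified stand-in for the generated HumanName model (hypothetical)
    given: Optional[List[str]] = None
    family: Optional[str] = None


class BasePatient(BaseModel):
    # Base resource: every element is optional, as in the generated R4B models
    name: Optional[List[HumanNameSketch]] = None
    birthDate: Optional[str] = None


class ProfiledPatient(BasePatient):
    # Profile: redeclaring a field without a default makes it mandatory;
    # min_length=1 additionally requires at least one name (cardinality 1..*)
    name: List[HumanNameSketch] = Field(min_length=1)
    birthDate: str


def is_valid(model: type[BaseModel], data: dict) -> bool:
    """Return True if data validates against model, False otherwise."""
    try:
        model.model_validate(data)
        return True
    except ValidationError:
        return False


record = {"name": [{"given": ["Daniel"], "family": "Kapitan"}]}
print(is_valid(BasePatient, record))      # the base resource accepts the record
print(is_valid(ProfiledPatient, record))  # the profile rejects it: birthDate is missing
```

Because the profile is an ordinary subclass, anything that validates against `ProfiledPatient` also validates against `BasePatient`, which mirrors how a FHIR profile can only narrow the base resource.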


## Demonstrator

### Set up R4B with pydantic


In [1]:
# requires Python 3.12: using `type` soft keyword

from enum import Enum
from pprint import pprint
from resources import (
    BaseModel,
    CodeableConcept,
    Coding,
    Condition,
    dateType,
    Encounter,
    HumanName,
    Identifier,
    List_,
    Literal_,
    Observation,
    Organization,
    Patient,
    Procedure,
    Reference,
    Questionnaire,
    QuestionnaireResponse,
)


def show_object(obj: BaseModel):
    """Prints the attributes of obj that are set (skips None and other empty values)"""
    for k, v in obj.__dict__.items():
        if v:
            pprint(f"{k}: {getattr(obj, k)}")

Note that almost all fields in R4B resources are optional.


In [2]:
Patient().__dict__

{'resourceType': 'Patient',
 'id': None,
 'id__ext': None,
 'meta': None,
 'text': None,
 'name': None,
 'link': None,
 'photo': None,
 'active': None,
 'active__ext': None,
 'gender': None,
 'gender__ext': None,
 'telecom': None,
 'address': None,
 'contact': None,
 'language': None,
 'language__ext': None,
 'contained': None,
 'extension': None,
 'birthDate': None,
 'birthDate__ext': None,
 'identifier': None,
 'deceasedBoolean': None,
 'deceasedBoolean__ext': None,
 'deceasedDateTime': None,
 'deceasedDateTime__ext': None,
 'implicitRules': None,
 'implicitRules__ext': None,
 'maritalStatus': None,
 'communication': None,
 'multipleBirthBoolean': None,
 'multipleBirthBoolean__ext': None,
 'multipleBirthInteger': None,
 'multipleBirthInteger__ext': None,
 'modifierExtension': None,
 'generalPractitioner': None,
 'managingOrganization': None}

### FHIR IPS: adding cardinality constraints

For `Patient`, `Organization`, `Condition`, `Encounter` we only need to add cardinality constraints. Also, we subclass `Reference` into `ReferencePatient` which is used in the IPS profile to refer back to the `Patient` resource.


In [3]:
# PatientIPS constrains 'name' and 'birthDate' to be mandatory:
# redeclaring the fields without a default makes them required
class PatientIPS(Patient):
    name: List_[HumanName]
    birthDate: dateType


# ReferencePatient: a Reference that points to a Patient following the IPS profile
class ReferencePatient(Reference):
    type: str = "http://hl7.org/fhir/uv/ips/StructureDefinition/Patient-uv-ips"
    reference: str


# example PatientIPS. Note given name is a list, and there is only one family name
show_object(
    PatientIPS(
        name=[HumanName(given=["Daniel"], family="Kapitan")], birthDate="1973-09-09"
    )
)

'resourceType: Patient'
('name: [HumanName(id=None, id__ext=None, use=None, use__ext=None, text=None, '
 "text__ext=None, given=['Daniel'], given__ext=None, family='Kapitan', "
 'family__ext=None, prefix=None, prefix__ext=None, suffix=None, '
 'suffix__ext=None, period=None, extension=None)]')
'birthDate: 1973-09-09'


### FHIR IPS: constraining terminologies


In [4]:
# TO DO: add field_validators for SNOMED IPS
class ConditionIPS(Condition):
    code: CodeableConcept
    subject: ReferencePatient


class ProcedureIPS(Procedure):
    code: CodeableConcept
    subject: ReferencePatient
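
The TO DO above could be approached with a pydantic `field_validator`. The following is a minimal sketch, assuming pydantic v2 and using simplified stand-in models (`CodingSketch`, `CodeableConceptSketch`, `ProcedureSketch` and `accepts` are hypothetical names, not classes from `resources.py`). It only checks that every coding uses the SNOMED CT system URI; a check against the actual SNOMED IPS subset would additionally need the list of codes in that subset, which is not included here.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError, field_validator

SNOMED_SYSTEM = "http://snomed.info/sct"  # FHIR system URI for SNOMED CT


class CodingSketch(BaseModel):
    # Simplified stand-in for Coding (hypothetical)
    system: Optional[str] = None
    code: Optional[str] = None
    display: Optional[str] = None


class CodeableConceptSketch(BaseModel):
    # Simplified stand-in for CodeableConcept (hypothetical)
    coding: Optional[List[CodingSketch]] = None


class ProcedureSketch(BaseModel):
    code: CodeableConceptSketch

    @field_validator("code")
    @classmethod
    def code_must_use_snomed(
        cls, value: CodeableConceptSketch
    ) -> CodeableConceptSketch:
        # Require at least one coding, and every coding to come from SNOMED CT
        codings = value.coding or []
        if not codings:
            raise ValueError("code.coding must contain at least one Coding")
        for coding in codings:
            if coding.system != SNOMED_SYSTEM:
                raise ValueError(f"unexpected coding system: {coding.system!r}")
        return value


def accepts(system: str) -> bool:
    """Return True if a procedure coded in the given system passes validation."""
    try:
        # 80146002 is used purely as an illustrative code value
        ProcedureSketch(code={"coding": [{"system": system, "code": "80146002"}]})
        return True
    except ValidationError:
        return False


print(accepts(SNOMED_SYSTEM))
print(accepts("https://loinc.org"))
```

The same validator could be attached to a subclass of the generated `Procedure` model, since the generated resources are pydantic models as well.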

In [5]:
codeEDD = CodeableConcept(
    coding=[
        Coding(
            code="11778-8",
            system="https://loinc.org",
            display="Delivery date Estimated",
        )
    ]
)

In [6]:
Observation(
    status="registered",
    code=codeEDD,
    subject=Reference(
        reference="https://some.fhir/server/dkapitan",
    ),
)

Observation(resourceType='Observation', id=None, id__ext=None, meta=None, text=None, note=None, focus=None, partOf=None, status='registered', status__ext=None, issued=None, issued__ext=None, method=None, device=None, basedOn=None, subject=Reference(id=None, id__ext=None, type=None, type__ext=None, display=None, display__ext=None, extension=None, reference='https://some.fhir/server/dkapitan', reference__ext=None, identifier=None), language=None, language__ext=None, category=None, valueQuantity=None, valueCodeableConcept=None, valueString=None, valueString__ext=None, valueBoolean=None, valueBoolean__ext=None, valueInteger=None, valueInteger__ext=None, valueRange=None, valueRatio=None, valueSampledData=None, valueTime=None, valueTime__ext=None, valueDateTime=None, valueDateTime__ext=None, valuePeriod=None, bodySite=None, specimen=None, contained=None, extension=None, encounter=None, performer=None, hasMember=None, component=None, identifier=None, derivedFrom=None, effectiveDateTime=None, 

In [7]:
# this is a bit of a roundabout way to put constraints on the allowed coding
# but it serves to demonstrate the mechanism
# TO DO: implement with field_validators
# https://docs.pydantic.dev/latest/concepts/validators/#field-validators
class CodingEdd(Enum):
    _11778_8 = Coding(
        code="11778-8", system="https://loinc.org", display="Delivery date Estimated"
    )
    _11779_6 = Coding(
        code="11779-6",
        system="https://loinc.org",
        display="Delivery date Estimated from last menstrual period",
    )
    _11780_4 = Coding(
        code="11780-4",
        system="https://loinc.org",
        display="Delivery date Estimated from ovulation date",
    )


class CodeableConceptEDD(CodeableConcept):
    coding: List_[Literal_[CodingEdd._11778_8, CodingEdd._11779_6, CodingEdd._11780_4]]
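
As an alternative to the `Enum` construction, a value set binding can be written as a reusable validator. The sketch below assumes pydantic v2 and uses simplified stand-in models; `in_value_set`, `CodingSketch`, `EddCoding` and `CodeableConceptEddSketch` are hypothetical names introduced for illustration. It binds on the code only, using the LOINC codes 11778-8, 11779-6 and 11780-4 for the estimated delivery date.

```python
from typing import Annotated, Callable, List, Optional, Set

from pydantic import AfterValidator, BaseModel, ValidationError

# LOINC codes describing how the estimated delivery date was determined
EDD_CODES = {"11778-8", "11779-6", "11780-4"}


class CodingSketch(BaseModel):
    # Simplified stand-in for Coding (hypothetical)
    system: Optional[str] = None
    code: Optional[str] = None
    display: Optional[str] = None


def in_value_set(allowed: Set[str]) -> Callable[[CodingSketch], CodingSketch]:
    """Build a validator that rejects codings whose code is not in `allowed`."""

    def check(coding: CodingSketch) -> CodingSketch:
        if coding.code not in allowed:
            raise ValueError(f"code {coding.code!r} is not in the value set")
        return coding

    return check


# A Coding type that carries its value set binding with it
EddCoding = Annotated[CodingSketch, AfterValidator(in_value_set(EDD_CODES))]


class CodeableConceptEddSketch(BaseModel):
    coding: List[EddCoding]


ok = CodeableConceptEddSketch(
    coding=[{"system": "https://loinc.org", "code": "11778-8"}]
)
print(ok.coding[0].code)

try:
    CodeableConceptEddSketch(coding=[{"system": "https://loinc.org", "code": "0000-0"}])
except ValidationError as err:
    print(f"rejected with {err.error_count()} error(s)")
```

One design consideration: the allowed codes live in a plain set, so a value set could be loaded from a terminology file at start-up instead of being spelled out as enum members.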

In [8]:
# Observation: pregnancy, estimated delivery date (EDD)
# Fixed list of LOINC codes that describe how the EDD was estimated:
# https://build.fhir.org/ig/HL7/fhir-ips/ValueSet-edd-method-uv-ips.html
# Note that ObservationPregnancyEddIPS also removes bodySite, method, specimen,
# device, referenceRange and component; we haven't implemented that here.
# The main point we demonstrate is the value set binding.
class ObservationPregnancyEddIPS(Observation):
    code: CodeableConceptEDD
    subject: ReferencePatient


show_object(
    ObservationPregnancyEddIPS(
        status="registered",
        code=CodeableConceptEDD(coding=[CodingEdd._11778_8]),
        subject=ReferencePatient(
            reference="https://some.fhir/server/dkapitan",
        ),
    )
)

'resourceType: Observation'
'status: registered'
('subject: id=None id__ext=None '
 "type='http://hl7.org/fhir/uv/ips/StructureDefinition/Patient-uv-ips' "
 'type__ext=None display=None display__ext=None extension=None '
 "reference='https://some.fhir/server/dkapitan' reference__ext=None "
 'identifier=None')
('code: id=None id__ext=None text=None text__ext=None '
 'coding=[<CodingEdd._11778_8: Coding(id=None, id__ext=None, '
 "system='https://loinc.org', system__ext=None, version=None, "
 "version__ext=None, display='Delivery date Estimated', display__ext=None, "
 "extension=None, userSelected=None, userSelected__ext=None, code='11778-8', "
 'code__ext=None)>] extension=None')


### Resources not included in IPS

- Encounter
- Questionnaire
- QuestionnaireResponse


### WHO ANC ValueSet

- Note we are using value sets with mapping to SNOMED IPS
- Also constraints are not relevant (too detailed for Momcare)
- We do use Measures (downstream)


In [9]:
import fsspec
import polars as pl


systems = ["ICD-10", "ICD-11", "ICF", "ICHI", "LOINC", "SNOMED-CT"]


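def parse_conceptmap(system: str) -> pl.DataFrame:
    """Generate flattened mapping tables from WHO ANC conceptmap."""

    # fail early on unknown systems instead of silently returning None
    if system not in systems:
        raise ValueError(f"Unknown system {system!r}, expected one of {systems}")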

    with fsspec.open(
        f"https://build.fhir.org/ig/dhes/smart-anc/ConceptMap-{system}.json"
    ) as f:
        df = pl.read_json(f)

    unnest_group = pl.col("group").list.explode().struct.unnest()
    unnest_element = pl.col("element").list.explode().struct.unnest().list.explode()

    return (
        df.select(unnest_group)
        .select(unnest_element)
        .select(
            pl.col(pl.String).name.prefix("who_anc_"),
            pl.lit(system.replace("-", "")).alias("target"),
            pl.col("target").struct.unnest(),
        )
    )
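
To make the reshaping in `parse_conceptmap` easier to follow, here is the same flattening written in plain Python. It is a sketch, not the notebook's implementation: `flatten_conceptmap` is a hypothetical helper, and `sample` is a small hand-written fragment in the ConceptMap layout (`group` → `element` → `target`) that reuses two rows shown in the output further down.

```python
from typing import Iterator


def flatten_conceptmap(concept_map: dict) -> Iterator[dict]:
    """Yield one flat row per (source element, target) pair of a ConceptMap."""
    for group in concept_map.get("group", []):
        for element in group.get("element", []):
            for target in element.get("target", []):
                yield {
                    # source side: the WHO ANC data element
                    "who_anc_code": element.get("code"),
                    "who_anc_display": element.get("display"),
                    # target side: the code in the external code system
                    "code": target.get("code"),
                    "display": target.get("display"),
                    "equivalence": target.get("equivalence"),
                }


# Hand-written fragment for illustration only
sample = {
    "resourceType": "ConceptMap",
    "group": [
        {
            "element": [
                {
                    "code": "ANC.B9.DE111",
                    "display": "Syphilis positive",
                    "target": [
                        {
                            "code": "A53.9",
                            "display": "Syphilis, unspecified",
                            "equivalence": "equivalent",
                        }
                    ],
                },
                {
                    "code": "ANC.B9.DE108",
                    "display": "Syphilis positive",
                    "target": [
                        {
                            "code": "A53.9",
                            "display": "Syphilis, unspecified",
                            "equivalence": "equivalent",
                        }
                    ],
                },
            ]
        }
    ],
}

rows = list(flatten_conceptmap(sample))
for row in rows:
    print(row)
```

Each nested level becomes one loop, which corresponds to one explode-and-unnest step in the polars version.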

In [10]:
df = pl.concat([parse_conceptmap(system) for system in systems], how="diagonal")

In [11]:
# 735 unique WHO ANC codes
df.select(pl.col("who_anc_code").n_unique())

| who_anc_code |
|---|
| u32 |
| 735 |


In [12]:
# Coverage varies widely; SNOMED is the most complete
# Multiple WHO ANC codes can map to the same target code
df.group_by("target").agg(pl.n_unique("who_anc_code", "code"))

| target | who_anc_code | code |
|---|---|---|
| str | u32 | u32 |
| ICD11 | 550 | 225 |
| ICHI | 163 | 42 |
| SNOMEDCT | 725 | 430 |
| ICF | 100 | 32 |
| LOINC | 385 | 145 |
| ICD10 | 532 | 188 |


In [13]:
# Oh my goodness, dear WHO, what have you done?
# List the target codes that more than one WHO ANC code maps to (many-to-one)
many_to_one = (
    df.group_by("target", "code")
    .agg(pl.count("who_anc_code").alias("count_"))
    .filter(pl.col("count_") > 1)
)
many_to_one.join(df, on=["target", "code"]).sort(["target", "code"])

| target | code | count_ | who_anc_code | who_anc_display | display | equivalence |
|---|---|---|---|---|---|---|
| str | str | u32 | str | str | str | str |
| ICD10 | A53.9 | 2 | ANC.B9.DE111 | Syphilis positive | Syphilis, unspecified | equivalent |
| ICD10 | A53.9 | 2 | ANC.B9.DE108 | Syphilis positive | Syphilis, unspecified | equivalent |
| ICD10 | B18.1 | 2 | ANC.B9.DE72 | Hepatitis B positive | Chronic viral hepatitis B with… | equivalent |
| ICD10 | B18.1 | 2 | ANC.B9.DE75 | Hepatitis B positive | Chronic viral hepatitis B with… | equivalent |
| ICD10 | B18.2 | 2 | ANC.B9.DE93 | Hepatitis C positive | Chronic viral hepatitis C | equivalent |
| … | … | … | … | … | … | … |
| SNOMEDCT | 84229001 | 3 | ANC.B7.DE53 | Gets tired easily | Fatigue (finding) | equivalent |
| SNOMEDCT | 8517006 | 2 | ANC.B7.DE12 | Recently quit tobacco products | Ex-smoker (finding) | equivalent |
| SNOMEDCT | 8517006 | 2 | ANC.B6.DE154 | Recently quit tobacco products | Ex-smoker (finding) | equivalent |
| SNOMEDCT | 91175000 | 2 | ANC.B6.DE41 | Convulsions | Seizure (finding) | equivalent |

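
The many-to-one check can also be expressed in plain Python, which may help when reading the polars query. This is a sketch under stated assumptions: `find_many_to_one` is a hypothetical helper, and `excerpt` contains only three rows copied from the output above. Unlike `pl.count`, it counts distinct WHO ANC codes per target code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_many_to_one(rows: Iterable[dict]) -> Dict[Tuple[str, str], List[str]]:
    """Map (target system, target code) to its WHO ANC codes when there are several."""
    sources = defaultdict(set)
    for row in rows:
        # collect every distinct source code per target code
        sources[(row["target"], row["code"])].add(row["who_anc_code"])
    # keep only target codes reached from more than one source code
    return {key: sorted(codes) for key, codes in sources.items() if len(codes) > 1}


# Three rows copied from the output above (excerpt, not the full table)
excerpt = [
    {"target": "ICD10", "code": "A53.9", "who_anc_code": "ANC.B9.DE111"},
    {"target": "ICD10", "code": "A53.9", "who_anc_code": "ANC.B9.DE108"},
    {"target": "SNOMEDCT", "code": "91175000", "who_anc_code": "ANC.B6.DE41"},
]

duplicates = find_many_to_one(excerpt)
print(duplicates)
```

In this excerpt only the ICD-10 code `A53.9` is reported, because the second WHO ANC code mapping to `91175000` is not part of the three sample rows.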