# Accessing Patient Data for CDS Systems

In this Jupyter notebook, we'll explore how to retrieve several types of patient data and make them usable by a clinical decision support (CDS) tool. We'll simulate accessing data from two services:
- An electronic health record (EHR) that hosts a FHIR server
- An imaging service that uses its own standard and format to share image data

From these services, we will retrieve:
- **Basic patient data**, such as name and date of birth.
- **Patient conditions**, which provide context for clinical decision-making.
- **Patient medication information** to ensure safe treatment plans.
- **Relevant patient observations**, specifically hemoglobin A1c levels.
- **Eye imagery data**.

We'll convert the data into [pandas](https://pandas.pydata.org/) [DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), which we'll send to the mock CDS tool.

## Basic setup

Here is the basic setup code for our Python work. We import [pandas](https://pandas.pydata.org/), add the locations of our servers, and add the ID of the synthetic patient whose data we'll access.

In [None]:
import pandas as pd

FHIR_SERVER = "http://hapi.fhir.org/baseR4"
PATIENT_ID = "685a8c8c-40e8-d40c-477f-2317c5ab7a15"

## Accessing the patient data

In this demo, we'll access the following information from a FHIR server:
- Basic patient data
- Patient conditions
- Patient medication information
- Relevant patient observations
- Metadata for the eye imagery



### Basic Patient Data

We'll demonstrate two ways to retrieve data from the FHIR server:

1. Using the [SMART FHIR Client](https://github.com/smart-on-fhir/client-py) to retrieve the FHIR resources and [fhirpath.py](https://github.com/beda-software/fhirpath-py) to extract information from the resources.
2. Using [FHIR-PYrate](https://github.com/UMEssen/FHIR-PYrate) to both retrieve the FHIR resources and extract information into pandas DataFrames.

For the rest of the data, we will use [FHIR-PYrate](https://github.com/UMEssen/FHIR-PYrate).

#### 1. Using the [SMART FHIR Client](https://github.com/smart-on-fhir/client-py) and [fhirpath.py](https://github.com/beda-software/fhirpath-py)

First, we set up the FHIR Client to query the FHIR Server. **Note**: Because this is a demo FHIR server, we don't need to handle authentication. See the [FHIR Client documentation](https://github.com/smart-on-fhir/client-py) for how to connect to a protected server.

In [None]:
from fhirclient import client

settings = {
    'app_id': 'my_web_app',
    'api_base': FHIR_SERVER
}
smart = client.FHIRClient(settings=settings)


Next, we query the server for the patient data.

In [None]:
from fhirclient.models.patient import Patient

search = Patient.where({"identifier":PATIENT_ID})
patients = search.perform_resources(smart.server) # returns list of length=1
patient_obj = patients[0].as_json()

patient_obj

Then, we extract the desired information using [fhirpath.py](https://github.com/beda-software/fhirpath-py), a Python library for evaluating [FHIRPath](https://build.fhir.org/fhirpath.html) expressions. FHIRPath is an expression language for navigating a FHIR resource and extracting data from it.

In [None]:
import fhirpathpy

patient_info = {
    # evaluate() returns a list of matches, so single values are taken with [0]
    'given_name': fhirpathpy.evaluate(patient_obj, "Patient.name.where(use='official').given"),
    'family_name': fhirpathpy.evaluate(patient_obj, "Patient.name.where(use='official').family"),
    'birth_date': fhirpathpy.evaluate(patient_obj, "Patient.birthDate")[0],
    'ehr_id': fhirpathpy.evaluate(patient_obj, "Patient.identifier.where(type.coding.system = 'http://terminology.hl7.org/CodeSystem/v2-0203' and type.coding.code = 'MR').value")[0]
}

patient_info


Because `patient_obj` is a plain Python dictionary, you can also read values from it directly by key.

In [None]:
patient_info['fhir_id'] = patient_obj['id']

patient_info
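As a point of comparison, the same fields can be pulled out with plain dictionary and list operations. The sketch below runs on a small mock Patient (all values are made up) and uses two hypothetical helpers, `official_names` and `medical_record_numbers`. It only approximates the FHIRPath expressions used earlier and is not how fhirpath.py evaluates them.

```python
# Mock Patient resource shaped like FHIR R4 JSON; every value here is made up.
V2_0203 = "http://terminology.hl7.org/CodeSystem/v2-0203"

mock_patient = {
    "resourceType": "Patient",
    "id": "example-id",
    "name": [
        {"use": "official", "family": "Doe", "given": ["Jane", "Q"]},
        {"use": "nickname", "given": ["JJ"]},
    ],
    "birthDate": "1970-01-01",
    "identifier": [
        {"system": "urn:example:uuid", "value": "abc-123"},
        {
            "type": {"coding": [{"system": V2_0203, "code": "MR"}]},
            "value": "MRN-0001",
        },
    ],
}


def official_names(patient):
    """Return the name entries whose `use` is 'official' (hypothetical helper)."""
    return [n for n in patient.get("name", []) if n.get("use") == "official"]


def medical_record_numbers(patient):
    """Return values of identifiers typed as MR in the v2-0203 code system."""
    values = []
    for identifier in patient.get("identifier", []):
        codings = identifier.get("type", {}).get("coding", [])
        if any(c.get("system") == V2_0203 and c.get("code") == "MR" for c in codings):
            values.append(identifier.get("value"))
    return values


given = [g for n in official_names(mock_patient) for g in n.get("given", [])]
family = [n["family"] for n in official_names(mock_patient) if "family" in n]
mrns = medical_record_numbers(mock_patient)
print(given, family, mrns)
```

The FHIRPath version is shorter and copes with missing elements for you, which is why the rest of this notebook uses it.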

To load the extracted patient information into a pandas DataFrame, pass a list of dictionaries to the `pd.DataFrame` constructor. Each dictionary becomes one row.

In [None]:
df_patient = pd.DataFrame([patient_info])
df_patient

It can sometimes be useful to see what URL is being requested:

In [None]:
f'{FHIR_SERVER}/{search.construct()}'

You can open this URL in your browser, or use a tool like `cURL` to request it directly.
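To see how such a URL is put together, here is a small sketch that builds a FHIR search URL with the standard library. `build_search_url` is a hypothetical helper, not part of either client library; `urlencode` takes care of escaping the parameter values.

```python
from urllib.parse import urlencode


def build_search_url(base_url, resource_type, params):
    """Build `[base]/[type]?[query]` for a FHIR search (hypothetical helper)."""
    query = urlencode(params)  # percent-encodes values where needed
    return f"{base_url.rstrip('/')}/{resource_type}?{query}"


url = build_search_url(
    "http://hapi.fhir.org/baseR4",
    "Patient",
    {"identifier": "685a8c8c-40e8-d40c-477f-2317c5ab7a15"},
)
print(url)
```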

There is a lot more data in this Patient resource instance than we are viewing in this notebook. It may be helpful to review the FHIR documentation: <https://www.hl7.org/fhir/R4/patient.html>.

(Note: we use FHIR R4 because that is the version the synthetic data supports. FHIR R5 has been released, so a web search for "FHIR Patient resource documentation" will likely return R5 pages by default.)

#### 2. Using [FHIR-PYrate](https://github.com/UMEssen/FHIR-PYrate)

[FHIR-PYrate](https://github.com/UMEssen/FHIR-PYrate) lets us import FHIR resources into [pandas](https://pandas.pydata.org/) [DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html). Because DataFrames are a common format for data analysis in Python, the CDS tool will likely use them.

First, we set up the FHIR-PYrate client to query the FHIR server.

Because we're using a demo FHIR server, we don't need to authenticate, so we set `auth=None`. See the [FHIR-PYrate documentation](https://github.com/UMEssen/FHIR-PYrate) for how to authenticate to a FHIR server.

In [None]:
from fhir_pyrate import Pirate

search = Pirate(
    auth=None, # the demo fhir server does not require authentication
    base_url=FHIR_SERVER,
)

`Pirate.steal_bundles` returns a FHIR bundle generator, which you can convert into a DataFrame.

We need to specify the [FHIR resource](https://build.fhir.org/resourcelist.html) type (`resource_type`) and the [FHIR search parameters](https://www.hl7.org/fhir/search.html) (`request_params`).

In [None]:
patient_bundle = search.steal_bundles(
    resource_type="Patient",
    request_params={"identifier":PATIENT_ID}
)
print(patient_bundle)

Now we can convert the bundle generator into a DataFrame using `Pirate.bundles_to_dataframe`.

We pass the bundle generator we retrieved (`bundles`) and a list of pairs (`fhir_paths`). Each pair holds a column name and the [FHIRPath](https://build.fhir.org/fhirpath.html) expression for the information to put in that column.

In [None]:
df_patient = search.bundles_to_dataframe(
    bundles=patient_bundle,
    fhir_paths=[
        ("given_name", "Patient.name.where(use = 'official').given"),
        ("family_name", "Patient.name.where(use = 'official').family"),
        ("birth_date", "Patient.birthDate"),
        ("fhir_id", "Patient.id"),
        ("ehr_id", "Patient.identifier.where(type.coding.system = 'http://terminology.hl7.org/CodeSystem/v2-0203' and type.coding.code = 'MR').value")
    ]
)

df_patient
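Conceptually, `fhir_paths` maps each column name to an expression that is evaluated against every resource in every bundle. The plain-Python sketch below illustrates that idea on mock bundles, with ordinary functions standing in for FHIRPath expressions. The helper `bundles_to_rows` and all data are hypothetical; this is not FHIR-PYrate's implementation.

```python
# Two mock search-result bundles (made-up data), as a server might page them.
mock_bundles = [
    {"resourceType": "Bundle", "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1", "birthDate": "1970-01-01"}},
        {"resource": {"resourceType": "Patient", "id": "p2", "birthDate": "1985-06-15"}},
    ]},
    {"resourceType": "Bundle", "entry": [
        {"resource": {"resourceType": "Patient", "id": "p3"}},  # no birthDate
    ]},
]

# (column name, extractor) pairs; these play the role of (name, FHIRPath) pairs.
columns = [
    ("fhir_id", lambda resource: resource.get("id")),
    ("birth_date", lambda resource: resource.get("birthDate")),
]


def bundles_to_rows(bundles, columns):
    """Yield one dict per resource, with one key per requested column."""
    for bundle in bundles:
        for entry in bundle.get("entry", []):
            resource = entry["resource"]
            yield {name: extract(resource) for name, extract in columns}


rows = list(bundles_to_rows(mock_bundles, columns))
print(rows)
```

A list of dictionaries like `rows` could then be handed to `pd.DataFrame(rows)` to get one row per resource.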

##### A quicker way

FHIR-PYrate provides a shortcut for querying FHIR resources and adding them to DataFrames: `Pirate.steal_bundles_to_dataframe`. This method combines `Pirate.steal_bundles` and `Pirate.bundles_to_dataframe`.

In [None]:
df_patient = search.steal_bundles_to_dataframe(
    resource_type="Patient",
    request_params={"identifier":PATIENT_ID},
    fhir_paths=[
        ("given_name", "Patient.name.where(use = 'official').given"),
        ("family_name", "Patient.name.where(use = 'official').family"),
        ("birth_date", "Patient.birthDate"),
        ("fhir_id", "Patient.id"),
        ("ehr_id", "Patient.identifier.where(type.coding.system = 'http://terminology.hl7.org/CodeSystem/v2-0203' and type.coding.code = 'MR').value")
    ]
)
df_patient

We'll use `Pirate.steal_bundles_to_dataframe` for the rest of the FHIR resource queries.

##### Elements with multiple sub-values

FHIR elements often repeat, and you'll frequently want to extract every occurrence of the same kind of data from a resource.

For example, you may want to extract each identifier of a patient. The demo patient we are using has several identifiers.

In [None]:
df = search.steal_bundles_to_dataframe(
    resource_type='Patient',
    request_params={"identifier":PATIENT_ID},
    fhir_paths=[
        ("id", "identifier[0].value"),
        ("identifiers", "identifier.value"),
    ])

df

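Each row of the `identifiers` column holds a list. To spread that list into separate columns, do the following:

In [None]:
# pop() removes 'identifiers' from df in place, so re-run the query cell before re-running this one
df.join(
    pd.DataFrame(
        df.pop('identifiers').values.tolist()  # one list per row -> one column per list position
    ).add_prefix('identifier_'))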
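For readers less familiar with pandas, the reshaping above is roughly equivalent to the following plain-Python sketch, run on made-up rows. `spread_list_column` is a hypothetical helper; shorter lists are padded with `None`, similar to how pandas fills missing positions.

```python
def spread_list_column(rows, column, prefix):
    """Replace `column` (a list per row) with prefix0, prefix1, ... keys."""
    width = max((len(row[column]) for row in rows), default=0)
    spread = []
    for row in rows:
        values = row[column]
        # Copy every other key unchanged, then add one key per list position.
        new_row = {key: value for key, value in row.items() if key != column}
        for i in range(width):
            new_row[f"{prefix}{i}"] = values[i] if i < len(values) else None
        spread.append(new_row)
    return spread


# Made-up rows: one patient with three identifiers, one with a single identifier.
rows = [
    {"id": "a-1", "identifiers": ["a-1", "a-2", "a-3"]},
    {"id": "b-1", "identifiers": ["b-1"]},
]
wide = spread_list_column(rows, "identifiers", "identifier_")
print(wide)
```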

### Conditions and Medications

We'll repeat the process of using `steal_bundles_to_dataframe` for the patient's conditions and medications.

We've added two parameters to our query:
- The [`_sort` FHIR search parameter](https://www.hl7.org/fhir/search.html#sort) sorts the results. Use `-` for descending order. The `_sort` value must be one of the search parameters specified in the FHIR server's [Capability Statement](https://build.fhir.org/capabilitystatement.html).
- `num_pages` limits the number of pages of resources to return.
  - *To limit response sizes, servers may split a response into "pages". Many FHIR Servers support paging, and limit each "page" to 20 resources. You can specify the number of resources to include in a page with the `_count` parameter.*
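Paging can be pictured as follows: each search response is a Bundle, and when more results exist the Bundle carries a `link` entry with `relation` set to `next`. The sketch below follows those links over a mock server. `iter_pages`, `fetch_mock`, and the URLs are hypothetical, and `max_pages` only imitates the idea behind `num_pages`; it is not FHIR-PYrate's implementation.

```python
def next_page_url(bundle):
    """Return the URL of the Bundle's 'next' link, or None on the last page."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


def iter_pages(first_url, fetch, max_pages=None):
    """Yield Bundles by following 'next' links, stopping after max_pages if set."""
    url = first_url
    pages_seen = 0
    while url is not None and (max_pages is None or pages_seen < max_pages):
        bundle = fetch(url)
        yield bundle
        pages_seen += 1
        url = next_page_url(bundle)


# Mock server: maps a URL to the Bundle it would return (made-up URLs and data).
PAGE_1 = "https://example.org/fhir/Condition?_count=2"
PAGE_2 = "https://example.org/fhir/Condition?_count=2&page=2"
mock_server = {
    PAGE_1: {
        "resourceType": "Bundle",
        "entry": [{"resource": {"id": "c1"}}, {"resource": {"id": "c2"}}],
        "link": [{"relation": "self", "url": PAGE_1}, {"relation": "next", "url": PAGE_2}],
    },
    PAGE_2: {
        "resourceType": "Bundle",
        "entry": [{"resource": {"id": "c3"}}],
        "link": [{"relation": "self", "url": PAGE_2}],
    },
}


def fetch_mock(url):
    """Stand-in for an HTTP GET that returns parsed JSON."""
    return mock_server[url]


def resource_ids(pages):
    """Collect the resource IDs from every entry of every page."""
    return [entry["resource"]["id"] for bundle in pages for entry in bundle.get("entry", [])]


all_ids = resource_ids(iter_pages(PAGE_1, fetch_mock))
first_page_ids = resource_ids(iter_pages(PAGE_1, fetch_mock, max_pages=1))
print(all_ids, first_page_ids)
```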

We also use the internal FHIR ID for the patient (`PATIENT_FHIR_ID`) to cross-reference other FHIR resources.

In [None]:
PATIENT_FHIR_ID = df_patient.at[0, 'fhir_id']

df_conditions = search.steal_bundles_to_dataframe(
    resource_type="Condition",
    request_params={
        "subject":PATIENT_FHIR_ID,
        "_sort":"-onset-date"
    },
    fhir_paths=[
        ("coding_system", "Condition.code.coding.system"),
        ("coding_code", "Condition.code.coding.code"),
        ("coding_display", "Condition.code.coding.display"),
        ("id", "Condition.id"),
        ("date", "Condition.onsetDateTime")
    ],
    num_pages=1
)

df_conditions

In [None]:
df_medications = search.steal_bundles_to_dataframe(
    resource_type="MedicationRequest",
    request_params={
        "subject":PATIENT_FHIR_ID,
        "_sort":"-authoredon"
    },
    fhir_paths=[
        ("coding_system", "MedicationRequest.medicationCodeableConcept.coding.system"),
        ("coding_code", "MedicationRequest.medicationCodeableConcept.coding.code"),
        ("coding_display", "MedicationRequest.medicationCodeableConcept.coding.display"),
        ("id", "MedicationRequest.id"),
        ("date", "MedicationRequest.authoredOn"),
    ],
    num_pages=1
)

df_medications

### Observations

Next, we retrieve the observations for the patient. We are only interested in the "Hemoglobin A1c/Hemoglobin.total in Blood" (LOINC code: 4548-4) observations, so we add that restriction to the request parameters.

In [None]:
df_observations = search.steal_bundles_to_dataframe(
    resource_type="Observation",
    request_params={
        "subject":PATIENT_FHIR_ID,
        "_sort":"-date",
        "code":"4548-4"
    },
    fhir_paths=[
        ("coding_system", "Observation.code.coding.system"),
        ("coding_code", "Observation.code.coding.code"),
        ("coding_display", "Observation.code.coding.display"),
        ("id", "Observation.id"),
        ("date", "Observation.effectiveDateTime"),
        ("value", "Observation.valueQuantity.value")
    ],
    num_pages=1
)

df_observations
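The `code` parameter above matches on the code value alone. FHIR token search parameters also accept the form `system|code`, which restricts the match to one code system. A minimal sketch of building such a query string follows; the `token` helper and the patient ID are hypothetical, and whether the stricter form changes the results depends on the data the server holds.

```python
from urllib.parse import urlencode

LOINC = "http://loinc.org"


def token(code, system=None):
    """Format a FHIR token search value, optionally qualified by a code system."""
    return f"{system}|{code}" if system else code


params = {
    "subject": "example-patient-id",  # hypothetical FHIR ID
    "_sort": "-date",                 # newest observations first
    "code": token("4548-4", LOINC),   # hemoglobin A1c, restricted to LOINC
}
query = urlencode(params)
print(f"Observation?{query}")
```

The same `params` dictionary, with a real patient ID, has the shape expected by `request_params`.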

## Challenges

You can try to complete the challenges below.

To see a solution for each challenge, un-comment the line with the `%load` magic and run that cell. It will pull in the solution from a separate `.py` file.

In [None]:
# Load magic example
# %load ./snippets/challenge_example.py

### Challenge 1: Loading All Procedures for a Patient

Try loading all Procedure resources for our patient into a DataFrame. Identify some key data elements to include, such as the procedure's code, its description, and the date it was performed.

In [None]:
# Experiment here

In [None]:
# Un-comment the line below and run the cell to show the solution.
# %load ./snippets/challenge_1.py

### Challenge 2: FHIR search API

Use the FHIR search API (<https://www.hl7.org/fhir/http.html#search>) to get all the patients who have had visual acuity exams.

Before you start writing code, answer the following:

1. What [FHIR resource](http://hl7.org/fhir/R4/resourcelist.html) represents a visual acuity exam?
2. What data element in that resource identifies it as a visual acuity exam?
3. What code system and code indicate a visual acuity exam?

In [None]:
# Un-comment the line below and run the cell to show the answers to these questions.
# %load ./snippets/challenge_2.txt

Now, construct a FHIR search to find these patients.

Hint: see [reverse chaining](https://hl7.org/fhir/search.html#chaining).

In [None]:
# Experiment here

In [None]:
# Un-comment the line below and run the cell to show the solution.
# %load ./snippets/challenge_2.py

### Challenge 3: Search another FHIR server for the same patient

See if a patient you found in Challenge 2 exists on another FHIR server.

Before you start writing code, answer the following:

1. What data elements in [the Patient resource](http://hl7.org/fhir/R4/patient.html) can be used to identify a patient?
2. What FHIR search strategy can be used to query the second FHIR server for this information?

In [None]:
# Un-comment the line below and run the cell to show the answers to these questions.
# %load ./snippets/challenge_3.txt

Use `https://r4.smarthealthit.org` for the second FHIR server.

In [None]:
# Experiment here

In [None]:
# Un-comment the line below and run the cell to show the solution.
# %load ./snippets/challenge_3.py

### Challenge 4: Retrieve a different resource (no solution)

Access a FHIR server's [CapabilityStatement](https://www.hl7.org/fhir/capabilitystatement.html) and look for the `"profile"` keys to see which profiles are available.

Pick one of these for a resource we haven't looked at yet, and retrieve resource instances for that profile.
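A server's CapabilityStatement is available from its `metadata` endpoint (`[base]/metadata`). In FHIR R4, each entry under `rest.resource` may carry a `profile` and a list of `supportedProfile` values. The sketch below collects them per resource type from a mock CapabilityStatement; the profile URLs and the `profiles_by_type` helper are hypothetical.

```python
# Mock CapabilityStatement fragment (made-up profile URLs), shaped like FHIR R4 JSON.
mock_capability = {
    "resourceType": "CapabilityStatement",
    "rest": [{
        "mode": "server",
        "resource": [
            {
                "type": "Patient",
                "profile": "http://example.org/fhir/StructureDefinition/example-patient",
            },
            {
                "type": "Observation",
                "profile": "http://hl7.org/fhir/StructureDefinition/Observation",
                "supportedProfile": [
                    "http://example.org/fhir/StructureDefinition/example-lab",
                ],
            },
            {"type": "Procedure"},  # no profile declared
        ],
    }],
}


def profiles_by_type(capability):
    """Map each resource type to the profiles the server declares for it."""
    result = {}
    for rest in capability.get("rest", []):
        for resource in rest.get("resource", []):
            found = []
            if "profile" in resource:
                found.append(resource["profile"])
            found.extend(resource.get("supportedProfile", []))
            if found:
                result.setdefault(resource["type"], []).extend(found)
    return result


profiles = profiles_by_type(mock_capability)
for resource_type, urls in profiles.items():
    print(resource_type, urls)
```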