Let us consider a third problem: we want to find a set of patients with a specific condition,
and then find some examinations belonging to these patients. As a practical example, let us take
all patients who have a history of seizures and find out their blood pressure.

The process:
1. Get all Conditions that have the SNOMED code for seizures and store the Patients.
2. Drop the Patient duplicates.
3. Get all Observations for these Patients that have LOINC code for blood pressure.
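
Before querying a server, the three steps can be sketched on a few hand-written records. This is only an illustration under stated assumptions: the dictionaries below are toy stand-ins for FHIR resources (not real data), their keys are simplified, and the non-matching codes are arbitrary placeholders.

```python
# Toy records standing in for Condition and Observation resources (made up for illustration).
conditions = [
    {"id": "c1", "code": "84757009", "patient": "Patient/1"},
    {"id": "c2", "code": "84757009", "patient": "Patient/2"},
    {"id": "c3", "code": "84757009", "patient": "Patient/1"},  # same patient again
    {"id": "c4", "code": "other-condition", "patient": "Patient/3"},  # placeholder code
]
observations = [
    {"id": "o1", "code": "55284-4", "subject": "Patient/1"},
    {"id": "o2", "code": "other-observation", "subject": "Patient/1"},  # placeholder code
    {"id": "o3", "code": "55284-4", "subject": "Patient/3"},  # patient not in the cohort
    {"id": "o4", "code": "55284-4", "subject": "Patient/2"},
]

SEIZURE_CODE = "84757009"  # SNOMED code used in this example
BLOOD_PRESSURE_CODE = "55284-4"  # LOINC code used in this example

# Step 1: collect the patients referenced by seizure Conditions.
seizure_patients = [c["patient"] for c in conditions if c["code"] == SEIZURE_CODE]

# Step 2: drop duplicates while keeping the first-seen order.
unique_patients = list(dict.fromkeys(seizure_patients))

# Step 3: keep blood pressure Observations that belong to those patients.
cohort = set(unique_patients)
bp_observation_ids = [
    o["id"]
    for o in observations
    if o["code"] == BLOOD_PRESSURE_CODE and o["subject"] in cohort
]

print(unique_patients)
print(bp_observation_ids)
```

The rest of this example performs the same steps against a FHIR server, with DataFrames taking the place of the lists.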

First, we initialize FHIR-PYrate.

In [3]:
from fhir_pyrate import Pirate
from fhir_pyrate.util import FHIRObj
from typing import List, Dict

search = Pirate(
    auth=None,
    base_url="http://hapi.fhir.org/baseDstu2",
    print_request_url=False,
    num_processes=1,
)

To find all patients with seizures, we need to look at the Condition resource.

Remember to install [fhirpath-py](https://github.com/beda-software/fhirpath-py) to use the
`fhir_paths` parameter.

In [4]:
condition_df = search.query_to_dataframe(  # Wrapper function
    # search.sail_through_search_space can also be used when there is a lot of data,
    # but it requires a time frame for the query.
    bundles_function=search.steal_bundles,
    resource_type="Condition",
    request_params={
        "_count": 100,
        "code": "http://snomed.info/sct%7C84757009",  # Code for seizures
        "_sort": "_id",
    },
    fhir_paths=["id", "patient.reference", "verificationStatus"],
)
condition_df

Query: 1it [00:00, 3998.38it/s]


Unnamed: 0,id,patient.reference,verificationStatus
0,1839,Patient/1834,confirmed
1,14316,Patient/14311,confirmed
2,34354,Patient/34346,confirmed
3,43629,Patient/43625,confirmed
4,46711,Patient/46706,confirmed
5,49356,Patient/49351,confirmed
6,57176,Patient/57171,confirmed
7,62556,Patient/62550,confirmed
8,65046,Patient/64991,confirmed
9,69808,Patient/69804,confirmed
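
The `code` value in the query above is a FHIR token of the form `system|code`, with the pipe character percent-encoded as `%7C`. A small helper can build such values; `fhir_token` is a hypothetical name introduced here for illustration and is not part of FHIR-PYrate.

```python
from urllib.parse import quote, unquote


def fhir_token(system: str, code: str) -> str:
    """Join a code system and a code as `system|code`, percent-encoding the pipe."""
    return f"{system}{quote('|')}{code}"


snomed_seizure = fhir_token("http://snomed.info/sct", "84757009")
loinc_blood_pressure = fhir_token("http://loinc.org", "55284-4")

print(snomed_seizure)
print(unquote(snomed_seizure))  # the human-readable form with a literal pipe
```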


This query could also be run in parallel with the `search.sail_through_search_space` function, which splits the desired period into smaller time frames and runs one query for each of them. In this case, this is not needed.
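
The idea of splitting a period into smaller time frames can be sketched as follows. This is not the library's implementation, only an illustration of the partitioning step; `split_period` and its parameters are names assumed for this sketch.

```python
from datetime import datetime
from typing import List, Tuple


def split_period(
    start: datetime, end: datetime, parts: int
) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into `parts` consecutive, equally long time frames."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    step = (end - start) / parts
    # Interior edges are computed from the start; the last edge is exactly `end`.
    edges = [start + i * step for i in range(parts)] + [end]
    return list(zip(edges[:-1], edges[1:]))


frames = split_period(datetime(2020, 1, 1), datetime(2020, 1, 5), parts=4)
for frame_start, frame_end in frames:
    print(frame_start.date(), "to", frame_end.date())
```

Each time frame could then be queried independently, which is what makes a parallel run possible.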

Now we get the patients, and we make sure that there are no duplicates.

In [5]:
patient_df = condition_df["patient.reference"].drop_duplicates(keep="first").to_frame()
len(patient_df)

31

Now we have our patients, and we need to get their blood pressure Observations, and decide which
fields are relevant for us.

In [6]:
observation_df = search.trade_rows_for_dataframe(
    df=patient_df,
    resource_type="Observation",
    request_params={
        "_count": 100,
        "code": "http://loinc.org%7C55284-4",  # Blood pressure code
        "_sort": "_id",
    },
    df_constraints={"subject": "patient.reference"},
    fhir_paths=[
        "id",
        "effectiveDateTime",
        "component.code.coding.display",
        "component.valueQuantity.value",
        "component.valueQuantity.unit",
    ],
)
observation_df

Query & Build DF: 100%|██████████| 31/31 [00:05<00:00,  5.24it/s]


Unnamed: 0,id,effectiveDateTime,component.code.coding.display,component.valueQuantity.value,component.valueQuantity.unit,patient.reference
0,1860,2008-07-10T10:38:49-04:00,"[Systolic Blood Pressure, Diastolic Blood Pres...","[111, 72]","[mmHg, mmHg]",Patient/1834
1,1879,2009-04-17T10:51:37-04:00,"[Systolic Blood Pressure, Diastolic Blood Pres...","[120, 81]","[mmHg, mmHg]",Patient/1834
2,1904,2010-05-04T11:15:19-04:00,"[Systolic Blood Pressure, Diastolic Blood Pres...","[102, 85]","[mmHg, mmHg]",Patient/1834
3,1929,2011-04-08T07:41:18-04:00,"[Systolic Blood Pressure, Diastolic Blood Pres...","[130, 77]","[mmHg, mmHg]",Patient/1834
4,1947,2012-02-17T17:43:51-05:00,"[Systolic Blood Pressure, Diastolic Blood Pres...","[127, 74]","[mmHg, mmHg]",Patient/1834
...,...,...,...,...,...,...
115,145436,2012-12-11T03:13:58-05:00,"[Systolic Blood Pressure, Diastolic Blood Pres...","[124, 87]","[mmHg, mmHg]",Patient/145351
116,145444,2014-01-26T21:33:05-05:00,"[Systolic Blood Pressure, Diastolic Blood Pres...","[119, 82]","[mmHg, mmHg]",Patient/145351
117,145454,2015-01-16T07:30:36-05:00,"[Systolic Blood Pressure, Diastolic Blood Pres...","[107, 75]","[mmHg, mmHg]",Patient/145351
118,145468,2016-02-02T03:33:14-05:00,"[Systolic Blood Pressure, Diastolic Blood Pres...","[112, 81]","[mmHg, mmHg]",Patient/145351
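
Conceptually, `df_constraints={"subject": "patient.reference"}` says that each row's `patient.reference` value becomes the `subject` parameter of a request. The sketch below shows that mapping on plain dictionaries; it is an assumption about the general idea, not the library's internals, and `build_row_params` is a hypothetical helper.

```python
from typing import Any, Dict, List


def build_row_params(
    rows: List[Dict[str, Any]],
    base_params: Dict[str, Any],
    constraints: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Return one parameter dict per row: shared parameters plus the row's constrained values."""
    return [
        {**base_params, **{param: row[column] for param, column in constraints.items()}}
        for row in rows
    ]


patient_rows = [
    {"patient.reference": "Patient/1834"},
    {"patient.reference": "Patient/14311"},
]
params = build_row_params(
    patient_rows,
    base_params={"_count": 100, "code": "http://loinc.org%7C55284-4"},
    constraints={"subject": "patient.reference"},
)
for p in params:
    print(p)
```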


As you can see, the DataFrame now contains all the information we need, but it requires a bit of
post-processing to be easier to read. We have another option though, which is to use
processing functions instead of FHIR paths.

We can first use `explode` to get the values in single rows.

In [7]:
observation_df.explode(
    [
        "component.code.coding.display",
        "component.valueQuantity.value",
        "component.valueQuantity.unit",
    ]
)

Unnamed: 0,id,effectiveDateTime,component.code.coding.display,component.valueQuantity.value,component.valueQuantity.unit,patient.reference
0,1860,2008-07-10T10:38:49-04:00,Systolic Blood Pressure,111,mmHg,Patient/1834
0,1860,2008-07-10T10:38:49-04:00,Diastolic Blood Pressure,72,mmHg,Patient/1834
1,1879,2009-04-17T10:51:37-04:00,Systolic Blood Pressure,120,mmHg,Patient/1834
1,1879,2009-04-17T10:51:37-04:00,Diastolic Blood Pressure,81,mmHg,Patient/1834
2,1904,2010-05-04T11:15:19-04:00,Systolic Blood Pressure,102,mmHg,Patient/1834
...,...,...,...,...,...,...
117,145454,2015-01-16T07:30:36-05:00,Diastolic Blood Pressure,75,mmHg,Patient/145351
118,145468,2016-02-02T03:33:14-05:00,Systolic Blood Pressure,112,mmHg,Patient/145351
118,145468,2016-02-02T03:33:14-05:00,Diastolic Blood Pressure,81,mmHg,Patient/145351
119,145479,2017-02-15T23:19:16-05:00,Systolic Blood Pressure,122,mmHg,Patient/145351
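
What `explode` does to a single row can be shown in plain Python: list columns of equal length are unpacked position by position, and the remaining fields are repeated. This is a simplified sketch rather than the pandas implementation; `explode_record` is a name chosen for illustration, and the sample record reuses the values of observation 1860 shown above.

```python
from typing import Any, Dict, List


def explode_record(record: Dict[str, Any], list_columns: List[str]) -> List[Dict[str, Any]]:
    """Turn one record with equally long list columns into one record per list position."""
    lengths = {len(record[column]) for column in list_columns}
    if len(lengths) != 1:
        raise ValueError("list columns must have the same length")
    return [
        {**record, **{column: record[column][i] for column in list_columns}}
        for i in range(lengths.pop())
    ]


row = {
    "id": "1860",
    "component.code.coding.display": ["Systolic Blood Pressure", "Diastolic Blood Pressure"],
    "component.valueQuantity.value": [111, 72],
    "component.valueQuantity.unit": ["mmHg", "mmHg"],
}
exploded = explode_record(row, [key for key in row if key != "id"])
for item in exploded:
    print(item)
```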


Or we could build a processing function, which gives us a nicer naming scheme: one row per
observation, with a separate column for each component value and its unit.

In [8]:
def get_observation_info(bundle: FHIRObj) -> List[Dict]:
    records = []
    for entry in bundle.entry or []:
        resource = entry.resource
        # Store the ID
        base_dict = {"observation_id": resource.id}
        for component in resource.component or []:
            # Go through the code.codings of the current components to get a name for our value
            # and store the display value
            resource_name = next(
                (coding.display for coding in component.code.coding or []), None
            )
            # Skip components that have no display name or are not a valueQuantity
            if resource_name is not None and component.valueQuantity is not None:
                base_dict[resource_name] = component.valueQuantity.value
                base_dict[resource_name + " Unit"] = component.valueQuantity.unit
        records.append(base_dict)
    return records


observation_df = search.trade_rows_for_dataframe(
    df=patient_df,
    resource_type="Observation",
    request_params={
        "_count": 100,
        "code": "http://loinc.org%7C55284-4",
        "_sort": "_id",
    },
    df_constraints={"subject": "patient.reference"},
    process_function=get_observation_info,  # Use processing function instead of FHIRPath
)
observation_df

Query & Build DF: 100%|██████████| 31/31 [00:05<00:00,  5.94it/s]


Unnamed: 0,observation_id,Systolic Blood Pressure,Systolic Blood Pressure Unit,Diastolic Blood Pressure,Diastolic Blood Pressure Unit,patient.reference
0,1860,111.0,mmHg,72.0,mmHg,Patient/1834
1,1879,120.0,mmHg,81.0,mmHg,Patient/1834
2,1904,102.0,mmHg,85.0,mmHg,Patient/1834
3,1929,130.0,mmHg,77.0,mmHg,Patient/1834
4,1947,127.0,mmHg,74.0,mmHg,Patient/1834
...,...,...,...,...,...,...
115,145436,124.0,mmHg,87.0,mmHg,Patient/145351
116,145444,119.0,mmHg,82.0,mmHg,Patient/145351
117,145454,107.0,mmHg,75.0,mmHg,Patient/145351
118,145468,112.0,mmHg,81.0,mmHg,Patient/145351


And here they are: our results neatly organized with one row per observation, each linked to its patient!
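
A processing function like `get_observation_info` can also be tried out without a server by feeding it a hand-built bundle. In the sketch below, `Node` is a hypothetical stand-in for `FHIRObj`; it assumes, as the `or []` guards in the function suggest, that missing fields read as `None`. The function is a condensed variant written for this sketch, and the toy bundle is made up except for the values of observation 1860.

```python
from typing import Any, Dict, List


def _wrap(value: Any) -> Any:
    """Wrap nested dicts and lists so that fields can be read as attributes."""
    if isinstance(value, dict):
        return Node(value)
    if isinstance(value, list):
        return [_wrap(item) for item in value]
    return value


class Node:
    """Hypothetical stand-in for FHIRObj: attribute access, None for missing fields."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails, so `_data` itself never lands here.
        if name.startswith("_"):
            raise AttributeError(name)
        return _wrap(self._data.get(name))


def component_values(bundle: Any) -> List[Dict[str, Any]]:
    """Collect one record per Observation, with one column per component value and unit."""
    records = []
    for entry in bundle.entry or []:
        resource = entry.resource
        record = {"observation_id": resource.id}
        for component in resource.component or []:
            name = next((c.display for c in component.code.coding or []), None)
            if name is not None and component.valueQuantity is not None:
                record[name] = component.valueQuantity.value
                record[name + " Unit"] = component.valueQuantity.unit
        records.append(record)
    return records


toy_bundle = Node({"entry": [{"resource": {"id": "1860", "component": [
    {"code": {"coding": [{"display": "Systolic Blood Pressure"}]},
     "valueQuantity": {"value": 111, "unit": "mmHg"}},
    {"code": {"coding": [{"display": "Diastolic Blood Pressure"}]},
     "valueQuantity": {"value": 72, "unit": "mmHg"}},
    {"code": {"coding": [{"display": "Free-text note"}]}},  # no valueQuantity: skipped
]}}]})

rows = component_values(toy_bundle)
print(rows)
```

Checking the function on a small bundle first makes it easier to spot missing fields before running it over every patient.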