Commit

121 improve documentation (#122)
* Add logic to disable multiprocessing if processes=1

* Update examples

* Add explanation about multiple resources output

* Add example for multi dataframe output

---------

Co-authored-by: Giulia Baldini <Giulia.Baldini@uk-essen.de>
giuliabaldini and Giulia Baldini committed Mar 17, 2023
1 parent 004036b commit ad3eefe
Showing 6 changed files with 174 additions and 140 deletions.
70 changes: 65 additions & 5 deletions README.md
@@ -203,10 +203,12 @@ The Pirate functions do one of three things:
| trade_rows_for_dataframe | 3 | Yes | Yes | DataFrame |


**BETA FEATURE**: It is also possible to cache the bundles using the `bundle_caching` parameter,
which specifies a caching folder. This has not yet been tested extensively and does not have any
cache invalidation mechanism.

**CACHING**: It is also possible to cache the bundles using the `cache_folder` parameter.
Caching does not currently work with multiprocessing, but it saves a lot of time if you
need to download a lot of data and repeatedly send the same requests.
You can specify how long the cache should remain valid with the `cache_expiry_time` parameter,
and whether requests should be retried with the `retry_requests` parameter.
There is an example of this in the docstrings of the Pirate class.

A toy request for ImagingStudy:

@@ -396,7 +398,65 @@ parameters specified in `df_constraints` as columns of the final DataFrame.
You can find an example in [Example 3](https://github.com/UMEssen/FHIR-PYrate/blob/main/examples/3-patients-for-condition.ipynb).
Additionally, you can specify the `with_columns` parameter, which can add any columns from the original
DataFrame. The columns can be either specified as a list of columns `[col1, col2, ...]` or as a
list of tuples `[(new_name_for_col1, col1), (new_name_for_col2, col2), ...]`
list of tuples `[(new_name_for_col1, col1), (new_name_for_col2, col2), ...]`.
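
To picture the two accepted forms of `with_columns`, here is a minimal sketch that turns either form into `(new_name, original_column)` pairs. The helper `normalize_with_columns` is hypothetical and written only for illustration; it is not part of FHIR-PYrate and does not claim to mirror the library's internals.

```python
from typing import List, Sequence, Tuple, Union

# A column is given either by name, or as a (new_name, original_name) tuple.
ColumnSpec = Union[str, Tuple[str, str]]


def normalize_with_columns(with_columns: Sequence[ColumnSpec]) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (new_name, original_name) pairs for every entry."""
    pairs: List[Tuple[str, str]] = []
    for entry in with_columns:
        if isinstance(entry, str):
            # A plain column name keeps its name in the output.
            pairs.append((entry, entry))
        else:
            new_name, original = entry
            pairs.append((new_name, original))
    return pairs


plain = normalize_with_columns(["col1", "col2"])
renamed = normalize_with_columns([("patient_id", "col1"), ("study_date", "col2")])
```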

Currently, whenever a column is completely empty (i.e., no resource
has a value for that column), it is removed from the DataFrame.
This ensures that the output DataFrames stay clean when handling multiple resources,
as explained in the following section.
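
The effect of removing completely empty columns can be sketched with plain pandas. This illustrates the behavior described above and is not the library's actual code; the column names and values are made up for the example.

```python
import pandas as pd

# Toy data: "family_first" has no value in any row, "started" is only partly empty.
df = pd.DataFrame(
    {
        "id": ["1", "2"],
        "started": ["2022-12-01", None],
        "family_first": [None, None],
    }
)

# Drop only the columns in which every value is missing.
cleaned = df.dropna(axis=1, how="all")
print(list(cleaned.columns))
```

A column with at least one value, such as `started` here, is kept.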

#### Note on Querying Multiple Resources

Not all FHIR servers allow this (at least not the public ones that we have tried),
but it is also possible to obtain multiple resources with just one query:
```python
search = ...
result_dfs = search.steal_bundles_to_dataframe(
    resource_type="ImagingStudy",
    request_params={
        "_lastUpdated": "ge2022-12",
        "_count": "3",
        "_include": "ImagingStudy:subject",
    },
    fhir_paths=[
        "id",
        "started",
        ("modality", "modality.code"),
        ("procedureCode", "procedureCode.coding.code"),
        (
            "study_instance_uid",
            "identifier.where(system = 'urn:dicom:uid').value.replace('urn:oid:', '')",
        ),
        ("series_instance_uid", "series.uid"),
        ("series_code", "series.modality.code"),
        ("numberOfInstances", "series.numberOfInstances"),
        ("family_first", "name[0].family"),
        ("given_first", "name[0].given"),
    ],
    num_pages=1,
)
```
In this case, a dictionary of DataFrames is returned, where the keys are the resource types.
You can then select an individual DataFrame with `result_dfs["ImagingStudy"]`
or `result_dfs["Patient"]`.
You can find an example of this in [Example 2](https://github.com/UMEssen/FHIR-PYrate/blob/main/examples/2-condition-to-imaging-study.ipynb)
where the `ImagingStudy` resource is queried.
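
As a rough picture of the returned structure, the sketch below builds a dictionary of DataFrames keyed by resource type from a few made-up rows. It uses plain pandas, and the records and their values are assumptions for illustration; it is not how FHIR-PYrate builds its output.

```python
import pandas as pd

# Made-up rows standing in for resources returned by a single query.
records = [
    {"resourceType": "ImagingStudy", "id": "img-1", "started": "2022-12-05", "family_first": None},
    {"resourceType": "Patient", "id": "pat-1", "started": None, "family_first": "Doe"},
]
all_rows = pd.DataFrame(records)

# One DataFrame per resource type, keyed by the resource type name.
result_dfs = {
    resource_type: group.drop(columns="resourceType").reset_index(drop=True)
    for resource_type, group in all_rows.groupby("resourceType")
}

patients = result_dfs["Patient"]
print(patients)
```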

In theory, it would be smarter to specify the resource name in front of the FHIRPaths,
e.g. `ImagingStudy.series.uid` instead of `series.uid`, and to return only the
corresponding attributes for each DataFrame.
However, we do not want to force the user to always specify the resource type, and in the current
version the DataFrames coming from multiple resources have the same columns, because
we cannot tell which resource a path was intended for.
Currently, we solve this by removing all columns that do not have any results.
This means, however, that if you request an attribute for a specific resource and it
is not found, that column will not appear.
In the future, [we plan to do a smarter filtering of the FHIRPaths](https://github.com/UMEssen/FHIR-PYrate/issues/120),
such that only the paths containing the actual resource name are kept
if the resource name is specified in the path,
and a column full of `None`s is returned if no resource type is specified.


### [Miner](https://github.com/UMEssen/FHIR-PYrate/blob/main/fhir_pyrate/miner.py)

18 changes: 9 additions & 9 deletions examples/1-simple-json-to-df.ipynb
@@ -14,7 +14,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 4,
"outputs": [],
"source": [
"from fhir_pyrate import Pirate\n",
@@ -65,28 +65,28 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"http://hapi.fhir.org/baseDstu2/Observation?_id=86092\n"
"http://hapi.fhir.org/baseDstu2/Observation/86092\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Query & Build DF: 100%|██████████| 1/1 [00:00<00:00, 5882.61it/s]\n"
"Query & Build DF (Observation): 100%|██████████| 1/1 [00:00<00:00, 12372.58it/s]\n"
]
},
{
"data": {
"text/plain": " resourceType id meta_versionId meta_lastUpdated status \\\n0 Observation 86092 1 2018-11-19T12:59:31.238+00:00 final \n\n category_coding_0_system category_coding_0_code \\\n0 http://hl7.org/fhir/observation-category vital-signs \n\n code_coding_0_system code_coding_0_code code_coding_0_display code_text \\\n0 http://loinc.org 29463-7 Body Weight Body Weight \n\n subject_reference encounter_reference effectiveDateTime \\\n0 Patient/86079 Encounter/86090 2011-03-10T20:47:29-05:00 \n\n issued valueQuantity_value valueQuantity_unit \\\n0 2011-03-10T20:47:29-05:00 6.079781 kg \n\n valueQuantity_system valueQuantity_code \n0 http://unitsofmeasure.org/ kg ",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>resourceType</th>\n <th>id</th>\n <th>meta_versionId</th>\n <th>meta_lastUpdated</th>\n <th>status</th>\n <th>category_coding_0_system</th>\n <th>category_coding_0_code</th>\n <th>code_coding_0_system</th>\n <th>code_coding_0_code</th>\n <th>code_coding_0_display</th>\n <th>code_text</th>\n <th>subject_reference</th>\n <th>encounter_reference</th>\n <th>effectiveDateTime</th>\n <th>issued</th>\n <th>valueQuantity_value</th>\n <th>valueQuantity_unit</th>\n <th>valueQuantity_system</th>\n <th>valueQuantity_code</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Observation</td>\n <td>86092</td>\n <td>1</td>\n <td>2018-11-19T12:59:31.238+00:00</td>\n <td>final</td>\n <td>http://hl7.org/fhir/observation-category</td>\n <td>vital-signs</td>\n <td>http://loinc.org</td>\n <td>29463-7</td>\n <td>Body Weight</td>\n <td>Body Weight</td>\n <td>Patient/86079</td>\n <td>Encounter/86090</td>\n <td>2011-03-10T20:47:29-05:00</td>\n <td>2011-03-10T20:47:29-05:00</td>\n <td>6.079781</td>\n <td>kg</td>\n <td>http://unitsofmeasure.org/</td>\n <td>kg</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -120,28 +120,28 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"http://hapi.fhir.org/baseDstu2/Observation?_count=1&_id=86092\n"
"http://hapi.fhir.org/baseDstu2/Observation/86092\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Query & Build DF: 100%|██████████| 1/1 [00:00<00:00, 1197.69it/s]\n"
"Query & Build DF (Observation): 100%|██████████| 1/1 [00:00<00:00, 1379.71it/s]\n"
]
},
{
"data": {
"text/plain": " id effectiveDateTime value unit patient\n0 86092 2011-03-10T20:47:29-05:00 6.079781 kg 86079",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>id</th>\n <th>effectiveDateTime</th>\n <th>value</th>\n <th>unit</th>\n <th>patient</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>86092</td>\n <td>2011-03-10T20:47:29-05:00</td>\n <td>6.079781</td>\n <td>kg</td>\n <td>86079</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"execution_count": 5,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}