121 improve documentation #122

Merged (4 commits) on Mar 17, 2023
70 changes: 65 additions & 5 deletions README.md
@@ -203,10 +203,12 @@ The Pirate functions do one of three things:
| trade_rows_for_dataframe | 3 | Yes | Yes | DataFrame |


**BETA FEATURE**: It is also possible to cache the bundles using the `bundle_caching` parameter,
which specifies a caching folder. This has not yet been tested extensively and does not have any
cache invalidation mechanism.

**CACHING**: You can cache the bundles by setting the `cache_folder` parameter.
Caching does not currently work with multiprocessing, but it saves a lot of time if you
need to download a lot of data and always run the same requests.
The `cache_expiry_time` parameter specifies how long the cache stays valid, and the
`retry_requests` parameter specifies whether requests should be retried.
The docstring of the `Pirate` class contains an example of this.

A toy request for ImagingStudy:

@@ -396,7 +398,65 @@ parameters specified in `df_constraints` as columns of the final DataFrame.
You can find an example in [Example 3](https://github.com/UMEssen/FHIR-PYrate/blob/main/examples/3-patients-for-condition.ipynb).
Additionally, you can specify the `with_columns` parameter, which can add any columns from the original
DataFrame. The columns can be either specified as a list of columns `[col1, col2, ...]` or as a
list of tuples `[(new_name_for_col1, col1), (new_name_for_col2, col2), ...]`
list of tuples `[(new_name_for_col1, col1), (new_name_for_col2, col2), ...]`.
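
To make the two accepted forms of `with_columns` concrete, here is a minimal sketch in plain Python.
The helper `normalize_with_columns` and the column names are hypothetical and not part of FHIR-PYrate;
the sketch only shows how a list of column names and a list of `(new_name, original_name)` tuples
could be brought into one common form.

```python
from typing import List, Sequence, Tuple, Union

ColumnSpec = Union[str, Tuple[str, str]]


def normalize_with_columns(
    with_columns: Sequence[ColumnSpec],
) -> List[Tuple[str, str]]:
    """Return (new_name, original_name) pairs for both input forms.

    Hypothetical helper for illustration, not FHIR-PYrate's implementation.
    """
    pairs = []
    for spec in with_columns:
        if isinstance(spec, str):
            # A bare column name keeps its original name.
            pairs.append((spec, spec))
        else:
            new_name, original_name = spec
            pairs.append((new_name, original_name))
    return pairs


# Form 1: a list of columns. Form 2: a list of (new_name, column) tuples.
plain = normalize_with_columns(["patient_id", "started"])
renamed = normalize_with_columns([("pid", "patient_id"), ("study_start", "started")])
```

With the tuple form, the first element is the name the column gets in the final DataFrame and the second is the column of the original DataFrame.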

Currently, whenever a column is completely empty (i.e., no resources
have a corresponding value for that column), it is just removed from the DataFrame.
This is to ensure that we output clean DataFrames when we are handling multiple resources.
More on that in the following section.
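
The effect of removing completely empty columns can be sketched without any dependencies.
The function `drop_empty_columns` and the sample rows below are made up for illustration and
are not taken from FHIR-PYrate; the sketch assumes "empty" means that the value is `None` in every row.

```python
from typing import Any, Dict, List


def drop_empty_columns(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Remove every key whose value is None in all rows."""
    columns = {key for row in rows for key in row}
    non_empty = {
        column
        for column in columns
        if any(row.get(column) is not None for row in rows)
    }
    # Rebuild each row, keeping its original key order.
    return [{k: v for k, v in row.items() if k in non_empty} for row in rows]


rows = [
    {"id": "study-1", "started": "2022-12-01", "family_first": None},
    {"id": "study-2", "started": None, "family_first": None},
]
cleaned = drop_empty_columns(rows)
```

Here `family_first` disappears because no row has a value for it, while `started` stays because one row does.
On a pandas DataFrame, a comparable result could be obtained with `df.dropna(axis=1, how="all")`; whether the library does it this way is not stated here.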

#### Note on Querying Multiple Resources

Not all FHIR servers allow this (at least not the public ones that we have tried),
but it is also possible to obtain multiple resources with just one query:
```python
search = ...
result_dfs = search.steal_bundles_to_dataframe(
    resource_type="ImagingStudy",
    request_params={
        "_lastUpdated": "ge2022-12",
        "_count": "3",
        "_include": "ImagingStudy:subject",
    },
    fhir_paths=[
        "id",
        "started",
        ("modality", "modality.code"),
        ("procedureCode", "procedureCode.coding.code"),
        (
            "study_instance_uid",
            "identifier.where(system = 'urn:dicom:uid').value.replace('urn:oid:', '')",
        ),
        ("series_instance_uid", "series.uid"),
        ("series_code", "series.modality.code"),
        ("numberOfInstances", "series.numberOfInstances"),
        ("family_first", "name[0].family"),
        ("given_first", "name[0].given"),
    ],
    num_pages=1,
)
```
In this case, a dictionary of DataFrames is returned, where the keys are the resource types.
You can then select a single DataFrame with `result_dfs["ImagingStudy"]`
or `result_dfs["Patient"]`.
You can find an example of this in [Example 2](https://github.com/UMEssen/FHIR-PYrate/blob/main/examples/2-condition-to-imaging-study.ipynb)
where the `ImagingStudy` resource is queried.
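
To show why a query with `_include` leads to one result per resource type, the following sketch
groups the entries of a toy bundle by their `resourceType`. The bundle content and the function
`group_by_resource_type` are invented for this example and do not reproduce FHIR-PYrate's internals.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Toy bundle with made-up entries: one ImagingStudy and the Patient
# that an `_include` of "ImagingStudy:subject" would bring along.
bundle = {
    "resourceType": "Bundle",
    "entry": [
        {
            "resource": {
                "resourceType": "ImagingStudy",
                "id": "study-1",
                "started": "2022-12-01",
            }
        },
        {
            "resource": {
                "resourceType": "Patient",
                "id": "patient-1",
                "name": [{"family": "Doe", "given": ["Jane"]}],
            }
        },
    ],
}


def group_by_resource_type(bundle: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Collect the resources of a bundle in one list per resource type."""
    grouped = defaultdict(list)
    for entry in bundle.get("entry", []):
        resource = entry["resource"]
        grouped[resource["resourceType"]].append(resource)
    return dict(grouped)


records = group_by_resource_type(bundle)
```

`records["ImagingStudy"]` and `records["Patient"]` then play the role that `result_dfs["ImagingStudy"]` and `result_dfs["Patient"]` play above, with lists of resources standing in for the DataFrames.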

In theory, it would be smarter to specify the resource name in front of the FHIRPaths,
e.g. `ImagingStudy.series.uid` instead of `series.uid`, and for each DataFrame only return the
corresponding attributes.
However, we do not want to force the user to always specify the resource type. In the current
version, the DataFrames coming from multiple resources therefore have the same columns, because
we cannot tell which resource a FHIRPath was actually intended for.
For now, we solve this by removing all columns that do not have any results.
This means, however, that if you request an attribute for a specific resource and it
is not found, that column will not appear.
In the future, [we plan to do a smarter filtering of the FHIRPaths](https://github.com/UMEssen/FHIR-PYrate/issues/120),
such that only the paths containing
the actual resource name are kept if the resource name is specified in the path,
and that a column full of `None`s is obtained if no resource type is specified.
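
One possible shape of such a filtering step is sketched below. This is purely hypothetical: the
function `paths_for_resource`, the `known_types` set and the example paths are assumptions made
for illustration, and the sketch does not describe how the linked issue is or will be implemented.

```python
from typing import List, Set


def paths_for_resource(
    fhir_paths: List[str], resource_type: str, known_types: Set[str]
) -> List[str]:
    """Keep paths that are unprefixed or prefixed with `resource_type`."""
    kept = []
    for path in fhir_paths:
        prefix = path.split(".", 1)[0]
        if prefix in known_types and prefix != resource_type:
            # The path explicitly addresses another resource type: skip it.
            continue
        kept.append(path)
    return kept


known_types = {"ImagingStudy", "Patient"}
paths = ["id", "ImagingStudy.series.uid", "Patient.name[0].family"]

imaging_paths = paths_for_resource(paths, "ImagingStudy", known_types)
patient_paths = paths_for_resource(paths, "Patient", known_types)
```

Paths without a resource prefix, such as `id`, are kept for every resource type, which matches the idea that they would still produce a column even when it only contains `None`s.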


### [Miner](https://github.com/UMEssen/FHIR-PYrate/blob/main/fhir_pyrate/miner.py)

18 changes: 9 additions & 9 deletions examples/1-simple-json-to-df.ipynb
@@ -14,7 +14,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 4,
"outputs": [],
"source": [
"from fhir_pyrate import Pirate\n",
@@ -65,28 +65,28 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"http://hapi.fhir.org/baseDstu2/Observation?_id=86092\n"
"http://hapi.fhir.org/baseDstu2/Observation/86092\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Query & Build DF: 100%|██████████| 1/1 [00:00<00:00, 5882.61it/s]\n"
"Query & Build DF (Observation): 100%|██████████| 1/1 [00:00<00:00, 12372.58it/s]\n"
]
},
{
"data": {
"text/plain": " resourceType id meta_versionId meta_lastUpdated status \\\n0 Observation 86092 1 2018-11-19T12:59:31.238+00:00 final \n\n category_coding_0_system category_coding_0_code \\\n0 http://hl7.org/fhir/observation-category vital-signs \n\n code_coding_0_system code_coding_0_code code_coding_0_display code_text \\\n0 http://loinc.org 29463-7 Body Weight Body Weight \n\n subject_reference encounter_reference effectiveDateTime \\\n0 Patient/86079 Encounter/86090 2011-03-10T20:47:29-05:00 \n\n issued valueQuantity_value valueQuantity_unit \\\n0 2011-03-10T20:47:29-05:00 6.079781 kg \n\n valueQuantity_system valueQuantity_code \n0 http://unitsofmeasure.org/ kg ",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>resourceType</th>\n <th>id</th>\n <th>meta_versionId</th>\n <th>meta_lastUpdated</th>\n <th>status</th>\n <th>category_coding_0_system</th>\n <th>category_coding_0_code</th>\n <th>code_coding_0_system</th>\n <th>code_coding_0_code</th>\n <th>code_coding_0_display</th>\n <th>code_text</th>\n <th>subject_reference</th>\n <th>encounter_reference</th>\n <th>effectiveDateTime</th>\n <th>issued</th>\n <th>valueQuantity_value</th>\n <th>valueQuantity_unit</th>\n <th>valueQuantity_system</th>\n <th>valueQuantity_code</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Observation</td>\n <td>86092</td>\n <td>1</td>\n <td>2018-11-19T12:59:31.238+00:00</td>\n <td>final</td>\n <td>http://hl7.org/fhir/observation-category</td>\n <td>vital-signs</td>\n <td>http://loinc.org</td>\n <td>29463-7</td>\n <td>Body Weight</td>\n <td>Body Weight</td>\n <td>Patient/86079</td>\n <td>Encounter/86090</td>\n <td>2011-03-10T20:47:29-05:00</td>\n <td>2011-03-10T20:47:29-05:00</td>\n <td>6.079781</td>\n <td>kg</td>\n <td>http://unitsofmeasure.org/</td>\n <td>kg</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -120,28 +120,28 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"http://hapi.fhir.org/baseDstu2/Observation?_count=1&_id=86092\n"
"http://hapi.fhir.org/baseDstu2/Observation/86092\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Query & Build DF: 100%|██████████| 1/1 [00:00<00:00, 1197.69it/s]\n"
"Query & Build DF (Observation): 100%|██████████| 1/1 [00:00<00:00, 1379.71it/s]\n"
]
},
{
"data": {
"text/plain": " id effectiveDateTime value unit patient\n0 86092 2011-03-10T20:47:29-05:00 6.079781 kg 86079",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>id</th>\n <th>effectiveDateTime</th>\n <th>value</th>\n <th>unit</th>\n <th>patient</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>86092</td>\n <td>2011-03-10T20:47:29-05:00</td>\n <td>6.079781</td>\n <td>kg</td>\n <td>86079</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"execution_count": 5,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}