In [2]:
!apt-get install -yq jq > /dev/null

The system cannot find the path specified.


In [3]:
%%capture --no-stdout
import numpy as np
import pandas as pd
from itables import init_notebook_mode, show
init_notebook_mode(all_interactive=True)
import itables.options as opt

opt.classes="display nowrap compact"
opt.buttons=["copyHtml5", "csvHtml5", "excelHtml5"]
opt.maxBytes=0
%env FHIR_BASE=https://google-fhir.fhir-aggregator.org

!fq vocabulary vocabulary.tsv --fhir-base-url $FHIR_BASE
df = pd.read_csv('vocabulary.tsv', sep='\t').fillna('')


FileNotFoundError: [Errno 2] No such file or directory: 'vocabulary.tsv'

# Vocabulary filtering examples

The vocabulary loads as a plain table, so ordinary pandas filtering is enough to narrow it to the rows you care about. Some examples follow:

# Exploring a study
Finding the vocabulary for a specific study is easy: filter on the study identifier in the `research_study_identifiers` column.

In [None]:
scratch_df = df[df['research_study_identifiers'].str.contains('1KG')]
scratch_df

Unnamed: 0,research_study_identifiers,path,extension_url,low,high,url


# Exploring available conditions
You can also query the vocabulary table for the condition codes recorded for patients in each research study.

In [None]:
# regex=False matches the literal string, so '.' is not treated as a regex wildcard
scratch_df = df[df['path'].str.contains('Condition.code', regex=False)]
scratch_df
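The two filters above can be combined into a single boolean mask. The following is a minimal sketch, not part of the original notebook: `sample_df` is a hypothetical stand-in for the real vocabulary table (only the `research_study_identifiers` and `path` column names come from the notebook), and `filter_vocabulary` is an illustrative helper name.

```python
import pandas as pd

# Hypothetical stand-in for the vocabulary table; in the notebook, `df`
# is loaded from vocabulary.tsv. The row values here are made up.
sample_df = pd.DataFrame({
    "research_study_identifiers": ["1KG", "1KG", "OTHER-STUDY"],
    "path": ["Condition.code", "Specimen.type", "Condition.code"],
})

def filter_vocabulary(frame, study, path):
    """Return rows whose study identifiers contain `study` and whose path contains `path`."""
    # regex=False treats both arguments as literal substrings
    in_study = frame["research_study_identifiers"].str.contains(study, regex=False)
    on_path = frame["path"].str.contains(path, regex=False)
    return frame[in_study & on_path]

result = filter_vocabulary(sample_df, "1KG", "Condition.code")
print(result)
```

In the notebook itself, the same call would be `filter_vocabulary(df, '1KG', 'Condition.code')`, assuming `df` loaded successfully.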

# Something more elaborate
In lieu of pandas filtering, we can query the FHIR server directly with curl for more elaborate filtering. In this case we're searching for a specific condition, Neuroblastoma. Here's what each part of the command below does:
1. FHIR Query: The curl command sends the FHIR query to the server, using the `code:text` modifier to match on the text associated with the condition code.
2. Data Extraction and Formatting: jq extracts the system, code, and display information from the coding elements within each Condition resource and formats them as TSV.
3. Sorting: The output is piped to sort, which arranges the entries alphabetically so that identical entries are adjacent.
4. Deduplication and Counting: `uniq -c` collapses adjacent duplicate entries and prefixes each unique combination of system, code, and display with the number of times it occurred.
5. Output: The final result is a sorted, deduplicated list of coding information for conditions whose code text matches "Neuroblastoma", presented as TSV with a leading count.

In [None]:
!curl -s $FHIR_BASE'/Condition?code:text=Neuroblastoma&_count=1000&_total=accurate&_elements=subject,extension,code'  | jq -rc ' .entry[] | .resource | .code.coding[] | [.system, .code, .display] | @tsv' | sort | uniq -c
#| jq -rc '.entry[] | .resource | [.subject.reference, (.extension[] | .valueReference.reference)] '
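For readers who would rather stay in Python, the extract, sort, and count stages of the pipeline above can be sketched with the standard library. This is an illustrative sketch, not the notebook's method: `count_codings` is a hypothetical helper name, and `sample_bundle` is a made-up Bundle (its system URL and codes are placeholders, not real terminology) standing in for the server response.

```python
from collections import Counter

def count_codings(bundle):
    """Count unique (system, code, display) triples across a FHIR Bundle's Condition entries.

    Mirrors the jq | sort | uniq -c stages: extract each coding, then
    return the unique triples in sorted order with their counts.
    """
    counts = Counter()
    for entry in bundle.get("entry", []):
        codings = entry.get("resource", {}).get("code", {}).get("coding", [])
        for coding in codings:
            key = (
                coding.get("system", ""),
                coding.get("code", ""),
                coding.get("display", ""),
            )
            counts[key] += 1
    return sorted(counts.items())

# Hypothetical Bundle for illustration only; values are placeholders.
sample_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"code": {"coding": [
            {"system": "http://example.org/codes", "code": "C-0001", "display": "Neuroblastoma"}]}}},
        {"resource": {"code": {"coding": [
            {"system": "http://example.org/codes", "code": "C-0002", "display": "Neuroblastoma, NOS"}]}}},
        {"resource": {"code": {"coding": [
            {"system": "http://example.org/codes", "code": "C-0001", "display": "Neuroblastoma"}]}}},
    ],
}

coding_counts = count_codings(sample_bundle)
for (system, code, display), n in coding_counts:
    print(n, system, code, display, sep="\t")
```

Against the live server, the Bundle could be obtained by issuing the same query from Python and parsing the JSON response before passing it to `count_codings`. Unlike the jq filter, this sketch silently skips resources that have no `code.coding` element.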