In [1]:
!sudo -s apt-get install -yq jq > /dev/null
!jq --version

The system cannot find the path specified.
'jq' is not recognized as an internal or external command,
operable program or batch file.
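
The output above suggests the cell ran on Windows, where `sudo` and `apt-get` are not available. A minimal sketch of a portable availability check using only the standard library; the helper name `find_tool` is hypothetical:

```python
import shutil

def find_tool(name):
    """Return the full path of an executable found on PATH, or None if it is missing."""
    return shutil.which(name)

jq_path = find_tool("jq")
if jq_path is None:
    print("jq not found; install it with your platform's package manager")
else:
    print(f"jq found at {jq_path}")
```

If `jq` is missing, the later `curl | jq` cell cannot run as written, so it is worth checking before continuing.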


In [2]:
%%capture --no-stdout
!pip install fhir-aggregator-client --no-cache-dir --quiet

In [18]:
%%capture --no-stdout
import numpy as np
import pandas as pd
import itables
from itables import init_notebook_mode  # needed for the call below
init_notebook_mode(connected=True)
import itables.options as opt

opt.classes="display nowrap compact"
opt.buttons=["copyHtml5", "csvHtml5", "excelHtml5"]
opt.maxBytes=0
%env FHIR_BASE=https://google-fhir.fhir-aggregator.org

!fq vocabulary vocabulary.tsv --fhir-base-url $FHIR_BASE
df = pd.read_csv('vocabulary.tsv', sep='\t').fillna('')

# Vocabulary filtering examples

Once the vocabulary is loaded into a DataFrame, ordinary pandas filtering is enough to narrow it to the rows of interest. Some examples follow.

# Exploring a study
Finding the vocabulary for a specific study is easy: filter on the desired identifier in the `research_study_identifiers` column.

In [19]:
scratch_df = df[df['research_study_identifiers'].str.contains('1KG')]
scratch_df

Unnamed: 0,research_study_identifiers,path,documentation,code,display,system,extension_url,count,low,high,url,research_study_title,research_study_description,observation,research_study
3740,1KG,ServiceRequest.category,https://hl7.org/fhir/R4B/servicerequest-defini...,108252007,Laboratory procedure,http://snomed.info/sct,,1.0,,,https://google-fhir.fhir-aggregator.org/Servic...,1000 Genomes Project Sample Metadata,,Observation/db717d0f-1ca1-5c2f-b01c-17ee8c73fbf6,ResearchStudy/4502d1f5-5275-5be7-9942-21f7fb8a...
3741,1KG,ServiceRequest.code,https://hl7.org/fhir/R4B/servicerequest-defini...,15220000,Laboratory test,http://snomed.info/sct,,1.0,,,https://google-fhir.fhir-aggregator.org/Servic...,1000 Genomes Project Sample Metadata,,Observation/db717d0f-1ca1-5c2f-b01c-17ee8c73fbf6,ResearchStudy/4502d1f5-5275-5be7-9942-21f7fb8a...
3742,1KG,DocumentReference.type,https://hl7.org/fhir/R4B/documentreference-def...,VCF,VCF,https://ftp.1000genomes.ebi.ac.uk/data_format,,46.0,,,https://google-fhir.fhir-aggregator.org/Docume...,1000 Genomes Project Sample Metadata,,Observation/db717d0f-1ca1-5c2f-b01c-17ee8c73fbf6,ResearchStudy/4502d1f5-5275-5be7-9942-21f7fb8a...
3743,1KG,DocumentReference.type,https://hl7.org/fhir/R4B/documentreference-def...,NEW,NEW,https://ftp.1000genomes.ebi.ac.uk/data_format,,2.0,,,https://google-fhir.fhir-aggregator.org/Docume...,1000 Genomes Project Sample Metadata,,Observation/db717d0f-1ca1-5c2f-b01c-17ee8c73fbf6,ResearchStudy/4502d1f5-5275-5be7-9942-21f7fb8a...
3744,1KG,DocumentReference.category,https://hl7.org/fhir/R4B/documentreference-def...,1,Chromosome 1,https://ftp.1000genomes.ebi.ac.uk/chromosome,,2.0,,,https://google-fhir.fhir-aggregator.org/Docume...,1000 Genomes Project Sample Metadata,,Observation/db717d0f-1ca1-5c2f-b01c-17ee8c73fbf6,ResearchStudy/4502d1f5-5275-5be7-9942-21f7fb8a...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3821,1KG,Patient.extension,https://hl7.org/fhir/R4B/patient-definitions.h...,LWK,LWK,,https://nih-ncpi.github.io/ncpi-fhir-ig-2/Stru...,116.0,,,https://google-fhir.fhir-aggregator.org/Patien...,1000 Genomes Project Sample Metadata,,Observation/db717d0f-1ca1-5c2f-b01c-17ee8c73fbf6,ResearchStudy/4502d1f5-5275-5be7-9942-21f7fb8a...
3822,1KG,Patient.extension,https://hl7.org/fhir/R4B/patient-definitions.h...,ASW,ASW,,https://nih-ncpi.github.io/ncpi-fhir-ig-2/Stru...,112.0,,,https://google-fhir.fhir-aggregator.org/Patien...,1000 Genomes Project Sample Metadata,,Observation/db717d0f-1ca1-5c2f-b01c-17ee8c73fbf6,ResearchStudy/4502d1f5-5275-5be7-9942-21f7fb8a...
3823,1KG,Patient.extension,https://hl7.org/fhir/R4B/patient-definitions.h...,MXL,MXL,,https://nih-ncpi.github.io/ncpi-fhir-ig-2/Stru...,107.0,,,https://google-fhir.fhir-aggregator.org/Patien...,1000 Genomes Project Sample Metadata,,Observation/db717d0f-1ca1-5c2f-b01c-17ee8c73fbf6,ResearchStudy/4502d1f5-5275-5be7-9942-21f7fb8a...
3824,1KG,Patient.extension,https://hl7.org/fhir/R4B/patient-definitions.h...,TSI,TSI,,https://nih-ncpi.github.io/ncpi-fhir-ig-2/Stru...,112.0,,,https://google-fhir.fhir-aggregator.org/Patien...,1000 Genomes Project Sample Metadata,,Observation/db717d0f-1ca1-5c2f-b01c-17ee8c73fbf6,ResearchStudy/4502d1f5-5275-5be7-9942-21f7fb8a...
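
For an overview of a study rather than a row-by-row listing, the filtered rows can be grouped by `path`. A minimal sketch on a small hand-made frame that reuses column names and a few values from the table above; the helper `summarize_paths` is hypothetical and not part of `fhir-aggregator-client`. It uses exact equality on the identifier, which assumes one identifier per row; `str.contains`, as in the cell above, is the looser alternative.

```python
import pandas as pd

# Toy rows mirroring a few columns of vocabulary.tsv (illustrative subset)
sample = pd.DataFrame({
    "research_study_identifiers": ["1KG", "1KG", "1KG", "HTAPP"],
    "path": ["DocumentReference.type", "DocumentReference.type",
             "Patient.extension", "Condition.code"],
    "code": ["VCF", "NEW", "LWK", "Ductal carcinoma NOS"],
    "count": [46.0, 2.0, 116.0, 43.0],
})

def summarize_paths(vocab, study_id):
    """Per FHIR path: number of distinct codes and summed count for one study."""
    rows = vocab[vocab["research_study_identifiers"] == study_id]
    # fillna('') can leave empty strings in `count`; coerce them back to NaN
    rows = rows.assign(count=pd.to_numeric(rows["count"], errors="coerce"))
    return (
        rows.groupby("path")
        .agg(distinct_codes=("code", "nunique"), total_count=("count", "sum"))
        .reset_index()
    )

summary = summarize_paths(sample, "1KG")
print(summary)
```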


# Exploring available conditions
The vocabulary table can also be queried for the condition codes recorded for patients in each research study.

In [20]:
# regex=False matches the '.' in 'Condition.code' literally instead of as a wildcard
scratch_df = df[df['path'].str.contains('Condition.code', regex=False)]
scratch_df

Unnamed: 0,research_study_identifiers,path,documentation,code,display,system,extension_url,count,low,high,url,research_study_title,research_study_description,observation,research_study
11,ICGC-LUCA_KR,Condition.code,https://hl7.org/fhir/R4B/condition-definitions...,C15.5,C15.5,https://terminology.hl7.org/5.1.0/NamingSystem...,,400.0,,,https://google-fhir.fhir-aggregator.org/Condit...,,ICGC DCC & Argo Personalised Genomic Character...,Observation/4fc6bcaa-bb07-51e9-99b2-a85f679d81ab,ResearchStudy/92f949c7-4244-5218-8569-59707365...
230,TNP SARDANA,Condition.code,https://hl7.org/fhir/R4B/condition-definitions...,Mucous adenocarcinoma,Mucous adenocarcinoma,https://data.humantumoratlas.org,,1.0,,,https://google-fhir.fhir-aggregator.org/Condit...,,,Observation/e6d6c1dc-d227-5b29-ab6f-3c4102589475,ResearchStudy/b7015858-983c-5e51-bcd6-af788edd...
231,TNP SARDANA,Condition.code,https://hl7.org/fhir/R4B/condition-definitions...,Not Reported,Not Reported,https://data.humantumoratlas.org,,1.0,,,https://google-fhir.fhir-aggregator.org/Condit...,,,Observation/e6d6c1dc-d227-5b29-ab6f-3c4102589475,ResearchStudy/b7015858-983c-5e51-bcd6-af788edd...
534,HTAPP,Condition.code,https://hl7.org/fhir/R4B/condition-definitions...,Lobular and ductal carcinoma,Lobular and ductal carcinoma,https://data.humantumoratlas.org,,9.0,,,https://google-fhir.fhir-aggregator.org/Condit...,,,Observation/165e6f12-8c34-5cec-9964-57222cc93b42,ResearchStudy/64eafc92-6a65-5fb6-9d0e-15778b9a...
535,HTAPP,Condition.code,https://hl7.org/fhir/R4B/condition-definitions...,Ductal carcinoma NOS,Ductal carcinoma NOS,https://data.humantumoratlas.org,,43.0,,,https://google-fhir.fhir-aggregator.org/Condit...,,,Observation/165e6f12-8c34-5cec-9964-57222cc93b42,ResearchStudy/64eafc92-6a65-5fb6-9d0e-15778b9a...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20801,TCGA-HNSC,Condition.code,https://hl7.org/fhir/R4B/condition-definitions...,"Verrucous carcinoma, NOS","Verrucous carcinoma, NOS",https://gdc.cancer.gov/primary_diagnosis,,1.0,,,https://google-fhir.fhir-aggregator.org/Condit...,Head and Neck Squamous Cell Carcinoma,,Observation/22fb4482-ed2a-54a6-a06f-511995ba1512,ResearchStudy/1644bfad-85da-56c7-8408-52f18f81...
20802,TCGA-HNSC,Condition.code,https://hl7.org/fhir/R4B/condition-definitions...,"Melanoma, NOS","Melanoma, NOS",https://gdc.cancer.gov/primary_diagnosis,,1.0,,,https://google-fhir.fhir-aggregator.org/Condit...,Head and Neck Squamous Cell Carcinoma,,Observation/22fb4482-ed2a-54a6-a06f-511995ba1512,ResearchStudy/1644bfad-85da-56c7-8408-52f18f81...
20803,TCGA-HNSC,Condition.code,https://hl7.org/fhir/R4B/condition-definitions...,Combined small cell carcinoma,Combined small cell carcinoma,https://gdc.cancer.gov/primary_diagnosis,,1.0,,,https://google-fhir.fhir-aggregator.org/Condit...,Head and Neck Squamous Cell Carcinoma,,Observation/22fb4482-ed2a-54a6-a06f-511995ba1512,ResearchStudy/1644bfad-85da-56c7-8408-52f18f81...
20804,TCGA-HNSC,Condition.code,https://hl7.org/fhir/R4B/condition-definitions...,4797003,"Papillary adenocarcinoma, NOS",http://snomed.info/sct,,2.0,,,https://google-fhir.fhir-aggregator.org/Condit...,Head and Neck Squamous Cell Carcinoma,,Observation/22fb4482-ed2a-54a6-a06f-511995ba1512,ResearchStudy/1644bfad-85da-56c7-8408-52f18f81...
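
The same table can answer the reverse question: which studies report a given condition. A sketch on toy rows with values taken from the output above; `studies_with_condition` is a hypothetical helper, and both matches are literal (`regex=False`) and the text match is case-insensitive.

```python
import pandas as pd

# Toy rows mirroring a few columns of vocabulary.tsv (illustrative subset)
sample = pd.DataFrame({
    "research_study_identifiers": ["HTAPP", "HTAPP", "TCGA-HNSC", "1KG"],
    "path": ["Condition.code", "Condition.code", "Condition.code", "Patient.extension"],
    "display": ["Lobular and ductal carcinoma", "Ductal carcinoma NOS",
                "Melanoma, NOS", "LWK"],
    "count": [9.0, 43.0, 1.0, 116.0],
})

def studies_with_condition(vocab, text):
    """Return Condition.code rows whose display contains `text`, largest count first."""
    is_condition = vocab["path"].str.contains("Condition.code", regex=False)
    matches_text = vocab["display"].str.contains(text, case=False, regex=False)
    cols = ["research_study_identifiers", "display", "count"]
    return vocab.loc[is_condition & matches_text, cols].sort_values("count", ascending=False)

hits = studies_with_condition(sample, "ductal carcinoma")
print(hits)
```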


# Something more elaborate
Instead of filtering in pandas, we can query the FHIR server directly with `curl` for more elaborate searches. In this case we search for a specific condition, Neuroblastoma, and use the `jq` command-line utility to format the output.
 
Here is what each part of the command below does:
1. FHIR query: `curl` sends the search request to the server.
2. Data extraction and formatting: `jq` extracts the system, code, and display values from the coding elements of each Condition resource and formats them as TSV.
3. Sorting: the output is piped to `sort`, which arranges the entries alphabetically so that identical lines end up adjacent.
4. Deduplication: `uniq -c` collapses adjacent duplicate lines and prefixes each unique combination of system, code, and display with the number of times it occurred.
5. Output: the final result is a sorted, deduplicated list of coding information, with counts, for conditions whose code text matches "Neuroblastoma".
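
The extract, sort, and count steps can also be sketched in Python, which avoids the dependency on `jq`. This is an illustration under assumptions, not the notebook's method: `count_codings` is a hypothetical helper, and the toy Bundle uses a made-up code system (`http://example.org/codes`) rather than real server data.

```python
from collections import Counter

def count_codings(bundle):
    """Count (system, code, display) triples over Condition.code.coding in a FHIR Bundle."""
    counts = Counter()
    for entry in bundle.get("entry", []):
        codings = entry.get("resource", {}).get("code", {}).get("coding", [])
        for coding in codings:
            key = (coding.get("system"), coding.get("code"), coding.get("display"))
            counts[key] += 1
    return counts

# Toy Bundle standing in for the server response (illustrative values only)
bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Condition", "code": {"coding": [
            {"system": "http://example.org/codes", "code": "NB", "display": "Neuroblastoma"}]}}},
        {"resource": {"resourceType": "Condition", "code": {"coding": [
            {"system": "http://example.org/codes", "code": "NB", "display": "Neuroblastoma"}]}}},
        {"resource": {"resourceType": "Condition", "code": {"coding": [
            {"system": "http://example.org/codes", "code": "GNB", "display": "Ganglioneuroblastoma"}]}}},
    ],
}

counts = count_codings(bundle)
# Sort on the stringified triple so missing (None) fields do not break the comparison
for (system, code, display), n in sorted(counts.items(), key=lambda kv: tuple(map(str, kv[0]))):
    print(n, system, code, display, sep="\t")
```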

In [21]:
!curl -s $FHIR_BASE'/Condition?code:text=Neuroblastoma&_count=1000&_total=accurate&_elements=subject,extension,code'  | jq -rc ' .entry[] | .resource | .code.coding[] | [.system, .code, .display] | @tsv' | sort | uniq -c
#| jq -rc '.entry[] | .resource | [.subject.reference, (.extension[] | .valueReference.reference)] '

'_count' is not recognized as an internal or external command,
operable program or batch file.
'_total' is not recognized as an internal or external command,
operable program or batch file.
'_elements' is not recognized as an internal or external command,
operable program or batch file.
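
The errors above are consistent with the Windows command interpreter treating each `&` in the URL as a command separator, because single quotes do not protect it there. A sketch that builds the same query URL in Python so that no shell quoting is involved; `build_condition_url` is a hypothetical helper, the fallback base URL is the one set earlier in this notebook, and the HTTP request itself is left commented out so the block runs offline.

```python
import os
from urllib.parse import urlencode

def build_condition_url(base_url, text):
    """Build the Condition search URL used in the curl command above."""
    params = {
        "code:text": text,
        "_count": 1000,
        "_total": "accurate",
        "_elements": "subject,extension,code",
    }
    # safe=':,' keeps the ':text' modifier and the comma-separated list readable
    return f"{base_url.rstrip('/')}/Condition?{urlencode(params, safe=':,')}"

base = os.environ.get("FHIR_BASE", "https://google-fhir.fhir-aggregator.org").strip()
url = build_condition_url(base, "Neuroblastoma")
print(url)

# To fetch the Bundle (requires network access):
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     bundle = json.load(resp)
```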
