# Using the fhir-query tool

## FHIR-Aggregator: A Catalog of Research Data
The FHIR Aggregator acts as a centralized repository for diverse healthcare data, organized using the FHIR (Fast Healthcare Interoperability Resources) standard. It provides researchers access to a wide range of information, including:

* Clinical data: Patient demographics, conditions, medications, observations, and procedures.
* Research studies: Information about research projects, participants, and study protocols.
* Omics data: Data associated with specimens.


## fq (fhir-query): Your FHIR Querying Assistant
The fq utility, short for "fhir-query," is a command-line tool for working with FHIR servers. It lets researchers:

1. Retrieve the vocabulary of a FHIR server: The vocabulary command fetches and summarizes the key data elements (CodeableConcepts and Extensions) used in the server's FHIR data. The result is a vocabulary DataFrame that shows which data elements are present and how they are used.

2. Execute queries to retrieve FHIR resources: Researchers can then run FHIR queries with a readable syntax to retrieve and filter data from the server by search parameters and other criteria.

## Install the query tool

```
!pip install fhir-aggregator-client==0.1.8rc8 --no-cache-dir --quiet
```
### Check the version

```
!pip freeze | grep -i 'fhir[-_]aggregator[-_]client'
```
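The same check can be done from Python, which avoids depending on how `pip freeze` spells the package name. This is a minimal sketch using the standard library's `importlib.metadata`; `installed_version` is an illustrative helper name, and the distribution name is taken from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the client is installed, otherwise None
print(installed_version("fhir-aggregator-client"))
```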

### Verify the install

```bash {title="command line"}
fq
```

```python {title="ipython"}
!fq
```

```python {title="python"}
import os
os.system('fq')
```
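If `fq` is not found, the checks above fail with a shell error that is easy to miss in a notebook. The sketch below first looks for the executable on `PATH` and only then runs it; `cli_available` is an illustrative helper, not part of fhir-query.

```python
import shutil
import subprocess


def cli_available(name):
    """Return True if an executable called `name` is on PATH."""
    return shutil.which(name) is not None


if cli_available("fq"):
    # Running fq with no arguments mirrors the checks above
    result = subprocess.run(["fq"], capture_output=True, text=True)
    print(result.stdout or result.stderr)
else:
    print("fq was not found on PATH; check that the install step succeeded")
```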

## Utilizing FHIR GraphDefinition
We can use **[FHIR GraphDefinition](https://hl7.org/fhir/graphdefinition.html)** objects to define and execute graph-based traversals across multiple interconnected FHIR resource graphs. The data retrieved is written to a **local SQLite database** for persistence and later transformed into **analyst-friendly dataframes** for analysis using tools like Python’s pandas library. fhir-query comes with some **GraphDefinitions** pre-installed.

In [None]:
!fq ls --fhir_base_url https://google-fhir.fhir-aggregator.org
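
To make the idea of a graph traversal concrete, here is a minimal sketch of the shape of an R5 GraphDefinition and of how a link's `params` template can be turned into a follow-up search. The definition below is hypothetical (it is not one of the bundled definitions listed above), and `follow_up_queries` is an illustrative helper; it is not how fhir-query itself is implemented.

```python
# Hypothetical GraphDefinition, written as a Python dict with R5 field names
graph_definition = {
    "resourceType": "GraphDefinition",
    "name": "example-condition-graph",  # illustrative name only
    "status": "draft",
    "start": "condition",
    "node": [
        {"nodeId": "condition", "type": "Condition"},
        {"nodeId": "patient", "type": "Patient"},
        {"nodeId": "specimen", "type": "Specimen"},
    ],
    "link": [
        # forward link: follow the reference held in Condition.subject
        {"sourceId": "condition", "path": "Condition.subject", "targetId": "patient"},
        # reverse link: search for Specimens that point back at the Patient
        {"sourceId": "patient", "targetId": "specimen", "params": "subject={ref}"},
    ],
}


def follow_up_queries(graph, source_id, reference):
    """Build the reverse-lookup searches that leave the node `source_id`."""
    node_types = {node["nodeId"]: node["type"] for node in graph["node"]}
    queries = []
    for link in graph["link"]:
        if link["sourceId"] == source_id and "params" in link:
            target_type = node_types[link["targetId"]]
            criteria = link["params"].replace("{ref}", reference)
            queries.append(f"/{target_type}?{criteria}")
    return queries


print(follow_up_queries(graph_definition, "patient", "Patient/123"))
```

Each link either follows a reference stored on the source resource (`path`) or searches for resources that refer back to it (`params`), which is what lets one starting query fan out across related resource types.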

#### Run a GraphDefinition

In [None]:
!fq run condition-graph '/Condition?code:text=cholangiocarcinoma' --fhir_base_url https://google-fhir.fhir-aggregator.org
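
The quoted argument in the command above is a FHIR search path: a resource type followed by search parameters, where `code:text` applies the `:text` modifier to the `code` parameter. The sketch below shows one way to compose such paths in Python so that values containing spaces are encoded correctly; `fhir_search_path` is an illustrative helper, not part of fhir-query.

```python
from urllib.parse import urlencode


def fhir_search_path(resource_type, params):
    """Compose a FHIR search path such as '/Condition?code:text=...'."""
    # safe=":" keeps search modifiers like 'code:text' readable
    query = urlencode(params, safe=":")
    return f"/{resource_type}?{query}" if query else f"/{resource_type}"


path = fhir_search_path("Condition", {"code:text": "cholangiocarcinoma"})
print(path)  # /Condition?code:text=cholangiocarcinoma
```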

### Visualize Results

In [None]:
!fq results visualize --fhir_base_url https://google-fhir.fhir-aggregator.org

In [None]:
# display the graph written to fhir-graph.html by the visualize step

from IPython.display import HTML, display
with open('fhir-graph.html', 'r') as file:
    html_content = file.read()

# Set the display height (in pixels)
display(HTML("<div style='height: 800px;'>{}</div>".format(html_content)))

### Create a dataframe of results

In [None]:
!fq results dataframe --fhir_base_url https://google-fhir.fhir-aggregator.org

In [None]:
import pandas as pd

df = pd.read_csv('fhir-graph.tsv', sep='\t')

df

## Using fhir-query with other servers
You can use `fq` with other FHIR servers. The example below retrieves a study from dbGaP.

In [None]:
# delete the previous results, start with a fresh database (-f: no error if the file does not exist)
!rm -f ~/.fhir-aggregator/fhir-graph.sqlite
!fq run  --fhir-base-url https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1  research-study-link-iterate  '/ResearchStudy?_id=phs001232'
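
To confirm that a run populated the local database, you can list its tables without assuming anything about the schema. This is a minimal sketch using the standard library's `sqlite3` module; the database path is the one used in the cell above, and `list_tables` is an illustrative helper.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file."""
    with closing(sqlite3.connect(str(db_path))) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


db_path = Path.home() / ".fhir-aggregator" / "fhir-graph.sqlite"
if db_path.exists():
    print(list_tables(db_path))
else:
    print(f"No database found at {db_path}")
```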

In [None]:
# use the same commands to analyse results
!fq results visualize

In [None]:
# display the graph written to fhir-graph.html by the visualize step

from IPython.display import HTML, display
with open('fhir-graph.html', 'r') as file:
    html_content = file.read()

# Set the display height (in pixels)
display(HTML("<div style='height: 800px;'>{}</div>".format(html_content)))

In [None]:
# create a dataframe of results
!fq results dataframe

In [None]:
import pandas as pd

df = pd.read_csv('fhir-graph.tsv', sep='\t')

df

## Start to Finish Example: Plotting Survival Curves with fhir-query
In this example, we'll query the TCGA breast cancer ResearchStudy (TCGA-BRCA) and divide the patients into two cohorts: White and under 50, and Black or African American and under 50. After extracting these slices, we'll compare the Kaplan-Meier curves of the two cohorts.

In [None]:
!pip install lifelines -q

We provide our own GraphDefinition here.

In [None]:
!wget https://raw.githubusercontent.com/FHIR-Aggregator/fhir-query/refs/heads/main/graph-definitions/R5/ResearchStudyGraph.yaml

### Export TCGA-BRCA data to a local database and create a dataframe

In [None]:
%env  FHIR_BASE=https://google-fhir.fhir-aggregator.org
# export a study using a set of stored queries
!fq --fhir-base-url $FHIR_BASE  --graph-definition-file-path  ResearchStudyGraph.yaml  --path '/ResearchStudy?identifier=TCGA-BRCA'
!fq dataframe

### Survival analysis
After retrieving the data, we use the previously installed lifelines library to plot Kaplan-Meier curves for the two cohorts.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
kmf = KaplanMeierFitter()

# read the tab-separated results into a dataframe
df = pd.read_csv('/tmp/fhir-graph.tsv', sep='\t')

# convert days to death from strings such as "123 days" to floats
df['days_to_death'] = (
    df['patient_observation_days_between_diagnosis_and_death']
    .str.replace(' days', '', regex=False)
    .replace('', np.nan)
    .astype(float)
)
# convert age at diagnosis (in days) the same way
df['age_at_diagnosis'] = (
    df['patient_observation_days_between_birth_and_diagnosis']
    .str.replace(' days', '', regex=False)
    .replace('', np.nan)
    .astype(float)
)

# keep one row per patient
df_unique = df.drop_duplicates(subset=['patient_id'])

In [None]:
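# keep patients under 50 at diagnosis (the comparison assumes ages are negative day offsets),
# restricted to the two race groups being compared and to non-Hispanic ethnicity
df_cohort = df_unique[ (df_unique['age_at_diagnosis'] >= -50*365 )
                      & (df_unique['patient_us_core_race'].isin(['black or african american','white']) )
                      & (df_unique['patient_us_core_ethnicity'] == 'not hispanic or latino')   ]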

In [None]:
# Durations: patients with no recorded death are assigned the largest observed days_to_death
T = df_cohort['days_to_death'].fillna(df_cohort['days_to_death'].max())

# Event indicator: convert the vital status to booleans (True = death observed)
E = df_cohort['patient_deceasedBoolean'].astype(bool)

In [None]:
fig = plt.figure(figsize=(13, 8), dpi=80)
# plt.style.use('seaborn-colorblind')
ax = plt.subplot(111, title="Survival Curve")

# fit and plot one Kaplan-Meier curve per race, skipping missing values
for r in df_cohort['patient_us_core_race'].dropna().sort_values().unique():
    cohort = df_cohort['patient_us_core_race'] == r
    kmf.fit(T.loc[cohort], E.loc[cohort], label=r)
    kmf.plot(ax=ax)

ax.set_ylabel("Survival probability")
ax.set_xlabel("Days")
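
For readers who want to see what the fitted curves represent, the Kaplan-Meier estimate is a running product over the observed death times: at each death time, the current survival probability is multiplied by (1 - deaths / patients still at risk). The sketch below implements that formula in plain Python on made-up toy data; it is an illustration of the estimator, not a replacement for lifelines, and the durations and events are not drawn from TCGA.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] for right-censored data.

    durations: follow-up time for each subject
    events:    True if death was observed, False if the subject was censored
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy data (hypothetical): five subjects, two of them censored
toy_durations = [5, 8, 8, 12, 15]
toy_events = [True, True, False, True, False]

for t, s in kaplan_meier(toy_durations, toy_events):
    print(f"day {t}: S(t) = {s:.2f}")
```

Censored subjects never trigger a drop in the curve, but they still count toward the at-risk total until their follow-up ends, which is why the choice of duration for patients without a recorded death affects the result.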