# FHIR-Aggregator: A Catalog of Research Data
The FHIR Aggregator acts as a centralized repository for diverse healthcare data, organized using the FHIR (Fast Healthcare Interoperability Resources) standard. It provides researchers access to a wide range of information, including:

* Clinical data: Patient demographics, conditions, medications, observations, and procedures.
* Research studies: Information about research projects, participants, and study protocols.
* Omics data: Datasets associated with specimens.

## Specify the endpoint
* Select the FHIR server's URL: https://google-fhir.fhir-aggregator.org

  * The cell below tells the notebook, "Remember this address, https://google-fhir.fhir-aggregator.org, and label it FHIR_BASE. We'll use it later to talk to a server that stores healthcare data."

  * Storing the URL in an environment variable means you don't need to repeat it every time it's needed later in the notebook.

* With the endpoint set, you can search the data on the server using FHIR queries.

In [1]:
%env FHIR_BASE=https://google-fhir.fhir-aggregator.org

env: FHIR_BASE=https://google-fhir.fhir-aggregator.org
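
Outside a notebook, the same setting can be made with an ordinary environment variable. The sketch below assumes plain Python and only the standard library. It sets FHIR_BASE if it is not already defined and reads it back, which is roughly what the %env magic does for the notebook process. The variable name fhir_base is illustrative.

```python
import os

# Keep any value already exported in the shell; otherwise fall back to the
# URL used in this notebook.
os.environ.setdefault("FHIR_BASE", "https://google-fhir.fhir-aggregator.org")

# Strip a trailing slash so resource paths can be appended consistently.
fhir_base = os.environ["FHIR_BASE"].rstrip("/")
print(fhir_base)
```

Using setdefault rather than a plain assignment lets a different server be selected from the shell without editing the code.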


## Example FHIR query

If you are comfortable with FHIR, the endpoint is all you need. For example, the query below returns the official [identifier](https://hl7.org/fhir/R4B/datatypes.html#Identifier) for all [ResearchStudy](https://hl7.org/fhir/R4B/researchstudy.html) resources.

* $FHIR_BASE is the environment variable set earlier, which holds the FHIR server's base URL. The shell expands it to the actual URL when the command runs.
* /ResearchStudy is the FHIR resource type being queried.
* ?_elements=identifier is a FHIR search parameter that limits the returned data to the 'identifier' element of each ResearchStudy resource.
* &identifier.use=official asks the server to consider only identifiers whose use is official.
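
The parts of the query described above can also be assembled in Python. This is a minimal sketch using only the standard library; build_search_url is a hypothetical helper introduced here for illustration, not part of any FHIR client.

```python
from urllib.parse import urlencode


def build_search_url(base, resource_type, params):
    """Join a FHIR base URL, a resource type, and search parameters."""
    query = urlencode(params)
    return f"{base.rstrip('/')}/{resource_type}?{query}"


url = build_search_url(
    "https://google-fhir.fhir-aggregator.org",
    "ResearchStudy",
    {"_elements": "identifier", "identifier.use": "official"},
)
print(url)
```

Building the query string with urlencode avoids hand-escaping values that contain characters such as spaces or pipes.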

In [2]:
# Install the jq JSON formatter if it is not already available
# e.g. !apt-get install -yq jq > /dev/null
!jq --version

! curl -s $FHIR_BASE'/ResearchStudy?_elements=identifier&identifier.use=official' | jq -rc '.entry[] | [ (.resource.identifier[] | .value), .fullUrl]' | sort

jq-1.6
["CDA","https://google-fhir.fhir-aggregator.org/ResearchStudy/b86ee080-2f2f-54c6-b6a8-c1674bb95979"]
["mouse_mammary","https://google-fhir.fhir-aggregator.org/ResearchStudy/fa05d8c1-472d-501b-a813-c3f50ddc7916"]
["nlst","https://google-fhir.fhir-aggregator.org/ResearchStudy/83652a42-40ad-5105-8e71-b76df6b91923"]
["nsclc_radiomics","https://google-fhir.fhir-aggregator.org/ResearchStudy/8caad5b9-698a-5338-9e9e-8e68cda5c158"]
["nsclc_radiomics_genomics","https://google-fhir.fhir-aggregator.org/ResearchStudy/cad149f4-0891-5bfa-b379-f03113ec4685"]
["nsclc_radiomics_interobserver1","https://google-fhir.fhir-aggregator.org/ResearchStudy/d00272bc-d8ea-5283-98d1-d735ea28d63a"]
["pancreas_ct","https://google-fhir.fhir-aggregator.org/ResearchStudy/b4f53f0d-e81c-54f5-b35b-4221ea53b153"]
["pancreatic_ct_cbct_seg","https://google-fhir.fhir-aggregator.org/ResearchStudy/c217c14e-358b-5097-bef2-ca5815628fb2"]
["pediatric_ct_seg","https://google-fhir.fhir-aggregator.org/ResearchStudy/b26b67e5-48f
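
For readers less familiar with jq, the filter used above can be read as the following Python. This is a sketch that runs on a small hand-written Bundle: sample_bundle and its example.org URLs are placeholders, not data from the server, and identifier_rows is a hypothetical helper name.

```python
import json

# Hypothetical Bundle with the same shape as a FHIR search response.
sample_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {
            "fullUrl": "https://example.org/ResearchStudy/study-b",
            "resource": {
                "resourceType": "ResearchStudy",
                "identifier": [{"use": "official", "value": "study_b"}],
            },
        },
        {
            "fullUrl": "https://example.org/ResearchStudy/study-a",
            "resource": {
                "resourceType": "ResearchStudy",
                "identifier": [{"use": "official", "value": "study_a"}],
            },
        },
    ],
}


def identifier_rows(bundle):
    """Yield [value..., fullUrl] for each entry, like the jq filter."""
    for entry in bundle.get("entry", []):
        values = [i["value"] for i in entry["resource"].get("identifier", [])]
        yield values + [entry["fullUrl"]]


# Compact JSON lines sorted as text, mirroring `jq -rc ... | sort`.
lines = sorted(
    json.dumps(row, separators=(",", ":")) for row in identifier_rows(sample_bundle)
)
print("\n".join(lines))
```

Each entry in a search Bundle carries the resource itself plus a fullUrl, which is why the filter can pair identifier values with a retrievable address.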

* The next cell runs the same query from Python and loads the results into a pandas DataFrame.

In [3]:
import requests
import pandas as pd

# Assuming FHIR_BASE is already set as an environment variable
fhir_base_url = %env FHIR_BASE

# Define the API endpoint
endpoint = f"{fhir_base_url}/ResearchStudy?_elements=identifier&identifier.use=official"

# Make the request (the timeout keeps the cell from hanging indefinitely)
response = requests.get(endpoint, timeout=30)

# Check for successful response
if response.status_code == 200:
    # Parse the JSON response
    data = response.json()

    # Extract identifiers
    identifiers = []
    for entry in data.get('entry', []):
        resource = entry.get('resource', {})
        for identifier in resource.get('identifier', []):
            # Record the ResearchStudy URL alongside each identifier
            identifier['url'] = entry.get('fullUrl')
            identifiers.append(identifier)

    # Create a Pandas DataFrame
    print(f"Found {len(identifiers)} ResearchStudy identifiers. Use the 'url' field to retrieve the data.")
    df = pd.DataFrame(identifiers)
    display(df)  # Display the DataFrame
else:
    print(f"Error: Request failed with status code {response.status_code}")

Found 100 ResearchStudy identifiers. Use the 'url' field to retrieve the data.


|  | system | use | value | url |
|---|---|---|---|---|
| 0 | https://cda.readthedocs.io/associated_project | official | upenn_gbm | https://google-fhir.fhir-aggregator.org/Resear... |
| 1 | https://cda.readthedocs.io/associated_project | official | victre | https://google-fhir.fhir-aggregator.org/Resear... |
| 2 | https://cda.readthedocs.io/system | official | CDA | https://google-fhir.fhir-aggregator.org/Resear... |
| 3 | https://cda.readthedocs.io/associated_project | official | vestibular_schwannoma_seg | https://google-fhir.fhir-aggregator.org/Resear... |
| 4 | https://cda.readthedocs.io/associated_project | official | tcga_uvm | https://google-fhir.fhir-aggregator.org/Resear... |
| ... | ... | ... | ... | ... |
| 95 | https://cda.readthedocs.io/associated_project | official | nsclc_radiomics_interobserver1 | https://google-fhir.fhir-aggregator.org/Resear... |
| 96 | https://cda.readthedocs.io/associated_project | official | nsclc_radiomics | https://google-fhir.fhir-aggregator.org/Resear... |
| 97 | https://cda.readthedocs.io/associated_project | official | nlst | https://google-fhir.fhir-aggregator.org/Resear... |
| 98 | https://cda.readthedocs.io/associated_project | official | nsclc_radiomics_genomics | https://google-fhir.fhir-aggregator.org/Resear... |
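
The count of 100 may reflect a single page of results rather than the whole catalog; that is an assumption about the server, not something the output confirms. In FHIR, a search result Bundle can carry a link with relation "next" that points to the following page. Below is a minimal sketch of following those links. iter_entries is a hypothetical helper, and it takes an injectable fetch function so the logic can be tried without network access.

```python
def iter_entries(first_url, fetch, max_pages=50):
    """Yield entries from a search Bundle and any pages linked via 'next'.

    `fetch` is any callable that takes a URL and returns the parsed Bundle,
    for example: lambda u: requests.get(u, timeout=30).json()
    """
    url, pages = first_url, 0
    while url and pages < max_pages:
        bundle = fetch(url)
        yield from bundle.get("entry", [])
        # Find the link whose relation is "next"; stop when there is none.
        url = next(
            (
                link["url"]
                for link in bundle.get("link", [])
                if link.get("relation") == "next"
            ),
            None,
        )
        pages += 1


# Offline demonstration with two hypothetical pages keyed by fake URLs.
fake_pages = {
    "page1": {
        "entry": [{"fullUrl": "a"}],
        "link": [{"relation": "next", "url": "page2"}],
    },
    "page2": {
        "entry": [{"fullUrl": "b"}],
        "link": [{"relation": "self", "url": "page2"}],
    },
}
urls = [e["fullUrl"] for e in iter_entries("page1", fake_pages.__getitem__)]
print(urls)
```

The max_pages cap is a safety limit chosen for this sketch so that a misbehaving server cannot cause an endless loop.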


Explore the notebooks in the sidebar to learn about our command-line tool, fhir-query, and our Vocabulary dataframe.