# fhir-query man page

FHIR-Aggregator is a command-line package. The commands below are shown as they would be run in an IPython notebook, where a leading `!` passes the command to the shell.
- To run commands from a Python script, pass the `fq` command to the shell using the `os` package:
   ```python
   import os

   # Run the fq CLI in a subshell
   os.system('fq --help')
   ```
- To run commands directly on the command line, omit the leading `!`, as in:
   `fq --help`

To see the help pages while you work, type:

```
!fq <subcommand> --help
```

Example:

```
!fq ls --help
```

## Install `fhir-query`

`pip install fhir-aggregator-client`

## URL for FHIR-Aggregator server

`%env FHIR_BASE=https://google-fhir.fhir-aggregator.org`

## Get a TSV of all FHIR-Aggregator vocabulary

`!fq vocabulary vocabulary.tsv --fhir-base-url $FHIR_BASE`

## fq options

`!fq --help`

Usage: fq [OPTIONS] COMMAND [ARGS]...

  FHIR-Aggregator utilities.

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  ls          List all the installed GraphDefinitions.
  run         Run GraphDefinition queries.
  results     Work with the results of a GraphDefinition query.
  vocabulary  FHIR-Aggregator's key Resources and CodeSystems.

## list and run GraphDefinitions

`!fq ls --help`

Usage: fq ls [OPTIONS]

  List all the installed GraphDefinitions.

Options:
  -f, --format [table|yaml|json]  Output format
  --help                          Show this message and exit.

`!fq ls --format yaml`

- description: (FHIR-Aggregator) Condition to ResearchStudy and children focusing
    on Observations [NCIT_C156418,NCIT_C156419,NCIT_C164934]. fhir-query '/Condition?code:code=70179006'
  id: cholangiocarcinoma-graph
  path: /Users/walsbr/FHIR-Aggregator/helpdesk/venv/lib/python3.12/site-packages/fhir_aggregator_client/graph-definitions/CDACholangiocarcinomaGraph.yaml
- description: (FHIR-Aggregator) Retrieve Patient and Observations [NCIT_C156418,NCIT_C156419].
    fhir-query '/ResearchStudy?identifier=TCGA-BRCA'
  id: patient-survival-graph
  path: /Users/walsbr/FHIR-Aggregator/helpdesk/venv/lib/python3.12/site-packages/fhir_aggregator_client/graph-definitions/PatientSurvivalGraph.yaml
- description: (FHIR-Aggregator) Condition to ResearchStudy and children. fhir-query
    '/Condition?code:code=70179006'
  id: condition-graph
  path: /Users/walsbr/FHIR-Aggregator/helpdesk/venv/lib/python3.12/site-packages/fhir_aggregator_client/graph-definitions/ConditionGraph.yaml
- description: (F

## Let's run a retrieval of a research study, e.g. 1000 Genomes
`!fq run research-study-part-of '/ResearchStudy?identifier=1KG'`



```
 
research-study-part-of is valid FHIR R5 GraphDefinition
ℹ Fetching https://google-fhir.fhir-aggregator.org/ResearchStudy?identifier=1KG
ℹ Processing ResearchStudy with 1 resources
ℹ Processing 11 links for ResearchStudy in parallel.
ℹ Processing link: ResearchSubject/study={ref} with 1 ResearchStudy(s)
ℹ Processing link: Group/part-of-study={ref}&_count=1000&_total=accurate with 1 ResearchStudy(s)
ℹ Processing link: Patient/part-of-study={ref}&_count=1000&_total=accurate with 1 ResearchStudy(s)
ℹ Processing link: Specimen/part-of-study={ref}&_count=1000&_total=accurate with 1 ResearchStudy(s)
ℹ Processing link: Observation/part-of-study={ref}&_count=1000&_total=accurate with 1 ResearchStudy(s)
ℹ Processing link: Procedure/part-of-study={ref}&_count=1000&_total=accurate with 1 ResearchStudy(s)
ℹ Processing link: DocumentReference/part-of-study={ref}&_count=1000&_total=accurate with 1 ResearchStudy(s)
ℹ Processing link: ServiceRequest/part-of-study={ref}&_count=1000&_total=accurate with 1 ResearchStudy(s)
ℹ Processing link: ImagingStudy/part-of-study={ref}&_count=1000&_total=accurate with 1 ResearchStudy(s)
ℹ Processing link: Condition/part-of-study={ref}&_count=1000&_total=accurate with 1 ResearchStudy(s)
ℹ Processing link: MedicationAdministration/part-of-study={ref}&_count=1000&_total=accurate with 1 ResearchStudy(s)
✔ Processed link: MedicationAdministration/part-of-study={ref}&_count=1000&_total=accurate
✔ Processed link: ImagingStudy/part-of-study={ref}&_count=1000&_total=accurate
✔ Processed link: Group/part-of-study={ref}&_count=1000&_total=accurate
✔ Processed link: Observation/part-of-study={ref}&_count=1000&_total=accurate
✔ Processed link: Condition/part-of-study={ref}&_count=1000&_total=accurate
✔ Processed link: Procedure/part-of-study={ref}&_count=1000&_total=accurate
✔ Processed link: ServiceRequest/part-of-study={ref}&_count=1000&_total=accurate
✔ Processed link: Patient/part-of-study={ref}&_count=1000&_total=accurate
✔ Processed link: Specimen/part-of-study={ref}&_count=1000&_total=accurate
ℹ Fetching ResearchSubject page 10 of 35
✔ Processed link: DocumentReference/part-of-study={ref}&_count=1000&_total=accurate
ℹ Fetching ResearchSubject page 20 of 35
ℹ Fetching ResearchSubject page 30 of 35
✔ Processed link: ResearchSubject/study={ref}
Aggregated Results: {'DocumentReference': 48, 'Observation': 1, 'Patient': 3500, 'ResearchStudy': 1, 'ResearchSubject': 3500, 'ServiceRequest': 1, 'Specimen': 3500}
database available at: ~/.fhir-aggregator/fhir-graph.sqlite

```
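
The run ends by reporting a local SQLite database. As a minimal sketch for inspecting it from Python without assuming anything about its schema, the snippet below lists every table and its row count using only the standard library. The helper name `table_row_counts` is hypothetical and not part of `fq`; the path is the one printed at the end of the run above.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    counts = {}
    conn = sqlite3.connect(str(db_path))
    try:
        tables = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        for (name,) in tables:
            # Quote the identifier so unusual table names are handled safely.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
    finally:
        conn.close()
    return counts


if __name__ == "__main__":
    # Path reported by `fq run`; adjust if your installation differs.
    db = Path.home() / ".fhir-aggregator" / "fhir-graph.sqlite"
    if db.exists():
        for table, n in table_row_counts(db).items():
            print(f"{table}: {n}")
    else:
        print(f"No database found at {db}; run `fq run ...` first.")
```

Because the table layout is not documented here, listing the tables first is a safer starting point than writing queries against guessed column names.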

## Work with the retrieved FHIR resources

## fq results
```
fq results
Usage: fq results [OPTIONS] COMMAND [ARGS]...

  Work with the results of a GraphDefinition query.

Options:
  --help  Show this message and exit.

Commands:
  dataframe  Create dataframe from the local db.
  summarize  Summarize the aggregation results.
  visualize  Visualize the FHIR Resources in the database.

```

`fq results summarize`
```
DocumentReference:
  count: 48
  references:
    Patient:
      count: 48
    ResearchStudy:
      count: 48
    ServiceRequest:
      count: 48
    Specimen:
      count: 120192
Observation:
  count: 1
  references:
    ResearchStudy:
      count: 2
Patient:
  count: 3500
  references:
    ResearchStudy:
      count: 3500
ResearchStudy:
  count: 1
  references:
    ResearchStudy:
      count: 1
ResearchSubject:
  count: 3500
  references:
    Patient:
      count: 3500
    ResearchStudy:
      count: 7000
ServiceRequest:
  count: 1
  references:
    Patient:
      count: 1
    ResearchStudy:
      count: 1
    Specimen:
      count: 2504
Specimen:
  count: 3500
  references:
    Patient:
      count: 3500
    ResearchStudy:
      count: 3500

```


## fq results visualize

```
fq results visualize

Wrote: fhir-graph.html
```

![1KG visualization](/images/1KG-visualization.png)



## fq results dataframe
Create a dataframe of the results:

* Write the results to a TSV file:
```
fq results dataframe
Saved fhir-graph.tsv
```
* Or launch dtale for interactive review: `fq results dataframe --dtale`

![1KG dataframe](/images/1KG-dataframe.png)
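
For scripted review of the TSV outside dtale, a minimal standard-library sketch is shown below. It assumes nothing about the column names, which depend on the resources retrieved. `peek_tsv` is a hypothetical helper, and `fhir-graph.tsv` is the file name reported above.

```python
import csv
from itertools import islice
from pathlib import Path


def peek_tsv(path, n=5):
    """Return (column_names, first n rows as dicts) from a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(islice(reader, n))
        # fieldnames is None for an empty file, so fall back to an empty list.
        columns = list(reader.fieldnames or [])
    return columns, rows


if __name__ == "__main__":
    tsv = Path("fhir-graph.tsv")
    if tsv.exists():
        columns, rows = peek_tsv(tsv)
        print(f"{len(columns)} columns: {columns}")
        for row in rows:
            print(row)
```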

## fq vocabulary 
```
fq vocabulary --help
Usage: fq vocabulary [OPTIONS] [OUTPUT_PATH]

  FHIR-Aggregator's key Resources and CodeSystems. 

  OUTPUT_PATH: Path to the output file. If not provided, the output will be
  printed to stdout.

Options:
  --fhir-base-url TEXT          Base URL of the FHIR server. default: env
                                $FHIR_BASE  [required]
  -f, --format [tsv|yaml|json]  Output format
  --debug                       Enable debug mode.
  --log-file TEXT               Path to the log file.
                                default=/Users/walsbr/.fhir-aggregator/app.log
  --dtale                       Open the graph in a browser using the dtale
                                package for interactive data exploration.
  --help                        Show this message and exit.
```
* This retrieves a histogram of all codeable concepts and extensions across all projects, rendered as a TSV. Alternatively, use the `--dtale` option to browse and filter the results interactively.

![1KG population vocabulary](/images/1KG-population-vocabulary.png)

The technical note below describes the vocabulary Observation resource and its relationship to the `ResearchStudy`.

---

# Technical Note: "Vocabulary" Observation Summary

## Overview

The Observation resource (`Observation/2034d746-6758-59aa-bd72-2ff659008ec8`) represents an **aggregated, study-level summary** of clinical and research data derived from multiple FHIR domains (Observation, Condition, Specimen, MedicationAdministration, Patient). Rather than describing a single patient instance, this Observation captures **population-level counts** of coded values that occurred within a referenced `ResearchStudy`.

The Observation is marked with `status = final`, signaling that it is a stable, reportable aggregation.

---

## Observation Contents

### 1. **Observation.code**

* The top-level `code` = `"vocabulary"` (`http://fhir-aggregator.org/fhir/CodeSystem/vocabulary`), indicating this Observation is reporting vocabulary usage frequencies within the study cohort.

* To retrieve all vocabulary Observations, query:
```
https://google-fhir.fhir-aggregator.org/Observation?code=vocabulary
```
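
The same query can be issued from Python. The sketch below builds the search URL and fetches the result with the standard library only. It assumes the server answers unauthenticated GET requests with a FHIR JSON Bundle; `vocabulary_url` and `fetch_bundle` are hypothetical helper names, and any extra search parameters (such as `_count`) are passed through unchanged.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

FHIR_BASE = "https://google-fhir.fhir-aggregator.org"


def vocabulary_url(base=FHIR_BASE, **params):
    """Build the Observation search URL for vocabulary summaries."""
    query = {"code": "vocabulary", **params}
    return f"{base.rstrip('/')}/Observation?{urlencode(query)}"


def fetch_bundle(url, timeout=30):
    """GET a FHIR search URL and return the parsed JSON response."""
    # Assumption: the server honours the standard FHIR JSON media type.
    request = Request(url, headers={"Accept": "application/fhir+json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_bundle(vocabulary_url(_count=10))` would request the first ten vocabulary Observations; the matching resources are expected under the Bundle's `entry` list.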

### 2. **Observation.component[]**

Each component encodes:

* A **path** (e.g., `Specimen.type`, `Observation.category`, `Condition.code`, `MedicationAdministration.medicationCodeableConcept`, `Patient.extension`).
* A **terminology code** from standard vocabularies (e.g., SNOMED CT, LOINC, NCIt, US Core Race/Ethnicity, HL7 clinical status).
* A **count** (`valueInteger`), representing how many study participants or records carried that specific coding.

Examples:

* **Specimen Types**: Tumor (22,722) vs Normal (12,433).
* **Observation Categories**: Laboratory (36,311), Survey (1,114).
* **Genomics**: LOINC 81247-9 (genetic variant reporting panel) observed 35,155 times.
* **Disease Progression**: NCIT “Days Between Birth and Diagnosis” (978), “Days Between Diagnosis and Death” (136).
* **Stage**: SNOMED AJCC pathological stages (e.g., Stage IIA = 351 cases, Stage IV = 22).
* **Conditions**: Cancer histologies (e.g., Lobular carcinoma, NOS = 219; Infiltrating ductular carcinoma = 828).
* **Medications**: Chemotherapeutics and hormonals (e.g., Cyclophosphamide = 453, Tamoxifen = 239).
* **Demographics**: Patient extensions (sex, race, ethnicity). Example: Female (982), Male (10), White (688), Black or African American (174), Hispanic/Latino (36).

### 3. **Extensions**


#### Aggregating Extensions in a Study-Level Observation

In this Observation pattern, **FHIR extensions** such as *US Core Birth Sex, Race, and Ethnicity* are represented as aggregated counts. Each extension value is encoded as a separate `component`:

* **Context coding**:
  A synthetic path indicator shows where the value was found, for example:

  ```json
  {
    "system": "http://fhir-aggregator.org/fhir/CodeSystem/vocabulary/path",
    "code": "Patient.extension",
    "display": "Patient.extension"
  }
  ```

* **Value coding**:
  The actual extension value is captured using the extension’s canonical system. Examples include:

  ```json
  { "system": "http://hl7.org/fhir/us/core/StructureDefinition/us-core-birthsex",
    "code": "F", "display": "F" }
  ```

  ```json
  { "system": "http://hl7.org/fhir/us/core/StructureDefinition/us-core-race",
    "code": "White", "display": "White" }
  ```

  ```json
  { "system": "http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity",
    "code": "hispanic or latino", "display": "hispanic or latino" }
  ```

* **Count**:
  The `valueInteger` holds the frequency observed in the study cohort.


This structure makes extension values **queryable and comparable** at the **ResearchStudy level** while preserving both the **context** (path in FHIR) and the **value semantics** (extension-defined code).
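
As a minimal sketch of reading such a component from Python, the function below assumes that the path coding and the value coding both sit in `component.code.coding` and are told apart by the path `system` shown above. That layout is an assumption and should be checked against a real resource. `summarize_component` and the sample dict are illustrative; the count reuses the Female figure from the Demographics example.

```python
PATH_SYSTEM = "http://fhir-aggregator.org/fhir/CodeSystem/vocabulary/path"


def summarize_component(component):
    """Split one Observation.component into path, value coding, and count."""
    codings = component.get("code", {}).get("coding", [])
    # The context coding is identified by the synthetic path system.
    path = next(
        (c.get("code") for c in codings if c.get("system") == PATH_SYSTEM), None
    )
    # Any other coding is treated as the value coding (assumed layout).
    value = next((c for c in codings if c.get("system") != PATH_SYSTEM), {})
    return {
        "path": path,
        "system": value.get("system"),
        "code": value.get("code"),
        "count": component.get("valueInteger"),
    }


# Hypothetical component assembled from the codings shown above.
example = {
    "code": {
        "coding": [
            {
                "system": PATH_SYSTEM,
                "code": "Patient.extension",
                "display": "Patient.extension",
            },
            {
                "system": "http://hl7.org/fhir/us/core/StructureDefinition/us-core-birthsex",
                "code": "F",
                "display": "F",
            },
        ]
    },
    "valueInteger": 982,
}

print(summarize_component(example))
```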



### 4. **Focus**

* `Observation.focus` holds a direct reference to the `ResearchStudy` being summarized.

Together, the `part-of-study` extension and `focus` explicitly bind this Observation to a specific ResearchStudy, framing it as **study-level aggregate data**.

---

## Relationship to ResearchStudy

This Observation is **not a per-patient measurement** but a **study-wide analytic artifact**. The relationship can be summarized as:

* **ResearchStudy** defines the study cohort, protocol, and design.
* **Observation** acts as an **analytic output of the study**, reporting distributions of values across participants.
* The `focus` element ensures the Observation is semantically tied to the ResearchStudy, while the `extension:part-of-study` makes the linkage explicit for query and aggregation pipelines.
* Effectively, this Observation provides **coded frequency distributions** that allow researchers to understand cohort characteristics, disease staging, treatments, and demographics without accessing raw patient-level data.
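
To illustrate how component counts become a frequency distribution, the sketch below converts the birth-sex counts quoted in the Demographics example into shares. The `shares` helper is hypothetical, and treating the two counts as covering the whole cohort is an assumption made only for this example.

```python
def shares(counts):
    """Convert {label: count} into {label: fraction of the total}."""
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: n / total for label, n in counts.items()}


# Counts taken from the Demographics example above.
birth_sex = {"Female": 982, "Male": 10}

for label, share in shares(birth_sex).items():
    print(f"{label}: {share:.1%}")
```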

---

## Architectural Implications

* This pattern supports **privacy-preserving research data sharing**, since counts are aggregated.
* It enables **cross-study comparisons**, as standardized terminologies (LOINC, SNOMED CT, NCIt, US Core) anchor the codes.
* It extends FHIR’s Observation to serve as a **vocabulary distribution index** within research cohorts, bridging patient-level data and study-level metadata.

---

## Summary

This Observation is a **study-level frequency table** of clinical, demographic, and treatment vocabularies, linked to a `ResearchStudy`. It operationalizes the connection between patient-level records and aggregated study metadata, supporting discovery, cohort characterization, and secondary research reuse.
 