# OMOP query API tutorial

This notebook shows examples of how to use the cyclops.query API to query EHR databases that follow the OMOP common data model. We showcase the examples on:

1. [Synthea](https://github.com/synthetichealth/synthea) in OMOP format.

    * First, generate Synthea data using their releases. We used [v2.7.0](https://github.com/synthetichealth/synthea/releases/tag/v2.7.0) to generate the data.
    * Follow instructions provided in [ETL-Synthea](https://github.com/OHDSI/ETL-Synthea) to load the CSV data into a postgres database, and perform ETL to load the data into OMOP format.

## Import and instantiate `OMOPQuerier`.

Pass in `schema_name`, the name of the postgres schema that houses all the OMOP tables.
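
The cell below hard-codes the connection parameters, including the password. As a minimal sketch of one alternative, the helper here reads them from environment variables and falls back to the tutorial's values. `connection_config` and the `OMOP_SCHEMA` variable are hypothetical names introduced for illustration; `PGHOST`, `PGPORT`, `PGDATABASE`, `PGUSER` and `PGPASSWORD` follow the usual PostgreSQL client conventions. Nothing here is part of cyclops itself.

```python
import os


def connection_config(env=None):
    """Build querier keyword arguments from environment variables.

    `env` is any mapping; it defaults to the process environment.
    """
    env = os.environ if env is None else env
    return {
        "dbms": "postgresql",
        "host": env.get("PGHOST", "localhost"),
        "port": int(env.get("PGPORT", "5432")),
        "database": env.get("PGDATABASE", "synthea_integration_test"),
        "user": env.get("PGUSER", "postgres"),
        "password": env.get("PGPASSWORD", ""),
        # OMOP_SCHEMA is a made-up variable name, not a standard one.
        "schema_name": env.get("OMOP_SCHEMA", "cdm_synthea10"),
    }


# Demonstrate with an explicit mapping instead of the real environment.
config = connection_config({"PGPASSWORD": "secret", "PGPORT": "5433"})
print({key: value for key, value in config.items() if key != "password"})
```

Because the dictionary keys match the keyword arguments used in this tutorial, `OMOPQuerier(**connection_config())` should be a drop-in replacement for the literal values.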

In [1]:
import pandas as pd

import cyclops.query.ops as qo
from cyclops.query import OMOPQuerier

querier = OMOPQuerier(
    dbms="postgresql",
    port=5432,
    host="localhost",
    database="synthea_integration_test",
    user="postgres",
    password="pwd",
    schema_name="cdm_synthea10",
)
# List all tables.
querier.list_tables()

2023-03-21 11:11:48,995 INFO cyclops.query.orm - Database setup, ready to run queries!


['cdm_synthea10.source_to_standard_vocab_map',
 'cdm_synthea10.source_to_source_vocab_map',
 'cdm_synthea10.all_visits',
 'cdm_synthea10.assign_all_visit_ids',
 'cdm_synthea10.final_visit_ids',
 'cdm_synthea10.device_exposure',
 'cdm_synthea10.measurement',
 'cdm_synthea10.person',
 'cdm_synthea10.observation_period',
 'cdm_synthea10.visit_occurrence',
 'cdm_synthea10.visit_detail',
 'cdm_synthea10.condition_occurrence',
 'cdm_synthea10.drug_exposure',
 'cdm_synthea10.procedure_occurrence',
 'cdm_synthea10.observation',
 'cdm_synthea10.death',
 'cdm_synthea10.note',
 'cdm_synthea10.note_nlp',
 'cdm_synthea10.specimen',
 'cdm_synthea10.fact_relationship',
 'cdm_synthea10.location',
 'cdm_synthea10.care_site',
 'cdm_synthea10.provider',
 'cdm_synthea10.payer_plan_period',
 'cdm_synthea10.cost',
 'cdm_synthea10.drug_era',
 'cdm_synthea10.dose_era',
 'cdm_synthea10.condition_era',
 'cdm_synthea10.episode',
 'cdm_synthea10.episode_event',
 'metadata.name',
 'cdm_synthea10.cdm_source',
 'cdm

## Example 1. Get all patient visits in or after 2010.

In [2]:
ops = qo.Sequential([qo.ConditionAfterDate("visit_start_date", "2010-01-01")])
visits = querier.visit_occurrence(ops=ops).run()
print(f"{len(visits)} rows extracted!")
pd.to_datetime(visits["visit_start_date"]).dt.year.value_counts().sort_index()

2023-03-21 11:11:50,202 INFO cyclops.query.orm - Query returned successfully!
2023-03-21 11:11:50,203 INFO cyclops.utils.profile - Finished executing function run_query in 0.105052 s


3119 rows extracted!


2010     52
2011     55
2012     97
2013    246
2014    385
2015    258
2016    279
2017    261
2018    279
2019    253
2020    295
2021    375
2022    284
Name: visit_start_date, dtype: int64
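
For readers who think in SQL, the filter and the per-year count in this example can be sketched as a plain query. This is a rough illustration, not the SQL that cyclops generates. It uses an in-memory SQLite database from the Python standard library instead of postgres, and the rows are made up. It also assumes that `ConditionAfterDate` keeps rows on the given date itself, as the "in or after" wording suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visit_occurrence ("
    "visit_occurrence_id INTEGER, person_id INTEGER, visit_start_date TEXT)"
)
# Made-up rows; one visit falls just before the cutoff.
conn.executemany(
    "INSERT INTO visit_occurrence VALUES (?, ?, ?)",
    [
        (1, 10, "2009-12-31"),
        (2, 10, "2010-01-01"),
        (3, 11, "2010-06-15"),
        (4, 12, "2011-03-02"),
    ],
)

# ISO-formatted dates compare correctly as text, so >= works on the strings.
year_counts = conn.execute(
    """
    SELECT substr(visit_start_date, 1, 4) AS visit_year, COUNT(*) AS n_visits
    FROM visit_occurrence
    WHERE visit_start_date >= ?
    GROUP BY visit_year
    ORDER BY visit_year
    """,
    ("2010-01-01",),
).fetchall()
print(year_counts)  # [('2010', 2), ('2011', 1)]
```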

## Example 2. Get measurements for all visits in or after 2020.

In [3]:
ops = qo.Sequential([qo.ConditionAfterDate("visit_start_date", "2020-01-01")])
visits = querier.visit_occurrence(ops=ops)
measurements = querier.measurement(
    join=qo.JoinArgs(join_table=visits.query, on="visit_occurrence_id")
).run()
print(f"{len(measurements)} rows extracted!")

2023-03-21 11:11:51,741 INFO cyclops.query.orm - Query returned successfully!
2023-03-21 11:11:51,742 INFO cyclops.utils.profile - Finished executing function run_query in 1.427614 s


7555 rows extracted!
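
In this example the filtered visits are not run on their own; `visits.query` is handed to `JoinArgs` and only the final `measurement` query is executed. A minimal sketch of the same idea in plain SQL is a join against a filtered subquery. The snippet uses an in-memory SQLite database with made-up rows, and it assumes an inner join on `visit_occurrence_id`; the SQL that cyclops emits against postgres may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE visit_occurrence (
        visit_occurrence_id INTEGER PRIMARY KEY, visit_start_date TEXT);
    CREATE TABLE measurement (
        measurement_id INTEGER PRIMARY KEY,
        visit_occurrence_id INTEGER,
        value_as_number REAL);
    INSERT INTO visit_occurrence VALUES
        (1, '2019-05-01'), (2, '2020-02-10'), (3, '2021-07-04');
    INSERT INTO measurement VALUES
        (100, 1, 36.6), (101, 2, 37.1), (102, 2, 80.0),
        (103, 3, 120.0), (104, NULL, 5.0);
    """
)

# The subquery plays the role of the unexecuted, filtered visits query.
joined = conn.execute(
    """
    SELECT m.measurement_id, m.visit_occurrence_id, v.visit_start_date
    FROM measurement AS m
    JOIN (
        SELECT * FROM visit_occurrence WHERE visit_start_date >= '2020-01-01'
    ) AS v ON m.visit_occurrence_id = v.visit_occurrence_id
    ORDER BY m.measurement_id
    """
).fetchall()
print(joined)
# [(101, 2, '2020-02-10'), (102, 2, '2020-02-10'), (103, 3, '2021-07-04')]
```

Measurement 100 drops out because its visit predates 2020, and measurement 104 drops out because it has no visit to join on.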


2. [MIMIC-III v1.4](https://physionet.org/content/mimiciii/1.4/) in OMOP format.

* First, set up the MIMIC-III database according to the instructions in [mimic-code](https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iii/buildmimic/postgres).
* Perform the ETL in the [mimic-omop](https://github.com/MIT-LCP/mimic-omop) repo.
* The database is assumed to be hosted using postgres. Update the config parameters passed to `OMOPQuerier`, such as username and password, accordingly.

## Instantiate `OMOPQuerier`.

Pass in `schema_name`, the name of the postgres schema that houses all the OMOP tables.

In [4]:
querier = OMOPQuerier(
    dbms="postgresql",
    port=5432,
    host="localhost",
    database="mimiciii",
    user="postgres",
    password="pwd",
    schema_name="omop",
)
# List all schemas.
querier.list_schemas()

2023-03-21 11:12:02,468 INFO cyclops.query.orm - Database setup, ready to run queries!


['information_schema', 'mimiciii', 'omop', 'public']

## Example 1. Get all patient visits that ended in a mortality outcome in or after 2010.

In [5]:
ops = qo.Sequential(
    [
        qo.ConditionAfterDate("visit_start_date", "2010-01-01"),
    ]
)
visits = querier.visit_occurrence(ops=ops)
visits_concept_mapped = querier.map_concept_ids_to_name(
    visits.query,
    [
        "discharge_to_concept_id",
        "admitting_concept_id",
    ],
)
visits_ops = qo.Sequential(
    [
        qo.ConditionSubstring("discharge_to_concept_name", "died"),
    ]
)
visits = querier.get_interface(visits_concept_mapped, ops=visits_ops).run()
print(f"{len(visits)} rows extracted!")

2023-03-21 11:12:09,878 INFO cyclops.query.orm - Query returned successfully!
2023-03-21 11:12:09,879 INFO cyclops.utils.profile - Finished executing function run_query in 1.185365 s


5815 rows extracted!
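
The example relies on two steps: `*_concept_id` columns are mapped to `*_concept_name` columns, and the result is filtered by substring. The pure-Python sketch below illustrates that logic on made-up rows. `add_concept_names` and `filter_substring` are hypothetical helpers, not cyclops functions, the concept IDs and names are invented, and the case-insensitive match is an assumption about how `ConditionSubstring` behaves.

```python
# Invented concept lookup; real OMOP concept IDs and names differ.
concept_names = {1: "Home", 2: "Patient died", 3: "Skilled nursing facility"}

visits = [
    {"visit_occurrence_id": 1, "discharge_to_concept_id": 1},
    {"visit_occurrence_id": 2, "discharge_to_concept_id": 2},
    {"visit_occurrence_id": 3, "discharge_to_concept_id": 0},  # no match
]


def add_concept_names(rows, id_columns, lookup):
    """Add a *_concept_name column next to each *_concept_id column."""
    mapped = []
    for row in rows:
        new_row = dict(row)
        for column in id_columns:
            name_column = column.replace("_concept_id", "_concept_name")
            new_row[name_column] = lookup.get(row[column])
        mapped.append(new_row)
    return mapped


def filter_substring(rows, column, substring):
    """Keep rows whose column contains the substring, ignoring case."""
    needle = substring.lower()
    return [
        row
        for row in rows
        if row[column] is not None and needle in row[column].lower()
    ]


mapped_visits = add_concept_names(visits, ["discharge_to_concept_id"], concept_names)
died = filter_substring(mapped_visits, "discharge_to_concept_name", "died")
print([row["visit_occurrence_id"] for row in died])  # [2]
```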


## Example 2. Get all measurements for female patient visits with `sepsis` diagnoses that ended in a mortality outcome (limit to 10000 rows).

In [6]:
persons_ops = qo.Sequential(
    [
        qo.ConditionSubstring("gender_concept_name", "FEMALE"),
    ]
)
cohort = querier.person(ops=persons_ops)
cohort = querier.visit_occurrence(join=qo.JoinArgs(cohort.query, on="person_id"))
cohort = querier.omop.condition_occurrence(
    join=qo.JoinArgs(cohort.query, on="visit_occurrence_id", isouter=True)
)
cohort = querier.measurement(
    join=qo.JoinArgs(cohort.query, on="visit_occurrence_id", isouter=True)
)
cohort_query = querier.map_concept_ids_to_name(
    cohort.query,
    [
        "discharge_to_concept_id",
        "admitting_concept_id",
        "condition_concept_id",
    ],
)
sepsis_died_filter = qo.Sequential(
    [
        qo.ConditionSubstring("discharge_to_concept_name", "died"),
        qo.ConditionSubstring("condition_concept_name", "sepsis"),
    ]
)
cohort = querier.get_interface(cohort_query, ops=sepsis_died_filter).run(limit=10000)
print(f"{len(cohort)} rows extracted!")

2023-03-21 11:13:36,869 INFO cyclops.query.orm - Query returned successfully!
2023-03-21 11:13:36,870 INFO cyclops.utils.profile - Finished executing function run_query in 86.392976 s


10000 rows extracted!


In [7]:
cohort["measurement_concept_name"].value_counts()

No matching concept                                                     7281
Respiratory rate                                                         312
Heart rate                                                               207
Systolic blood pressure                                                  199
Oxygen saturation in Arterial blood                                      198
                                                                        ... 
Color of Urine                                                             2
Urobilinogen [Presence] in Urine by Test strip                             2
Leukocytes [#/area] in Urine sediment by Microscopy high power field       2
Bacteria identified in Sputum by Culture                                   1
Intracranial pressure (ICP)                                                1
Name: measurement_concept_name, Length: 130, dtype: int64
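
In the counts above, 7281 of the 10000 extracted rows carry the name "No matching concept". When inspecting such output it can help to report the unmapped share and then rank only the mapped names. The sketch below shows that bookkeeping with `collections.Counter` on a small made-up list; with the dataframe from this tutorial, `value_counts(normalize=True)` gives the corresponding proportions.

```python
from collections import Counter

UNMAPPED = "No matching concept"

# Made-up measurement names standing in for the real column.
names = [
    UNMAPPED,
    "Heart rate",
    UNMAPPED,
    "Respiratory rate",
    "Heart rate",
    UNMAPPED,
]

counts = Counter(names)
unmapped_share = counts[UNMAPPED] / len(names)

# Rank only the names that were mapped to a concept.
mapped_counts = Counter(name for name in names if name != UNMAPPED)

print(f"{unmapped_share:.0%} unmapped")  # 50% unmapped
print(mapped_counts.most_common())  # [('Heart rate', 2), ('Respiratory rate', 1)]
```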