# OMOP query API tutorial

This notebook shows how to use the `cyclops.query` API to query EHR databases that follow the OMOP common data model. Each query is limited to 100 rows so that results return quickly.

We showcase the examples on:

1. [Synthea](https://github.com/synthetichealth/synthea) in OMOP format.

    * First, generate Synthea data using one of their releases. We used [v2.7.0](https://github.com/synthetichealth/synthea/releases/tag/v2.7.0) to generate the data.
    * Follow the instructions in [ETL-Synthea](https://github.com/OHDSI/ETL-Synthea) to load the CSV data into a postgres database and run the ETL that converts it to OMOP format.

## Import and instantiate `OMOPQuerier`.

Pass in `schema_name`, the name of the postgres schema that houses all the OMOP tables.

In [1]:
import pandas as pd

import cyclops.query.ops as qo
from cyclops.query import OMOPQuerier

querier = OMOPQuerier(
    dbms="postgresql",
    port=5432,
    host="localhost",
    database="synthea_integration_test",
    user="postgres",
    password="pwd",
    schema_name="cdm_synthea10",
)
# List all tables.
querier.list_tables()

2023-07-11 09:37:33,117 INFO cyclops.query.orm - Database setup, ready to run queries!


['cdm_synthea10.source_to_standard_vocab_map',
 'cdm_synthea10.source_to_source_vocab_map',
 'cdm_synthea10.all_visits',
 'cdm_synthea10.assign_all_visit_ids',
 'cdm_synthea10.final_visit_ids',
 'cdm_synthea10.device_exposure',
 'cdm_synthea10.measurement',
 'cdm_synthea10.person',
 'cdm_synthea10.observation_period',
 'cdm_synthea10.visit_occurrence',
 'cdm_synthea10.visit_detail',
 'cdm_synthea10.condition_occurrence',
 'cdm_synthea10.drug_exposure',
 'cdm_synthea10.procedure_occurrence',
 'cdm_synthea10.observation',
 'cdm_synthea10.death',
 'cdm_synthea10.note',
 'cdm_synthea10.note_nlp',
 'cdm_synthea10.specimen',
 'cdm_synthea10.fact_relationship',
 'cdm_synthea10.location',
 'cdm_synthea10.care_site',
 'cdm_synthea10.provider',
 'cdm_synthea10.payer_plan_period',
 'cdm_synthea10.cost',
 'cdm_synthea10.drug_era',
 'cdm_synthea10.dose_era',
 'cdm_synthea10.condition_era',
 'cdm_synthea10.episode',
 'cdm_synthea10.episode_event',
 'metadata.name',
 'cdm_synthea10.cdm_source',
 'cdm
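
The names returned by `list_tables()` are schema-qualified, and not every entry belongs to the `cdm_synthea10` schema (note `metadata.name`). A minimal sketch of grouping such names by schema in plain Python follows; `group_by_schema` is a hypothetical helper written for this illustration, not part of cyclops, and the input is a handful of entries copied from the output above.

```python
from collections import defaultdict


def group_by_schema(qualified_names):
    """Group 'schema.table' strings into {schema: [table, ...]}."""
    grouped = defaultdict(list)
    for name in qualified_names:
        # Split on the first dot only, so the schema is everything before it.
        schema, _, table = name.partition(".")
        grouped[schema].append(table)
    return dict(grouped)


# A few entries copied from the list_tables() output above.
tables = [
    "cdm_synthea10.person",
    "cdm_synthea10.visit_occurrence",
    "cdm_synthea10.measurement",
    "metadata.name",
]
grouped = group_by_schema(tables)
print(sorted(grouped))  # ['cdm_synthea10', 'metadata']
```

On a live connection, the same helper could presumably be applied to the full list returned by `querier.list_tables()`.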

## Example 1. Get all patient visits in or after 2010.

In [2]:
ops = qo.Sequential([qo.ConditionAfterDate("visit_start_date", "2010-01-01")])
visits = querier.visit_occurrence(ops=ops).run(limit=100)
print(f"{len(visits)} rows extracted!")
pd.to_datetime(visits["visit_start_date"]).dt.year.value_counts().sort_index()

2023-07-11 09:37:34,374 INFO cyclops.query.orm - Query returned successfully!


2023-07-11 09:37:34,376 INFO cyclops.utils.profile - Finished executing function run_query in 0.047688 s


100 rows extracted!


2010     5
2011     1
2012     6
2013    12
2014    12
2015    12
2016    10
2017     6
2018     8
2019     5
2020     5
2021    13
2022     5
Name: visit_start_date, dtype: int64
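
As a quick sanity check on this output, the per-year counts should add up to the row limit, and no year should precede the 2010 cutoff. The sketch below does this in plain Python with the counts copied by hand from the output above; on a live DataFrame one would presumably work with the `value_counts()` result directly.

```python
# Year counts copied from the cell output above.
year_counts = {
    2010: 5, 2011: 1, 2012: 6, 2013: 12, 2014: 12, 2015: 12, 2016: 10,
    2017: 6, 2018: 8, 2019: 5, 2020: 5, 2021: 13, 2022: 5,
}

total_rows = sum(year_counts.values())  # should equal the limit passed to run()
earliest_year = min(year_counts)        # should not precede the date cutoff

print(total_rows, earliest_year)  # 100 2010
```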

## Example 2. Get measurements for all visits in or after 2020.

In [3]:
ops = qo.Sequential([qo.ConditionAfterDate("visit_start_date", "2020-01-01")])
visits = querier.visit_occurrence(ops=ops)
measurements = querier.measurement(
    join=qo.JoinArgs(join_table=visits.query, on="visit_occurrence_id")
).run(limit=100)
print(f"{len(measurements)} rows extracted!")

2023-07-11 09:37:34,551 INFO cyclops.query.orm - Query returned successfully!


2023-07-11 09:37:34,552 INFO cyclops.utils.profile - Finished executing function run_query in 0.044035 s


100 rows extracted!
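
The cell above filters visits by date and then joins measurements to the filtered visits on `visit_occurrence_id`. The sketch below mimics that logic in plain Python on made-up rows. It assumes `JoinArgs` performs an inner join unless told otherwise, which this notebook suggests (outer joins are requested explicitly with `isouter=True` later on) but does not confirm. All rows and IDs are hypothetical.

```python
# Hypothetical rows standing in for the two tables; only the join key matters here.
visits_2020_onward = [
    {"visit_occurrence_id": 1, "visit_start_date": "2020-03-01"},
    {"visit_occurrence_id": 2, "visit_start_date": "2021-07-15"},
]
measurements = [
    {"measurement_id": 10, "visit_occurrence_id": 1},
    {"measurement_id": 11, "visit_occurrence_id": 3},  # no matching filtered visit
    {"measurement_id": 12, "visit_occurrence_id": 2},
]

# Index the filtered visits by the join key.
visits_by_id = {v["visit_occurrence_id"]: v for v in visits_2020_onward}

# Inner join (assumed): keep a measurement only if its visit survived the date filter.
joined = [
    {**m, **visits_by_id[m["visit_occurrence_id"]]}
    for m in measurements
    if m["visit_occurrence_id"] in visits_by_id
]
print([row["measurement_id"] for row in joined])  # [10, 12]
```

In the notebook itself, the join is composed into the query before `run()` is called, so this sketch only illustrates which rows are expected to remain.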


2. [MIMIC-III v1.4](https://physionet.org/content/mimiciii/1.4/) in OMOP format.

* First, set up the MIMIC-III database according to the instructions in [mimic-code](https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iii/buildmimic/postgres).
* Run the ETL from the [mimic-omop](https://github.com/MIT-LCP/mimic-omop) repo.
* The database is assumed to be hosted using postgres. Update the config parameters passed to `OMOPQuerier`, such as the username and password, accordingly.
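
One way to avoid editing credentials in the notebook is to read them from environment variables. The sketch below is only an illustration: the `OMOP_DB_*` variable names and the `connection_kwargs` helper are hypothetical, while the keyword names and default values are the ones used in the cells of this tutorial.

```python
import os


def connection_kwargs(env=os.environ):
    """Collect database settings from environment variables, with tutorial defaults."""
    return {
        "dbms": "postgresql",
        "host": env.get("OMOP_DB_HOST", "localhost"),
        "port": int(env.get("OMOP_DB_PORT", "5432")),
        "database": env.get("OMOP_DB_NAME", "mimiciii"),
        "user": env.get("OMOP_DB_USER", "postgres"),
        # No default for the password: raise KeyError if it is not set.
        "password": env["OMOP_DB_PASSWORD"],
        "schema_name": env.get("OMOP_DB_SCHEMA", "omop"),
    }


# Pass a plain dict instead of os.environ to try the helper without touching the shell.
kwargs = connection_kwargs({"OMOP_DB_PASSWORD": "pwd"})
print(kwargs["port"], kwargs["schema_name"])  # 5432 omop

# With cyclops installed and the database running, the result could be used as:
# querier = OMOPQuerier(**kwargs)
```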

## Instantiate `OMOPQuerier`.

Pass in `schema_name`, the name of the postgres schema that houses all the OMOP tables.

In [4]:
querier = OMOPQuerier(
    dbms="postgresql",
    port=5432,
    host="localhost",
    database="mimiciii",
    user="postgres",
    password="pwd",
    schema_name="omop",
)
# List all schemas.
querier.list_schemas()

2023-07-11 09:37:40,216 INFO cyclops.query.orm - Database setup, ready to run queries!


['information_schema', 'mimiciii', 'omop', 'public']

## Example 1. Get all patient visits that ended in a mortality outcome in or after 2010.

In [5]:
# Keep visits selected by the 2010-01-01 date condition.
ops = qo.Sequential(
    [
        qo.ConditionAfterDate("visit_start_date", "2010-01-01"),
    ]
)
visits = querier.visit_occurrence(ops=ops)
# Map concept IDs to readable names so they can be filtered by substring.
visits_concept_mapped = querier.map_concept_ids_to_name(
    visits.query,
    [
        "discharge_to_concept_id",
        "admitting_concept_id",
    ],
)
# Keep visits whose discharge concept name contains "died".
visits_ops = qo.Sequential(
    [
        qo.ConditionSubstring("discharge_to_concept_name", "died"),
    ]
)
visits = querier.get_interface(visits_concept_mapped, ops=visits_ops).run(limit=100)
print(f"{len(visits)} rows extracted!")

2023-07-11 09:37:45,189 INFO cyclops.query.orm - Query returned successfully!


2023-07-11 09:37:45,191 INFO cyclops.utils.profile - Finished executing function run_query in 0.081492 s


100 rows extracted!
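
The mortality filter above works in two steps: concept IDs are first mapped to names, and the rows are then filtered on a substring of the mapped name. The sketch below imitates the second step in plain Python. It assumes a case-insensitive match and that missing names never match; how `ConditionSubstring` actually treats case and nulls should be checked against the cyclops documentation. The concept names in the rows are made up.

```python
def contains_substring(value, substring):
    """Case-insensitive substring test (an assumption); None never matches."""
    return value is not None and substring.lower() in value.lower()


# Hypothetical rows as they might look after concept IDs are mapped to names.
rows = [
    {"visit_occurrence_id": 1, "discharge_to_concept_name": "Patient died"},
    {"visit_occurrence_id": 2, "discharge_to_concept_name": "Home"},
    {"visit_occurrence_id": 3, "discharge_to_concept_name": None},
]

died = [r for r in rows if contains_substring(r["discharge_to_concept_name"], "died")]
print([r["visit_occurrence_id"] for r in died])  # [1]
```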


## Example 2. Get all measurements for female patient visits with `sepsis` diagnoses that ended in a mortality outcome.

In [6]:
# Restrict the cohort to female patients.
persons_ops = qo.Sequential(
    [
        qo.ConditionSubstring("gender_concept_name", "FEMALE"),
    ]
)
cohort = querier.person(ops=persons_ops)
# Join visits onto the female cohort by person.
cohort = querier.visit_occurrence(join=qo.JoinArgs(cohort.query, on="person_id"))
# Attach diagnoses, then measurements, to each visit (outer joins).
cohort = querier.omop.condition_occurrence(
    join=qo.JoinArgs(cohort.query, on="visit_occurrence_id", isouter=True)
)
cohort = querier.measurement(
    join=qo.JoinArgs(cohort.query, on="visit_occurrence_id", isouter=True)
)
# Map concept IDs to readable names so they can be filtered by substring.
cohort_query = querier.map_concept_ids_to_name(
    cohort.query,
    [
        "discharge_to_concept_id",
        "admitting_concept_id",
        "condition_concept_id",
    ],
)
# Keep rows with a "died" discharge concept and a "sepsis" condition concept.
sepsis_died_filter = qo.Sequential(
    [
        qo.ConditionSubstring("discharge_to_concept_name", "died"),
        qo.ConditionSubstring("condition_concept_name", "sepsis"),
    ]
)
cohort = querier.get_interface(cohort_query, ops=sepsis_died_filter).run(limit=100)
print(f"{len(cohort)} rows extracted!")

2023-07-11 09:39:18,611 INFO cyclops.query.orm - Query returned successfully!


2023-07-11 09:39:18,612 INFO cyclops.utils.profile - Finished executing function run_query in 92.768036 s


100 rows extracted!


In [7]:
cohort["measurement_concept_name"].value_counts()

Body weight                 43
No matching concept         38
Glasgow coma score motor    19
Name: measurement_concept_name, dtype: int64
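
A sizeable share of the returned rows carry the name `No matching concept`, presumably measurements whose source codes were not mapped to a standard concept during the ETL. The sketch below computes that share in plain Python from the counts copied out of the output above; it is an illustration, not part of the original notebook.

```python
from collections import Counter

# Counts copied from the value_counts() output above.
concept_counts = Counter(
    {
        "Body weight": 43,
        "No matching concept": 38,
        "Glasgow coma score motor": 19,
    }
)

total = sum(concept_counts.values())
unmapped = concept_counts["No matching concept"]
print(f"{unmapped} of {total} rows are unmapped ({unmapped / total:.0%})")
# 38 of 100 rows are unmapped (38%)
```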