# OMOP query API tutorial

This notebook shows how to use the `cyclops.query` API to query EHR databases that follow the OMOP common data model. Most queries are limited to 100 rows for quick results.

We showcase the examples on:

1. [Synthea](https://github.com/synthetichealth/synthea) in OMOP format.

    * First, generate Synthea data using one of their releases. We used [v2.7.0](https://github.com/synthetichealth/synthea/releases/tag/v2.7.0) to generate the data.
    * Follow the instructions in [ETL-Synthea](https://github.com/OHDSI/ETL-Synthea) to load the CSV data into a Postgres database and run the ETL that converts it to OMOP format.

## Import and instantiate `OMOPQuerier`.

Pass in `schema_name`, the name of the Postgres schema that houses all the OMOP tables.

In [1]:
"""OMOP query API tutorial."""

import pandas as pd

import cyclops.query.ops as qo
from cyclops.query import OMOPQuerier


querier = OMOPQuerier(
    dbms="postgresql",
    port=5432,
    host="localhost",
    database="synthea_integration_test",
    user="postgres",
    password="pwd",
    schema_name="cdm_synthea10",
)
# List all tables.
querier.list_tables("cdm_synthea10")


2023-09-19 12:00:32,882 INFO cyclops.query.orm - Database setup, ready to run queries!


['cdm_synthea10.person',
 'cdm_synthea10.observation_period',
 'cdm_synthea10.visit_occurrence',
 'cdm_synthea10.visit_detail',
 'cdm_synthea10.condition_occurrence',
 'cdm_synthea10.drug_exposure',
 'cdm_synthea10.procedure_occurrence',
 'cdm_synthea10.device_exposure',
 'cdm_synthea10.measurement',
 'cdm_synthea10.observation',
 'cdm_synthea10.death',
 'cdm_synthea10.note',
 'cdm_synthea10.note_nlp',
 'cdm_synthea10.specimen',
 'cdm_synthea10.fact_relationship',
 'cdm_synthea10.location',
 'cdm_synthea10.care_site',
 'cdm_synthea10.provider',
 'cdm_synthea10.payer_plan_period',
 'cdm_synthea10.cost',
 'cdm_synthea10.drug_era',
 'cdm_synthea10.dose_era',
 'cdm_synthea10.condition_era',
 'cdm_synthea10.episode',
 'cdm_synthea10.episode_event',
 'cdm_synthea10.cdm_source',
 'cdm_synthea10.concept',
 'cdm_synthea10.vocabulary',
 'cdm_synthea10.domain',
 'cdm_synthea10.concept_class',
 'cdm_synthea10.concept_relationship',
 'cdm_synthea10.relationship',
 'cdm_synthea10.concept_synonym',
 

## Example 1. Get all patient visits in or after 2010.

In [2]:
visits = querier.visit_occurrence()
visits = visits.ops(qo.ConditionAfterDate("visit_start_date", "2010-01-01"))
visits = visits.run(limit=100)
print(f"{len(visits)} rows extracted!")
pd.to_datetime(visits["visit_start_date"]).dt.year.value_counts().sort_index()

2023-09-19 12:00:33,614 INFO cyclops.query.orm - Query returned successfully!


2023-09-19 12:00:33,615 INFO cyclops.utils.profile - Finished executing function run_query in 0.101939 s


100 rows extracted!


2010     2
2011     1
2012     3
2013     7
2014    13
2015    15
2016    13
2017     7
2018     8
2019     7
2020     9
2021     9
2022     3
2023     3
Name: visit_start_date, dtype: int64
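
For readers who think in SQL, the filter above behaves roughly like a `WHERE visit_start_date >= ...` clause followed by a `LIMIT`. The sketch below reproduces that idea on an in-memory SQLite table instead of going through cyclops. The toy rows are made up, and the inclusive comparison is an assumption based on the "in or after" wording of the example, not a statement of how `ConditionAfterDate` is implemented.

```python
import sqlite3

# Toy stand-in for the OMOP visit_occurrence table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visit_occurrence (visit_occurrence_id INTEGER, visit_start_date TEXT)"
)
conn.executemany(
    "INSERT INTO visit_occurrence VALUES (?, ?)",
    [(1, "2009-12-31"), (2, "2010-01-01"), (3, "2015-06-30"), (4, "2003-02-14")],
)

# ISO-8601 date strings sort chronologically, so a string comparison is enough here.
rows = conn.execute(
    "SELECT visit_occurrence_id, visit_start_date FROM visit_occurrence "
    "WHERE visit_start_date >= ? ORDER BY visit_occurrence_id LIMIT ?",
    ("2010-01-01", 100),
).fetchall()
conn.close()
print(rows)
```

Only the visits dated 2010-01-01 or later survive the filter; the `LIMIT` plays the role of `run(limit=100)`.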

## Example 2. Get measurements for all visits in or after 2020.

In [3]:
visits = querier.visit_occurrence()
visits = visits.ops(qo.ConditionAfterDate("visit_start_date", "2020-01-01"))
measurements = querier.measurement()
visits_measurements = visits.join(
    join_table=measurements,
    on="visit_occurrence_id",
).run(limit=100)
print(f"{len(visits_measurements)} rows extracted!")

2023-09-19 12:00:33,746 INFO cyclops.query.orm - Query returned successfully!


2023-09-19 12:00:33,747 INFO cyclops.utils.profile - Finished executing function run_query in 0.062469 s


100 rows extracted!
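
The join above can be pictured as an ordinary SQL join on `visit_occurrence_id`. The following sketch uses an in-memory SQLite database with made-up rows. It assumes the default join is an inner join (the later MIMIC-III example passes `isouter=True` explicitly, which suggests so), so visits without any measurement are dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy versions of the two OMOP tables; all rows are invented for illustration.
conn.executescript(
    """
    CREATE TABLE visit_occurrence (visit_occurrence_id INTEGER, visit_start_date TEXT);
    CREATE TABLE measurement (measurement_id INTEGER, visit_occurrence_id INTEGER);
    INSERT INTO visit_occurrence VALUES (1, '2019-05-01'), (2, '2020-03-10'), (3, '2021-07-04');
    INSERT INTO measurement VALUES (10, 1), (11, 2), (12, 2);
    """
)

# Filter visits by date, then keep one row per matching measurement.
joined = conn.execute(
    """
    SELECT v.visit_occurrence_id, m.measurement_id
    FROM visit_occurrence AS v
    JOIN measurement AS m ON m.visit_occurrence_id = v.visit_occurrence_id
    WHERE v.visit_start_date >= '2020-01-01'
    ORDER BY m.measurement_id
    LIMIT 100
    """
).fetchall()
conn.close()
print(joined)
```

Visit 2 appears twice because it has two measurements, while visit 3 disappears because it has none.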


2. [MIMIC-III v1.4](https://physionet.org/content/mimiciii/1.4/) in OMOP format.

* First, set up the MIMIC-III database according to the instructions in [mimic-code](https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iii/buildmimic/postgres).
* Perform the ETL in the [mimic-omop](https://github.com/MIT-LCP/mimic-omop) repo.
* The database is assumed to be hosted using Postgres. Update the config parameters passed to `OMOPQuerier`, such as the username and password, accordingly.
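
One way to keep credentials out of the notebook is to read the connection settings from environment variables. The sketch below is only an illustration: the helper `load_db_config` and the `OMOP_DB_*` variable names are hypothetical, while the dictionary keys mirror the keyword arguments used in this tutorial.

```python
import os


def load_db_config(env=os.environ):
    """Collect connection settings from environment variables (hypothetical names)."""
    return {
        "dbms": "postgresql",
        "host": env.get("OMOP_DB_HOST", "localhost"),
        "port": int(env.get("OMOP_DB_PORT", "5432")),
        "database": env.get("OMOP_DB_NAME", "mimiciii"),
        "user": env.get("OMOP_DB_USER", "postgres"),
        "password": env["OMOP_DB_PASSWORD"],  # no default: fail loudly if unset
        "schema_name": env.get("OMOP_DB_SCHEMA", "omop"),
    }


# A plain dict stands in for the real environment in this demonstration.
config = load_db_config({"OMOP_DB_PASSWORD": "pwd", "OMOP_DB_PORT": "5433"})
print({key: value for key, value in config.items() if key != "password"})
```

The resulting dictionary could then be unpacked into the constructor, for example `OMOPQuerier(**config)`.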

## Instantiate `OMOPQuerier`.

Pass in `schema_name`, the name of the Postgres schema that houses all the OMOP tables.

In [4]:
querier = OMOPQuerier(
    dbms="postgresql",
    port=5432,
    host="localhost",
    database="mimiciii",
    user="postgres",
    password="pwd",
    schema_name="omop",
)
# List all schemas.
querier.list_schemas()

2023-09-19 12:00:38,853 INFO cyclops.query.orm - Database setup, ready to run queries!


['information_schema', 'mimiciii', 'omop', 'public']

## Example 1. Get all patient visits that ended in a mortality outcome in or after 2010.

In [5]:
visits = querier.visit_occurrence()
visits = visits.ops(qo.ConditionAfterDate("visit_start_date", "2010-01-01"))
visits_concept_mapped = querier.map_concept_ids_to_name(
    visits,
    [
        "discharge_to_concept_id",
        "admitting_concept_id",
    ],
)
visits_concept_mapped_died = visits_concept_mapped.ops(
    qo.ConditionSubstring("discharge_to_concept_name", "died"),
).run()
print(f"{len(visits_concept_mapped_died)} rows extracted!")

2023-09-19 12:00:44,563 INFO cyclops.query.orm - Query returned successfully!


2023-09-19 12:00:44,564 INFO cyclops.utils.profile - Finished executing function run_query in 1.444780 s


5815 rows extracted!
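
Conceptually, the query above first adds a `*_concept_name` column for each listed `*_concept_id` column and then filters on the name. The plain-Python sketch below mimics those two steps on made-up rows. The helper `add_concept_name`, the concept IDs, and the case-insensitive matching are assumptions for illustration, not the cyclops implementation.

```python
# Toy concept lookup (concept_id -> concept_name); IDs and names are made up.
concepts = {101: "Patient died", 102: "Home", 0: "No matching concept"}

visits = [
    {"visit_occurrence_id": 1, "discharge_to_concept_id": 101},
    {"visit_occurrence_id": 2, "discharge_to_concept_id": 102},
    {"visit_occurrence_id": 3, "discharge_to_concept_id": 0},
]


def add_concept_name(rows, id_col, lookup):
    """Add a '<prefix>_concept_name' column next to a '<prefix>_concept_id' column."""
    name_col = id_col.replace("_concept_id", "_concept_name")
    return [{**row, name_col: lookup.get(row[id_col])} for row in rows]


mapped = add_concept_name(visits, "discharge_to_concept_id", concepts)

# Keep rows whose mapped name contains "died" (case-insensitive by assumption).
died = [
    row for row in mapped
    if "died" in (row["discharge_to_concept_name"] or "").lower()
]
print([row["visit_occurrence_id"] for row in died])
```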


## Example 2. Get all measurements for female patient visits with `sepsis` diagnoses that ended in a mortality outcome.

In [6]:
# Select female patients and attach their visits.
persons = querier.person()
persons = persons.ops(qo.ConditionSubstring("gender_concept_name", "FEMALE"))
visits = querier.visit_occurrence()
person_visits = persons.join(visits, on="person_id")
# Attach diagnoses; here the table is accessed through the `omop` schema attribute.
conditions = querier.omop.condition_occurrence()
person_visits_conditions = person_visits.join(
    conditions,
    on="visit_occurrence_id",
    isouter=True,
)
# Attach measurements recorded during the same visit.
measurement = querier.measurement()
person_visits_conditions_measurements = person_visits_conditions.join(
    measurement,
    on="visit_occurrence_id",
    isouter=True,
)
# Map the concept ID columns to readable concept names.
person_visits_conditions_measurements = querier.map_concept_ids_to_name(
    person_visits_conditions_measurements,
    [
        "discharge_to_concept_id",
        "admitting_concept_id",
        "condition_concept_id",
    ],
)
# Keep visits that ended in death and carry a sepsis diagnosis.
ops = qo.Sequential(
    qo.ConditionSubstring("discharge_to_concept_name", "died"),
    qo.ConditionSubstring("condition_concept_name", "sepsis"),
)
cohort = person_visits_conditions_measurements.ops(ops).run(limit=100)
print(f"{len(cohort)} rows extracted!")

2023-09-19 12:01:06,304 INFO cyclops.query.orm - Query returned successfully!


2023-09-19 12:01:06,305 INFO cyclops.utils.profile - Finished executing function run_query in 21.518226 s


100 rows extracted!
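
The idea of chaining two filters with `qo.Sequential` can be sketched in plain Python as applying a list of steps one after another. The helpers `substring_filter` and `sequential` below are hypothetical stand-ins operating on in-memory rows; they are not the cyclops classes, and the case-insensitive matching is an assumption.

```python
from functools import reduce


def substring_filter(column, text):
    """Return a step that keeps rows whose column contains text (case-insensitive)."""
    def step(rows):
        return [row for row in rows if text.lower() in str(row[column]).lower()]
    return step


def sequential(*steps):
    """Return a step that applies the given steps in order."""
    def run(rows):
        return reduce(lambda acc, step: step(acc), steps, rows)
    return run


# Made-up rows for illustration.
rows = [
    {"id": 1, "discharge_to_concept_name": "Patient died", "condition_concept_name": "Sepsis"},
    {"id": 2, "discharge_to_concept_name": "Home", "condition_concept_name": "Sepsis"},
    {"id": 3, "discharge_to_concept_name": "Patient died", "condition_concept_name": "Asthma"},
]
pipeline = sequential(
    substring_filter("discharge_to_concept_name", "died"),
    substring_filter("condition_concept_name", "sepsis"),
)
cohort_ids = [row["id"] for row in pipeline(rows)]
print(cohort_ids)
```

Only the row that satisfies both conditions remains, which mirrors the logical AND implied by applying the two filters in sequence.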


In [7]:
cohort["measurement_concept_name"].value_counts()

No matching concept    95
Body temperature        5
Name: measurement_concept_name, dtype: int64
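
Since most rows carry the placeholder name `No matching concept`, it can be useful to look at shares rather than raw counts. The short sketch below recomputes them from the counts printed above using only the standard library.

```python
from collections import Counter

# Counts copied from the output above.
counts = Counter({"No matching concept": 95, "Body temperature": 5})
total = sum(counts.values())

# Fraction of rows per measurement concept name.
shares = {name: count / total for name, count in counts.items()}
print(shares)
```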