# CMS SynPuf: How to extract drug treatments containing a specific molecule

This notebook queries the [CMS SynPuf dataset](https://console.cloud.google.com/marketplace/product/hhs/synpuf?pli=1), a public synthetic patient dataset in the OMOP Common Data Model. It is intended as an example of how to query the public OMOP dataset and how to produce basic visualizations.

> If you are **previewing** this notebook from Verily Workbench, please create a cloud environment and look for this file in the `repos/terra-axon-examples/omop_examples/` directory. Instructions for creating a cloud environment are available in the workspace description.

## Import python libraries

In [None]:
import pandas as pd
from google.cloud import bigquery

# Enable IPython to display matplotlib graphs.
import matplotlib.pyplot as plt
%matplotlib inline

## Notebook setup

In [None]:
def get_bq_dataset_from_reference(resource_name):
    """Resolve a BigQuery dataset from a resource reference in the workspace."""
    # IPython shell capture: runs the Terra CLI and returns its output lines.
    BQ_CMD_OUTPUT = !terra resolve --name={resource_name}
    BQ_DATASET = BQ_CMD_OUTPUT[0]
    return BQ_DATASET

## Connect to the BQ database

In [None]:
# The following line resolves the workspace resource named cms_synthetic_patient_data_omop. 
BQ_dataset = get_bq_dataset_from_reference('cms_synthetic_patient_data_omop')
# The above line will fail if you don't have this resource in your workspace.

# If that is the case, you can hard code the BQ_dataset instead by uncommenting the following line. 
# BQ_dataset = 'bigquery-public-data.cms_synthetic_patient_data_omop'
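
The manual fallback described in the comments above could also be automated. The following is a minimal sketch, assuming the resolver returns a `project.dataset` string (the form that `default_dataset` accepts). `choose_dataset` and the regular expression are hypothetical helpers written for this illustration; they are not part of the Terra CLI or the BigQuery client.

```python
import re

# Public copy of the dataset, taken from the commented-out line above.
PUBLIC_DATASET = 'bigquery-public-data.cms_synthetic_patient_data_omop'

# Loose, illustrative pattern for "project.dataset" (an assumption, not an official rule).
_DATASET_ID = re.compile(r'^[A-Za-z0-9][A-Za-z0-9\-]*\.[A-Za-z0-9_]+$')

def choose_dataset(resolved, default=PUBLIC_DATASET):
    """Return `resolved` if it looks like a dataset ID, otherwise `default`."""
    candidate = (resolved or '').strip()
    return candidate if _DATASET_ID.match(candidate) else default

print(choose_dataset('my-project.my_dataset'))
print(choose_dataset('Error: resource not found'))
```

In the notebook, the resolved value could be passed through this check, for example `BQ_dataset = choose_dataset(get_bq_dataset_from_reference('cms_synthetic_patient_data_omop'))`.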

In [None]:
job_query_config = bigquery.QueryJobConfig(default_dataset=BQ_dataset)
client = bigquery.Client(default_query_job_config=job_query_config)

## Example for medications containing Quinapril

The purpose of this example is to calculate the number and percentage of participants who were treated with drugs containing the Quinapril molecule, an angiotensin-converting enzyme (ACE) inhibitor.
As a starting point, we need the RxCUI code of the Quinapril molecule, which can be found with the [RxNav tool](https://mor.nlm.nih.gov/RxNav/) ([search for quinapril](https://mor.nlm.nih.gov/RxNav/search?searchBy=String&searchTerm=quinapril)):
- Quinapril RxCUI: `35208`

The analysis consists of the following steps:
1. **Convert the RxCUI code of the Quinapril molecule to a concept ID**:
This step consists of finding the concept ID associated with the `35208` RxCUI code. To do this, we filter the `concept` table of the OMOP vocabulary on `concept_code` equal to the RxCUI code and `vocabulary_id` equal to "RxNorm". See the CTE `quinapril_RxNorm_concept_id` in the SQL query below.

2. **Find the drug standard concept IDs**:
Drugs are coded with a standard concept ID (corresponding to an RxNorm code). Therefore, we need to find the concept IDs of the drugs containing the Quinapril molecule. The CTE `medications_with_quinapril` in the SQL query below extracts all the standard descendants of the ingredient concept ID from step 1, which are all the drugs containing the Quinapril ingredient.

3. **Find all drug concept IDs**:
Because some of the drugs may be coded using non-standard concept IDs, we recommend also collecting the non-standard concept IDs that map to the standard concept IDs identified in step 2, to obtain a comprehensive set of relevant concept IDs. This mapping is performed using the `concept_relationship` table of the OMOP vocabulary, where `concept_id_2` is one of the standard concept IDs identified in step 2 and `relationship_id` is 'Maps to'. See the CTE `all_medications_with_quinapril` in the SQL query below.

4. **Calculate the number of participants who were treated with Quinapril**:
The next step is to extract and count the participants with at least one Quinapril drug exposure. We will use the `drug_exposure` table and filter only the drugs coded with a `drug_concept_id` corresponding to a concept ID of the previously extracted list. See the CTE `nb_of_participants_treated_with_quinapril` in the SQL query below.

5. **Calculate the percentage of participants who were treated with Quinapril**:
Finally, the last step is to calculate the percentage of participants who were treated with Quinapril out of the total number of participants. We count the participants in the `person` table (see the CTE `nb_total_of_participants` in the SQL query below) and compute the percentage in the final `SELECT`.
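
The five steps can also be illustrated without BigQuery on a toy vocabulary. The sketch below uses plain Python sets and dictionaries. Every concept ID, person ID, and code in it is made up for illustration (only the RxCUI `35208` comes from the text above), and the table-like variables merely mirror the OMOP table names.

```python
# Toy stand-ins for the OMOP tables; all IDs are hypothetical.
concept = {  # concept_id -> (vocabulary_id, concept_code, standard_concept)
    1: ("RxNorm", "35208", "S"),       # ingredient (RxCUI from the text)
    2: ("RxNorm", "toy-tablet", "S"),  # a drug containing the ingredient
    3: ("Toy", "toy-local", None),     # non-standard code for that drug
    4: ("RxNorm", "toy-other", "S"),   # unrelated drug
}
concept_ancestor = [(1, 1), (1, 2)]         # (ancestor, descendant)
concept_relationship = [(3, "Maps to", 2)]  # (concept_id_1, relationship_id, concept_id_2)
drug_exposure = [(101, 2), (102, 3), (103, 4), (101, 3)]  # (person_id, drug_concept_id)
person = [101, 102, 103, 104]

# Step 1: RxCUI -> ingredient concept ID.
ingredient_ids = {cid for cid, (vocab, code, _) in concept.items()
                  if vocab == "RxNorm" and code == "35208"}

# Step 2: standard descendants of the ingredient.
standard_ids = {desc for anc, desc in concept_ancestor
                if anc in ingredient_ids and concept[desc][2] == "S"}

# Step 3: add non-standard concepts that map to those standard concepts.
all_ids = standard_ids | {c1 for c1, rel, c2 in concept_relationship
                          if rel == "Maps to" and c2 in standard_ids}

# Step 4: distinct participants with at least one matching exposure.
treated = {pid for pid, drug in drug_exposure if drug in all_ids}

# Step 5: percentage of all participants.
with_pct = 100 * len(treated) / len(set(person))
print(with_pct, 100 - with_pct)
```

In this toy data, two of the four participants have a matching exposure, one of them only through the non-standard code picked up in step 3, so the sketch reports 50% treated.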

In [None]:
query = """
    WITH quinapril_RxNorm_concept_id AS (
        SELECT 
            concept_id
        FROM
            `concept`
        WHERE
            concept_code = "35208"
            AND vocabulary_id = "RxNorm"
    ),
    medications_with_quinapril AS (
        SELECT
            ancestor.descendant_concept_id AS concept_id
        FROM
            `concept_ancestor` AS ancestor
        INNER JOIN
            quinapril_RxNorm_concept_id AS quinapril
        ON
            ancestor.ancestor_concept_id = quinapril.concept_id
        INNER JOIN
            `concept` AS concept
        ON
            ancestor.descendant_concept_id = concept.concept_id
        WHERE
            concept.standard_concept = 'S'
    ),
    all_medications_with_quinapril AS (
        -- UNION DISTINCT removes duplicates across both branches.
        SELECT
            concept_id
        FROM
            medications_with_quinapril
        UNION DISTINCT
        SELECT
            concept_id_1 AS concept_id
        FROM
            `concept_relationship`
        WHERE
            relationship_id = 'Maps to'
            AND concept_id_2 IN (
                SELECT
                    concept_id
                FROM
                    medications_with_quinapril
            )
    ),
    nb_of_participants_treated_with_quinapril AS (
        SELECT
            COUNT(DISTINCT person_id) AS nb_of_participants_with_quinapril
        FROM
            `drug_exposure`
        INNER JOIN 
            all_medications_with_quinapril
        ON
            drug_concept_id = concept_id
    ),
    nb_total_of_participants AS (
        SELECT
            COUNT(DISTINCT person_id) AS nb_of_participants
        FROM
            `person`
    )
    SELECT
        100*nb_of_participants_with_quinapril/nb_of_participants AS with_quinapril,
        100-(100*nb_of_participants_with_quinapril/nb_of_participants) AS without_quinapril,
    FROM
        nb_total_of_participants,
        nb_of_participants_treated_with_quinapril
"""

## Execute query
The code below sends a request to BigQuery to execute the query. The results are stored in a pandas DataFrame.

In [None]:
df = client.query(query).result().to_dataframe()
df
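
To repeat the analysis for another molecule, only the RxCUI in the first CTE needs to change. The following is a minimal sketch of building that CTE from a validated code. `ingredient_cte` and its parameters are hypothetical names, and the digits-only check is an assumption about the RxCUI format, made here to keep the string interpolation safe.

```python
def ingredient_cte(rxcui, cte_name="ingredient_concept_id"):
    """Return the SQL text of step 1 (RxCUI -> concept ID) for the given RxCUI."""
    code = str(rxcui).strip()
    # Reject anything that is not a plain ASCII number before interpolating it.
    if not (code.isascii() and code.isdigit()):
        raise ValueError(f"RxCUI should contain digits only, got {rxcui!r}")
    if not cte_name.isidentifier():
        raise ValueError(f"Invalid CTE name: {cte_name!r}")
    return (
        f'{cte_name} AS (\n'
        '    SELECT concept_id\n'
        '    FROM `concept`\n'
        f'    WHERE concept_code = "{code}"\n'
        '      AND vocabulary_id = "RxNorm"\n'
        ')'
    )

print(ingredient_cte(35208, "quinapril_RxNorm_concept_id"))
```

The BigQuery client also supports query parameters (`bigquery.ScalarQueryParameter` with `@name` placeholders in the SQL), which avoid string interpolation altogether and may be preferable in practice.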

## Plot visualization
The code below uses matplotlib, through the pandas plotting API, to plot a simple pie chart of the results.

In [None]:
ax = df.transpose().plot.pie(y=0, autopct='%.2f%%', title='Percentage of participants who were treated with Quinapril', ylabel='', legend=False)
plt.show()