# CMS SynPuf: Example demographics dashboard

This notebook queries the [CMS SynPuf dataset](https://console.cloud.google.com/marketplace/product/hhs/synpuf?pli=1), a public synthetic patient dataset in the OMOP Common Data Model format. It is intended as an example of how to query the public OMOP dataset and how to build a simple interactive dashboard from the results.

> If you are **previewing** this notebook from Verily Workbench, please create a cloud environment and look for this file in the `~/repos/terra-axon-examples/omop_examples/` directory. Instructions for creating a cloud environment are available in the workspace description.

## Import Python libraries

In [None]:
import ipywidgets as widgets
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from google.cloud import bigquery
from ipywidgets import interact

## Notebook setup

In [None]:
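def get_bq_dataset_from_reference(resource_name):
    """Resolves a BigQuery dataset from a resource reference in the workspace.

    Uses IPython's `!` shell syntax, so it only runs inside a notebook.
    """
    # `terra resolve` prints the resolved dataset; keep the first line of its output.
    bq_cmd_output = !terra resolve --name={resource_name}
    bq_dataset = bq_cmd_output[0]
    return bq_dataset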

## Connect to the BQ database

In [None]:
# Resolve the workspace resource named `cms_synthetic_patient_data_omop`.
# This fails if the resource does not exist in your workspace.
BQ_dataset = get_bq_dataset_from_reference('cms_synthetic_patient_data_omop')

# In that case, comment out the line above and hard-code the public dataset instead:
# BQ_dataset = 'bigquery-public-data.cms_synthetic_patient_data_omop'
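Outside a notebook, the `!` shell syntax used by `get_bq_dataset_from_reference` is not available. The sketch below is a hypothetical stdlib equivalent built on `subprocess`. It assumes, as the notebook code does, that `terra resolve --name=...` prints the dataset on the first line of its output, and it falls back to the public dataset ID when the command is missing or exits with an error. The `resolve_bq_dataset` name and its `fallback` and `cmd` parameters are illustrative, not part of the Workbench CLI.

```python
import subprocess

PUBLIC_DATASET = "bigquery-public-data.cms_synthetic_patient_data_omop"


def resolve_bq_dataset(resource_name, fallback=PUBLIC_DATASET, cmd=("terra", "resolve")):
    """Hypothetical helper: resolve a workspace resource, or return `fallback`.

    Assumes the CLI prints the resolved dataset on its first output line,
    matching how the notebook reads the first line of `terra resolve` output.
    """
    try:
        result = subprocess.run(
            [*cmd, f"--name={resource_name}"],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # CLI not installed, or the resource does not exist in this workspace.
        return fallback
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else fallback
```

Usage: `BQ_dataset = resolve_bq_dataset('cms_synthetic_patient_data_omop')` returns the workspace reference when it resolves and the public dataset otherwise, so the notebook keeps running either way.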

In [None]:
job_query_config = bigquery.QueryJobConfig(default_dataset=BQ_dataset)
client = bigquery.Client(default_query_job_config=job_query_config)

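## Execute queries

The cells below send each query to BigQuery and load the results into a pandas DataFrame. Because the client was created with a default dataset, the queries can refer to OMOP tables such as `concept` and `person` by name alone, without a project or dataset prefix.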

In [None]:
race_concept_query = """
SELECT concept_id AS race_concept_id, concept_name AS race
FROM `concept`
WHERE domain_id = "Race"
"""
race_concept_df = client.query(race_concept_query).result().to_dataframe()
race_concept_df.head(5)

In [None]:
gender_concept_query = """
SELECT concept_id AS gender_concept_id, concept_name AS gender
FROM `concept`
WHERE domain_id = "Gender"
"""
gender_concept_df = client.query(gender_concept_query).result().to_dataframe()
gender_concept_df.head(5)

In [None]:
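people_query = """
SELECT
  person_id,
  race_concept_id,
  gender_concept_id,
  year_of_birth,
  month_of_birth
FROM
  `person`
-- Keep each row with probability 10000/2326856, i.e. a random sample of
-- about 10,000 of the 2,326,856 people in the table (different on every run).
WHERE RAND() < 10000/2326856
"""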

cms_syn_df = client.query(people_query).result().to_dataframe()
cms_syn_df.head(5)
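The `WHERE RAND() < 10000/2326856` filter keeps each person independently with probability p = 10000/2326856, so the sample size follows a binomial distribution rather than being exactly 10,000, and it changes on every run. The sketch below computes the expected size and its standard deviation, then checks them with a seeded stdlib simulation of the same per-row coin flip. The total of 2,326,856 rows is the figure the query itself uses.

```python
import math
import random

N_PEOPLE = 2_326_856  # row count assumed by the query's sampling fraction
TARGET = 10_000
p = TARGET / N_PEOPLE  # per-row keep probability, as in RAND() < p

# Bernoulli sampling: sample size ~ Binomial(N_PEOPLE, p)
expected_size = N_PEOPLE * p
std_size = math.sqrt(N_PEOPLE * p * (1 - p))
print(f"expected ~{expected_size:.0f} rows, std ~{std_size:.1f}")

# Simulate one run of the filter with a fixed seed.
rng = random.Random(0)
simulated_size = sum(rng.random() < p for _ in range(N_PEOPLE))
print(f"simulated sample: {simulated_size} rows")
```

With these numbers the standard deviation is about 100 rows, so roughly 95% of runs return between about 9,800 and 10,200 people.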

In [None]:
merged_df = pd.merge(cms_syn_df, race_concept_df, on="race_concept_id")
merged_df = pd.merge(merged_df, gender_concept_df, on="gender_concept_id")
merged_df.head(5)
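`pd.merge` performs an inner join by default, so any person whose `race_concept_id` or `gender_concept_id` has no match in the lookup tables is silently dropped from `merged_df`. One way to check for this, sketched below on made-up toy rows, is a left join with `indicator=True`, which adds a `_merge` column marking rows found only in the left table. The concept IDs 8527 (White) and 8516 (Black or African American) are standard OMOP race concepts; the person rows and the unmatched ID 0 are hypothetical.

```python
import pandas as pd

# Hypothetical toy data shaped like cms_syn_df and race_concept_df.
people = pd.DataFrame({
    "person_id": [1, 2, 3],
    "race_concept_id": [8527, 8516, 0],  # 0 has no row in the race lookup
})
race_lookup = pd.DataFrame({
    "race_concept_id": [8527, 8516],
    "race": ["White", "Black or African American"],
})

inner = pd.merge(people, race_lookup, on="race_concept_id")  # default how="inner"
left = pd.merge(people, race_lookup, on="race_concept_id", how="left", indicator=True)

# Rows that the inner join would have dropped.
unmatched = left[left["_merge"] == "left_only"]
print(f"inner join kept {len(inner)} of {len(people)} people")
print(unmatched[["person_id", "race_concept_id"]])
```

Running the same check on `cms_syn_df` before merging shows whether the dashboard is missing anyone; keeping `how="left"` in the real merge would retain those people with a `NaN` label instead of dropping them.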

## Build an example interactive dashboard

In [None]:
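def plot_histogram(bins=10, hue='race', palette='Blues', x_range_1=(1900, 2000)):
    """Plots a histogram of birth years in `merged_df`, colored by `hue`."""
    plt.figure(dpi=120)
    sns.histplot(
        data=merged_df,
        x='year_of_birth',
        hue=hue,          # 'race' or 'gender'
        palette=palette,
        bins=bins,        # number of bins across the range of the data
    )
    # Zoom the x-axis to the selected range; this does not re-bin the data.
    plt.xlim(x_range_1)
    plt.show()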
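Because `sns.histplot` picks its bin edges from the range of the data and `plt.xlim` only zooms the axis, the range slider does not change the binning. For a tabular companion to the plot, the hypothetical `binned_counts` helper below counts people per birth-year bin and `hue` category. It uses equal-width bins over the selected range instead, so its edges need not match seaborn's. It is shown on toy data.

```python
import numpy as np
import pandas as pd


def binned_counts(df, hue="race", bins=10, x_range=(1900, 2000)):
    """Hypothetical helper: count rows per birth-year bin and `hue` value.

    Uses `bins` equal-width bins spanning `x_range`; birth years outside
    the range become NaN in pd.cut and are left out of the table.
    """
    edges = np.linspace(x_range[0], x_range[1], bins + 1)
    year_bin = pd.cut(df["year_of_birth"], bins=edges, include_lowest=True)
    return pd.crosstab(year_bin, df[hue])


# Toy data shaped like merged_df.
toy = pd.DataFrame({
    "year_of_birth": [1920, 1930, 1950, 1990],
    "race": ["White", "White", "Black or African American", "White"],
})
print(binned_counts(toy, hue="race", bins=2))
```

`pd.cut` closes bins on the right by default, so a birth year that falls exactly on an edge (1950 here) is counted in the lower bin.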

In [None]:
_ = interact(
    plot_histogram,
    palette = widgets.Dropdown(
        options = ['pastel','husl','Set2','flare','crest','magma','icefire']
    ),
    hue = widgets.ToggleButtons(
        options = ['race','gender'],
        disabled = False,
        button_style = 'success'
    ),
    bins = widgets.IntSlider(
        value = 10,
        min = 3,
        max = 15,
        step = 1
    ),
    x_range_1 = widgets.IntRangeSlider(
        value = [1900,2000], 
        min = 1900,
        max = 2000,
    ),
)

## Provenance

Generate information about this notebook environment and the packages installed.

In [None]:
!date

Conda and pip installed packages:


In [None]:
!conda env export

JupyterLab extensions:

In [None]:
!jupyter labextension list

Number of cores:

In [None]:
!grep ^processor /proc/cpuinfo | wc -l

Memory:

In [None]:
!grep "^MemTotal:" /proc/meminfo
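The shell commands above assume a Linux VM with `conda` on the path. A portable, stdlib-only sketch of similar provenance checks is below: `os.cpu_count()` for cores, `importlib.metadata.version` for the packages this notebook imports, and a small parser for the `MemTotal` line of `/proc/meminfo` (Linux only). The helper names are hypothetical.

```python
import datetime
import os
import platform
from importlib import metadata

PACKAGES = ["pandas", "seaborn", "matplotlib", "ipywidgets", "google-cloud-bigquery"]


def package_versions(names):
    """Hypothetical helper: map each distribution name to its installed version."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def mem_total_kb(meminfo_text):
    """Hypothetical helper: return the MemTotal value (in kB) from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    return None


print("date:", datetime.datetime.now().isoformat())
print("python:", platform.python_version())
print("cores:", os.cpu_count())
print("packages:", package_versions(PACKAGES))
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print("MemTotal (kB):", mem_total_kb(f.read()))
```

Unlike `conda env export`, this lists only the named packages, which keeps the output short but misses transitive dependencies.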

---

Copyright 2023 Verily Life Sciences LLC

Use of this source code is governed by a BSD-style  
license that can be found in the LICENSE file or at  
https://developers.google.com/open-source/licenses/bsd
