# Execute VWB Data Explorer queries

This notebook shows how to view and further analyze cohorts built in [VWB Data Explorer](https://tanagra-dev.api.verily.com/#/underlays/cms_synpuf).

This notebook guides you through:
  1. Running a cohort query in BigQuery and saving the results to a dataframe.
  2. Displaying simple visualizations of the resulting data.
  3. Viewing exported VWB Data Explorer files in a dataframe.
  
> If you are **previewing** this notebook from Verily Workbench, please create a cloud environment and look for this file in the `repos/terra-axon-examples/omop_examples/` directory. Instructions for creating a cloud environment are available in the workspace description.

In [None]:
import pandas as pd
import os
from google.cloud import bigquery

# Enable IPython to display matplotlib graphs.
import matplotlib.pyplot as plt
%matplotlib inline

## Copy and paste the query from VWB Data Explorer below

In [None]:
query = """

"""
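The cell above is an empty placeholder, and running the later cells before pasting a query would send a blank string to BigQuery. A minimal sketch of a guard is shown below; `require_query` is a hypothetical helper introduced here for illustration, not part of the original notebook or of the BigQuery client library.

```python
def require_query(query: str) -> str:
    """Return the query with surrounding whitespace removed, or raise if it is blank."""
    cleaned = query.strip()
    if not cleaned:
        raise ValueError(
            "Paste the SQL exported from VWB Data Explorer into `query` first."
        )
    return cleaned


# Stand-in query string for illustration only (not a real Data Explorer export).
sample_query = """
SELECT 1 AS example_column
"""
print(require_query(sample_query))
```

Calling `require_query(query)` before submitting the job would surface a missing query as a clear local error.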

In [None]:
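# Default job configuration; set options such as query parameters here if needed.
job_query_config = bigquery.QueryJobConfig()
# The client picks up the credentials and project of the current environment.
client = bigquery.Client()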

In [None]:
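# Run the query, wait for it to finish, and load the rows into a dataframe.
df = client.query(query, job_config=job_query_config).result().to_dataframe()
df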

## Display visualizations

Now that the results are stored in a dataframe, we can use libraries like pandas and matplotlib to display visualizations.

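The examples below are bar charts of how many rows fall into each gender and ethnicity category. They assume the query results include the `t_display_gender` and `t_display_ethnicity` columns.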

In [None]:
df.t_display_gender.value_counts().plot(kind='bar')

In [None]:
df.t_display_ethnicity.value_counts().plot(kind='bar')
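The plots above fail with an `AttributeError` if the pasted query does not return the expected columns. The sketch below checks for the column first and computes the counts that the bar chart draws. The helper name `category_counts` and the toy dataframe are assumptions made for illustration; the real `df` comes from the query.

```python
import pandas as pd


def category_counts(frame: pd.DataFrame, column: str) -> pd.Series:
    """Count rows per category in `column`, keeping missing values as a category."""
    if column not in frame.columns:
        raise KeyError(
            f"Column {column!r} not found; available columns: {list(frame.columns)}"
        )
    # dropna=False keeps rows with a missing value instead of silently dropping them.
    return frame[column].value_counts(dropna=False)


# Toy data standing in for the query results.
toy_df = pd.DataFrame({"t_display_gender": ["FEMALE", "MALE", "FEMALE", None]})
counts = category_counts(toy_df, "t_display_gender")
print(counts)
```

The resulting series can be plotted the same way as above, for example with `counts.plot(kind='bar')`.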

## View exported VWB Data Explorer files

1. Set the `BUCKET_NAME` variable to the name of the bucket resource containing the exports.
2. Run the code.

In [None]:
BUCKET_NAME = "??"

In [None]:
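def find_first_csv_string(strings_list):
    """Return the first entry ending in ".csv", or None if there is none."""
    for string in strings_list:
        if string.endswith(".csv"):
            return string
    return None

# Resolve the workspace resource name to its cloud storage path.
terra_resource_output = !terra resource resolve --name={BUCKET_NAME}
gcs_bucket = terra_resource_output[0]
gcs_path = f"{gcs_bucket}/storage.googleapis.com/verily-tanagra-dev-export-bucket/"
# List the export folder and pick the first CSV file.
filelist = !gsutil ls {gcs_path}
csv_full_path = find_first_csv_string(filelist)
if csv_full_path is None:
    raise FileNotFoundError(f"No .csv files found under {gcs_path}")
# Copy the CSV into the current working directory.
!gsutil cp {csv_full_path} .
csv_filename = os.path.basename(csv_full_path)
csv_filename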
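`find_first_csv_string` returns only the first match, so a folder holding several exports yields whichever CSV happens to be listed first. A small sketch that collects every CSV path instead is shown below. The helper name `find_csv_paths` and the sample listing are hypothetical; a real listing comes from `gsutil ls`.

```python
def find_csv_paths(listing):
    """Return every entry in a `gsutil ls`-style listing that ends with .csv, in order."""
    return [line.strip() for line in listing if line.strip().endswith(".csv")]


# Hypothetical listing for illustration; paths are placeholders, not real exports.
sample_listing = [
    "gs://example-bucket/exports/",
    "gs://example-bucket/exports/cohort_a.csv",
    "gs://example-bucket/exports/notes.txt",
    "gs://example-bucket/exports/cohort_b.csv",
]
csv_paths = find_csv_paths(sample_listing)
print(csv_paths)
```

Printing the full list makes it easier to pick a specific export rather than relying on listing order.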

In [None]:
csv_df = pd.read_csv(csv_filename)
csv_df