<a href="https://colab.research.google.com/github/MIT-LCP/2019_hack_aotearoa_eicu/blob/master/01_access_the_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# eICU Collaborative Research Database

# Notebook 1: Accessing the data

The aim of this notebook is to get set up with access to a demo version of the [eICU Collaborative Research Database](http://eicu-crd.mit.edu/). The demo is a subset of the full database, limited to 100 patients.

## Prerequisites

- If you do not have a Gmail account, please create one at http://www.gmail.com. 
- If you have not yet signed the data use agreement (DUA) sent by the organizers, please do so now to get access to the dataset.

## Setup

To run the queries in this notebook, you will need to create a copy by clicking *"File" > "Save a copy in Drive..."* from the menu. Before running a cell in the notebook, check for the green "CONNECTED" check mark in the top right corner.

Before working with the data, you need to run some initialization code. To run the cell below, hover over the [ ] space in its top-left corner and click the triangle button that appears.

In [0]:
# Import libraries
import numpy as np
import os
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.path as path

# Make pandas dataframes prettier
from IPython.display import display, HTML

# Access data using Google BigQuery.
from google.colab import auth
from google.cloud import bigquery

Before running any queries, you first need to authenticate by running the following cell. The first time you run it, it will ask you to follow a link, log in with your Gmail account, and accept the data access requests for your profile. It will then generate a verification code, which you should paste into the cell below before pressing Enter.


In [0]:
auth.authenticate_user()
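
The `google.colab` package is only available inside Colab, so the cell above fails elsewhere. The following is a minimal sketch, not part of the original notebook, of one way to guard the call. The helper names `running_in_colab` and `authenticate` are hypothetical. Outside Colab, the sketch assumes that Application Default Credentials have already been configured (for example with `gcloud auth application-default login`), which the BigQuery client picks up on its own.

```python
import importlib.util


def running_in_colab():
    """Return True if the google.colab package can be located (hypothetical helper)."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # ImportError: the parent "google" package is not installed at all.
        # ValueError: a module with that name is loaded but has no usable spec.
        return False


def authenticate():
    """Authenticate in Colab; elsewhere, leave it to Application Default Credentials."""
    if running_in_colab():
        from google.colab import auth
        auth.authenticate_user()
        return "colab"
    # Assumption: credentials were set up outside the notebook beforehand.
    return "application-default"
```

Calling `authenticate()` in place of `auth.authenticate_user()` keeps the same behavior in Colab while letting the rest of the notebook run in a local Jupyter session.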

## Querying the dataset

Now we are ready to load the data from the cloud server. The data-hosting project `physionet-data` gives you read-only access to the eICU Collaborative Research Database demo dataset. Let's see which datasets are available in this project.

In [0]:
project_id = 'physionet-data'
os.environ["GOOGLE_CLOUD_PROJECT"] = project_id

In [0]:
# create a connection to the database
client = bigquery.Client(project=project_id)

# load the dataset list
datasets = client.list_datasets()

# iterate the datasets list
for dataset in datasets:
    did = dataset.dataset_id
    # print the dataset name
    print('Dataset "{}" has the following tables: '.format(did))
    # iterate the tables on the dataset
    for table in client.list_tables(dataset.reference):
        # print the table name
        print('- {}'.format(table.table_id))
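
The printed listing is convenient to read but awkward to reuse. As a hedged sketch, the same traversal can be collected into a dictionary that maps each dataset ID to its table IDs. The function name `tables_by_dataset` is hypothetical. It relies only on `list_datasets`, `list_tables`, `dataset_id`, `reference` and `table_id`, and it takes the client as an argument so that it can be tried with any object offering those methods.

```python
def tables_by_dataset(client):
    """Map each dataset ID visible to the client to a sorted list of its table IDs."""
    listing = {}
    for dataset in client.list_datasets():
        # dataset.reference identifies the dataset without building it by name
        tables = client.list_tables(dataset.reference)
        listing[dataset.dataset_id] = sorted(table.table_id for table in tables)
    return listing
```

With the `client` created above, `tables_by_dataset(client)` returns the listing as a plain dictionary, which can then be searched or turned into a pandas DataFrame.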