# MIMIC-IV: Summary statistics

This notebook shows how summary statistics can be computed for a patient cohort using the `tableone` package. Usage instructions for tableone are at: https://pypi.org/project/tableone/

## Load libraries and connect to the database

In [0]:
# Import libraries
import numpy as np
import os
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.path as path

# Make pandas dataframes prettier
from IPython.display import display, HTML

# Access data using Google BigQuery.
from google.colab import auth
from google.cloud import bigquery

In [0]:
# authenticate
auth.authenticate_user()

In [0]:
# Set the Google Cloud project used to run (and bill) the BigQuery queries
project_id = 'tdothealthhack-team'
os.environ["GOOGLE_CLOUD_PROJECT"] = project_id

In [0]:
# Helper function to read data from BigQuery into a DataFrame.
def run_query(query):
    return pd.io.gbq.read_gbq(query, project_id=project_id, dialect="standard")
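The notebook imports `google.cloud.bigquery` but the helper above goes through pandas, and pandas 2.2 deprecated `read_gbq` in favour of the separate `pandas-gbq` package. The following is a sketch of an equivalent helper built on the BigQuery client library. It assumes `client` is a `google.cloud.bigquery.Client` (for example `bigquery.Client(project=project_id)`). `run_query_with_client` is a hypothetical name, and the client is passed in as an argument so the function can be exercised without credentials.

```python
import pandas as pd


def run_query_with_client(query: str, client) -> pd.DataFrame:
    """Run a query and return the result as a DataFrame.

    Hypothetical helper: `client` is assumed to be a google.cloud.bigquery.Client.
    Client.query() submits the job (the client uses standard SQL by default),
    and QueryJob.to_dataframe() waits for the job and downloads the rows.
    """
    job = client.query(query)
    return job.to_dataframe()


# Usage in the Colab notebook (requires the authentication step above):
# from google.cloud import bigquery
# client = bigquery.Client(project=project_id)
# cohort = run_query_with_client(query, client)
```

Passing the client in, rather than creating it inside the function, keeps one authenticated client for the whole session.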

## Install and load the `tableone` package

The `tableone` package creates summary statistics for a patient cohort. Unlike the libraries imported above, it isn't installed by default in Colab, so we need to install it first.

In [0]:
!pip install tableone

In [0]:
# Import the tableone class
from tableone import TableOne

## Load the patient cohort

Let's load some basic demographic details from the `patients` and `admissions` tables, along with each admission's in-hospital mortality flag (`hospital_expire_flag`).

In [0]:
# Link the patients and admissions tables on subject_id
# using an inner join.
query = """
SELECT p.gender, p.anchor_age, a.admission_type,
    a.insurance, a.ethnicity, a.hospital_expire_flag
FROM `physionet-data.mimic_core.patients` p
INNER JOIN `physionet-data.mimic_core.admissions` a
ON p.subject_id = a.subject_id
"""

cohort = run_query(query)
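Because `admissions` has one row per hospital stay, this inner join returns one row per admission rather than one per patient: a patient with several stays contributes several rows, and a patient with no admission row is dropped. The toy frames below are made up for illustration and show the same join in pandas with `merge(..., how="inner")`.

```python
import pandas as pd

# Made-up stand-ins for mimic_core.patients and mimic_core.admissions.
patients = pd.DataFrame({
    "subject_id": [1, 2, 3],
    "gender": ["F", "M", "F"],
    "anchor_age": [64, 50, 71],
})
admissions = pd.DataFrame({
    "subject_id": [1, 1, 2],  # subject 1 has two stays; subject 3 has none
    "admission_type": ["URGENT", "ELECTIVE", "URGENT"],
    "hospital_expire_flag": [0, 1, 0],
})

# Equivalent of: FROM patients p INNER JOIN admissions a ON p.subject_id = a.subject_id
cohort_toy = patients.merge(admissions, on="subject_id", how="inner")

print(len(cohort_toy))                     # 3 rows: one per admission
print(cohort_toy["subject_id"].nunique())  # 2 patients: subject 3 is dropped
```

If the table should describe patients rather than admissions, the query would need to keep a single admission per `subject_id` (for example the first one); as written, the statistics summarise admissions.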

In [0]:
cohort.head()

## Calculate summary statistics

In [0]:
columns = ['gender', 'anchor_age', 'admission_type', 'ethnicity', 'insurance']

categorical = ['gender', 'admission_type', 'ethnicity', 'insurance']
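When `categorical` is given, `TableOne` treats the remaining entries of `columns` as continuous, so here only `anchor_age` is summarised numerically. A sketch of deriving the same split from the column dtypes, using a made-up frame with the same columns as the cohort query:

```python
import pandas as pd

# Made-up rows with the same column types as the cohort query.
toy = pd.DataFrame({
    "gender": ["F", "M"],
    "anchor_age": [64, 50],
    "admission_type": ["URGENT", "ELECTIVE"],
    "ethnicity": ["WHITE", "ASIAN"],
    "insurance": ["Medicare", "Other"],
})

columns = ["gender", "anchor_age", "admission_type", "ethnicity", "insurance"]

# Numeric columns -> continuous; everything else in `columns` -> categorical.
continuous = [c for c in columns if pd.api.types.is_numeric_dtype(toy[c])]
categorical_toy = [c for c in columns if c not in continuous]

print(continuous)       # ['anchor_age']
print(categorical_toy)  # ['gender', 'admission_type', 'ethnicity', 'insurance']
```

A dtype rule like this would misclassify numerically coded categories (a 0/1 flag, for instance), which is one reason to list the categorical columns explicitly as the notebook does.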

In [0]:
TableOne(cohort, columns=columns, categorical=categorical,
         groupby='hospital_expire_flag',
         label_suffix=True, limit=5)
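For each level of `hospital_expire_flag`, the resulting table reports n (%) for categorical variables and mean (SD) for continuous ones. `limit=5` keeps only the five most frequent levels of each categorical variable, and `label_suffix=True` appends the statistic type (for example ", n (%)") to each row label. The pandas sketch below approximates those cells on made-up data. It illustrates the arithmetic and is not a reimplementation of `tableone`, which also adds an Overall column, missing-value counts and, optionally, p-values. `top_n_percent` and `mean_sd` are hypothetical helper names.

```python
import pandas as pd

# Made-up cohort; hospital_expire_flag is the grouping variable (1 = died in hospital).
toy = pd.DataFrame({
    "anchor_age": [60, 70, 80, 50, 40, 90],
    "insurance": ["Medicare", "Medicare", "Other", "Medicaid", "Other", "Medicare"],
    "hospital_expire_flag": [0, 0, 0, 0, 1, 1],
})


def top_n_percent(df, col, group, n=5):
    """'count (percent)' per group for the n most frequent levels of `col`."""
    top_levels = df[col].value_counts().index[:n]        # like limit=n
    counts = pd.crosstab(df[col], df[group]).loc[top_levels]
    pct = counts / df.groupby(group).size() * 100         # percent of each group
    return counts.astype(str) + " (" + pct.round(1).astype(str) + ")"


def mean_sd(df, col, group):
    """'mean (SD)' per group, using the sample standard deviation."""
    stats = df.groupby(group)[col].agg(["mean", "std"])
    return stats["mean"].round(1).astype(str) + " (" + stats["std"].round(1).astype(str) + ")"


print(mean_sd(toy, "anchor_age", "hospital_expire_flag"))
print(top_n_percent(toy, "insurance", "hospital_expire_flag", n=2))
```

With `n=2`, the least frequent level (`Medicaid`) is omitted, just as `limit=5` hides rarer ethnicity or insurance categories in the real table.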