# patient

The patient table is a core part of the eICU-CRD and contains all information related to tracking patient unit stays. The table also contains patient demographics and hospital-level information.

In [17]:
! pip3 install scikit-survival

In [8]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import psycopg2
import getpass
import pdvega

# for configuring connection 
from configobj import ConfigObj
import os

%matplotlib inline

In [9]:
# Create a database connection using settings from config file
config='../db/config.ini'

# connection info
conn_info = dict()
if os.path.isfile(config):
    config = ConfigObj(config)
    conn_info["sqluser"] = config['username']
    conn_info["sqlpass"] = config['password']
    conn_info["sqlhost"] = config['host']
    conn_info["sqlport"] = config['port']
    conn_info["dbname"] = config['dbname']
    conn_info["schema_name"] = config['schema_name']
else:
    conn_info["sqluser"] = 'postgres'
    conn_info["sqlpass"] = ''
    conn_info["sqlhost"] = 'localhost'
    conn_info["sqlport"] = 5432
    conn_info["dbname"] = 'eicu'
    conn_info["schema_name"] = 'public,eicu_crd'
    
# Connect to the eICU database
print('Database: {}'.format(conn_info['dbname']))
print('Username: {}'.format(conn_info["sqluser"]))
con = None
if conn_info["sqlpass"] == '':
    # try connecting without password, i.e. peer or OS authentication
    try:
        # the port may be an int (default above) or a string (config file)
        if conn_info["sqlhost"] == 'localhost' and str(conn_info["sqlport"]) == '5432':
            con = psycopg2.connect(dbname=conn_info["dbname"],
                                   user=conn_info["sqluser"])
        else:
            con = psycopg2.connect(dbname=conn_info["dbname"],
                                   host=conn_info["sqlhost"],
                                   port=conn_info["sqlport"],
                                   user=conn_info["sqluser"])
    except psycopg2.OperationalError:
        # passwordless login failed, so prompt for a password
        conn_info["sqlpass"] = getpass.getpass('Password: ')

if con is None:
    # connect with the password from the config file or the prompt
    con = psycopg2.connect(dbname=conn_info["dbname"],
                           host=conn_info["sqlhost"],
                           port=conn_info["sqlport"],
                           user=conn_info["sqluser"],
                           password=conn_info["sqlpass"])
query_schema = 'set search_path to ' + conn_info['schema_name'] + ';'

Database: eicu
Username: postgres


Password:  ·······


## uniquePid

The `uniquePid` column identifies a single patient across multiple stays. Let's look at a single `uniquepid`.

In [10]:
uniquepid = '002-33870'
query = query_schema + """
select *
from patient
where uniquepid = '{}'
""".format(uniquepid)

df = pd.read_sql_query(query, con)
df.head()

Here we see two unit stays for a single patient. Note also that both unit stays have the same `patienthealthsystemstayid` - this indicates that they occurred within the same hospitalization.

We can see the `unitstaytype` was 'admit' for one stay, and 'stepdown/other' for another. Other columns can give us more information.

In [7]:
df[['patientunitstayid', 'wardid', 'unittype', 'unitstaytype', 'hospitaladmitoffset', 'unitdischargeoffset']]

| | patientunitstayid | wardid | unittype | unitstaytype | hospitaladmitoffset | unitdischargeoffset |
|---|---|---|---|---|---|---|
| 0 | 141178 | 83 | Med-Surg ICU | admit | -14 | 8 |
| 1 | 141179 | 83 | Med-Surg ICU | stepdown/other | -22 | 2042 |


Note that it is not immediately obvious which stay occurred first. `hospitaladmitoffset` is the number of minutes from unit admission to hospital admission, so it is negative when the patient was admitted to the hospital before the unit. Earlier stays are closer to hospital admission, and therefore have a *higher* (less negative) `hospitaladmitoffset`. Above, the stay with a `hospitaladmitoffset` of -14 was first (beginning 14 minutes after hospital admission), followed by the stay with a `hospitaladmitoffset` of -22 (beginning 22 minutes after hospital admission). Practically, we wouldn't consider the first admission a "real" ICU stay: its `unitdischargeoffset` is only 8 minutes, and it is likely an idiosyncrasy of the administration system at this particular hospital. Notice how both rows have the same `wardid`.
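The ordering rule can be checked directly in pandas. The sketch below rebuilds the two rows from the table above by hand (so it runs without a database connection) and sorts them so the earliest stay comes first. The `stay_number` column is a hypothetical helper name, not a column in eICU-CRD.

```python
import pandas as pd

# The two unit stays from the table above, entered by hand
stays = pd.DataFrame({
    "patientunitstayid": [141179, 141178],
    "unitstaytype": ["stepdown/other", "admit"],
    "hospitaladmitoffset": [-22, -14],
    "unitdischargeoffset": [2042, 8],
})

# Offsets are negative minutes relative to unit admission, so the
# earliest stay has the highest (least negative) hospitaladmitoffset
ordered = stays.sort_values("hospitaladmitoffset", ascending=False)
ordered = ordered.reset_index(drop=True)

# stay_number is a hypothetical helper column: 1 = first stay
ordered["stay_number"] = range(1, len(ordered) + 1)
print(ordered[["patientunitstayid", "hospitaladmitoffset", "stay_number"]])
```

For a patient with several hospitalizations, the same sort would presumably need to be applied within each `patienthealthsystemstayid`, since offsets are only comparable inside one hospital stay.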

## Age

As ages over 89 are required to be deidentified by HIPAA, the `age` column is actually a string field, with ages over 89 replaced with the string value '> 89'.

In [8]:
query = query_schema + """
select age, count(*) as n
from patient
group by age
order by n desc
"""

df = pd.read_sql_query(query, con)
df.head()

| | age | n |
|---|---|---|
| 0 | > 89 | 7081 |
| 1 | 67 | 5078 |
| 2 | 68 | 4826 |
| 3 | 72 | 4804 |
| 4 | 71 | 4764 |
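Because `age` is stored as a string, it needs converting before any numeric analysis. The following is a minimal sketch using the five age values shown above. Mapping '> 89' to 90 is an assumption made for illustration only (the true ages are not recoverable), and `is_over_89` and `age_numeric` are hypothetical names.

```python
import pandas as pd

# The age values from the query output above, as strings
ages = pd.Series(["> 89", "67", "68", "72", "71"], name="age")

# Keep a flag for the deidentified group so it can be handled separately
is_over_89 = ages.str.strip() == "> 89"

# Assumption: represent '> 89' as 90. Any other non-numeric value,
# if present, becomes NaN rather than raising an error.
age_numeric = pd.to_numeric(ages.where(~is_over_89, "90"), errors="coerce")

print(age_numeric.tolist())
```

Keeping the `is_over_89` flag alongside the numeric column makes it possible to exclude or separately model the censored group later.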


As is common in eICU-CRD, a subset of hospitals routinely use this portion of the medical record (and thus have 90-100% data completion), while other hospitals rarely use this interface and thus have poor data completion (0-10%).
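One way to see this pattern is to compute the percentage of non-missing values per hospital. The sketch below uses a small synthetic frame rather than real eICU-CRD data. `hospitalid` is a column of the patient table, while `some_field` is a hypothetical placeholder for whichever column is being assessed.

```python
import pandas as pd

# Synthetic example rows; some_field is a hypothetical column name
df = pd.DataFrame({
    "hospitalid": [1, 1, 1, 2, 2, 2],
    "some_field": [5.0, 3.2, 4.1, None, None, None],
})

# Percentage of non-missing values of some_field within each hospital
completion = df["some_field"].notna().groupby(df["hospitalid"]).mean() * 100

print(completion)
```

In this toy data, hospital 1 records the field for every stay and hospital 2 never does, which mirrors the two extremes described above.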