## What relationships exist between type of admission, admission diagnosis, length of stay, age, gender, readmission within 30 days, amount of bedside attention (e.g. patient is bathed), and occurrence of sepsis?

EDA: Pairplot of every pairing of the eight attributes listed above. Data from ADMISSIONS, PATIENTS, and CHARTEVENTS has to be merged and the derived attributes added. The data is primarily segmented by ADMISSIONS record. Visualization: Seaborn.

In [1]:
# Imports for DFs & connecting to Postgres
import pandas as pd
import psycopg2

- X type of admission -> admissions.admission_type
- X admission diagnosis -> admissions.diagnosis
- X length of stay -> admissions.dischtime minus admissions.admittime
- 0 readmission within 30 days -> admissions.admittime? (will have to calculate for each admission whether another admission took place within 30 days)
- ? age -> patients.dob (will have to calculate age at time of admission)
- X gender -> patients.gender
- ! amount of bedside attention (e.g. patient is bathed) -> chartevents.itemid (labels looked up in d_items.label)
- X occurrence of sepsis -> multiple

### Pull in Admissions data

In [2]:
# Connect to Postgres & get all records for ADMISSIONS
try:
    con = psycopg2.connect("host='localhost' dbname='mimic' user='postgres' password='postgres'")
    cur = con.cursor()
    cur.execute ("""SELECT * FROM mimiciii.admissions;""")
    con.commit()
    print('OK')
except Exception as e:
    print(e)  

OK


In [3]:
# Store ADMISSIONS result in var
admissions_all = cur.fetchall()

In [4]:
# Convert ADMISSIONS result to DF
admissions_df = pd.DataFrame(admissions_all, columns = ['row_id','subject_id', 'hadm_id', 'admittime', 'dischtime', 'deathtime',
 'admission_type', 'admission_location', 'discharge_location',
 'insurance', 'language', 'religion', 'marital_status', 'ethnicity',
 'edregtime', 'edouttime', 'diagnosis', 'hospital_expire_flag',
 'has_chartevents_data'])
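
The connect / execute / fetchall / DataFrame steps repeat for every table in this notebook. A minimal sketch of a helper that folds them into one call is shown here. The name `query_to_df` is hypothetical, and the demo uses an in-memory `sqlite3` database only as a stand-in so the snippet runs without Postgres; with this notebook's setup the `con` argument would be the `psycopg2` connection. Column names are read from `cur.description` instead of being typed by hand.

```python
import sqlite3
import pandas as pd

def query_to_df(con, sql, params=None):
    """Run a query on a DB-API connection and return the rows as a DataFrame.

    Hypothetical helper, not part of the original notebook.
    """
    cur = con.cursor()
    try:
        # Only pass params when given, so literal % signs in LIKE patterns are untouched
        if params is None:
            cur.execute(sql)
        else:
            cur.execute(sql, params)
        # The first field of each description entry is the column name
        columns = [col[0] for col in cur.description]
        return pd.DataFrame(cur.fetchall(), columns=columns)
    finally:
        cur.close()

# Stand-in demo: a tiny in-memory table instead of mimiciii.admissions
demo_con = sqlite3.connect(':memory:')
demo_con.execute("CREATE TABLE admissions (subject_id INTEGER, hadm_id INTEGER, admission_type TEXT)")
demo_con.execute("INSERT INTO admissions VALUES (22, 165315, 'EMERGENCY')")
demo_df = query_to_df(demo_con, "SELECT * FROM admissions")
print(demo_df.columns.tolist())
```

With Postgres, the call would look like `query_to_df(con, "SELECT * FROM mimiciii.admissions;")`, assuming `con` is an open connection.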

In [5]:
# Create shortened DF of relevant cols
admissions_short = admissions_df[['subject_id', 'hadm_id', 'admission_type', 'diagnosis', 'dischtime', 'admittime']]

### Pull in Patients data

In [6]:
# Connect to Postgres & get all records for PATIENTS
try:
    con = psycopg2.connect("host='localhost' dbname='mimic' user='postgres' password='postgres'")
    cur = con.cursor()
    cur.execute ("""SELECT * FROM mimiciii.patients;""")
    con.commit()
    print('OK')
except Exception as e:
    print(e)

OK


In [7]:
# Store PATIENTS result in var
patients_all = cur.fetchall()

In [8]:
# Convert PATIENTS result to DF
patients_df = pd.DataFrame(patients_all, columns = ['row_id', 'subject_id', 'gender', 'dob', 'dod', 'dod_hosp', 'dod_ssn', 'expire_flag'])

In [9]:
# Create shortened DF of relevant cols
patients_short = patients_df[['subject_id', 'gender', 'dob']]

### Merge Patient details onto Admissions

In [10]:
# Merge shortened Patients DF onto shortened Admissions DF using 'subject_id'
adm_pat_merge = admissions_short.merge(patients_short, how='left', on='subject_id')

In [11]:
# Create new col to indicate length of stay, type is Timedelta
adm_pat_merge['adm_los'] = adm_pat_merge['dischtime']-adm_pat_merge['admittime']

In [12]:
# Add new col that converts timedelta to seconds & then to hours
adm_pat_merge['adm_los_hrs'] = adm_pat_merge['adm_los'].dt.total_seconds() / 3600

In [13]:
adm_pat_merge.head()

Unnamed: 0,subject_id,hadm_id,admission_type,diagnosis,dischtime,admittime,gender,dob,adm_los,adm_los_hrs
0,22,165315,EMERGENCY,BENZODIAZEPINE OVERDOSE,2196-04-10 15:54:00,2196-04-09 12:26:00,F,2131-05-07,1 days 03:28:00,27.466667
1,23,152223,ELECTIVE,CORONARY ARTERY DISEASE\CORONARY ARTERY BYPASS...,2153-09-08 19:10:00,2153-09-03 07:15:00,M,2082-07-17,5 days 11:55:00,131.916667
2,23,124321,EMERGENCY,BRAIN MASS,2157-10-25 14:00:00,2157-10-18 19:34:00,M,2082-07-17,6 days 18:26:00,162.433333
3,24,161859,EMERGENCY,INTERIOR MYOCARDIAL INFARCTION,2139-06-09 12:48:00,2139-06-06 16:14:00,M,2100-05-31,2 days 20:34:00,68.566667
4,25,129635,EMERGENCY,ACUTE CORONARY SYNDROME,2160-11-05 14:55:00,2160-11-02 02:06:00,M,2101-11-21,3 days 12:49:00,84.816667
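
Age at admission is listed among the attributes but is not yet derived. A minimal sketch of one way to compute it follows. It works from calendar years rather than subtracting the two timestamps, because some `dob` values sit about 300 years before the admission (a later sample shows a `dob` of 1832 against an admission in 2132), and a span that long does not fit in a nanosecond-resolution Timedelta (the limit is roughly 292 years). The function name `age_at_admission` is hypothetical.

```python
import pandas as pd

def age_at_admission(admittime, dob):
    """Whole years between dob and admittime, computed from calendar fields.

    Hypothetical helper; avoids Timedelta arithmetic on ~300-year spans.
    """
    admit = pd.to_datetime(admittime)
    birth = pd.to_datetime(dob)
    years = admit.dt.year - birth.dt.year
    # Subtract one year when the birthday has not yet occurred in the admission year
    before_birthday = (admit.dt.month < birth.dt.month) | (
        (admit.dt.month == birth.dt.month) & (admit.dt.day < birth.dt.day)
    )
    return years - before_birthday.astype(int)

# Two rows taken from the outputs in this notebook
demo = pd.DataFrame({
    'admittime': ['2196-04-09 12:26:00', '2132-02-01 22:13:00'],
    'dob': ['2131-05-07', '1832-02-01'],
})
demo['age'] = age_at_admission(demo['admittime'], demo['dob'])
print(demo['age'].tolist())
```

Ages near 300 would need separate handling before plotting, for example by capping or flagging them; which treatment is right depends on how the dataset documents those shifted birth dates.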


In [14]:
# adm_pat_merge_copy = adm_pat_merge.copy()

In [15]:
# adm_pat_merge_copy.head()

In [16]:
# adm_pat_merge['admittime'][0]

In [17]:
# xyz = adm_pat_merge['admittime'][0]+pd.Timedelta(days=30)

In [18]:
# adm_pat_merge['admittime'][0] <= xyz

In [19]:
# Create DF of admissions by patient to check against
# adm_subj = admissions_df[['subject_id','admittime']].sort_values(by='subject_id')

In [20]:
# def readmit(row):
#     subject_id = row[0]
#     admittime = row[5]
#     print(subject_id, admittime)

In [21]:
# adm_pat_merge.apply(readmit)

### Pull ICD9 codes for Sepsis

In [22]:
# Connect to Postgres & get all d_icd_diagnoses where short or long title indicates 'sepsis' or 'septicemia'
try:
    con = psycopg2.connect("host='localhost' dbname='mimic' user='postgres' password='postgres'")
    cur = con.cursor()
    cur.execute ("""SELECT icd9_code, short_title, long_title
	FROM mimiciii.d_icd_diagnoses
	WHERE long_title LIKE ANY(ARRAY['Sepsi%', 'Septi%','sepsi%', 'septi%', 'Severe sepsis', 'severe sepsis', 'Puerperal sep%', 'puerperal sep%']) OR
	short_title LIKE ANY(ARRAY['Sepsi%', 'Septi%','sepsi%', 'septi%', 'Severe sepsis', 'severe sepsis', 'Puerperal sep%', 'puerperal sep%']);""")
    con.commit()
    print('OK')
except Exception as e:
    print(e)  

OK


In [23]:
# Store ICD9_CODE result in var
sepsis_all = cur.fetchall()

In [24]:
# Convert ICD9_CODE result to DF
sepsis_df = pd.DataFrame(sepsis_all, columns = ['icd9_code', 'short_title', 'long_title'])

In [25]:
# Get list of relevant ICD9 codes, just here for reference since it's pasted below
sepsis_codes_list = sepsis_df['icd9_code'].to_list()
# sepsis_codes_list

### Pull Diagnoses ICD data to find admissions with sepsis

In [26]:
# Connect to Postgres & get all admissions where a sepsis code was used
try:
    con = psycopg2.connect("host='localhost' dbname='mimic' user='postgres' password='postgres'")
    cur = con.cursor()
    cur.execute ("""SELECT hadm_id, icd9_code
	FROM mimiciii.diagnoses_icd
	WHERE icd9_code = ANY(ARRAY['0383',
 '03840',
 '03841',
 '03842',
 '03843',
 '03844',
 '0388',
 '0389',
 '0202',
 '449',
 '41512',
 '42292',
 '65930',
 '65931',
 '65933',
 '77181',
 '99591',
 '99592',
 '78552',
 '67020',
 '67022',
 '67024',
 '67030',
 '67032',
 '67034']);""")
    con.commit()
    print('OK')
except Exception as e:
    print(e)  

OK
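
The code list above is pasted by hand from `sepsis_codes_list`. As a sketch of an alternative, the list can be passed as a query parameter, since `psycopg2` adapts a Python list to a Postgres `ARRAY`. The function name `fetch_sepsis_admissions` is hypothetical, and `con` is assumed to be an open `psycopg2` connection.

```python
import pandas as pd

SEPSIS_ADMISSIONS_SQL = """
    SELECT hadm_id, icd9_code
    FROM mimiciii.diagnoses_icd
    WHERE icd9_code = ANY(%s);
"""

def fetch_sepsis_admissions(con, codes):
    """Return (hadm_id, icd9_code) rows whose code is in `codes`.

    Hypothetical helper; `con` is assumed to be a psycopg2 connection.
    """
    with con.cursor() as cur:
        # The list is sent as a single parameter and arrives as a Postgres ARRAY
        cur.execute(SEPSIS_ADMISSIONS_SQL, (list(codes),))
        rows = cur.fetchall()
    return pd.DataFrame(rows, columns=['hadm_id', 'icd9_code'])
```

Usage would be `adm_sepsis_df = fetch_sepsis_admissions(con, sepsis_codes_list)`, which keeps the query in sync with whatever the `d_icd_diagnoses` lookup returned.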


In [27]:
# Store DIAGNOSES_ICD result in var
adm_sepsis_all = cur.fetchall()

In [28]:
# Convert DIAGNOSES_ICD result to DF
adm_sepsis_df = pd.DataFrame(adm_sepsis_all, columns = ['hadm_id', 'icd9_code'])

In [29]:
# Create DF that contains every admission where sepsis was diagnosed & tally number of those diagnoses for the given admission
adm_sepsis_cnt = adm_sepsis_df.groupby(by='hadm_id').agg({'icd9_code':'count'})
# Rename column to 'sepsis_count'
adm_sepsis_cnt.rename(columns={'icd9_code':'sepsis_count'}, inplace=True)

### Merge count of sepsis into admission-patient DF

In [30]:
# Create new merged DF with admissions, patient, & sepsis count data
adm_pat_sep = adm_pat_merge.merge(adm_sepsis_cnt, how='left', on='hadm_id')
# Convert NaNs in 'sepsis_count' to zeroes
adm_pat_sep['sepsis_count'] = adm_pat_sep['sepsis_count'].fillna(0)

In [45]:
adm_pat_sep.sample(5)

Unnamed: 0,subject_id,hadm_id,admission_type,diagnosis,dischtime,admittime,gender,dob,adm_los,adm_los_hrs,sepsis_count
58141,93651,191388,EMERGENCY,FREE AIR,2132-02-02 17:25:00,2132-02-01 22:13:00,F,1832-02-01,0 days 19:12:00,19.2,0.0
44004,54006,130060,EMERGENCY,LEFT ANKLE INFECTION,2197-05-09 16:19:00,2197-04-17 13:13:00,F,2133-11-30,22 days 03:06:00,531.1,2.0
9182,9248,183800,ELECTIVE,ABDOMINAL AORTIC ANEURYSM,2174-05-06 12:24:00,2174-04-24 13:55:00,M,2092-05-05,11 days 22:29:00,286.483333,0.0
35166,30018,124022,EMERGENCY,IVH,2168-12-22 05:39:00,2168-12-10 10:51:00,M,2082-04-23,11 days 18:48:00,282.8,0.0
11279,6741,100191,EMERGENCY,INTRACRANIAL HEMORRHAGE,2146-11-25 16:30:00,2146-11-17 23:08:00,M,2063-06-10,7 days 17:22:00,185.366667,1.0
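
Since the question asks about the occurrence of sepsis rather than the number of sepsis codes, a yes/no column may be easier to use in the pairplot. A minimal sketch, assuming a `sepsis_count` column like the one above; the column name `had_sepsis` and the function name are hypothetical.

```python
import pandas as pd

def add_sepsis_flag(df, count_col='sepsis_count'):
    """Return a copy with an integer count and a boolean had_sepsis column."""
    out = df.copy()
    # Admissions with no matching sepsis code come through the left merge as NaN
    out[count_col] = out[count_col].fillna(0).astype(int)
    out['had_sepsis'] = out[count_col] > 0
    return out

# Three admissions from the sample above; None stands for "no sepsis code found"
demo = pd.DataFrame({
    'hadm_id': [191388, 130060, 100191],
    'sepsis_count': [None, 2.0, 1.0],
})
flagged = add_sepsis_flag(demo)
print(flagged['had_sepsis'].tolist())
```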


### Pull D_ITEMS data related to "bedside patient care"

In [32]:
# Connect to Postgres & get all d_items where label indicates "bedside patient care"
try:
    con = psycopg2.connect("host='localhost' dbname='mimic' user='postgres' password='postgres'")
    cur = con.cursor()
    cur.execute ("""SELECT itemid, label
	FROM mimiciii.d_items
	WHERE label LIKE ANY(ARRAY['Dressing','Dressing Applied%','Dressing Change','Dressing change','catheter reposi%',
    'Catheter','bath%','show%','shav%', 'teeth%', 'bedbath', 'bed/bath%','activi%', 'Food%']);""")
    con.commit()
    print('OK')
except Exception as e:
    print(e)  

OK


In [33]:
# Store D_ITEMS result in var
items_care_all = cur.fetchall()

In [34]:
# Convert D_ITEMS result to DF
items_care_df = pd.DataFrame(items_care_all, columns = ['itemid', 'label'])

In [35]:
# Get list of relevant itemids, just here for reference since it's pasted below
items_care_list = items_care_df['itemid'].to_list()

In [46]:
# items_care_list

### Get all "bedside patient care" entries from CHARTEVENTS

In [37]:
# Connect to Postgres & get all chartevents where itemid matches those from "bedside patient care" list
try:
    con = psycopg2.connect("host='localhost' dbname='mimic' user='postgres' password='postgres'")
    cur = con.cursor()
    cur.execute ("""SELECT subject_id, hadm_id, icustay_id, itemid
	FROM mimiciii.chartevents
	WHERE itemid = ANY(ARRAY[1053,
 1063,
 1066,
 1202,
 5548,
 7896,
 4605,
 1382,
 3058,
 5678,
 3013,
 3014,
 6269,
 7652,
 44555,
 228482,
 227955]);""")
    con.commit()
    print('OK')
except Exception as e:
    print(e) 

OK


In [38]:
# Store CHARTEVENTS result in var
chart_care_all = cur.fetchall()

In [39]:
chart_care_all

[]

### No "bedside patient care" entries found in CHARTEVENTS, which is unexpected.
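
Two things could be checked before concluding the events are absent: whether `chartevents` holds any rows at all in this database, and which event table each selected item is recorded in. The sketch below assumes `d_items` has a `linksto` column naming that table; that column, the function name `diagnose_empty_chartevents`, and the idea that either check explains the empty result are all assumptions to verify against the local schema.

```python
import pandas as pd

def diagnose_empty_chartevents(con, itemids):
    """Return (chartevents_has_rows, DataFrame of itemid/label/linksto).

    Hypothetical helper; `con` is assumed to be a psycopg2 connection.
    """
    with con.cursor() as cur:
        # EXISTS stops at the first row, so it avoids counting the whole table
        cur.execute("SELECT EXISTS (SELECT 1 FROM mimiciii.chartevents LIMIT 1);")
        has_rows = bool(cur.fetchone()[0])
        # linksto is assumed to name the event table where each item is stored
        cur.execute(
            "SELECT itemid, label, linksto FROM mimiciii.d_items WHERE itemid = ANY(%s);",
            (list(itemids),),
        )
        items = pd.DataFrame(cur.fetchall(), columns=['itemid', 'label', 'linksto'])
    return has_rows, items
```

Calling `diagnose_empty_chartevents(con, items_care_list)` would show whether the table is empty and whether any of the items point to a table other than `chartevents`.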

## Check if readmitted within 30 calendar days

In [58]:
adm_pat_sep_copy = adm_pat_sep.copy()

In [76]:
# adm_pat_sep['readmit_30'] = adm_pat_sep.apply(lambda row: [6])
# adm_pat_sep['readmit_30'] = 'yes'

In [77]:
# adm_pat_sep.head()

Unnamed: 0,subject_id,hadm_id,admission_type,diagnosis,dischtime,admittime,gender,dob,adm_los,adm_los_hrs,sepsis_count,readmit_30
0,22,165315,EMERGENCY,BENZODIAZEPINE OVERDOSE,2196-04-10 15:54:00,2196-04-09 12:26:00,F,2131-05-07,1 days 03:28:00,27.466667,0.0,
1,23,152223,ELECTIVE,CORONARY ARTERY DISEASE\CORONARY ARTERY BYPASS...,2153-09-08 19:10:00,2153-09-03 07:15:00,M,2082-07-17,5 days 11:55:00,131.916667,0.0,
2,23,124321,EMERGENCY,BRAIN MASS,2157-10-25 14:00:00,2157-10-18 19:34:00,M,2082-07-17,6 days 18:26:00,162.433333,0.0,
3,24,161859,EMERGENCY,INTERIOR MYOCARDIAL INFARCTION,2139-06-09 12:48:00,2139-06-06 16:14:00,M,2100-05-31,2 days 20:34:00,68.566667,0.0,
4,25,129635,EMERGENCY,ACUTE CORONARY SYNDROME,2160-11-05 14:55:00,2160-11-02 02:06:00,M,2101-11-21,3 days 12:49:00,84.816667,0.0,


In [90]:
len(adm_pat_sep['subject_id'])

58976

In [88]:
adms_w_readmit_30 = []

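for index, row in adm_pat_sep.iterrows():
    hadm_id = row[1]
    subject_id = row[0]
    admittime = row[5]
    # admittime_30 is computed but not yet used in the check below
    admittime_30 = admittime+pd.Timedelta(days=30)
    
    # Note: `in` on a Series tests the index labels, not the values,
    # so this only checks whether subject_id happens to match a row label
    if subject_id in adm_pat_sep_copy['subject_id']:
        adms_w_readmit_30.append(hadm_id)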
    
    

In [91]:
len(adms_w_readmit_30)

45912

In [93]:
adms_w_readmit_30[:10]

[165315,
 152223,
 124321,
 161859,
 129635,
 197661,
 134931,
 162569,
 104557,
 128652]

In [104]:
def readmit(row):
    subject_id = row[0]
    admittime = row[5]
#     print(subject_id, admittime)
    return 'yes'

In [105]:
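# Note: without axis=1, apply() passes each column (not each row) to readmit(),
# so the result is indexed by column name and aligns to NaN when assigned
adm_pat_sep['readmit_30'] = adm_pat_sep.apply(lambda row: readmit(row))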

In [106]:
adm_pat_sep.head()

Unnamed: 0,subject_id,hadm_id,admission_type,diagnosis,dischtime,admittime,gender,dob,adm_los,adm_los_hrs,sepsis_count,readmit_30
0,22,165315,EMERGENCY,BENZODIAZEPINE OVERDOSE,2196-04-10 15:54:00,2196-04-09 12:26:00,F,2131-05-07,1 days 03:28:00,27.466667,0.0,
1,23,152223,ELECTIVE,CORONARY ARTERY DISEASE\CORONARY ARTERY BYPASS...,2153-09-08 19:10:00,2153-09-03 07:15:00,M,2082-07-17,5 days 11:55:00,131.916667,0.0,
2,23,124321,EMERGENCY,BRAIN MASS,2157-10-25 14:00:00,2157-10-18 19:34:00,M,2082-07-17,6 days 18:26:00,162.433333,0.0,
3,24,161859,EMERGENCY,INTERIOR MYOCARDIAL INFARCTION,2139-06-09 12:48:00,2139-06-06 16:14:00,M,2100-05-31,2 days 20:34:00,68.566667,0.0,
4,25,129635,EMERGENCY,ACUTE CORONARY SYNDROME,2160-11-05 14:55:00,2160-11-02 02:06:00,M,2101-11-21,3 days 12:49:00,84.816667,0.0,


# STILL CANNOT GET READMIT_30 TO WORK
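
One possible way to derive the flag without a row-by-row loop is sketched here. It sorts admissions per patient, looks up each patient's next `admittime` with a grouped `shift(-1)`, and compares the gap to 30 days. This sketch measures the gap from `dischtime` to the next `admittime`, which is an assumption; the notes above only say "another admission took place within 30 days", so measuring from `admittime` instead may be what was intended. The function name `flag_readmit_30` and the two rows for subject 99 in the demo are made up for illustration.

```python
import pandas as pd

def flag_readmit_30(df, window_days=30):
    """Return a copy with a boolean readmit_30 column.

    True when the same subject_id has a later admission that starts within
    `window_days` of this admission's discharge. Hypothetical helper.
    """
    out = df.sort_values(['subject_id', 'admittime']).copy()
    # Next admission time for the same patient; NaT for the patient's last admission
    next_admit = out.groupby('subject_id')['admittime'].shift(-1)
    gap = next_admit - out['dischtime']
    # Comparisons against NaT evaluate to False, so last admissions are not flagged
    out['readmit_30'] = gap <= pd.Timedelta(days=window_days)
    # Restore the original row order
    return out.sort_index()

demo = pd.DataFrame({
    'subject_id': [23, 23, 99, 99],
    'hadm_id': [152223, 124321, 1, 2],  # subject 99 rows are invented for the demo
    'admittime': pd.to_datetime(['2153-09-03 07:15', '2157-10-18 19:34',
                                 '2150-01-01 08:00', '2150-01-15 08:00']),
    'dischtime': pd.to_datetime(['2153-09-08 19:10', '2157-10-25 14:00',
                                 '2150-01-05 08:00', '2150-01-20 08:00']),
})
flagged = flag_readmit_30(demo)
print(flagged[['hadm_id', 'readmit_30']])
```

Applied to this notebook, `adm_pat_sep = flag_readmit_30(adm_pat_sep)` would add the column in one pass. Overlapping admissions produce a negative gap and are flagged as readmissions under this definition.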