# Opioid Study Data Collection
#### Defined study plan: https://docs.google.com/document/d/1kkTbheDP5vS8rh_C6W1U7qzthkScqrzKBjd0jSY3j48/edit
The main pieces consist of:

1. Inclusion/Exclusion Criteria
2. Demographic Data
3. Clinical Data

#### About: 
- Below, I use PostgreSQL to query MIMIC-III through psycopg2, a PostgreSQL adapter for Python
- Each data-collection step is outlined below, and the steps build on each other
- Each step includes some context. In the SQL code, lines starting with "--" are comments that explain what the code does
- Each query's result is loaded into a pandas DataFrame, a flexible structure for data manipulation. From there, the data can be exported to a common output format (.csv, .xlsx, .tsv, etc.)

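A minimal sketch of the query → DataFrame → export flow described above. It assumes an in-memory SQLite table as a stand-in for the MIMIC-III Postgres database (the subject/admission pairs are copied from the outputs in this notebook), so it runs without a database server. `pd.read_sql_query` accepts any DB-API connection such as `sqlite3`; with no path, `to_csv` returns the CSV text instead of writing a file.

```python
import sqlite3

import pandas as pd

# Stand-in database: an in-memory SQLite table, NOT the real MIMIC-III schema
con_demo = sqlite3.connect(":memory:")
con_demo.execute("CREATE TABLE icustays (subject_id INTEGER, hadm_id INTEGER)")
con_demo.executemany(
    "INSERT INTO icustays VALUES (?, ?)",
    [(3, 145834), (4, 185777), (6, 107064)],
)

query_demo = """
-- lines starting with two dashes are SQL comments and are ignored
SELECT subject_id, hadm_id
FROM icustays
ORDER BY subject_id
"""
df = pd.read_sql_query(query_demo, con_demo)

# No path given, so to_csv returns the CSV text; pass a filename to write a file,
# or sep="\t" for a .tsv
csv_text = df.to_csv(index=False)
print(csv_text)
```

Writing an Excel file works the same way with `df.to_excel(...)`, which needs an Excel engine such as openpyxl installed.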
In [32]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
import matplotlib.lines as mlines
import psycopg2
import finddrugs
from IPython.display import display, HTML # used to print out pretty pandas dataframes

display(HTML("<style>.container { width:100% !important; }</style>")) # widest display

pd.options.display.max_colwidth = 1000
pd.options.display.width = 1000
pd.options.display.max_columns = 1000
pd.options.display.max_rows = 10


%matplotlib inline
plt.style.use('ggplot') 

# specify user/password/where the database is
sqluser = 'eightiesfanjan'
sqlpass = 'squiggle'
dbname = 'mimic'
schema_name = 'mimiciii'
host = 'localhost'

query_schema = 'SET search_path to ' + schema_name + ';'

# connect to the database
con = psycopg2.connect(dbname=dbname, user=sqluser, password=sqlpass, host=host)
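The cell above hard-codes the password and prefixes every query with `SET search_path`. A sketch of an alternative, assuming the libpq defaults apply: when `user` and `password` are not passed, libpq falls back to the `PGUSER`/`PGPASSWORD` environment variables (or `~/.pgpass`), and the libpq `options` parameter can set `search_path` for the whole session. The `connection_params` helper is hypothetical, and the actual `psycopg2.connect` call is left commented out so the sketch runs without a server.

```python
import os

SCHEMA_NAME = "mimiciii"


def connection_params(schema_name=SCHEMA_NAME):
    """Hypothetical helper: keyword arguments for psycopg2.connect().

    user/password are deliberately omitted so libpq reads them from the
    environment (PGUSER, PGPASSWORD) or ~/.pgpass instead of the notebook.
    """
    return {
        "dbname": os.environ.get("PGDATABASE", "mimic"),
        "host": os.environ.get("PGHOST", "localhost"),
        # sets search_path for the session, replacing the 'SET search_path' prefix
        "options": f"-c search_path={schema_name}",
    }


params = connection_params()
print(params)
# con = psycopg2.connect(**params)  # requires psycopg2 and a reachable server
```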

## Phase 1: Inclusion/Exclusion Criteria


### Step 1) Get the earliest ICU admissions and admissions separated by more than 180 days
- 61,532 original records from the icustays table (this includes patients admitted directly to the ICU and those transferred between floors)
- After keeping each patient's earliest ICU stay, plus later stays that start more than 180 days after the previous ICU discharge, 51,373 records remain
- More info on icustays table: https://mimic.physionet.org/mimictables/icustays/
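The Step 1 rule can be sketched in plain Python to make the `LAG(outtime)` logic concrete. This is an illustration, not the query itself: the stays are hypothetical, and it orders each subject's stays by `intime`, whereas the SQL orders `LAG()` by `outtime`. Like the SQL, every earlier stay (kept or not) becomes the "previous" stay for the next one.

```python
from datetime import datetime


def valid_icu_admits(stays, min_gap_days=180):
    """Keep each subject's first ICU stay, plus any stay that starts more than
    `min_gap_days` whole days after the previous stay's outtime.

    `stays` is a list of (subject_id, intime, outtime) tuples.
    """
    kept = []
    prev_out = {}  # subject_id -> outtime of that subject's previous stay
    for subject_id, intime, outtime in sorted(stays, key=lambda s: (s[0], s[1])):
        last_out = prev_out.get(subject_id)
        # timedelta.days floors positive gaps, like EXTRACT(days FROM interval)
        gap = None if last_out is None else (intime - last_out).days
        if gap is None or gap > min_gap_days:
            kept.append((subject_id, intime, outtime, gap))
        prev_out[subject_id] = outtime
    return kept


# Hypothetical stays (not MIMIC data)
stays = [
    (1, datetime(2100, 1, 1, 8), datetime(2100, 1, 3, 8)),  # first stay: kept
    (1, datetime(2100, 3, 1, 8), datetime(2100, 3, 2, 8)),  # 57 days later: dropped
    (1, datetime(2101, 1, 1, 8), datetime(2101, 1, 5, 8)),  # 305 days later: kept
    (2, datetime(2100, 6, 1, 8), datetime(2100, 6, 2, 8)),  # subject 2's first stay: kept
]
for subject_id, intime, _, gap in valid_icu_admits(stays):
    print(subject_id, intime, gap)
```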

In [428]:
query = query_schema + """
WITH icu_admits AS (
    SELECT subject_id
        ,hadm_id
        ,intime
        ,outtime
        ,LAG (outtime) OVER (PARTITION BY subject_id ORDER BY outtime ASC) AS last_out_time
        ,extract(days FROM (intime - LAG (outtime) OVER (PARTITION BY subject_id ORDER BY outtime ASC))) AS diff_last_outtime
    FROM icustays
    GROUP BY 1,2,3,4
    ORDER BY 4 ASC
), valid_icu_admits AS (
    SELECT *
    FROM icu_admits
    WHERE (diff_last_outtime is null) OR (diff_last_outtime > 180)
)
SELECT *
FROM valid_icu_admits

"""
df_demo= pd.read_sql_query(query,con)
df_demo



Unnamed: 0,subject_id,hadm_id,intime,outtime,last_out_time,diff_last_outtime
0,82574,118464,2100-06-07 20:00:22,2100-06-08 14:59:31,,
1,21081,159656,2100-06-14 14:33:55,2100-06-15 17:36:37,,
...,...,...,...,...,...,...
51371,14712,188201,2110-01-29 23:41:00,NaT,,
51372,5216,130232,2114-02-26 05:41:00,NaT,,



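### Step 2) Keep adults (age > 18) with no death within 24 hours of ICU admission
- Filtering Step 1's 51,373 records for age > 18 and no death within 24 hours leaves 42,211 (an 18% decrease)
- Note: `dod` in the patients table carries no time of day (it is stored at midnight), so the 24-hour cutoff is approximate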
- More info on patients table: https://mimic.physionet.org/mimictables/patients/
- More info on icustays table: https://mimic.physionet.org/mimictables/icustays/
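A sketch of the two Step 2 computations in Python, checked against subject 3's first output row. Two inputs are inferred rather than queried: the date of birth (2025-04-11) is read from subject 3's discharge summary text in Step 5, and the `dod` (2102-06-14) is back-calculated from `diff_death_admit_hrs`. That back-calculated `dod` lands exactly at midnight, which fits `dod` being stored at day precision.

```python
from datetime import datetime


def age_years(intime, dob):
    # mirrors ROUND((CAST(intime AS DATE) - CAST(dob AS DATE)) / 365.242, 2)
    return round((intime.date() - dob.date()).days / 365.242, 2)


def hours_to_death(dod, intime):
    # mirrors EXTRACT(epoch FROM (dod - intime)) / 3600; None when dod is NULL
    return None if dod is None else (dod - intime).total_seconds() / 3600


def passes_step2(intime, dob, dod):
    hrs = hours_to_death(dod, intime)
    return age_years(intime, dob) > 18 and (hrs is None or hrs > 24)


# Subject 3 (dob and dod are inferred values, see the note above)
intime = datetime(2101, 10, 20, 19, 10, 11)
dob = datetime(2025, 4, 11)
dod = datetime(2102, 6, 14)

print(age_years(intime, dob))                 # 76.52, as in the output
print(round(hours_to_death(dod, intime), 6))  # 5668.830278, as in the output
print(passes_step2(intime, dob, dod))         # True
```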


In [19]:
query = query_schema + """
WITH icu_admits AS (
    SELECT icu.row_id 
        , icu.subject_id
        ,icu.hadm_id
        ,intime
        ,outtime
        ,ROUND((CAST(icu.intime as DATE) - cast(pat.dob as DATE))/365.242, 2) AS age
        ,EXTRACT(epoch FROM(dod - intime))/3600.00 AS diff_death_admit_hrs        
        ,EXTRACT(days FROM (intime - LAG (outtime) OVER (PARTITION BY icu.subject_id ORDER BY outtime ASC))) AS diff_last_outtime
    FROM icustays icu
    INNER JOIN patients pat
    ON icu.subject_id = pat.subject_id
    GROUP BY 1,2,3,4,5,6,7
    ORDER BY 1 ASC
)
SELECT *
FROM icu_admits
WHERE age > 18 AND 
    -- exclusion criteria: < 24 hr death
    (diff_death_admit_hrs > 24 OR diff_death_admit_hrs is null) AND
    -- inclusion criteria: unique earliest icu admit, with 180 day offset if multiple records
    (diff_last_outtime is null OR diff_last_outtime > 180)
"""
df_clean= pd.read_sql_query(query,con)
df_clean



Unnamed: 0,row_id,subject_id,hadm_id,intime,outtime,age,diff_death_admit_hrs,diff_last_outtime
0,2,3,145834,2101-10-20 19:10:11,2101-10-26 20:43:09,76.52,5668.830278,
1,3,4,185777,2191-03-16 00:29:31,2191-03-17 16:46:31,47.84,,
2,5,6,107064,2175-05-30 21:30:54,2175-06-03 13:39:54,65.94,,
3,9,9,150750,2149-11-09 13:07:02,2149-11-14 20:52:14,41.79,106.882778,
4,11,11,194540,2178-04-16 06:19:32,2178-04-17 20:21:05,50.15,5081.674444,
...,...,...,...,...,...,...,...,...
42206,61528,99985,176670,2181-01-29 05:33:34,2181-02-09 12:45:20,53.81,,
42207,61529,99991,151118,2184-12-28 17:30:58,2184-12-31 20:56:20,47.73,,
42208,61530,99992,197084,2144-07-25 18:04:42,2144-07-27 17:27:55,65.77,,
42209,61531,99995,137810,2147-02-08 13:53:58,2147-02-10 17:46:30,88.70,5578.100556,



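### Step 3) Map each admission to ALL of its ICD-9 diagnosis codes
- Each admission is mapped to an array of ICD-9 codes, ordered by priority (seq_num)
- Result: 42,211 records
- The first ICD-9 code in this array (seq_num = 1) is generally treated as the primary reason for admission
    - See this issue for more discussion: https://github.com/MIT-LCP/mimic-code/issues/199
- Codes missing from d_icd_diagnoses are dropped by the INNER JOIN, which is why some seq_num arrays have gaps (e.g., subject 6 shows [1, 3, 5, 6, 7, 8]); a LEFT JOIN would keep them
- To keep things readable, I add a CTE called `flags` with binary (0/1) columns used to filter out admissions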
- More info on patients table: https://mimic.physionet.org/mimictables/patients/
- More info on icustays table: https://mimic.physionet.org/mimictables/icustays/
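A plain-Python sketch of what `array_agg(... ORDER BY icd.seq_num)` plus the INNER JOIN to `d_icd_diagnoses` does. The six codes and seq_nums for subject 6 come from the output below; the codes at seq_num 2 and 4 are hypothetical placeholders (`CODE_A`, `CODE_B`) standing in for codes absent from the dictionary. `keep_unmatched=True` imitates a LEFT JOIN.

```python
from collections import defaultdict


def aggregate_codes(diagnoses, dictionary, keep_unmatched=False):
    """Group (subject_id, hadm_id, seq_num, icd9_code) rows into per-admission
    (codes, seq_nums) lists ordered by seq_num.

    keep_unmatched=False drops codes missing from `dictionary` (INNER JOIN);
    keep_unmatched=True keeps them (LEFT JOIN).
    """
    grouped = defaultdict(list)
    for subject_id, hadm_id, seq_num, code in diagnoses:
        if keep_unmatched or code in dictionary:
            grouped[(subject_id, hadm_id)].append((seq_num, code))
    result = {}
    for key, rows in grouped.items():
        rows.sort()  # order by seq_num
        result[key] = ([code for _, code in rows], [seq for seq, _ in rows])
    return result


# Short titles copied from the Step 3 output for subject 6
d_icd = {
    "40391": "Hyp kid NOS w cr kid V", "9972": "Surg comp-peri vasc syst",
    "2767": "Hyperpotassemia", "2859": "Anemia NOS",
    "2753": "Dis phosphorus metabol", "V1582": "History of tobacco use",
}
# Rows in arbitrary order; CODE_A / CODE_B are hypothetical, not in d_icd
diagnoses = [
    (6, 107064, 5, "2767"), (6, 107064, 1, "40391"), (6, 107064, 2, "CODE_A"),
    (6, 107064, 8, "V1582"), (6, 107064, 3, "9972"), (6, 107064, 4, "CODE_B"),
    (6, 107064, 7, "2753"), (6, 107064, 6, "2859"),
]
print(aggregate_codes(diagnoses, d_icd))
print(aggregate_codes(diagnoses, d_icd, keep_unmatched=True))
```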


In [20]:
query = query_schema + """
WITH icu_admits AS (
    SELECT icu.row_id
        , icu.subject_id
        ,icu.hadm_id
        ,intime
        ,outtime
        ,ROUND((CAST(icu.intime as DATE) - cast(pat.dob as DATE))/365.242, 2) AS age
        ,EXTRACT(epoch FROM(dod - intime))/3600.00 AS diff_death_admit_hrs        
        ,EXTRACT(days FROM (intime - LAG (outtime) OVER (PARTITION BY icu.subject_id ORDER BY outtime ASC))) AS diff_last_outtime
    FROM icustays icu
    INNER JOIN patients pat
    ON icu.subject_id = pat.subject_id
    GROUP BY 1,2,3,4,5,6,7
    ORDER BY 1 ASC
), icd_codes AS (
    SELECT icu.*
        , array_agg(icd.icd9_code ORDER BY icd.seq_num) AS icd9_codes
        , array_agg(icd.seq_num ORDER BY icd.seq_num) AS seq_num
        , array_agg(d_names.short_title ORDER BY icd.seq_num) AS short_titles
        , array_agg(d_names.long_title ORDER BY icd.seq_num) AS long_titles
    FROM icu_admits icu
    INNER JOIN diagnoses_icd as icd
    ON icu.subject_id = icd.subject_id AND icu.hadm_id = icd.hadm_id
    INNER JOIN d_icd_diagnoses as d_names
    ON icd.icd9_code = d_names.icd9_code
    GROUP BY 1,2,3,4,5,6,7,8
), flags AS (
    SELECT icd_codes.*
        , CASE
            -- inclusion: unique earliest icu admit, with 180 day offset if multiple records
            WHEN (diff_last_outtime is null OR diff_last_outtime > 180)
            THEN 1
            ELSE 0
            END AS valid_icu_admit        
        , CASE
            -- inclusion: age > 18
            WHEN age > 18
            THEN 1
            ELSE 0
            END AS valid_age
        , CASE
            -- inclusion: death time > 24 hrs of admit
            WHEN (diff_death_admit_hrs > 24 OR diff_death_admit_hrs is null)
            THEN 1
            ELSE 0
            END AS valid_death  
            
    FROM icd_codes
)
SELECT *
FROM flags
WHERE valid_icu_admit = 1 AND valid_age = 1 AND valid_death = 1 
ORDER BY subject_id, hadm_id
"""
df_clean= pd.read_sql_query(query,con)
df_clean



Unnamed: 0,row_id,subject_id,hadm_id,intime,outtime,age,diff_death_admit_hrs,diff_last_outtime,icd9_codes,seq_num,short_titles,long_titles,valid_icu_admit,valid_age,valid_death
0,2,3,145834,2101-10-20 19:10:11,2101-10-26 20:43:09,76.52,5668.830278,,"[0389, 78559, 5849, 4275, 41071, 4280, 6826, 4254, 2639]","[1, 2, 3, 4, 5, 6, 7, 8, 9]","[Septicemia NOS, Shock w/o trauma NEC, Acute kidney failure NOS, Cardiac arrest, Subendo infarct, initial, CHF NOS, Cellulitis of leg, Prim cardiomyopathy NEC, Protein-cal malnutr NOS]","[Unspecified septicemia, Other shock without mention of trauma, Acute kidney failure, unspecified, Cardiac arrest, Subendocardial infarction, initial episode of care, Congestive heart failure, unspecified, Cellulitis and abscess of leg, except foot, Other primary cardiomyopathies, Unspecified protein-calorie malnutrition]",1,1,1
1,3,4,185777,2191-03-16 00:29:31,2191-03-17 16:46:31,47.84,,,"[042, 1363, 7994, 2763, 7907, 5715, 04111, V090, E9317]","[1, 2, 3, 4, 5, 6, 7, 8, 9]","[Human immuno virus dis, Pneumocystosis, Cachexia, Alkalosis, Bacteremia, Cirrhosis of liver NOS, Mth sus Stph aur els/NOS, Inf mcrg rstn pncllins, Adv eff antiviral drugs]","[Human immunodeficiency virus [HIV] disease, Pneumocystosis, Cachexia, Alkalosis, Bacteremia, Cirrhosis of liver without mention of alcohol, Methicillin susceptible Staphylococcus aureus in conditions classified elsewhere and of unspecified site, Infection with microorganisms resistant to penicillins, Antiviral drugs causing adverse effects in therapeutic use]",1,1,1
2,5,6,107064,2175-05-30 21:30:54,2175-06-03 13:39:54,65.94,,,"[40391, 9972, 2767, 2859, 2753, V1582]","[1, 3, 5, 6, 7, 8]","[Hyp kid NOS w cr kid V, Surg comp-peri vasc syst, Hyperpotassemia, Anemia NOS, Dis phosphorus metabol, History of tobacco use]","[Hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage V or end stage renal disease, Peripheral vascular complications, not elsewhere classified, Hyperpotassemia, Anemia, unspecified, Disorders of phosphorus metabolism, Personal history of tobacco use]",1,1,1
3,9,9,150750,2149-11-09 13:07:02,2149-11-14 20:52:14,41.79,106.882778,,"[431, 5070, 4280, 5849, 4019]","[1, 2, 3, 4, 6]","[Intracerebral hemorrhage, Food/vomit pneumonitis, CHF NOS, Acute kidney failure NOS, Hypertension NOS]","[Intracerebral hemorrhage, Pneumonitis due to inhalation of food or vomitus, Congestive heart failure, unspecified, Acute kidney failure, unspecified, Unspecified essential hypertension]",1,1,1
4,11,11,194540,2178-04-16 06:19:32,2178-04-17 20:21:05,50.15,5081.674444,,[1913],[1],[Mal neo parietal lobe],[Malignant neoplasm of parietal lobe],1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42206,61528,99985,176670,2181-01-29 05:33:34,2181-02-09 12:45:20,53.81,,,"[0389, 51881, 48241, 4870, 78552, V4281, 99592, 2449, 2724, 2859, 53081, V1072, 23871]","[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]","[Septicemia NOS, Acute respiratry failure, Meth sus pneum d/t Staph, Influenza with pneumonia, Septic shock, Trnspl status-bne marrow, Severe sepsis, Hypothyroidism NOS, Hyperlipidemia NEC/NOS, Anemia NOS, Esophageal reflux, Hx-hodgkin's disease, Essntial thrombocythemia]","[Unspecified septicemia, Acute respiratory failure, Methicillin susceptible pneumonia due to Staphylococcus aureus, Influenza with pneumonia, Septic shock, Bone marrow replaced by transplant, Severe sepsis, Unspecified acquired hypothyroidism, Other and unspecified hyperlipidemia, Anemia, unspecified, Esophageal reflux, Personal history of hodgkin's disease, Essential thrombocythemia]",1,1,1
42207,61529,99991,151118,2184-12-28 17:30:58,2184-12-31 20:56:20,47.73,,,"[56211, 0389, 5570, 5849, 99592, 56081, 78959, 5538, 7885, 40291, 4280, 71947, 5644, 25000, V0254, E8788, 27651]","[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]","[Dvrtcli colon w/o hmrhg, Septicemia NOS, Ac vasc insuff intestine, Acute kidney failure NOS, Severe sepsis, Intestinal adhes w obstr, Ascites NEC, Hernia NEC, Oliguria & anuria, Hyp ht dis NOS w ht fail, CHF NOS, Joint pain-ankle, Postop GI funct dis NEC, DMII wo cmp nt st uncntr, Meth resis Staph carrier, Abn react-surg proc NEC, Dehydration]","[Diverticulitis of colon (without mention of hemorrhage), Unspecified septicemia, Acute vascular insufficiency of intestine, Acute kidney failure, unspecified, Severe sepsis, Intestinal or peritoneal adhesions with obstruction (postoperative) (postinfection), Other ascites, Hernia of other specified sites without mention of obstruction or gangrene, Oliguria and anuria, Unspecified hypertensive heart disease with heart failure, Congestive heart failure, unspecified, Pain in joint, ankle and foot, Other postoperative functional disorders, Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled, Carrier or suspected carrier of Methicillin resistant Staphylococcus aureus, Other specified surgical operations and procedures causing abnormal patient reaction, or later complication, without mention of misadventure at time of operation, Dehydration]",1,1,1
42208,61530,99992,197084,2144-07-25 18:04:42,2144-07-27 17:27:55,65.77,,,"[9999, 56881, 5772, 2851, 5849, 5799, 72992, 53081, 4019, 2721, 5699, 3004]","[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]","[Complic med care NEC/NOS, Hemoperitoneum, Pancreat cyst/pseudocyst, Ac posthemorrhag anemia, Acute kidney failure NOS, Intest malabsorption NOS, Nontrauma hema soft tiss, Esophageal reflux, Hypertension NOS, Pure hyperglyceridemia, Intestinal disorder NOS, Dysthymic disorder]","[Other and unspecified complications of medical care, not elsewhere classified, Hemoperitoneum (nontraumatic), Cyst and pseudocyst of pancreas, Acute posthemorrhagic anemia, Acute kidney failure, unspecified, Unspecified intestinal malabsorption, Nontraumatic hematoma of soft tissue, Esophageal reflux, Unspecified essential hypertension, Pure hyperglyceridemia, Unspecified disorder of intestine, Dysthymic disorder]",1,1,1
42209,61531,99995,137810,2147-02-08 13:53:58,2147-02-10 17:46:30,88.70,5578.100556,,"[4414, 42833, 99812, 2851, 4241, 25000, 99811, 9961, E8798, 2724, V4581, 4280, V103, V1582, V5861, 4400, 41401]","[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]","[Abdom aortic aneurysm, Ac on chr diast hrt fail, Hematoma complic proc, Ac posthemorrhag anemia, Aortic valve disorder, DMII wo cmp nt st uncntr, Hemorrhage complic proc, Malfunc vasc device/graf, Abn react-procedure NEC, Hyperlipidemia NEC/NOS, Aortocoronary bypass, CHF NOS, Hx of breast malignancy, History of tobacco use, Long-term use anticoagul, Aortic atherosclerosis, Crnry athrscl natve vssl]","[Abdominal aneurysm without mention of rupture, Acute on chronic diastolic heart failure, Hematoma complicating a procedure, Acute posthemorrhagic anemia, Aortic valve disorders, Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled, Hemorrhage complicating a procedure, Mechanical complication of other vascular device, implant, and graft, Other specified procedures as the cause of abnormal reaction of patient, or of later complication, without mention of misadventure at time of procedure, Other and unspecified hyperlipidemia, Aortocoronary bypass status, Congestive heart failure, unspecified, Personal history of malignant neoplasm of breast, Personal history of tobacco use, Long-term (current) use of anticoagulants, Atherosclerosis of aorta, Coronary atherosclerosis of native coronary artery]",1,1,1



### Step 4) Exclude admissions with opioid abuse, anoxic brain injury, or cancer
- As of 12/11/2018, we only filter out opioid abuse, anoxic brain injury, and cancer
- Result: 36,440 records
- Opioid/heroin abuse (or poisoning) ICD-9 codes:
    - https://www.ncbi.nlm.nih.gov/books/NBK367628/table/sb202.t4/?report=objectonly
- More info on diagnoses table: https://mimic.physionet.org/mimictables/d_icd_diagnoses/
- More info on patients table: https://mimic.physionet.org/mimictables/patients/
- More info on icustays table: https://mimic.physionet.org/mimictables/icustays/
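A Python sketch of the three exclusion flags, where set intersection plays the role of the Postgres array-overlap operator `&&`. The opioid and anoxic codes are copied from the query; the small dictionary is a sample, with titles for 1913, V103, and 4019 taken from the outputs and 4010 ("Malignant essential hypertension") added to show a side effect of title matching: it is flagged as cancer even though it is not a neoplasm, which may be worth reviewing.

```python
OPIOID_CODES = {"E8502", "E9350", "96509", "30550", "30551", "30552", "30553"}
ANOXIC_CODES = {"3481"}


def cancer_codes(dictionary):
    """Codes whose long title mentions cancer/malignant, excluding V codes.

    The parentheses matter: in SQL (and Python), AND binds tighter than OR,
    so `a OR b AND c` means `a OR (b AND c)`.
    """
    return {
        code
        for code, long_title in dictionary
        if ("cancer" in long_title.lower() or "malignant" in long_title.lower())
        and not code.startswith("V")
    }


def flags(icd9_codes, cancer):
    codes = set(icd9_codes)
    return {
        "opiate_abuse": int(bool(codes & OPIOID_CODES)),
        "has_anoxic_brain": int(bool(codes & ANOXIC_CODES)),
        "has_cancer": int(bool(codes & cancer)),
    }


sample_dictionary = [
    ("1913", "Malignant neoplasm of parietal lobe"),
    ("V103", "Personal history of malignant neoplasm of breast"),
    ("4010", "Malignant essential hypertension"),
    ("4019", "Unspecified essential hypertension"),
]
cancer = cancer_codes(sample_dictionary)
print(sorted(cancer))               # ['1913', '4010']
print(flags(["1913"], cancer))      # subject 11's only code: flagged as cancer
print(flags(["V103", "4019"], cancer))
```

Subject 11 (code 1913) appears in the Step 3 output but not in the Step 4 output, which is consistent with the cancer flag.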


In [21]:
query = query_schema + """
WITH icu_admits AS (
    SELECT icu.row_id 
        , icu.subject_id
        ,icu.hadm_id
        ,intime
        ,outtime
        ,ROUND((CAST(icu.intime as DATE) - cast(pat.dob as DATE))/365.242, 2) AS age
        ,EXTRACT(epoch FROM(dod - intime))/3600.00 AS diff_death_admit_hrs        
        ,EXTRACT(days FROM (intime - LAG (outtime) OVER (PARTITION BY icu.subject_id ORDER BY outtime ASC))) AS diff_last_outtime
    FROM icustays icu
    INNER JOIN patients pat
    ON icu.subject_id = pat.subject_id
    GROUP BY 1,2,3,4,5,6,7
    ORDER BY 1 ASC
), icd_codes AS (
    SELECT icu.*
        , array_agg(icd.icd9_code ORDER BY icd.seq_num) AS icd9_codes
        , array_agg(icd.seq_num ORDER BY icd.seq_num) AS seq_num
        , array_agg(d_names.short_title ORDER BY icd.seq_num) AS short_titles
        , array_agg(d_names.long_title ORDER BY icd.seq_num) AS long_titles
    FROM icu_admits icu
    INNER JOIN diagnoses_icd as icd
    ON icu.subject_id = icd.subject_id AND icu.hadm_id = icd.hadm_id
    INNER JOIN d_icd_diagnoses as d_names
    ON icd.icd9_code = d_names.icd9_code
    GROUP BY 1,2,3,4,5,6,7,8
), flags AS (
    SELECT icd_codes.*
        , CASE
            -- inclusion: unique earliest icu admit, with 180 day offset if multiple records
            WHEN (diff_last_outtime is null OR diff_last_outtime > 180)
            THEN 1
            ELSE 0
            END AS valid_icu_admit        
        , CASE
            -- inclusion: age > 18
            WHEN age > 18
            THEN 1
            ELSE 0
            END AS valid_age
        , CASE
            -- inclusion: death time > 24 hrs of admit
            WHEN (diff_death_admit_hrs > 24 OR diff_death_admit_hrs is null)
            THEN 1
            ELSE 0
            END AS valid_death  
        , CASE
            -- build icd9 poisoning or opiate abuse or heroin use
            WHEN icd9_codes && ARRAY['E8502', 'E9350', '96509', '30550', '30551', '30552', '30553']::varchar[]
            THEN 1
            ELSE 0
            END AS opiate_abuse
        , CASE
            -- anoxic brain injury
            WHEN icd9_codes && ARRAY['3481']::varchar[]
            THEN 1
            ELSE 0
            END AS has_anoxic_brain
        , CASE
            WHEN icd9_codes && (SELECT array_agg(icd9_code)
                                FROM d_icd_diagnoses
                                -- build icd9 cancer codes from: https://www.ncbi.nlm.nih.gov/books/NBK230788/
                                WHERE (lower(long_title) LIKE '%cancer%' OR lower(long_title) LIKE '%malignant%')
                                -- but don't grab icd9 codes for screenings, personal history, or family history of cancer (V codes);
                                -- the parentheses above are needed because AND binds tighter than OR
                                AND icd9_code NOT LIKE 'V%')::varchar[]
            THEN 1
            ELSE 0
            END AS has_cancer
            
            
    FROM icd_codes
)

SELECT *
FROM flags
    WHERE 
    valid_icu_admit = 1 AND
    valid_age = 1 AND
    valid_death = 1 AND
    has_anoxic_brain = 0 AND
    has_cancer = 0 AND 
    opiate_abuse= 0
ORDER BY subject_id, hadm_id
"""

df_clean= pd.read_sql_query(query,con)
df_clean

Unnamed: 0,row_id,subject_id,hadm_id,intime,outtime,age,diff_death_admit_hrs,diff_last_outtime,icd9_codes,seq_num,short_titles,long_titles,valid_icu_admit,valid_age,valid_death,opiate_abuse,has_anoxic_brain,has_cancer
0,2,3,145834,2101-10-20 19:10:11,2101-10-26 20:43:09,76.52,5668.830278,,"[0389, 78559, 5849, 4275, 41071, 4280, 6826, 4254, 2639]","[1, 2, 3, 4, 5, 6, 7, 8, 9]","[Septicemia NOS, Shock w/o trauma NEC, Acute kidney failure NOS, Cardiac arrest, Subendo infarct, initial, CHF NOS, Cellulitis of leg, Prim cardiomyopathy NEC, Protein-cal malnutr NOS]","[Unspecified septicemia, Other shock without mention of trauma, Acute kidney failure, unspecified, Cardiac arrest, Subendocardial infarction, initial episode of care, Congestive heart failure, unspecified, Cellulitis and abscess of leg, except foot, Other primary cardiomyopathies, Unspecified protein-calorie malnutrition]",1,1,1,0,0,0
1,3,4,185777,2191-03-16 00:29:31,2191-03-17 16:46:31,47.84,,,"[042, 1363, 7994, 2763, 7907, 5715, 04111, V090, E9317]","[1, 2, 3, 4, 5, 6, 7, 8, 9]","[Human immuno virus dis, Pneumocystosis, Cachexia, Alkalosis, Bacteremia, Cirrhosis of liver NOS, Mth sus Stph aur els/NOS, Inf mcrg rstn pncllins, Adv eff antiviral drugs]","[Human immunodeficiency virus [HIV] disease, Pneumocystosis, Cachexia, Alkalosis, Bacteremia, Cirrhosis of liver without mention of alcohol, Methicillin susceptible Staphylococcus aureus in conditions classified elsewhere and of unspecified site, Infection with microorganisms resistant to penicillins, Antiviral drugs causing adverse effects in therapeutic use]",1,1,1,0,0,0
2,5,6,107064,2175-05-30 21:30:54,2175-06-03 13:39:54,65.94,,,"[40391, 9972, 2767, 2859, 2753, V1582]","[1, 3, 5, 6, 7, 8]","[Hyp kid NOS w cr kid V, Surg comp-peri vasc syst, Hyperpotassemia, Anemia NOS, Dis phosphorus metabol, History of tobacco use]","[Hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage V or end stage renal disease, Peripheral vascular complications, not elsewhere classified, Hyperpotassemia, Anemia, unspecified, Disorders of phosphorus metabolism, Personal history of tobacco use]",1,1,1,0,0,0
3,9,9,150750,2149-11-09 13:07:02,2149-11-14 20:52:14,41.79,106.882778,,"[431, 5070, 4280, 5849, 4019]","[1, 2, 3, 4, 6]","[Intracerebral hemorrhage, Food/vomit pneumonitis, CHF NOS, Acute kidney failure NOS, Hypertension NOS]","[Intracerebral hemorrhage, Pneumonitis due to inhalation of food or vomitus, Congestive heart failure, unspecified, Acute kidney failure, unspecified, Unspecified essential hypertension]",1,1,1,0,0,0
4,13,13,143045,2167-01-08 18:44:25,2167-01-12 10:43:31,39.86,,,"[41401, 4111, 25000, 4019, 2720]","[1, 2, 3, 4, 5]","[Crnry athrscl natve vssl, Intermed coronary synd, DMII wo cmp nt st uncntr, Hypertension NOS, Pure hypercholesterolem]","[Coronary atherosclerosis of native coronary artery, Intermediate coronary syndrome, Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled, Unspecified essential hypertension, Pure hypercholesterolemia]",1,1,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36435,61528,99985,176670,2181-01-29 05:33:34,2181-02-09 12:45:20,53.81,,,"[0389, 51881, 48241, 4870, 78552, V4281, 99592, 2449, 2724, 2859, 53081, V1072, 23871]","[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]","[Septicemia NOS, Acute respiratry failure, Meth sus pneum d/t Staph, Influenza with pneumonia, Septic shock, Trnspl status-bne marrow, Severe sepsis, Hypothyroidism NOS, Hyperlipidemia NEC/NOS, Anemia NOS, Esophageal reflux, Hx-hodgkin's disease, Essntial thrombocythemia]","[Unspecified septicemia, Acute respiratory failure, Methicillin susceptible pneumonia due to Staphylococcus aureus, Influenza with pneumonia, Septic shock, Bone marrow replaced by transplant, Severe sepsis, Unspecified acquired hypothyroidism, Other and unspecified hyperlipidemia, Anemia, unspecified, Esophageal reflux, Personal history of hodgkin's disease, Essential thrombocythemia]",1,1,1,0,0,0
36436,61529,99991,151118,2184-12-28 17:30:58,2184-12-31 20:56:20,47.73,,,"[56211, 0389, 5570, 5849, 99592, 56081, 78959, 5538, 7885, 40291, 4280, 71947, 5644, 25000, V0254, E8788, 27651]","[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]","[Dvrtcli colon w/o hmrhg, Septicemia NOS, Ac vasc insuff intestine, Acute kidney failure NOS, Severe sepsis, Intestinal adhes w obstr, Ascites NEC, Hernia NEC, Oliguria & anuria, Hyp ht dis NOS w ht fail, CHF NOS, Joint pain-ankle, Postop GI funct dis NEC, DMII wo cmp nt st uncntr, Meth resis Staph carrier, Abn react-surg proc NEC, Dehydration]","[Diverticulitis of colon (without mention of hemorrhage), Unspecified septicemia, Acute vascular insufficiency of intestine, Acute kidney failure, unspecified, Severe sepsis, Intestinal or peritoneal adhesions with obstruction (postoperative) (postinfection), Other ascites, Hernia of other specified sites without mention of obstruction or gangrene, Oliguria and anuria, Unspecified hypertensive heart disease with heart failure, Congestive heart failure, unspecified, Pain in joint, ankle and foot, Other postoperative functional disorders, Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled, Carrier or suspected carrier of Methicillin resistant Staphylococcus aureus, Other specified surgical operations and procedures causing abnormal patient reaction, or later complication, without mention of misadventure at time of operation, Dehydration]",1,1,1,0,0,0
36437,61530,99992,197084,2144-07-25 18:04:42,2144-07-27 17:27:55,65.77,,,"[9999, 56881, 5772, 2851, 5849, 5799, 72992, 53081, 4019, 2721, 5699, 3004]","[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]","[Complic med care NEC/NOS, Hemoperitoneum, Pancreat cyst/pseudocyst, Ac posthemorrhag anemia, Acute kidney failure NOS, Intest malabsorption NOS, Nontrauma hema soft tiss, Esophageal reflux, Hypertension NOS, Pure hyperglyceridemia, Intestinal disorder NOS, Dysthymic disorder]","[Other and unspecified complications of medical care, not elsewhere classified, Hemoperitoneum (nontraumatic), Cyst and pseudocyst of pancreas, Acute posthemorrhagic anemia, Acute kidney failure, unspecified, Unspecified intestinal malabsorption, Nontraumatic hematoma of soft tissue, Esophageal reflux, Unspecified essential hypertension, Pure hyperglyceridemia, Unspecified disorder of intestine, Dysthymic disorder]",1,1,1,0,0,0
36438,61531,99995,137810,2147-02-08 13:53:58,2147-02-10 17:46:30,88.70,5578.100556,,"[4414, 42833, 99812, 2851, 4241, 25000, 99811, 9961, E8798, 2724, V4581, 4280, V103, V1582, V5861, 4400, 41401]","[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]","[Abdom aortic aneurysm, Ac on chr diast hrt fail, Hematoma complic proc, Ac posthemorrhag anemia, Aortic valve disorder, DMII wo cmp nt st uncntr, Hemorrhage complic proc, Malfunc vasc device/graf, Abn react-procedure NEC, Hyperlipidemia NEC/NOS, Aortocoronary bypass, CHF NOS, Hx of breast malignancy, History of tobacco use, Long-term use anticoagul, Aortic atherosclerosis, Crnry athrscl natve vssl]","[Abdominal aneurysm without mention of rupture, Acute on chronic diastolic heart failure, Hematoma complicating a procedure, Acute posthemorrhagic anemia, Aortic valve disorders, Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled, Hemorrhage complicating a procedure, Mechanical complication of other vascular device, implant, and graft, Other specified procedures as the cause of abnormal reaction of patient, or of later complication, without mention of misadventure at time of procedure, Other and unspecified hyperlipidemia, Aortocoronary bypass status, Congestive heart failure, unspecified, Personal history of malignant neoplasm of breast, Personal history of tobacco use, Long-term (current) use of anticoagulants, Atherosclerosis of aorta, Coronary atherosclerosis of native coronary artery]",1,1,1,0,0,0
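The record counts reported in Steps 1 to 4 can be turned into an attrition summary. The sketch below uses only the counts stated in this notebook (Step 3 adds no filter, so it keeps 42,211); the percentages are computed from those numbers, not taken from new query results.

```python
# Record counts reported in Steps 1-4 of this notebook
counts = [
    ("icustays (all stays)", 61532),
    ("Step 1: earliest / 180-day admits", 51373),
    ("Step 2: age > 18, no death within 24 h", 42211),
    ("Step 4: no opioid abuse, anoxic brain injury, or cancer", 36440),
]


def attrition(counts):
    """(label, kept, removed, percent removed) for each step vs. the previous one."""
    rows = []
    for (_, prev), (label, n) in zip(counts, counts[1:]):
        removed = prev - n
        rows.append((label, n, removed, round(100 * removed / prev, 2)))
    return rows


for label, n, removed, pct in attrition(counts):
    print(f"{label}: {n:,} kept, {removed:,} removed ({pct}%)")

overall = round(100 * counts[-1][1] / counts[0][1], 2)
print(f"overall retained: {overall}%")
```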



### Step 5) Get discharge summaries for the ~36k remaining admissions
- As of 11/12/2018, we will NOT use the discharge summaries to identify chronic opiate users; we will use the outpatient prescription table instead
- Joining noteevents gives 40,217 records. A discharge summary can have a description of "Report" or "Addendum", and both can exist for the same admission, so we keep only reports
- After that filter, 36,888 records remain. There are still about 444 duplicates, which need to be investigated

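One way to start investigating the duplicates is to list every admission that has more than one discharge report. This sketch assumes the Step 5 result is in a DataFrame shaped like `df_clean`; the rows below are hypothetical stand-ins. It may also be worth checking (an untested assumption here) whether some extra rows are notes flagged in the `iserror` column of noteevents.

```python
import pandas as pd

# Hypothetical rows standing in for the Step 5 df_clean
df_notes = pd.DataFrame({
    "subject_id": [101, 102, 102, 103],
    "hadm_id": [1001, 1002, 1002, 1003],
    "description": ["Report", "Report", "Report", "Report"],
    "text": ["summary A", "summary B, part 1", "summary B, part 2", "summary C"],
})

# keep=False marks every row of a duplicated admission, not only the repeats
dupes = df_notes[df_notes.duplicated(subset=["subject_id", "hadm_id"], keep=False)]
print(dupes)

# Number of admissions with more than one discharge report
per_admit = df_notes.groupby(["subject_id", "hadm_id"]).size()
n_multi = int((per_admit > 1).sum())
print(n_multi)
```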

In [23]:
query = query_schema + """
WITH icu_admits AS (
    SELECT icu.row_id 
        ,icu.subject_id
        ,icu.hadm_id
        ,intime
        ,outtime
        ,ROUND((CAST(icu.intime as DATE) - cast(pat.dob as DATE))/365.242, 2) AS age
        ,EXTRACT(epoch FROM(dod - intime))/3600.00 AS diff_death_admit_hrs        
        ,EXTRACT(days FROM (intime - LAG (outtime) OVER (PARTITION BY icu.subject_id ORDER BY outtime ASC))) AS diff_last_outtime
    FROM icustays icu
    INNER JOIN patients pat
    ON icu.subject_id = pat.subject_id
    GROUP BY 1,2,3,4,5,6,7
    ORDER BY 1 ASC
), icd_codes AS (
    SELECT icu.*
        , array_agg(icd.icd9_code ORDER BY icd.seq_num) AS icd9_codes
        , array_agg(icd.seq_num ORDER BY icd.seq_num) AS seq_num
        , array_agg(d_names.short_title ORDER BY icd.seq_num) AS short_titles
        , array_agg(d_names.long_title ORDER BY icd.seq_num) AS long_titles
    FROM icu_admits icu
    INNER JOIN diagnoses_icd as icd
    ON icu.subject_id = icd.subject_id AND icu.hadm_id = icd.hadm_id
    INNER JOIN d_icd_diagnoses as d_names
    ON icd.icd9_code = d_names.icd9_code
    GROUP BY 1,2,3,4,5,6,7,8
), flags AS (
    SELECT icd_codes.*
        , CASE
            -- inclusion: unique earliest icu admit, with 180 day offset if multiple records
            WHEN (diff_last_outtime is null OR diff_last_outtime > 180)
            THEN 1
            ELSE 0
            END AS valid_icu_admit        
        , CASE
            -- inclusion: age > 18
            WHEN age > 18
            THEN 1
            ELSE 0
            END AS valid_age
        , CASE
            -- inclusion: death time > 24 hrs of admit
            WHEN (diff_death_admit_hrs > 24 OR diff_death_admit_hrs is null)
            THEN 1
            ELSE 0
            END AS valid_death  
        , CASE
            -- build icd9 poisoning or opiate abuse or heroin use
            WHEN icd9_codes && ARRAY['E8502', 'E9350', '96509', '30550', '30551', '30552', '30553']::varchar[]
            THEN 1
            ELSE 0
            END AS opiate_abuse
        , CASE
            -- anoxic brain injury
            WHEN icd9_codes && ARRAY['3481']::varchar[]
            THEN 1
            ELSE 0
            END AS has_anoxic_brain
        , CASE
            WHEN icd9_codes && (SELECT array_agg(icd9_code)
                                FROM d_icd_diagnoses
                                -- build icd9 cancer codes from: https://www.ncbi.nlm.nih.gov/books/NBK230788/
                                WHERE (lower(long_title) LIKE '%cancer%' OR lower(long_title) LIKE '%malignant%')
                                -- but don't grab icd9 codes for screenings, personal history, or family history of cancer (V codes);
                                -- the parentheses above are needed because AND binds tighter than OR
                                AND icd9_code NOT LIKE 'V%')::varchar[]
            THEN 1
            ELSE 0
            END AS has_cancer    
    FROM icd_codes
), discharges AS (
    SELECT flags.*
    , category
    , description
    , text
    FROM noteevents events
    INNER JOIN flags
    ON flags.subject_id = events.subject_id AND flags.hadm_id = events.hadm_id
    WHERE lower(category) like 'discharge summary' AND lower(description) like 'report'
)
SELECT *
FROM discharges
    WHERE 
    valid_icu_admit = 1 AND
    valid_age = 1 AND
    valid_death = 1 AND
    has_anoxic_brain = 0 AND
    has_cancer = 0 AND 
    opiate_abuse= 0
ORDER BY subject_id, hadm_id
"""

df_clean= pd.read_sql_query(query,con)
df_clean

Unnamed: 0,row_id,subject_id,hadm_id,intime,outtime,age,diff_death_admit_hrs,diff_last_outtime,icd9_codes,seq_num,short_titles,long_titles,valid_icu_admit,valid_age,valid_death,opiate_abuse,has_anoxic_brain,has_cancer,category,description,text
0,2,3,145834,2101-10-20 19:10:11,2101-10-26 20:43:09,76.52,5668.830278,,"[0389, 78559, 5849, 4275, 41071, 4280, 6826, 4254, 2639]","[1, 2, 3, 4, 5, 6, 7, 8, 9]","[Septicemia NOS, Shock w/o trauma NEC, Acute kidney failure NOS, Cardiac arrest, Subendo infarct, initial, CHF NOS, Cellulitis of leg, Prim cardiomyopathy NEC, Protein-cal malnutr NOS]","[Unspecified septicemia, Other shock without mention of trauma, Acute kidney failure, unspecified, Cardiac arrest, Subendocardial infarction, initial episode of care, Congestive heart failure, unspecified, Cellulitis and abscess of leg, except foot, Other primary cardiomyopathies, Unspecified protein-calorie malnutrition]",1,1,1,0,0,0,Discharge summary,Report,"Admission Date: [**2101-10-20**] Discharge Date: [**2101-10-31**]\n\nDate of Birth: [**2025-4-11**] Sex: M\n\nService: Medicine\n\nCHIEF COMPLAINT: Admitted from rehabilitation for\nhypotension (systolic blood pressure to the 70s) and\ndecreased urine output.\n\nHISTORY OF PRESENT ILLNESS: The patient is a 76-year-old\nmale who had been hospitalized at the [**Hospital1 190**] from [**10-11**] through [**10-19**] of [**2101**]\nafter undergoing a left femoral-AT bypass graft and was\nsubsequently discharged to a rehabilitation facility.\n\nOn [**2101-10-20**], he presented again to the [**Hospital1 346**] after being found to have a systolic\nblood pressure in the 70s and no urine output for 17 hours.\nA Foley catheter placed at the rehabilitation facility\nyielded 100 cc of murky/brown urine. There may also have\nbeen purulent discharge at the penile meatus at this time.\n\nOn presentation to the Emergency Department, the patient was\nwithout subjective complaints...."
1,3,4,185777,2191-03-16 00:29:31,2191-03-17 16:46:31,47.84,,,"[042, 1363, 7994, 2763, 7907, 5715, 04111, V090, E9317]","[1, 2, 3, 4, 5, 6, 7, 8, 9]","[Human immuno virus dis, Pneumocystosis, Cachexia, Alkalosis, Bacteremia, Cirrhosis of liver NOS, Mth sus Stph aur els/NOS, Inf mcrg rstn pncllins, Adv eff antiviral drugs]","[Human immunodeficiency virus [HIV] disease, Pneumocystosis, Cachexia, Alkalosis, Bacteremia, Cirrhosis of liver without mention of alcohol, Methicillin susceptible Staphylococcus aureus in conditions classified elsewhere and of unspecified site, Infection with microorganisms resistant to penicillins, Antiviral drugs causing adverse effects in therapeutic use]",1,1,1,0,0,0,Discharge summary,Report,"Admission Date: [**2191-3-16**] Discharge Date: [**2191-3-23**]\n\nDate of Birth: [**2143-5-12**] Sex: F\n\nService:\n\nCHIEF COMPLAINT: Shortness of breath and fevers.\n\nHISTORY OF PRESENT ILLNESS: The patient is a 47-year-old\nfemale with a history of human immunodeficiency virus (last\nCD4 count 42 and a viral load of 65,000), cirrhosis,\ndiabetes, and hypothyroidism presented with eight days of\nfevers to 104, chills, shortness of breath, cough, dyspnea on\nexertion, and fatigue.\n\nThe patient states she has become progressively dyspneic to\nthe point where she is short of breath with speaking. She\nhas also had night sweats for the past two days and whitish\nsputum. She complains of myalgias. No recent ill contacts.\n[**Name (NI) **] known tuberculosis exposure.\n\nIn the Emergency Department, the patient was initially 96% on\nroom air, with a respiratory rate of 20, and a heart rate of\n117. A chest x-ray showed diffuse interstitial opacities.\nShe receiv..."
2,5,6,107064,2175-05-30 21:30:54,2175-06-03 13:39:54,65.94,,,"[40391, 9972, 2767, 2859, 2753, V1582]","[1, 3, 5, 6, 7, 8]","[Hyp kid NOS w cr kid V, Surg comp-peri vasc syst, Hyperpotassemia, Anemia NOS, Dis phosphorus metabol, History of tobacco use]","[Hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage V or end stage renal disease, Peripheral vascular complications, not elsewhere classified, Hyperpotassemia, Anemia, unspecified, Disorders of phosphorus metabolism, Personal history of tobacco use]",1,1,1,0,0,0,Discharge summary,Report,"Admission Date: [**2175-5-30**] Discharge Date: [**2175-6-15**]\n\nDate of Birth: Sex: F\n\nService:\n\n\nADMISSION DIAGNOSIS: End stage renal disease, admitted for\ntransplant surgery.\n\nHISTORY OF PRESENT ILLNESS: The patient is a 65 year-old\nwoman with end stage renal disease, secondary to malignant\nhypertension. She was started on dialysis in [**2174-2-7**]. She currently was on peritoneal dialysis and appears\nto be doing well. She has a history of gastric angiectasia\nwhich she requires endoscopy. She was admitted on [**2175-5-30**] for\na scheduled living donor kidney transplant by her son, who is\nthe donor. She does have a donor specific antibody (B-51)\nand will have a final T & B cell class match prior to\ntransplantation.\n\nPAST MEDICAL HISTORY: End stage renal disease, secondary to\nmalignant hypertension on dialysis. History of anemia\nfollowing gastric angiectasia. She has no known history for\ncoronary artery disease for diabe..."
3,9,9,150750,2149-11-09 13:07:02,2149-11-14 20:52:14,41.79,106.882778,,"[431, 5070, 4280, 5849, 4019]","[1, 2, 3, 4, 6]","[Intracerebral hemorrhage, Food/vomit pneumonitis, CHF NOS, Acute kidney failure NOS, Hypertension NOS]","[Intracerebral hemorrhage, Pneumonitis due to inhalation of food or vomitus, Congestive heart failure, unspecified, Acute kidney failure, unspecified, Unspecified essential hypertension]",1,1,1,0,0,0,Discharge summary,Report,"Admission Date: [**2149-11-9**] Discharge Date: [**2149-11-13**]\n\nDate of Birth: [**2108-1-26**] Sex: M\n\nService: NEUROLOGY\n\nCHIEF COMPLAINT: Weakness, inability to talk.\n\nHISTORY OF THE PRESENT ILLNESS: This is a 41-year-old\nAfrican-American male with a history of hypertension who was\nin his usual state of health until about 10:25 a.m. on the\nmorning of admission. He had gone to use the restroom and a\nfew minutes later his family found him slumped onto the\nfloor, apparently unable to talk and with weakness in his\nright arm and leg. EMS was called and he was brought into\nthe Emergency Department at [**Hospital1 18**].\n\nThe patient has not had strokes or previous similar symptoms.\nHe has a history of hypertension but no history of cardiac\nsymptoms. The patient was unable to talk for examination and\nno family members were present at the bedside and were not at\nhome (apparently they were on the way to the Emergency Room).\nThe history was obt..."
4,13,13,143045,2167-01-08 18:44:25,2167-01-12 10:43:31,39.86,,,"[41401, 4111, 25000, 4019, 2720]","[1, 2, 3, 4, 5]","[Crnry athrscl natve vssl, Intermed coronary synd, DMII wo cmp nt st uncntr, Hypertension NOS, Pure hypercholesterolem]","[Coronary atherosclerosis of native coronary artery, Intermediate coronary syndrome, Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled, Unspecified essential hypertension, Pure hypercholesterolemia]",1,1,1,0,0,0,Discharge summary,Report,"Admission Date: [**2167-1-8**] Discharge Date: [**2167-1-15**]\n\nDate of Birth: [**2127-2-27**] Sex: F\n\nService: Cardiac surgery\n\nCHIEF COMPLAINT: Chest pain.\n\nHISTORY OF PRESENT ILLNESS: This is a 39-year-old woman with\ndiabetes, hypertension, hyperlipidemia and obesity, with a\none to two months of chest burning with exertion. For the\npast six months, she has been participating in a new vigorous\nexercise program to lose weight. Her symptoms do gradually\nresolve with rest, but they have started to occur now with\nwalking. She does acknowledge that there is associated\nnausea, diaphoresis and shortness of breath. Now recently she\nstarted to get symptoms for the past two days while at rest.\nShe was referred for an outpatient exercise tolerance test,\nwhere she had chest pain and significant EKG changes. She was\nreferred to [**Hospital1 18**] for cardiac catheterization today, which\nrevealed significant left main artery disease. Just prior to\nher t..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36883,61528,99985,176670,2181-01-29 05:33:34,2181-02-09 12:45:20,53.81,,,"[0389, 51881, 48241, 4870, 78552, V4281, 99592, 2449, 2724, 2859, 53081, V1072, 23871]","[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]","[Septicemia NOS, Acute respiratry failure, Meth sus pneum d/t Staph, Influenza with pneumonia, Septic shock, Trnspl status-bne marrow, Severe sepsis, Hypothyroidism NOS, Hyperlipidemia NEC/NOS, Anemia NOS, Esophageal reflux, Hx-hodgkin's disease, Essntial thrombocythemia]","[Unspecified septicemia, Acute respiratory failure, Methicillin susceptible pneumonia due to Staphylococcus aureus, Influenza with pneumonia, Septic shock, Bone marrow replaced by transplant, Severe sepsis, Unspecified acquired hypothyroidism, Other and unspecified hyperlipidemia, Anemia, unspecified, Esophageal reflux, Personal history of hodgkin's disease, Essential thrombocythemia]",1,1,1,0,0,0,Discharge summary,Report,"Admission Date: [**2181-1-27**] Discharge Date: [**2181-2-12**]\n\nDate of Birth: [**2127-4-8**] Sex: M\n\nService: MEDICINE\n\nAllergies:\nCefepime\n\nAttending:[**First Name3 (LF) 1936**]\nChief Complaint:\nfever\n\nMajor Surgical or Invasive Procedure:\nnone\n\nHistory of Present Illness:\nPt's a 53-year-old male patient of Dr. [**First Name4 (NamePattern1) **] [**Last Name (NamePattern1) **] is here\nfor evaluation of fever. The patient states fever began two\ndays ago along with a mild dry cough, fever was low-grade at\nthat time. Day of admission, pt noticed to be 101.8 has some\nchills as well. No shortness of breath, no chest pain. Denies\nany headache, ear aches, and some scratchy throat. The patient\ndenied any change in stools, ab pain, urinary sx, has some mild\nnausea on [**1-27**] but thought it more due to fever. No arthralgias\nor myalgias or rashes. The patient says cough is nonproductive.\n He did recently have URI symptoms be..."
36884,61529,99991,151118,2184-12-28 17:30:58,2184-12-31 20:56:20,47.73,,,"[56211, 0389, 5570, 5849, 99592, 56081, 78959, 5538, 7885, 40291, 4280, 71947, 5644, 25000, V0254, E8788, 27651]","[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]","[Dvrtcli colon w/o hmrhg, Septicemia NOS, Ac vasc insuff intestine, Acute kidney failure NOS, Severe sepsis, Intestinal adhes w obstr, Ascites NEC, Hernia NEC, Oliguria & anuria, Hyp ht dis NOS w ht fail, CHF NOS, Joint pain-ankle, Postop GI funct dis NEC, DMII wo cmp nt st uncntr, Meth resis Staph carrier, Abn react-surg proc NEC, Dehydration]","[Diverticulitis of colon (without mention of hemorrhage), Unspecified septicemia, Acute vascular insufficiency of intestine, Acute kidney failure, unspecified, Severe sepsis, Intestinal or peritoneal adhesions with obstruction (postoperative) (postinfection), Other ascites, Hernia of other specified sites without mention of obstruction or gangrene, Oliguria and anuria, Unspecified hypertensive heart disease with heart failure, Congestive heart failure, unspecified, Pain in joint, ankle and foot, Other postoperative functional disorders, Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled, Carrier or suspected carrier of Methicillin resistant Staphylococcus aureus, Other specified surgical operations and procedures causing abnormal patient reaction, or later complication, without mention of misadventure at time of operation, Dehydration]",1,1,1,0,0,0,Discharge summary,Report,"Admission Date: [**2184-12-24**] Discharge Date: [**2185-1-5**]\n\nDate of Birth: [**2137-4-7**] Sex: M\n\nService: SURGERY\n\nAllergies:\nPatient recorded as having No Known Allergies to Drugs\n\nAttending:[**First Name3 (LF) 6346**]\nChief Complaint:\nRecurrent diverticulitis\n\nMajor Surgical or Invasive Procedure:\n[**2184-12-24**]: Laparoscopic sigmoid colectomy, splenic flexure\ntakedown, rigid sigmoidoscopy.\n.\n[**2184-12-28**]: Exploratory laparotomy, lysis of adhesions,\nomentectomy, washout of abdomen, drain placement and abdominal\nclosure.\n\n\nHistory of Present Illness:\nMr. [**Known lastname **] is a 47-year-old gentleman with a history of\ndiverticulitis in [**2177**] and again in [**2184-9-9**]. His last\nepisode required a seven day hospital stay on intravenous\nantibiotics. Subsequently, his symptoms resolved and he was\ndischarged on an oral regimen. He had a colonoscopy after his\nfirst attack of [**2177**]. He has no colonos..."
36885,61530,99992,197084,2144-07-25 18:04:42,2144-07-27 17:27:55,65.77,,,"[9999, 56881, 5772, 2851, 5849, 5799, 72992, 53081, 4019, 2721, 5699, 3004]","[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]","[Complic med care NEC/NOS, Hemoperitoneum, Pancreat cyst/pseudocyst, Ac posthemorrhag anemia, Acute kidney failure NOS, Intest malabsorption NOS, Nontrauma hema soft tiss, Esophageal reflux, Hypertension NOS, Pure hyperglyceridemia, Intestinal disorder NOS, Dysthymic disorder]","[Other and unspecified complications of medical care, not elsewhere classified, Hemoperitoneum (nontraumatic), Cyst and pseudocyst of pancreas, Acute posthemorrhagic anemia, Acute kidney failure, unspecified, Unspecified intestinal malabsorption, Nontraumatic hematoma of soft tissue, Esophageal reflux, Unspecified essential hypertension, Pure hyperglyceridemia, Unspecified disorder of intestine, Dysthymic disorder]",1,1,1,0,0,0,Discharge summary,Report,"Admission Date: [**2144-7-25**] Discharge Date: [**2144-7-28**]\n\nDate of Birth: [**2078-10-17**] Sex: F\n\nService: MEDICINE\n\nAllergies:\nBactrim / Norvasc / Lipitor / Cortisone\n\nAttending:[**First Name3 (LF) 2751**]\nChief Complaint:\nChief Complaint: anemia\n.\nReason for MICU transfer: retroperitoneal bleed\n\n\nMajor Surgical or Invasive Procedure:\nCoiling of superior gluteal artery\n\nHistory of Present Illness:\nMs. [**Known lastname 91180**] is a 65 YOF with history of GERD, HTN, and\nhypertriglyceridemia who was recently admitted from [**7-9**] to [**7-22**]\nafter being transferred from an OSH for intractable diarrhea.\nThis hospital course was complicated by PEA arrest in the\nsetting of pH 6.98 and she was intubated and had a right femoral\nCVL placed and was started on a bicarb drip. She was diagnosed\nwith postviral autoimmune enteropathy and was treated with TPN\nand eventually discharged on budesonide and methylprednisolone\nwit..."
36886,61531,99995,137810,2147-02-08 13:53:58,2147-02-10 17:46:30,88.70,5578.100556,,"[4414, 42833, 99812, 2851, 4241, 25000, 99811, 9961, E8798, 2724, V4581, 4280, V103, V1582, V5861, 4400, 41401]","[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]","[Abdom aortic aneurysm, Ac on chr diast hrt fail, Hematoma complic proc, Ac posthemorrhag anemia, Aortic valve disorder, DMII wo cmp nt st uncntr, Hemorrhage complic proc, Malfunc vasc device/graf, Abn react-procedure NEC, Hyperlipidemia NEC/NOS, Aortocoronary bypass, CHF NOS, Hx of breast malignancy, History of tobacco use, Long-term use anticoagul, Aortic atherosclerosis, Crnry athrscl natve vssl]","[Abdominal aneurysm without mention of rupture, Acute on chronic diastolic heart failure, Hematoma complicating a procedure, Acute posthemorrhagic anemia, Aortic valve disorders, Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled, Hemorrhage complicating a procedure, Mechanical complication of other vascular device, implant, and graft, Other specified procedures as the cause of abnormal reaction of patient, or of later complication, without mention of misadventure at time of procedure, Other and unspecified hyperlipidemia, Aortocoronary bypass status, Congestive heart failure, unspecified, Personal history of malignant neoplasm of breast, Personal history of tobacco use, Long-term (current) use of anticoagulants, Atherosclerosis of aorta, Coronary atherosclerosis of native coronary artery]",1,1,1,0,0,0,Discharge summary,Report,"Admission Date: [**2147-2-8**] Discharge Date: [**2147-2-11**]\n\n\nService: SURGERY\n\nAllergies:\nZantac\n\nAttending:[**First Name3 (LF) 6088**]\nChief Complaint:\nAbdominal Aortic Aneurysm\n\nMajor Surgical or Invasive Procedure:\n[**2147-2-8**]: groin cutdown with mass excision and endovascular\nrepair of an aortic aneurysm\n\n\nHistory of Present Illness:\nMs. [**Known lastname **] is an 88-year-old female who is currently being\nevaluated for percutaneous aortic valve replacement due to\nsevere\naortic stenosis. She has a known infrarenal aortic aneurysm.\nThis was in the 4-5 cm range when it was first discovered\napproximately eight years ago. In [**Month (only) **] of this past year, she\nwas evaluated at the [**Hospital3 2358**] and was judged not to be an\nendovascular candidate. For that reason, repair was deferred.\nShe was recently hospitalized in [**Month (only) 956**] of this year for flash\npulmonary edema and back pain related to vertebral compr..."


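### Step 6) Parse the notes for opiates
- `finddrugs.search` reads each note in `df_clean` and searches it for the drug names listed in `opiates.txt`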

In [25]:
# Search the notes
finddrugs.search(df_clean)

Using drugs from /Users/eightiesfanjan/Desktop/research/opioid_mimic_research/opiates.txt
Reading documents...
...index: 0. row_id: 2. subject_id: 3. hadm_id: 145834.
...
...index: 36800. row_id: 61406. subject_id: 99611. hadm_id: 108679.
Done analyzing 36888 documents in 177.74 seconds (207.54 docs/sec)
Summary file is in /Use
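`finddrugs` is a local module, so its internals aren't shown here. For context, below is one simple way to run this kind of search yourself: a minimal stand-in, not the `finddrugs` implementation. It assumes each note has a "Medications on Admission:" section that ends at the next blank line (header wording varies between notes, so this will miss some), and it only searches for the ten opiate names that appear as columns in `output.csv`. The helper names `admission_med_section` and `flag_opiates`, and the `admit_found` column, are hypothetical.

```python
import re
from typing import Optional

import pandas as pd

# Drug names copied from the opiate columns of output.csv; the real search
# list lives in opiates.txt and may contain more spellings or brand names.
OPIATES = [
    "hydromorphone", "hydrocodone", "oxycodone", "morphine", "fentanyl",
    "tramadol", "buprenorphine", "methadone", "oxymorphone", "meperidine",
]

# Assumption: the admission medication list sits under a
# "Medications on Admission:" header and ends at the next blank line.
SECTION_RE = re.compile(
    r"Medications on Admission:\s*(.*?)(?:\n\s*\n|\Z)",
    re.IGNORECASE | re.DOTALL,
)


def admission_med_section(text: str) -> Optional[str]:
    """Return the body of the admission medication section, or None."""
    match = SECTION_RE.search(text)
    return match.group(1) if match else None


def flag_opiates(notes: pd.DataFrame, text_col: str = "text") -> pd.DataFrame:
    """Add an admit_found flag plus one 0/1 column per opiate name."""
    sections = notes[text_col].fillna("").map(admission_med_section)
    out = notes.copy()
    out["admit_found"] = sections.notna().astype(int)
    for drug in OPIATES:
        # Whole-word, case-insensitive match inside the section only
        pattern = re.compile(rf"\b{drug}\b", re.IGNORECASE)
        out[drug] = sections.map(
            lambda s, p=pattern: int(bool(s) and p.search(s) is not None)
        )
    return out
```

Usage would be `flagged = flag_opiates(df_clean)`. Restricting the match to the admission section is what separates "on opiates when admitted" from opiates given during the stay or prescribed at discharge.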

### Step 7) Generate flags for opiates
Of the 36,888 admissions searched:
- 4,414 list opiates among their medications on admission
- 444 appear to be duplicate records; where they come from still needs to be investigated
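One way to start on the duplicates, once `output.csv` is loaded into `medications`, is to pull out every row whose admission ID repeats. This is a minimal sketch that assumes a "duplicate" means the same `HADM_ID` appearing more than once; the 444 may instead be repeated on another key, so `key` is a parameter. `find_duplicate_admissions` is a hypothetical helper name.

```python
import pandas as pd


def find_duplicate_admissions(df: pd.DataFrame, key: str = "HADM_ID") -> pd.DataFrame:
    """Return every row whose `key` value appears more than once, grouped by key."""
    # keep=False marks all copies, not just the second and later ones
    dupes = df[df.duplicated(subset=[key], keep=False)]
    # a stable sort keeps the original row order within each key
    return dupes.sort_values(key, kind="stable")
```

For example, `find_duplicate_admissions(medications)` lists every copy side by side so the differing columns are easy to spot, while `medications.duplicated(subset=["HADM_ID"]).sum()` (default `keep="first"`) counts only the extra copies.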


In [26]:
# load the output to a dataframe
medications = pd.read_csv('output.csv')
medications.head()

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,HIST_FOUND,DEPRESSION,ADMIT_FOUND,DIS_FOUND,GEN_DEPRESS_MEDS_FOUND,GROUP,SSRI,MISC,hydromorphone,hydrocodone,oxycodone,morphine,fentanyl,tramadol,buprenorphine,methadone,oxymorphone,meperidine
0,2,3,145834,1,0,1,1,0,3,1,0,0,1,0,0,0,0,0,0,0,
1,3,4,185777,1,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,
2,5,6,107064,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,
3,9,9,150750,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,
4,13,13,143045,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,


In [27]:
len(medications.index)

36888

In [35]:
# opiates on admission
has_opiates = medications.GROUP == 3
medications.loc[has_opiates]

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,HIST_FOUND,DEPRESSION,ADMIT_FOUND,DIS_FOUND,GEN_DEPRESS_MEDS_FOUND,GROUP,SSRI,MISC,hydromorphone,hydrocodone,oxycodone,morphine,fentanyl,tramadol,buprenorphine,methadone,oxymorphone,meperidine
0,2,3,145834,1,0,1,1,0,3,1,0,0,1,0,0,0,0,0,0,0,
9,19,20,157681,1,0,1,1,0,3,1,0,0,1,0,0,0,0,0,0,0,
23,35,34,144319,1,0,1,1,0,3,1,0,0,0,0,0,1,0,0,0,0,
25,39,36,165660,1,0,1,1,0,3,1,0,1,1,0,0,0,0,0,0,0,
38,60,59,104130,1,0,1,1,0,3,1,0,0,1,0,0,0,0,0,0,0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36861,61498,99893,128349,1,0,1,1,0,3,1,0,0,0,1,0,0,0,0,0,0,
36865,61503,99901,131711,1,0,1,1,0,3,1,0,0,0,0,0,1,0,0,0,0,
36869,61508,99923,164914,1,0,1,1,0,3,1,0,1,0,0,0,0,0,0,0,0,
36871,61513,99936,107913,1,0,1,1,0,3,1,0,1,0,0,0,0,0,0,0,0,
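To see which opiates drive the GROUP 3 count, the per-drug flag columns can be summed within that group. A minimal sketch, assuming the drug columns hold 0/1 flags and treating missing values as 0; `opiate_counts` is a hypothetical helper name.

```python
import pandas as pd

# Opiate flag columns as they appear in output.csv
OPIATE_COLS = [
    "hydromorphone", "hydrocodone", "oxycodone", "morphine", "fentanyl",
    "tramadol", "buprenorphine", "methadone", "oxymorphone", "meperidine",
]


def opiate_counts(meds: pd.DataFrame, group: int = 3) -> pd.Series:
    """Count admissions in `group` flagged for each opiate, most common first."""
    subset = meds.loc[meds["GROUP"] == group, OPIATE_COLS]
    # Assumption: a blank flag means the drug was not found
    return subset.fillna(0).astype(int).sum().sort_values(ascending=False)
```

Calling `opiate_counts(medications)` gives one count per drug. An admission can be flagged for more than one opiate, so the counts need not add up to the number of GROUP 3 rows.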


In [36]:
# admissions with medications on admission but no opiates listed
meds_no_opiates = medications.GROUP == 2
medications.loc[meds_no_opiates]

Unnamed: 0,ROW_ID,SUBJECT_ID,HADM_ID,HIST_FOUND,DEPRESSION,ADMIT_FOUND,DIS_FOUND,GEN_DEPRESS_MEDS_FOUND,GROUP,SSRI,MISC,hydromorphone,hydrocodone,oxycodone,morphine,fentanyl,tramadol,buprenorphine,methadone,oxymorphone,meperidine
1,3,4,185777,1,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,
5,15,17,194023,1,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,
6,17,18,188822,1,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,
8,18,19,109235,1,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,
11,20,21,109451,1,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36883,61528,99985,176670,1,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,
36884,61529,99991,151118,1,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,
36885,61530,99992,197084,1,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,
36886,61531,99995,137810,1,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,


In [38]:
# admissions with medications on admission: no opiates (GROUP 2) + opiates (GROUP 3)
total_med_on_admission_recs = 25222 + 4414
total_med_on_admission_recs

29636
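Rather than hard-coding 25222 and 4414, the totals could be read straight from the `GROUP` column so they stay correct if the search is rerun. A minimal sketch, assuming (as the cell comments above indicate) that GROUP 3 means opiates on admission and GROUP 2 means other medications on admission without opiates; `admission_med_totals` is a hypothetical helper name.

```python
import pandas as pd


def admission_med_totals(meds: pd.DataFrame) -> dict:
    """Count admissions with and without opiates among those listing admission meds."""
    counts = meds["GROUP"].value_counts()
    opiates = int(counts.get(3, 0))     # opiates on admission
    no_opiates = int(counts.get(2, 0))  # admission meds, but no opiates
    return {"opiates": opiates, "no_opiates": no_opiates, "total": opiates + no_opiates}
```

Running `admission_med_totals(medications)` should reproduce the hand-entered figures if 25,222 and 4,414 are indeed the GROUP 2 and GROUP 3 row counts.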

## Phase 2: Demographic Data
Steps:
1. Acquire ICD9 codes for all comorbidities
2. Construct flags based on all specified comorbidities
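Since `df_clean` already carries each admission's diagnoses as an `icd9_codes` array, the comorbidity flags in step 2 can be built by prefix-matching that array. A minimal sketch: the `COMORBIDITY_PREFIXES` mapping is a hypothetical example chosen from codes visible in the output above (4280, 25000, 4019), not the study plan's comorbidity list, and `add_comorbidity_flags` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical mapping for illustration only; the study plan defines the
# actual comorbidities and their code sets.
COMORBIDITY_PREFIXES = {
    "chf": ("428",),           # heart failure (e.g. 4280 "CHF NOS")
    "diabetes": ("250",),      # diabetes mellitus (e.g. 25000)
    "hypertension": ("401",),  # essential hypertension (e.g. 4019)
}


def add_comorbidity_flags(df, codes_col="icd9_codes", prefixes=COMORBIDITY_PREFIXES):
    """Add a has_<name> 0/1 column for each comorbidity in `prefixes`.

    MIMIC-III stores ICD9 codes without the decimal point, so a prefix
    such as "428" matches 4280, 42833, and so on.
    """
    out = df.copy()
    for name, starts in prefixes.items():
        out[f"has_{name}"] = out[codes_col].map(
            lambda codes, s=starts: int(
                isinstance(codes, (list, tuple))
                and any(str(c).startswith(s) for c in codes)
            )
        )
    return out
```

Prefixes keep each definition short, but they need care: a prefix that is too short (say "42") would sweep in unrelated circulatory codes.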


## Phase 3: Clinical Data
Steps:
1. Extract the reason for admission from the ICD9 code array
2. Locate the SOFA score
3. Locate ALL uses of mechanical ventilation
4. Create a flag for mechanical ventilation
5. Extract the duration of ventilation
6. Locate ICD9 codes for pressors
7. Create flags based on specified pressors
8. Locate ICD9 codes for dialysis
9. Create flags for dialysis
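Steps 1, 4 and 5 can be sketched in pandas. In MIMIC-III, `seq_num` orders an admission's diagnoses by priority, so the code with the lowest `seq_num` is one reasonable proxy for the reason for admission (an approximation, not a definition from the study plan). For ventilation, the sketch assumes a hypothetical `vent` table with one row per ventilation episode and `hadm_id`, `starttime`, `endtime` columns, produced by an earlier extraction step; overlapping episodes would be double-counted. `primary_icd9` and `ventilation_summary` are hypothetical helper names.

```python
import pandas as pd


def primary_icd9(codes, seq_nums):
    """Return the ICD9 code with the lowest seq_num, or None if unavailable."""
    if not isinstance(codes, list) or not isinstance(seq_nums, list) or not codes:
        return None
    return min(zip(seq_nums, codes))[1]


def ventilation_summary(vent: pd.DataFrame, hadm_ids) -> pd.DataFrame:
    """Per-admission ventilation flag (0/1) and total ventilated hours."""
    hours = (vent["endtime"] - vent["starttime"]).dt.total_seconds() / 3600
    ids = pd.Index(hadm_ids, name="hadm_id")
    # Sum episode lengths per admission; admissions with no episodes get 0 hours
    total = hours.groupby(vent["hadm_id"]).sum().reindex(ids, fill_value=0.0)
    flag = ids.isin(vent["hadm_id"]).astype(int)
    return pd.DataFrame({"vent_flag": flag, "vent_hours": total.to_numpy()}, index=ids)
```

With the arrays already in `df_clean`, the first helper could be applied row by row, e.g. `df_clean.apply(lambda r: primary_icd9(r.icd9_codes, r.seq_num), axis=1)`, and the second joined back on `hadm_id`.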
