In [3]:
%load_ext autoreload

In [4]:
%autoreload 2

In [6]:
%load_ext chapyter



In [7]:
import os
os.environ["NOTEBOOK_NAME"] = "11_gender_sepsis_steven.ipynb"

# Notebook Summary

## Goal: 
This notebook attempts to replicate Table 1 of a 2019 study on the association of sex with clinical outcome in critically ill sepsis patients. 
Study link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6687414/pdf/shk-52-146.pdf

## Summary Metrics: 
* All patients diagnosed with "sepsis," "severe sepsis," or "septic shock" were included
  - The study does not define the specific methodology, but ICD-9 codes were presumably used, judging by the tables it references
* Patients under 18 years old were excluded
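
The two criteria above can be sketched as a pandas filter. This is only an illustration: it assumes the three ICD-9 codes used in the queries later in this notebook define the cohort, and the function name, the `diagnosis` column, and the toy rows are made up for the example.

```python
import pandas as pd

# ICD-9 codes assumed to define the cohort (the same codes the queries in this notebook use)
SEPSIS_ICD9 = {"99591": "sepsis", "99592": "severe sepsis", "78552": "septic shock"}


def select_cohort(diagnoses: pd.DataFrame, min_age: int = 18) -> pd.DataFrame:
    """Keep rows with a sepsis-related code for patients aged min_age or older."""
    is_sepsis = diagnoses["icd9_code"].isin(list(SEPSIS_ICD9))
    is_adult = diagnoses["age"] >= min_age  # excludes patients under 18
    cohort = diagnoses.loc[is_sepsis & is_adult].copy()
    cohort["diagnosis"] = cohort["icd9_code"].map(SEPSIS_ICD9)
    return cohort


# Toy rows invented for illustration only (not MIMIC data)
toy = pd.DataFrame({
    "subject_id": [1, 2, 3, 4],
    "icd9_code": ["99591", "4019", "78552", "99592"],
    "age": [45, 60, 17, 18],
})
cohort = select_cohort(toy)
print(cohort)
```

Note that "under 18 excluded" corresponds to `age >= 18`, so an 18-year-old stays in the cohort.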

### Step 1 – Pull the correct data from MIMIC

In [10]:
%%mimicSQL2 --notebook_name 11_gender_sepsis_steven.ipynb

I want to understand if patients over 18yo with a diagnosis of sepsis, severe sepsis, or septic shock have an association between age and mortality. Can you help pull the relevant data from MIMIC for me? 

Sure, I can help with that. Here is a SQL query that retrieves the relevant data from the MIMIC-III dataset. This query selects patients over 18 years old with a diagnosis of sepsis, severe sepsis, or septic shock and their mortality status.



This query returns the patient's ID, date of birth, gender, admission ID, admission time, discharge time, death time (if applicable), ICD9 code for the diagnosis, age at the time of admission, and a binary indicator for mortality (1 if the patient died during the hospital stay, 0 otherwise). The ICD9 codes '99591', '99592', and '78552' correspond to sepsis, severe sepsis, and septic shock, respectively.


In [11]:
%%runSQL --notebook_name 11_gender_sepsis_steven.ipynb

SELECT 
    patients.subject_id,
    patients.dob,
    patients.gender,
    admissions.hadm_id,
    admissions.admittime,
    admissions.dischtime,
    admissions.deathtime,
    diagnoses_icd.icd9_code,
    DATEDIFF('year', patients.dob, admissions.admittime) AS age,
    CASE 
        WHEN admissions.deathtime IS NOT NULL THEN 1
        ELSE 0
    END AS mortality
FROM 
    patients
JOIN 
    admissions ON patients.subject_id = admissions.subject_id
JOIN 
    diagnoses_icd ON admissions.hadm_id = diagnoses_icd.hadm_id
WHERE 
    DATEDIFF('year', patients.dob, admissions.admittime) > 18
    AND (
        diagnoses_icd.icd9_code LIKE '99591' OR 
        diagnoses_icd.icd9_code LIKE '99592' OR 
        diagnoses_icd.icd9_code LIKE '78552'
    )

Query failed! State: FAILED
Reason: FUNCTION_NOT_FOUND: line 22:5: Function 'datediff' not registered


AttributeError: 'bool' object has no attribute 'head'

In [13]:
%%mimicSQL2 --notebook_name 11_gender_sepsis_steven.ipynb

we got an error - can you fix it? 

Sure, I can help with that. Here is a SQL query that retrieves the relevant data from the MIMIC-III dataset. This query selects patients over 18 years old with a diagnosis of sepsis, severe sepsis, or septic shock and their mortality status.



This query returns the patient's ID, date of birth, gender, admission ID, admission time, discharge time, death time (if applicable), ICD9 code for the diagnosis, age at the time of admission, and a binary indicator for mortality (1 if the patient died during the hospital stay, 0 otherwise). The ICD9 codes '99591', '99592', and '78552' correspond to sepsis, severe sepsis, and septic shock, respectively.


## This query works!

In [14]:
%%runSQL --notebook_name 11_gender_sepsis_steven.ipynb

SELECT 
    p.subject_id, 
    p.dob, 
    p.gender, 
    a.hadm_id, 
    a.admittime, 
    a.dischtime, 
    a.deathtime, 
    d.icd9_code, 
    EXTRACT(YEAR FROM a.admittime) - EXTRACT(YEAR FROM p.dob) AS age, 
    CASE WHEN a.deathtime IS NOT NULL THEN 1 ELSE 0 END AS mortality
FROM 
    patients p
JOIN 
    admissions a ON p.subject_id = a.subject_id
JOIN 
    diagnoses_icd d ON a.hadm_id = d.hadm_id
WHERE 
    d.icd9_code IN ('99591', '99592', '78552') 
    AND EXTRACT(YEAR FROM a.admittime) - EXTRACT(YEAR FROM p.dob) >= 18 -- keep 18-year-olds; the year difference only approximates age

Unnamed: 0,subject_id,dob,gender,hadm_id,admittime,dischtime,deathtime,icd9_code,age,mortality
0,66876,2057-06-23 00:00:00.000,M,122645,2133-02-09 14:00:00.000,2133-02-20 13:38:00.000,2133-02-20 13:38:00.000,99592,76,1
1,66876,2057-06-23 00:00:00.000,M,122645,2133-02-09 14:00:00.000,2133-02-20 13:38:00.000,2133-02-20 13:38:00.000,78552,76,1
2,66894,2069-03-07 00:00:00.000,F,157688,2126-08-10 21:10:00.000,2126-08-11 19:29:00.000,2126-08-11 19:29:00.000,99592,57,1
3,66894,2069-03-07 00:00:00.000,F,157688,2126-08-10 21:10:00.000,2126-08-11 19:29:00.000,2126-08-11 19:29:00.000,78552,57,1
4,66965,2116-06-07 00:00:00.000,M,177338,2198-01-22 22:20:00.000,2198-02-14 14:51:00.000,,99591,82,0
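
The year subtraction in the query above counts calendar years, so it overstates age by one for anyone admitted before their birthday in the admission year. A minimal sketch of an exact calculation in pandas follows, using the `dob` and `admittime` values from the first, third, and fifth rows of the output. The helper name `age_in_years` is made up for this example. It compares month and day instead of subtracting the timestamps, because a direct subtraction can overflow pandas' nanosecond timedeltas for very long spans.

```python
import pandas as pd


def age_in_years(dob: pd.Series, ref: pd.Series) -> pd.Series:
    """Completed years between dob and ref, compared field by field."""
    dob = pd.to_datetime(dob)
    ref = pd.to_datetime(ref)
    # True where the birthday has not yet occurred in the reference year
    before_birthday = (ref.dt.month < dob.dt.month) | (
        (ref.dt.month == dob.dt.month) & (ref.dt.day < dob.dt.day)
    )
    return ref.dt.year - dob.dt.year - before_birthday.astype(int)


# dob and admittime copied from three rows of the query output
sample = pd.DataFrame({
    "dob": ["2057-06-23", "2069-03-07", "2116-06-07"],
    "admittime": ["2133-02-09 14:00:00", "2126-08-10 21:10:00", "2198-01-22 22:20:00"],
})
sample["age"] = age_in_years(sample["dob"], sample["admittime"])
print(sample)
```

For these rows the exact ages are 75, 57, and 81, while the query reported 76, 57, and 82.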


### Note:
I was unable to get mechanical ventilation time to work. I did not try SOFA or SAPS, which are likely too complicated for now. I also did not try vasoactive drug use or dialysis demand, which would likely require agents because they need multi-step reasoning. 

In [21]:
%%mimicSQL2 --notebook_name 11_gender_sepsis_steven.ipynb 

The last query you wrote is great, but I forgot a few pieces of data we need. Can you get ethnicity, admission type and location, first ICU service, date of ICU admission and discharge?

Absolutely, I can modify the query to include the additional data you need. Here is the updated SQL query:
This query now also returns the patient's ethnicity, admission type, admission location, first ICU service, and the date of ICU admission and discharge.


In [22]:
%%runSQL --notebook_name 11_gender_sepsis_steven.ipynb

SELECT 
    p.subject_id, 
    p.dob, 
    p.gender, 
    a.hadm_id, 
    a.admittime, 
    a.dischtime, 
    a.deathtime, 
    a.admission_type, 
    a.admission_location, 
    a.ethnicity,
    i.first_careunit,
    i.intime as icu_intime,
    i.outtime as icu_outtime,
    d.icd9_code, 
    EXTRACT(YEAR FROM a.admittime) - EXTRACT(YEAR FROM p.dob) AS age_at_admission, 
    CASE WHEN a.deathtime IS NOT NULL THEN 1 ELSE 0 END AS mortality
FROM 
    patients p
JOIN 
    admissions a ON p.subject_id = a.subject_id
JOIN 
    icustays i ON a.hadm_id = i.hadm_id
JOIN 
    diagnoses_icd d ON a.hadm_id = d.hadm_id
WHERE 
    (d.icd9_code = '99591' OR d.icd9_code = '99592' OR d.icd9_code = '78552') 
    AND EXTRACT(YEAR FROM a.admittime) - EXTRACT(YEAR FROM p.dob) >= 18 -- keep 18-year-olds; the year difference only approximates age

Unnamed: 0,subject_id,dob,gender,hadm_id,admittime,dischtime,deathtime,admission_type,admission_location,ethnicity,first_careunit,icu_intime,icu_outtime,icd9_code,age_at_admission,mortality
0,250,2164-12-27 00:00:00.000,F,124271,2188-11-12 09:22:00.000,2188-11-22 12:00:00.000,2188-11-22 12:00:00.000,EMERGENCY,EMERGENCY ROOM ADMIT,BLACK/AFRICAN AMERICAN,MICU,2188-11-12 09:25:47.000,2188-11-23 01:24:45.000,78552,24,1
1,250,2164-12-27 00:00:00.000,F,124271,2188-11-12 09:22:00.000,2188-11-22 12:00:00.000,2188-11-22 12:00:00.000,EMERGENCY,EMERGENCY ROOM ADMIT,BLACK/AFRICAN AMERICAN,MICU,2188-11-12 09:25:47.000,2188-11-23 01:24:45.000,99592,24,1
2,269,2130-09-30 00:00:00.000,M,106296,2170-11-05 11:04:00.000,2170-11-27 18:00:00.000,,EMERGENCY,EMERGENCY ROOM ADMIT,WHITE,MICU,2170-11-05 11:05:29.000,2170-11-08 17:46:57.000,78552,40,0
3,269,2130-09-30 00:00:00.000,M,106296,2170-11-05 11:04:00.000,2170-11-27 18:00:00.000,,EMERGENCY,EMERGENCY ROOM ADMIT,WHITE,MICU,2170-11-05 11:05:29.000,2170-11-08 17:46:57.000,99592,40,0
4,689,2128-12-27 00:00:00.000,F,129080,2183-04-01 13:56:00.000,2183-04-09 12:00:00.000,,EMERGENCY,EMERGENCY ROOM ADMIT,BLACK/AFRICAN AMERICAN,MICU,2183-04-02 00:43:43.000,2183-04-04 22:30:03.000,99591,55,0
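
The output above repeats each admission once per matching ICD-9 code (for example, `hadm_id` 124271 appears twice), so counting rows would overcount patients in Table 1. A minimal sketch of collapsing to one row per admission, using a subset of columns copied from the five rows shown; the `icd9_codes` column name is made up for the example.

```python
import pandas as pd

# Subset of columns copied from the five output rows
rows = pd.DataFrame({
    "subject_id": [250, 250, 269, 269, 689],
    "hadm_id": [124271, 124271, 106296, 106296, 129080],
    "gender": ["F", "F", "M", "M", "F"],
    "icd9_code": ["78552", "99592", "78552", "99592", "99591"],
    "mortality": [1, 1, 0, 0, 0],
})

# One row per hospital admission, keeping every matching code as a joined string
per_admission = (
    rows.sort_values(["hadm_id", "icd9_code"])
        .groupby("hadm_id", as_index=False)
        .agg(
            subject_id=("subject_id", "first"),
            gender=("gender", "first"),
            mortality=("mortality", "max"),
            icd9_codes=("icd9_code", lambda s: ",".join(s.unique())),
        )
)
print(per_admission)
```

The same idea would apply to admissions with more than one ICU stay, where the join to `icustays` adds a row per stay; which stay to keep is a separate decision this sketch does not make.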


### Step 2 – Get a few extra data fields using Python
I think the SQL query already returns every column we need.

#### Additional Data Fields to Get: 
* 1-year mortality (defined as death within 1 year after ICU admission)
* Hospital length of stay (calculated from date of admission)
* ICU length of stay
* 90-day mortality

#### Note: 
We do not have access to the date of death from the social security database to determine out-of-hospital mortality. We may need to exclude these patients when we do death calculations. 
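
One possible way to handle this, sketched under the assumption that only the in-hospital `deathtime` is available: mark mortality as unknown whenever the patient left the hospital alive before the time window ended. The function name `mortality_within` and the nullable-flag design are choices made for this illustration, not the study's method. The three rows reuse timestamps from the query output above.

```python
import pandas as pd


def mortality_within(df: pd.DataFrame, days: int) -> pd.Series:
    """Nullable flag: True/False where in-hospital data settles it, <NA> otherwise."""
    horizon = pd.Timedelta(days=days)
    icu_in = pd.to_datetime(df["icu_intime"])
    death = pd.to_datetime(df["deathtime"])
    disch = pd.to_datetime(df["dischtime"])

    # A missing deathtime becomes NaT, and comparisons with NaT evaluate to False
    died_in_window = (death - icu_in) <= horizon
    alive_past_window = ((death - icu_in) > horizon) | ((disch - icu_in) > horizon)

    flag = pd.Series(pd.NA, index=df.index, dtype="boolean")
    flag[alive_past_window] = False
    flag[died_in_window] = True
    return flag


# Timestamps copied from three admissions in the query output
sample = pd.DataFrame({
    "icu_intime": ["2188-11-12 09:25:47", "2170-11-05 11:05:29", "2183-04-02 00:43:43"],
    "dischtime": ["2188-11-22 12:00:00", "2170-11-27 18:00:00", "2183-04-09 12:00:00"],
    "deathtime": ["2188-11-22 12:00:00", None, None],
})
sample["mortality_90d"] = mortality_within(sample, 90)
print(sample)
```

Rows left as `<NA>` are the ones that would need an out-of-hospital date of death to resolve.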

In [25]:
%%mimicPython

Can you calculate a few additional fields for me? I'd like help with the following added to the table from MIMIC: 

- 1-year mortality (defined as death within 1 year after ICU admission)
- Hospital length of stay (calculated from date of admission)
- ICU length of stay
- 90-day mortality

In the past you tried to use a DoD column but DoD does not exist. Please use only columns that exist in the dataframe

Sure, I can help with that. Here's how you can calculate the additional fields:

```python
import pandas as pd

# Assumes `df` already holds the result of the last SQL query

# Convert the time columns to datetime
for col in ['admittime', 'dischtime', 'deathtime', 'icu_intime', 'icu_outtime']:
    df[col] = pd.to_datetime(df[col])

# Calculate the hospital length of stay in days
df['hospital_los'] = (df['dischtime'] - df['admittime']).dt.total_seconds() / (60 * 60 * 24)

# Calculate the ICU length of stay in days
df['icu_los'] = (df['icu_outtime'] - df['icu_intime']).dt.total_seconds() / (60 * 60 * 24)

# Calculate 1-year mortality, counted from ICU admission
# (in-hospital deaths only; a missing deathtime gives 0)
df['1_year_mortality'] = ((df['deathtime'] - df['icu_intime']) <= pd.Timedelta(days=365)).astype(int)

# Calculate 90-day mortality, counted from ICU admission
df['90_day_mortality'] = ((df['deathtime'] - df['icu_intime']) <= pd.Timedelta(days=90)).astype(int)
```

This code first co

In [24]:
%%mimicPython

can you fix the errors and try again? 

I apologize for the confusion earlier. It seems like the 'dod' column might not exist in the dataframe or it might not be in the correct format. Let's first check if the 'dod' column exists and if it's in the correct datetime format:

```python
import pandas as pd

# Check if 'dod' column exists
if 'dod' in df.columns:
    # Convert 'dod' to datetime unless it already is a datetime column
    if not pd.api.types.is_datetime64_any_dtype(df['dod']):
        df['dod'] = pd.to_datetime(df['dod'])
else:
    print("'dod' column does not exist in the dataframe.")
```

If the 'dod' column exists and is in the correct format, we can proceed with the calculations:

```python
# Calculate the hospital length of stay
df['hospital_los'] = (df['dischtime'] - df['admittime']).dt.total_seconds() / (60 * 60 * 24)

# Calculate the ICU length of stay
df['icu_los'] = (df['icu_outtime'] - df['icu_intime']).dt.total_seconds() / (60 * 60 * 24)

# Calculate 1-ye
```

NameError: name 'df' is not defined
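
The `NameError` suggests the generated code ran where no `df` variable had been defined. How `%%runSQL` exposes its result is not shown in this notebook, so the sketch below simply assumes the query output is available as CSV text and loads it explicitly before computing the length-of-stay fields. The two rows are copied from the Step 1 output.

```python
import io

import pandas as pd

# Two rows copied from the Step 1 output; in practice this text would come from
# wherever the query result was saved (an assumption, not shown in the notebook)
csv_text = """subject_id,hadm_id,admittime,dischtime,deathtime,icu_intime,icu_outtime
250,124271,2188-11-12 09:22:00,2188-11-22 12:00:00,2188-11-22 12:00:00,2188-11-12 09:25:47,2188-11-23 01:24:45
269,106296,2170-11-05 11:04:00,2170-11-27 18:00:00,,2170-11-05 11:05:29,2170-11-08 17:46:57
"""

time_cols = ["admittime", "dischtime", "deathtime", "icu_intime", "icu_outtime"]
df = pd.read_csv(io.StringIO(csv_text), parse_dates=time_cols)

# Lengths of stay in days; dividing a timedelta by one day gives a float
one_day = pd.Timedelta(days=1)
df["hospital_los"] = (df["dischtime"] - df["admittime"]) / one_day
df["icu_los"] = (df["icu_outtime"] - df["icu_intime"]) / one_day
print(df[["hadm_id", "hospital_los", "icu_los"]])
```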