In [1]:
# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6687414/
2677 + 3457

6134

In [2]:
%load_ext autoreload

In [3]:
%autoreload 2

In [4]:
%reload_ext chapyter



In [5]:
# Load OpenAI credentials from my .env file
from dotenv import load_dotenv
load_dotenv()

True

In [9]:
import os
os.environ["NOTEBOOK_NAME"] = "Hou-sepsis-paper.ipynb"

In [10]:
%%mimicSQL --notebook_name Hou-sepsis-paper.ipynb

Retrieve all data from PATIENTS table where patients have been diagnosed with “sepsis,” “severe sepsis,” or “septic shock”.




In [11]:
%%runSQL --notebook_name Hou-sepsis-paper.ipynb

SELECT * 
FROM patients 
WHERE subject_id IN (
    SELECT subject_id 
    FROM diagnoses_icd 
    WHERE icd9_code IN ('99591', '99592', '78552')
)

Unnamed: 0,row_id,subject_id,gender,dob,dod,dod_hosp,dod_ssn,expire_flag
0,235,250,F,2164-12-27 00:00:00.000,2188-11-22 00:00:00.000,2188-11-22 00:00:00.000,,1
1,252,269,M,2130-09-30 00:00:00.000,,,,0
2,650,689,F,2128-12-27 00:00:00.000,,,,0
3,651,690,M,2109-09-24 00:00:00.000,2189-07-30 00:00:00.000,2189-07-30 00:00:00.000,2189-07-30 00:00:00.000,1
4,662,703,M,2032-03-08 00:00:00.000,,,,0


In [13]:
%%mimicPython

A paper I'm trying to replicate has n=10704 for "Patients were diagnosed as sepsis, severe sepsis, and septic shock". Any idea why we're getting much lower numbers?

There could be several reasons for the discrepancy in numbers:

1. Different data sources: The paper you are trying to replicate might have used a different data source or a larger dataset. The MIMIC-III database is a large, freely-available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. If the paper used a different database or included more years, it could have a larger number of patients.

2. Different inclusion criteria: The paper might have used different inclusion criteria for sepsis, severe sepsis, and septic shock. For example, they might have included patients with a broader range of ICD codes, or used clinical criteria (like SIRS criteria or SOFA score) in addition to ICD codes.

3. Different definitions: The definitions of sepsis, severe sepsis, and septic shock have changed over time. If the paper used older definitions, it 

In [14]:
%%mimicSQL --notebook_name Hou-sepsis-paper.ipynb

Can you show me a table that shows the subject_id, and all the ICD codes they ever had?




In [15]:
%%runSQL --notebook_name Hou-sepsis-paper.ipynb

SELECT subject_id, STRING_AGG(icd9_code, ', ') AS icd_codes
FROM diagnoses_icd
GROUP BY subject_id


Query failed! State: FAILED
Reason: FUNCTION_NOT_FOUND: line 1:20: Function 'string_agg' not registered


AttributeError: 'bool' object has no attribute 'head'
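
The AttributeError above suggests that the code running after the failed query received a boolean rather than a DataFrame; this is an inference from the traceback, not a confirmed description of the magic's internals. A minimal defensive sketch, where `preview` is a hypothetical helper and not part of chapyter:

```python
import pandas as pd

def preview(result, n: int = 5):
    """Return the first n rows if `result` is a DataFrame, else None."""
    if isinstance(result, pd.DataFrame):
        return result.head(n)
    print("Query did not return a DataFrame; skipping preview.")
    return None

# A failed query is modeled here as False (assumption based on the traceback)
failed_preview = preview(False)

# A successful query is modeled as a small DataFrame (hypothetical values)
ok_preview = preview(pd.DataFrame({"subject_id": [117, 118]}))
print(ok_preview)
```

Checking the type before calling `.head` keeps the SQL engine's error message as the only failure the reader has to interpret.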

In [16]:
%%mimicSQL --notebook_name Hou-sepsis-paper.ipynb

Can you show me a table that shows the subject_id, and all the ICD codes they ever had? Please fix the previous error.

SQL

SELECT subject_id, GROUP_CONCAT(icd9_code, ', ') AS icd_codes
FROM diagnoses_icd
GROUP BY subject_id




In [17]:
%%runSQL --notebook_name Hou-sepsis-paper.ipynb

SELECT subject_id, GROUP_CONCAT(icd9_code, ', ') AS icd_codes
FROM diagnoses_icd
GROUP BY subject_id

Query failed! State: FAILED
Reason: FUNCTION_NOT_FOUND: line 1:20: Function 'group_concat' not registered


AttributeError: 'bool' object has no attribute 'head'

In [18]:
%%mimicSQL --notebook_name Hou-sepsis-paper.ipynb

Please try again to fix that error!

SQL

SELECT subject_id, ARRAY_JOIN(array_agg(icd9_code), ', ') AS icd_codes
FROM diagnoses_icd
GROUP BY subject_id




In [19]:
%%runSQL --notebook_name Hou-sepsis-paper.ipynb

SELECT subject_id, ARRAY_JOIN(array_agg(icd9_code), ', ') AS icd_codes
FROM diagnoses_icd
GROUP BY subject_id

Unnamed: 0,subject_id,icd_codes
0,117,"5715, 7895, 07054, 2875, 4280, 2851, 2765, 401..."
1,118,"V3000, V053"
2,120,"V3000, V053, V290"
3,127,"19889, 1972, V103"
4,129,"85300, 78039, E8889"


In [20]:
print(df.shape)
# Sepsis-related codes to look for: ('99591', '99592', '78552')

(46474, 2)


In [26]:
%%mimicPython

How many subjects had icd code 99591 or 99592 or 78552?




In [27]:
##AI-generated-code


# Create a new column in the dataframe that checks if any of the icd codes '99591', '99592', '78552' are in the 'icd_codes' column for each subject
df['has_sepsis'] = df['icd_codes'].apply(lambda x: any(code in x for code in ['99591', '99592', '78552']))

# Count the number of subjects that have any of the icd codes '99591', '99592', '78552'
num_subjects_with_sepsis = df['has_sepsis'].sum()

print(num_subjects_with_sepsis)


4685
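
The substring test above is safe for these three codes because each is five characters long and contains no separator, but it could over-match shorter codes that happen to appear inside longer ones. A sketch of exact token matching, assuming the same comma-and-space layout as the `icd_codes` column; the toy values and the helper name `has_any_code` are illustrative only.

```python
import pandas as pd

SEPSIS_CODES = {"99591", "99592", "78552"}

# Toy frame in the same shape as the query result (hypothetical values)
toy = pd.DataFrame({
    "subject_id": [1, 2, 3],
    "icd_codes": ["99591, 4280", "V3000, V053", "78552, 99592"],
})

def has_any_code(codes: str, wanted: set) -> bool:
    """Return True if any comma-separated code is exactly in `wanted`."""
    return not wanted.isdisjoint(codes.split(", "))

toy["has_sepsis"] = toy["icd_codes"].apply(has_any_code, wanted=SEPSIS_CODES)
n_with_sepsis = int(toy["has_sepsis"].sum())
print(n_with_sepsis)
```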


In [29]:
%%mimicPython

Give me a set of all the ICD9 codes we see in this table.




In [32]:
##AI-generated-code


# Split the 'icd_codes' column on ', ' to create a list of icd codes for each subject
df['icd_codes_list'] = df['icd_codes'].apply(lambda x: x.split(', '))

# Flatten the list of lists to get a single list of all icd codes
all_icd_codes = [code for sublist in df['icd_codes_list'] for code in sublist]

# Convert the list to a set to get the unique icd codes
unique_icd_codes = set(all_icd_codes)

print(len(unique_icd_codes))


6984
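
The split-and-flatten step can also be written with pandas' `str.split` and `explode`, which keeps the data in a Series so that code frequencies are one call away. A minimal sketch on made-up values, assuming the same `icd_codes` string format:

```python
import pandas as pd

# Toy frame in the same shape as the query result (hypothetical values)
toy = pd.DataFrame({
    "subject_id": [1, 2, 3],
    "icd_codes": ["99591, 4280", "V3000, V053", "4280, 99592"],
})

# One row per code: split each string into a list, then explode the lists
long_codes = toy["icd_codes"].str.split(", ").explode()

unique_codes = set(long_codes)
print(len(unique_codes))

# Count how often each code occurs across the whole table
code_counts = long_codes.value_counts()
print(code_counts.head())
```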


In [33]:
%%mimicPython

That's a lot! How many of these are affiliated with sepsis?! How can we tell?

Determining which of these ICD codes are affiliated with sepsis can be a bit tricky. The ICD codes you've mentioned ('99591', '99592', '78552') are directly related to sepsis, severe sepsis, and septic shock. However, there may be other ICD codes that are indirectly related to sepsis. For example, certain types of infections or organ dysfunctions might be associated with sepsis.

One way to find out which ICD codes are related to sepsis is to look up each code in the ICD-9-CM manual or an online ICD-9-CM database. This can be time-consuming if you have a lot of codes.

Another way is to use a mapping file that maps ICD codes to clinical conditions. For example, the Healthcare Cost and Utilization Project (HCUP) provides a set of tools called Clinical Classifications Software (CCS) that groups ICD codes into clinically meaningful categories. You could use the CCS for ICD-9-CM to find out which of your ICD codes are related to sepsis.

Here's some Python code that counts the number of un

In [34]:
##AI-generated-code


# Define a set of ICD codes that are directly related to sepsis
sepsis_icd_codes = {'99591', '99592', '78552'}

# Count the number of unique ICD codes in your dataframe that are directly related to sepsis
num_sepsis_icd_codes = len(unique_icd_codes & sepsis_icd_codes)

print(num_sepsis_icd_codes)

3
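
A result of 3 only confirms that each of the three codes occurs at least once; it says nothing about other codes that may be related. One way to explore candidates is a prefix filter. The sketch below assumes that ICD-9 category 038 (septicemia) is of interest; that choice is an illustrative assumption and not the inclusion criteria of the paper being replicated. The code set is a toy stand-in for `unique_icd_codes`.

```python
# Toy set of codes (hypothetical); in the notebook this would be unique_icd_codes
codes = {"0389", "03811", "99591", "99592", "78552", "4280", "V3000"}

EXACT = {"99591", "99592", "78552"}  # codes used in the query above
PREFIXES = ("038",)                  # assumption: ICD-9 038.x septicemia codes

def is_sepsis_related(code: str) -> bool:
    """Match either an exact code or any code starting with a listed prefix."""
    return code in EXACT or code.startswith(PREFIXES)

related = sorted(c for c in codes if is_sepsis_related(c))
print(related)
```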


In [36]:
%%mimicSQL --notebook_name Hou-sepsis-paper.ipynb

Is there any way to get data on sepsis patients from MIMIC-III without using the ICD-9 codes? Any other table or columns you think might be useful?

Yes, there are other ways to identify sepsis patients in the MIMIC-III database without using ICD-9 codes. 

One approach is to use the clinical criteria for sepsis. According to the Sepsis-3 definition, sepsis is characterized by a suspected or confirmed infection and an acute increase of ≥ 2 SOFA (Sequential Organ Failure Assessment) points. The SOFA score can be calculated using data from the `chartevents` table (for vital signs and other clinical measurements), the `labevents` table (for laboratory test results), and the `prescriptions` table (for medications indicating treatment for infection).

Another approach is to use the `microbiologyevents` table, which contains data on microbiology tests and their results. Patients with positive culture results might have an infection, which is a key component of sepsis.

Here's an example of a SQL query that retrieves data on patients who have a suspected infection based on the `microbiologyevents` table:

Please note that these methods ar

In [37]:
%%runSQL --notebook_name Hou-sepsis-paper.ipynb


SELECT subject_id
FROM microbiologyevents
WHERE org_name IS NOT NULL


Unnamed: 0,subject_id
0,96
1,101
2,105
3,106
4,106


In [38]:
print(df.shape)

(327688, 1)
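
The 327688 rows are microbiology events, not patients: subject 106 already appears twice in the preview. A minimal sketch of counting distinct subjects, using a toy frame that echoes the five preview rows in place of the full query result:

```python
import pandas as pd

# Toy stand-in for the query result (values copied from the preview rows)
events = pd.DataFrame({"subject_id": [96, 101, 105, 106, 106]})

n_rows = len(events)                          # one row per event
n_subjects = events["subject_id"].nunique()   # distinct patients
print(n_rows, n_subjects)
```

Alternatively, `SELECT DISTINCT subject_id` in the query itself would return one row per patient.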


In [39]:
from IPython.display import display, HTML

# Define JavaScript code to change the background color of a cell
js_code = """
<script>
function changeCellColor(cellIdx, color) {
    var cells = Jupyter.notebook.get_cells();
    var cellElement = cells[cellIdx].element[0];
    cellElement.style.backgroundColor = color;
}
// Change the background color of the cell at index 39 (hard-coded; the Jupyter.notebook API exists only in the classic Notebook, not JupyterLab)
changeCellColor(39, 'yellow');
</script>
"""

# Display the JavaScript code
display(HTML(js_code))



In [46]:
from IPython.display import HTML, display

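def set_background(color):
    """Tint the editor of the calling cell (relies on JupyterLab's .jp-CodeCell and .jp-Editor classes)."""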
    script = (
        "var cell = this.closest('.jp-CodeCell');"
        "var editor = cell.querySelector('.jp-Editor');"
        "editor.style.background='{}';"
        "this.parentNode.removeChild(this)"
    ).format(color)
    
    display(HTML('<img src onerror="{}" style="display:none">'.format(script)))

In [54]:
set_background('rgba(148, 0, 211, 0.15)')