In [1]:
# this notebook is based on the following study
# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6687414/

In [63]:
# Reload edited modules automatically before each cell runs.
# This is a developer convenience and is not needed in production.

%load_ext autoreload
%autoreload 2


%reload_ext chapyter

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [3]:
import os
os.environ["NOTEBOOK_NAME"] = "Harutyunyan_4tests_Sept29-Copy1.ipynb"

## Let the study begin!

In [4]:
%%mimicSQL

How can I join patients and chartevents tables in MIMIC-III?

You can join the `patients` and `chartevents` tables on the `subject_id` column, which is common to both tables. Here is a basic SQL query to do this:

This will return all columns from both tables where the `subject_id` matches in both tables. You can replace the `*` with specific column names if you only want to return certain columns.
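The join described above can be tried on a toy database before running it against MIMIC-III. The following is a minimal sketch using Python's built-in `sqlite3` module. The two tables are cut down to a few columns and the rows are invented for illustration, so only the join pattern carries over to the real schema.

```python
import sqlite3

# Toy stand-ins for the MIMIC-III tables; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (subject_id INTEGER PRIMARY KEY, gender TEXT);
    CREATE TABLE chartevents (subject_id INTEGER, itemid INTEGER, valuenum REAL);
    INSERT INTO patients VALUES (1, 'F'), (2, 'M');
    INSERT INTO chartevents VALUES (1, 211, 83.0), (1, 618, 17.0), (2, 211, 90.0);
""")

# Inner join on the shared subject_id column, selecting named columns instead of *.
joined_rows = conn.execute("""
    SELECT p.subject_id, p.gender, c.itemid, c.valuenum
    FROM patients p
    JOIN chartevents c
      ON p.subject_id = c.subject_id
    ORDER BY p.subject_id, c.itemid
""").fetchall()

for row in joined_rows:
    print(row)
```

Each chart event is paired with its patient row, so the result has one row per chart event. On the real `chartevents` table, which holds hundreds of millions of rows, an unfiltered `SELECT *` can run for a very long time, so a `WHERE` filter or a `LIMIT` is worth adding.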


In [7]:
%%runSQL 


SELECT *
FROM patients p
JOIN chartevents c
ON p.subject_id = c.subject_id


KeyboardInterrupt: 

In [None]:
%%mimicSQL

Great, let's create a table that joins them.

In [None]:
%%mimicSQL

same thing, but include icustay, as well as chart events for

1. Capillary refill rate
2. Diastolic blood pressure
3. Fraction inspired oxygen
4. Glasgow coma scale eye opening
5. Glasgow coma scale motor response
6. Glasgow coma scale total
7. Glasgow coma scale verbal response
8. Glucose
9. Heart Rate
10. Height
11. Mean blood pressure
12. Oxygen saturation
13. Respiratory rate
14. Systolic blood pressure
15. Temperature
16. Weight
17. pH
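The 17 variables above have to be translated into `itemid` values before they can be used in a `WHERE` clause. The sketch below builds that filter from a dictionary. The `itemid` values are the ones used in the queries later in this notebook, one per variable; MIMIC-III often records the same variable under several `itemid`s, so this mapping should not be read as exhaustive. The helper name `build_itemid_filter` is hypothetical.

```python
# One itemid per variable, taken from the queries later in this notebook.
ITEM_IDS = {
    "Capillary refill rate": 3348,
    "Diastolic blood pressure": 8368,
    "Fraction inspired oxygen": 3420,
    "Glasgow coma scale eye opening": 184,
    "Glasgow coma scale motor response": 223901,
    "Glasgow coma scale total": 198,
    "Glasgow coma scale verbal response": 223900,
    "Glucose": 807,
    "Heart Rate": 211,
    "Height": 920,
    "Mean blood pressure": 456,
    "Oxygen saturation": 646,
    "Respiratory rate": 618,
    "Systolic blood pressure": 51,
    "Temperature": 678,
    "Weight": 763,
    "pH": 780,
}


def build_itemid_filter(item_ids):
    """Return a SQL predicate restricting chartevents (aliased as c) to the given itemids."""
    id_list = ", ".join(str(int(item_id)) for item_id in item_ids.values())
    return f"c.itemid IN ({id_list})"


print(build_itemid_filter(ITEM_IDS))
```

Keeping the names and ids in one dictionary means the same mapping can later be inverted to turn `itemid` values back into readable labels.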

In [68]:
%%mimicRevealChatHistory

Reveal all of these tokens!

**Clinical Researcher:** # this notebook is based on the following study
# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6687414/

**AI Research Assistant:** None
**Clinical Researcher:** %%mimicSQL --- How can I join patients and chartevents tables in MIMIC-III?

**AI Research Assistant:** You can join the `patients` and `chartevents` tables on the `subject_id` column, which is common to both tables. Here is a basic SQL query to do this:

This will return all columns from both tables where the `subject_id` matches in both tables. You can replace the `*` with specific column names if you only want to return certain columns.
**Clinical Researcher:** %%runSQL  --- 
SELECT *
FROM patients p
JOIN chartevents c
ON p.subject_id = c.subject_id

**AI Research Assistant:** None
**Clinical Researcher:** %%mimicSQL --- Great, lets create a table that joins them.

**AI Research Assistant:** None
**Clinical Researcher:** %%mimicSQL --- same thing, but include icustay, as well as chart events for ---

In [8]:
%%runSQL 


SELECT p.subject_id, i.icustay_id, c.itemid, c.charttime, c.value, c.valuenum
FROM patients p
JOIN icustays i
ON p.subject_id = i.subject_id
JOIN chartevents c
ON p.subject_id = c.subject_id
WHERE c.itemid IN (3348, 8368, 3420, 184, 223901, 198, 223900, 807, 211, 920, 456, 646, 618, 51, 678, 763, 780)
LIMIT 5000


Unnamed: 0,subject_id,icustay_id,itemid,charttime,value,valuenum
0,667,277996,618,2123-08-27 04:00:00.000,17.0,17.0
1,667,277996,456,2123-08-27 04:00:00.000,62.666698455810554,62.66669845581055
2,667,277996,211,2123-08-27 04:00:00.000,83.0,83.0
3,667,277996,807,2123-08-27 03:00:00.000,112.0,112.0
4,667,277996,678,2123-08-27 03:00:00.000,97.8000030517578,97.8000030517578
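One caveat about the query above: it joins `chartevents` to the other tables on `subject_id` only. When a patient has more than one ICU stay, every chart event is then paired with every stay of that patient. The sketch below reproduces the effect on invented data with `sqlite3` and compares it with a join on `icustay_id`, a column that both `icustays` and `chartevents` carry in MIMIC-III.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE icustays (subject_id INTEGER, icustay_id INTEGER);
    CREATE TABLE chartevents (subject_id INTEGER, icustay_id INTEGER,
                              itemid INTEGER, valuenum REAL);
    -- One invented patient with two ICU stays and one heart-rate reading per stay.
    INSERT INTO icustays VALUES (1, 100), (1, 101);
    INSERT INTO chartevents VALUES (1, 100, 211, 83.0), (1, 101, 211, 95.0);
""")

# Join on the patient only: each reading matches both stays.
by_subject = conn.execute("""
    SELECT i.icustay_id, c.valuenum
    FROM icustays i
    JOIN chartevents c ON i.subject_id = c.subject_id
""").fetchall()

# Join on the stay: each reading matches only the stay it was charted in.
by_stay = conn.execute("""
    SELECT i.icustay_id, c.valuenum
    FROM icustays i
    JOIN chartevents c ON i.icustay_id = c.icustay_id
""").fetchall()

print(len(by_subject))  # 4
print(len(by_stay))     # 2
```

With the patient-level join, the reading of 95.0 from stay 101 also appears under stay 100, which would distort any per-stay average computed afterwards.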


In [9]:
print(df.shape)

(4995, 6)


In [None]:
%%mimicSQL

Same table, except also include whether that subject died in the hospital

In [None]:
%%runSQL 


SELECT p.subject_id, i.icustay_id, c.itemid, c.charttime, c.value, c.valuenum, a.hospital_expire_flag
FROM patients p
JOIN icustays i
ON p.subject_id = i.subject_id
JOIN chartevents c
ON i.icustay_id = c.icustay_id
JOIN admissions a
ON i.hadm_id = a.hadm_id
WHERE c.itemid IN (3348, 8368, 3420, 184, 223901, 198, 223900, 807, 211, 920, 456, 646, 618, 51, 678, 763, 780)
LIMIT 50000


In [None]:
%%mimicPython

Can you take this table, and replace itemid with the relevant string-name for that item?

In [None]:
##AI-generated-code


# Create a dictionary where the keys are the itemid and the values are the corresponding string-names
item_dict = {3348: 'Capillary refill rate', 8368: 'Diastolic blood pressure', 3420: 'Fraction inspired oxygen',
             184: 'Glasgow coma scale eye opening', 223901: 'Glasgow coma scale motor response', 198: 'Glasgow coma scale total',
             223900: 'Glasgow coma scale verbal response', 807: 'Glucose', 211: 'Heart Rate', 920: 'Height',
             456: 'Mean blood pressure', 646: 'Oxygen saturation', 618: 'Respiratory rate', 51: 'Systolic blood pressure',
             678: 'Temperature', 763: 'Weight', 780: 'pH'}

# Replace the itemid in the dataframe with the corresponding string-names
df['itemid'] = df['itemid'].replace(item_dict)

print(df.head())


In [None]:
%%mimicPython

Great! What are the different itemids we have now?

In [None]:
##AI-generated-code


# Get the unique values in the 'itemid' column
unique_itemids = df['itemid'].unique()

print(unique_itemids)


In [None]:
%%mimicPython

Great - let's take the df above, group by icustay_id and hospital_expire_flag, and make a new column for each itemid

In [None]:
##AI-generated-code


# Pivot the dataframe to create new columns for each itemid
df_pivot = df.pivot_table(index=['icustay_id', 'hospital_expire_flag'], columns='itemid', values='valuenum', aggfunc='mean')

# Reset the index
df_pivot.reset_index(inplace=True)

print(df_pivot.head())
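To make the reshaping concrete, here is a dependency-free sketch of what a long-to-wide pivot with mean aggregation does. It is an illustration on invented rows, not how pandas implements `pivot_table`.

```python
from collections import defaultdict
from statistics import mean

# Invented long-format rows: (icustay_id, hospital_expire_flag, itemid, valuenum).
long_rows = [
    (100, 0, "Heart Rate", 80.0),
    (100, 0, "Heart Rate", 90.0),
    (100, 0, "Glucose", 112.0),
    (101, 1, "Heart Rate", 120.0),
]

# Collect every reading under its (stay, outcome) group and item name.
readings = defaultdict(lambda: defaultdict(list))
for icustay_id, expire_flag, item, value in long_rows:
    readings[(icustay_id, expire_flag)][item].append(value)

# One output row per group, one column per item, each cell the mean reading.
items = sorted({row[2] for row in long_rows})
wide_rows = []
for (icustay_id, expire_flag), per_item in sorted(readings.items()):
    wide = {"icustay_id": icustay_id, "hospital_expire_flag": expire_flag}
    for item in items:
        values = per_item.get(item)
        wide[item] = mean(values) if values else None  # None marks a missing measurement
    wide_rows.append(wide)

for wide in wide_rows:
    print(wide)
```

Stay 101 has no glucose reading, so its `Glucose` cell is empty. In the pandas version such cells become `NaN`, which is why missing values have to be handled before a model can be fitted.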


In [None]:
df_pivot

In [None]:
%%mimicPython

Now, I want to make a simple model that predicts whether the person dies in the hospital. Help me decide on a model and train it.

In [None]:
##AI-generated-code


from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer

# Handle missing values
imputer = SimpleImputer(strategy='mean')
df_pivot_imputed = imputer.fit_transform(df_pivot)

# Split the data into features and target
X = df_pivot_imputed[:, :-1]  # Features
y = df_pivot_imputed[:, -1]  # Target

# Standardize the features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Logistic Regression model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Print the accuracy of the model
print('Accuracy:', accuracy_score(y_test, y_pred))

# Print the confusion matrix
print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred))

# Print the classification report
print('Classification Report:\n', classification_report(y_test, y_pred))


In [None]:
%%mimicPython

ValueError: Unknown label type: continuous. Maybe you are trying to fit a classifier, which expects discrete classes on a regression target with continuous values.
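A likely reading of this error, assuming `df_pivot` has the layout produced by the pivot cell: after `reset_index`, the index levels `icustay_id` and `hospital_expire_flag` come first and the item columns follow, so slicing the last column by position picks a continuous measurement as the target. The sketch below uses an invented, shortened column list to show the difference between selecting by position and selecting by name.

```python
# Column order after pivot_table(...).reset_index(): index levels first, then item columns.
# The item columns listed here are an invented subset for illustration.
columns = ["icustay_id", "hospital_expire_flag", "Glucose", "Heart Rate", "pH"]

# What positional slicing selects.
target_by_position = columns[-1]     # corresponds to [:, -1]
features_by_position = columns[:-1]  # corresponds to [:, :-1]

# Selecting by name keeps the outcome out of the features and drops the identifier.
target_by_name = "hospital_expire_flag"
features_by_name = [c for c in columns if c not in ("icustay_id", target_by_name)]

print("positional target:", target_by_position)
print("positional features:", features_by_position)
print("named features:", features_by_name)
```

With positional slicing the classifier is asked to predict `pH`, a continuous value, and the real outcome `hospital_expire_flag` ends up among the features.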

In [None]:
##AI-generated-code


# Split the data into features and target, dropping the icustay_id identifier from the features
X = df_pivot.drop(['icustay_id', 'hospital_expire_flag'], axis=1)  # Features
y = df_pivot['hospital_expire_flag'].astype(int)  # Target

# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Handle missing values, fitting the imputer on the training set only
imputer = SimpleImputer(strategy='mean')
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)

# Standardize the features, fitting the scaler on the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create a Logistic Regression model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Print the accuracy of the model
print('Accuracy:', accuracy_score(y_test, y_pred))

# Print the confusion matrix
print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred))

# Print the classification report
print('Classification Report:\n', classification_report(y_test, y_pred))


In [None]:
print(df_pivot.shape)

In [45]:
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
encoding.encode("tiktoken is great!")


[83, 1609, 5963, 374, 2294, 0]

In [50]:
def num_tokens_from_string(string, model_name):
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.encoding_for_model(model_name)
    return len(encoding.encode(string))

num_tokens_from_string("tiktoken is great!", "gpt-3.5-turbo")


6

In [52]:
my_string = """
============================================================
**Clinical Researcher:** # this notebook is based on the following study
# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6687414/

**AI Research Assistant:** None
============================================================
**Clinical Researcher:** %%mimicSQL --- How can I join patients and chartevents tables in MIMIC-III?

**AI Research Assistant:** You can join the `patients` and `chartevents` tables on the `subject_id` column, which is common to both tables. Here is a basic SQL query to do this:

This will return all columns from both tables where the `subject_id` matches in both tables. You can replace the `*` with specific column names if you only want to return certain columns.
============================================================
**Clinical Researcher:** %%runSQL  --- 
SELECT *
FROM patients p
JOIN chartevents c
ON p.subject_id = c.subject_id

**AI Research Assistant:** None
============================================================
**Clinical Researcher:** %%mimicSQL --- Great, lets create a table that joins them.

**AI Research Assistant:** None
============================================================
**Clinical Researcher:** %%mimicSQL --- same thing, but include icustay, as well as chart events for --- 1. Capillary refill rate
2. Diastolic blood pressure
3. Fraction inspired oxygen
4. Glascow coma scale eye opening
5. Glascow coma scale motor response
6. Glascow coma scale total
7. Glascow coma scale verbal response
8. Glucose
9. Heart Rate
10. Height
11. Mean blood pressure
12. Oxygen saturation
13. Respiratory rate
14. Systolic blood pressure
15. Temperature
16. Weight
17. pH

**AI Research Assistant:** None
============================================================
**Clinical Researcher:** %%mimicRevealChatHistory --- Reveal

**AI Research Assistant:**
"""

In [57]:
num_tokens_from_string(my_string, "gpt-3.5-turbo")


396

In [61]:
tiktoken.__spec__

ModuleSpec(name='tiktoken', loader=<_frozen_importlib_external.SourceFileLoader object at 0x29319a1a0>, origin='/Users/emmettgoodman/miniconda3/lib/python3.10/site-packages/tiktoken/__init__.py', submodule_search_locations=['/Users/emmettgoodman/miniconda3/lib/python3.10/site-packages/tiktoken'])

In [70]:
%%mimicRevealChatHistory

Reveal all of these tokens!!

**Clinical Researcher:** # this notebook is based on the following study
# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6687414/

**AI Research Assistant:** None
**Clinical Researcher:** %%mimicSQL --- How can I join patients and chartevents tables in MIMIC-III?

**AI Research Assistant:** You can join the `patients` and `chartevents` tables on the `subject_id` column, which is common to both tables. Here is a basic SQL query to do this:

This will return all columns from both tables where the `subject_id` matches in both tables. You can replace the `*` with specific column names if you only want to return certain columns.
**Clinical Researcher:** %%runSQL  --- 
SELECT *
FROM patients p
JOIN chartevents c
ON p.subject_id = c.subject_id

**AI Research Assistant:** None
**Clinical Researcher:** %%mimicSQL --- Great, lets create a table that joins them.

**AI Research Assistant:** None
**Clinical Researcher:** %%mimicSQL --- same thing, but include icustay, as well as chart events for ---

In [71]:
%%mimicRevealChatHistory

Reveal all of these tokens!!!

**Clinical Researcher:** # this notebook is based on the following study
# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6687414/

**AI Research Assistant:** None
**Clinical Researcher:** %%mimicSQL --- How can I join patients and chartevents tables in MIMIC-III?

**AI Research Assistant:** You can join the `patients` and `chartevents` tables on the `subject_id` column, which is common to both tables. Here is a basic SQL query to do this:

This will return all columns from both tables where the `subject_id` matches in both tables. You can replace the `*` with specific column names if you only want to return certain columns.
**Clinical Researcher:** %%runSQL  --- 
SELECT *
FROM patients p
JOIN chartevents c
ON p.subject_id = c.subject_id

**AI Research Assistant:** None
**Clinical Researcher:** %%mimicSQL --- Great, lets create a table that joins them.

**AI Research Assistant:** None
**Clinical Researcher:** %%mimicSQL --- same thing, but include icustay, as well as chart events for ---

In [72]:
%%mimicSQL 

SQL query to retrieve all data from patients

InvalidRequestError: This model's maximum context length is 8192 tokens. However, your messages resulted in 13964 tokens. Please reduce the length of the messages.
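The error above means the accumulated chat history no longer fits in the model's context window. One common remedy is to drop the oldest messages until the total fits a token budget. The sketch below is a minimal version of that idea and not how the magics in this notebook manage history. The names `trim_history` and `rough_token_count` are hypothetical, and the whitespace-based counter is a crude stand-in for a real tokenizer.

```python
def trim_history(messages, max_tokens, count_tokens):
    """Drop the oldest messages until the total token count fits within max_tokens."""
    kept = list(messages)
    total = sum(count_tokens(message) for message in kept)
    while kept and total > max_tokens:
        total -= count_tokens(kept.pop(0))  # remove the oldest message first
    return kept


def rough_token_count(text):
    """Crude whitespace-based stand-in for a real tokenizer (an assumption of this sketch)."""
    return len(text.split())


history = ["first prompt with five words", "second prompt", "latest question here"]
trimmed = trim_history(history, max_tokens=5, count_tokens=rough_token_count)
print(trimmed)
```

In this notebook, the counter could instead be `lambda m: num_tokens_from_string(m, "gpt-3.5-turbo")`. The budget should sit below the model's limit, both to leave room for the reply and because the chat format adds a few tokens per message that a plain string count does not include.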