Create the conda environment from the included environment file:

conda env create -f environment.yml
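Then activate it with conda activate <env-name>, where <env-name> matches the name: field in environment.yml.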
Download the MIMIC-III v1.4 dataset from https://physionet.org/content/mimiciii/1.4/ (access requires credentialed PhysioNet approval).
Insert the following files into the directory data/codes/:
- CPTEVENTS.csv
- DIAGNOSES_ICD.csv
- PROCEDURES_ICD.csv
Insert the following files into the directory data/notes/:
- NOTEEVENTS.csv
NOTE: The above files are downloaded as .csv.gz archives. On macOS or Linux you can decompress them with gunzip in your terminal.
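For example, if you placed the .gz archives in the directories above first:

gunzip data/codes/*.csv.gz data/notes/*.csv.gz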
Download the 2008 i2b2 Obesity Challenge data from https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/ (registration and a data use agreement are required).
Insert the following files into the directory i2b2/Xml/:
- obesity_patient_records_test.xml
- obesity_patient_records_training.xml
- obesity_patient_records_training_2.xml
- obesity_standoff_annotations_test.xml
- obesity_standoff_annotations_training.xml
- obesity_standoff_annotations_training_addendum3.xml
Be sure the empty directories data/lookups/, data/model/, i2b2/extracted_text/Test, and i2b2/extracted_text/Train1+2 exist. Before preparing the dataset, your folder structure should look like this (reconstructed below from the files and directories listed above; scripts and config files omitted):
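```
.
├── data/
│   ├── codes/
│   │   ├── CPTEVENTS.csv
│   │   ├── DIAGNOSES_ICD.csv
│   │   └── PROCEDURES_ICD.csv
│   ├── lookups/
│   ├── model/
│   └── notes/
│       └── NOTEEVENTS.csv
└── i2b2/
    ├── Xml/
    │   ├── obesity_patient_records_test.xml
    │   ├── obesity_patient_records_training.xml
    │   ├── obesity_patient_records_training_2.xml
    │   ├── obesity_standoff_annotations_test.xml
    │   ├── obesity_standoff_annotations_training.xml
    │   └── obesity_standoff_annotations_training_addendum3.xml
    └── extracted_text/
        ├── Test/
        └── Train1+2/
```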
Then prepare the dataset:

python preprocess.py
This will take a few minutes to run. The prepared dataset is saved as pickle files in data/model/.
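As a quick sanity check you can inspect what was written (a minimal sketch; it assumes the output files use a .pkl extension, which may differ in this repo):

```python
import glob
import pickle

# List each pickle produced by preprocess.py and report its top-level type.
for path in sorted(glob.glob("data/model/*.pkl")):
    with open(path, "rb") as f:
        obj = pickle.load(f)
    print(path, type(obj))
```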
Train a model by adjusting any desired parameters in config.yml, then run:

python train.py
To view model progress, open another terminal and start TensorBoard:

tensorboard --logdir runs
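TensorBoard serves at http://localhost:6006 by default; open that URL in a browser to follow the metrics logged under runs/.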
Run the following to extract the training/test text files from the i2b2/Xml/ annotation files into i2b2/extracted_text/:

python i2b2.py
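For orientation, a minimal sketch of this kind of extraction for the test records, assuming each note is stored as a <doc id="..."> element with a <text> child (the tag names are an assumption; i2b2.py is the authoritative implementation):

```python
import os
import xml.etree.ElementTree as ET

# Assumed layout: each <doc id="..."> element holds one note in a <text> child.
tree = ET.parse("i2b2/Xml/obesity_patient_records_test.xml")
out_dir = "i2b2/extracted_text/Test"
os.makedirs(out_dir, exist_ok=True)
for doc in tree.iter("doc"):
    with open(os.path.join(out_dir, doc.get("id") + ".txt"), "w") as f:
        f.write(doc.findtext("text", default=""))
```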
Create a pickled alphabet from the notes generated in i2b2/extracted_text/:

python i2b2_preprocess.py
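The "alphabet" here is presumably a token-to-index vocabulary; a minimal sketch of building and pickling one (i2b2_preprocess.py may tokenize differently, and the output path below is hypothetical):

```python
import glob
import pickle
import re

# Map each distinct token to an integer index over all extracted notes.
alphabet = {}
for path in glob.glob("i2b2/extracted_text/**/*.txt", recursive=True):
    with open(path) as f:
        for token in re.findall(r"[a-z0-9]+", f.read().lower()):
            alphabet.setdefault(token, len(alphabet))

# Hypothetical output path; the repo's actual file name may differ.
with open("data/model/alphabet.pkl", "wb") as f:
    pickle.dump(alphabet, f)
```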
Run the following to build the TF-IDF and SVD models from the MIMIC-III patient-token matrix:

python MIMIC_SVD_preprocess.py
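Conceptually, this step resembles the following scikit-learn pipeline (a minimal sketch, not the repo's actual code; it assumes one whitespace-joined token string per patient, and the output path is hypothetical):

```python
import pickle

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-ins: one whitespace-joined token string per MIMIC-III patient.
patient_tokens = ["4019 4280 99232", "25000 99213", "41401 36415 99232"]

tfidf = TfidfVectorizer()            # patient-token counts -> TF-IDF weights
X = tfidf.fit_transform(patient_tokens)
svd = TruncatedSVD(n_components=2)   # 1000 in the real setup; 2 fits this toy matrix
X_reduced = svd.fit_transform(X)

# Hypothetical output path; MIMIC_SVD_preprocess.py defines the real one.
with open("data/model/svd.pkl", "wb") as f:
    pickle.dump((tfidf, svd), f)
```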
Lastly, the command below trains the following models and reports their metrics:

1. Baseline SVM model with bag-of-tokens features
2. Baseline SVM model with a 1000-dimensional SVD of the MIMIC-III patient-token representation
3. SVM model with the learned representation

python i2b2_svm.py
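For reference, baseline (1) above follows this general pattern (a minimal sketch with toy data; i2b2_svm.py defines the real features, labels, and metrics):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score
from sklearn.svm import LinearSVC

# Toy stand-ins for extracted notes and one i2b2 disease label per note.
train_notes = ["patient is obese", "no evidence of obesity",
               "morbid obesity noted", "normal weight today"]
train_labels = [1, 0, 1, 0]
test_notes = ["obesity documented in chart", "weight within normal limits"]
test_labels = [1, 0]

vectorizer = CountVectorizer()        # bag-of-tokens features
clf = LinearSVC()
clf.fit(vectorizer.fit_transform(train_notes), train_labels)
pred = clf.predict(vectorizer.transform(test_notes))
print("F1:", f1_score(test_labels, pred))
```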