Skip to content

clinicalml/deepDiagnosis

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
doc
August 1, 2016 17:53
May 31, 2016 14:40
August 4, 2016 13:54
May 31, 2016 14:40
May 31, 2016 14:40

deepDiagnosis

A torch package for learning diagnosis models from temporal patient data.

For more details please check:

  1. http://arxiv.org/abs/1608.00647

Narges Razavian, Jake Marcus, David Sontag,"Multi-task Prediction of Disease Onsets from Longitudinal Lab Tests", Machine Learning and Healthcare, 2016

  1. http://arxiv.org/abs/1511.07938

Narges Razavian, David Sontag, "Temporal Convolutional Neural Networks for Diagnosis from Lab Tests", ICLR 2016 Workshop track.

#Installation:

The package has the following dependencies:

Python: Numpy, CPickle

LUA: Torch, cunn, nn, cutorch, gnuplot, optim, and rnn

#Usage:

Run the following in order. Creating datasets can be done in parallel over train/test/valid tasks. Up to you.

There are sample input files (./sample_python_data) that you can use to test the package first.

1) python create_torch_tensors.py --x  sample_python_data/xtrain.pkl --y sample_python_data/ytrain.pkl --task 'train' --outdir ./sampledata/

2) python create_torch_tensors.py --x sample_python_data/xtest.pkl --y sample_python_data/ytest.pkl --task 'test' --outdir ./sampledata/

3) python create_torch_tensors.py --x sample_python_data/xvalid.pkl --y sample_python_data/yvalid.pkl --task 'valid' --outdir ./sampledata/


4) th create_batches.lua --task=train --input_dir=./sampledata --batch_output_dir=./sampleBatchDir 

5) th create_batches.lua --task=valid --input_dir=./sampledata --batch_output_dir=./sampleBatchDir 

6) th create_batches.lua --task=scoretrain --input_dir=./sampledata --batch_output_dir=./sampleBatchDir 

7) th create_batches.lua --task=test --input_dir=./sampledata --batch_output_dir=./sampleBatchDir


8) th train_and_validate.lua --task=train --input_batch_dir=./sampleBatchDir --save_models_dir=./sample_models/

Once the model is trained, run the following to get final evaluations on test set: (change the "lstm2016_05_29_10_11_01" into the model directory that you have created in step 8. Training directories have timestamp.)

9) th train_and_validate.lua --task=test --validation_dir=./sample_models/lstm2016_05_29_10_11_01/

Read the following for details on how to define your cohort and task.

#Input: Input should be one of the two formats described below:

Here is an Imaginary input and output for a single person in 2 input setting.

Read below for the details:

Format 1) Python nympy arrays (also support cPickle) of size

xtrain, xvalid, xtest: |labs| x |people| x |cohort time| for creating the input batches

ytrain, yvalid, ytest: |diseases| x |people| x |cohort time| for creating the output batches and inclusion/exclusion for each batch member

Format 2) Python numpy arrays (also support cPickle) of size

xtrain, xvalid, xtest: |Labs| x |people| x |cohort time| for the output

ytrain, yvalid, ytest: |diseases| x |people| for the output, where we do not have a concept of time.

(Note that in format 2 you can also provide exclusion-per-disease for input. If you need that version, let me know and I'll update that part immediately.)

Format 3) advanced shelve databases, for our internal use.

Please refer to https://github.com/clinicalml/ckd_progression for details.

#Prediction Models:

Currently the following models are supported. The details of the architectures are included in the citation paper below.

  1. Logistic Regression (--model=max_logit)

  1. Feedforward network (--model=mlp)

  1. Temporal Convolutional neural network over a backward window (--model=convnet)

  1. Convolutional neural network over input and time dimension (--model=convnet_mix)

  1. Multi-resolution temporal convolutional neural network (--model=multiresconvnet)

  1. LSTM network over the backward window (--model=lstmlast) (note: a version --model=lstmall is also available but we found training with lstmlast gives better results)

  1. Ensemble of multiple models (to be added soon)

#Synthetic Input for testing the package

You can use the following to create synthetic numpy arrays to test the package;

python create_synthetic_data.py --outdir ./sample_python_data --N 6000  --D 15 --T 48 --O 20

This code will create 3 datasets (train, test, valid) in the ./sample_python_data directory, with dimensions of: 5 x 2000 x 48 for each input x (xtrain, xtest, xvalid) and 20 x 2000 x 48 for each outcome set y. This synthetic data corresponds to input type 1 above. Follow steps 1-9 in the (Run) section above to test with this data, and feel free to test with other synthetic datasets.

#Citation: @article{razavian2016temporal, title={Multi-task Prediction of Disease Onsets from Longitudinal Lab Tests}, author={Razavian, Narges and Marcus,Jake and Sontag, David}, journal={1st Conference on Machine Learning and Health Care (MLHC)}, year={2016} }

@article{razavian2015temporal,
  title={Temporal Convolutional Neural Networks for Diagnosis from Lab Tests},
  author={Razavian, Narges and Sontag, David},
  journal={arXiv preprint arXiv:1511.07938},
  year={2015}
}

#Bug reports, questions, and Contact:

For any questions please email: narges razavian [narges.sharif@gmail.com or https://github.com/narges-rzv/]

About

A torch package for learning diagnosis models from temporal patient data.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published