The basic neural network model is based on a word-level Language Model using an RNN (see https://github.com/pytorch/examples).
- PyTorch version >= 1.7.0
- numpy version >= 1.19.4
- scipy version >= 1.6.0
- Python version >= 3.6
- You will also need an NVIDIA GPU and NCCL
These are the versions the code was tested with. Other versions may work but have not been tested.
The procedure to generate an ensemble for sepsis prediction and reproduce the experiments from the paper is as follows:
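A quick way to confirm that an environment meets these minimums is sketched below. It is not part of the repository: the helper names (`version_tuple`, `meets_minimum`, `check_environment`) are made up for illustration, and the GPU/NCCL requirement is not checked.

```python
import re
import sys

# Minimum versions as listed above (import name -> version string).
MINIMUM_VERSIONS = {"torch": "1.7.0", "numpy": "1.19.4", "scipy": "1.6.0"}


def version_tuple(version):
    """Turn a string such as '1.7.0+cu110' into (1, 7, 0)."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    # Pad so that '1.7' compares equal to '1.7.0'.
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def meets_minimum(installed, required):
    """Compare numerically, so that '1.10.0' counts as newer than '1.7.0'."""
    return version_tuple(installed) >= version_tuple(required)


def check_environment():
    """Return a list of human-readable problems (empty if everything is fine)."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    for name, required in MINIMUM_VERSIONS.items():
        try:
            module = __import__(name)
        except ImportError:
            problems.append("{} is not installed".format(name))
            continue
        if not meets_minimum(module.__version__, required):
            problems.append(
                "{} {} is older than {}".format(name, module.__version__, required)
            )
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Comparing parsed tuples rather than raw strings matters here, because a plain string comparison would rank `1.10.0` below `1.7.0`.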
- Prepare data
- Generate fully trained and patient-specific models
- Make predictions on dev
- Grow ensemble
- Make predictions on test
- Calculate metrics
Clone the repository and download the data from the StatNLP web site. Extract the data to the code directory:
git clone https://github.com/statnlp/sepens
cd sepens
wget https://www.cl.uni-heidelberg.de/statnlpgroup/sepsisexp/SepsisExp.tar.gz
tar zxvf SepsisExp.tar.gz
Create the train/dev/test splits:
. ./make_data.sh 0 # 0..3: split number for cross-validation
To generate the model that is trained on all data ('full model'):
. ./make_full_model.sh
To generate the patient-specific models:
. ./make_models_perpat.sh
You might want to parallelize this step as each model is trained independently from the others.
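One possible way to parallelize is to launch one process per model and cap how many run at once. The sketch below is only an illustration under the assumption that the work done by `make_models_perpat.sh` can be split into one command per patient; `run_in_parallel` is a made-up helper, not something the repository provides.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(commands, max_workers=4):
    """Run each command (a list of arguments) as its own process.

    At most `max_workers` processes run at the same time. The exit codes
    are returned in the same order as `commands`.
    """
    def run(command):
        return subprocess.run(command).returncode

    # Threads are enough here: each one only waits for its child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))
```

A call could look like `run_in_parallel([["bash", "train_one_patient.sh", str(p)] for p in patient_ids])`, where both `train_one_patient.sh` and `patient_ids` are hypothetical names. On a single GPU, `max_workers` would have to be chosen so that the models fit into memory together.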
Generate predictions for all patient-specific (pool) models:
. ./inference_poolmodels.sh
Based on the mean squared error and the correlation to existing ensemble members, grow an ensemble of patient-specific models:
python3 grow_ensemble_perrone.py 0 | tee logs/grow_ensemble.log # 0..3: split number for cross-validation
tail -n1 logs/grow_ensemble.log > new_ensemble.py
sed "s/ /\n/g" new_ensemble.py | sed 's/[^0-9]*//g' | sed -r '/^\s*$/d' > new_ensemble.lst
This generates a Python set for inclusion in code and a text list for use in bash scripts.
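To illustrate the idea behind growing an ensemble, here is a minimal sketch of greedy selection on dev predictions. It is not the code in `grow_ensemble_perrone.py`: the function name, the input layout (a dict of per-model prediction arrays), and the acceptance rule (keep a candidate only if the uniformly averaged ensemble's dev MSE drops) are assumptions. Correlation enters implicitly, since a candidate whose errors are strongly correlated with those of the current members can hardly lower the averaged error.

```python
import numpy as np


def grow_ensemble(predictions, targets):
    """Greedily select models whose addition lowers the ensemble's dev MSE.

    predictions: dict mapping a model id to its dev predictions (1-D array-like)
    targets:     dev targets (1-D array-like)
    Returns the ids of the selected models in the order they were added.
    """
    targets = np.asarray(targets, dtype=float)
    errors = {
        model_id: np.asarray(pred, dtype=float) - targets
        for model_id, pred in predictions.items()
    }
    # Visit candidates from lowest to highest individual MSE.
    order = sorted(errors, key=lambda model_id: np.mean(errors[model_id] ** 2))

    members = [order[0]]
    error_sum = errors[order[0]].copy()
    for model_id in order[1:]:
        # The error of a uniform average is the average of the member errors.
        current_mse = np.mean((error_sum / len(members)) ** 2)
        candidate_sum = error_sum + errors[model_id]
        candidate_mse = np.mean((candidate_sum / (len(members) + 1)) ** 2)
        if candidate_mse < current_mse:
            members.append(model_id)
            error_sum = candidate_sum
    return members
```

In a toy case with three models, a second model whose errors cancel those of the best model is accepted, while a third model that errs in the same direction as the best one, only more strongly, is rejected.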
Generate predictions for the fully trained model:
. ./inference_fullmodel.sh
Generate predictions for each ensemble model:
. ./inference_ensmodels.sh
This takes a lot of time. You might want to parallelize the step above.
Combine predictions for the uniform and the weighted ensemble:
. ./inference_ensemble.sh
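The difference between the two ensembles can be sketched as follows. This is an illustration only: `combine_predictions` is a made-up name, and the weights are taken as given because this README does not state how the weighted ensemble derives them.

```python
import numpy as np


def combine_predictions(member_predictions, weights=None):
    """Average the predictions of all ensemble members per sample.

    member_predictions: array-like of shape (n_models, n_samples)
    weights:            one weight per model, or None for the uniform ensemble
    """
    stacked = np.asarray(member_predictions, dtype=float)
    # np.average normalizes by the sum of the weights; None means a plain mean.
    return np.average(stacked, axis=0, weights=weights)
```

For example, `combine_predictions(preds)` yields the uniform ensemble, and `combine_predictions(preds, weights=w)` the weighted one for a weight vector `w` with one entry per model.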
Generate AUROC for fully trained and ensemble models for different time intervals:
python3 calc_auroc.py
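For reference, AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. It is not taken from `calc_auroc.py`, and the function name is an assumption.

```python
def auroc(labels, scores):
    """AUROC from its pairwise definition (ties count as 0.5).

    labels: iterable of 0/1 ground-truth labels
    scores: iterable of predicted scores, same length as labels
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    # Count how often a positive is ranked above a negative.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic in the number of samples, which is fine for a sanity check. A rank-based implementation is preferable for large evaluation sets.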
Generate AUROC for fully trained and ensemble models for different time intervals and various privacy budgets:
python3 calc_auroc_laplace_all.py
This calculates AUROC and accuracy loss.
Apply a membership attack on the fully trained model for various privacy budgets:
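The script name suggests that the privacy budgets are realized with Laplace noise. As a rough illustration of that idea, and not necessarily what `calc_auroc_laplace_all.py` does, the sketch below perturbs values with noise of scale sensitivity/epsilon, so a smaller budget epsilon means more noise. The function name, the default sensitivity, and the choice of which quantity is perturbed are all assumptions.

```python
import numpy as np


def laplace_perturb(values, epsilon, sensitivity=1.0, seed=None):
    """Add Laplace noise with scale sensitivity / epsilon to each value.

    values:      array-like of numbers to perturb
    epsilon:     privacy budget; must be positive
    sensitivity: assumed maximum influence of one record on a value
    seed:        optional seed for reproducible noise
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=values.shape)
    return values + noise
```

Evaluating AUROC on the perturbed values for a range of epsilon settings then shows how much predictive quality is lost as the budget shrinks.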
python3 membership_fullmodel_epsilon_1k.py
Apply a membership attack on the uniform ensemble model for various privacy budgets:
python3 membership_ensemble_epsilon_alltrain_1k.py
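A membership attack tries to decide whether a given record was part of a model's training data. This README does not state which attack the two scripts implement, so the following is only a generic illustration of one common variant, a loss-threshold attack: records on which the model has a low loss are guessed to be training members. The function name and the fixed threshold are assumptions.

```python
import numpy as np


def threshold_attack_accuracy(member_losses, nonmember_losses, threshold):
    """Accuracy of guessing 'member' whenever the loss is below `threshold`.

    member_losses:    per-record losses on records that were used for training
    nonmember_losses: per-record losses on records that were not
    """
    member_losses = np.asarray(member_losses, dtype=float)
    nonmember_losses = np.asarray(nonmember_losses, dtype=float)
    correct_members = np.sum(member_losses < threshold)
    correct_nonmembers = np.sum(nonmember_losses >= threshold)
    total = member_losses.size + nonmember_losses.size
    return float(correct_members + correct_nonmembers) / total
```

An accuracy close to 0.5 on balanced data means the attacker does little better than guessing, which is the outcome a privacy-preserving model aims for.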
If you use the data or the code, please cite as:
@inproceedings{schamoni2022,
author = {Schamoni, Shigehiko and Hagmann, Michael and Riezler, Stefan},
title = {Ensembling Neural Networks for Improved Prediction and Privacy in Early Diagnosis of Sepsis},
booktitle = {Proceedings of the 6th Machine Learning for Healthcare Conference},
year = {2022},
city = {Durham, NC},
country = {USA},
volume = {182},
series = {Proceedings of Machine Learning Research},
publisher = {PMLR},
url = {https://www.cl.uni-heidelberg.de/~schamoni/publications/dl/MLHC2022_Ensembling.pdf}
}