AutoPrognosis is a system for automating the design of ensembles of predictive modeling pipelines tailored to clinical prognosis applications. Each pipeline combines algorithms of the following kinds:
- Imputation and data processing algorithms.
- Feature processing algorithms.
- Classification algorithms.
The system uses a Bayesian optimization algorithm that relies on structured kernel learning to solve the high-dimensional pipeline optimization problem. Technical details can be found in our ICML 2018 paper [1]. An explanation of our algorithm can also be found in this video presentation.
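To get a feel for why the pipeline optimization problem is high-dimensional, the sketch below enumerates the pipeline structures obtained by picking one algorithm per stage. The algorithm names are illustrative placeholders, not the actual lists shipped with AutoPrognosis, and hyperparameters (which enlarge the space much further) are left out.

```python
from itertools import product

# Hypothetical candidates per stage (placeholders, not AutoPrognosis's actual lists).
STAGES = {
    "imputation": ["mean", "median", "most_frequent"],
    "feature_processing": ["none", "pca", "feature_selection"],
    "classification": [
        "logistic_regression",
        "random_forest",
        "gradient_boosting",
        "naive_bayes",
    ],
}


def enumerate_pipelines(stages):
    """Return every pipeline formed by choosing one algorithm per stage."""
    names = list(stages)
    return [
        dict(zip(names, combo))
        for combo in product(*(stages[name] for name in names))
    ]


pipelines = enumerate_pipelines(STAGES)
print(len(pipelines), "candidate pipeline structures")
print(pipelines[0])
```

Even this toy space grows multiplicatively with every stage and every added algorithm, which is why exhaustive search quickly becomes impractical and a guided search such as Bayesian optimization is used instead.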
Please refer to /doc/install.md for installation instructions.
You can use AutoPrognosis through its command line interface as follows:
$ python3 autoprognosis.py -i <data.csv> --target <response variable> -o <outdir> [ -n <num_sample> --it <num_iterations> ]
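If you prefer to launch the command from a Python script (for example, to loop over several datasets), a minimal sketch could look like the following. It only uses the flags shown above; the helper names `build_command` and `run_autoprognosis`, and the file and column names in the commented example, are hypothetical.

```python
import subprocess
import sys


def build_command(data_csv, target, outdir, num_sample=None, num_iterations=None):
    """Assemble the argument list for autoprognosis.py from the documented flags."""
    cmd = [
        sys.executable, "autoprognosis.py",
        "-i", data_csv,
        "--target", target,
        "-o", outdir,
    ]
    # The optional flags are only added when a value is supplied.
    if num_sample is not None:
        cmd += ["-n", str(num_sample)]
    if num_iterations is not None:
        cmd += ["--it", str(num_iterations)]
    return cmd


def run_autoprognosis(*args, **kwargs):
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(*args, **kwargs), check=True)


# Example with hypothetical inputs and arbitrary values for the optional flags:
# run_autoprognosis("patients.csv", "mortality", "results",
#                   num_sample=5, num_iterations=20)
```

Passing the arguments as a list avoids shell quoting problems when a path or column name contains spaces.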
Once the command has finished, the results are written to two JSON files in <outdir>: result.json and report.json. They can be displayed with:
$ python3 autoprognosis_report.py -i <outdir>
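The two JSON files can also be read directly for further processing. The sketch below assumes both files sit inside the output directory and makes no assumption about their schema; it simply parses them and lists the top-level keys. The helper names are hypothetical.

```python
import json
from pathlib import Path


def load_results(outdir, names=("result.json", "report.json")):
    """Parse each JSON file found in outdir; files that do not exist are skipped."""
    loaded = {}
    for name in names:
        path = Path(outdir) / name
        if path.is_file():
            with path.open() as handle:
                loaded[name] = json.load(handle)
    return loaded


def summarize(loaded):
    """Describe the top-level shape of each parsed file without assuming a schema."""
    summary = {}
    for name, content in loaded.items():
        if isinstance(content, dict):
            summary[name] = sorted(content)  # top-level keys
        else:
            summary[name] = type(content).__name__
    return summary


# Example with a hypothetical output directory:
# print(summarize(load_results("results")))
```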
A tutorial on how to use the AutoPrognosis API can also be found in this Jupyter notebook.
Acquisition function LCB generates excessive warnings
$ The set cost function is ignored! LCB acquisition does not make sense with cost.
This issue results from interfacing with GPyOpt's acquisition functions. The issue can be ignored.
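If the repeated message clutters your logs when calling the API from your own script, one possible workaround is to filter it out of standard output. The sketch below assumes the message is emitted with a plain `print` to stdout rather than through Python's `warnings` module (in which case `warnings.filterwarnings` would be the right tool instead). `FilteredStdout` and `suppress_lcb_notice` are hypothetical helpers, not part of AutoPrognosis or GPyOpt.

```python
import contextlib
import sys

LCB_NOTICE = "The set cost function is ignored! LCB acquisition does not make sense with cost."


class FilteredStdout:
    """Forward complete lines to `target`, dropping lines that contain `needle`."""

    def __init__(self, target, needle):
        self._target = target
        self._needle = needle
        self._buffer = ""

    def write(self, text):
        # Buffer until a newline arrives so each line is checked as a whole.
        self._buffer += text
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            if self._needle not in line:
                self._target.write(line + "\n")
        return len(text)

    def flush(self):
        # Emit any trailing partial line, then flush the underlying stream.
        if self._buffer and self._needle not in self._buffer:
            self._target.write(self._buffer)
        self._buffer = ""
        self._target.flush()


@contextlib.contextmanager
def suppress_lcb_notice():
    """Hide the LCB notice on stdout while leaving all other output untouched."""
    stream = FilteredStdout(sys.stdout, LCB_NOTICE)
    with contextlib.redirect_stdout(stream):
        try:
            yield
        finally:
            stream.flush()


with suppress_lcb_notice():
    print("optimization step")  # kept
    print(LCB_NOTICE)           # dropped
```

Wrap only the call that triggers the optimization in `suppress_lcb_notice()`, so that unrelated output is never filtered by accident.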
If you use our code in your research, please cite:
@inproceedings{alaa2018autoprognosis,
title={AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning},
author={Alaa, Ahmed M. and van der Schaar, Mihaela},
booktitle={International Conference on Machine Learning},
pages={139--148},
year={2018}
}
[1] A. M. Alaa and M. van der Schaar, AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning, ICML 2018.
[2] A. M. Alaa and M. van der Schaar, Prognostication and Risk Factors for Cystic Fibrosis via Automated Machine Learning, Scientific Reports, 2018.
[3] A. M. Alaa and M. van der Schaar, Cardiovascular Disease Risk Prediction using Automated Machine Learning: A Prospective Study of 423,604 UK Biobank Participants, PLOS ONE, 2019.