Python tools for healthcare machine learning



The aim of healthcareai is to streamline machine learning in healthcare. The package has two main goals:

  • Make it easy to create models from tabular data and deploy the best model, which pushes predictions to a database (such as MSSQL, MySQL, or SQLite) or to a CSV flat file.
  • Provide tools for data cleaning, manipulation, and imputation.
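As a rough illustration of the imputation goal, here is a minimal sketch of column-mean imputation in plain Python. It is not healthcareai's implementation. The function name `impute_column_means` and the column names `a1c` and `systolic_bp` are hypothetical, and a missing value is assumed to be represented as `None`.

```python
from statistics import mean
from typing import Any, Dict, List

Row = Dict[str, Any]


def impute_column_means(rows: List[Row], columns: List[str]) -> List[Row]:
    """Return copies of rows with None in the given columns replaced by that column's mean."""
    means = {}
    for col in columns:
        observed = [row[col] for row in rows if row.get(col) is not None]
        # A column with no observed values has nothing to impute from, so it stays None.
        means[col] = mean(observed) if observed else None

    imputed = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is left untouched
        for col in columns:
            if new_row.get(col) is None:
                new_row[col] = means[col]
        imputed.append(new_row)
    return imputed


# Hypothetical patient rows with a couple of missing measurements.
patients = [
    {"patient_id": 1, "a1c": 6.1, "systolic_bp": 130.0},
    {"patient_id": 2, "a1c": None, "systolic_bp": 142.0},
    {"patient_id": 3, "a1c": 7.3, "systolic_bp": None},
]
print(impute_column_means(patients, ["a1c", "systolic_bp"]))
```

Mean imputation is the simplest strategy. It preserves each column's mean but shrinks its variance, which is one reason model-based imputation is often preferred.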

Installation

Windows

  • If you haven't already, install 64-bit Python 3.5 via the Anaconda distribution.
    • Important: When prompted for the installation type, select Just Me (recommended). This makes permissions much simpler later in the process.
  • Open a terminal (i.e., CMD or PowerShell on Windows).
  • Run conda install pyodbc
  • Upgrade to the latest scipy (note that this can take a long time):
    • Run conda remove scipy
    • Run conda install scipy
  • Run conda install scikit-learn
  • Install healthcareai using one and only one of these three methods (ordered from easiest to hardest).
    1. Recommended: Install the latest release with pip by running pip install healthcareai
    2. If you know what you're doing and instead want the bleeding-edge version directly from our GitHub repo, run pip install
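After the steps above, a small script like the following can confirm that the interpreter is 64-bit Python 3.5 or newer and that the conda-installed packages can be found. This is a convenience sketch, not part of healthcareai. The name `check_environment` is hypothetical, and the list uses import names (`sklearn` for scikit-learn).

```python
import importlib.util
import struct
import sys

# Packages installed with conda in the steps above (import names, not conda package names).
REQUIRED_MODULES = ["pyodbc", "scipy", "sklearn"]


def check_environment(modules=REQUIRED_MODULES):
    """Report interpreter bitness/version and whether each module can be found."""
    report = {
        # Size of a C pointer in bits: 64 on a 64-bit interpreter.
        "is_64bit": struct.calcsize("P") * 8 == 64,
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "python_3_5_or_newer": sys.version_info >= (3, 5),
    }
    for name in modules:
        # find_spec returns None for a missing top-level module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print("{}: {}".format(key, value))
```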

Why Anaconda?

We recommend using the Anaconda Python distribution when working on Windows, for several reasons:

  • When using Anaconda and installing packages with the conda command, you don't need to worry about dependency hell, particularly because packages aren't compiled on your machine; conda installs pre-compiled binaries.
  • A great example of the pain conda saves you is the Python package scipy, whose installation, by its maintainers' own admission, "is difficult".

Linux

You may need to install the following dependencies:

  • sudo apt-get install python-tk
  • sudo pip install pyodbc
    • Note: you might run into trouble with the pyodbc dependency. If so, first run sudo apt-get install unixodbc-dev, then retry sudo pip install pyodbc. (Credit: Stack Overflow)

Once the dependencies are satisfied, run pip install healthcareai or sudo pip install healthcareai.

macOS

  • pip install healthcareai or sudo pip install healthcareai

Linux and macOS (via docker)

  • Install Docker
  • Clone this repo (look for the green button on the repo main page)
  • cd into the cloned directory
  • Run docker build -t healthcareai .
  • Run the Docker container with docker run -p 8888:8888 healthcareai
  • You should then have a Jupyter notebook available at http://localhost:8888.
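If the browser shows nothing at that address, a quick check from Python can tell you whether anything is answering on the published port. This is a hypothetical helper (`jupyter_is_up`) built on the standard library's `urllib`. It only tests that an HTTP server responds, not that it is Jupyter specifically.

```python
import urllib.error
import urllib.request


def jupyter_is_up(url="http://localhost:8888", timeout=5.0):
    """Return True if an HTTP server answers at url, False if nothing is listening."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with a non-2xx status, so it is running.
        # HTTPError subclasses URLError, so it must be caught first.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or unknown host: the container is not serving.
        return False


if __name__ == "__main__":
    print("Jupyter reachable:", jupyter_is_up())
```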

Verify Installation

To verify that healthcareai installed correctly, open a terminal and run python. This opens an interactive Python console (also known as a REPL). Then enter from healthcareai import SupervisedModelTrainer and press Enter. If no error is thrown, you are ready to rock.
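The same check can be run as a script instead of typed into the REPL. This sketch performs exactly the import named above and reports the error if it fails. The function name `healthcareai_is_installed` is hypothetical.

```python
def healthcareai_is_installed():
    """Return True if the verification import succeeds, printing the error otherwise."""
    try:
        from healthcareai import SupervisedModelTrainer  # noqa: F401
    except ImportError as error:
        print("healthcareai import failed: {}".format(error))
        return False
    return True


if __name__ == "__main__":
    if healthcareai_is_installed():
        print("healthcareai is installed; you are ready to rock.")
    else:
        print("See the installation steps above, or ask on Stack Overflow.")
```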

If you did get an error or run into other installation issues, please let us know, or better yet, post on Stack Overflow (with the healthcare-ai tag) so we can help others through this process.

Getting started

  1. Read through the Getting Started section of the healthcareai-py documentation.

  2. Read through the example files to learn how to use the healthcareai-py API.

    • For examples of how to train and evaluate a supervised model, inspect and run either or using our sample diabetes dataset.
    • For examples of how to use a model to make predictions, inspect and run either or after running one of the first examples.
    • For examples of more advanced use cases, inspect and run
  3. To train and evaluate your own model, modify the queries and parameters in either or to match your own data.

  4. Decide what type of prediction output you want. See Choosing a Prediction Output Type for details.

  5. Set up your database tables to match the schema of the output type you chose.

  6. Congratulations! After running one of the example files with your own data, you should have a trained model. To use your model to make predictions, modify either or to use your new model. You can then run it to see the results.
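To make step 5 concrete, here is a minimal sketch of an output table and a write into it, using Python's built-in `sqlite3` module directly rather than healthcareai's own database code. The table name `predictions`, its columns, and the helper `save_predictions` are all hypothetical. Replace them with the schema required by the output type you chose in step 4.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical output schema; match it to the prediction output type you chose.
SCHEMA = """
CREATE TABLE IF NOT EXISTS predictions (
    patient_encounter_id INTEGER NOT NULL,
    predicted_probability REAL NOT NULL,
    scored_at TEXT NOT NULL
)
"""


def save_predictions(conn, predictions):
    """Insert (encounter_id, probability) pairs, stamping them with one UTC timestamp."""
    scored_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back if an error is raised
        conn.execute(SCHEMA)
        conn.executemany(
            "INSERT INTO predictions VALUES (?, ?, ?)",
            [(encounter_id, prob, scored_at) for encounter_id, prob in predictions],
        )


conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
save_predictions(conn, [(101, 0.82), (102, 0.15)])
print(conn.execute("SELECT * FROM predictions").fetchall())
```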

For Issues

  • Double-check that your code follows the examples here
  • If you're still seeing an error, create a post in Stack Overflow (with the healthcare-ai tag) that contains
    • Details on your environment (OS, database type, R vs Py)
    • Goals (i.e., what are you trying to accomplish?)
    • Crystal clear steps for reproducing the error
  • You can also log a new issue in the GitHub repo.
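When writing up a Stack Overflow post or issue, a few lines of Python can gather the environment details listed above. This is a sketch with a hypothetical function name (`environment_report`). The package version lookup relies on `importlib.metadata`, which only exists on Python 3.8 and newer, and the database type has to be supplied by you.

```python
import platform


def environment_report(database_type):
    """Build the environment details requested above as a copy-pasteable string."""
    try:
        from importlib.metadata import PackageNotFoundError, version

        try:
            hcai_version = version("healthcareai")
        except PackageNotFoundError:
            hcai_version = "not installed"
    except ImportError:
        # importlib.metadata is only available on Python 3.8+.
        hcai_version = "unknown (Python < 3.8)"

    lines = [
        "OS: {} {}".format(platform.system(), platform.release()),
        "Python: {}".format(platform.python_version()),
        "healthcareai: {}".format(hcai_version),
        "Database type: {}".format(database_type),
        "R vs Py: Py",
    ]
    return "\n".join(lines)


print(environment_report("MSSQL"))
```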