Python tools for healthcare machine learning


healthcareai


The aim of healthcareai is to streamline machine learning in healthcare. The package has two main goals:

  • Allow one to easily create models based on tabular data and deploy the best model, which pushes predictions to a database such as MSSQL, MySQL, or SQLite, or to a CSV flat file.
  • Provide tools related to data cleaning, manipulation, and imputation.

Installation

Windows

  • If you haven't already, install 64-bit Python 3.5 via the Anaconda distribution
    • Important: When prompted for the Installation Type, select Just Me (recommended). This makes permissions later in the process much simpler.
  • Open the terminal (i.e., CMD or PowerShell, if using Windows)
  • Run conda install pyodbc
  • Upgrade to the latest scipy by removing and reinstalling it (the upgrade command itself ran extremely slowly):
    • Run conda remove scipy
    • Run conda install scipy
  • Run conda install scikit-learn
  • Install healthcareai using one and only one of these two methods (ordered from easiest to hardest).
    1. Recommended: Install the latest release with pip by running pip install healthcareai
    2. If you know what you're doing and instead want the bleeding-edge version direct from our GitHub repo, run pip install https://github.com/HealthCatalyst/healthcareai-py/zipball/master

Why Anaconda?

We recommend using the Anaconda Python distribution when working on Windows. There are a number of reasons:

  • When you run Anaconda and install packages with the conda command, you don't need to worry about dependency hell, particularly because packages aren't compiled on your machine; conda installs pre-compiled binaries.
  • A great example of the pain that conda saves you is the Python package scipy, which, by its developers' own admission, "is difficult".

Linux

You may need to install the following dependencies:

  • sudo apt-get install python-tk
  • sudo pip install pyodbc
    • Note: you might run into trouble with the pyodbc dependency. You may first need to run sudo apt-get install unixodbc-dev, then retry sudo pip install pyodbc. Credit: Stack Overflow

Once the dependencies are satisfied, run pip install healthcareai or sudo pip install healthcareai

macOS

  • pip install healthcareai or sudo pip install healthcareai

Linux and macOS (via docker)

  • Install docker
  • Clone this repo (look for the green button on the repo main page)
  • cd into the cloned directory
  • Run docker build -t healthcareai .
  • Run the docker instance with docker run -p 8888:8888 healthcareai
  • You should then have a Jupyter notebook available at http://localhost:8888.
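
If the notebook page does not load, it can help to confirm that something is listening on the published port. The snippet below is a minimal sketch using only the Python standard library; the helper name port_is_open is hypothetical (it is not part of healthcareai), and it only shows that the port accepts TCP connections, not that the listener is actually Jupyter.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Port 8888 is accepting connections.")
    else:
        print("Nothing is listening on port 8888; check the docker run command.")
```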

Verify Installation

To verify that healthcareai installed correctly, open a terminal and run python. This opens an interactive Python console (also known as a REPL). Then enter the command from healthcareai import SupervisedModelTrainer and press Enter. If no error is thrown, you are ready to rock.
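
The same check can also be scripted, for example on a machine you set up unattended. This is a minimal sketch: the helper name can_import is hypothetical (not part of healthcareai), and it only confirms that the import succeeds, not that the package works end to end.

```python
import importlib


def can_import(module_name, attribute=None):
    """Return True if module_name imports and, optionally, exposes attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    # Approximates `from module_name import attribute` for top-level names.
    return attribute is None or hasattr(module, attribute)


if __name__ == "__main__":
    if can_import("healthcareai", "SupervisedModelTrainer"):
        print("healthcareai imported correctly.")
    else:
        print("healthcareai could not be imported; revisit the installation steps.")
```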

If you did get an error, or ran into other installation issues, please let us know or, better yet, post on Stack Overflow (with the healthcare-ai tag) so we can help others through this process.

Getting started

  1. Read through the Getting Started section of the healthcareai-py documentation.

  2. Read through the example files to learn how to use the healthcareai-py API.

    • For examples of how to train and evaluate a supervised model, inspect and run either example_regression_1.py or example_classification_1.py using our sample diabetes dataset.
    • For examples of how to use a model to make predictions, inspect and run either example_regression_2.py or example_classification_2.py after running one of the first examples.
    • For examples of more advanced use cases, inspect and run example_advanced.py.
  3. To train and evaluate your own model, modify the queries and parameters in either example_regression_1.py or example_classification_1.py to match your own data.

  4. Decide what type of prediction output you want. See Choosing a Prediction Output Type for details.

  5. Set up your database tables to match the schema of the output type you chose.

  6. Congratulations! After running one of the example files with your own data, you should have a trained model. To use your model to make predictions, modify either example_regression_2.py or example_classification_2.py to use your new model. You can then run it to see the results.
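
To make steps 4 and 5 more concrete, here is a rough sketch of preparing a destination table and writing predictions to it, using SQLite through Python's built-in sqlite3 module. The table name predictions, the columns record_id and predicted_value, and the helper save_predictions are all hypothetical placeholders; the real schema depends on the output type you choose, so follow Choosing a Prediction Output Type for the actual column definitions.

```python
import sqlite3


def save_predictions(connection, rows):
    """Create a placeholder predictions table and insert (record_id, value) rows.

    The table and column names are illustrative assumptions, not the
    healthcareai output schema. Returns the number of rows in the table.
    """
    connection.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "record_id INTEGER PRIMARY KEY, "
        "predicted_value REAL NOT NULL)"
    )
    # Parameterized inserts avoid building SQL strings by hand.
    connection.executemany(
        "INSERT INTO predictions (record_id, predicted_value) VALUES (?, ?)",
        rows,
    )
    connection.commit()
    return connection.execute("SELECT COUNT(*) FROM predictions").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # swap in a file path for a real database
    print(save_predictions(conn, [(1, 0.12), (2, 0.87)]))
    conn.close()
```

Creating the table before the first prediction run means a schema mismatch shows up immediately rather than partway through deployment.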

For Issues

  • Double check that the code follows the examples here
  • If you're still seeing an error, create a post on Stack Overflow (with the healthcare-ai tag) that contains:
    • Details on your environment (OS, database type, R vs Py)
    • Goals (i.e., what you are trying to accomplish)
    • Crystal clear steps for reproducing the error
  • You can also log a new issue in the GitHub repo by clicking here
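
To gather the environment details requested above, a short script such as the following can help. It is a sketch that uses only the standard library; the function name environment_report is hypothetical, and the database type has to be supplied by hand because it cannot be detected automatically.

```python
import platform
import sys


def environment_report(database_type="unknown"):
    """Collect basic environment details to paste into a bug report."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "database": database_type,
    }


if __name__ == "__main__":
    for key, value in environment_report("SQLite").items():
        print("{}: {}".format(key, value))
```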