ML Prediction Models with Discharge Abstract Database
docs
notes
src/pydad
tests
.coveragerc
.gitignore
AUTHORS.rst
CHANGELOG.rst
LICENSE.txt
README.md
requirements.txt
setup.cfg
setup.py


pydad - Machine Learning with the Discharge Abstract Database using Python

About

The Discharge Abstract Database (DAD) is a Canadian Institute for Health Information (CIHI) database of hospital admissions. This project is an experiment with the DAD enhanced dataset: it builds a RandomForest model that predicts the total length of hospital stay (TLOS) from the derived CMG fields added by Western U.
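As a rough illustration of the approach described above, here is a minimal sketch of a Spark ML pipeline that indexes categorical columns and feeds them to a random forest. It is not the code in src/pydad. The function and helper names, the "TLOS" column name, the choice of a classifier (rather than a regressor), and all parameter values are assumptions made for this example.

```python
def indexed_name(column):
    """Name of the indexed copy of a categorical column (naming is an assumption)."""
    return column + "_idx"


def build_tlos_pipeline(categorical_cols, label_col="TLOS", num_trees=20, max_bins=32):
    """Assemble index -> vectorize -> random forest stages.

    Column names are hypothetical; adjust them to the actual dataset schema.
    """
    # Imported lazily so this module can be loaded without a Spark installation.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import StringIndexer, VectorAssembler

    # Map each categorical column to numeric category indices.
    indexers = [
        StringIndexer(inputCol=c, outputCol=indexed_name(c), handleInvalid="keep")
        for c in categorical_cols
    ]
    # Treat each distinct TLOS value as a class label (an assumption).
    label_indexer = StringIndexer(inputCol=label_col, outputCol="label")
    # Combine the indexed columns into the single vector column Spark ML expects.
    assembler = VectorAssembler(
        inputCols=[indexed_name(c) for c in categorical_cols],
        outputCol="features",
    )
    # max_bins must be at least the largest number of categories in any feature.
    forest = RandomForestClassifier(
        labelCol="label",
        featuresCol="features",
        numTrees=num_trees,
        maxBins=max_bins,
    )
    return Pipeline(stages=indexers + [label_indexer, assembler, forest])
```

With an active SparkSession and a training DataFrame train_df, the pipeline would be fitted with build_tlos_pipeline(["col_a", "col_b"]).fit(train_df), where col_a and col_b stand in for the derived categorical columns.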

This is just a learning project for Apache Spark and Spark ML using pyspark. With all derived categorical variables used as features, the model's accuracy is only 20%.
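An accuracy figure is easier to interpret next to a simple baseline, such as always predicting the most common TLOS value. The sketch below assumes that accuracy means the fraction of exact TLOS matches, which this README does not specify. The data values are made up for illustration and do not come from the DAD.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of predictions that exactly match the actual labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on empty input")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return accuracy([most_common_label] * len(test_labels), test_labels)


# Made-up TLOS values in days, for illustration only.
train_tlos = [1, 1, 2, 3, 1, 5, 2, 1]
test_tlos = [1, 2, 1, 4, 1]
model_predictions = [1, 2, 2, 4, 1]  # hypothetical model output

model_acc = accuracy(model_predictions, test_tlos)
baseline_acc = majority_baseline_accuracy(train_tlos, test_tlos)
print(f"model: {model_acc:.2f}, majority baseline: {baseline_acc:.2f}")
```

A model is only adding value when it beats this kind of baseline on held-out data.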

Disclaimer

Parts of this material are based on the Canadian Institute for Health Information Discharge Abstract Database Research Analytic Files (sampled from fiscal years 2014-15). However, the analysis, conclusions, opinions and statements expressed herein are those of the author(s) and not those of the Canadian Institute for Health Information.

This is a learning project and is not intended for actual use.

Try refining the model. Pull requests are welcome.

Check out the R package for DAD.

Contributor(s)