pydad - Machine Learning with the Discharge Abstract Database using Python
DAD (the Discharge Abstract Database) is a CIHI database of hospital admissions. This project experiments with the DAD enhanced dataset, building a RandomForest model that predicts the total length of hospital stay (TLOS) from the derived CMG fields added by Western U.
This is just a learning project for Apache Spark and Spark ML using pyspark. With all derived categorical variables used as features, the model reaches an accuracy of only 20%.
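The setup described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes TLOS is treated as a class label (since the result is reported as accuracy), and the function names, the column names passed in, the CSV path, and the hyperparameter values are all hypothetical.

```python
def indexed_name(col):
    """Name of the indexed copy of a categorical column (hypothetical convention)."""
    return col + "_idx"


def build_tlos_pipeline(categorical_cols, label_col="TLOS", max_bins=64):
    """Build a Spark ML pipeline: index categoricals, assemble features, fit a forest.

    Must be called after a SparkSession exists, because pyspark stages
    create their JVM counterparts on construction.
    """
    # Imports are local so this module can be loaded without pyspark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import StringIndexer, VectorAssembler

    # One StringIndexer per derived categorical field; unseen values get an extra bucket.
    indexers = [
        StringIndexer(inputCol=c, outputCol=indexed_name(c), handleInvalid="keep")
        for c in categorical_cols
    ]
    # Assumption: TLOS is indexed as a class label rather than regressed on.
    label_indexer = StringIndexer(inputCol=label_col, outputCol="label", handleInvalid="skip")
    assembler = VectorAssembler(
        inputCols=[indexed_name(c) for c in categorical_cols], outputCol="features"
    )
    # maxBins must be at least the largest number of categories in any feature.
    forest = RandomForestClassifier(
        labelCol="label", featuresCol="features", numTrees=50, maxBins=max_bins, seed=42
    )
    return Pipeline(stages=indexers + [label_indexer, assembler, forest])


def evaluate_accuracy(predictions):
    """Fraction of rows whose predicted class equals the indexed label."""
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    evaluator = MulticlassClassificationEvaluator(
        labelCol="label", predictionCol="prediction", metricName="accuracy"
    )
    return evaluator.evaluate(predictions)


def run_experiment(csv_path, categorical_cols):
    """Train on 80% of the rows and report accuracy on the remaining 20%."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pydad-sketch").getOrCreate()
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    train, test = df.randomSplit([0.8, 0.2], seed=42)
    model = build_tlos_pipeline(categorical_cols).fit(train)
    return evaluate_accuracy(model.transform(test))
```

Calling `run_experiment` with the path to a local extract and the list of derived CMG column names would return the held-out accuracy. If any categorical field has more distinct values than `max_bins`, Spark raises an error at fit time, so that value may need to be raised.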
Parts of this material are based on the Canadian Institute for Health Information Discharge Abstract Database Research Analytic Files (sampled from fiscal years 2014-15). However, the analysis, conclusions, opinions and statements expressed herein are those of the author(s) and not those of the Canadian Institute for Health Information.
This is a learning project and is not intended for actual use.
Try refining the model. PRs are welcome.