(Team 14)
- Lennon Au-Yeung
- Chenyang Wang
- Shirley Zhang
This data analysis project was created in fulfillment of the team project requirements for DSCI 522 (Data Science Workflows), a course in the Master of Data Science program at the University of British Columbia.
In this project, we propose a Decision Tree classification model to predict whether an individual may be at low, mid, or high maternal health risk given some information about their age and health. Our final chosen model had a max depth of 29, and performed relatively well on unseen data with 203 observations. The test score was 0.823, with 53 out of 60 high risk targets predicted correctly. However, further steps can be taken to improve the model, such as tuning or other hyperparameters or grouping the target classes into high risk and 'other'.
The full data set was sourced from the UCI Machine Learning Repository (Dua and Graff 2017), and can be found here. A .csv format of the data can be directly downloaded using this link. The data can be attributed to Marzia Ahmed (Daffodil International University, Dhaka, Bangladesh) and Mohammod Kashem (Dhaka University of Science and Technology, Gazipur, Bangladesh) (Ahmed and Kashem, 2020).
To replicate the analysis done in this project, you follow the steps below:
- Install the dependencies listed under "Dependencies":
- Clone the repository (the following shows cloning through ssh keys):
git clone git@github.com:UBC-MDS/maternal_health_risk_predictor.git
- Move to the cloned directory:
cd maternal_health_risk_predictor
- Use one of the following options to run the rest of the analysis:
*Note: dockerfile is currently not able to be run fully - please use "Option B. Without using Docker"
Install docker. Then, run the following at the command line/terminal inside of the root directory of this repo:
docker pull wenlansz/maternal_health_risk_predictor
docker run --rm -v /$(pwd):/home/maternal_health_risk_predictor wenlansz/maternal_health_risk_predictor:latest make -C /home/maternal_health_risk_predictor all
To reset the repository to a clean slate with no intermediates or results (so that you can rerun the analysis), run the following at the command line/terminal inside of the root directory of this repo:
docker run --rm -v /$(pwd):/home/maternal_health_risk_predictor wenlansz/maternal_health_risk_predictor:latest make -C /home/maternal_health_risk_predictor clean
Install all of the dependencies listed below. Then, run the following at the command line/terminal inside of the root directory of this repo:
make all
To reset the repository to a clean slate with no intermediates or results, so that you can rerun the analysis:
make clean
Python 3.10 and Python packages:
- docopt==0.6.2
- pandas==1.5.1
- pandoc==2
- numpy==1.23.5
- re==2022.10.31
- altair==4.2.0
- altair_saver==0.5.0
- requests==2.22.0
- vl_convert
- graphviz==0.20.1
- nbconvert==7.2.5
- dataframe_image
Model building packages:
- sklearn==1.1.3
- scipy==1.3.2
R version 4.2.1 and R packages:
- rmarkdown==2.18
- knitr==1.26
- tidyverse==1.2.1
- kableExtra==1.3.4
- devtools
Link to final report in Markdown: final_report.md
The Maternal Health Risk Predictor materials are licensed under the Creative Commons Attribution 2.5 Canada License (CC BY 2.5 CA). If re-using/re-mixing please provide attribution and link to this webpage.
Further license information can be viewed in the LICENSE
file in the root folder of this repository.
The data set is attributed to Marzia Ahmed and Mohammod Kashem (Ahmed and Kashem, 2020) as well as the UCI Machine Learning Repository (Dua and Graff 2017).
Ahmed, M., Kashem, M.A., Rahman, M., Khatun, S. (2020). Review and Analysis of Risk Factor of Maternal Health in Remote Area Using the Internet of Things (IoT). In: , et al. InECCE2019. Lecture Notes in Electrical Engineering, vol 632. Springer, Singapore. https://doi.org/10.1007/978-981-15-2317-5_30
Ahmed, M. and Kashem, M. A. (2020). IoT Based Risk Level Prediction Model For Maternal Health Care In The Context Of Bangladesh. In: 2nd International Conference on Sustainable Technologies for Industry 4.0 (STI), 2020, pp. 1-6. https://doi.org/10.1109/STI50764.2020.9350320
Dua, Dheeru, and Casey Graff. (2017). “UCI Machine Learning Repository.” University of California, Irvine, School of Information; Computer Sciences. http://archive.ics.uci.edu/ml.
WHO. (2019). "Maternal mortality". World Health Organization. https://www.who.int/news-room/fact-sheets/detail/maternal-mortality