Maternal Health Data And Health Risk Level

Question: To what extent can we use data from “sensing enabled technology” (blood pressure, blood glucose level, body temperature, and heart rate) to classify and predict Bangladeshi women’s maternal health risk level?

Fetal and maternal mortality is a pervasive problem, especially in developing nations, and it falls most heavily on disadvantaged groups. The Maternal Health Risk dataset from the UC Irvine Machine Learning Repository records sensor data from pregnant women in rural villages in Bangladesh: health parameters such as blood pressure, together with each woman’s maternal health risk level.

Our data has the following variables (columns):

- Age 
- Systolic Blood Pressure (SystolicBP)
- Diastolic Blood Pressure (DiastolicBP)
- Blood Glucose Level (BS)
- Body Temperature (BodyTemp)
- Resting Heart Rate (HeartRate)
- Predicted Risk Intensity Level during pregnancy (RiskLevel)

We will use tidymodels to fit a K-nearest neighbours classifier that labels maternal health risk as high, medium, or low, using all the remaining health variables as predictors. We will split the data into training and test sets and use cross-validation on the training set to get a more robust estimate of accuracy. Understanding the association between health data and maternal risk can help medical professionals advise women during pregnancy and help the women themselves recognise risk factors and prepare for them in advance.
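
The plan above could be sketched roughly as follows. This is a minimal sketch under assumptions, not the final analysis: the helper name `tune_knn_risk`, the 75/25 split, the 5 folds, the seed, and the grid of candidate K values are all illustrative choices that do not come from the dataset or the proposal. It also assumes the `kknn` package is installed, since tidymodels uses it as the K-nearest neighbours engine.

```r
library(tidyverse)
library(tidymodels)

# Hypothetical helper (not part of the original analysis): splits the data,
# then tunes K for a K-nearest neighbours classifier using cross-validation
# on the training set only.
tune_knn_risk <- function(data, k_values = seq(1, 21, by = 2), folds = 5, seed = 2023) {
    set.seed(seed)  # assumed seed, used only so the split is reproducible

    # The outcome must be a factor for classification.
    data <- data |> mutate(RiskLevel = as_factor(RiskLevel))

    # Assumed 75/25 split, stratified so both sets keep the class proportions.
    data_split <- initial_split(data, prop = 0.75, strata = RiskLevel)
    train <- training(data_split)
    test <- testing(data_split)

    # KNN is distance-based, so all predictors are scaled and centred.
    knn_recipe <- recipe(RiskLevel ~ ., data = train) |>
        step_scale(all_predictors()) |>
        step_center(all_predictors())

    knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = tune()) |>
        set_engine("kknn") |>
        set_mode("classification")

    # The folds come from the training set; the test set stays held out.
    cv_folds <- vfold_cv(train, v = folds, strata = RiskLevel)

    cv_accuracy <- workflow() |>
        add_recipe(knn_recipe) |>
        add_model(knn_spec) |>
        tune_grid(resamples = cv_folds, grid = tibble(neighbors = k_values)) |>
        collect_metrics() |>
        filter(.metric == "accuracy") |>
        arrange(desc(mean))

    list(train = train, test = test, cv_accuracy = cv_accuracy)
}
```

Once the data has been read into `mat_health_risk`, calling `tune_knn_risk(mat_health_risk)$cv_accuracy` would list the cross-validated accuracy for each candidate K, from which a value of K could be chosen before the final model is evaluated on the held-out test set.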

References

Maternal Health Risk [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5DP5D

Ahmed, M., Kashem, M.A., Rahman, M., & Khatun, S. (2020). Review and Analysis of Risk Factor of Maternal Health in Remote Area Using the Internet of Things (IoT).


In [2]:
library(repr)
library(tidyverse)
library(tidymodels)
options(repr.matrix.max.rows = 10)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──

✔ broom        1.0.5     ✔ rsample


In [3]:
mat_health_risk <- read_csv("Maternal_Health_Risk_Data_Set.csv")

mat_health_risk

Rows: 1014 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): RiskLevel
dbl (6): Age, SystolicBP, DiastolicBP, BS, BodyTemp, HeartRate

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


Age,SystolicBP,DiastolicBP,BS,BodyTemp,HeartRate,RiskLevel
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
25,130,80,15.0,98,86,high risk
35,140,90,13.0,98,70,high risk
29,90,70,8.0,100,80,high risk
30,140,85,7.0,98,70,high risk
35,120,60,6.1,98,76,low risk
⋮,⋮,⋮,⋮,⋮,⋮,⋮
22,120,60,15,98,80,high risk
55,120,90,18,98,60,high risk
35,85,60,19,98,86,high risk
43,120,90,18,98,70,high risk


In [4]:
# Count the observations in each risk class. Counting directly by RiskLevel
# keeps every label paired with its own count; building the labels and the
# counts as separate vectors had swapped the "low risk" and "mid risk" totals.
case_ob <- mat_health_risk |>
    count(RiskLevel, name = "n_observations") |>
    rename(classes = RiskLevel)

case_ob

classes,n_observations
<chr>,<int>
high risk,272
low risk,406
mid risk,336


In [5]:
predictor_means <- mat_health_risk |>
    select(-RiskLevel) |>
    map_df(mean, na.rm = TRUE)

predictor_means

Age,SystolicBP,DiastolicBP,BS,BodyTemp,HeartRate
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
29.87179,113.1982,76.46055,8.725986,98.66509,74.30178


In [8]:
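# Each unique row of the logical matrix is one missingness pattern.
na <- mat_health_risk |>
    is.na() |>
    unique()

na  # a single all-FALSE row means no missing values in the dataset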

Age,SystolicBP,DiastolicBP,BS,BodyTemp,HeartRate,RiskLevel
False,False,False,False,False,False,False
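
A per-column count of missing values can be easier to read than the unique missingness patterns. The sketch below shows one way to get it; `count_missing` is a hypothetical helper name, and the small `toy` tibble is made-up data used only to show the behaviour, not rows from the dataset.

```r
library(tidyverse)

# Hypothetical helper: counts the missing values in every column.
count_missing <- function(data) {
    data |>
        summarize(across(everything(), ~ sum(is.na(.x))))
}

# Made-up example data with one missing value in BS.
toy <- tibble(Age = c(25, 35, 29), BS = c(15, NA, 8))
count_missing(toy)
```

Applied to the real data as `count_missing(mat_health_risk)`, a row of zeros would confirm the same conclusion as the check above.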
