Title: Predicting Chest Pain Type Based on Resting Blood Pressure and Cholesterol Level in Patients at Risk of Heart Disease.


Introduction:
An earlier study built a prediction model from test results collected at the Cleveland Clinic in Ohio to diagnose heart disease from numeric, integer, and categorical variables. The model was then applied to a new group of 425 patients undergoing angiography, the examination of blood vessels via X-ray. We will repurpose the dataset used in that paper.


In this project, we ask whether classification can predict the Chest Pain Type (cp) an individual could suffer from based on Resting Blood Pressure (trestbps) and Serum Cholesterol in mg/dl (chol). The chest pain types are coded as 1 (typical angina), 2 (atypical angina), 3 (non-anginal pain), and 4 (asymptomatic). Angina (chest pain) is a primary indicator of heart disease, and the type of pain experienced can be used to diagnose the condition and assess its severity (Nakias et al., 2018).


As high cholesterol levels and irregular resting blood pressure correlate with atypical angina (Mosby, 2004), we are curious whether these variables can predict the kind of chest pain a patient has. Predicting chest pain type can benefit individuals at risk of heart disease, because healthcare providers can identify which symptoms a patient should monitor and so deliver crucial care faster.


Methods:


We will build a classification model that predicts the categorical label “type of chest pain” from a patient's “resting blood pressure” and “cholesterol”. We will keep the columns cp (chest pain type), trestbps (resting blood pressure), and chol (serum cholesterol in mg/dl).


Our classification model will be produced using tidymodels. The data will be split in a 75:25 ratio to form training and testing sets, and both predictors will be standardized. We will evaluate the model using the “metrics” function. One visualization we will produce is a plot of accuracy vs. k, to demonstrate that our chosen k is close to optimal for the K-nearest neighbours classifier.
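The steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not our final analysis: toy_heart is synthetic stand-in data, the 5-fold cross-validation and the grid of odd k values from 1 to 21 are illustrative choices, the split proportion follows the prop = 0.75 used in our code below, and the kknn package is assumed to be installed as the engine for nearest_neighbor().

```r
library(tidyverse)
library(tidymodels)

set.seed(3456)

# Synthetic stand-in data (an assumption for illustration; not the real dataset)
toy_heart <- tibble(
  trestbps = rnorm(200, mean = 130, sd = 15),
  chol = rnorm(200, mean = 240, sd = 40),
  cp = factor(sample(
    c("typical angina", "atypical angina", "non-anginal", "asymptomatic"),
    size = 200, replace = TRUE
  ))
)

# Split into training and testing sets, stratified by the label
toy_split <- initial_split(toy_heart, prop = 0.75, strata = cp)
toy_train <- training(toy_split)
toy_test <- testing(toy_split)

# Standardize both predictors using the training data only
knn_recipe <- recipe(cp ~ trestbps + chol, data = toy_train) |>
  step_scale(all_predictors()) |>
  step_center(all_predictors())

# K-nearest neighbours specification with k left to be tuned
knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = tune()) |>
  set_engine("kknn") |>
  set_mode("classification")

# Cross-validate over a grid of candidate k values (illustrative choices)
toy_vfold <- vfold_cv(toy_train, v = 5, strata = cp)
k_grid <- tibble(neighbors = seq(1, 21, by = 2))

accuracies <- workflow() |>
  add_recipe(knn_recipe) |>
  add_model(knn_spec) |>
  tune_grid(resamples = toy_vfold, grid = k_grid) |>
  collect_metrics() |>
  filter(.metric == "accuracy")

# Accuracy vs. k plot
accuracy_vs_k <- ggplot(accuracies, aes(x = neighbors, y = mean)) +
  geom_point() +
  geom_line() +
  labs(x = "Neighbours (k)", y = "Cross-validated accuracy estimate")

# Refit with the best k and evaluate on the held-out test set
best_k <- accuracies |>
  slice_max(mean, n = 1, with_ties = FALSE) |>
  pull(neighbors)

final_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = best_k) |>
  set_engine("kknn") |>
  set_mode("classification")

final_fit <- workflow() |>
  add_recipe(knn_recipe) |>
  add_model(final_spec) |>
  fit(data = toy_train)

test_metrics <- predict(final_fit, toy_test) |>
  bind_cols(toy_test) |>
  metrics(truth = cp, estimate = .pred_class)
test_metrics
```

Because the labels in toy_heart are assigned at random, the accuracies here carry no meaning; the sketch only shows how the pieces fit together. Choosing k by cross-validation on the training set keeps the test set untouched until the final evaluation.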




Expected outcomes and significance:


What do you expect to find? 
We expect to be able to classify the type of chest pain a patient is experiencing as typical angina, atypical angina, non-anginal pain, or asymptomatic based only on two numerical variables: cholesterol and resting blood pressure.


What impact could such findings have? 
These findings could have an impact in healthcare. They could be used to classify the likely chest pain type from routine physical testing (a blood test for cholesterol and a blood pressure cuff for blood pressure).


What future questions could this lead to? 
This classification could lead to future questions about how cholesterol and resting blood pressure levels are related to chest pain and, in turn, to heart disease. The answers would give healthcare professionals additional information to better treat patients with chest pain and, potentially, heart disease.


Citations (APA):


Nakias, N., Bechlioulis, A., et al. (2018, June 8). The importance of characteristics of angina symptoms for the prediction of coronary artery disease in a cohort of stable patients in the modern era. Hellenic Journal of Cardiology. https://www.sciencedirect.com/science/article/pii/S1109966618300277


Mosby. (2004, February 27). The relation of the systolic blood pressure and heart rate to attacks of angina pectoris precipitated by effort. American Heart Journal. https://www.sciencedirect.com/science/article/abs/pii/S0002870336908839


Word Count: 495 excluding works cited

In [1]:
library(tidyverse)
library(repr)
library(tidymodels)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──

✔ broom        1.0.5     ✔ rsample

We begin by reading the heart disease CSV file into R from the local data directory.

In [2]:
heart_disease <- read_csv("data/heart_disease_uci.csv")
heart_disease

ERROR: Error: 'data/heart_disease_uci.csv' does not exist in current working directory ('/home/jovyan/work/Group-31').
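The error above means the relative path does not resolve from the notebook's working directory. A small check before reading can make this easier to diagnose. The sketch below uses only base R; describe_data_path is a hypothetical helper introduced for illustration, and the path is the one from the cell above.

```r
# Path taken from the read_csv() call above
data_path <- "data/heart_disease_uci.csv"

# Hypothetical helper: report whether a path resolves from the working directory
describe_data_path <- function(path) {
  if (file.exists(path)) {
    paste0("Found '", path, "'")
  } else {
    paste0("Cannot find '", path, "' relative to '", getwd(), "'")
  }
}

message(describe_data_path(data_path))
```

If the file is reported missing, either the data folder needs to be added to the project directory or the path in read_csv() needs to point to where the file actually lives.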


Next, we select the columns that we're interested in: resting blood pressure (trestbps), cholesterol (chol), and chest pain type (cp).

In [None]:
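# Keep the two predictors and the label; store cp as a factor for classification
heart_disease_select <- heart_disease |>
  select(trestbps, chol, cp) |>
  mutate(cp = as_factor(cp))
heart_disease_select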

We must set the seed so that our work is reproducible. Then, we split the data into our training and testing sets. 

In [None]:
set.seed(3456)

heart_disease_split <- initial_split(heart_disease_select, prop = 0.75, strata = cp)
heart_disease_train <- training(heart_disease_split)
heart_disease_test <- testing(heart_disease_split)
heart_disease_train
heart_disease_test
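Because the split is stratified on cp, each chest pain type should make up roughly the same share of the training and testing sets. One way to check this is sketched below. It is an illustration under assumptions: toy_heart is synthetic, deliberately imbalanced stand-in data, and class_props is a hypothetical helper; the same helper could be applied to heart_disease_train and heart_disease_test.

```r
library(tidyverse)
library(tidymodels)

set.seed(3456)

# Synthetic, deliberately imbalanced stand-in data (an assumption, not the real dataset)
toy_heart <- tibble(
  trestbps = rnorm(300, mean = 130, sd = 15),
  chol = rnorm(300, mean = 240, sd = 40),
  cp = factor(sample(
    c("typical angina", "atypical angina", "non-anginal", "asymptomatic"),
    size = 300, replace = TRUE, prob = c(0.1, 0.2, 0.3, 0.4)
  ))
)

toy_split <- initial_split(toy_heart, prop = 0.75, strata = cp)

# Hypothetical helper: share of each chest pain type in a data frame
class_props <- function(df) {
  df |>
    count(cp) |>
    mutate(prop = n / sum(n))
}

train_props <- class_props(training(toy_split))
test_props <- class_props(testing(toy_split))

# Side-by-side comparison of class shares in the two sets
left_join(train_props, test_props, by = "cp", suffix = c("_train", "_test"))
```

Similar proportions in both columns suggest the stratification worked; a large gap for a rare class would be worth noting before interpreting test accuracy.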

We compute summary statistics on the training set, grouping by chest pain type and finding the mean resting blood pressure and mean cholesterol level for each type.

In [None]:
heart_disease_tidy <- heart_disease_train |>
  group_by(cp) |>
  summarize(mean_trestbps = mean(trestbps, na.rm = TRUE), mean_chol = mean(chol, na.rm = TRUE))
heart_disease_tidy

Finally, we create an initial visualization of cholesterol level vs. resting blood pressure, using colour and shape to distinguish chest pain types.

In [None]:
options(repr.plot.height = 10, repr.plot.width = 10)
# Scatter plot of the two predictors, with chest pain type mapped to colour and shape
Visualization <- heart_disease_train |>
  ggplot(aes(x = trestbps, y = chol)) +
  geom_point(aes(colour = cp, shape = cp)) +
  labs(x = "Resting Blood Pressure (mm Hg)", y = "Serum Cholesterol (mg/dl)", colour = "Chest Pain Type", shape = "Chest Pain Type") +
  ggtitle("Cholesterol vs Resting Blood Pressure by Chest Pain Type") +
  theme(text = element_text(size = 20))
Visualization