# Project-008-2 Heart Disease 

### Predicting Heart Disease Diagnosis

### Introduction 

The cardiovascular system is a network of blood vessels (veins, arteries, and capillaries) that circulates blood throughout the body. Disruptions to the flow of blood from the heart can lead to a range of conditions collectively referred to as cardiovascular or heart disease. Globally, heart disease is a leading cause of mortality: the World Health Organization (WHO) indicates that approximately 17.5 million deaths annually are attributable to cardiovascular diseases, most of them heart attacks and strokes. Early detection of cardiac anomalies is therefore important, as it can save lives and help healthcare professionals plan effective treatment.

In this study, we will examine a dataset that contains test results from 303 patients referred for coronary angiography at the Cleveland Clinic in Ohio between May 1981 and September 1984. All the patients had similar medical profiles and underwent the same non-invasive tests, namely, exercise electrocardiogram, exercise thallium scintigraphy and fluoroscopy for coronary calcium. 

The clinical and test variables included in this dataset are listed below. The target variable is the angiographic disease status: a value of 0 indicates that the major vessels show less than 50% narrowing of the vessel diameter, and a value of 1 indicates more than 50% narrowing of the vessel diameter.

Clinical

- Age
- Sex
  - 1 = male
  - 0 = female
- Chest Pain Type
  - 1 = typical anginal
  - 2 = atypical anginal
  - 3 = nonanginal
  - 4 = asymptomatic
- Systolic Blood Pressure (in mmHg on admission to the hospital)

Routine Test Data Collected

- Serum cholesterol determination (in mg/dl)
- Fasting blood sugar determination (fasting blood sugar > 120 mg/dl)
  - 1 = true
  - 0 = false
- Resting electrocardiographic results
  - 0 = normal
  - 1 = having ST-T wave abnormality
  - 2 = showing probable or definite left ventricular hypertrophy by Estes’ criteria

Exercise Test Data Collected

- Maximum heart rate achieved (beats per minute)
- Exercise induced angina
  - 1 = yes
  - 0 = no
- ST depression induced by exercise relative to rest
- Slope of the peak exercise ST segment
  - 1 = upsloping
  - 2 = flat
  - 3 = downsloping

Other Non-invasive Test Data Collected

- Number of major vessels colored by fluoroscopy for coronary calcium (0 - 3)
- Exercise thallium scintigraphy results
  - 3 = normal
  - 6 = fixed defect
  - 7 = reversible defect
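
The integer codes listed above can be attached to readable labels before plotting or modelling. The snippet below is a minimal sketch on a small made-up tibble: `toy_patients` and `toy_labelled` are hypothetical names, and the rows are not taken from the Cleveland data. The column names `cp`, `slope`, and `thal` follow the names assigned to the real data in the loading cell of this notebook.

```r
library(dplyr)
library(tibble)

# Hypothetical example rows (not taken from the Cleveland data)
toy_patients <- tibble(
  cp    = c(1, 4, 3, 2),
  slope = c(1, 2, 3, 2),
  thal  = c(3, 7, 6, 3)
)

# Map each integer code to the label given in the variable list
toy_labelled <- toy_patients |>
  mutate(
    cp = factor(cp, levels = 1:4,
                labels = c("typical angina", "atypical angina",
                           "non-anginal", "asymptomatic")),
    slope = factor(slope, levels = 1:3,
                   labels = c("upsloping", "flat", "downsloping")),
    thal = factor(thal, levels = c(3, 6, 7),
                  labels = c("normal", "fixed defect", "reversible defect"))
  )

toy_labelled
```

Labelled factors also make plot legends and axis ticks self-explanatory, at the cost of one extra preprocessing step.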

In this study, we will address the predictive question: can the clinical and test data available to us be used to predict a diagnosis of heart disease? This is important because a reliable model would provide a quick, data-driven aid to diagnosis and could reduce subjectivity and dependence on the skill and experience of the diagnosing physician.


In [None]:
library(tidyverse)
library(repr)
library(tidymodels)
options(repr.matrix.max.rows = 6)
source('tests.R')
source('cleanup.R')

In [None]:
# The file has no header row; missing values are coded as "?"
heart_disease <- read_csv("processed.cleveland.data", col_names = FALSE, na = "?")
colnames(heart_disease) <- c("age", "sex", "cp", "trestbps", "chol", "fbs", "restecg", "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num")
heart_disease
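
In the UCI copy of `processed.cleveland.data`, missing entries are written as `?`, so `read_csv` treats the affected columns as text unless told otherwise. The sketch below shows the effect of the `na` argument on two made-up rows passed as literal text via `I()` (this assumes readr 2.0 or later). The values and the names `toy_text`, `default_parse`, and `na_parse` are illustrative only.

```r
library(readr)

# Two made-up rows in the same 14-column layout; the second has "?" in ca
toy_text <- I(paste(
  "50,1,2,120,200,0,0,150,0,1.0,1,0,3,0",
  "60,0,4,140,250,1,2,130,1,2.5,2,?,7,1",
  sep = "\n"
))

toy_names <- c("age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
               "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num")

# Default settings: "?" is kept as text, so ca is parsed as character
default_parse <- read_csv(toy_text, col_names = toy_names,
                          show_col_types = FALSE)

# With na = "?": the entry becomes NA and ca is parsed as numeric
na_parse <- read_csv(toy_text, col_names = toy_names, na = "?",
                     show_col_types = FALSE)

class(default_parse$ca)
class(na_parse$ca)
sum(is.na(na_parse$ca))
```

Counting `NA` values per column in this way is a quick check on how many rows would be lost if incomplete cases were dropped.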

In [None]:
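# num is coded 0-4 in the UCI file; treat any value above 0 as disease (1)
heart_disease <- heart_disease |>
  mutate(num = as.factor(if_else(num > 0, 1, 0)))
# Count patients by chest pain type, split by diagnosis
heart_disease_plot <- heart_disease |>
  ggplot(aes(x = cp, fill = num)) +
  geom_bar(position = "dodge") +
  labs(x = "Chest pain type (1-4)", y = "Number of patients", fill = "Diagnosis")
heart_disease_plot
# The plot shows that patients with type 4 chest pain (asymptomatic, i.e. no chest pain) are more likely to be diagnosed with heart disease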
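
A dodged bar chart shows counts, so groups of different sizes are hard to compare directly. One way to check a claim like the one above is to compute, for each chest pain type, the share of patients with a positive diagnosis. The sketch below does this on a small made-up tibble: `toy_counts` is hypothetical, and `disease` is assumed to be a 0/1 indicator rather than the raw `num` column.

```r
library(dplyr)
library(tibble)

# Hypothetical patients: chest pain type and a 0/1 disease indicator
toy_counts <- tibble(
  cp      = c(1, 1, 2, 2, 3, 3, 4, 4, 4, 4),
  disease = c(0, 1, 0, 0, 0, 1, 1, 1, 1, 0)
)

# Group size and proportion diagnosed, per chest pain type
disease_by_cp <- toy_counts |>
  group_by(cp) |>
  summarize(n = n(), prop_disease = mean(disease))

disease_by_cp
```

Reporting `n` next to the proportion matters here because a proportion based on a handful of patients is much less stable than one based on a large group.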

In [None]:
# Set a seed so the random split is reproducible (the value is arbitrary)
set.seed(1)
heart_disease_split <- initial_split(heart_disease, prop = 0.75, strata = num)
heart_disease_train <- training(heart_disease_split)
heart_disease_test <- testing(heart_disease_split)
heart_disease_train
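
Because `strata = num` is passed to `initial_split`, the training and testing sets should have similar class proportions. The sketch below checks this on simulated data: `toy_data`, its 60/40 class balance, and the helper `class_share` are all made up for illustration. The same `count` and `mutate` steps could be applied to the real training and testing sets.

```r
library(dplyr)
library(tibble)
library(rsample)

set.seed(2024)  # arbitrary seed so the split is reproducible

# Simulated data: 120 rows of class "0" and 80 rows of class "1"
toy_data <- tibble(
  x   = rnorm(200),
  num = factor(rep(c("0", "1"), times = c(120, 80)))
)

toy_split <- initial_split(toy_data, prop = 0.75, strata = num)

# Hypothetical helper: share of each class in a data frame
class_share <- function(df) {
  df |>
    count(num) |>
    mutate(prop = n / sum(n))
}

train_share <- class_share(training(toy_split))
test_share  <- class_share(testing(toy_split))

train_share
test_share
```

If the two tables differed noticeably, accuracy measured on the testing set would partly reflect the class imbalance rather than the quality of the model.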

### Methods 
Explain how you will conduct your data analysis and which variables/columns you will use. Note: you do not need to use all of the variables/columns that exist in the raw data set. In fact, that is often not a good idea. For each variable, consider whether it is useful for prediction.

Describe at least one way that you will visualize the results.