# Predict Heart Disease Status Based on Quantifiable Variables

# Introduction:

Cardiovascular diseases (CVDs) are a class of diseases that involve the heart or blood vessels. They are the number one cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide. Four out of five CVD deaths are due to heart attacks and strokes, and one-third of these deaths occur prematurely in people under 70 years of age. Heart failure is a common event caused by CVDs.

People with cardiovascular disease, or at high cardiovascular risk because of one or more risk factors (such as hypertension, diabetes, hyperlipidemia, or already established disease), need early detection and management, a task where a machine learning model can be of great help.

We aim to predict heart disease status from quantifiable variables: age, cholesterol level, resting blood pressure, maximum heart rate, and old peak. The dataset contains 11 features that can be used to predict possible heart disease; we picked the ones with numerical values because they are easy to quantify and standardize. In our final report we intend to use subset/forward selection to choose the predictors.

### Attribute Information
 1. Age: (years)
 2. Cholesterol: (mg/dl)
 3. Oldpeak: ST depression (numeric value)
 4. RestingBP: resting blood pressure (mm Hg)
 5. MaxHR: maximum heart rate achieved (numeric value between 60 and 202)
 6. HeartDisease: outcome class (1: heart disease, 0: normal)


In [3]:
library(repr)
library(tidyverse)
library(tidymodels)


In [8]:
# Read the data set, convert the outcome to a factor, and keep the numeric predictors
heart_data <- read_csv("heart.csv") %>%
  mutate(HeartDisease = as_factor(HeartDisease)) %>%
  select(Age, RestingBP, Cholesterol, MaxHR, Oldpeak, HeartDisease)
head(heart_data)

Parsed with column specification:
cols(
  Age = col_double(),
  Sex = col_character(),
  ChestPainType = col_character(),
  RestingBP = col_double(),
  Cholesterol = col_double(),
  FastingBS = col_double(),
  RestingECG = col_character(),
  MaxHR = col_double(),
  ExerciseAngina = col_character(),
  Oldpeak = col_double(),
  ST_Slope = col_character(),
  HeartDisease = col_double()
)



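| Age `<dbl>` | RestingBP `<dbl>` | Cholesterol `<dbl>` | MaxHR `<dbl>` | Oldpeak `<dbl>` | HeartDisease `<fct>` |
|---|---|---|---|---|---|
| 40 | 140 | 289 | 172 | 0.0 | 0 |
| 49 | 160 | 180 | 156 | 1.0 | 1 |
| 37 | 130 | 283 | 98 | 0.0 | 0 |
| 48 | 138 | 214 | 108 | 1.5 | 1 |
| 54 | 150 | 195 | 122 | 0.0 | 0 |
| 39 | 120 | 339 | 170 | 0.0 | 0 |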
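As an optional alternative to converting and selecting columns after the read, the column types can be declared up front. The sketch below is only an illustration: it uses the six rows previewed above as literal CSV text so that it runs without `heart.csv`, and the names `csv_text`, `heart_spec`, and `preview` are placeholders that do not appear elsewhere in this notebook.

```r
library(readr)

# The six rows from the preview, pasted as literal CSV text (a stand-in for heart.csv)
csv_text <- "Age,RestingBP,Cholesterol,MaxHR,Oldpeak,HeartDisease
40,140,289,172,0.0,0
49,160,180,156,1.0,1
37,130,283,98,0.0,0
48,138,214,108,1.5,1
54,150,195,122,0.0,0
39,120,339,170,0.0,0
"

# cols_only() keeps just the listed columns;
# col_factor() parses the outcome directly as a factor with levels "0" and "1"
heart_spec <- cols_only(
  Age = col_double(),
  RestingBP = col_double(),
  Cholesterol = col_double(),
  MaxHR = col_double(),
  Oldpeak = col_double(),
  HeartDisease = col_factor(levels = c("0", "1"))
)

preview <- read_csv(csv_text, col_types = heart_spec)
str(preview)
```

With the real file, the same specification could be passed as `read_csv("heart.csv", col_types = heart_spec)`, which should also suppress the column specification message.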


In [9]:
# Split into 75% training and 25% testing data, stratified by the outcome
heart_split <- initial_split(heart_data, prop = 0.75, strata = HeartDisease)
heart_train <- training(heart_split)
heart_test <- testing(heart_split)
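One way to confirm that stratifying on `HeartDisease` worked is to compare the class proportions in the training and testing sets. The sketch below is a rough check, not part of the original analysis: `class_proportions()` is a hypothetical helper, the toy data frame is made up so the example runs without `heart.csv`, and the seed is arbitrary.

```r
library(tidymodels)

# Hypothetical helper: share of each HeartDisease class within each subset
class_proportions <- function(train, test) {
  bind_rows(train = train, test = test, .id = "set") %>%
    count(set, HeartDisease) %>%
    group_by(set) %>%
    mutate(prop = n / sum(n)) %>%
    ungroup()
}

# Made-up data: 80 rows of class 0 and 120 rows of class 1
set.seed(2022)  # arbitrary seed so the random split is reproducible
toy <- tibble(
  x = rnorm(200),
  HeartDisease = factor(rep(c(0, 1), times = c(80, 120)))
)

toy_split <- initial_split(toy, prop = 0.75, strata = HeartDisease)
toy_props <- class_proportions(training(toy_split), testing(toy_split))
toy_props
```

In both subsets the proportions should stay close to the 0.4 and 0.6 of the full toy data. On the real data the call would be `class_proportions(heart_train, heart_test)`; calling `set.seed()` before `initial_split()` makes the split repeatable across runs.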

In [11]:
# Standardize every predictor: scale to unit standard deviation, then center at zero
heart_recipe <- recipe(HeartDisease ~ Age + RestingBP + Cholesterol + MaxHR + Oldpeak, data = heart_train) %>%
  step_scale(all_predictors()) %>%
  step_center(all_predictors())
heart_recipe

Data Recipe

Inputs:

      role #variables
   outcome          1
 predictor          5

Operations:

Scaling for all_predictors()
Centering for all_predictors()
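To make the two recipe operations concrete, the base R sketch below applies them by hand to the six `Age` values from the data preview. It assumes the default behaviour of the two steps (divide by the standard deviation, then subtract the mean) and only mirrors the arithmetic; in the recipe itself, the statistics are estimated from `heart_train`, not from these six values.

```r
# Age values from the first six rows of the data preview
age <- c(40, 49, 37, 48, 54, 39)

# Scaling: divide by the standard deviation
age_scaled <- age / sd(age)

# Centering: subtract the mean of the already scaled values
age_standardized <- age_scaled - mean(age_scaled)

# The usual z-score, computed in one step for comparison
z <- (age - mean(age)) / sd(age)

all.equal(age_standardized, z)  # the two routes agree
mean(age_standardized)          # approximately 0
sd(age_standardized)            # approximately 1
```

Scaling before centering gives the same result as the usual z-score because dividing by the standard deviation and subtracting the mean commute up to a constant: x/s - mean(x/s) = (x - mean(x))/s.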