# Group Report

## Heart Disease
In this report we explore the heart disease dataset. Our goal is to build a classification model that predicts whether a patient has heart disease from three predictors: age, cholesterol, and maximum heart rate. We chose these predictors because a research article identifies them as important risk factors (Peter, 1998).

The columns in this dataset are:
1. Age (years)
2. Sex (1 = male; 0 = female)
3. Chest pain type
   - 1 = typical angina
   - 2 = atypical angina
   - 3 = non-anginal pain
   - 4 = asymptomatic
4. Resting blood pressure (mm Hg)
5. Cholesterol (mg/dl)
6. Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. Resting electrocardiogram
8. Maximum heart rate
9. Exercise-induced angina (1 = yes; 0 = no)
10. ST depression induced by exercise relative to rest
11. Slope of the peak exercise ST segment
12. Number of major vessels
13. Blood disorder
    - 3 = normal
    - 6 = fixed defect
    - 7 = reversible defect
14. Heart disease status (yes or no)

In [None]:
# Run this cell before continuing.
library(tidyverse)
library(repr)
library(tidymodels)
options(repr.matrix.max.rows = 6)

In [None]:
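set.seed(20)

# Read the raw data. The file has no header row, so the columns load as X1..X14.
# X14 (the diagnosis) is converted to a factor so that it can be recoded below.
heart_data_1 <- read_csv("data/processed.cleveland (1).data", col_names = FALSE) |>
    mutate(X14 = as_factor(X14))

# Give the 14 columns descriptive names, in the order listed above.
colnames(heart_data_1) <- c("age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                            "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num")

# Collapse the diagnosis to a yes/no label (0 = no, 1 to 4 = yes)
# and replace the numeric sex codes with readable labels.
heart_data <- heart_data_1 |>
    mutate(num = recode(num, "0" = "no",
                             "1" = "yes",
                             "2" = "yes",
                             "3" = "yes",
                             "4" = "yes")) |>
    mutate(sex = recode(sex, "1" = "male",
                             "0" = "female"))

heart_data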
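
As a possible next step toward the classifier described in the introduction, the sketch below keeps only the three predictors (`age`, `chol`, `thalach`) and the label (`num`), then counts the observations in each class. To stay self-contained it runs on `toy_heart`, a hypothetical four-row tibble with made-up values; the names `toy_heart`, `toy_model_data`, and `class_counts` are assumptions for illustration and are not part of the original analysis.

```r
library(dplyr)
library(tibble)

# Hypothetical toy rows (made-up values, NOT taken from the dataset) that use
# the same column names as heart_data after the recoding step.
toy_heart <- tibble(
  age     = c(63, 41, 57, 52),
  sex     = c("male", "female", "male", "male"),
  chol    = c(233, 204, 354, 199),
  thalach = c(150, 172, 163, 162),
  num     = factor(c("no", "no", "yes", "yes"))
)

# Keep only the three predictors and the class label.
toy_model_data <- toy_heart |>
  select(age, chol, thalach, num)

# Count the observations in each class to check the class balance.
class_counts <- toy_model_data |>
  count(num)

class_counts
```

Because the toy tibble uses the same column names, the same `select()` and `count()` calls should apply to `heart_data` unchanged.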