### Introduction

Heart disease remains a significant global health concern and a major contributor to mortality worldwide. Timely detection and intervention are critical to managing the condition effectively. Among the indicators healthcare professionals assess, chest pain type and age are commonly used to evaluate the likelihood of heart disease.

Chest pain is a common symptom whose severity and character vary, and different presentations can point to different underlying conditions. Age, by contrast, is a well-established risk factor: the incidence of heart disease increases as people grow older.

### The Question I Will Be Analyzing

**Can the type of chest pain and age predict the presence of heart disease in new patients?**

### Dataset Description

For this project, I will be using the Heart Disease dataset from the UCI Machine Learning Repository, specifically the processed Cleveland subset (`processed.cleveland.data`).

#### Variables I Will Be Looking At:
- **Age**: The age of the patient.
- **Chest Pain Type (cp)**: Categorized as:
  - 1: Typical angina
  - 2: Atypical angina
  - 3: Non-anginal pain
  - 4: Asymptomatic
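- **Target (num)**: The diagnosis of heart disease. In the raw data this is an integer, where 0 indicates no heart disease and values from 1 to 4 indicate its presence. The wrangling code below collapses it into a logical column, `diag`, that is true whenever `num` is greater than 0.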

This analysis aims to provide insight into early detection and preventive strategies for cardiovascular disease, which could ultimately benefit patient care and outcomes.
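The variable coding described above can be illustrated with a small sketch. The values here are made up for illustration and are not rows from the dataset, and `has_disease` is a hypothetical name (the notebook's own wrangling step calls the equivalent column `diag`). It assumes, as the wrangling code does, that any `num` above 0 counts as heart disease.

```r
# Made-up example codes, not rows from the dataset
cp_codes  <- c(1, 4, 3, 2, 4)
num_codes <- c(0, 2, 1, 0, 4)

# Attach the chest pain labels listed above to the integer codes
cp_labelled <- factor(
  cp_codes,
  levels = 1:4,
  labels = c("Typical angina", "Atypical angina", "Non-anginal pain", "Asymptomatic")
)

# Collapse the target into a binary outcome: any value above 0 counts as disease
has_disease <- num_codes > 0

example <- data.frame(cp = cp_labelled, num = num_codes, has_disease = has_disease)
print(example)
```

Keeping the numeric codes as factor levels and adding labels only for display leaves the stored data unchanged while making plots and tables easier to read.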


In [2]:
# Importing Libraries 

library(tidyverse)
library(repr)
library(tidymodels)
options(repr.matrix.max.rows = 6)
library(RColorBrewer)
library(readr)

“package ‘ggplot2’ was built under R version 4.3.2”
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──

✔ broom    

In [4]:
set.seed(123)
# Read the data frame from the URL and assign column names (the file has no header row)
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"
cleveland_data <- read.csv(url, header = FALSE, col.names = c("age", "sex", "cp", "trestbps", "chol", "fbs", "restecg", 
                                                         "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"))

# Clean and wrangle the data: the raw file marks missing values with "?"
cleveland_data[ cleveland_data == "?" ] <- NA

cleveland_clean <- cleveland_data |>
    # diag is TRUE when heart disease is present (num > 0); a missing num stays NA
    mutate(diag = num > 0) |>
    # Categorical columns: coerce to integer codes, then convert to factors
    mutate(across(c(sex, cp, fbs, restecg, exang, thal, ca, slope),
                  ~ as.factor(as.integer(.x))))

# Split the data frame into training (75%) and testing (25%) sets, stratified by num
cleveland_split <- initial_split(cleveland_clean, prop = 3/4, strata = num)

cleveland_training <- training(cleveland_split)
cleveland_testing <- testing(cleveland_split)

head(cleveland_training)

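| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | num | diag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<fct>` | `<fct>` | `<dbl>` | `<dbl>` | `<fct>` | `<fct>` | `<dbl>` | `<fct>` | `<dbl>` | `<fct>` | `<fct>` | `<fct>` | `<int>` | `<lgl>` |
| 1 | 63 | 1 | 1 | 145 | 233 | 1 | 2 | 150 | 0 | 2.3 | 3 | 0 | 6 | 0 | FALSE |
| 2 | 56 | 1 | 2 | 120 | 236 | 0 | 0 | 178 | 0 | 0.8 | 1 | 0 | 3 | 0 | FALSE |
| 3 | 57 | 0 | 4 | 120 | 354 | 0 | 0 | 163 | 1 | 0.6 | 1 | 0 | 3 | 0 | FALSE |
| 4 | 57 | 1 | 4 | 140 | 192 | 0 | 0 | 148 | 0 | 0.4 | 2 | 0 | 6 | 0 | FALSE |
| 5 | 56 | 0 | 2 | 140 | 294 | 0 | 2 | 153 | 0 | 1.3 | 2 | 0 | 3 | 0 | FALSE |
| 6 | 44 | 1 | 2 | 120 | 263 | 0 | 0 | 173 | 0 | 0.0 | 1 | 0 | 7 | 0 | FALSE |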
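
Because the split is stratified on `num`, the share of positive diagnoses should be roughly similar in the training and testing sets. A minimal sketch of that sanity check follows. It uses made-up toy data frames in place of `cleveland_training` and `cleveland_testing`, and `diag_share` is a hypothetical helper that is not part of the original analysis.

```r
# Hypothetical helper: fraction of rows with a positive diagnosis, ignoring NAs
diag_share <- function(df) mean(df$diag, na.rm = TRUE)

# Toy stand-ins with made-up values (not real patients)
toy_training <- data.frame(num = c(0, 0, 1, 3, 0, 2))
toy_testing  <- data.frame(num = c(0, 4))

# Same rule as the wrangling step: any num above 0 counts as disease
toy_training$diag <- toy_training$num > 0
toy_testing$diag  <- toy_testing$num > 0

# Compare the positive share across the two sets
split_check <- c(training = diag_share(toy_training), testing = diag_share(toy_testing))
print(split_check)
```

On the real data, the same comparison would be `diag_share(cleveland_training)` against `diag_share(cleveland_testing)`. A large gap between the two would suggest the split is worth re-examining.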
