# Heart Failure Project

## Introduction:

"Cardiovascular diseases (CVDs) are the number 1 cause of death globally" (LARXEL, kaggle) and it is truly concerning how it is taking over many lives, like family members losing their love ones due to it, people needing to live their lives in hospitals, and so on. We hope to be able to use this opportunity to discover different factors that could lead to heart failure and building a model to predict those who are most in need and hopefully get them the medical attention they require. Our main focus of this project is to determine/answer the question: "which factors have a significant contribution towards heart failure?". The data we obtained consists of factors such as diabetes, high blood pressure, age, sex, whether someone smokes, and so on. Our goal is to use these factors as predictors to predict if someone should receive medical attention immediately or not. If predictions indicates death, it would suggest doctors to focus on this case immediately to prevent death, and if it indicates they are going to survive, then we would do precautions to prevent them from falling into the categories that might lead them to death. This dataset we gathered is from Kaggle, which was released by user LARXEL in 2020.

In [8]:
library(tidyverse)
library(repr)
library(tidymodels)
options(repr.matrix.max.rows = 6)
install.packages("kknn")

Updating HTML index of packages in '.Library'

Making 'packages.html' ...
 done



In [12]:
download.file("https://github.com/KristenisHuaiyi/Data_Science_Project/blob/main/heart_failure_clinical_records_dataset.csv", "downloaded_data.csv")

In [16]:
heart_failure_patient_dataset <- read_csv("downloaded_data.csv")
heart_failure_patient_dataset

[1m[22mNew names:
[36m•[39m `contentType:directory}` -> `contentType:directory}...4`
[36m•[39m `contentType:directory}` -> `contentType:directory}...7`
[36m•[39m `contentType:file}` -> `contentType:file}...10`
[36m•[39m `contentType:file}` -> `contentType:file}...13`
[36m•[39m `contentType:file}` -> `contentType:file}...16`
[36m•[39m `contentType:file}` -> `contentType:file}...19`
[36m•[39m `contentType:file}` -> `contentType:file}...22`
[36m•[39m `path:heart_failure_clinical_records_dataset.csv` ->
  `path:heart_failure_clinical_records_dataset.csv...24`
[36m•[39m `contentType:file}` -> `contentType:file}...25`
[36m•[39m `path:heart_failure_clinical_records_dataset.csv` ->
  `path:heart_failure_clinical_records_dataset.csv...52`
[36m•[39m `["75"` -> `["75"...69`
[36m•[39m `0` -> `0...70`
[36m•[39m `582` -> `582...71`
[36m•[39m `0` -> `0...72`
[36m•[39m `20` -> `20...73`
[36m•[39m `1` -> `1...74`
[36m•[39m `265000` -> `265000...75`
[36m•[39m `1.9` 

"{""payload"":{""allShortcutsEnabled"":false","fileTree:{"":{items:[{name:.ipynb_checkpoints",path:.ipynb_checkpoints,contentType:directory}...4,"{""name"":""data""",path:data,contentType:directory}...7,"{""name"":"".gitignore""",path:.gitignore,contentType:file}...10,⋯,viewable:true,workflowRedirectUrl:null,symbols:{timedOut:false,notAnalyzed:true,symbols:[]}},copilotInfo:null,copilotAccessAllowed:false,csrf_tokens:{/KristenisHuaiyi/Data_Science_Project/branches:{post:5gD_7UL9GqpWbywyHATesgZD6TuFRwdaNaBsbf_I416gaDf7cxUT_i-0asu9xOuYt8Gzb0eXiqN5yYYUIIdKXw},/repos/preferences:{post:fqVnuY_HoQHf9zd8VBwXGaqaILIq8SoRKyTdAYskcgkFyQKlT-t9GZa48Oc7V_3da5be1PzfN0JpC_vAcIHQjA}}},title:Data_Science_Project/heart_failure_clinical_records_dataset.csv at main · KristenisHuaiyi/Data_Science_Project}
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,⋯,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>


In [None]:
#wrangling the data by selecting the predictors we would work with and 
#converting the response variable DEATH_EVENT to the factor datatype and renaming it to "survived"
#converting the predicting variables that has results 0 or 1 to the logical datatype so it shows as TRUE or FALSE

data_wrangled <- downloaded_data |> 
            select(age, diabetes, ejection_fraction, serum_creatinine, high_blood_pressure, smoking, DEATH_EVENT) |>
            mutate(DEATH_EVENT = as_factor(DEATH_EVENT)) |>
            mutate(DEATH_EVENT = fct_recode(DEATH_EVENT, "Yes" = "0", "No" = "1")) |>
            rename("survived" = "DEATH_EVENT") |>
            mutate(diabetes = as.logical(diabetes)) |>
            mutate(high_blood_pressure = as.logical(high_blood_pressure)) |>
            mutate(smoking = as.logical(smoking))    

data_wrangled