In [2]:
import pandas as pd

#### Introduction

According to the World Health Organization (WHO), cardiovascular diseases (CVDs) are the leading cause of death worldwide, responsible for around 17.9 million deaths each year. Over 80% of these deaths are attributed to heart attacks and strokes, and one-third occur prematurely, in people under the age of 70. Given this impact, detecting at-risk individuals early is crucial for timely and appropriate treatment. In this study, we use a K-Nearest Neighbors (K-NN) classifier to predict the diagnosis of heart disease and identify individuals at risk.
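
To make the idea behind K-NN concrete, here is a minimal pure-Python sketch: a query point receives the majority label among its k closest training points. The function name `knn_predict` and the toy 2-D points are assumptions made for illustration only; they are not taken from `heart.csv` and this is not necessarily how the model in this study is built.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(x, query), label)
        for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest neighbours
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy, made-up 2-D points (NOT from heart.csv): two well-separated clusters
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (1.1, 1.0)))  # point near the first cluster
print(knn_predict(train_X, train_y, (5.1, 5.0)))  # point near the second cluster
```

Because K-NN relies on distances, features measured on very different scales (for example, cholesterol in mg/dl versus a 0/1 flag) are usually standardized before the distance is computed.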

#### Description of Features
- age: age in years
- sex: sex (1 = male, 0 = female)
- cp: chest pain type (1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 0 = asymptomatic)
- trtbps: resting blood pressure (in mm Hg)
- chol: cholesterol in mg/dl fetched via BMI sensor
- fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
- restecg: resting electrocardiographic results (0 = normal; 1 = having ST-T wave abnormality, T wave inversions and/or ST elevation or depression of > 0.05 mV; 2 = showing probable or definite left ventricular hypertrophy by Estes' criteria)
- thalachh: maximum heart rate achieved
- exng: exercise induced angina (1 = yes; 0 = no)
- oldpeak: ST depression induced by exercise relative to rest
- slp: slope of the peak exercise ST segment (2 = upsloping; 1 = flat; 0 = downsloping)
- caa: number of major vessels (0-3) colored by fluoroscopy
- thall: (2 = normal; 1 = fixed defect; 3 = reversible defect)
- output: diagnosis of heart disease (angiographic disease status) (0 = < 50% diameter narrowing; 1 = > 50% diameter narrowing)
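
The integer codes listed above are easy to misread when inspecting individual rows. The following sketch transcribes them into a lookup table and decodes one record. The names `CODEBOOK` and `decode_record` are hypothetical helpers introduced here for illustration, and the short label strings are abbreviations of the descriptions in the list.

```python
# Code-to-label maps transcribed from the feature list (labels abbreviated)
CODEBOOK = {
    "sex": {1: "male", 0: "female"},
    "cp": {1: "typical angina", 2: "atypical angina",
           3: "non-anginal pain", 0: "asymptomatic"},
    "fbs": {1: "> 120 mg/dl", 0: "<= 120 mg/dl"},
    "restecg": {0: "normal", 1: "ST-T wave abnormality",
                2: "left ventricular hypertrophy"},
    "exng": {1: "yes", 0: "no"},
    "slp": {2: "upsloping", 1: "flat", 0: "downsloping"},
    "thall": {2: "normal", 1: "fixed defect", 3: "reversible defect"},
    "output": {0: "< 50% diameter narrowing", 1: "> 50% diameter narrowing"},
}

def decode_record(record):
    """Hypothetical helper: replace coded categorical values with labels."""
    decoded = {}
    for name, value in record.items():
        labels = CODEBOOK.get(name)
        # Numeric features and unknown codes are passed through unchanged
        decoded[name] = labels.get(value, value) if labels else value
    return decoded

# A few fields of the first row of the dataset
sample = {"age": 63, "sex": 1, "cp": 3, "exng": 0, "thall": 1, "output": 1}
print(decode_record(sample))
```

The lookup also works on values read back as floats (for example `1.0`), since Python treats `1` and `1.0` as the same dictionary key.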

#### Loading the Dataset

In [8]:
# Load the dataset into a pandas DataFrame
df = pd.read_csv('heart.csv')
df.head()

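| | age | sex | cp | trtbps | chol | fbs | restecg | thalachh | exng | oldpeak | slp | caa | thall | output |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |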
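
Before modeling, it can help to confirm that the loaded frame has the expected columns, no missing values, and only 0/1 in the binary features. The sketch below runs such a check on the five rows displayed above, inlined so it works without `heart.csv`. The helper `check_frame` and the constants `EXPECTED_COLUMNS` and `BINARY_COLUMNS` are hypothetical names introduced for illustration.

```python
import io
import pandas as pd

# The five rows shown above, inlined so the sketch runs without heart.csv
SAMPLE_CSV = """age,sex,cp,trtbps,chol,fbs,restecg,thalachh,exng,oldpeak,slp,caa,thall,output
63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
57,0,0,120,354,0,1,163,1,0.6,2,0,2,1
"""

EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trtbps", "chol", "fbs", "restecg",
    "thalachh", "exng", "oldpeak", "slp", "caa", "thall", "output",
]
BINARY_COLUMNS = ["sex", "fbs", "exng", "output"]

def check_frame(frame):
    """Hypothetical helper: return a list of problems found in `frame`."""
    problems = []
    # Every documented feature should be present as a column
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Count missing values across the whole frame
    n_null = int(frame.isna().sum().sum())
    if n_null:
        problems.append(f"{n_null} missing values")
    # Binary features should only contain 0 or 1
    for col in BINARY_COLUMNS:
        if col in frame.columns and not frame[col].dropna().isin([0, 1]).all():
            problems.append(f"{col} has values other than 0/1")
    return problems

sample_df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(sample_df.shape)
print(check_frame(sample_df))
```

An empty list means no problems were detected. The same call can be applied to the full frame with `check_frame(df)`.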


In [6]:
# Take a peek at the first sample in the dataset
df.iloc[0]

age          63.0
sex           1.0
cp            3.0
trtbps      145.0
chol        233.0
fbs           1.0
restecg       0.0
thalachh    150.0
exng          0.0
oldpeak       2.3
slp           0.0
caa           0.0
thall         1.0
output        1.0
Name: 0, dtype: float64