# Heart-Disease-Dataset
## UCI Cleveland Heart disease dataset


The Heart-Disease-Dataset database contains 76 attributes, but all published experiments so far use a subset of only 14 of them. The Cleveland database is the only one that ML researchers have used to date. The "goal" attribute indicates the presence of heart disease in a patient as an integer from 0 (no presence) to 4. Experiments with the Cleveland database have mostly focused on distinguishing presence (values 1, 2, 3, 4) from absence (value 0).
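
The presence/absence distinction described above amounts to collapsing the 0 to 4 scale into a binary label. A minimal sketch of that step, assuming a hypothetical pandas Series named `goal` filled with made-up values (illustrative only, not taken from the dataset):

```python
import pandas as pd

# Hypothetical values on the original 0-4 scale (illustrative only, not real data)
goal = pd.Series([0, 2, 1, 0, 4, 3], name="goal")

# Any value from 1 to 4 counts as presence (1); 0 stays as absence (0)
binary_target = (goal > 0).astype(int)

print(binary_target.tolist())  # [0, 1, 1, 0, 1, 1]
```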


Source (Kaggle): https://www.kaggle.com/datasets/ineubytes/heart-disease-dataset


Variable descriptions:

- age => age in years
- sex => sex (1 = male; 0 = female)
- cp => chest pain type (4 values)
- trestbps => resting blood pressure (in mm Hg on admission to the hospital)
- chol => serum cholesterol in mg/dl
- fbs => fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
- restecg => resting electrocardiographic results
- thalach => maximum heart rate achieved
- exang => exercise-induced angina (1 = yes; 0 = no)
- oldpeak => ST depression induced by exercise relative to rest
- slope => slope of the peak exercise ST segment
- ca => number of major vessels (0-3) colored by fluoroscopy
- thal => 0 = normal; 1 = fixed defect; 2 = reversible defect
- target => presence of heart disease (0 = absence; 1 = presence)
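
Several of these variables are integer-coded categories, so it can help to list the codes that actually occur and compare them with the descriptions above. The sketch below assumes a small frame, here called `sample_df` (a name chosen for illustration), built from the coded columns of the first five rows displayed further down in this document. With the full file, the same dictionary comprehension would run on the loaded DataFrame.

```python
import pandas as pd

# Coded columns from the first five rows displayed in this document
sample_df = pd.DataFrame({
    "sex":     [1, 1, 1, 1, 0],
    "cp":      [0, 0, 0, 0, 0],
    "fbs":     [0, 1, 0, 0, 1],
    "restecg": [1, 0, 1, 1, 1],
    "exang":   [0, 1, 1, 0, 0],
    "slope":   [2, 0, 0, 2, 1],
    "ca":      [2, 0, 0, 1, 3],
    "thal":    [3, 3, 3, 3, 2],
})

# Distinct codes observed per column, sorted for readability
observed_codes = {
    col: sorted(sample_df[col].unique().tolist()) for col in sample_df.columns
}

for col, codes in observed_codes.items():
    print(f"{col}: {codes}")
```

In these five rows `thal` takes the values 2 and 3, so the coding listed above may not match the file exactly. It is worth confirming against the full data before interpreting that column.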

# IMPORT and PREPARE DATA

In [1]:
# Import the libraries needed for this project
import numpy as np
import pandas as pd


In [5]:
# Load the dataset
path = './data/heart.csv'
raw_df = pd.read_csv(path, sep=',')

In [7]:
# Display the first rows of the loaded data
display(raw_df.head())

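|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52 | 1 | 0 | 125 | 212 | 0 | 1 | 168 | 0 | 1.0 | 2 | 2 | 3 | 0 |
| 1 | 53 | 1 | 0 | 140 | 203 | 1 | 0 | 155 | 1 | 3.1 | 0 | 0 | 3 | 0 |
| 2 | 70 | 1 | 0 | 145 | 174 | 0 | 1 | 125 | 1 | 2.6 | 0 | 0 | 3 | 0 |
| 3 | 61 | 1 | 0 | 148 | 203 | 0 | 1 | 161 | 0 | 0.0 | 2 | 1 | 3 | 0 |
| 4 | 62 | 0 | 0 | 138 | 294 | 1 | 1 | 106 | 0 | 1.9 | 1 | 3 | 2 | 0 |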


In [16]:
# Check the total number of records (rows, columns)
raw_df.shape

(1025, 14)

In [13]:
# Check for null values and column dtypes
raw_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1025 entries, 0 to 1024
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1025 non-null   int64  
 1   sex       1025 non-null   int64  
 2   cp        1025 non-null   int64  
 3   trestbps  1025 non-null   int64  
 4   chol      1025 non-null   int64  
 5   fbs       1025 non-null   int64  
 6   restecg   1025 non-null   int64  
 7   thalach   1025 non-null   int64  
 8   exang     1025 non-null   int64  
 9   oldpeak   1025 non-null   float64
 10  slope     1025 non-null   int64  
 11  ca        1025 non-null   int64  
 12  thal      1025 non-null   int64  
 13  target    1025 non-null   int64  
dtypes: float64(1), int64(13)
memory usage: 112.2 KB


### Conclusion:
Every column has 1025 non-null values out of 1025 rows, so there are no NaN/null values to handle.

All 14 columns already have a numeric dtype (13 int64, 1 float64), so no field needs to be converted or encoded.
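
The two conclusions above can also be checked programmatically rather than read off the `info()` output. A minimal sketch, assuming a small frame named `sample_df` (a name chosen for illustration) built from three columns of the first five rows shown earlier. With the real data, the same calls would run on `raw_df`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Three columns from the first five rows shown earlier (subset for brevity)
sample_df = pd.DataFrame({
    "age":     [52, 53, 70, 61, 62],
    "oldpeak": [1.0, 3.1, 2.6, 0.0, 1.9],
    "target":  [0, 0, 0, 0, 0],
})

# Total count of missing cells across the whole frame
total_missing = int(sample_df.isna().sum().sum())

# Names of columns whose dtype is not numeric (empty list means all numeric)
non_numeric = [
    col for col in sample_df.columns if not is_numeric_dtype(sample_df[col])
]

print("missing cells:", total_missing)
print("non-numeric columns:", non_numeric)
```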


In [12]:
# Define the target to predict
target = raw_df['target']

# Define the features to work with (everything except 'target')
data = raw_df.drop('target', axis=1)
data.head()

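|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52 | 1 | 0 | 125 | 212 | 0 | 1 | 168 | 0 | 1.0 | 2 | 2 | 3 |
| 1 | 53 | 1 | 0 | 140 | 203 | 1 | 0 | 155 | 1 | 3.1 | 0 | 0 | 3 |
| 2 | 70 | 1 | 0 | 145 | 174 | 0 | 1 | 125 | 1 | 2.6 | 0 | 0 | 3 |
| 3 | 61 | 1 | 0 | 148 | 203 | 0 | 1 | 161 | 0 | 0.0 | 2 | 1 | 3 |
| 4 | 62 | 0 | 0 | 138 | 294 | 1 | 1 | 106 | 0 | 1.9 | 1 | 3 | 2 |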
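
After separating features from the target, a quick sanity check can confirm that the two pieces stay aligned and that the original frame is untouched. This is a sketch under the assumption of a small frame named `sample_df` (a name chosen for illustration) holding three columns from the five rows shown above. `drop(columns=...)` is used here as an equivalent spelling of `drop('target', axis=1)`.

```python
import pandas as pd

# Three columns from the first five rows shown above (subset for brevity)
sample_df = pd.DataFrame({
    "age":    [52, 53, 70, 61, 62],
    "chol":   [212, 203, 174, 203, 294],
    "target": [0, 0, 0, 0, 0],
})

target = sample_df["target"]
data = sample_df.drop(columns="target")  # same effect as drop('target', axis=1)

# Features and labels must have one entry per row each
rows_match = len(data) == len(target)

# drop() returns a new DataFrame, so the original keeps its target column
original_intact = "target" in sample_df.columns

print(list(data.columns), rows_match, original_intact)
print(target.value_counts().to_dict())  # class counts in this small sample
```

On the full dataset, the `value_counts()` line also gives a first look at how balanced the two classes are.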
