# Exploratory Data Analysis
   ---
*By Tan Yu Xuan, 31 Jan 2022*

# Imported Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

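# change to nicer default style; Matplotlib 3.6 renamed the 'seaborn' style to 'seaborn-v0_8'
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')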
%matplotlib inline

The dataset was downloaded from https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction.

It combines patient health records from Cleveland, Hungary, Switzerland, and VA Long Beach.

With these attributes, we can look for correlations that show which features are associated with heart disease, and possibly build a model that predicts which patients have heart disease.
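A minimal sketch of that correlation step, for illustration only: so that it runs without `data/heart.csv`, it types in the numeric columns of the ten rows printed by `df.head()` and `df.tail()` later in this notebook. Ten rows are far too few to draw conclusions from; on the loaded data the equivalent starting point would be `df.select_dtypes('number').corr()`.

```python
import pandas as pd

# The ten rows shown by df.head() and df.tail() in this notebook, typed in by hand
# (numeric columns only) so the sketch runs without data/heart.csv.
sample = pd.DataFrame({
    "Age":          [40, 49, 37, 48, 54, 45, 68, 57, 57, 38],
    "RestingBP":    [140, 160, 130, 138, 150, 110, 144, 130, 130, 138],
    "Cholesterol":  [289, 180, 283, 214, 195, 264, 193, 131, 236, 175],
    "FastingBS":    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "MaxHR":        [172, 156, 98, 108, 122, 132, 141, 115, 174, 173],
    "Oldpeak":      [0.0, 1.0, 0.0, 1.5, 0.0, 1.2, 3.4, 1.2, 0.0, 0.0],
    "HeartDisease": [0, 1, 0, 1, 0, 1, 1, 1, 1, 0],
})

# Pearson correlation of each numeric feature with the target, strongest (by absolute value) first.
# Correlation only measures linear association; it does not show that a feature causes heart disease.
corr_with_target = (
    sample.corr()["HeartDisease"]
    .drop("HeartDisease")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```

Sorting by absolute value keeps strong negative associations (for example, a lower `MaxHR` going with disease) next to strong positive ones instead of pushing them to the bottom of the list.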

**List of attributes**

| Attribute | Description |
|-----------|-------------|
|Age|age of patient [in years]|
|Sex|sex of patient [M: Male, F: Female]|
|ChestPainType|chest pain type [TA: Typical Angina, ATA: Atypical Angina, NAP: Non-Anginal Pain, ASY: Asymptomatic]|
|RestingBP|resting blood pressure [mm Hg]|
|Cholesterol|serum cholesterol [mg/dl]|
|FastingBS|fasting blood sugar [1: if FastingBS > 120 mg/dl, 0: otherwise]|
|RestingECG|resting electrocardiogram results [Normal: normal, ST: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), LVH: showing probable or definite left ventricular hypertrophy by Estes' criteria]|
|MaxHR|maximum heart rate achieved [numeric value between 60 and 202]|
|ExerciseAngina|exercise-induced angina [Y: Yes, N: No]|
|Oldpeak|ST depression induced by exercise relative to rest [numeric value]|
|ST_Slope|the slope of the peak exercise ST segment [Up: upsloping, Flat: flat, Down: downsloping]|
|HeartDisease|output class [1: heart disease, 0: normal]|
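The category codes in the table can be checked against the data before any encoding is done. The sketch below is an illustration, not part of the original analysis: `ALLOWED_CODES` and `find_unexpected_codes` are hypothetical names, the two example rows are copied from the `df.head()` output further down, and the second copy has a deliberately misspelt `ST_Slope` value.

```python
import pandas as pd

# Hypothetical lookup: allowed codes per categorical attribute, copied from the attribute table.
ALLOWED_CODES = {
    "Sex": {"M", "F"},
    "ChestPainType": {"TA", "ATA", "NAP", "ASY"},
    "FastingBS": {0, 1},
    "RestingECG": {"Normal", "ST", "LVH"},
    "ExerciseAngina": {"Y", "N"},
    "ST_Slope": {"Up", "Flat", "Down"},
    "HeartDisease": {0, 1},
}


def find_unexpected_codes(frame: pd.DataFrame) -> dict:
    """Hypothetical helper: map each column to the values it holds that the table does not list."""
    problems = {}
    for column, allowed in ALLOWED_CODES.items():
        unexpected = set(frame[column].dropna().unique()) - allowed
        if unexpected:
            problems[column] = unexpected
    return problems


# First two rows of df.head(), categorical columns only.
rows = pd.DataFrame({
    "Sex": ["M", "F"],
    "ChestPainType": ["ATA", "NAP"],
    "FastingBS": [0, 0],
    "RestingECG": ["Normal", "Normal"],
    "ExerciseAngina": ["N", "N"],
    "ST_Slope": ["Up", "Flat"],
    "HeartDisease": [0, 1],
})
print(find_unexpected_codes(rows))  # → {}

# A deliberately corrupted copy (lower-case "flat") to show what a failure looks like.
bad = rows.assign(ST_Slope=["Up", "flat"])
print(find_unexpected_codes(bad))  # → {'ST_Slope': {'flat'}}
```

On the loaded data the same check would be `find_unexpected_codes(df)`; an empty dictionary means every categorical value matches the documented codes.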

# Importing Dataset

In [2]:
# Read csv into pandas DataFrame
# (forward slashes work on Windows, macOS and Linux alike)
df = pd.read_csv('data/heart.csv')

# Heart Dataset

In [3]:
df.head()

| |Age|Sex|ChestPainType|RestingBP|Cholesterol|FastingBS|RestingECG|MaxHR|ExerciseAngina|Oldpeak|ST_Slope|HeartDisease|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|40|M|ATA|140|289|0|Normal|172|N|0.0|Up|0|
|1|49|F|NAP|160|180|0|Normal|156|N|1.0|Flat|1|
|2|37|M|ATA|130|283|0|ST|98|N|0.0|Up|0|
|3|48|F|ASY|138|214|0|Normal|108|Y|1.5|Flat|1|
|4|54|M|NAP|150|195|0|Normal|122|N|0.0|Up|0|


In [4]:
df.tail()

| |Age|Sex|ChestPainType|RestingBP|Cholesterol|FastingBS|RestingECG|MaxHR|ExerciseAngina|Oldpeak|ST_Slope|HeartDisease|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|913|45|M|TA|110|264|0|Normal|132|N|1.2|Flat|1|
|914|68|M|ASY|144|193|1|Normal|141|N|3.4|Flat|1|
|915|57|M|ASY|130|131|0|Normal|115|Y|1.2|Flat|1|
|916|57|F|ATA|130|236|0|LVH|174|N|0.0|Flat|1|
|917|38|M|NAP|138|175|0|Normal|173|N|0.0|Up|0|


The following features will be modified in preparation for modelling:
1. Sex - encoded
2. ChestPainType - encoded
3. RestingECG - ordinal-encoded by severity (0: Normal, 1: ST, 2: LVH)
4. ExerciseAngina - encoded (0: N, 1: Y)
5. ST_Slope - ordinal-encoded by severity (0: Up, 1: Flat, 2: Down)
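The encoding plan above can be sketched with `Series.map` for the ordinal and binary features and `pd.get_dummies` for `ChestPainType`. This is a sketch under stated assumptions, not the notebook's own implementation: the list does not say how `Sex` or `ChestPainType` are encoded, so `M` as 1 / `F` as 0 and one-hot encoding are assumed here, and `encode_features` is a hypothetical helper. The five example rows are the `df.head()` output shown earlier.

```python
import pandas as pd

# Ordinal codes taken from the list above (severity order as stated in this notebook).
RESTING_ECG_CODES = {"Normal": 0, "ST": 1, "LVH": 2}
ST_SLOPE_CODES = {"Up": 0, "Flat": 1, "Down": 2}
EXERCISE_ANGINA_CODES = {"N": 0, "Y": 1}
# Assumption: the notebook does not state how Sex is coded; M -> 1, F -> 0 is chosen here.
SEX_CODES = {"F": 0, "M": 1}
# All four chest-pain categories from the attribute table.
CHEST_PAIN_TYPES = ["TA", "ATA", "NAP", "ASY"]


def encode_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of frame with the five categorical features made numeric."""
    out = frame.copy()
    out["Sex"] = out["Sex"].map(SEX_CODES)
    out["ExerciseAngina"] = out["ExerciseAngina"].map(EXERCISE_ANGINA_CODES)
    out["RestingECG"] = out["RestingECG"].map(RESTING_ECG_CODES)
    out["ST_Slope"] = out["ST_Slope"].map(ST_SLOPE_CODES)
    # Assumption: ChestPainType has no stated order, so it is one-hot encoded.
    # Declaring the categories makes get_dummies emit a column for every type, even unseen ones.
    out["ChestPainType"] = pd.Categorical(out["ChestPainType"], categories=CHEST_PAIN_TYPES)
    return pd.get_dummies(out, columns=["ChestPainType"], dtype=int)


# Categorical columns of the five rows shown by df.head().
head_rows = pd.DataFrame({
    "Age": [40, 49, 37, 48, 54],
    "Sex": ["M", "F", "M", "F", "M"],
    "ChestPainType": ["ATA", "NAP", "ATA", "ASY", "NAP"],
    "RestingECG": ["Normal", "Normal", "ST", "Normal", "Normal"],
    "ExerciseAngina": ["N", "N", "N", "Y", "N"],
    "ST_Slope": ["Up", "Flat", "Up", "Flat", "Up"],
})
encoded = encode_features(head_rows)
print(encoded)
```

Mapping through explicit dictionaries keeps the severity order stated above, and any value missing from a dictionary becomes `NaN` rather than silently receiving a code, so a typo in the raw data shows up as a missing value. Declaring the categories through `pd.Categorical` means `ChestPainType_TA` is still created even though none of these five rows contains `TA`.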