## Heart Failure Prediction

Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide.
Heart failure is a common event caused by CVDs, and this dataset contains 12 features that can be used to predict mortality from heart failure.

Most cardiovascular diseases can be prevented by addressing behavioural risk factors such as tobacco use, unhealthy diet and obesity, physical inactivity and harmful use of alcohol through population-wide strategies.

People with cardiovascular disease or at high cardiovascular risk (due to one or more risk factors such as hypertension, diabetes, hyperlipidaemia, or already established disease) need early detection and management, and this is where a machine learning model can help.


In [1]:
# Importing the necessary libraries
import pandas as pd

In [2]:
# Loading the dataset
hf_data = pd.read_csv('../input/heart-failure-clinical-data/heart_failure_clinical_records_dataset.csv')

In [3]:
# Viewing the first five rows of the dataset
hf_data.head(5)

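|   | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |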


In [4]:
# Checking the shape of the dataset
hf_data.shape

(299, 13)

In [5]:
# Checking for missing values
hf_data.isnull().sum()

age                         0
anaemia                     0
creatinine_phosphokinase    0
diabetes                    0
ejection_fraction           0
high_blood_pressure         0
platelets                   0
serum_creatinine            0
serum_sodium                0
sex                         0
smoking                     0
time                        0
DEATH_EVENT                 0
dtype: int64
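Reading thirteen zeros by eye works here, but the same check can be made programmatic so that it fails loudly on a different copy of the data. A minimal sketch; `missing_report` is a hypothetical helper and the three-row frame is made up for illustration.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: missing-value counts, only for columns that have any."""
    counts = df.isnull().sum()
    return counts[counts > 0]


# Made-up toy frame with a single missing value, for illustration only
toy = pd.DataFrame({"age": [75.0, np.nan, 65.0], "sex": [1, 1, 1]})
report = missing_report(toy)
print(report)        # only 'age' is listed, with one missing value
print(report.empty)  # False here; an empty report means no missing values at all
```

Given the all-zero output above, `missing_report(hf_data)` should come back empty.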

In [6]:
# Checking the dtypes of the dataset
hf_data.dtypes

age                         float64
anaemia                       int64
creatinine_phosphokinase      int64
diabetes                      int64
ejection_fraction             int64
high_blood_pressure           int64
platelets                   float64
serum_creatinine            float64
serum_sodium                  int64
sex                           int64
smoking                       int64
time                          int64
DEATH_EVENT                   int64
dtype: object

In [7]:
# Obtaining the value counts of the target
hf_data['DEATH_EVENT'].value_counts()

0    203
1     96
Name: DEATH_EVENT, dtype: int64
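These counts imply that about 32% of patients died (96 of 299), so a model that always predicts survival would already reach about 68% accuracy. A short sketch of getting the proportions directly, using a toy Series that reproduces only the reported counts:

```python
import pandas as pd

# Toy target reproducing only the reported counts: 203 survivors (0), 96 deaths (1)
target = pd.Series([0] * 203 + [1] * 96, name="DEATH_EVENT")

# normalize=True returns the share of each class instead of raw counts
proportions = target.value_counts(normalize=True)
print(proportions.round(4))
```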

In [8]:
# Obtaining the value counts of genders
female = hf_data[hf_data['sex'] == 0]
male = hf_data[hf_data['sex'] == 1]

# len() gives the number of rows; count()[1] relied on positional indexing into a label-indexed Series
print('Male = {}'.format(len(male)))
print('Female = {}'.format(len(female)))

Male = 194
Female = 105
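The same totals can come from a single `value_counts` call on the `sex` column, using the coding from the cell above (0 = female, 1 = male). A sketch on a toy column that reproduces only the reported totals; the label mapping is an illustration choice, not part of the dataset:

```python
import pandas as pd

# Toy 'sex' column reproducing only the reported totals (1 = male, 0 = female)
toy = pd.DataFrame({"sex": [1] * 194 + [0] * 105})

# Map the 0/1 codes to readable labels, then count each label in one call
sex_counts = toy["sex"].map({0: "Female", 1: "Male"}).value_counts()
print(sex_counts)
```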


In [9]:
# Obtaining the value counts of genders that survived and died
print('Male survivors = {}'.format(((male['DEATH_EVENT'] == 0).sum())))
print('Dead males = {}'.format(((male['DEATH_EVENT'] == 1).sum())))

print('Female survivors = {}'.format(((female['DEATH_EVENT'] == 0).sum())))
print('Dead females = {}'.format(((female['DEATH_EVENT'] == 1).sum())))

Male survivors = 132
Dead males = 62
Female survivors = 71
Dead females = 34
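The four survival counts per sex fit in one contingency table. A sketch with `pd.crosstab` on a toy frame rebuilt from the printed counts only (the real rows carry more columns):

```python
import pandas as pd

# Toy frame rebuilt from the printed counts; each row is (sex, DEATH_EVENT)
rows = [(1, 0)] * 132 + [(1, 1)] * 62 + [(0, 0)] * 71 + [(0, 1)] * 34
toy = pd.DataFrame(rows, columns=["sex", "DEATH_EVENT"])

# One table replaces the four print statements;
# margins=True adds row and column totals labelled "All"
table = pd.crosstab(
    toy["sex"].map({0: "Female", 1: "Male"}),
    toy["DEATH_EVENT"],
    margins=True,
)
print(table)
```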


In [10]:
# Obtaining the value counts of patients with high blood pressure
hf_data['high_blood_pressure'].value_counts()

0    194
1    105
Name: high_blood_pressure, dtype: int64

In [11]:
# Obtaining the value counts of genders that have high blood pressure
print('Males without HBP = {}'.format(((male['high_blood_pressure'] == 0).sum())))
print('Males with HBP = {}'.format(((male['high_blood_pressure'] == 1).sum())))

print('Females without HBP = {}'.format(((female['high_blood_pressure'] == 0).sum())))
print('Females with HBP = {}'.format(((female['high_blood_pressure'] == 1).sum())))

Males without HBP = 133
Males with HBP = 61
Females without HBP = 61
Females with HBP = 44
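The cells In [11], In [14], In [17] and In [20] repeat the same pattern for different binary columns. A minimal sketch of factoring it into one helper; `counts_by_sex` is a hypothetical name, and the toy frame reproduces only the HBP-by-sex counts above:

```python
import pandas as pd


def counts_by_sex(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: cross-tabulate sex against any 0/1 column."""
    sex = df["sex"].map({0: "Female", 1: "Male"})
    return pd.crosstab(sex, df[column])


# Toy frame rebuilt from the printed counts; each row is (sex, high_blood_pressure)
rows = [(1, 0)] * 133 + [(1, 1)] * 61 + [(0, 0)] * 61 + [(0, 1)] * 44
toy = pd.DataFrame(rows, columns=["sex", "high_blood_pressure"])
result = counts_by_sex(toy, "high_blood_pressure")
print(result)
```

On the real data, `counts_by_sex(hf_data, 'diabetes')` and the equivalents for `anaemia` and `smoking` should reproduce the counts printed in those cells.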


In [12]:
# Obtaining the value counts of HBP patients that died and survived
no_hbp = hf_data[hf_data['high_blood_pressure'] == 0]
yes_hbp = hf_data[hf_data['high_blood_pressure'] == 1]

print('Survived non-HBP patients = {}'.format(((no_hbp['DEATH_EVENT'] == 0).sum())))
print('Survived HBP patients = {}'.format(((yes_hbp['DEATH_EVENT'] == 0).sum())))
print('Dead non-HBP patients = {}'.format(((no_hbp['DEATH_EVENT'] == 1).sum())))
print('Dead HBP patients = {}'.format(((yes_hbp['DEATH_EVENT'] == 1).sum())))

Survived non-HBP patients = 137
Survived HBP patients = 66
Dead non-HBP patients = 57
Dead HBP patients = 39
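Raw counts make groups of different sizes hard to compare; the death rate within each group is more telling. Because `DEATH_EVENT` is coded 0/1, its per-group mean is exactly that rate. A sketch on a toy frame rebuilt from the counts above:

```python
import pandas as pd

# Toy frame rebuilt from the printed counts; each row is (high_blood_pressure, DEATH_EVENT)
rows = [(0, 0)] * 137 + [(1, 0)] * 66 + [(0, 1)] * 57 + [(1, 1)] * 39
toy = pd.DataFrame(rows, columns=["high_blood_pressure", "DEATH_EVENT"])

# Mean of a 0/1 target per group = fraction of patients in that group who died
death_rate = toy.groupby("high_blood_pressure")["DEATH_EVENT"].mean()
print(death_rate.round(3))
```

With these counts the death rate is about 37% (39 of 105) for HBP patients versus about 29% (57 of 194) for the rest.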


In [13]:
# Obtaining the value counts of diabetic patients
hf_data['diabetes'].value_counts()

0    174
1    125
Name: diabetes, dtype: int64

In [14]:
# Obtaining the value counts of diabetic genders
print('Non-diabetic males = {}'.format(((male['diabetes'] == 0).sum())))
print('Diabetic males = {}'.format(((male['diabetes'] == 1).sum())))

print('Non-diabetic females = {}'.format(((female['diabetes'] == 0).sum())))
print('Diabetic females = {}'.format(((female['diabetes'] == 1).sum())))

Non-diabetic males = 124
Diabetic males = 70
Non-diabetic females = 50
Diabetic females = 55


In [15]:
# Obtaining the value counts of diabetic patients that died and survived
no_dia = hf_data[hf_data['diabetes'] == 0]
yes_dia = hf_data[hf_data['diabetes'] == 1]

print('Survived non-diabetic patients = {}'.format(((no_dia['DEATH_EVENT'] == 0).sum())))
print('Survived diabetic patients = {}'.format(((yes_dia['DEATH_EVENT'] == 0).sum())))
print('Dead non-diabetic patients = {}'.format(((no_dia['DEATH_EVENT'] == 1).sum())))
print('Dead diabetic patients = {}'.format(((yes_dia['DEATH_EVENT'] == 1).sum())))

Survived non-diabetic patients = 118
Survived diabetic patients = 85
Dead non-diabetic patients = 56
Dead diabetic patients = 40
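The diabetes comparison can be summarised as a risk ratio: the death rate among diabetic patients divided by the death rate among non-diabetic patients, RR = (40 / 125) / (56 / 174). A small sketch computing it from the counts above; `risk_ratio` is a hypothetical helper name:

```python
def risk_ratio(dead_exposed: int, total_exposed: int,
               dead_unexposed: int, total_unexposed: int) -> float:
    """Death rate in the exposed group divided by the death rate in the unexposed group."""
    return (dead_exposed / total_exposed) / (dead_unexposed / total_unexposed)


# Counts from the output above: 40 of 125 diabetic and 56 of 174 non-diabetic patients died
rr = risk_ratio(40, 40 + 85, 56, 56 + 118)
print(round(rr, 3))  # 0.994
```

A ratio this close to 1 suggests that, in this sample, diabetes on its own is not associated with a different death rate; it is a descriptive figure, not a significance test.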


In [16]:
# Obtaining the value counts of patients with anaemia
hf_data['anaemia'].value_counts()

0    170
1    129
Name: anaemia, dtype: int64

In [17]:
# Obtaining the value counts of genders with and without anaemia
print('Males without anaemia = {}'.format(((male['anaemia'] == 0).sum())))
print('Males with anaemia = {}'.format(((male['anaemia'] == 1).sum())))

print('Females without anaemia = {}'.format(((female['anaemia'] == 0).sum())))
print('Females with anaemia = {}'.format(((female['anaemia'] == 1).sum())))

Males without anaemia = 117
Males with anaemia = 77
Females without anaemia = 53
Females with anaemia = 52


In [18]:
# Obtaining the value counts of patients with anaemia that died and survived
no_anm = hf_data[hf_data['anaemia'] == 0]
yes_anm = hf_data[hf_data['anaemia'] == 1]

print('Survived patients without anaemia = {}'.format(((no_anm['DEATH_EVENT'] == 0).sum())))
print('Survived patients with anaemia = {}'.format(((yes_anm['DEATH_EVENT'] == 0).sum())))
print('Dead patients without anaemia = {}'.format(((no_anm['DEATH_EVENT'] == 1).sum())))
print('Dead patients with anaemia = {}'.format(((yes_anm['DEATH_EVENT'] == 1).sum())))

Survived patients without anaemia = 120
Survived patients with anaemia = 83
Dead patients without anaemia = 50
Dead patients with anaemia = 46
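For anaemia, `pd.crosstab` with `normalize="index"` turns the counts into the share of survivors and deaths within each group. A sketch on a toy frame rebuilt from the counts above:

```python
import pandas as pd

# Toy frame rebuilt from the printed counts; each row is (anaemia, DEATH_EVENT)
rows = [(0, 0)] * 120 + [(1, 0)] * 83 + [(0, 1)] * 50 + [(1, 1)] * 46
toy = pd.DataFrame(rows, columns=["anaemia", "DEATH_EVENT"])

# normalize="index" divides each row by its own total, so every row sums to 1
shares = pd.crosstab(toy["anaemia"], toy["DEATH_EVENT"], normalize="index")
print(shares.round(3))
```

Here about 36% of anaemic patients died (46 of 129) compared with about 29% of the others (50 of 170).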


In [19]:
# Obtaining the value counts of patients that smoke
hf_data['smoking'].value_counts()

0    203
1     96
Name: smoking, dtype: int64

In [20]:
# Obtaining the value counts of genders that smoke
print('Male non-smokers = {}'.format(((male['smoking'] == 0).sum())))
print('Male smokers = {}'.format(((male['smoking'] == 1).sum())))

print('Female non-smokers = {}'.format(((female['smoking'] == 0).sum())))
print('Female smokers = {}'.format(((female['smoking'] == 1).sum())))

Male non-smokers = 102
Male smokers = 92
Female non-smokers = 101
Female smokers = 4
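Smoking is heavily concentrated among men in this dataset (92 of the 96 smokers are male), so any smoking effect is entangled with sex. A sketch of the per-sex smoking rate on a toy frame that reproduces the sex-by-smoking counts above; because those four counts fix the full 2x2 table, the toy frame also reproduces the sex/smoking correlation:

```python
import pandas as pd

# Toy frame rebuilt from the printed counts; each row is (sex, smoking), 1 = male
rows = [(1, 0)] * 102 + [(1, 1)] * 92 + [(0, 0)] * 101 + [(0, 1)] * 4
toy = pd.DataFrame(rows, columns=["sex", "smoking"])

# Mean of a 0/1 column per group = share of smokers in that group
smoking_rate = toy.groupby("sex")["smoking"].mean()
print(smoking_rate.round(3))

# Pearson correlation of two 0/1 columns (the phi coefficient)
print(round(toy["sex"].corr(toy["smoking"]), 3))
```

The correlation printed by the last line matches the 0.445892 `sex`/`smoking` entry in the correlation table.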


In [21]:
# Obtaining the value counts of smokers that died and survived
no_smk = hf_data[hf_data['smoking'] == 0]
yes_smk = hf_data[hf_data['smoking'] == 1]

print('Survived non-smokers = {}'.format(((no_smk['DEATH_EVENT'] == 0).sum())))
print('Survived smokers = {}'.format(((yes_smk['DEATH_EVENT'] == 0).sum())))
print('Dead non-smokers = {}'.format(((no_smk['DEATH_EVENT'] == 1).sum())))
print('Dead smokers = {}'.format(((yes_smk['DEATH_EVENT'] == 1).sum())))

Survived non-smokers = 137
Survived smokers = 66
Dead non-smokers = 66
Dead smokers = 30


In [22]:
# Obtaining the dataset correlations
hf_data.corr()

|   | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| age | 1.0 | 0.088006 | -0.081584 | -0.101012 | 0.060098 | 0.093289 | -0.052354 | 0.159187 | -0.045966 | 0.06543 | 0.018668 | -0.224068 | 0.253729 |
| anaemia | 0.088006 | 1.0 | -0.190741 | -0.012729 | 0.031557 | 0.038182 | -0.043786 | 0.052174 | 0.041882 | -0.094769 | -0.10729 | -0.141414 | 0.06627 |
| creatinine_phosphokinase | -0.081584 | -0.190741 | 1.0 | -0.009639 | -0.04408 | -0.07059 | 0.024463 | -0.016408 | 0.05955 | 0.079791 | 0.002421 | -0.009346 | 0.062728 |
| diabetes | -0.101012 | -0.012729 | -0.009639 | 1.0 | -0.00485 | -0.012732 | 0.092193 | -0.046975 | -0.089551 | -0.15773 | -0.147173 | 0.033726 | -0.001943 |
| ejection_fraction | 0.060098 | 0.031557 | -0.04408 | -0.00485 | 1.0 | 0.024445 | 0.072177 | -0.011302 | 0.175902 | -0.148386 | -0.067315 | 0.041729 | -0.268603 |
| high_blood_pressure | 0.093289 | 0.038182 | -0.07059 | -0.012732 | 0.024445 | 1.0 | 0.049963 | -0.004935 | 0.037109 | -0.104615 | -0.055711 | -0.196439 | 0.079351 |
| platelets | -0.052354 | -0.043786 | 0.024463 | 0.092193 | 0.072177 | 0.049963 | 1.0 | -0.041198 | 0.062125 | -0.12512 | 0.028234 | 0.010514 | -0.049139 |
| serum_creatinine | 0.159187 | 0.052174 | -0.016408 | -0.046975 | -0.011302 | -0.004935 | -0.041198 | 1.0 | -0.189095 | 0.00697 | -0.027414 | -0.149315 | 0.294278 |
| serum_sodium | -0.045966 | 0.041882 | 0.05955 | -0.089551 | 0.175902 | 0.037109 | 0.062125 | -0.189095 | 1.0 | -0.027566 | 0.004813 | 0.08764 | -0.195204 |
| sex | 0.06543 | -0.094769 | 0.079791 | -0.15773 | -0.148386 | -0.104615 | -0.12512 | 0.00697 | -0.027566 | 1.0 | 0.445892 | -0.015608 | -0.004316 |
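The `DEATH_EVENT` column of the correlation matrix is the one that matters most for prediction. A sketch that ranks features by the magnitude of their correlation with the target while keeping the sign; the values are copied from the table above (the `smoking` and `time` rows are cut off there, so they are missing here). On the full frame the same Series would come from `hf_data.corr()['DEATH_EVENT'].drop('DEATH_EVENT')`.

```python
import pandas as pd

# DEATH_EVENT column copied from the correlation table above (smoking and time not shown there)
corr_with_target = pd.Series({
    "age": 0.253729,
    "anaemia": 0.06627,
    "creatinine_phosphokinase": 0.062728,
    "diabetes": -0.001943,
    "ejection_fraction": -0.268603,
    "high_blood_pressure": 0.079351,
    "platelets": -0.049139,
    "serum_creatinine": 0.294278,
    "serum_sodium": -0.195204,
    "sex": -0.004316,
})

# Sort by absolute value but keep the sign, so negative associations stay visible
ranked = corr_with_target.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

`serum_creatinine`, `ejection_fraction` and `age` come out on top; these are pairwise linear associations only.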
