## Data Import

In [1]:
import pandas as pd


In [8]:
heartFailure=pd.read_csv("heart_failure_clinical_records_dataset.csv")
heartFailure.head()

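|   | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |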


In [9]:
heartFailure.shape

(299, 13)
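`.shape` confirms 299 rows and 13 columns, but not whether any cells are empty or were parsed as text. A minimal sketch of that follow-up check, run on the five rows printed by `head()` above (re-typed so it runs without the CSV file, an assumption for illustration); on the full data the same two calls would be made on `heartFailure`.

```python
import pandas as pd

# The five rows shown by heartFailure.head(), re-typed so this sketch
# runs without the CSV file. On the real data, use heartFailure instead.
columns = ["age", "anaemia", "creatinine_phosphokinase", "diabetes",
           "ejection_fraction", "high_blood_pressure", "platelets",
           "serum_creatinine", "serum_sodium", "sex", "smoking", "time",
           "DEATH_EVENT"]
rows = [
    [75.0, 0, 582, 0, 20, 1, 265000.0, 1.9, 130, 1, 0, 4, 1],
    [55.0, 0, 7861, 0, 38, 0, 263358.03, 1.1, 136, 1, 0, 6, 1],
    [65.0, 0, 146, 0, 20, 0, 162000.0, 1.3, 129, 1, 1, 7, 1],
    [50.0, 1, 111, 0, 20, 0, 210000.0, 1.9, 137, 1, 0, 7, 1],
    [65.0, 1, 160, 1, 20, 0, 327000.0, 2.7, 116, 0, 0, 8, 1],
]
sample = pd.DataFrame(rows, columns=columns)

# Missing values per column; keep only the columns that have any.
missing = sample.isna().sum()
print(missing[missing > 0])

# Column dtypes: every column should be numeric, not object (text).
print(sample.dtypes)
```

An empty result from the first `print` means no column needs imputation.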

## Data Description

- ejection_fraction: ejection fraction (EF), the percentage of blood the left ventricle pumps out with each contraction.
- anaemia: decrease of red blood cells or hemoglobin (boolean).
- high_blood_pressure: whether the patient has hypertension (boolean).
- creatinine_phosphokinase: level of the enzyme creatine phosphokinase (CPK), found mainly in the heart, brain, and skeletal muscle. The normal CPK range is 39 to 308 U/L for males and 26 to 192 U/L for females.
- platelets: platelet count. Platelets are cell fragments that circulate in the blood and clump together when they recognize damaged blood vessels. A normal count ranges from 150,000 to 450,000 platelets per microliter of blood.
- serum_creatinine: level of creatinine in the blood. An increased level may be a sign of poor kidney function. The typical range is 0.74 to 1.35 mg/dL (65.4 to 119.3 micromoles/L) for adult men and 0.59 to 1.04 mg/dL (52.2 to 91.9 micromoles/L) for adult women.
- serum_sodium: level of sodium in the blood, used in assessing electrolyte, acid-base, and water balance, as well as renal function. The reference range is 135 to 147 mmol/L.
- sex: woman (0) or man (1) (binary).
- smoking: whether the patient smokes (boolean).
- time: follow-up period (days).
- DEATH_EVENT (target): whether the patient died during the follow-up period (boolean).
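The reference ranges above can be turned into a per-row flag table, which makes the later manual checks repeatable. A minimal sketch, assuming inclusive bounds and the 0 = woman, 1 = man coding; the names `SEX_SPECIFIC`, `SHARED` and `out_of_range` are hypothetical, not part of the original notebook, and the sample values are the five rows printed by `head()`.

```python
import pandas as pd

# Reference ranges quoted in the data description, as (low, high).
# Sex coding follows the description: 0 = woman, 1 = man.
SEX_SPECIFIC = {
    "creatinine_phosphokinase": {0: (26, 192), 1: (39, 308)},  # U/L
    "serum_creatinine": {0: (0.59, 1.04), 1: (0.74, 1.35)},    # mg/dL
}
SHARED = {
    "platelets": (150_000, 450_000),  # per microliter of blood
    "serum_sodium": (135, 147),       # mmol/L
}


def out_of_range(df: pd.DataFrame) -> pd.DataFrame:
    """Return a boolean frame: True where a value lies outside its range."""
    flags = {}
    for col, (low, high) in SHARED.items():
        # between() is inclusive on both ends by default
        flags[col] = ~df[col].between(low, high)
    for col, by_sex in SEX_SPECIFIC.items():
        # Look up each row's bounds from its sex code. A code other than
        # 0/1 maps to NaN, and comparisons with NaN are False, so such
        # rows are never flagged.
        low = df["sex"].map({s: lo for s, (lo, _) in by_sex.items()})
        high = df["sex"].map({s: hi for s, (_, hi) in by_sex.items()})
        flags[col] = (df[col] < low) | (df[col] > high)
    return pd.DataFrame(flags, index=df.index)


# The relevant columns of the five rows shown by head().
sample = pd.DataFrame({
    "sex": [1, 1, 1, 1, 0],
    "creatinine_phosphokinase": [582, 7861, 146, 111, 160],
    "serum_creatinine": [1.9, 1.1, 1.3, 1.9, 2.7],
    "platelets": [265000.0, 263358.03, 162000.0, 210000.0, 327000.0],
    "serum_sodium": [130, 136, 129, 137, 116],
})
flags = out_of_range(sample)
print(flags.sum())  # out-of-range rows per column
```

Flagging does not mean a value is invalid; as the checks below conclude for platelets, sick patients can legitimately fall outside these ranges.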

## Checking Data 

In [10]:
# Look for inappropriate (out-of-range) values
heartFailure.describe().loc[['count','max','min']]

# Every column has a count of 299, equal to the number of rows, so no values are missing.
# anaemia, diabetes, high_blood_pressure, sex, smoking and DEATH_EVENT range from 0 to 1,
# consistent with boolean/binary flags.
# age, ejection_fraction, serum_creatinine, serum_sodium and time also fall within plausible ranges.
# creatinine_phosphokinase and platelets need a closer look.

|   | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 |
| max | 95.0 | 1.0 | 7861.0 | 1.0 | 80.0 | 1.0 | 850000.0 | 9.4 | 148.0 | 1.0 | 1.0 | 285.0 | 1.0 |
| min | 40.0 | 0.0 | 23.0 | 0.0 | 14.0 | 0.0 | 25100.0 | 0.5 | 113.0 | 0.0 | 0.0 | 4.0 | 0.0 |
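A min of 0 and a max of 1 is consistent with a binary flag, but it does not rule out an in-between value such as 0.5. A minimal sketch of a stricter check using `DataFrame.isin`; the helper name `non_binary_columns` is hypothetical, and the sample values are the binary columns of the five rows printed by `head()`.

```python
import pandas as pd

# Columns the notebook treats as boolean/binary flags.
BINARY_COLUMNS = ["anaemia", "diabetes", "high_blood_pressure",
                  "sex", "smoking", "DEATH_EVENT"]


def non_binary_columns(df: pd.DataFrame, columns=BINARY_COLUMNS) -> list:
    """Return the columns that contain anything other than 0 or 1."""
    ok = df[columns].isin([0, 1]).all()  # one True/False per column
    return ok[~ok].index.tolist()


# Binary columns of the five rows shown by head().
sample = pd.DataFrame({
    "anaemia": [0, 0, 0, 1, 1],
    "diabetes": [0, 0, 0, 0, 1],
    "high_blood_pressure": [1, 0, 0, 0, 0],
    "sex": [1, 1, 1, 1, 0],
    "smoking": [0, 0, 1, 0, 0],
    "DEATH_EVENT": [1, 1, 1, 1, 1],
})
print(non_binary_columns(sample))  # [] when every flag is 0 or 1

# A deliberately corrupted copy to show what a failure looks like.
bad = sample.copy()
bad.loc[0, "smoking"] = 2
print(non_binary_columns(bad))
```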


In [11]:
# creatinine_phosphokinase: normal range 10 to 120 micrograms per liter (mcg/L) [Ref:].
# In U/L the normal range is 39 to 308 (men) or 26 to 192 (women), so check values above 310.
heartFailure['creatinine_phosphokinase'][heartFailure['creatinine_phosphokinase']>310].value_counts()

582     47
835      2
1021     1
2656     1
335      1
        ..
675      1
936      1
427      1
943      1
514      1
Name: creatinine_phosphokinase, Length: 89, dtype: int64

In [12]:
# Check values below 10, the lower bound of the mcg/L range
heartFailure['creatinine_phosphokinase'][heartFailure['creatinine_phosphokinase']<10].value_counts()

Series([], Name: creatinine_phosphokinase, dtype: int64)
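Note that `value_counts()` lists each distinct value with its frequency, so `Length: 89` above is the number of distinct values over 310, not the number of rows. A minimal sketch that counts rows directly by summing boolean masks, using the same thresholds as the cells above; `count_outside` is a hypothetical helper and the sample is the `creatinine_phosphokinase` column of the five rows shown by `head()`.

```python
import pandas as pd


def count_outside(s: pd.Series, low: float, high: float) -> dict:
    """Count rows strictly below `low` and strictly above `high`."""
    # Summing a boolean Series counts its True values.
    return {
        "below": int((s < low).sum()),
        "above": int((s > high).sum()),
    }


cpk = pd.Series([582, 7861, 146, 111, 160], name="creatinine_phosphokinase")
counts = count_outside(cpk, low=10, high=310)
print(counts)
```

On the full data, `count_outside(heartFailure['creatinine_phosphokinase'], 10, 310)` gives the same totals as calling `.sum()` on the two `value_counts()` results above.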

In [13]:
heartFailure['platelets'][heartFailure['platelets']<150000].value_counts()
# These values are plausible, since sick patients can have a low platelet count, so they are kept

149000.0    3
147000.0    2
133000.0    2
140000.0    2
127000.0    2
122000.0    1
119000.0    1
47000.0     1
132000.0    1
51000.0     1
73000.0     1
141000.0    1
105000.0    1
25100.0     1
126000.0    1
130000.0    1
70000.0     1
62000.0     1
87000.0     1
136000.0    1
75000.0     1
Name: platelets, dtype: int64

In [14]:
heartFailure['platelets'][heartFailure['platelets']>450000].value_counts()
# These values are plausible, since sick patients can have a high platelet count, so they are kept

451000.0    2
507000.0    1
742000.0    1
497000.0    1
533000.0    1
621000.0    1
850000.0    1
543000.0    1
461000.0    1
504000.0    1
481000.0    1
454000.0    1
Name: platelets, dtype: int64
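The two platelet filters above can be combined into a single labelled column, which is convenient for later grouping (for example, by DEATH_EVENT). A minimal sketch with `numpy.select`, using the same boundaries as the cells above (below 150,000 is low, above 450,000 is high, the bounds themselves count as normal); `platelet_band` is a hypothetical helper. The sample holds the dataset's min (25,100) and max (850,000) from the `describe()` output, one value from `head()`, and the two boundary values.

```python
import numpy as np
import pandas as pd


def platelet_band(s: pd.Series, low: float = 150_000,
                  high: float = 450_000) -> pd.Series:
    """Label each count 'low', 'normal' or 'high'."""
    # Conditions are checked in order; rows matching none get the default.
    labels = np.select([s < low, s > high], ["low", "high"], default="normal")
    return pd.Series(labels, index=s.index, name="platelet_band")


platelets = pd.Series([25100.0, 265000.0, 850000.0, 450000.0, 150000.0])
bands = platelet_band(platelets)
print(bands.value_counts())
```

`numpy.select` is used instead of `pandas.cut` here because `cut` makes each bin closed on only one side, which would push one of the two boundary values into a non-normal band.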