Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide. Heart failure is a common event caused by CVDs, most of which can be prevented by addressing behavioral risk factors such as tobacco use, unhealthy diet, physical inactivity and harmful alcohol use.
The original data set is from Davide Chicco & Giuseppe Jurman, published by BioMed Central, where it can be accessed.
Libraries and modules used:
- pandas
- math
- NumPy
- SciPy
- Matplotlib
- seaborn
The data set is divided into 2 main categories:
- Patients who died
- Patients who survived
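The split into these two groups can be sketched with pandas boolean masks. This is a minimal illustration, not the exact code used here: the column name `DEATH_EVENT`, its 1/0 encoding, and the six rows of sample values are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical miniature stand-in for the heart-failure table. The values are
# made up, and DEATH_EVENT (1 = died, 0 = survived) is an assumed column name.
df = pd.DataFrame({
    "platelets": [265000.0, 263000.0, 162000.0, 210000.0, 327000.0, 204000.0],
    "DEATH_EVENT": [1, 1, 1, 0, 0, 0],
})

# Boolean masks split the rows into the two groups used in the analysis.
death_data = df[df["DEATH_EVENT"] == 1]
no_death_data = df[df["DEATH_EVENT"] == 0]

print(len(death_data), len(no_death_data))
```

Every row lands in exactly one of the two frames, so the group sizes add up to the size of the full table.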
H0: There is no significant difference in mean platelet count between patients who died and those who survived.
Ha: There is a significant difference in mean platelet count between patients who died and those who survived.
stats.ttest_ind(death_data['platelets'], no_death_data['platelets'])
Ttest_indResult(statistic=-0.8478681784251544, pvalue=0.3971941540413678)
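To show how this result is read, the sketch below runs the same `stats.ttest_ind` call on synthetic data and applies a decision rule to the p-value. The significance level of 0.05, the helper `decide`, and the synthetic group sizes, means and spreads are all assumptions for illustration. Only the final p-value is taken from the output above.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # assumed significance level


def decide(pvalue, alpha=ALPHA):
    """Hypothetical helper: turn a p-value into a decision about H0."""
    return "reject H0" if pvalue < alpha else "fail to reject H0"


# Synthetic platelet counts standing in for the real columns (made-up
# sizes, means and spreads; not the values from the data set).
rng = np.random.default_rng(0)
death_platelets = rng.normal(loc=255000, scale=95000, size=100)
no_death_platelets = rng.normal(loc=265000, scale=95000, size=200)

# Same call as in the analysis: a two-sided, pooled-variance t-test.
result = stats.ttest_ind(death_platelets, no_death_platelets)
print(result.statistic, result.pvalue, decide(result.pvalue))

# Applying the rule to the p-value reported above.
print(decide(0.3971941540413678))  # fail to reject H0
```

With a p-value of about 0.397, well above 0.05, the null hypothesis is not rejected.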
The box plot shows that 'platelets' appears to be roughly normally distributed in both the death_data and no_death_data groups. Together with the t-test p-value of about 0.397, this suggests that platelet count contributes little to mortality from heart failure.
The 95% confidence interval (two-tailed) for the difference in means runs from -13566.10 to 34118.99. Because this interval includes zero, it is consistent with no significant difference in platelets between patients who survived and those who died, so we fail to reject H0.
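One way such an interval could be computed is sketched below, using the pooled-variance standard error that matches the default `stats.ttest_ind` call. The source does not show how the interval above was obtained, so the function `mean_diff_ci`, the pooled-variance choice, and the toy inputs are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def mean_diff_ci(a, b, confidence=0.95):
    """Hypothetical helper: two-sided CI for mean(a) - mean(b), pooled variance."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = a.size, b.size
    dof = n_a + n_b - 2

    # Pooled sample variance, as assumed by the equal-variance t-test.
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / dof
    se = np.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))

    # Critical t value for a two-sided interval at the requested confidence.
    t_crit = stats.t.ppf(0.5 + confidence / 2.0, dof)
    diff = a.mean() - b.mean()
    return diff - t_crit * se, diff + t_crit * se


# Toy inputs (made up), just to exercise the function.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 3.0, 4.0, 5.0, 6.0]
low, high = mean_diff_ci(a, b)
includes_zero = low <= 0.0 <= high
print(low, high, includes_zero)
```

An interval that includes zero corresponds to a two-sided t-test p-value above 1 minus the confidence level, which is why the interval and the t-test lead to the same conclusion.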