# About the Dataset

The dataset provided for this project focuses on heart failure, a serious medical condition in which the heart cannot pump blood effectively, leading to inadequate circulation throughout the body. Here is an overview of the dataset:

# Title: Heart Failure Clinical Records Dataset

**Source:** The dataset was collected at the Faisalabad Institute of Cardiology and the Allied Hospital in Faisalabad (Punjab, Pakistan) between April and December 2015. It was later made available on Kaggle.

**Description:** The dataset contains the medical records of 299 patients with heart failure, covering clinical, body, and lifestyle information. It has 13 columns: the 12 features listed below, which describe each patient's health status and potential risk factors, plus the `DEATH_EVENT` outcome.
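To check that a loaded copy of the file matches this description, one option is to compare its row count and column names with the expected schema. A minimal sketch follows; `EXPECTED_COLUMNS` and `check_schema` are hypothetical helpers, the column names are the ones used by the cells of this notebook, and the demo runs on a tiny made-up frame rather than the real CSV.

```python
import pandas as pd

# Column names as used throughout this notebook (12 features + the DEATH_EVENT outcome)
EXPECTED_COLUMNS = [
    'age', 'anaemia', 'creatinine_phosphokinase', 'diabetes',
    'ejection_fraction', 'high_blood_pressure', 'platelets',
    'serum_creatinine', 'serum_sodium', 'sex', 'smoking', 'time',
    'DEATH_EVENT',
]


def check_schema(frame: pd.DataFrame, expected_rows: int = 299) -> dict:
    """Hypothetical helper: report whether the row count matches and list missing/unexpected columns."""
    return {
        'rows_ok': len(frame) == expected_rows,
        'missing': [c for c in EXPECTED_COLUMNS if c not in frame.columns],
        'unexpected': [c for c in frame.columns if c not in EXPECTED_COLUMNS],
    }


# Toy frame with two made-up rows, only to demonstrate the output format
toy = pd.DataFrame([[0] * 13, [1] * 13], columns=EXPECTED_COLUMNS)
report = check_schema(toy, expected_rows=2)
print(report)  # {'rows_ok': True, 'missing': [], 'unexpected': []}
```

In the notebook, `check_schema(df)` would use the default of 299 rows taken from the description above.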

Features:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from scipy import stats

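In [None]:
# Load the dataset first so that every later cell can use df.
# The path assumes Google Drive is mounted in Google Colab.
df = pd.read_csv('/content/drive/MyDrive/heart_failure_clinical_records_dataset.csv')
df

1. Age: The age of the patient in years (numeric).

In [None]:
df['age']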

2. Anaemia: Indicates whether the patient has anaemia (binary: 0 for no, 1 for yes).

In [None]:
anaemia = df['anaemia']
anaemia
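Anaemia is one of five binary (0/1) features in this list, together with diabetes, high blood pressure, sex, and smoking. Because the mean of a 0/1 column equals the share of 1s, a single `mean()` call summarizes all of them. A minimal sketch, assuming the column names used in this notebook; `binary_percentages` is a hypothetical helper and the toy values are made up (in the notebook, pass `df`).

```python
import pandas as pd

# The five 0/1 feature columns described in the feature list
BINARY_COLUMNS = ['anaemia', 'diabetes', 'high_blood_pressure', 'sex', 'smoking']


def binary_percentages(frame: pd.DataFrame) -> pd.Series:
    """Percentage of rows equal to 1 in each 0/1 column (mean of a 0/1 column = share of 1s)."""
    return frame[BINARY_COLUMNS].mean() * 100


# Toy values, not taken from the dataset
toy = pd.DataFrame({
    'anaemia': [1, 0, 0, 1],
    'diabetes': [0, 0, 0, 1],
    'high_blood_pressure': [1, 1, 0, 0],
    'sex': [1, 1, 1, 0],
    'smoking': [0, 0, 0, 0],
})
print(binary_percentages(toy).round(1))
```

For `sex`, the percentage of 1s is the percentage of male patients, given the coding 0 = female, 1 = male.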

3. Creatinine Phosphokinase (CPK): Level of the CPK enzyme in the blood, in mcg/L (numeric).

In [None]:
cpk_levels = df['creatinine_phosphokinase']

print(cpk_levels)
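Enzyme levels such as CPK are often right-skewed, with a few very high readings. One way to check this, assuming all values are positive, is to compare the skewness of the raw values with that of their base-10 logarithm using `scipy.stats.skew`. The helper name `skew_raw_vs_log` and the toy readings are made up for illustration; in the notebook, pass `cpk_levels`.

```python
import numpy as np
import pandas as pd
from scipy import stats


def skew_raw_vs_log(values: pd.Series) -> tuple[float, float]:
    """Return (skewness of raw values, skewness of log10 values); assumes all values > 0."""
    raw = np.asarray(values, dtype=float)
    return float(stats.skew(raw)), float(stats.skew(np.log10(raw)))


# Made-up toy readings spanning several orders of magnitude (not from the dataset)
toy_cpk = pd.Series([20, 50, 100, 200, 500, 1000, 3000, 8000])
raw_skew, log_skew = skew_raw_vs_log(toy_cpk)
print(f'raw skew: {raw_skew:.2f}, log10 skew: {log_skew:.2f}')
```

If the log-scale skewness is much closer to 0, plotting CPK on a log axis (for example `plt.xscale('log')`) usually makes its histogram easier to read.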

4. Diabetes: Indicates whether the patient has diabetes (binary: 0 for no, 1 for yes).

In [None]:
diabetes = df['diabetes']
diabetes

5. Ejection Fraction: Percentage of blood leaving the heart at each contraction (numeric).

In [None]:
Ejection_Fraction = df['ejection_fraction']
Ejection_Fraction

6. High Blood Pressure: Indicates whether the patient has high blood pressure (binary: 0 for no, 1 for yes).

In [None]:
High_Blood_Pressure = df['high_blood_pressure']
High_Blood_Pressure

7. Platelets: Platelet count in the blood, in kiloplatelets/mL (numeric).

In [None]:
Platelets = df['platelets']
Platelets

8. Serum Creatinine: Level of creatinine in the blood, in mg/dL (numeric).

In [None]:
Serum_Creatinine = df['serum_creatinine']
Serum_Creatinine

9. Serum Sodium: Level of sodium in the blood, in mEq/L (numeric).

In [None]:
Serum_Sodium = df['serum_sodium']
Serum_Sodium

10. Sex: Gender of the patient (binary: 0 for female, 1 for male).

In [None]:
Sex = df['sex']
Sex

11. Smoking: Indicates whether the patient smokes (binary: 0 for no, 1 for yes).

In [None]:
Smoking = df['smoking']
Smoking

12. Time: Follow-up period for the patient, in days (numeric).

In [None]:
Time = df['time']
Time

# Death Event:
*Indicates whether the patient died during the follow-up period (binary: 0 for no, 1 for yes).*

1. What is the distribution of age among heart failure patients in the dataset?

In [None]:
ages = df['age']

plt.figure(figsize=(10, 6))
sns.histplot(ages, bins=30, kde=True)
plt.title('Age Distribution of Heart Failure Patients')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

2. How does the death rate vary with age?

In [None]:
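# The mean of the 0/1 DEATH_EVENT column at each exact age is the death rate at that age.
# Many ages have only a few patients, so this curve is noisy.
death_rates = df.groupby("age")["DEATH_EVENT"].mean()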

plt.figure(figsize=(8, 6))
plt.plot(death_rates.index, death_rates.values)
plt.xlabel("Age")
plt.ylabel("Death Rate (Mean)")
plt.title("Death Rate Variation with Age in Heart Failure Patients")
plt.grid(True)
plt.show()
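Because the per-age curve is noisy, a single-number summary can complement it. One option is the point-biserial correlation between the binary `DEATH_EVENT` and the continuous `age`, available as `scipy.stats.pointbiserialr` (it equals Pearson's r computed with a 0/1 variable). A minimal sketch; `age_death_correlation` is a hypothetical helper and the toy rows are made up (in the notebook, pass `df`).

```python
import pandas as pd
from scipy import stats


def age_death_correlation(frame: pd.DataFrame) -> tuple[float, float]:
    """Point-biserial r between DEATH_EVENT (0/1) and age, with its two-sided p-value."""
    r, p = stats.pointbiserialr(frame['DEATH_EVENT'], frame['age'])
    return float(r), float(p)


# Made-up toy rows in which older patients die more often (not from the dataset)
toy = pd.DataFrame({
    'age': [45, 50, 55, 60, 65, 70, 75, 80, 85, 90],
    'DEATH_EVENT': [0, 0, 0, 0, 1, 0, 1, 1, 1, 1],
})
r, p = age_death_correlation(toy)
print(f'r = {r:.2f}, p = {p:.3f}')
```

A positive r means that, on average, patients who died were older; it measures only a linear association, not a cause.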


3. What is the percentage of male and female patients in the dataset?

In [None]:
# Per the feature description above, sex is coded 0 = female, 1 = male
male_count = len(df[df["sex"] == 1])
female_count = len(df[df["sex"] == 0])

total_patients = len(df)

male_percentage = (male_count / total_patients) * 100
female_percentage = (female_count / total_patients) * 100
print(f"Male: {male_percentage:.2f}%")
print(f"Female: {female_percentage:.2f}%")
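The same percentages can be computed in one step with `value_counts(normalize=True)`, after mapping the codes to labels (0 = female, 1 = male, per the feature list). A minimal sketch; `sex_percentages` is a hypothetical helper and the toy column is made up (in the notebook, pass `df`).

```python
import pandas as pd


def sex_percentages(frame: pd.DataFrame) -> pd.Series:
    """Percentage of patients per sex, using the coding 0 = female, 1 = male."""
    labels = frame['sex'].map({0: 'Female', 1: 'Male'})
    return labels.value_counts(normalize=True) * 100


# Made-up toy column: three men and one woman
toy = pd.DataFrame({'sex': [1, 1, 0, 1]})
print(sex_percentages(toy))
```

Mapping to labels before counting makes the output self-describing, which avoids mixing up which code means which sex.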

4. How does the platelet count vary among different age groups?

In [None]:
# The last edge is np.inf so that any patient older than 90 is not dropped by pd.cut
age_bins = [0, 30, 40, 50, 60, 70, 80, 90, np.inf]

df['age_group'] = pd.cut(df['age'], bins=age_bins)

# observed=False keeps empty age groups (their mean is NaN)
platelet_means = df.groupby('age_group', observed=False)['platelets'].mean()

plt.figure(figsize=(10, 6))
sns.barplot(x=platelet_means.index.astype(str), y=platelet_means.values)
plt.title('Mean Platelet Count by Age Group')
plt.xlabel('Age Group')
plt.ylabel('Mean Platelet Count')
plt.xticks(rotation=45)
plt.show()
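Means from small groups are unstable, so it helps to report how many patients fall in each age bin next to the mean. A minimal sketch using named aggregations in `groupby(...).agg`, with bins that mirror the cell above; `platelets_by_age_group` is a hypothetical helper and the toy rows are made up (in the notebook, pass `df`).

```python
import numpy as np
import pandas as pd


def platelets_by_age_group(frame: pd.DataFrame) -> pd.DataFrame:
    """Patient count and mean platelet count per age bin (empty bins get a count of 0)."""
    bins = [0, 30, 40, 50, 60, 70, 80, 90, np.inf]
    groups = pd.cut(frame['age'], bins=bins)
    return (
        frame.groupby(groups, observed=False)['platelets']
        .agg(n_patients='count', mean_platelets='mean')
    )


# Made-up toy rows (not from the dataset)
toy = pd.DataFrame({
    'age': [42, 48, 55, 63, 67, 95],
    'platelets': [250000, 270000, 230000, 260000, 240000, 210000],
})
print(platelets_by_age_group(toy))
```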

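5. Is there a correlation between creatinine phosphokinase (CPK) and serum sodium levels in the blood?

In [None]:
# Pearson correlation between CPK (column creatinine_phosphokinase) and serum sodium
correlation = df['creatinine_phosphokinase'].corr(df['serum_sodium'])

plt.figure(figsize=(8, 6))
sns.scatterplot(x='creatinine_phosphokinase', y='serum_sodium', data=df)
plt.title('Correlation Between CPK and Serum Sodium Levels')
plt.xlabel('Creatinine Phosphokinase (mcg/L)')
plt.ylabel('Serum Sodium (mEq/L)')
# Place the annotation in axes coordinates so it stays inside the plot whatever the data range
plt.text(0.05, 0.05, f'Correlation coefficient: {correlation:.2f}', transform=plt.gca().transAxes, fontsize=12, bbox=dict(facecolor='white', alpha=0.5))
plt.show()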
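Pearson's r is sensitive to outliers and skewed variables such as CPK. A rank-based alternative is Spearman's rho via `scipy.stats.spearmanr`. Since "creatinine" could also refer to serum creatinine (feature 8), the sketch below computes rho for both columns against serum sodium. `spearman_with_sodium` is a hypothetical helper and the toy rows are made up (in the notebook, pass `df`).

```python
import pandas as pd
from scipy import stats


def spearman_with_sodium(frame: pd.DataFrame) -> dict:
    """Spearman rho and p-value of CPK and of serum creatinine against serum sodium."""
    results = {}
    for column in ['creatinine_phosphokinase', 'serum_creatinine']:
        rho, p = stats.spearmanr(frame[column], frame['serum_sodium'])
        results[column] = (float(rho), float(p))
    return results


# Made-up toy rows: serum creatinine mostly falls as sodium rises; CPK shows almost no trend
toy = pd.DataFrame({
    'creatinine_phosphokinase': [60, 3000, 400, 80, 150, 120],
    'serum_creatinine': [2.5, 1.6, 2.0, 1.2, 1.0, 0.8],
    'serum_sodium': [128, 131, 134, 137, 139, 142],
})
for column, (rho, p) in spearman_with_sodium(toy).items():
    print(f'{column}: rho = {rho:.2f}, p = {p:.3f}')
```

Spearman's rho only assumes a monotonic relationship, so a few extreme CPK readings affect it far less than they affect Pearson's r.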

6. How does the prevalence of high blood pressure differ between male and female patients?

In [None]:
prevalence = df.groupby('sex')['high_blood_pressure'].mean() * 100

plt.figure(figsize=(8, 6))
sns.barplot(x=prevalence.index, y=prevalence.values)
plt.title('Prevalence of High Blood Pressure by Gender')
plt.xlabel('Gender')
plt.ylabel('Prevalence (%)')
plt.xticks([0, 1], ['Female', 'Male'])
plt.show()

print(f'Female: {prevalence[0]:.2f}%')
print(f'Male: {prevalence[1]:.2f}%')
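The bar chart shows a difference in proportions; a chi-square test of independence on the sex by high-blood-pressure contingency table (built with `pd.crosstab` and tested with `scipy.stats.chi2_contingency`) indicates whether the difference is larger than chance alone would explain. A minimal sketch; `chi_square_independence` is a hypothetical helper and the toy rows are made up (in the notebook, pass `df`). For 2x2 tables SciPy applies Yates' continuity correction by default; when expected counts are small (below about 5), Fisher's exact test (`scipy.stats.fisher_exact`) is the usual alternative.

```python
import pandas as pd
from scipy import stats


def chi_square_independence(frame: pd.DataFrame, a: str, b: str) -> tuple[float, float]:
    """Chi-square test of independence between two categorical columns; returns (chi2, p)."""
    table = pd.crosstab(frame[a], frame[b])
    chi2, p, dof, expected = stats.chi2_contingency(table)
    return float(chi2), float(p)


# Made-up toy rows: 20 patients per sex
toy = pd.DataFrame({
    'sex': [0] * 20 + [1] * 20,
    'high_blood_pressure': [1] * 12 + [0] * 8 + [1] * 6 + [0] * 14,
})
chi2, p = chi_square_independence(toy, 'sex', 'high_blood_pressure')
print(f'chi2 = {chi2:.2f}, p = {p:.3f}')
```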

7. What is the relationship between smoking habits and death events among heart failure patients?

In [None]:
# Mean of the 0/1 DEATH_EVENT column per group = death rate; x100 gives a percentage
death_rate = df.groupby('smoking')['DEATH_EVENT'].mean() * 100

plt.figure(figsize=(8, 6))
sns.barplot(x=death_rate.index, y=death_rate.values)
plt.title('Death Rate by Smoking Status')
plt.xlabel('Smoking Status')
plt.ylabel('Death Rate (%)')
plt.xticks([0, 1], ['Non-Smoker', 'Smoker'])
plt.show()

print(f'Non-Smoker: {death_rate[0]:.2f}%')
print(f'Smoker: {death_rate[1]:.2f}%')
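Besides the two percentages, the relative risk (death rate among smokers divided by death rate among non-smokers) summarizes the comparison in one number: values above 1 mean a higher observed death rate among smokers. A minimal sketch without a confidence interval; `relative_risk` is a hypothetical helper and the toy rows are made up (in the notebook, call `relative_risk(df, 'smoking')`).

```python
import pandas as pd


def relative_risk(frame: pd.DataFrame, exposure: str, outcome: str = 'DEATH_EVENT') -> float:
    """Risk of outcome == 1 when exposure == 1, divided by the risk when exposure == 0."""
    risk = frame.groupby(exposure)[outcome].mean()
    return float(risk[1] / risk[0])


# Made-up toy rows: 3 of 10 smokers died, 2 of 10 non-smokers died
toy = pd.DataFrame({
    'smoking': [1] * 10 + [0] * 10,
    'DEATH_EVENT': [1, 1, 1] + [0] * 7 + [1, 1] + [0] * 8,
})
print(round(relative_risk(toy, 'smoking'), 2))  # 1.5
```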


8. Are there any noticeable patterns in the distribution of death events across different age groups?

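In [None]:
# The last edge is np.inf so that any patient older than 90 is not dropped by pd.cut
age_bins = [0, 50, 60, 70, 80, 90, np.inf]

df['age_group'] = pd.cut(df['age'], bins=age_bins)

death_distribution = df.groupby('age_group', observed=False)['DEATH_EVENT'].mean() * 100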

plt.figure(figsize=(10, 6))
sns.barplot(x=death_distribution.index.astype(str), y=death_distribution.values)
plt.title('Death Event Distribution Across Age Groups')
plt.xlabel('Age Group')
plt.ylabel('Death Event Percentage (%)')
plt.xticks(rotation=45)
plt.show()
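Death percentages in sparsely populated bins rest on few patients, so an interval around each rate helps judge whether differences between bins are meaningful. The sketch below computes the 95% Wilson score interval (z = 1.96) per age group; the choice of the Wilson interval is an assumption made here, not something the notebook specifies. `wilson_interval` and `death_rate_intervals` are hypothetical helpers and the toy rows are made up (in the notebook, pass `df` and the bins above).

```python
import numpy as np
import pandas as pd


def wilson_interval(deaths: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion deaths / n (requires n > 0)."""
    p = deaths / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clip to [0, 1] to absorb floating-point rounding at p = 0 or p = 1
    return max(0.0, float(center - half)), min(1.0, float(center + half))


def death_rate_intervals(frame: pd.DataFrame, bins: list) -> pd.DataFrame:
    """Per age bin: patient count, deaths, death rate, and Wilson 95% bounds (empty bins dropped)."""
    groups = pd.cut(frame['age'], bins=bins)
    summary = frame.groupby(groups, observed=True)['DEATH_EVENT'].agg(n='count', deaths='sum')
    summary['rate'] = summary['deaths'] / summary['n']
    bounds = [wilson_interval(int(d), int(n)) for d, n in zip(summary['deaths'], summary['n'])]
    summary['lower'] = [lo for lo, _ in bounds]
    summary['upper'] = [hi for _, hi in bounds]
    return summary


# Made-up toy rows (not from the dataset)
toy = pd.DataFrame({
    'age': [45, 48, 55, 58, 62, 66, 72, 75, 85, 95],
    'DEATH_EVENT': [0, 0, 0, 1, 0, 1, 1, 0, 1, 1],
})
print(death_rate_intervals(toy, [0, 50, 60, 70, 80, 90, np.inf]).round(3))
```

Wide intervals that overlap across bins suggest that apparent differences between age groups may be due to small sample sizes.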



9. Is there any significant difference in ejection fraction between patients with and without diabetes?

In [None]:
ejection_no_diabetes = df[df['diabetes'] == 0]['ejection_fraction']
ejection_diabetes = df[df['diabetes'] == 1]['ejection_fraction']

# equal_var=False runs Welch's t-test, which does not assume equal group variances
t_stat, p_value = stats.ttest_ind(ejection_no_diabetes, ejection_diabetes, equal_var=False)

print("Welch's t-test results:")
print(f'- t-statistic: {t_stat:.3f}')
print(f'- p-value: {p_value:.3f}')

alpha = 0.05
if p_value < alpha:
    print('The difference in ejection fraction between patients with and without diabetes is statistically significant at alpha = 0.05.')
else:
    print('No statistically significant difference in ejection fraction was detected between patients with and without diabetes at alpha = 0.05.')
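Welch's t-test compares group means and relies on the sample means being approximately normal. As a rank-based cross-check that does not assume normality, the Mann-Whitney U test (`scipy.stats.mannwhitneyu`) can be run on the same two groups. A minimal sketch; `mann_whitney_by_group` is a hypothetical helper and the toy ejection fractions are made up (in the notebook, call `mann_whitney_by_group(df, 'ejection_fraction', 'diabetes')`).

```python
import pandas as pd
from scipy import stats


def mann_whitney_by_group(frame: pd.DataFrame, value: str, group: str) -> tuple[float, float]:
    """Two-sided Mann-Whitney U test of `value` between group == 0 and group == 1."""
    a = frame.loc[frame[group] == 0, value]
    b = frame.loc[frame[group] == 1, value]
    u_stat, p = stats.mannwhitneyu(a, b, alternative='two-sided')
    return float(u_stat), float(p)


# Made-up toy ejection fractions: the two groups overlap heavily
toy = pd.DataFrame({
    'diabetes': [0] * 6 + [1] * 6,
    'ejection_fraction': [20, 30, 35, 38, 40, 60, 25, 30, 35, 38, 45, 50],
})
u_stat, p = mann_whitney_by_group(toy, 'ejection_fraction', 'diabetes')
print(f'U = {u_stat:.1f}, p = {p:.3f}')
```

If both tests lead to the same conclusion, the result is less likely to be an artifact of the normality assumption.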


10. How does the serum creatinine level vary between patients who survived and those who did not?

In [None]:
# Plot boxplots to compare serum creatinine distribution by survival status
plt.figure(figsize=(8, 6))
sns.boxplot(x='DEATH_EVENT', y='serum_creatinine', data=df)
plt.title('Serum Creatinine Levels by Survival Status')
plt.xlabel('Outcome (0: Survived, 1: Died)')
plt.ylabel('Serum Creatinine Level')
plt.show()
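The boxplot shows the median and quartiles visually; the same numbers can be tabulated with `groupby(...).quantile` so they can be quoted in the text. A minimal sketch; `creatinine_quartiles` is a hypothetical helper and the toy rows are made up (in the notebook, pass `df`).

```python
import pandas as pd


def creatinine_quartiles(frame: pd.DataFrame) -> pd.DataFrame:
    """25th, 50th and 75th percentiles of serum creatinine per DEATH_EVENT group."""
    return (
        frame.groupby('DEATH_EVENT')['serum_creatinine']
        .quantile([0.25, 0.5, 0.75])
        .unstack()
    )


# Made-up toy rows (not from the dataset)
toy = pd.DataFrame({
    'DEATH_EVENT': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    'serum_creatinine': [0.8, 0.9, 1.0, 1.1, 1.2, 1.0, 1.3, 1.5, 1.8, 2.4],
})
print(creatinine_quartiles(toy))
```

Rows are the `DEATH_EVENT` groups (0 = survived, 1 = died) and columns are the quantile levels.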