# Heart Failure Survival Analysis

by Merari Santana, Kevin Gao, Gurmehak Kaur, Yuhan Fan

## Summary

We built a classification model using logistic regression to predict survival outcomes for patients with heart failure. On the held-out test set, the final classifier achieves an accuracy of 81.7%. Its precision of 70.0% for the positive class (death) means that 7 in 10 patients flagged as high risk did in fact die, which keeps false alarms moderate. More importantly, its recall of 73.7% means the model identifies most high-risk patients, reducing the chance of missing a true positive case, an error that could have serious consequences. The F1-score of 0.72 reflects a reasonable balance between precision and recall.

From the confusion matrix, the model correctly identified 14 patients who passed away (true positives) and 35 patients who survived (true negatives). However, it also predicted 6 false positives, incorrectly classifying some survivors as deceased, and missed 5 actual cases of death (false negatives). While these errors warrant consideration, the model’s performance demonstrates strong predictive capabilities for both positive and negative outcomes.
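The reported metrics follow directly from these four counts. As a minimal sketch (the variable names are ours, not part of the analysis code), the arithmetic can be reproduced as:

```python
# Counts taken from the confusion matrix reported in this analysis.
tp, tn, fp, fn = 14, 35, 6, 5

total = tp + tn + fp + fn                 # number of test patients
accuracy = (tp + tn) / total              # share of correct predictions
precision = tp / (tp + fp)                # predicted deaths that were real
recall = tp / (tp + fn)                   # real deaths that were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1 Score:  {f1:.4f}")
```

These values agree with the scores printed in the Model Evaluation section.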

Overall, the logistic regression classifier effectively leverages patient test results to support survival prediction, providing a valuable tool to aid clinical decision-making in heart failure management.


## Introduction

Cardiovascular diseases are responsible for approximately 17 million deaths globally each year, with heart failure and myocardial infarction among the leading contributors. Electronic medical records from patients with heart failure, collected during follow-up care, provide a wealth of data on symptoms, test results, and clinical outcomes. Using this data, our team applies machine learning algorithms to predict patient survival after heart failure. Such models can surface patterns that are hard to spot in a traditional clinical assessment, and so can support medical decision-making and help improve patient outcomes.

## Data 

We analyzed a dataset containing the medical records of 299 heart failure patients. The patients consisted of 105 women and 194 men, aged between 40 and 95 years. The dataset contains 13 columns (12 predictors plus the target), which report clinical, body, and lifestyle information. The **death event** was used as the target variable in our binary classification study. It states whether the patient died or survived before the end of the follow-up period, which lasted 130 days on average. The dataset has a class imbalance: 203 patients (67.89%) survived (death event = 0) and 96 patients (32.11%) died (death event = 1).

| Column Name              | Description                                                            |
|--------------------------|------------------------------------------------------------------------|
| age                      | Patient's age (years)                                                  |
| anaemia                  | Decrease of red blood cells or hemoglobin (boolean)                    |
| creatinine_phosphokinase | Level of the CPK enzyme in the blood (mcg/L)                           |
| diabetes                 | If the patient has diabetes (boolean)                                  |
| ejection_fraction        | Percentage of blood leaving the heart at each contraction (%)          |
| high_blood_pressure      | If the patient has hypertension (boolean)                              |
| platelets                | Platelets in the blood (kiloplatelets/mL)                              |
| serum_creatinine         | Level of serum creatinine in the blood (mg/dL)                         |
| serum_sodium             | Level of serum sodium in the blood (mEq/L)                             |
| sex                      | Woman or man (binary)                                                  |
| smoking                  | If the patient smokes or not (boolean)                                 |
| time                     | Follow-up period (days)                                                |
| DEATH_EVENT              | Whether the patient died during the follow-up period (target variable) |


## Model

We compared three classifiers: Decision Tree, KNN, and Logistic Regression. We selected Logistic Regression for its interpretability (each coefficient gives the direction and strength of a feature's association with the outcome) and because it achieved the highest cross-validation accuracy of the three. As a regularized linear model, it is also less prone to overfitting than a Decision Tree or KNN when the dataset is small, as it is here.
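The interpretability of logistic regression comes from its form: the predicted probability is the sigmoid of a weighted sum of the features, so each coefficient shifts the log-odds of death by a fixed amount. The sketch below illustrates this with hypothetical weights and feature values; `predict_death_probability` is an illustrative helper, not a function from the analysis.

```python
import math

def predict_death_probability(features, weights, intercept):
    """Sigmoid of a weighted sum of (already scaled) feature values."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights and standardized feature values, for illustration only.
weights = [0.5, -0.8]
intercept = -0.2
patient = [1.0, -0.5]

p = predict_death_probability(patient, weights, intercept)
print(f"Predicted probability of death: {p:.3f}")
```

A positive weight raises the predicted risk as the feature increases, and a negative weight lowers it.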

## Results and Conclusion

The fitted coefficients indicate which features most influence the predicted risk of patient mortality. Because the preprocessing step emits the scaled numeric columns first and the encoded binary columns after them, the coefficients must be matched to features in that order. Read this way, `time`, `serum_creatinine`, and `ejection_fraction` carry the largest weights, which agrees with Chicco and Jurman (2020), who found serum creatinine and ejection fraction to be the most predictive features. Our model achieved a recall of 0.74 on the test set, which is a good start, but there is room for improvement, particularly in reducing the number of high-risk patients the model misses, i.e., raising recall by reducing false negatives.

The main challenges in this project stem from class imbalance and limited data availability. With more diverse and comprehensive datasets, performance could be further enhanced. We would also like to explore other machine learning models to improve the overall accuracy.

In conclusion, while the current model shows potential, there is significant opportunity to enhance its effectiveness. With improvements in data quality and model optimization, this tool could become a crucial asset in predicting patient risk and saving lives.

## EDA and Analysis

### Dataset and Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import altair as alt
import altair_ally as aly
import os
from vega_datasets import data
from sklearn import set_config
from sklearn.model_selection import (GridSearchCV, cross_validate, train_test_split,)
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score


# Enable Vegafusion for better data transformation
aly.alt.data_transformers.enable('vegafusion')
alt.data_transformers.enable('vegafusion')

DataTransformerRegistry.enable('vegafusion')

In [2]:
# Load the dataset
file_path = 'data/heart_failure_clinical_records_dataset.csv'
heart_failure_data = pd.read_csv(file_path)

# List of binary columns
binary_columns = ['anaemia', 'diabetes', 'high_blood_pressure', 'sex', 'smoking', 'DEATH_EVENT']

# Convert all binary columns to True/False
heart_failure_data[binary_columns] = heart_failure_data[binary_columns].astype(bool)

### EDA and Visualisations

In [3]:
heart_failure_data.shape

(299, 13)

In [4]:
heart_failure_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 299 entries, 0 to 298
Data columns (total 13 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   age                       299 non-null    float64
 1   anaemia                   299 non-null    bool   
 2   creatinine_phosphokinase  299 non-null    int64  
 3   diabetes                  299 non-null    bool   
 4   ejection_fraction         299 non-null    int64  
 5   high_blood_pressure       299 non-null    bool   
 6   platelets                 299 non-null    float64
 7   serum_creatinine          299 non-null    float64
 8   serum_sodium              299 non-null    int64  
 9   sex                       299 non-null    bool   
 10  smoking                   299 non-null    bool   
 11  time                      299 non-null    int64  
 12  DEATH_EVENT               299 non-null    bool   
dtypes: bool(6), float64(3), int64(4)
memory usage: 18.2 KB


In [5]:
heart_failure_data['DEATH_EVENT'].value_counts()

DEATH_EVENT
False    203
True      96
Name: count, dtype: int64

* Dataset Size: The dataset is relatively small, with only 299 rows.
* Class Imbalance: The target variable, DEATH_EVENT, has fewer examples in the "True" class (96 of 299, i.e., the event occurred) than in the "False" class (203), which might affect the model's ability to learn and generalize well. This class imbalance will be taken into consideration during analysis and model evaluation.
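One practical consequence of the imbalance is that accuracy alone can mislead. As a rough sketch using the class counts shown above, a model that always predicts the majority class already reaches about 68% accuracy while catching no deaths at all:

```python
# Class counts from the value_counts() output above.
counts = {False: 203, True: 96}

n = sum(counts.values())
majority_class = max(counts, key=counts.get)
baseline_accuracy = counts[majority_class] / n   # always predict "survived"
positive_rate = counts[True] / n                 # share of patients who died

print(f"Majority class: {majority_class}")
print(f"Baseline accuracy: {baseline_accuracy:.4f}")
print(f"Share of deaths: {positive_rate:.4f}")
```

Any candidate model should beat this baseline, and recall on the "True" class is the more informative measure.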

In [6]:
# Summary statistics
print("Summary Statistics:")
heart_failure_data.describe()

Summary Statistics:


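|       | age       | creatinine_phosphokinase | ejection_fraction | platelets     | serum_creatinine | serum_sodium | time      |
|-------|-----------|--------------------------|-------------------|---------------|------------------|--------------|-----------|
| count | 299.0     | 299.0                    | 299.0             | 299.0         | 299.0            | 299.0        | 299.0     |
| mean  | 60.833893 | 581.839465               | 38.083612         | 263358.029264 | 1.39388          | 136.625418   | 130.26087 |
| std   | 11.894809 | 970.287881               | 11.834841         | 97804.236869  | 1.03451          | 4.412477     | 77.614208 |
| min   | 40.0      | 23.0                     | 14.0              | 25100.0       | 0.5              | 113.0        | 4.0       |
| 25%   | 51.0      | 116.5                    | 30.0              | 212500.0      | 0.9              | 134.0        | 73.0      |
| 50%   | 60.0      | 250.0                    | 38.0              | 262000.0      | 1.1              | 137.0        | 115.0     |
| 75%   | 70.0      | 582.0                    | 45.0              | 303500.0      | 1.4              | 140.0        | 203.0     |
| max   | 95.0      | 7861.0                   | 80.0              | 850000.0      | 9.4              | 148.0        | 285.0     |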


In [7]:
# Check for missing values

missing_values = heart_failure_data.isnull().sum()
print("\nMissing Values:")
print(missing_values)


Missing Values:
age                         0
anaemia                     0
creatinine_phosphokinase    0
diabetes                    0
ejection_fraction           0
high_blood_pressure         0
platelets                   0
serum_creatinine            0
serum_sodium                0
sex                         0
smoking                     0
time                        0
DEATH_EVENT                 0
dtype: int64


There are no missing values, so no imputation is required.

In [8]:
aly.heatmap(heart_failure_data,color="DEATH_EVENT")

In [9]:
# Distributions of all columns
print("Visualizing distributions for all columns...")
aly.dist(heart_failure_data)

Visualizing distributions for all columns...


In [10]:
aly.pair(heart_failure_data,color="DEATH_EVENT")

In [11]:
# Correlations between the heart failure features (not the vega_datasets movies sample)
aly.corr(heart_failure_data)

In [12]:
aly.parcoord(heart_failure_data,color = 'DEATH_EVENT')

In [13]:
# Create the distribution plots
aly.dist(heart_failure_data,color = 'DEATH_EVENT')

### Data Splitting

In [14]:
heart_failure_data = pd.read_csv(file_path)

heart_failure_train, heart_failure_test = train_test_split(heart_failure_data, 
                                                           train_size = 0.8, 
                                                           stratify = heart_failure_data['DEATH_EVENT'],
                                                           random_state = 522)

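# Save the splits, creating the output directory first if it does not exist
url_processed = 'data/processed/'
os.makedirs(url_processed, exist_ok=True)
heart_failure_train.to_csv(os.path.join(url_processed, 'heart_failure_train.csv'))
heart_failure_test.to_csv(os.path.join(url_processed, 'heart_failure_test.csv'))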
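Stratifying on `DEATH_EVENT` keeps the share of deaths roughly equal in the train and test sets. The sketch below is a simplified, pure-Python illustration of that idea and is not scikit-learn's implementation; `stratified_split` is a hypothetical helper, and the labels are synthetic stand-ins with the same class counts as the dataset.

```python
import random

def stratified_split(labels, test_fraction, seed):
    """Split indices so each class contributes about test_fraction to the test set."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx

# Synthetic labels with the dataset's class counts: 203 survived, 96 died.
labels = [0] * 203 + [1] * 96
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=522)

test_positive = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_positive)
```

The resulting 60-patient test set with 19 deaths matches the row totals of the confusion matrix reported in the Model Evaluation section.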

### Preprocessing columns

In [15]:
# Define numeric columns
numeric_columns = ['age', 'creatinine_phosphokinase', 'ejection_fraction', 
                   'platelets', 'serum_creatinine', 'serum_sodium', 'time']
# List of binary columns
binary_columns = ['anaemia', 'diabetes', 'high_blood_pressure', 'sex', 'smoking']

# Convert all binary columns to True/False so they're treated as categorical data
heart_failure_train[binary_columns] = heart_failure_train[binary_columns].astype(bool)
heart_failure_test[binary_columns] = heart_failure_test[binary_columns].astype(bool)

In [16]:
preprocessor = make_column_transformer(
    (StandardScaler(), numeric_columns),
    (OneHotEncoder(handle_unknown="ignore", sparse_output=False, drop='if_binary', dtype = int), binary_columns),
    remainder = 'passthrough'
)

# preprocessor.fit(heart_failure_train)
# heart_failure_scaled_train = preprocessor.transform(heart_failure_train)
# heart_failure_scaled_test = preprocessor.transform(heart_failure_test)
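For intuition, `StandardScaler` replaces each numeric value with its z-score: it subtracts the column mean and divides by the population standard deviation. A minimal sketch with made-up ejection fraction values (the sample and the `standardize` helper are illustrative assumptions):

```python
import statistics

def standardize(values):
    """Return z-scores using the population standard deviation (ddof=0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical ejection fraction percentages, for illustration only.
ejection_fraction_sample = [20, 38, 35, 60, 25]
scaled = standardize(ejection_fraction_sample)

print([round(z, 3) for z in scaled])
```

After scaling, every numeric feature has mean 0 and unit variance, so coefficient magnitudes become comparable across features such as `platelets` and `serum_creatinine`, whose raw scales differ by several orders of magnitude.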

## Building the Model
Testing Decision Tree, KNN, Logistic Regression

### Decision Tree

In [17]:
pipeline = make_pipeline(
        preprocessor, 
        DecisionTreeClassifier(random_state=522)
    )

dt_scores = cross_validate(pipeline, 
                           heart_failure_train.drop(columns=['DEATH_EVENT']), 
                           heart_failure_train['DEATH_EVENT'],
                           return_train_score=True
                          )

dt_scores = pd.DataFrame(dt_scores).sort_values('test_score', ascending = False)
dt_scores

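|   | fit_time | score_time | test_score | train_score |
|---|----------|------------|------------|-------------|
| 4 | 0.00557  | 0.002393   | 0.829787   | 1.0         |
| 1 | 0.006578 | 0.002455   | 0.8125     | 1.0         |
| 3 | 0.004832 | 0.002262   | 0.791667   | 1.0         |
| 2 | 0.005425 | 0.003197   | 0.770833   | 1.0         |
| 0 | 0.022048 | 0.009246   | 0.666667   | 1.0         |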
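A train score of 1.0 on every fold alongside noticeably lower validation scores suggests that the unconstrained tree memorizes the training data. A quick sketch that summarizes the fold scores from the table above:

```python
# Fold scores copied from the cross-validation output above.
test_scores = [0.829787, 0.8125, 0.791667, 0.770833, 0.666667]
train_scores = [1.0, 1.0, 1.0, 1.0, 1.0]

mean_test = sum(test_scores) / len(test_scores)
mean_train = sum(train_scores) / len(train_scores)
gap = mean_train - mean_test   # a large gap points to overfitting

print(f"Mean validation score: {mean_test:.4f}")
print(f"Mean train score:      {mean_train:.4f}")
print(f"Train/validation gap:  {gap:.4f}")
```

The gap of about 0.23 is much larger than the gap observed for the tuned logistic regression later in the notebook.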


### KNN

In [18]:
pipeline = make_pipeline(
        preprocessor, 
        KNeighborsClassifier()
    )

param_grid = {
    "kneighborsclassifier__n_neighbors": range(1, 100, 3)
}

grid_search = GridSearchCV(
    pipeline,
    param_grid,
    cv=10,  
    n_jobs=-1,  
    return_train_score=True,
)

heart_failure_fit = grid_search.fit(heart_failure_train.drop(columns=['DEATH_EVENT']), heart_failure_train['DEATH_EVENT'] )

knn_best_model = grid_search.best_estimator_ 
knn_best_model



In [19]:
pd.DataFrame(grid_search.cv_results_).sort_values('mean_test_score', ascending = False)[['params', 'mean_test_score']].iloc[0]

params             {'kneighborsclassifier__n_neighbors': 19}
mean_test_score                                     0.777899
Name: 6, dtype: object


### Logistic Regression

In [20]:
pipeline = make_pipeline(
        preprocessor, 
        LogisticRegression(random_state=522, max_iter=2000, class_weight = "balanced")
    )

param_grid = {
    "logisticregression__C": 10.0 ** np.arange(-4, 7, 1)
}

grid_search = GridSearchCV(
    pipeline,
    param_grid,
    cv=10,  
    n_jobs=-1,  
    return_train_score=True
)

heart_failure_fit = grid_search.fit(heart_failure_train.drop(columns=['DEATH_EVENT']), heart_failure_train['DEATH_EVENT'] )

lr_best_model = grid_search.best_estimator_.named_steps['logisticregression']
lr_best_model
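The `class_weight = "balanced"` setting reweights each class by `n_samples / (n_classes * class_count)`, so errors on the rarer class (deaths) cost more during fitting. The sketch below applies that formula to the training-set class counts, which we derive here as an assumption from the dataset totals (203 and 96) minus the test-set row totals of the confusion matrix (41 and 19):

```python
# Training-set class counts, derived from dataset totals minus test-set counts.
train_counts = {0: 203 - 41, 1: 96 - 19}   # 162 survived, 77 died

n_samples = sum(train_counts.values())
n_classes = len(train_counts)

# "balanced" heuristic: n_samples / (n_classes * count of that class)
class_weights = {
    cls: n_samples / (n_classes * count) for cls, count in train_counts.items()
}

for cls, weight in class_weights.items():
    print(f"class {cls}: weight {weight:.4f}")
```

Under these counts, each death weighs roughly twice as much as each survival in the loss, which pushes the model toward higher recall on the positive class.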

In [21]:
lr_scores = pd.DataFrame(grid_search.cv_results_).sort_values('mean_test_score', ascending = False)[['param_logisticregression__C', 'mean_test_score', 'mean_train_score']]
lr_scores.iloc[0:5]

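|   | param_logisticregression__C | mean_test_score | mean_train_score |
|---|-----------------------------|-----------------|------------------|
| 1 | 0.001                       | 0.828261        | 0.833559         |
| 0 | 0.0001                      | 0.824094        | 0.834488         |
| 2 | 0.01                        | 0.819928        | 0.825187         |
| 3 | 0.1                         | 0.815761        | 0.824266         |
| 4 | 1.0                         | 0.799275        | 0.823807         |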


**The model performs best with C = 0.001: its mean cross-validation score (0.828) is the highest in the grid and close to its mean train score (0.834), which suggests the model is not overfitting.**

In [22]:
# C spans 1e-4 to 1e6, so a log-scaled x-axis keeps all grid points readable
alt.Chart(lr_scores).mark_line().encode(
    x=alt.X("param_logisticregression__C:Q", title="C (Regularization Parameter)",
            scale=alt.Scale(type="log")),
    y=alt.Y("mean_test_score:Q", title="Score"),
    color=alt.value("skyblue"),  # Fixed color for CV score
    tooltip=["param_logisticregression__C", "mean_test_score"]
).properties(
    title="Training vs Cross-Validation Scores",
    width=600,
    height=400
) + alt.Chart(lr_scores).mark_line().encode(
    x=alt.X("param_logisticregression__C:Q", title="C (Regularization Parameter)",
            scale=alt.Scale(type="log")),
    y=alt.Y("mean_train_score:Q", title="Score"),
    color=alt.value("pink"),  # Fixed color for Train score
    tooltip=["param_logisticregression__C", "mean_train_score"]
)

**Logistic regression achieves a higher mean cross-validation score (0.828) than the decision tree (0.774) and KNN (0.778), so we select it as our final model.**

In [23]:
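# The coefficients follow the column order produced by the preprocessor
# (scaled numeric columns first, then the encoded binary columns),
# not the column order of the original DataFrame.
features = lr_best_model.coef_
feature_names = numeric_columns + binary_columns
coefficients = pd.DataFrame({
    'Feature': feature_names,
    'Coefficient': features[0],
    'Absolute_Coefficient': abs(features[0])
}).sort_values(by='Absolute_Coefficient', ascending=False)

coefficients

|    | Feature                  | Coefficient | Absolute_Coefficient |
|----|--------------------------|-------------|----------------------|
| 6  | time                     | -0.062528   | 0.062528             |
| 4  | serum_creatinine         | 0.037601    | 0.037601             |
| 2  | ejection_fraction        | -0.032684   | 0.032684             |
| 0  | age                      | 0.026302    | 0.026302             |
| 5  | serum_sodium             | -0.021535   | 0.021535             |
| 1  | creatinine_phosphokinase | 0.009568    | 0.009568             |
| 9  | high_blood_pressure      | 0.006571    | 0.006571             |
| 7  | anaemia                  | 0.00387     | 0.00387              |
| 3  | platelets                | -0.003343   | 0.003343             |
| 11 | smoking                  | -0.002138   | 0.002138             |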
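Because the numeric features are standardized before fitting, exponentiating a coefficient gives the multiplicative change in the odds of death for a one standard deviation increase in that feature, holding the others fixed. A short sketch using the two largest-magnitude coefficients from the table above:

```python
import math

# The two largest-magnitude coefficients reported in the table above.
coefficients = [-0.062528, 0.037601]

# exp(coefficient) = odds ratio per one standard deviation increase
odds_ratios = [math.exp(c) for c in coefficients]

for coef, ratio in zip(coefficients, odds_ratios):
    print(f"coefficient {coef:+.6f} -> odds ratio {ratio:.4f}")
```

An odds ratio below 1 means the feature lowers the predicted odds of death as it increases, and a ratio above 1 means it raises them. The ratios are close to 1 here because the selected C = 0.001 applies strong regularization, which shrinks all coefficients toward zero.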


## Model Evaluation

### Confusion Matrix

In [24]:
# Confusion Matrix

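# Predict from the features only; the target column is dropped before calling predict
heart_failure_predictions = heart_failure_test.assign(
    predicted=heart_failure_fit.predict(heart_failure_test.drop(columns=['DEATH_EVENT']))
)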

cm_crosstab = pd.crosstab(heart_failure_predictions['DEATH_EVENT'], 
                          heart_failure_predictions['predicted'], 
                          rownames=["Actual"], 
                          colnames=["Predicted"]
                         )


cm_crosstab
# cm = confusion_matrix(heart_failure_test["DEATH_EVENT"], heart_failure_fit.predict(heart_failure_test))
# cm

| Actual \ Predicted | 0  | 1  |
|--------------------|----|----|
| 0                  | 35 | 6  |
| 1                  | 5  | 14 |


In [25]:
accuracy = accuracy_score(heart_failure_predictions['DEATH_EVENT'], heart_failure_predictions['predicted'])
precision = precision_score(heart_failure_predictions['DEATH_EVENT'], heart_failure_predictions['predicted'])
recall = recall_score(heart_failure_predictions['DEATH_EVENT'], heart_failure_predictions['predicted'])
f1 = f1_score(heart_failure_predictions['DEATH_EVENT'], heart_failure_predictions['predicted'])

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")

Accuracy: 0.8167
Precision: 0.7000
Recall: 0.7368
F1 Score: 0.7179


## References

Chicco, D., Jurman, G. Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Med Inform Decis Mak 20, 16 (2020). https://doi.org/10.1186/s12911-020-1023-5

Dua, Dheeru, and Casey Graff. 2017. “UCI Machine Learning Repository.” University of California, Irvine, School of Information; Computer Sciences. http://archive.ics.uci.edu/ml.

Heart Failure Clinical Records [Dataset]. (2020). UCI Machine Learning Repository. https://doi.org/10.24432/C5Z89R.