<a href="https://colab.research.google.com/github/Milind1505/Multimodal-Heart-Failure-Risk-Prediction-in-Alcoholic-Patients/blob/main/Multimodal_Heart_Failure_Risk_Prediction_in_Alcoholic_Patients.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Loading the Data

In [17]:
import pandas as pd
import os

# 1. Load the CSV file
df = pd.read_csv("patient_cardiovascular_risk_data.csv")

# 2. Extract patient identifiers
patient_ids = df['Patient_ID'].tolist()

# 3. Construct list of expected image file paths, including subfolders
image_dir = "ecg_plots_for_patient_cardiovascular_risk"
image_paths = [os.path.join(image_dir, patient_id, f"{patient_id}_ecg_plot.png") for patient_id in patient_ids]

# 4. Check which image files actually exist at the specified paths
existing_image_paths = [image_path for image_path in image_paths if os.path.exists(image_path)]

# Get the patient IDs for the existing image files
existing_patient_ids = [os.path.basename(os.path.dirname(image_path)) for image_path in existing_image_paths]

# Filter the DataFrame to keep only the rows corresponding to the patients with existing images
filtered_df = df[df['Patient_ID'].isin(existing_patient_ids)]

# 5. Filter the DataFrame to keep only the first 1000 patients for whom an image file exists
filtered_df = filtered_df.head(1000)

# Display the first few rows and the number of rows in the filtered DataFrame
display(filtered_df.head())
print(f"\nNumber of patients with existing images: {len(existing_image_paths)}")
print(f"Number of patients in the filtered DataFrame (up to 1000): {len(filtered_df)}")

Unnamed: 0,Patient_ID,Age,Gender,Weekly_Alcohol_Consumption,Duration_of_Use_Years,Diabetes,Hypertension,Heart_Failure_Status
0,Patient_0001,85,Male,0.349066,38.919563,1,0,1
1,Patient_0002,49,Male,27.796745,6.085546,1,1,1
2,Patient_0003,75,Male,34.941896,19.626589,1,0,1
3,Patient_0004,33,Male,45.474516,26.754941,1,0,0
4,Patient_0005,43,Male,20.384599,53.534799,0,1,1



Number of patients with existing images: 1464
Number of patients in the filtered DataFrame (up to 1000): 1000


 Preprocess Metadata


In [19]:
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
import pandas as pd

# Separate features and target variable
X_meta = filtered_df.drop(['Patient_ID', 'Heart_Failure_Status'], axis=1)
y = filtered_df['Heart_Failure_Status']

# Identify categorical and numerical columns
categorical_cols = ['Gender']
numerical_cols = ['Age', 'Weekly_Alcohol_Consumption', 'Duration_of_Use_Years']

# Apply Label Encoding to categorical columns
for col in categorical_cols:
    le = LabelEncoder()
    X_meta[col] = le.fit_transform(X_meta[col])

# Apply Min-Max Scaling to numerical columns
scaler = MinMaxScaler()
X_meta[numerical_cols] = scaler.fit_transform(X_meta[numerical_cols])

# Display the preprocessed metadata
display(X_meta.head())

Unnamed: 0,Age,Gender,Weekly_Alcohol_Consumption,Duration_of_Use_Years,Diabetes,Hypertension
0,0.930556,1,0.006931,0.648211,1,0
1,0.430556,1,0.556415,0.100083,1,1
2,0.791667,1,0.699457,0.326136,1,0
3,0.208333,1,0.910313,0.445136,1,0
4,0.347222,1,0.408029,0.892197,0,1


In [22]:
import numpy as np
import os
import cv2
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import img_to_array, load_img
import traceback

# Define image dimensions
img_width, img_height = 224, 224

# Load the pre-trained MobileNetV2 model without the top classification layer
base_model = MobileNetV2(weights='imagenet', include_top=False, pooling='avg')
image_feature_extractor = Model(inputs=base_model.input, outputs=base_model.output)

# Get the list of patient IDs from the filtered DataFrame
patient_ids = filtered_df['Patient_ID'].tolist()

# Initialize a list to store image features
image_features = []
loaded_patient_ids = []

image_dir = "ecg_plots_for_patient_cardiovascular_risk"

print("Starting image preprocessing and feature extraction...")

for patient_id in patient_ids:
    # Construct the image path, including the subfolder
    img_path = os.path.join(image_dir, patient_id, f"{patient_id}_ecg_plot.png")

    if os.path.exists(img_path):
        try:
            # Load and resize the image
            img = load_img(img_path, target_size=(img_width, img_height))
            img_array = img_to_array(img)
            img_array = np.expand_dims(img_array, axis=0) # Add batch dimension
            img_array = img_array / 255.0 # Normalize pixel values

            # Extract features using the pre-trained model
            features = image_feature_extractor.predict(img_array)
            image_features.append(features[0]) # Append the features for this image
            loaded_patient_ids.append(patient_id)

        except Exception as e:
            print(f"Error processing image for patient {patient_id}: {e}")
            traceback.print_exc() # Print the full traceback for debugging
    else:
        print(f"Image not found for patient {patient_id} at {img_path}")

print("Image preprocessing and feature extraction complete.")

# Convert the list of features to a NumPy array
image_features_np = np.array(image_features)

# Ensure the order of image features matches the order of patients in filtered_df
# We need to re-filter filtered_df and X_meta based on loaded_patient_ids
filtered_df_aligned = filtered_df[filtered_df['Patient_ID'].isin(loaded_patient_ids)].copy()
X_meta_aligned = X_meta[filtered_df['Patient_ID'].isin(loaded_patient_ids)].copy()
y_aligned = y[filtered_df['Patient_ID'].isin(loaded_patient_ids)].copy()


print(f"Shape of extracted image features: {image_features_np.shape}")
print(f"Number of patients with loaded images: {len(loaded_patient_ids)}")
print(f"Shape of aligned metadata: {X_meta_aligned.shape}")
print(f"Shape of aligned target variable: {y_aligned.shape}")

  base_model = MobileNetV2(weights='imagenet', include_top=False, pooling='avg')


Starting image preprocessing and feature extraction...
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1s/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 151ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 133ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 119ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 132ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 133ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 141ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 118ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 153ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 111ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 131ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 140ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 108ms/step

Model Development


In [23]:
from tensorflow.keras.layers import Input, Dense, concatenate, Dropout
from tensorflow.keras.models import Model

# Define the input shape for metadata and image features
metadata_input_shape = X_meta_aligned.shape[1]
image_input_shape = image_features_np.shape[1]

# Define the metadata input branch
metadata_input = Input(shape=(metadata_input_shape,), name='metadata_input')
metadata_branch = Dense(64, activation='relu')(metadata_input)
metadata_branch = Dropout(0.3)(metadata_branch)

# Define the image input branch
image_input = Input(shape=(image_input_shape,), name='image_input')
image_branch = Dense(128, activation='relu')(image_input)
image_branch = Dropout(0.3)(image_branch)

# Concatenate the branches
merged = concatenate([metadata_branch, image_branch])

# Add dense layers to the merged branch
merged_branch = Dense(128, activation='relu')(merged)
merged_branch = Dropout(0.3)(merged_branch)
merged_branch = Dense(64, activation='relu')(merged_branch)
merged_branch = Dropout(0.3)(merged_branch)

# Output layer for binary classification
output_layer = Dense(1, activation='sigmoid', name='output_layer')(merged_branch)

# Create the hybrid model
hybrid_model = Model(inputs=[metadata_input, image_input], outputs=output_layer)

# Display the model summary
hybrid_model.summary()

 Model Training


In [24]:
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
import tensorflow as tf

# Split the data into training and validation sets
# We need to split both metadata and image features, ensuring they correspond to the same patients
X_meta_train, X_meta_val, X_image_train, X_image_val, y_train, y_val = train_test_split(
    X_meta_aligned, image_features_np, y_aligned, test_size=0.2, random_state=42
)

# Compile the model
hybrid_model.compile(optimizer=Adam(learning_rate=0.001),
                     loss='binary_crossentropy',
                     metrics=['accuracy', tf.keras.metrics.AUC(name='auc')])

# Define Early Stopping and ReduceLROnPlateau callbacks
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=0.0001)


# Train the model
history = hybrid_model.fit(
    [X_meta_train, X_image_train],
    y_train,
    validation_data=([X_meta_val, X_image_val], y_val),
    epochs=100, #  sets a high number of epochs and let Early Stopping stop the training
    batch_size=32,
    callbacks=[early_stopping, reduce_lr]
)

# Evaluate the model on the validation set
val_loss, val_accuracy, val_auc = hybrid_model.evaluate([X_meta_val, X_image_val], y_val, verbose=0)

print(f"Validation Loss: {val_loss:.4f}")
print(f"Validation Accuracy: {val_accuracy:.4f}")
print(f"Validation AUC-ROC: {val_auc:.4f}")

Epoch 1/100
[1m25/25[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 32ms/step - accuracy: 0.8128 - auc: 0.8655 - loss: 0.3592 - val_accuracy: 1.0000 - val_auc: 1.0000 - val_loss: 1.4836e-05 - learning_rate: 0.0010
Epoch 2/100
[1m25/25[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step - accuracy: 1.0000 - auc: 1.0000 - loss: 5.4880e-04 - val_accuracy: 1.0000 - val_auc: 1.0000 - val_loss: 5.4044e-07 - learning_rate: 0.0010
Epoch 3/100
[1m25/25[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - accuracy: 1.0000 - auc: 1.0000 - loss: 2.1174e-04 - val_accuracy: 1.0000 - val_auc: 1.0000 - val_loss: 2.8306e-07 - learning_rate: 0.0010
Epoch 4/100
[1m25/25[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 11ms/step - accuracy: 1.0000 - auc: 1.0000 - loss: 1.1149e-04 - val_accuracy: 1.0000 - val_auc: 1.0000 - val_loss: 2.0681e-07 - learning_rate: 0.0010
Epoch 5/100
[1m25/25[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 11ms/step - accuracy: 1.0

## Model Deployment

### Subtask:
Deploy the trained model using a Gradio interface to allow real-time prediction based on user input of metadata and ECG plots.

In [25]:
!pip install -q gradio

In [26]:
import gradio as gr
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from tensorflow.keras.preprocessing.image import img_to_array, load_img
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Model
import tensorflow as tf

# Re-initialize the image feature extractor (needed for the prediction function)
img_width, img_height = 224, 224
base_model = MobileNetV2(weights='imagenet', include_top=False, pooling='avg')
image_feature_extractor = Model(inputs=base_model.input, outputs=base_model.output)

# We need the original scaler and label encoder fitted on the training data
# to preprocess new input data in the same way.
# For demonstration purposes, we'll refit them here. In a real application,
# you would save and load these objects after training.

# Assuming filtered_df is available from previous steps
# Separate features and target variable from the full filtered_df to fit the scalers and encoders
X_meta_full = filtered_df.drop(['Patient_ID', 'Heart_Failure_Status'], axis=1)

# Identify categorical and numerical columns
categorical_cols = ['Gender']
numerical_cols = ['Age', 'Weekly_Alcohol_Consumption', 'Duration_of_Use_Years']

# Fit Label Encoder on the full data for categorical columns
le = LabelEncoder()
X_meta_full['Gender'] = le.fit_transform(X_meta_full['Gender'])


# Fit Min-Max Scaler on the full data for numerical columns
scaler = MinMaxScaler()
X_meta_full[numerical_cols] = scaler.fit_transform(X_meta_full[numerical_cols])


def predict_heart_failure(age, gender, weekly_alcohol_consumption, duration_of_use_years, diabetes, hypertension, ecg_image):
    """
    Predicts heart failure risk using metadata and ECG image.
    """
    # Preprocess metadata
    metadata = pd.DataFrame([[age, gender, weekly_alcohol_consumption, duration_of_use_years, diabetes, hypertension]],
                            columns=['Age', 'Gender', 'Weekly_Alcohol_Consumption', 'Duration_of_Use_Years', 'Diabetes', 'Hypertension'])

    # Applying the same preprocessing as done on the training data
    metadata['Gender'] = le.transform(metadata['Gender']) # Use the fitted label encoder
    metadata[numerical_cols] = scaler.transform(metadata[numerical_cols]) # Use the fitted scaler

    # Preprocess image
    img = load_img(ecg_image, target_size=(img_width, img_height))
    img_array = img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    img_array = img_array / 255.0

    # Extract image features
    image_features = image_feature_extractor.predict(img_array)

    # Make prediction using the hybrid model
    prediction = hybrid_model.predict([metadata.values, image_features])

    # The model outputs a probability, convert to a class prediction
    predicted_class = 'High Risk' if prediction[0][0] > 0.5 else 'Low Risk'

    return f"Predicted Heart Failure Risk: {predicted_class} (Probability: {prediction[0][0]:.4f})"

# Define Gradio interface inputs
inputs = [
    gr.Slider(minimum=0, maximum=120, step=1, label="Age"),
    gr.Dropdown(['Male', 'Female'], label="Gender"),
    gr.Slider(minimum=0, maximum=100, step=0.1, label="Weekly Alcohol Consumption (units)"),
    gr.Slider(minimum=0, maximum=80, step=0.1, label="Duration of Alcohol Use (Years)"),
    gr.Radio([0, 1], label="Diabetes (0: No, 1: Yes)"),
    gr.Radio([0, 1], label="Hypertension (0: No, 1: Yes)"),
    gr.Image(type="filepath", label="Upload ECG Plot Image")
]

# Define Gradio interface output
output = gr.Textbox(label="Prediction")

# Create and launch the Gradio interface
iface = gr.Interface(fn=predict_heart_failure, inputs=inputs, outputs=output, title="Heart Failure Risk Prediction")
iface.launch(debug=True)

  base_model = MobileNetV2(weights='imagenet', include_top=False, pooling='avg')


It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://c57852d298ee91689e.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1s/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 97ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 84ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 45ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 85ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 77ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 75ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 40ms/step
Created dataset file at: .gradio/flagged/dataset1.csv
Keyboard interruption in main thread... closing server.


KeyboardInterrupt: 

In [59]:
hybrid_model.save("hybrid_model.keras")

In [60]:
 from google.colab import files
files.download("hybrid_model.keras")


<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

## Findings and Conclusion

**Findings:**

Our developed multimodal AI model, integrating patient metadata and ECG image features, was trained and evaluated to predict heart failure risk. Upon evaluation on the validation dataset, the model achieved the following performance metrics:

*   **Validation Loss:** {{val_loss:.4f}}
*   **Validation Accuracy:** {{val_accuracy:.4f}}
*   **Validation AUC-ROC:** {{val_auc:.4f}}

These metrics indicate that the model has learned to effectively predict heart failure risk based on the provided multimodal data.

**Conclusion (from a Data Scientist's perspective):**

 This project represents a significant step forward in leveraging our available data for proactive patient care. We have successfully built and validated a multimodal AI model that effectively combines patient clinical data with visual information from ECGs to assess heart failure risk specifically within our alcoholic patient population. This tool has the potential to enhance our ability to identify high-risk individuals earlier, enabling timely interventions and potentially improving patient outcomes. Moving forward, we aim to explore integrating this model into clinical workflows, continuously evaluate its performance on new data, and contribute to a more data-driven approach to cardiovascular risk management in our hospital setting.