# Advanced Certification Program in AI and MLOps
## A program by IISc and TalentSprint
### Mini-Project: Patient Survival Prediction using XGBoost

## Learning Objectives

At the end of the experiment, you will be able to:

* perform data preprocessing on the Heart failure dataset
* train an XGBoost model to predict survival of patients with heart failure
* save your trained model
* create a Gradio application
* deploy the application with AWS

## Dataset Description

[Heart failure clinical records dataset](https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records) contains the medical records of 299 patients who had **heart failure**, collected during their follow-up period, where each patient profile has 13 clinical features given as:

- **age**: age of the patient (years)
- **anaemia**: decrease of red blood cells or hemoglobin (boolean)
- **high_blood_pressure**: if the patient has hypertension (boolean)
- **creatinine_phosphokinase (CPK)**: level of the CPK enzyme in the blood (mcg/L)
- **diabetes**: if the patient has diabetes (boolean)
- **ejection_fraction**: percentage of blood leaving the heart at each contraction (percentage)
- **platelets**: platelets in the blood (kiloplatelets/mL)
- **sex**: woman or man (binary)
- **serum_creatinine**: level of serum creatinine in the blood (mg/dL)
- **serum_sodium**: level of serum sodium in the blood (mEq/L)
- **smoking**: if the patient smokes or not (boolean)
- **time**: follow-up period (days)
- **DEATH_EVENT**: if the patient deceased during the follow-up period (boolean)

## Information

Cardiovascular diseases kill millions of people globally every year, and they mainly manifest as myocardial infarctions and heart failure. Heart failure occurs when the heart cannot pump enough blood to meet the needs of the body. Available electronic medical records of patients quantify symptoms, body features, and clinical laboratory test values, which can be used for biostatistical analysis aimed at highlighting patterns and correlations that medical doctors might otherwise miss. Machine learning, in particular, can predict patients' survival from their data and can identify the most important features among those included in their medical records.

### Problem Statement

* Build an XGBoost classifier to predict survival of patients with heart failure
* Deploy the application with AWS ECR and ECS

Please refer to ***The demo session held on 4th May - Deployment with AWS ECR and ECS*** to get familiar with how to deploy the application with AWS.

### Install XGBoost library

In [None]:
!pip -qq install xgboost

### Import required packages

In [None]:
import numpy as np
import pandas as pd
import joblib
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from xgboost import XGBClassifier

In [None]:
#@title Download the dataset
!wget -q https://cdn.iisc.talentsprint.com/CDS/Datasets/heart_failure_clinical_records_dataset.csv
!ls | grep '.csv'

### Load the dataset

In [None]:
# Load dataset
df = pd.read_csv('heart_failure_clinical_records_dataset.csv')
df.head()

In [None]:
# Shape of dataset
df.shape

### Check missing values

In [None]:
# Check for missing values
df.isna().sum()

### Handle Outliers

In [None]:
# Checking for outliers
df.boxplot()
plt.xticks(rotation=90)
plt.show()

In [None]:
# Handling outliers
outlier_colms = ['creatinine_phosphokinase', 'ejection_fraction', 'platelets', 'serum_creatinine', 'serum_sodium']
df1 = df.copy()

def handle_outliers(df, colm):
    '''Clip outliers in `colm` to the lower and upper whisker values (1.5 * IQR rule)'''
    q1 = df[colm].quantile(0.25)
    q3 = df[colm].quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - (1.5 * iqr)
    upper_bound = q3 + (1.5 * iqr)
    # Vectorized clipping replaces the row-by-row loop and does not depend on the index labels
    df[colm] = df[colm].clip(lower=lower_bound, upper=upper_bound)
    return df

for colm in outlier_colms:
    df1 = handle_outliers(df1, colm)

In [None]:
# Recheck for outliers
df1.boxplot()
plt.xticks(rotation=90)
plt.show()
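
To make the whisker rule used in `handle_outliers` concrete, here is a minimal sketch on a small hand-made series. The values and the name `toy_feature` are hypothetical and not taken from the dataset. It computes the 1.5 × IQR bounds and caps values at those bounds, which is the same capping that `handle_outliers` applies to each column.

```python
import pandas as pd

# Hypothetical toy values; 100 is an obvious high outlier
s = pd.Series([10, 12, 13, 14, 15, 16, 18, 100], name="toy_feature")

q1 = s.quantile(0.25)
q3 = s.quantile(0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Values outside [lower_bound, upper_bound] are replaced by the nearest bound
clipped = s.clip(lower=lower_bound, upper=upper_bound)

print("bounds:", lower_bound, upper_bound)
print(clipped.tolist())
```

Only the value 100 changes here; every other value already lies inside the whiskers and is left as it was.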

### Split into training and testing set

In [None]:
# Split dataset into training and testing set, considering all features for prediction

X = df1.iloc[:, :-1].values
y = df1['DEATH_EVENT'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, stratify = y, random_state= 123)

In [None]:
X_train[1]
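
The split above passes `stratify = y`. As a rough illustration of what that argument does, the sketch below splits a set of hypothetical labels (200 negatives and 100 positives, not the real dataset) and compares the positive rate in each part. The names `X_demo` and `y_demo` are placeholders introduced only for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 200 negatives and 100 positives (not the real dataset)
y_demo = np.array([0] * 200 + [1] * 100)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=123
)

# With stratification, both parts keep roughly the overall class ratio
print("positive rate overall:", y_demo.mean())
print("positive rate in train:", y_tr.mean())
print("positive rate in test: ", y_te.mean())
```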

### Model Training

In [None]:
xgb_clf = XGBClassifier(n_estimators=200, max_depth=4, max_leaves=5, random_state=42)
xgb_clf.fit(X_train, y_train)

### Model Performance

In [None]:
# Accuracy

train_acc = accuracy_score(y_train, xgb_clf.predict(X_train))
test_acc = accuracy_score(y_test, xgb_clf.predict(X_test))
print("Training accuracy: ", train_acc)
print("Testing accuracy: ", test_acc)

In [None]:
# F1-score

train_f1 = f1_score(y_train, xgb_clf.predict(X_train))
test_f1 = f1_score(y_test, xgb_clf.predict(X_test))
print("Training F1 score: ", train_f1)
print("Testing F1 score: ", test_f1)
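
The two cells above report both accuracy and F1 score. The following sketch, on hand-made labels that are purely hypothetical, shows how the two metrics can differ when one class is rarer than the other. By default, `f1_score` scores the positive class (label 1).

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels: 8 survivors (0) and 2 deaths (1)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical predictions: one of the two deaths is missed
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

acc = accuracy_score(y_true, y_pred)  # 9 of 10 predictions are correct
f1 = f1_score(y_true, y_pred)         # precision 1.0 and recall 0.5 for class 1

print("accuracy:", acc)
print("F1 (positive class):", round(f1, 3))
```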

### Save the trained model

In [None]:
# File name for the saved model
save_file_name = "xgboost-model.pkl"

joblib.dump(xgb_clf, save_file_name)
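
A quick way to check a saved model is to load it back and compare predictions. The sketch below shows that round trip with `joblib.dump` and `joblib.load`. To keep the example self-contained it uses a scikit-learn `LogisticRegression` on random data as a stand-in for the trained classifier, and the file name `demo-model.pkl` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in model and data, used only to demonstrate the round trip
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo-model.pkl")  # hypothetical file name
    joblib.dump(model, path)       # serialize the fitted estimator
    restored = joblib.load(path)   # read it back into a new object

# The restored model should reproduce the original predictions exactly
same_predictions = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print("Predictions match after reload:", same_predictions)
```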

## Gradio Implementation

In [None]:
!pip -q install gradio

In [None]:
import gradio
import joblib
import numpy as np

In [None]:
# Load your trained model

# YOUR CODE HERE

In [None]:
# Function for prediction

def predict_death_event(# YOUR CODE HERE for parameters):

    # YOUR CODE HERE...
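
One possible shape for such a prediction function is sketched below; it is an illustration under stated assumptions and not the expected solution. `StubModel`, `make_predictor`, and the returned messages are hypothetical names introduced for this example, and `FEATURE_ORDER` is an assumed column order that should be checked against `list(df1.columns[:-1])`. The key point is that the values must be arranged as a single row in the same order as the training features.

```python
import numpy as np

# Assumed feature order; check it against list(df1.columns[:-1]) before use
FEATURE_ORDER = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes",
    "ejection_fraction", "high_blood_pressure", "platelets",
    "serum_creatinine", "serum_sodium", "sex", "smoking", "time",
]


class StubModel:
    """Hypothetical stand-in for the loaded classifier so the sketch runs alone."""

    def predict(self, X):
        # Arbitrary rule for illustration only: flag rows whose first value exceeds 70
        return (np.asarray(X)[:, 0] > 70).astype(int)


def make_predictor(model, feature_order=FEATURE_ORDER):
    """Return a function that maps one value per feature to a readable label."""

    def predict(*values):
        if len(values) != len(feature_order):
            raise ValueError(f"expected {len(feature_order)} values, got {len(values)}")
        row = np.array(values, dtype=float).reshape(1, -1)  # shape (1, n_features)
        label = int(model.predict(row)[0])
        return "Death event predicted" if label == 1 else "Survival predicted"

    return predict


# Hypothetical input values, one per feature in FEATURE_ORDER
predict_demo = make_predictor(StubModel())
print(predict_demo(75, 0, 582, 0, 20, 1, 265000, 1.9, 130, 1, 0, 4))
```

In the notebook, the stub would be replaced by the model loaded with `joblib`, and the function would receive its arguments from the Gradio input components in the same order.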



For categorical user input, use the [Radio](https://www.gradio.app/docs/radio) button component.

For numerical user input, use the [Slider](https://www.gradio.app/docs/slider) component.

In [None]:
# Description summary of the dataset
# YOUR CODE HERE

#Hint: describe()

In [None]:
# Inputs from user
# YOUR CODE HERE ...

# Output response
# YOUR CODE HERE


In [None]:
# Gradio interface to generate UI link
title = "Patient Survival Prediction"
description = "Predict survival of patient with heart failure, given their clinical record"

iface = gradio.Interface(fn = predict_death_event,
                         inputs = # YOUR CODE HERE,
                         outputs = # YOUR CODE HERE,
                         title = title,
                         description = description,
                         allow_flagging='never')

iface.launch(share = True)  # server_name="0.0.0.0", server_port = 8001   # Ref: https://www.gradio.app/docs/interface