## Heart failure prediction model
This notebook shows how to build a heart failure prediction model and how to use the trained model to make predictions on new data.

### Attribute Information:

**Thirteen (13) attributes (twelve clinical features plus the target):**

- *age*: age of the patient (years)
- *anaemia*: decrease of red blood cells or hemoglobin (boolean)
- *creatinine_phosphokinase*: level of the creatinine phosphokinase enzyme in the blood (mcg/L)
- *diabetes*: if the patient has diabetes (boolean)
- *ejection_fraction*: percentage of blood leaving the heart at each contraction (percentage)
- *high_blood_pressure*: if the patient has hypertension (boolean)
- *platelets*: platelets in the blood (kiloplatelets/mL)
- *serum_creatinine*: level of serum creatinine in the blood (mg/dL)
- *serum_sodium*: level of serum sodium in the blood (mEq/L)
- *sex*: woman or man (binary)
- *smoking*: if the patient smokes or not (boolean)
- *time*: follow-up period (days)
- [target/output class] *DEATH_EVENT*: if the patient died during the follow-up period (boolean)

### Dataset Source
https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records

## Load the libraries

Libraries used are:
- pandas
- sklearn (to create our prediction model)
- Flask (as back-end of our web app)
- waitress (to run the Flask web app)
- pymongo (to communicate with the MongoDB database)
- matplotlib

*Notes:*
- MongoDB is used to store the dataset as well as future prediction results.
- Make sure MongoDB is installed on your VM if you are working with your own VM instance; otherwise, use the MongoDB instance provided in this class.
- The address of the class MongoDB instance will be shared in class.

In [None]:
# import the modules/libraries
import pandas as pd
from pymongo import MongoClient

# suppress all warnings (ignore unnecessary warnings msgs)
import warnings
warnings.filterwarnings("ignore")

## Load the dataset

In [None]:
df = pd.read_csv('heart_failure_clinical_records_dataset.csv') #read the "heart_failure_clinical_records_dataset.csv" file and assign it to df variable

In [None]:
df.head() # show the first 5 rows

## Store the dataset in MongoDB (if you haven't done so already)

In [None]:
# 127.0.0.1 is the address of the local MongoDB instance installed on this VM
client = MongoClient('mongodb://127.0.0.1:27017/') 

In [None]:
# Replace 'YOURDB' with your STUDENTID.
# Otherwise, you might end up using the same database as a classmate.
db = client['YOURDB'] 

In [None]:
#heart_table is the collection (table) name in our mongodb database
heart_table = db['heart_table']

In [None]:
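df_mongo = df.copy() # copy the dataset

# convert the data into a list of dictionaries (one per row) before saving it into MongoDB
# reset_index returns a new DataFrame, so the result has to be assigned back
df_mongo = df_mongo.reset_index(drop=True)
data_dict = df_mongo.to_dict("records")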

In [None]:
data_dict[:2] # show the first 2 records as an example

In [None]:
# Insert all the records into mongodb collection
heart_table.insert_many(data_dict)
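
Running the `insert_many` cell a second time stores a second copy of every record. One way to guard against that is to insert only when the collection is still empty. The sketch below assumes the collection should hold exactly one copy of the dataset; `insert_if_empty` is a hypothetical helper written for this notebook, not a pymongo function. It relies only on the `count_documents` and `insert_many` methods of a pymongo collection.

```python
def insert_if_empty(collection, records):
    """Insert records only when the collection has no documents yet.

    `collection` is expected to be a pymongo Collection such as heart_table.
    This is a hypothetical convenience helper, not part of pymongo.
    Returns the number of documents inserted.
    """
    if not records:
        return 0  # insert_many rejects an empty list, so skip it
    if collection.count_documents({}) > 0:
        return 0  # data is already stored; skip to avoid duplicates
    result = collection.insert_many(records)
    return len(result.inserted_ids)
```

In this notebook the call would be `insert_if_empty(heart_table, data_dict)` in place of the plain `insert_many` call.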

## Load the dataset from the MongoDB database

In [None]:
# 127.0.0.1 is the address of the local MongoDB instance installed on this VM
client = MongoClient('mongodb://127.0.0.1:27017/')
# Replace 'YOURDB' with your STUDENTID
db = client['YOURDB']
# heart_table is the collection (table) name in our MongoDB database
heart_table = db['heart_table']

# query all the records inside the mongodb collection
heart_table_cursor = heart_table.find()

# convert it into dataframe
heart_df = pd.DataFrame(list(heart_table_cursor))

heart_df = heart_df.drop(['_id'], axis=1) # drop _id column

In [None]:
heart_df.head() # show the first 5 rows

## Data exploration

In [None]:
heart_df.describe() #describe the data

In [None]:
heart_df.columns # show the column names

In [None]:
heart_df.shape # show the shape of the data (number of rows, number of columns)

In [None]:
# Data Info
heart_df.info()

In [None]:
# check missing values for each column 
heart_df.isnull().sum().sort_values(ascending=False)

In [None]:
heart_df.groupby('sex').size() # count the rows for each value of the 'sex' column

In [None]:
age = heart_df['age'] # take the 'age' column from heart_df
age.hist(bins=10) # plot a histogram of the ages with 10 bins
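
The exploration above does not look at the target column. Checking how many rows fall into each *DEATH_EVENT* class is worthwhile, because accuracy alone can look good on imbalanced data. The sketch below uses a tiny made-up frame (`demo_df`, an assumption for illustration) so that it runs on its own; in the notebook, `heart_df` would take its place.

```python
import pandas as pd

# Tiny made-up frame so the sketch runs on its own; in the notebook,
# use heart_df instead of demo_df.
demo_df = pd.DataFrame({"DEATH_EVENT": [0, 0, 0, 1, 0, 1, 0, 0]})

# Number of rows in each class
class_counts = demo_df["DEATH_EVENT"].value_counts()
# Share of rows in each class (fractions that sum to 1)
class_share = demo_df["DEATH_EVENT"].value_counts(normalize=True)

print(class_counts)
print(class_share)
```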

## Splitting data into X_data (input) and Y_data (output)

In [None]:
X_data = heart_df.drop(['DEATH_EVENT'], axis=1) #drop the column 'DEATH_EVENT' as it is not used as input X_data

In [None]:
Y_data = heart_df['DEATH_EVENT'] #copy column 'DEATH_EVENT' as output Y_data

## Splitting X_data (input) and Y_data (output) into training and test sets

In [None]:
#import function to split the data into training and testing
from sklearn.model_selection import train_test_split 

#Split the data into train and test for each X and y; test_size=0.3 means 30% for test data and the rest for training
X_train, X_test, y_train, y_test = train_test_split(X_data, Y_data, random_state=0, test_size=0.3) 
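
With a small dataset, a purely random split can give the test set a different share of positive cases than the training set. The `stratify` parameter of `train_test_split` keeps the class proportions the same in both parts. The sketch below shows this on made-up data (`X_demo` and `y_demo` are assumptions for illustration); passing `stratify=Y_data` is a possible variation, not what the cell above does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 rows, 30% positive class (not the real dataset)
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 30 + [0] * 70)

# Same arguments as in the notebook, plus stratify to preserve class shares
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

print(X_tr.shape, X_te.shape)    # row counts of the training and test parts
print(y_tr.mean(), y_te.mean())  # share of positive cases in each part
```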

## Initiate the model and train it using training data

In [None]:
from sklearn.ensemble import RandomForestClassifier #import RF classifier

model = RandomForestClassifier(random_state=0) # initiate the classifier/model
model = model.fit(X_train, y_train) # training the model/classifier with training data (X_train and y_train)

## Evaluate the model performance by predicting the output of test data and comparing it with the real test output

In [None]:
from sklearn import metrics #import the metrics from sklearn

y_pred = model.predict(X_test) # predict the labels for X_test
confusion_matrix = metrics.confusion_matrix(y_test, y_pred) # confusion matrix
accuracy = metrics.accuracy_score(y_test, y_pred) # calculate accuracy
precision = metrics.precision_score(y_test, y_pred) # calculate precision
recall = metrics.recall_score(y_test, y_pred) #calculate recall

# get the true negatives, false positives, false negatives, and true positives
# from the confusion matrix computed above
tn, fp, fn, tp = confusion_matrix.ravel()
specificity = tn / (tn+fp)

# print / show the output
print("Confusion Matrix:", confusion_matrix)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("Specificity:", specificity)
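
Each of the metrics above is a ratio of confusion-matrix cells. The sketch below spells out the formulas with made-up counts (the numbers are assumptions for illustration, not results from this notebook).

```python
# Made-up counts for illustration, not results from this notebook
tn, fp, fn, tp = 50, 10, 5, 25

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # of the predicted positives, share that are truly positive
recall = tp / (tp + fn)                     # of the true positives, share that were found (sensitivity)
specificity = tn / (tn + fp)                # of the true negatives, share predicted as negative

print(accuracy, precision, recall, specificity)
```

For this dataset the positive class is a death event, so recall measures how many of the patients who died were flagged by the model.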

## Export the prediction model to a FILENAME.model file

In [None]:
# save the model using joblib
import joblib

filename = "RF.model" #filename
joblib.dump(model, filename)

## Import the prediction model and use it to predict with new data

In [None]:
import joblib
import numpy as np

filename = "RF.model" #filename
loaded_model = joblib.load(filename)

# new data: one patient, with values in the feature order listed below
# features = ['age', 'anaemia', 'creatinine_phosphokinase', 'diabetes', 'ejection_fraction', 'high_blood_pressure',
#             'platelets', 'serum_creatinine', 'serum_sodium', 'sex', 'smoking', 'time']

new_data = np.array([[50.0, 0, 196, 0, 45, 0, 395000.0, 1.6, 136, 1, 1, 285]])
# predict the class of new_data
new_data_pred = loaded_model.predict(new_data)

print("Predicted as", new_data_pred[0])
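
`predict` returns only the class label. `RandomForestClassifier` also provides `predict_proba`, which returns an estimated probability for each class and can be more informative than a bare 0 or 1. The sketch below trains a throwaway model on made-up data (`X_demo`, `y_demo`, and `demo_model` are assumptions for illustration) so that it runs on its own; in the notebook the equivalent call would be `loaded_model.predict_proba(new_data)`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up training data (not the heart failure dataset):
# the label is 1 when the single feature is above 0.5
rng = np.random.default_rng(0)
X_demo = rng.random((200, 1))
y_demo = (X_demo[:, 0] > 0.5).astype(int)

demo_model = RandomForestClassifier(random_state=0).fit(X_demo, y_demo)

# One input row gives one row of probabilities; columns follow demo_model.classes_
proba = demo_model.predict_proba(np.array([[0.9]]))
print(demo_model.classes_)  # class labels, in column order
print(proba)                # estimated probability of each class
```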

## Now let's create a web app so that it can be useful :)

Check and run the `webapp.py` code.