<a id="4"></a><h1 style='background:#6daa9f; border:3; color:white'><center> Web application for cardiovascular disease modelling: beginner's guide </center></h1>

<center><img 
src="https://www.verywellhealth.com/thmb/RTA0B0j5XCAn0rNbL98bfimDBms=/800x450/filters:fill(87E3EF,1)/Anim_HeartDisease-053487127daf48fd98162dfdb84206b6.gif" width="900" height="900"></img></center>

<br>

<a id="4"></a><h1 style='background:#7ad16d; border:0; color:black'><center> Table of contents </center></h1>

1. [Introduction](#1)
1. [Data cleaning, exploration and preprocessing](#2)
1. [Basic model building](#3)
1. [Web development](#4)
1. [Acknowledgements](#5)

<a id="1"></a><h1 style='background:#7ad16d; border:0; color:black'><center> Introduction </center></h1>

Cardiovascular disease (CVD) is the most common cause of morbidity and mortality among men and women globally. An estimated 18 million deaths are reported from CVDs annually, representing nearly a third of all global deaths. Most of these deaths (85%) are due to heart attack and stroke. Three in four CVD deaths happen in low- and middle-income countries. Heart failure is a common CVD condition.

The [World Health Organization](https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)) defines CVDs as a group of disorders of the heart and blood vessels including, but not limited to:

* Coronary heart disease
* Cerebrovascular disease
* Peripheral arterial disease
* Rheumatic heart disease
* Congenital heart disease
* Deep vein thrombosis and pulmonary embolism

<a id="2"></a><h1 style='background:#7ad16d; border:0; color:black'><center> Data cleaning, exploration and preprocessing </center></h1>

**Dataset** <br> <br>
**Age**: age in years.<br>
**Sex**: sex (1=male; 0=female).<br>
**Cp**: chest pain type (0 = typical angina; 1 = atypical angina; 2 = non-anginal pain; 3 = asymptomatic).<br>
**Trestbps**: resting blood pressure in mm Hg on admission to the hospital.<br>
**Chol**: serum cholesterol in mg/dl.<br>
**Fbs**: fasting blood sugar > 120 mg/dl (1=true; 0=false).<br>
**Restecg**: resting electrocardiographic results (0=normal; 1=having ST-T wave abnormality; 2=probable or definite left ventricular hypertrophy).<br>
**Thalach**: maximum heart rate achieved.<br>
**Exang**: exercise-induced angina (1=yes; 0=no).<br>
**Oldpeak**: ST depression induced by exercise relative to rest.<br>
**Slope**: the slope of the peak exercise ST segment (0=upsloping; 1=flat; 2=downsloping).<br>
**Ca**: number of major vessels (0–3) colored by fluoroscopy.<br>
**Thal**: thalassemia (3=normal; 6=fixed defect; 7=reversible defect).<br>
**Target**: heart disease (1=no, 2=yes).<br>
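
The coded columns above are easier to read in tables and plots once the integers are mapped back to their labels. The following is a minimal sketch that assumes the binary codes listed above; `CODE_LABELS` and `decode_columns` are hypothetical names introduced for illustration, and the small frame holds made-up demo rows, not rows from the dataset. Only the binary columns are mapped here.

```python
import pandas as pd

# Hypothetical lookup built from the codebook above (not part of the original notebook)
CODE_LABELS = {
    'sex': {0: 'female', 1: 'male'},
    'fbs': {0: 'false', 1: 'true'},
    'exang': {0: 'no', 1: 'yes'},
    'target': {1: 'no', 2: 'yes'},
}

def decode_columns(df, code_labels=CODE_LABELS):
    """Return a copy of df with coded columns replaced by readable labels."""
    decoded = df.copy()
    for column, mapping in code_labels.items():
        if column in decoded.columns:
            # Codes missing from the mapping become NaN, which flags unexpected values
            decoded[column] = decoded[column].map(mapping)
    return decoded

# Made-up demo rows, only to show the call
demo = pd.DataFrame({'age': [63, 41], 'sex': [1, 0], 'fbs': [0, 1],
                     'exang': [0, 1], 'target': [2, 1]})
decoded_demo = decode_columns(demo)
print(decoded_demo)
```

Because the function works on a copy, the numeric frame used for modelling stays unchanged.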

<a id="3"></a><h1 style='background:#7ad16d; border:0; color:black'><center> Basic model building </center></h1>

**Import libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import re

from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, make_scorer

from plotly.offline import iplot
import plotly as py
import plotly.tools as tls

import pickle

In [None]:
heart = pd.read_csv('../input/heart-data/data.csv')
heart.head()

In [None]:
heart.shape
#270 observations and 14 columns/variables in the dataset.

In [None]:
heart.info()
#there are 270 observations in the dataset, with nearly all variables in numeric format.

In [None]:
heart.describe()

In [None]:
#checking missing values
heart.isnull().sum()

In [None]:
#distribution of outcome variable 'target': '1' means No and '2' means Yes for heart disease.
heart['target'].value_counts()

In [None]:
#% of patients who have heart disease
heart['target'].value_counts()/heart.shape[0]*100

In [None]:
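#Pie-chart for visualization of heart disease (1: No, 2: Yes)
#value_counts() sorts by frequency, so the labels are derived from the class codes
counts=heart['target'].value_counts()
labels=[{1:'No', 2:'Yes'}[code] for code in counts.index]
values=counts.values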

sns.set_theme(context='poster')
plt.figure(figsize=(7,7))
plt.title('Heart Diseases', color="Black",fontsize=40)

plt.pie(values, labels=labels, autopct='%1.0f%%')
plt.show()

**Correlation between variables**

In [None]:
#Correlation between variables
sns.set_theme(context='poster')
plt.figure(figsize=(25,25))
plt.title('Correlation between variables', color="Black",fontsize=15)
sns.heatmap(heart.corr(),annot=True,cmap="hot")
plt.show()
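
A heatmap of this size can be hard to scan, so it may help to list the features by the strength of their correlation with the outcome. The sketch below is one possible way to do that; `rank_by_target_correlation` is a hypothetical helper and the demo frame is synthetic data generated only to show the call.

```python
import numpy as np
import pandas as pd

def rank_by_target_correlation(df, target_col='target'):
    """Sort numeric features by absolute Pearson correlation with the target."""
    numeric = df.select_dtypes(include='number')
    corr = numeric.corr()[target_col].drop(target_col)
    # Order by magnitude but keep the sign of each correlation
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Synthetic demo data (assumption): 'a' tracks the target, 'b' is pure noise
rng = np.random.default_rng(0)
demo_target = rng.integers(1, 3, size=200)
demo = pd.DataFrame({
    'a': demo_target + rng.normal(0, 0.1, size=200),
    'b': rng.normal(0, 1, size=200),
    'target': demo_target,
})
ranking = rank_by_target_correlation(demo)
print(ranking)
```

On the notebook's data the call would be `rank_by_target_correlation(heart)`.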

**Age**

In [None]:
# Min, max and average of the age variable
print('Min age: ', min(heart['age']))
print('Max age: ', max(heart['age']))
print('Average age: ', heart['age'].mean())

In [None]:
sns.set_theme(context='poster')
plt.figure(figsize=(12,7))
plt.title('Age distribution', color="Black",fontsize=25)
heart['age'].plot(kind = 'hist',color='orangered')
plt.show()

In [None]:
#Age distribution for those with and without heart disease
sns.set_theme(context='poster')
plt.figure(figsize=(10,7))
plt.title('Age distribution based on heart disease', color="Black",fontsize=25)

#histplot with kde=True replaces the deprecated distplot
sns.histplot(heart[heart['target'] == 1]['age'], kde=True, stat='density', label='Do not have heart disease')
sns.histplot(heart[heart['target'] == 2]['age'], kde=True, stat='density', label = 'Have heart disease')
plt.xlabel('Age')
plt.ylabel('Density')
plt.legend()
plt.show()

**Gender**

In [None]:
sns.set_theme(context='poster')
# Number of males and females
F = heart[heart['sex'] == 0].count()['target']
M = heart[heart['sex'] == 1].count()['target']

# Create a plot
figure, ax = plt.subplots(figsize = (10, 7))
ax.bar(x = ['Female', 'Male'], height = [F, M],color='orangered')
plt.xlabel('Gender')
plt.title('Gender distribution', color="Black",fontsize=25)
plt.show()

**Other variables**
<br>
Chest pain
<br>
Blood pressure
<br>
Cholesterol
<br>
Fasting blood sugar
<br>
Electrocardiographic results
<br>
Maximum heart rate
<br>
Exercise induced angina
<br>
ST depression
<br>
Slope
<br>
Major vessels
<br>
Thalassemia

In [None]:
heart['cp'].value_counts()

In [None]:
# Chest pain types in bar chart
import matplotlib

matplotlib.rc('xtick', labelsize=15) 
labels = ['typical angina', 'atypical angina', 'non-anginal pain', 'asymptomatic']

heart.groupby(heart['cp']).count()['target'].plot(kind = 'bar', figsize = (12, 6),color='orangered')
plt.xlabel('Chest pain types')
# The bars sit at positions 0..3, so the ticks are placed at 0..3 as well
plt.xticks(np.arange(4), labels, rotation = 0)
plt.show()

In [None]:
# Blood pressure distribution
heart['trestbps'].plot(kind = 'hist', title = 'Blood Pressure in mm Hg', figsize = (12, 6), color='orangered')
plt.show()

In [None]:
# Display cholesterol distribution
heart['chol'].plot(kind = 'hist', title = 'Serum Cholesterol in mg/dl', figsize = (12, 6), color='orangered')
plt.show()

In [None]:
# Display fasting blood sugar in bar chart
heart.groupby(heart['fbs']).count()['target'].plot(kind = 'bar', title = 'Fasting blood sugar', figsize = (12, 6), color='orangered')
plt.xticks(np.arange(2), ('fbs <= 120 mg/dl', 'fbs > 120 mg/dl'), rotation = 0)
plt.show()

In [None]:
# Display electrocardiographic results in bar chart
heart.groupby(heart['restecg']).count()['target'].plot(kind = 'bar', title = 'Resting electrocardiographic results', figsize = (12, 6), color='orangered')
plt.xticks(np.arange(3), ('normal', 'ST-T wave abnormality', 'probable or definite left ventricular hypertrophy'), rotation = 0)
plt.show()

In [None]:
# Display maximum heart rate distribution
heart['thalach'].plot(kind = 'hist', title = 'Maximum Heart Rate Achieved', figsize = (12, 6), color='orangered')
plt.show()

In [None]:
# Display exercise induced angina in bar chart
heart.groupby(heart['exang']).count()['target'].plot(kind = 'bar', title = 'Exercise induced angina',  figsize = (12, 6), color='orangered')
plt.xticks(np.arange(2), ('No', 'Yes'), rotation = 0)
plt.show()

In [None]:
# Display ST depression induced by exercise relative to rest distribution
heart['oldpeak'].plot(kind = 'hist', title = 'ST Depression Induced by Exercise Relative to Rest', figsize = (12, 6), color='orangered')
plt.show()

In [None]:
heart['slope'].value_counts()

In [None]:
# Display slope of the peak exercise ST segment in bar chart
matplotlib.rc('xtick', labelsize=15) 
labels = ['upsloping', 'flat', 'downsloping']

heart.groupby(heart['slope']).count()['target'].plot(kind = 'bar', title = 'Slope of the peak exercise ST segment', figsize = (12, 6), color='orangered')
# The bars sit at positions 0..2, so the ticks are placed at 0..2 as well
plt.xticks(np.arange(3), labels, rotation = 0)
plt.show()

In [None]:
# Display number of major vessels in bar chart
heart.groupby(heart['ca']).count()['target'].plot(kind = 'bar', title = 'Number of major vessels colored by fluoroscopy', figsize = (12, 6), color='orangered')
plt.show()

In [None]:
heart['thal'].value_counts()

In [None]:
# Display thalassemia in bar chart
matplotlib.rc('xtick', labelsize=12) 
labels = ['normal', 'fixed defect', 'reversible defect']

heart.groupby(heart['thal']).count()['target'].plot(kind = 'bar', title = 'Thalassemia', figsize=(15,6), color='orangered')
# groupby sorts the codes (3, 6, 7), so the bars sit at positions 0..2 in this label order
plt.xticks(np.arange(3), labels, rotation = 0)
plt.show()

**Correlations**
* Age and heart rate
* Age and CA
* Target, slope and oldpeak

In [None]:
# Age and heart rate
import seaborn as sns
sns.set(rc={'figure.figsize':(20,8.27)})

sns.relplot(x = 'age', y = 'thalach', data = heart, hue = 'target', legend="full", palette="Set2",marker="+",color="g",height=5.27, aspect=11.7/8.27)
plt.title('The correlation between age and heart rate')
plt.show()

In [None]:
# Age and CA
g = sns.catplot(x = 'ca', y = 'age', hue = 'target', data = heart, palette="Set2",height=5.27, aspect=11.7/8.27)
g.fig.suptitle('The correlation between number of major vessels colored by fluoroscopy and age', y = 1.1)
plt.show()

In [None]:
# Target, slope and oldpeak
sns.catplot(x = "slope", y = "oldpeak", hue = "target", data = heart, height=5.27, palette="Set2",kind="swarm",aspect=11.7/8.27)
plt.title('The correlation between oldpeak and slope')
plt.xticks(np.arange(3), ('upsloping', 'flat', 'downsloping'), rotation = 0)
plt.show()

**Modeling** <br>
For the model development part, I will use the following models: i) support vector machine, ii) random forest, iii) AdaBoost, and iv) gradient boosting, to predict cardiovascular risk from the set of predictor variables examined above.

> Steps to go:
> 1. Prepare data for ML 
> 2. Train and evaluate the models
> 3. Examine the important features of the selected model
> 4. Save the model

In [None]:
# 1.1 Initialize data and target
target = heart['target']
features = heart.drop(['target'], axis = 1)

In [None]:
# 1.2 Split the data into training set and testing set
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size = 0.2, random_state = 0)
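
With only 270 rows, a purely random split can leave the two classes in slightly different proportions in the training and test sets. One option is a stratified split through the existing `stratify` parameter of `train_test_split`. The sketch below is an illustration on a synthetic stand-in; the class counts and the `demo_*` and `Xs_*`/`ys_*` names are assumptions made for the demo, not values taken from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for features/target (assumption): made-up class counts
demo_target = pd.Series([1] * 150 + [2] * 120, name='target')
demo_features = pd.DataFrame({'x': np.arange(270)})

# stratify keeps the class proportions (nearly) identical in both parts
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    demo_features, demo_target, test_size=0.2, random_state=0, stratify=demo_target
)

print(ys_train.value_counts(normalize=True))
print(ys_test.value_counts(normalize=True))
```

In the notebook this would amount to adding `stratify = target` to the call above.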

In [None]:
# 2.1 Train and evaluate model
def fit_eval_model(model, train_features, y_train, test_features, y_test):
    """
    Function: train and evaluate a machine learning classifier.
    Args:
      model: machine learning classifier
      train_features: training data features
      y_train: training data labels
      test_features: test data features
      y_test: test data labels
    Return:
      results(dictionary): classification report and confusion matrix on the test set
    """
    results = {}

    # Train the model
    model.fit(train_features, y_train)

    # Predict on the test set
    test_predicted = model.predict(test_features)

    # Classification report and confusion matrix
    results['classification_report'] = classification_report(y_test, test_predicted)
    results['confusion_matrix'] = confusion_matrix(y_test, test_predicted)

    return results

In [None]:
# 2.2 Initialize the models
sv = SVC(random_state = 1)
rf = RandomForestClassifier(random_state = 1)
ab = AdaBoostClassifier(random_state = 1)
gb = GradientBoostingClassifier(random_state = 1)


# Fit and evaluate models
results = {}
for cls in [sv, rf, ab, gb]:
    cls_name = cls.__class__.__name__
    results[cls_name] = {}
    results[cls_name] = fit_eval_model(cls, X_train, y_train, X_test, y_test)

In [None]:
# 2.3 Print classifiers results
for result in results:
    print (result)
    print()
    for i in results[result]:
        print (i, ':')
        print(results[result][i])
        print()
    print ('-----')
    print()

> Based on the results presented above, I pick the gradient boosting classifier for further development.
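
A single 80/20 split on 270 rows gives a fairly noisy estimate, so a cross-validated comparison could be used to double-check the choice. The sketch below is an illustration under stated assumptions: it runs on synthetic data from `make_classification` rather than the heart dataset, and it wraps the SVC in a `StandardScaler` pipeline, which the notebook itself does not do. The printed scores therefore say nothing about the real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 270 rows and 13 features, like the notebook's table
X_demo, y_demo = make_classification(n_samples=270, n_features=13, random_state=0)

candidates = {
    'SVC (scaled)': make_pipeline(StandardScaler(), SVC(random_state=1)),
    'RandomForest': RandomForestClassifier(random_state=1),
    'GradientBoosting': GradientBoostingClassifier(random_state=1),
}

cv_scores = {}
for name, model in candidates.items():
    # 5-fold cross-validated accuracy; each fold fits a fresh clone of the model
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring='accuracy')
    cv_scores[name] = scores.mean()
    print(f'{name}: {scores.mean():.3f} +/- {scores.std():.3f}')
```

With the real data, `X_demo` and `y_demo` would be replaced by `features` and `target`.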

In [None]:
# 3.2 Get the important features 
importance = gb.feature_importances_
# summarize feature importance
for i,v in enumerate(importance):
    print('Feature: %s, Score: %.5f' % (features.columns[i], v))
# plot feature importance, labelling each bar with its feature name
plt.bar(features.columns, importance, color='r')
plt.xticks(rotation = 90)
plt.tight_layout()
plt.show()

In [None]:
# 4. Save the model as serialized object pickle
with open('model_heart.pkl', 'wb') as file:
    pickle.dump(gb, file)
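
Before the pickle is handed to the web application, it can be worth checking that the saved model loads and predicts the same way as the in-memory one. The following is a minimal round-trip sketch; the model is trained on synthetic data and written to a temporary file, and `demo_model`, `demo_path` and `reloaded` are hypothetical names used only here.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the trained model (assumption)
X_demo, y_demo = make_classification(n_samples=100, n_features=13, random_state=0)
demo_model = GradientBoostingClassifier(random_state=1).fit(X_demo, y_demo)

# Write to a temporary path so the sketch does not overwrite model_heart.pkl
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_model.pkl')
with open(demo_path, 'wb') as file:
    pickle.dump(demo_model, file)

with open(demo_path, 'rb') as file:
    reloaded = pickle.load(file)

# The reloaded model should give identical predictions
same_predictions = np.array_equal(demo_model.predict(X_demo), reloaded.predict(X_demo))
print('Round trip OK:', same_predictions)
```

A pickled scikit-learn model is generally only reliable when loaded with the same scikit-learn version that saved it, so the web application's environment should match the notebook's.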

<a id="4"></a><h1 style='background:#7ad16d; border:0; color:black'><center> Web development </center></h1>

Using the model developed above, I will now build a web application that serves predictions from the selected model. The code for the web application is included below as markdown.

    # Load dependencies
    import numpy as np
    import pickle
    from flask import Flask, request, render_template

    # Load the ML model saved above
    with open('model_heart.pkl', 'rb') as file:
        model = pickle.load(file)

    # Create the application
    app = Flask(__name__)

    # Bind the home function to the root URL
    @app.route('/')
    def home():
        return render_template('Heart Disease Classifier.html')

    # Bind the predict function to the /predict URL
    @app.route('/predict', methods=['POST'])
    def predict():
        # Put all form entry values in a list (in form-field order)
        features = [float(i) for i in request.form.values()]
        # Convert the features to a 2D array with a single row
        array_features = np.array([features])
        # Predict the class for this row and take the single result
        output = model.predict(array_features)[0]

        # Check the output value and return the result to the HTML template
        if output == 1:
            return render_template('Heart Disease Classifier.html',
                                   result = 'Heart disease - Unlikely!')
        else:
            return render_template('Heart Disease Classifier.html',
                                   result = 'Heart disease - Likely!')

    if __name__ == '__main__':
        # Run the application
        app.run()
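
The `predict` route reads `request.form.values()`, so it relies on the HTML form fields appearing in the same order as the training columns. A more defensive option is to build the feature row by field name. The sketch below assumes the form fields are named after the dataset columns and that `FEATURE_ORDER` matches the training column order; both are assumptions, and `form_to_features` is a hypothetical helper that is not part of the original application.

```python
# Assumed column order; in the notebook this would come from list(features.columns)
FEATURE_ORDER = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg',
                 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']

def form_to_features(form, feature_order=FEATURE_ORDER):
    """Build a feature row from a mapping of form fields, in training-column order."""
    missing = [name for name in feature_order if name not in form]
    if missing:
        raise ValueError(f'Missing form fields: {missing}')
    # float() raises ValueError on non-numeric input, which the caller can report
    return [float(form[name]) for name in feature_order]

# A plain dict stands in for request.form here; the values are made up
demo_form = {name: '1' for name in FEATURE_ORDER}
demo_form['age'] = '54'
demo_row = form_to_features(demo_form)
print(demo_row)
```

Inside the route, `form_to_features(request.form)` could replace the list comprehension, with the `ValueError` caught to show a message in the template.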

📌 For the full code and the HTML template of the web application, please see my [GitHub repository](https://github.com/shivarajmishra/cvdwebapp-py).

Click 👉 [**Web Application**](https://cvdwebapp-py.herokuapp.com/) 

<center><img 
src="https://github.com/shivarajmishra/cvdwebapp-py/raw/main/screenrecording%20(3).gif" width="900" height="900"></img></center>

<br>

<a id="5"></a><h1 style='background:#7ad16d; border:0; color:black'><center> Acknowledgements</center></h1>

I would like to thank my fellow Kagglers [@rahulgulia](https://www.kaggle.com/rahulgulia/datascience-tackling-heart-diseases) and [@taylormartin94](https://www.kaggle.com/taylormartin94/cardio-disease-model-w-web-application) for the inspiration behind this analysis. Many thanks also to [Analytics Vidhya](https://www.analyticsvidhya.com/blog/2020/09/web-application/) for providing excellent resources for the web application.

<center><img 
src="https://cdn.dribbble.com/users/1277402/screenshots/4180449/heartwalk.gif" width="900" height="900"></img></center>

<br>