<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="300" alt="cognitiveclass.ai logo">
</center>

# **Heart Failure Prediction**

# Lab 6. Forecasting Heart Disease from Medical Measurements

# Abstract
This lab gives you hands-on experience using Python, pandas, and scikit-learn to analyze medical data and predict whether a patient has heart disease from their clinical measurements. You will prepare healthcare data, train and compare classification models, and refine their predictions to improve accuracy. The lab brings together everything you have learned in this course.

Estimated time needed: **30** minutes

## Objectives

After completing this lab you will be able to:

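*   Prepare and analyze a medical dataset with pandas
*   Encode and scale features with a column transformer
*   Evaluate a logistic regression model and handle class imbalance with over-sampling
*   Compare ensembles of classifiers and decision trees
*   Write your own functions to build ensembles and make predictions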


<h2>Table of Contents</h2>

<div class="alert alert-block alert-info" style="margin-top: 20px">
<ol>
    <li><a href="#import">Importing the Data</a></li>
    <li><a href="#prep">Data Preparation and Analysis</a></li>
    <li><a href="#logistic">Logistic Regression evaluation and over-sampling</a></li>
    <li><a href="#ensemble">Ensemble of Classifiers</a></li>
    <li><a href="#dec_tree">Decision Tree</a></li>
    <li><a href="#pred">Prediction using your own functions</a></li>
</ol>

</div>

<hr>

<p>
You can find the "Heart Disease Dataset UCI" from the following link: <br><a href="https://www.kaggle.com/datasets/ketangangal/heart-disease-dataset-uci" target="_blank">https://www.kaggle.com/datasets/ketangangal/heart-disease-dataset-uci</a>. <br><br>
The data were obtained from <a href="https://www.kaggle.com/datasets/ketangangal/heart-disease-dataset-uci" target="_blank">https://www.kaggle.com/datasets/ketangangal/heart-disease-dataset-uci</a> under the <a href="https://creativecommons.org/publicdomain/zero/1.0/" target="_blank">CC0: Public Domain</a> license. <br><br>
We will use this dataset in this lab. It contains medical information about patients who may suffer from heart disease. Its columns are similar to those of our previous dataset, except for the <code>target</code> column, which indicates whether the patient has heart disease. It is an integer: 0 = no disease and 1 = disease.
</p>

You will need the following libraries:


In [ ]:
!pip install dython
!conda install --yes -c conda-forge imbalanced-learn

In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from dython.nominal import associations
from sklearn.preprocessing import OrdinalEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.compose import make_column_transformer
from sklearn.model_selection import train_test_split
from imblearn.pipeline import make_pipeline  # like sklearn's make_pipeline, but also accepts samplers
from imblearn.over_sampling import RandomOverSampler
# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay replaces it
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier, plot_tree
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_val_predict

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
If an error appears, restart the kernel or run this cell again.
</div>

## 1. Importing the Data <a id="import"></a>


The dataset is read directly from the URL below. If you are running the notebook locally with a downloaded copy of the file, change `path` to point to your local CSV file instead.


In [ ]:
path = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-GPXX0N40EN/heart_disease.csv'

Load the CSV file:


In [ ]:
df = pd.read_csv(path)

Display floating-point numbers with two decimal places:

In [ ]:
pd.options.display.float_format = '{:.2f}'.format

We use the method <code>head()</code> to display the first 5 rows of the dataframe:


In [ ]:
df.head()

<details>
<summary><b>Click to see attribute information</b></summary>

Input features (column names):

1. `age` - patient's age in years
2. `sex` - patient's sex ('Male', 'Female')
3. `chest_pain_type` - chest pain type ('typical angina', 'atypical angina', 'non-anginal pain', 'asymptomatic')
4. `resting_blood_pressure` - resting blood pressure (mm Hg)
5. `cholestoral` - serum cholesterol in mg/dl
6. `fasting_blood_sugar` - whether fasting blood sugar is > 120 mg/dl
7. `rest_ecg` - resting electrocardiographic results ('normal', 'ST-T wave abnormality', 'Left ventricular hypertrophy')
8. `Max_heart_rate` - maximum heart rate achieved
9. `exercise_induced_angina` - exercise-induced angina ('Yes', 'No')
10. `oldpeak` - ST depression induced by exercise relative to rest
11. `slope` - the slope of the peak exercise ST segment ('Upsloping', 'Downsloping', 'Flat')
12. `vessels_colored_by_flourosopy` - number of major vessels colored by fluoroscopy
13. `thalassemia` - thalassemia status (normal, fixed defect, or reversible defect)

Output feature (desired target):

14. `target` - does the patient have heart disease? (0 = no disease, 1 = disease)
</details>

## 2. Data Preparation and Analysis <a id="prep"></a>

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 1 </h1>
<b> Display the data types of each column using the attribute `dtypes`.</b>
</div>

In [ ]:
# Write your code below and press Shift+Enter to execute


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 2 </h1>
<b> Check whether the dataset contains any NaN values.</b>
</div>
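A minimal sketch of the NaN check, assuming a small hypothetical DataFrame that reuses a few of the lab's column names (the values are made up, and one is deliberately missing). `isna().sum()` counts missing values per column, and `.any().any()` collapses that to a single yes/no answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lab's DataFrame; one value is deliberately missing
df = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "sex": ["Male", "Male", "Female", "Male"],
    "cholestoral": [233.0, np.nan, 204.0, 236.0],
})

# Number of missing values in each column
nan_per_column = df.isna().sum()
print(nan_per_column)

# Single True/False answer for the whole DataFrame
has_nan = bool(df.isna().any().any())
print("Contains NaN:", has_nan)
```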

In [ ]:
# Write your code below and press Shift+Enter to execute


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 3 </h1>
<b>Check the correlation between numerical columns and the association between categorical (object) columns for each pair of columns.</b>
</div>
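The lab imports `associations()` from dython, which handles both kinds of pairs in one call. As a library-independent sketch of what the two measures mean, the code below computes Pearson correlation for numerical pairs with pandas and Cramér's V for categorical pairs with SciPy's chi-squared test. It assumes a randomly generated stand-in DataFrame with some of the lab's column names; dython's defaults (for example, a bias correction of Cramér's V) may give slightly different numbers.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical random stand-in for the lab's DataFrame (NOT the real data)
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.integers(30, 78, n),
    "cholestoral": rng.normal(245, 50, n).round(),
    "Max_heart_rate": rng.normal(150, 22, n).round(),
    "sex": rng.choice(["Male", "Female"], n),
    "chest_pain_type": rng.choice(
        ["typical angina", "atypical angina", "non-anginal pain", "asymptomatic"], n
    ),
})
num_cols = ["age", "cholestoral", "Max_heart_rate"]
cat_cols = ["sex", "chest_pain_type"]

# Pearson correlation between every pair of numerical columns
num_corr = df[num_cols].corr()

def cramers_v(x, y):
    """Cramér's V between two categorical Series (0 = no association, 1 = perfect)."""
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table, correction=False)[0]
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (table.to_numpy().sum() * k))) if k > 0 else 0.0

# Cramér's V between every pair of categorical columns
cat_assoc = pd.DataFrame(
    [[cramers_v(df[a], df[b]) for b in cat_cols] for a in cat_cols],
    index=cat_cols,
    columns=cat_cols,
)
print(num_corr.round(2))
print(cat_assoc.round(2))
```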

In [ ]:
# Write your code below and press Shift+Enter to execute


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 4 </h1>
<b>Divide the dataset into input features and the target variable.</b>
</div>
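A minimal sketch, assuming the label column is named `target` as in the attribute list above; the small DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical miniature version of the lab's DataFrame
df = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "sex": ["Male", "Male", "Female", "Male"],
    "Max_heart_rate": [150, 187, 172, 178],
    "target": [1, 1, 0, 0],
})

# Inputs: every column except the target; output: the target column alone
X = df.drop(columns="target")
y = df["target"]
print(X.columns.tolist(), y.name)
```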

In [ ]:
# Write your code below and press Shift+Enter to execute


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 5 </h1>
<b>Create a column transformer using `OrdinalEncoder()` and `StandardScaler()`, and visualize it.</b>
</div>
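A minimal sketch, assuming the categorical and numerical column lists shown here (a subset of the lab's columns) and hypothetical values. `OrdinalEncoder()` assigns integer codes in sorted order, which imposes an order that nominal features such as `chest_pain_type` do not really have; tree-based models are largely unaffected, while linear models treat the codes as ordered numbers.

```python
import numpy as np
import pandas as pd
from sklearn import set_config
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Hypothetical input features (values are illustrative only)
X = pd.DataFrame({
    "age": [63, 37, 41, 56, 57],
    "sex": ["Male", "Male", "Female", "Male", "Female"],
    "chest_pain_type": ["typical angina", "non-anginal pain", "atypical angina",
                        "atypical angina", "asymptomatic"],
    "cholestoral": [233, 250, 204, 236, 354],
})
cat_cols = ["sex", "chest_pain_type"]
num_cols = ["age", "cholestoral"]

# Categorical columns -> integer codes; numerical columns -> zero mean, unit variance
preprocessor = make_column_transformer(
    (OrdinalEncoder(), cat_cols),
    (StandardScaler(), num_cols),
)
X_t = preprocessor.fit_transform(X)  # output columns: sex, chest_pain_type, age, cholestoral
print(X_t.round(2))

# In Jupyter, a cell that ends with the transformer object renders it as a diagram
set_config(display="diagram")
preprocessor
```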

In [ ]:
# Write your code below and press Shift+Enter to execute


## 3. Logistic Regression evaluation and over-sampling <a id="logistic"></a>

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 6 </h1>
<b>Split the data into training and test sets, using 30% of the data for testing (`test_size=0.3`).</b>
</div>
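A sketch assuming `test_size=0.3` (30% test, 70% train) is the intended reading of the question. `random_state` and `stratify` are optional choices added here for reproducibility and to keep the class balance similar in both splits; the data are hypothetical random values.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical features and labels for 100 patients
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "age": rng.integers(30, 78, 100),
    "Max_heart_rate": rng.integers(90, 200, 100),
})
y = pd.Series(rng.integers(0, 2, 100), name="target")

# 30% of rows go to the test set; stratify keeps the class ratio similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(len(X_train), len(X_test))
```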

In [ ]:
# Write your code below and press Shift+Enter to execute


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 7 </h1>
<b>Create a pipeline with a `LogisticRegression()` model and report its accuracy and recall.</b>
</div>
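A minimal sketch on randomly generated stand-in data. It uses `sklearn.pipeline.make_pipeline`; the lab's imports use `imblearn.pipeline.make_pipeline`, which works the same way for these steps and additionally accepts resampling steps such as `RandomOverSampler()`. Recall is reported for the positive class (`1` = disease), which is the default `pos_label` of `recall_score`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Hypothetical random stand-in data; the label loosely depends on Max_heart_rate
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "age": rng.integers(30, 78, n),
    "sex": rng.choice(["Male", "Female"], n),
    "Max_heart_rate": rng.normal(150, 22, n).round(),
})
# 20% of labels are flipped at random to add noise
y = ((X["Max_heart_rate"] > 150) ^ (rng.random(n) < 0.2)).astype(int).rename("target")

preprocessor = make_column_transformer(
    (OrdinalEncoder(), ["sex"]),
    (StandardScaler(), ["age", "Max_heart_rate"]),
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Preprocessing and model are fitted together, so the scaler only ever sees training data
pipe = make_pipeline(preprocessor, LogisticRegression())
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)  # share of actual disease cases (label 1) that were caught
print(f"Accuracy: {accuracy:.2f}  Recall: {recall:.2f}")
```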

In [ ]:
# Write your code below and press Shift+Enter to execute


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 8 </h1>
<b>Calculate cross-validation scores using 4 folds, report their mean and standard deviation, and predict the output with cross-validation.</b>
</div>
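A sketch assuming a scaler-plus-logistic-regression pipeline and synthetic data from `make_classification` in place of the lab's features, so the cell runs on its own. `cross_val_score` uses accuracy for classifiers by default, and with `cv=4` scikit-learn uses stratified 4-fold splitting for classifiers. `cross_val_predict` returns out-of-fold predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for the lab's features and target
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
pipe = make_pipeline(StandardScaler(), LogisticRegression())

# Accuracy on each of the 4 folds
scores = cross_val_score(pipe, X, y, cv=4)
print("Fold accuracies:", scores.round(2))
print(f"Mean: {scores.mean():.2f}  Std: {scores.std():.2f}")

# Each sample is predicted by a model that did not see it during training
y_cv_pred = cross_val_predict(pipe, X, y, cv=4)
print("Cross-validated accuracy:", round(float(np.mean(y_cv_pred == y)), 2))
```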

In [ ]:
# Write your code below and press Shift+Enter to execute


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 9 </h1>
<b>Plot the confusion matrix to evaluate the correctness of the classification.</b>
</div>

In [ ]:
# Write your code below and press Shift+Enter to execute


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 10 </h1>
<b>Check whether the classes in the target column have similar numbers of samples; if they do not, use `RandomOverSampler()`.</b>
</div>
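`RandomOverSampler()` from imbalanced-learn balances classes by randomly duplicating rows of the smaller class. To show the mechanism without the extra dependency, the sketch below does the same thing by hand with `sklearn.utils.resample` on a hypothetical imbalanced training table. With imbalanced-learn installed, the equivalent call is `RandomOverSampler(random_state=0).fit_resample(X_train, y_train)`, or the sampler can be placed as a step in `imblearn.pipeline.make_pipeline` so that it runs only while fitting. In either case, over-sample only the training split, so the test set still reflects the real class distribution.

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample

# Hypothetical imbalanced training data: roughly 25% of patients have label 1
rng = np.random.default_rng(0)
n = 200
train = pd.DataFrame({
    "age": rng.integers(30, 78, n),
    "target": (rng.random(n) < 0.25).astype(int),
})
counts = train["target"].value_counts()
print(counts)  # clearly unequal class sizes -> over-sample

max_count = counts.max()
parts = []
for label, group in train.groupby("target"):
    if len(group) < max_count:
        # Draw extra rows with replacement from the smaller class until it matches the largest
        extra = resample(group, replace=True, n_samples=max_count - len(group), random_state=0)
        group = pd.concat([group, extra])
    parts.append(group)
balanced = pd.concat(parts)
print(balanced["target"].value_counts())
```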

In [ ]:
# Write your code below and press Shift+Enter to execute


## 4. Ensemble of Classifiers <a id="ensemble"></a>

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 11 </h1>
<b>Test different classifiers, including `VotingClassifier()`, and calculate their accuracy.</b>
</div>
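A sketch of the comparison loop, assuming synthetic data from `make_classification` in place of the preprocessed lab features and an illustrative choice of four base models. `voting="hard"` takes a majority vote of the predicted labels; `voting="soft"` would average predicted probabilities instead, which requires every base model to support `predict_proba` (for `SVC`, that means `probability=True`).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data standing in for the preprocessed lab features
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

base = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svc", SVC(random_state=0)),
]
# Hard voting: each base model casts one vote and the majority class wins
# (VotingClassifier clones its estimators, so reusing the same instances is safe)
candidates = base + [("voting", VotingClassifier(estimators=base, voting="hard"))]

accuracies = {}
for name, clf in candidates:
    pipe = make_pipeline(StandardScaler(), clf)  # fresh scaler for each model
    pipe.fit(X_train, y_train)
    accuracies[name] = pipe.score(X_test, y_test)

for name, acc in accuracies.items():
    print(f"{name:>8}: {acc:.2f}")
```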

In [ ]:
# Write your code below and press Shift+Enter to execute


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 12 </h1>
<b>Compare the accuracy of the classifiers and visualize it with a plot.</b>
</div>
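A minimal sketch: a hypothetical helper `plot_accuracies()` that takes a dictionary of model names and accuracies (like the one built when testing the classifiers) and draws a sorted horizontal bar chart. The accuracies in the usage part come from synthetic `make_classification` data, not from the lab's dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def plot_accuracies(accuracies):
    """Horizontal bar chart of {model name: accuracy}, best model at the top."""
    s = pd.Series(accuracies).sort_values()
    ax = s.plot.barh(color="steelblue", figsize=(6, 3))
    ax.set_xlim(0, 1.1)
    ax.set_xlabel("Test accuracy")
    ax.set_title("Classifier comparison")
    for i, value in enumerate(s):
        ax.text(value + 0.01, i, f"{value:.2f}", va="center")  # label each bar
    plt.tight_layout()
    return ax

# Hypothetical scores computed on synthetic data so the example is self-contained
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
accuracies = {name: m.fit(X_train, y_train).score(X_test, y_test) for name, m in models.items()}
ax = plot_accuracies(accuracies)
plt.show()
```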

In [ ]:
# Write your code below and press Shift+Enter to execute


## 5. Decision Tree <a id="dec_tree"></a>

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 13 </h1>
<b>Create a pipeline based on a decision tree, then calculate and visualize its accuracy. Use `max_depth=3` so that the tree's nodes remain readable.</b>
</div>
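A sketch on random stand-in data with a few of the lab's column names. Scaling is not needed for a decision tree, but keeping the same column transformer lets the tree pipeline accept the same raw DataFrame as the other models. Comparing training and test accuracy is a quick check for over-fitting.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical random stand-in data; the label loosely depends on Max_heart_rate
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "age": rng.integers(30, 78, n),
    "sex": rng.choice(["Male", "Female"], n),
    "Max_heart_rate": rng.normal(150, 22, n).round(),
})
y = ((X["Max_heart_rate"] > 150) ^ (rng.random(n) < 0.2)).astype(int).rename("target")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

preprocessor = make_column_transformer(
    (OrdinalEncoder(), ["sex"]),
    (StandardScaler(), ["age", "Max_heart_rate"]),
)
# max_depth=3 keeps the tree small enough to read when plotted
tree_pipe = make_pipeline(preprocessor, DecisionTreeClassifier(max_depth=3, random_state=0))
tree_pipe.fit(X_train, y_train)

train_acc = tree_pipe.score(X_train, y_train)
test_acc = tree_pipe.score(X_test, y_test)
print(f"Train accuracy: {train_acc:.2f}  Test accuracy: {test_acc:.2f}")
```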

In [ ]:
# Write your code below and press Shift+Enter to execute


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 14 </h1>
<b>Visualize the Decision Tree using `plot_tree` function.</b>
</div>

In [ ]:
# Write your code below and press Shift+Enter to execute


## 6. Prediction using your own functions <a id="pred"></a>

<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 15 </h1>
<b>Create a function called `create_ensemble()` that builds an ensemble from a given number of classifiers. Make a pipeline with it, fit it, and calculate its accuracy.</b>
</div>
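One possible reading of the task, sketched as a hypothetical `create_ensemble()` that accepts any number of classifiers and wraps them in a `VotingClassifier` with auto-generated names (estimator names must be unique). The signature, the choice of base models, and the synthetic `make_classification` data are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

def create_ensemble(classifiers, voting="hard"):
    """Hypothetical helper: wrap a list of classifiers in a VotingClassifier.

    Each estimator gets a unique name built from its class name and position.
    """
    if not classifiers:
        raise ValueError("create_ensemble() needs at least one classifier")
    estimators = [(f"{type(clf).__name__.lower()}_{i}", clf) for i, clf in enumerate(classifiers)]
    return VotingClassifier(estimators=estimators, voting=voting)

# Usage on hypothetical synthetic data
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
ensemble = create_ensemble([
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(max_depth=4, random_state=0),
    GradientBoostingClassifier(random_state=0),
])
pipe = make_pipeline(StandardScaler(), ensemble)
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
print(f"Ensemble accuracy: {accuracy:.2f}")
```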

In [ ]:
# Write your code below and press Shift+Enter to execute


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 16 </h1>
<b>Create a `make_prediction()` function that returns whether a patient has heart disease. Its inputs should be a DataFrame and a classifier.</b>
</div>
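A sketch of one possible `make_prediction()`, assuming the classifier has already been fitted (for example, a pipeline from the earlier questions) and that label `1` means disease, as stated in the dataset description. The helper name follows the question, but its body, the stand-in training data, and the patient values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

def make_prediction(data, clf):
    """Hypothetical helper: return a readable verdict for each patient (row) in `data`.

    `clf` must be an already-fitted classifier or pipeline that accepts `data`
    with the same columns it was trained on; label 1 means heart disease.
    """
    labels = clf.predict(data)
    return ["Heart disease" if label == 1 else "No heart disease" for label in labels]

# Fit a small pipeline on random stand-in data so the example runs on its own
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "age": rng.integers(30, 78, n),
    "sex": rng.choice(["Male", "Female"], n),
    "Max_heart_rate": rng.normal(150, 22, n).round(),
})
y = ((X["Max_heart_rate"] > 150) ^ (rng.random(n) < 0.2)).astype(int).rename("target")
model = make_pipeline(
    make_column_transformer(
        (OrdinalEncoder(), ["sex"]),
        (StandardScaler(), ["age", "Max_heart_rate"]),
    ),
    LogisticRegression(),
).fit(X, y)

# One hypothetical patient, with the same columns (in the same order) as the training data
patient = pd.DataFrame([{"age": 54, "sex": "Male", "Max_heart_rate": 165}])
print(make_prediction(patient, model))
```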

In [ ]:
# Write your code below and press Shift+Enter to execute


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 17 </h1>
<b>Create a new ensemble from your own list of classifiers using the first function.</b>
</div>

In [ ]:
# Write your code below and press Shift+Enter to execute


<div class="alert alert-danger alertdanger" style="margin-top: 20px">
<h1> Question 18 </h1>
<b>Predict the output with your own data, using the second function and the ensemble you just obtained.</b>
</div>

In [ ]:
# Write your code below and press Shift+Enter to execute


<b>Sources</b>


<a href="https://www.kaggle.com/datasets/ketangangal/heart-disease-dataset-uci" target="_blank">https://www.kaggle.com/datasets/ketangangal/heart-disease-dataset-uci</a>.

### Thank you for completing this lab!

## Author

<a href="https://author.skills.network/instructors/bohdan_kuno">Bohdan Kuno</a>

### Other Contributors

<a href="https://author.skills.network/instructors/yaroslav_vyklyuk_2">Prof. Yaroslav Vyklyuk, DrSc, PhD</a>

<a href="https://author.skills.network/instructors/nataliya_boyko">Assoc. Prof. Nataliya Boyko, PhD</a>

## Change Log

| Date (YYYY-MM-DD) | Version | Changed By | Change Description                                         |
| ----------------- | ------- | ---------- | ---------------------------------------------------------- |
|2023-04-01|01|Bohdan Kuno|Lab created|


<hr>

<h3 align="center"> © IBM Corporation 2023. All rights reserved. </h3>
