## Heart Attack Prediction ❤🔮

![Heart Attack](https://st2.depositphotos.com/1006472/9686/i/600/depositphotos_96861070-stock-photo-severe-heartache-man-suffering-from.jpg)

A heart attack is a blockage of blood flow to the heart muscle, and it is a medical emergency. A **`heart attack`** usually occurs when a blood clot blocks blood flow to the heart. Without blood, tissue loses oxygen and dies.
Symptoms include tightness or pain in the chest, neck, back or arms, as well as fatigue, lightheadedness, abnormal heartbeat and anxiety. Women are more likely to have atypical symptoms than men.
Treatment ranges from lifestyle changes and cardiac rehabilitation to medication, stents and bypass surgery.


In this notebook, I use a **`Decision Tree`** to classify whether a subject is likely to have a heart attack.
The notebook is also a step-by-step guide to implementing the **`DecisionTreeClassifier()`** model.

### Step1: Importing Dataframe and Analyzing 💻🔍

In [None]:
#Importing libraries 📚

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import plot_tree
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV
warnings.filterwarnings("ignore")

In [None]:
# Reading the dataset and having a look at the first 5 rows of the dataframe...

df = pd.read_csv("../input/heart-attack-analysis-prediction-dataset/heart.csv")
df.head()

In [None]:
# Checking the number of rows and columns...

df.shape

In [None]:
# Checking for any NULL values...

df.info()

- There are no NULL values in this dataframe.
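`df.info()` reports non-null counts per column, which takes some reading. A more direct check is to count missing values per column. The sketch below uses a small made-up frame (the column names mirror the dataset, the values are illustrative and not taken from `heart.csv`):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; values are made up for illustration.
toy = pd.DataFrame({
    "age": [63, 37, np.nan, 56],
    "chol": [233, 250, 204, np.nan],
    "output": [1, 1, 0, 0],
})

# Number of missing values in each column.
null_counts = toy.isnull().sum()
print(null_counts)

# True only if the whole frame is free of missing values.
has_no_nulls = not toy.isnull().values.any()
print(has_no_nulls)
```

On the real dataframe, `df.isnull().sum()` should print zeros for every column if the observation above holds.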

In [None]:
# Descriptive statistics of the numerical columns, with extra percentiles to spot potential outliers...

df.describe(percentiles=[0.25,0.5,0.75,0.90,0.95,0.98,0.99])

### Step2: Performing a brief Exploratory Data Analysis (EDA) 📊📉📈

In [None]:
# Visualizing the feature: "age"

sns.boxplot(x=df["age"])
plt.show()

- No outliers detected.

In [None]:
# Understanding the feature: "sex"

df['sex'].value_counts()

In [None]:
# Visualizing the feature: "sex" Assuming 0=Female and 1=Male (No Metadata was provided for gender)

sns.barplot(x = df["sex"], y = df["output"])
plt.xticks(ticks=[0,1], labels=['Female','Male'])
plt.show()

- The majority of people who have heart disease in this sample are male. Note that each bar shows the mean of `output` within a sex (a rate), not a count.
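Because `sns.barplot` plots the mean of `output` per group, it is worth looking at counts and rates side by side. This is a minimal sketch on made-up data (the numbers are hypothetical and only illustrate that the two views can disagree):

```python
import pandas as pd

# Hypothetical toy data: 0 = female, 1 = male (same assumption as above).
toy = pd.DataFrame({
    "sex":    [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "output": [1, 1, 0, 1, 1, 1, 0, 0, 0, 0],
})

# Raw counts of each outcome per sex.
counts = pd.crosstab(toy["sex"], toy["output"])

# Share of positive cases within each sex (what the bar heights represent).
rates = toy.groupby("sex")["output"].mean()

print(counts)
print(rates)
```

In this toy example, males have more positive cases in absolute terms while females have the higher rate. Running the same two lines on `df` would show which of the two holds for the real data.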

In [None]:
# Visualizing the feature: "trtbps"

sns.boxplot(x=df["trtbps"])
plt.show()

- The resting blood pressure feature contains some outliers.
- Since they are not severe, we leave them in.

In [None]:
# Visualizing the feature: "chol"

sns.boxplot(x=df["chol"])
plt.show()

- Here, we can see that cholesterol has one very high outlier.
- In this case, we need to remove it.

In [None]:
# Removing the top 1 percent (rows above the 99th percentile)...

chol_cap = df["chol"].quantile(0.99)
df = df[df["chol"] <= chol_cap]
sns.boxplot(x=df["chol"])
plt.show()

- Now, we can observe that no severe outliers remain.
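The points a box plot draws as outliers are those beyond 1.5 × IQR from the quartiles, which is a different rule from the 99th-percentile cap used here. The sketch below compares the two on a made-up series (the values are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical cholesterol-like values with one extreme reading.
chol = pd.Series([180, 200, 210, 220, 230, 240, 250, 260, 280, 564])

# Box-plot rule: points beyond 1.5 * IQR from the quartiles are flagged.
q1, q3 = chol.quantile(0.25), chol.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = chol[(chol < lower) | (chol > upper)]

# Percentile cap: keep rows at or below the 99th percentile.
cap = chol.quantile(0.99)
kept = chol[chol <= cap]

print(len(iqr_outliers), len(kept))
```

The percentile cap always drops roughly the top 1% of rows, whether or not they are extreme, while the IQR rule only flags points that sit far from the bulk of the data.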

In [None]:
# Understanding the feature: "fbs"

df['fbs'].value_counts()

In [None]:
# Visualizing the feature: "fbs"

sns.barplot(x = df["fbs"], y = df["output"])
plt.xticks(ticks=[0,1], labels=['False','True'])
plt.show()

- Here, we do not observe any notable difference between the two groups.

In [None]:
# Visualizing the feature: "restecg"

sns.boxplot(x=df["restecg"])
plt.show()

- No outliers detected.

In [None]:
# Visualizing the feature: "thalachh"

sns.boxplot(x=df["thalachh"])
plt.show()

- Here, we see one mild outlier in the feature. It is not extreme, so we keep it.

In [None]:
# Understanding the feature: "exng"

df['exng'].value_counts()

In [None]:
# Visualizing the feature: "exng"

sns.barplot(x = df["exng"], y = df["output"])
plt.xticks(ticks=[0,1], labels=['No','Yes'])
plt.show()

- Subjects with exercise-induced angina show a lower rate of heart attack in this data.

In [None]:
# Visualizing the feature: "oldpeak"

sns.boxplot(x=df["oldpeak"])
plt.show()

- We can clearly see some outliers, and they need to be handled.

In [None]:
# Removing the top 1 percent (rows above the 99th percentile)...

oldpeak_cap = df["oldpeak"].quantile(0.99)
df = df[df["oldpeak"] <= oldpeak_cap]
sns.boxplot(x=df["oldpeak"])
plt.show()

### Step3: Performing the 🚂 train_test_split and Building the Model 🤖 Decision Tree 🌳

In [None]:
# Separating the target (y) and the independent (X) features...

y = df.pop("output")
X = df

In [None]:
# Performing the train_test_split...

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state = 42)

In [None]:
# Verifying the split...

X_train.shape, y_train.shape
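Besides the shapes, it can help to check that both parts keep a similar class balance. The notebook's split does not stratify, so the following is only an optional variant, sketched on made-up labels (`X_toy` and `y_toy` are hypothetical):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 70 negatives, 30 positives.
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([0] * 70 + [1] * 30)

# stratify=y_toy keeps the class ratio the same in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, train_size=0.7, random_state=42, stratify=y_toy
)

print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```

On the real data, comparing `y_train.mean()` with `y_test.mean()` gives the same kind of check without changing the split.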

In [None]:
# Building a baseline Decision Tree as our first model. We do not tune it here; we only set max_depth to 3.
# random_state is fixed so that the tree is reproducible.

dt = DecisionTreeClassifier(max_depth = 3, random_state = 42)

dt.fit(X_train, y_train)

In [None]:
# Visualizing the result from the Decision Tree

plt.figure(figsize=(60,30))
plot_tree(dt, feature_names = list(X.columns), class_names=['No Disease', "Disease"], filled=True);

- From the diagram above, we can observe that people with a chest pain category of 1 or above are at high risk of a heart attack.
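A large tree figure can be hard to read. As a complement, scikit-learn's `export_text` prints the same split rules as plain text. This is a minimal sketch on synthetic data (the data and the feature names `f0` to `f3` are hypothetical stand-ins, not the heart dataset):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data; the feature names are hypothetical.
X_toy, y_toy = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=42
)
names = ["f0", "f1", "f2", "f3"]

tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_toy, y_toy)

# Plain-text rendering of the split rules, one line per node.
rules = export_text(tree, feature_names=names)
print(rules)
```

With the notebook's model, the equivalent call would be `export_text(dt, feature_names=list(X.columns))`.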

In [None]:
# Finding the y_train_pred and the y_test_pred... 🔮

y_train_pred = dt.predict(X_train)
y_test_pred = dt.predict(X_test)

In [None]:
# Evaluating the model: (Confusion Matrix) 🤔😥⁉

print("Confusion Matrix for training set:\n")
print(confusion_matrix(y_train, y_train_pred))
print('*'*20)
print("Confusion Matrix for test set:\n")
print(confusion_matrix(y_test, y_test_pred))

In [None]:
# Evaluating the model: (Accuracy) 🦾

print("Accuracy on the training set: " + str(accuracy_score(y_train, y_train_pred)))
print('*'*20)
print("Accuracy on the test set: " + str(accuracy_score(y_test, y_test_pred)))

In [None]:
# Evaluating the model: (Precision)🎯

print("Precision on the training set: " + str(precision_score(y_train, y_train_pred)))
print('*'*20)
print("Precision on the test set: " + str(precision_score(y_test, y_test_pred)))

In [None]:
# Evaluating the model: (Recall) 🤔

print("Recall on the training set: " + str(recall_score(y_train, y_train_pred)))
print('*'*20)
print("Recall on the test set: " + str(recall_score(y_test, y_test_pred)))

- From the metrics above, we can see that our model is underfitting.
- We need to perform some hyperparameter tuning on this model.
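For reference, the three scores printed above can all be derived from the confusion matrix. The worked example below uses a made-up matrix (the counts are hypothetical, not the notebook's results) in the layout scikit-learn uses, with true labels as rows and predicted labels as columns:

```python
# Hypothetical 2x2 confusion matrix in scikit-learn's layout:
# rows are true labels, columns are predicted labels.
cm = [[30, 10],   # true 0: 30 true negatives, 10 false positives
      [5, 45]]    # true 1: 5 false negatives, 45 true positives

(tn, fp), (fn, tp) = cm

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of predicted positives, how many are real
recall = tp / (tp + fn)      # of real positives, how many were found

print(accuracy, precision, recall)
```

Comparing these values between the training and test sets is what indicates underfitting (both low) or overfitting (training much higher than test).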

### Step4: Hyper-parameter Tuning 🔧 for Decision Tree 🌳

In [None]:
# Creating an object of the class DecisionTreeClassifier() and assigning a random_state

dt = DecisionTreeClassifier(random_state=42)

In [None]:
# Defining the parameters for the param_grid for our Grid Search...

params = {
    'max_depth':[2,3,10,15,20],
    'min_samples_leaf':[2,3,5,8,10,12,15,20],
    'criterion':['gini','entropy']
}

# Different combinations were tried out for max_depth and min_samples_leaf, and this set seems to give a good model.
# The other combinations are left out to keep the notebook clean and crisp.
# You can try out different values in the lists or even tune some other hyper-parameters.
# For now, I will stick with this. :)

In [None]:
# Now, we let GridSearchCV try out every combination of the hyperparameters for our Decision Tree

grid_search = GridSearchCV(estimator=dt, param_grid=params, cv=4, n_jobs=-1, verbose=1, scoring='accuracy')
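To get a feel for how much work the grid search does, the number of candidates and cross-validation fits can be counted up front. This sketch uses only the standard library and the grid defined above; `cv_folds` is just a local name for the `cv=4` setting:

```python
from itertools import product

# Same grid as in the notebook.
params = {
    "max_depth": [2, 3, 10, 15, 20],
    "min_samples_leaf": [2, 3, 5, 8, 10, 12, 15, 20],
    "criterion": ["gini", "entropy"],
}
cv_folds = 4

# Every combination of one value per hyper-parameter.
combinations = list(product(*params.values()))
n_candidates = len(combinations)
n_fits = n_candidates * cv_folds

print(n_candidates, n_fits)
```

That is 5 × 8 × 2 = 80 candidates and 80 × 4 = 320 cross-validation fits, which the `verbose=1` output of the fit should also report.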

In [None]:
%%time
grid_search.fit(X_train, y_train)

In [None]:
# Storing the results of all the combinations that had been tried in a dataframe.

cv_df = pd.DataFrame(grid_search.cv_results_)

In [None]:
# Checking all the different combinations

cv_df

In [None]:
# Finding the best score...

grid_search.best_score_

In [None]:
# Retrieving the best estimator (the model with the best hyper-parameters)...

hyper_param = grid_search.best_estimator_
hyper_param

In [None]:
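# Assigning the best estimator and fitting it on the training set
# (with the default refit=True, GridSearchCV has already refit it, so this fit is optional)

dt_ = hyper_param

dt_.fit(X_train, y_train)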
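`best_estimator_` is a fitted model, while `best_params_` is the plain dictionary of winning settings. The difference is sketched below on synthetic data with a deliberately tiny grid (the data, the grid and the name `search` are hypothetical, chosen only to keep the example fast):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data.
X_toy, y_toy = make_classification(n_samples=200, n_features=6, random_state=42)

search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [2, 3], "min_samples_leaf": [5, 10]},
    cv=4,
    scoring="accuracy",
)
search.fit(X_toy, y_toy)

# Dictionary of the winning hyper-parameters.
best_params = search.best_params_

# Already-fitted model (refit=True is the default), ready for predict().
best_model = search.best_estimator_
preds = best_model.predict(X_toy)

print(best_params)
```

In the notebook, `grid_search.best_params_` would show the chosen `max_depth`, `min_samples_leaf` and `criterion` directly.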

In [None]:
# Predicting on the train and test... 

y_train_pred = dt_.predict(X_train)
y_test_pred = dt_.predict(X_test)

In [None]:
# Evaluating the model: (Confusion Matrix) 🤔😥⁉

print("Confusion Matrix for training set:\n")
print(confusion_matrix(y_train, y_train_pred))
print('*'*20)
print("Confusion Matrix for test set:\n")
print(confusion_matrix(y_test, y_test_pred))

In [None]:
# Evaluating the model: (Accuracy) 🦾

print("Accuracy on the training set: " + str(accuracy_score(y_train, y_train_pred)))
print('*'*20)
print("Accuracy on the test set: " + str(accuracy_score(y_test, y_test_pred)))

In [None]:
# Evaluating the model: (Precision)🎯

print("Precision on the training set: " + str(precision_score(y_train, y_train_pred)))
print('*'*20)
print("Precision on the test set: " + str(precision_score(y_test, y_test_pred)))

In [None]:
# Evaluating the model: (Recall) 🤔

print("Recall on the training set: " + str(recall_score(y_train, y_train_pred)))
print('*'*20)
print("Recall on the test set: " + str(recall_score(y_test, y_test_pred)))

- Since the **sensitivity** (recall) is pretty good for this model on the test set, we will go ahead with this model.
- Also, from the confusion matrix for the test data, we can observe that the chance of a **Type II error** (i.e. the patient is likely to have a heart attack but is declared safe by the model) is very low.
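The Type II error rate (the miss rate) and the sensitivity are two sides of the same count: miss rate = FN / (FN + TP) = 1 − recall. The sketch below checks this on made-up labels (`y_true` and `y_pred` are hypothetical, not the notebook's predictions):

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical true labels and predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]

# ravel() flattens the 2x2 matrix in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Share of real positives that the model declared safe (Type II errors).
miss_rate = fn / (fn + tp)
recall = recall_score(y_true, y_pred)

print(miss_rate, recall)
```

Applied to `y_test` and `y_test_pred`, the same lines would put a number on "very low".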

### Conclusion 👩‍⚖️📜

In [None]:
# Visualizing the result from the Decision Tree

plt.figure(figsize=(65,25))
plot_tree(dt_, feature_names = list(X.columns), class_names=['No Disease', "Disease"], filled=True);

- A person, regardless of gender, is most likely to have a heart attack if the chest pain level is 1 or above.
- Even when the chest pain level is 1 or above, female subjects are more at risk than male subjects.
- Also, if the heart rate is high and the chest pain level is at least 1, the model rates the subject as very likely to have a heart attack.
- Even if the chest pain level is below 1, a subject with exercise-induced angina is at a greater risk of a heart attack.


Therefore, if any of the above conditions are observed in a subject, they should be immediately given medical attention. 🩺👩‍⚕️
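One way to cross-check which features drive conclusions like these is the tree's `feature_importances_` attribute. This is a minimal sketch on synthetic data (the data and the feature names `f0` to `f4` are hypothetical, so the ranking says nothing about the heart dataset):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with hypothetical feature names.
X_arr, y_toy = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=42
)
X_toy = pd.DataFrame(X_arr, columns=["f0", "f1", "f2", "f3", "f4"])

model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_toy, y_toy)

# Impurity-based importances, sorted from most to least influential.
importances = pd.Series(
    model.feature_importances_, index=X_toy.columns
).sort_values(ascending=False)

print(importances)
```

For the tuned model, `pd.Series(dt_.feature_importances_, index=X.columns)` would give the corresponding ranking.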

#### If you enjoyed reading this Notebook 📒, kindly upvote. 👍😁 It will help me grow. 📈