<div style=" text-align: center; justify-content: center; align-items: center; background-color: lightblue; padding: 10px;  border-radius: 10px;">
    <h1 style="color: black;  margin-top: 9px; margin-bottom: 9px; ">Introduction</h1>
</div>

**Heart attacks** are a serious health issue worldwide. This notebook aims to predict the likelihood of a heart attack using data analysis and machine learning. **The goal of this analysis** is to help individuals understand their risk and take preventive action, ultimately reducing heart attack cases and saving lives.

<div style=" text-align: center; justify-content: center; align-items: center; background-color: lightblue; padding: 10px;  border-radius: 10px;">
    <h1 style="color: black;  margin-top: 9px; margin-bottom: 9px; ">Table of Contents</h1>
</div>

> - [1 - Import Libraries 📚](#1)
> - [2 - Data Exploration 💾](#2)
>    - [2.1 - Data Sample and Info](#2.1)
> - [3 - Exploratory Data Analysis 📊](#3)
>    - [3.1- Univariate Analysis](#3.1)
>    - [3.2- Bivariate Analysis](#3.2)
> - [4- EDA Conclusion 📈](#4)
> - [5 - Data Preprocessing ⚒️](#5)
>    - [5.1- Outlier Handling via Removing](#5.1)
>    - [5.2- Data Split to Train and Test Sets](#5.2)
>    - [5.3- Feature Scaling](#5.3)
> - [6- Models Training and Evaluation ⚙️](#6)
>    - [6.1- Logistic Regression Model](#6.1)
>    - [6.2- K Nearest Neighbor Model](#6.2)
>    - [6.3- Support Vector Classifier Model](#6.3)
>    - [6.4- Naive Bayes Classifier Model](#6.4)
>    - [6.5- Decision Tree Classifier Model](#6.5)
>    - [6.6- Random Forest Classifier Model](#6.6)
>    - [6.7- Ada Boost Classifier Model](#6.7)
>    - [6.8- Gradient Boosting Classifier Model](#6.8)
>    - [6.9- XGBoost Classifier Model](#6.9)
> - [7- Models Predictions Conclusion💡](#7)

**``Data Source`` : https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-prediction-dataset**

<a class="anchor"  id="1"></a>
<div style=" text-align: center; justify-content: center; align-items: center; background-color: lightblue; padding: 10px;  border-radius: 10px;">
    <h1 style="color: black;  margin-top: 9px; margin-bottom: 9px; ">1- Import Libraries 📚</h1>
</div>

In [1]:
# EDA Libraries
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import seaborn as sns
import matplotlib.pyplot as plt

# Data Preprocessing Libraries
from datasist.structdata import detect_outliers
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Machine Learning (classification models) Libraries
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

%matplotlib inline  
sns.set(rc={'figure.figsize': [11, 4]}, font_scale=0.7)

--------------------------------------

<a class="anchor"  id="2"></a>
<div style=" text-align: center; justify-content: center; align-items: center; background-color: lightblue; padding: 10px;  border-radius: 10px;">
    <h1 style="color: black;  margin-top: 9px; margin-bottom: 9px; ">2- Data Exploration 💾</h1>
</div>

<a class="anchor"  id="2.1"></a>
## 2.1- Data Sample and Info

In [2]:
df = pd.read_csv('heart.csv')
df.sample(10)

FileNotFoundError: [Errno 2] No such file or directory: 'heart.csv'

### Domain knowledge

  <blockquote>
  <strong>Age</strong> : The patient's age.<br>
  <strong>Sex</strong> : The patient's gender (0 for female, 1 for male).<br>
  <strong>Chest Pain Type (cp)</strong> : Type of chest pain:
  <ul>
    <li>Value 0: Typical Angina : Severe chest pain, often during physical activity, relieved with rest or nitroglycerin.</li>
    <li>Value 1: Atypical Angina : Variations in chest discomfort, not fitting the typical angina pattern.</li>
    <li>Value 2: Non-Anginal Pain : Chest discomfort unrelated to heart issues, caused by various factors.</li>
    <li>Value 3: Asymptomatic : No chest pain symptoms are reported.</li>
  </ul>
  <strong>Resting Blood Pressure (trtbps)</strong> : Resting blood pressure.<br>
  <strong>Serum Cholesterol Levels (chol)</strong> : Serum cholesterol levels.<br>
  <strong>Fasting Blood Sugar (fbs)</strong> : Fasting blood sugar:
  <ul>
    <li>Value 0: <= 120 mg/dL</li>
    <li>Value 1: > 120 mg/dL</li>
  </ul>
  <strong>Resting ECG Results (restecg)</strong> : Resting ECG results:
  <ul>
    <li>Value 0: Normal : Indicates a normal resting ECG result, suggesting no significant cardiac abnormalities.</li>
    <li>Value 1: ST-T Wave Abnormality : Suggests an abnormality in the ST-T wave pattern, which could be indicative of myocardial ischemia or other cardiac issues.</li>
    <li>Value 2: Probable or Definite Left Ventricular Hypertrophy : Suggests the presence of left ventricular hypertrophy, which is an enlargement of the left ventricle of the heart, often associated with high blood pressure or other heart conditions.</li>
  </ul>
  <strong>Maximum Heart Rate During Exercise (thalachh)</strong> : Maximum heart rate during exercise.<br>
  <strong>Exercise-Induced Angina (exng)</strong> : Exercise-induced angina:
  <ul>
    <li>Value 0: No</li>
    <li>Value 1: Yes</li>
  </ul>
  <strong>ST-Segment Depression (oldpeak)</strong> : ST-segment depression.<br>
  <strong>Slope of ST Segment (slp)</strong> : Slope of ST segment:
  <ul>
    <li>Value 0: Downsloping : Represents a downsloping ST segment on the ECG, which can indicate certain heart conditions.</li>
    <li>Value 1: Flat : Indicates a flat ST segment on the ECG, which may have clinical significance in diagnosing heart issues.</li>
    <li>Value 2: Upsloping : Refers to an upsloping ST segment on the ECG, which can also be relevant in cardiovascular diagnosis.</li>
  </ul>
  <strong>Number of Major Vessels Colored by Fluoroscopy (caa)</strong> : Number of major vessels colored by fluoroscopy.<br>
  <strong>Thalassemia Type (thall)</strong> : Thalassemia type:
  <ul>
    <li>Value 0: None (Normal) : Represents a normal thalassemia type, indicating no thalassemia-related issues.</li>
    <li>Value 1: Fixed Defect : Indicates the presence of a fixed thalassemia defect that is not reversible.</li>
    <li>Value 2: Reversible Defect : Suggests a thalassemia defect that is potentially reversible.</li>
    <li>Value 3: Thalassemia : Represents the general presence of thalassemia without specifying further details.</li>
  </ul>
  <strong>Risk of Heart Attack (output)</strong> : Risk of heart attack:
  <ul>
    <li>Value 0: No</li>
    <li>Value 1: Yes</li>
  </ul>
</blockquote>

In [None]:
df.columns

In [None]:
# check the dataset shape
print("Number of Columns in data",df.shape[1])
print("---------------------------------------")
print("Number of Rows in data",df.shape[0])

In [None]:
# data information
df.info()

In [None]:
# checking for missing values in data
df.isna().sum()

- Data does not contain any missing values

In [None]:
# checking for duplicated values
df.duplicated().sum()

- The data contains 1 duplicated row

In [None]:
# Removing duplicated data
df.drop_duplicates(inplace=True)

In [None]:
# checking if duplicated value has been removed
df.duplicated().sum()
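
The duplicate check and removal above can be tried on a small frame first. The sketch below uses made-up values (not rows from `heart.csv`) and the hypothetical names `toy` and `deduped`; it shows which rows `duplicated` flags and what `drop_duplicates` keeps.

```python
import pandas as pd

# Toy frame with hypothetical values; the last row repeats the first one.
toy = pd.DataFrame({
    "age": [63, 37, 41, 63],
    "chol": [233, 250, 204, 233],
    "output": [1, 1, 1, 1],
})

# keep=False marks every member of a duplicate group, so both copies are visible.
print(toy[toy.duplicated(keep=False)])

# drop_duplicates keeps the first occurrence by default;
# reset_index(drop=True) avoids gaps in the index afterwards.
deduped = toy.drop_duplicates().reset_index(drop=True)
print(len(toy), "->", len(deduped))
```

Resetting the index is optional, but it can help later if rows are selected by position or by index label.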

In [None]:
# count the number of unique values in each column of the data
df.nunique()

In [None]:
# Descriptive analysis for numerical data
df.describe().style.background_gradient()

-----------------------------------------------

<a class="anchor"  id="3"></a>
<div style=" text-align: center; justify-content: center; align-items: center; background-color: lightblue; padding: 10px;  border-radius: 10px;">
    <h1 style="color: black;  margin-top: 9px; margin-bottom: 9px; ">3- Exploratory Data Analysis 📊</h1>
</div>

<a class="anchor"  id="3.1"></a>
## 3.1- Univariate Analysis

### Exploration (age, sex, cp, fbs, restecg, exng, slp, caa, thall, output) Features

In [None]:
fig, axes = plt.subplots(5, 2, figsize=(12, 20))

count_features = ['age', 'sex', 'cp', 'fbs', 'restecg', 'exng', 'slp', 'caa', 'thall', 'output']

for i, ax in enumerate(axes.flat):
    if i < len(count_features):
        # Vertical count bars; the y-axis tick labels are kept so the counts stay readable
        sns.countplot(data=df, x=count_features[i], ax=ax, palette='Set2')
        ax.set_title(f'Countplot for {count_features[i]}', fontsize=14)

plt.tight_layout()

plt.show()

### Exploration (trtbps, chol, thalachh, oldpeak) Features

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(14, 8))

num_features = ['trtbps', 'chol', 'thalachh', 'oldpeak']

for i, ax in enumerate(axes.flat):
    if i < len(num_features):
        sns.boxplot(data=df, x=num_features[i], ax=ax, palette='Set2', orient='h')
        ax.set_title(f'Boxplot for {num_features[i]}', fontsize=14)
        ax.set_yticklabels([]) 

plt.tight_layout()

plt.show()

### **Observation of Univariate Analysis**

- **``Outliers Exist``**: The boxplots show minor outliers in every numeric column examined.
- **``Contextual Significance``**: These outliers are not anomalies or incorrect data points but are in alignment with the overall trends and patterns observed in the dataset.
- **``Categorical Features``**: Many of the categorical features are imbalanced.

<a class="anchor"  id="3.2"></a>
## 3.2- Bivariate Analysis

### Correlation Matrix Summary

In [None]:
correlation_matrix = df.corr()

fig = px.imshow(correlation_matrix, text_auto=True, color_continuous_scale='Viridis')

fig.update_layout(
    title="Correlation Matrix",
    paper_bgcolor='white',
    plot_bgcolor='white',
    xaxis_showticklabels=True,
    yaxis_showticklabels=True,
    autosize=False,  
    width=1000,  
    height=1000,  
)

fig.show()
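
A heatmap of the full matrix can be hard to scan for the one column that matters most here, the target. As a minimal sketch, the snippet below ranks features by the absolute value of their Pearson correlation with `output`. The data is synthetic and the names `toy`, `signal`, `noise`, `target_corr` and `ranked` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-in for the heart data: 'signal' is built to track the target, 'noise' is not.
output = rng.integers(0, 2, size=n)
toy = pd.DataFrame({
    "signal": output + rng.normal(0, 0.5, size=n),
    "noise": rng.normal(0, 1, size=n),
    "output": output,
})

# Pearson correlation of every feature with the target, ordered by absolute strength.
target_corr = toy.corr()["output"].drop("output")
ranked = target_corr.reindex(target_corr.abs().sort_values(ascending=False).index)
print(ranked)
```

On the real data the same two lines would apply to `df.corr()["output"]`, assuming all columns are numeric as they are in this dataset.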

### Age and Risk of Heart Attack (Output)

In [None]:
fig = px.histogram(df, x='age', color='output', title='Age vs. Risk of Heart Attack (Output)',
                   labels={'age': 'Age', 'output': 'Output'},
                   marginal='box', barmode='group',
                   color_discrete_sequence=['#1f77b4', '#ff7f0e']
                   )

fig.update_layout(
    xaxis=dict(tickmode='linear', dtick=5),
    bargap=0.1
)

fig.update_traces(marker=dict(line=dict(width=2, color='DarkSlateGrey')))
fig.update_xaxes(showgrid=True, gridcolor='lightgray')
fig.update_yaxes(showgrid=True, gridcolor='lightgray')
fig.update_layout(title_font=dict(size=18), title_x=0.5, title_y=0.95)
fig.update_layout(plot_bgcolor='white', paper_bgcolor='white')

fig.show()

### Sex and Risk of Heart Attack (Output)

In [None]:
df_male = df[df['sex'] == 1]
df_female = df[df['sex'] == 0]

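# sort_index() puts output 0 first in both series, so slice colors match across the two pies
male_counts = df_male['output'].value_counts().sort_index()
female_counts = df_female['output'].value_counts().sort_index()

colors = ['#1f77b4', '#ff7f0e']  

fig = make_subplots(rows=1, cols=2, subplot_titles=('Male', 'Female'), specs=[[{'type':'domain'}, {'type':'domain'}]])

# labels identify which slice is output 0 and which is output 1
fig.add_trace(go.Pie(labels=male_counts.index, values=male_counts, name='Male',
                     marker=dict(colors=colors)), 1,1)

fig.add_trace(go.Pie(labels=female_counts.index, values=female_counts, name='Female',
                     marker=dict(colors=colors)), 1,2)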
fig.update_traces(hole=.4)
fig.update_layout(title_text='Sex vs. Risk of Heart Attack (Output)', title_font=dict(size=18), title_x=0.5, title_y=0.95,
                 annotations=[dict(text='Male', x=0.22, y=0.45, font_size=25, showarrow=False),
                 dict(text='Female', x=0.78, y=0.45, font_size=25, showarrow=False)])
fig.update_layout(plot_bgcolor='white', paper_bgcolor='white')

fig.show()

### Cp (Chest Pain Types) and Risk of Heart Attack (Output) 

In [None]:
fig = px.histogram(df, x='cp', color='output', title='Cp (Chest Pain Types) vs. Risk of Heart Attack (Output)',
                   labels={'cp': 'Chest Pain Types', 'output': 'Output'}, barmode='group',
                   color_discrete_sequence=['#1f77b4', '#ff7f0e']
       
                   )
fig.update_layout(
    bargap=0.1
   )

fig.update_traces(marker=dict(line=dict(width=2, color='DarkSlateGrey')))
fig.update_xaxes(showgrid=True, gridcolor='lightgray', tickvals=[0, 1, 2, 3],
                 ticktext=['Typical Angina (0)', 'Atypical Angina (1)', 'Non-Anginal Pain (2)', 'Asymptomatic (3)'])
fig.update_yaxes(showgrid=True, gridcolor='lightgray')
fig.update_layout(title_font=dict(size=18), title_x=0.5, title_y=0.95)
fig.update_layout(plot_bgcolor='white', paper_bgcolor='white')

fig.show()

### Resting Blood Pressure (trtbps) and Risk of Heart Attack (output)

In [None]:
fig = px.box(df, x='output', y='trtbps', color='output', title='Resting Blood Pressure (trtbps) vs. Risk of Heart Attack (output)',
              labels={'output': 'Heart Attack', 'trtbps': 'Resting Blood Pressure'},
              points="all", color_discrete_sequence=['#1f77b4', '#ff7f0e']
             )

fig.update_layout(
    xaxis_title='Heart Attack (output)',  
    yaxis_title='Resting Blood Pressure (trtbps)',
    showlegend= False
)
fig.update_layout(plot_bgcolor='white', paper_bgcolor='white')
fig.update_traces(marker=dict(line=dict(width=2, color='DarkSlateGrey')))
fig.update_xaxes(showgrid=True, gridcolor='lightgray')
fig.update_yaxes(showgrid=True, gridcolor='lightgray')

fig.show()


### Serum Cholesterol Levels (chol) and Risk of Heart Attack (output)

In [None]:
fig = px.box(df, x='output', y='chol', color='output', title='Serum Cholesterol Levels (chol) vs. Risk of Heart Attack (output)',
              labels={'output': 'Heart Attack', 'chol': 'Serum Cholesterol'},
              points="all", color_discrete_sequence=['#1f77b4', '#ff7f0e']
             )

fig.update_layout(
    xaxis_title='Heart Attack (output)',  
    yaxis_title='Serum Cholesterol (chol)',
    showlegend= False
)
fig.update_layout(plot_bgcolor='white', paper_bgcolor='white')
fig.update_traces(marker=dict(line=dict(width=2, color='DarkSlateGrey')))
fig.update_xaxes(showgrid=True, gridcolor='lightgray')
fig.update_yaxes(showgrid=True, gridcolor='lightgray')

fig.show()

### Fasting Blood Sugar (fbs) and Risk of Heart Attack (Output)

In [None]:
fig = px.histogram(df, x='fbs', color='output', title='Fasting Blood Sugar (fbs) vs. Risk of Heart Attack (Output)',
                   labels={'fbs': 'Fasting Blood Sugar (fbs)', 'output': 'Output'}, 
                   category_orders={'fbs': [0, 1]},  
                   barmode='group',
                   color_discrete_sequence=['#1f77b4', '#ff7f0e']   
                   )

fig.update_xaxes(tickvals=[0, 1], ticktext=['<= 120 mg/dL (0)', '> 120 mg/dL (1)']) 

fig.update_traces(marker=dict(line=dict(width=2, color='DarkSlateGrey')))
fig.update_xaxes(showgrid=True, gridcolor='lightgray')
fig.update_yaxes(showgrid=True, gridcolor='lightgray')
fig.update_layout(title_font=dict(size=18), title_x=0.5, title_y=0.95)
fig.update_layout(plot_bgcolor='white', paper_bgcolor='white')

fig.show()


### Resting ECG Results (restecg) and Risk of Heart Attack (Output)

In [None]:
fig = px.histogram(df, x='restecg', color='output', title='Resting ECG Results (restecg) vs. Risk of Heart Attack (Output)',
                   labels={'restecg': 'Resting ECG Results (restecg)', 'output': 'Output'}, barmode='group',
                   color_discrete_sequence=['#1f77b4', '#ff7f0e'],
                   category_orders={'restecg': [0, 1, 2]},  
                   )

fig.update_xaxes(tickvals=[0, 1, 2], ticktext=['Normal (0)', 'ST-T Wave Abnormality (1)', 'Probable/Definite LVH (2)'])

fig.update_traces(marker=dict(line=dict(width=2, color='DarkSlateGrey')))
fig.update_xaxes(showgrid=True, gridcolor='lightgray')
fig.update_yaxes(showgrid=True, gridcolor='lightgray')
fig.update_layout(title_font=dict(size=18), title_x=0.5, title_y=0.95)
fig.update_layout(plot_bgcolor='white', paper_bgcolor='white')

fig.show()

### Maximum Heart Rate During Exercise (thalachh) and Risk of Heart Attack (Output)

In [None]:
thalachh_output_data = df.groupby('output').agg({
    'thalachh': 'mean'
}).reset_index()

# One row per output value, so each bar shows the mean thalachh of that group
fig = px.histogram(thalachh_output_data, x='output', y='thalachh',
                   color='output',
                   title='Maximum Heart Rate (thalachh) vs. Risk of Heart Attack (Output)',
                   color_discrete_sequence=['#1f77b4', '#ff7f0e']
                   )

fig.update_layout( 
    xaxis_title='Heart Attack Presence (Output)',  
    yaxis_title='Maximum Heart Rate (thalachh)',  
    xaxis=dict(tickmode='array', tickvals=[0, 1]), 
    bargap=0.1
   )

fig.update_traces(marker=dict(line=dict(width=2, color='DarkSlateGrey')))
fig.update_xaxes(showgrid=True, gridcolor='lightgray')
fig.update_yaxes(showgrid=True, gridcolor='lightgray')
fig.update_layout(title_font=dict(size=18), title_x=0.5, title_y=0.95)
fig.update_layout(plot_bgcolor='white', paper_bgcolor='white')

fig.show()

### Exercise-Induced Angina (exng) and Risk of Heart Attack (Output)

In [None]:
fig = px.histogram(df, x='exng', color='output', title='Exercise-Induced Angina (exng) vs. Risk of Heart Attack (Output)',
                   labels={'exng': 'Exercise-Induced Angina (exng)', 'output': 'Output'}, 
                   barmode='group',
                   color_discrete_sequence=['#1f77b4', '#ff7f0e']   
                   )
fig.update_layout(
    bargap=0.1
   )

fig.update_traces(marker=dict(line=dict(width=2, color='DarkSlateGrey')))
fig.update_xaxes(showgrid=True, gridcolor='lightgray')
fig.update_yaxes(showgrid=True, gridcolor='lightgray')
fig.update_layout(title_font=dict(size=18), title_x=0.5, title_y=0.95)
fig.update_layout(plot_bgcolor='white', paper_bgcolor='white')

fig.show()

### Oldpeak vs. Risk of Heart Attack (Output)

In [None]:
fig = px.histogram(df, x='oldpeak', color='output', title='Oldpeak vs. Risk of Heart Attack (Output)',
                   labels={'oldpeak': 'Oldpeak', 'output': 'Output'}, barmode='group',
                   color_discrete_sequence=['#1f77b4', '#ff7f0e'])

fig.update_layout(
    bargap=0.1
)

fig.update_traces(marker=dict(line=dict(width=2, color='DarkSlateGrey')))
fig.update_xaxes(showgrid=True, gridcolor='lightgray')
fig.update_yaxes(showgrid=True, gridcolor='lightgray')
fig.update_layout(title_font=dict(size=18), title_x=0.5, title_y=0.95)
fig.update_layout(plot_bgcolor='white', paper_bgcolor='white')

fig.show()


**Observation:** `oldpeak = 0` has the highest count of `output = 1`. As `oldpeak` increases from 1 to 6, the counts of `output = 1` fall and `output = 0` dominates.

### Slope of ST Segment (slp) and Risk of Heart Attack (Output)

In [None]:
fig = px.histogram(df, x='slp', color='output', title='slp (Slope of ST Segment) vs. Risk of Heart Attack (Output)',
                   labels={'slp': 'Slope of ST Segment', 'output': 'Output'}, barmode='group',
                   color_discrete_sequence=['#1f77b4', '#ff7f0e']   
                   )
fig.update_layout(
    bargap=0.1
   )

fig.update_traces(marker=dict(line=dict(width=2, color='DarkSlateGrey')))
fig.update_xaxes(showgrid=True, gridcolor='lightgray', tickvals=[0, 1, 2],
                ticktext=['Downsloping (0)', 'Flat (1)', 'Upsloping (2)'])
fig.update_yaxes(showgrid=True, gridcolor='lightgray')
fig.update_layout(title_font=dict(size=18), title_x=0.5, title_y=0.95)
fig.update_layout(plot_bgcolor='white', paper_bgcolor='white')

fig.show()

### CAA and Risk of Heart Attack (Output)

In [None]:
fig = px.histogram(df, x='caa', color='output', barmode='group', 
                   color_discrete_sequence=['#1f77b4', '#ff7f0e'],
                   labels={'caa': 'CAA (Number of Major Vessels)', 'output': 'Output'})
fig.update_xaxes(title_text='CAA (Number of Major Vessels)')
fig.update_yaxes(title_text='Count')

fig.update_layout(title_text='CAA vs. Risk of Heart Attack (Output)', title_font=dict(size=18), title_x=0.5, title_y=0.95)
fig.update_layout(plot_bgcolor='white', paper_bgcolor='white')

fig.show()

### Thall and Risk of Heart Attack (Output)

In [None]:
fig = px.histogram(df, x='thall', color='output', barmode='group', 
                   color_discrete_sequence=['#1f77b4', '#ff7f0e'],
                   labels={'thall': 'Thall (Thalassemia Type)', 'output': 'Output'},
                   category_orders={'thall': [0, 1, 2, 3]},
                   title='Thall vs. Risk of Heart Attack (Output)')
fig.update_xaxes(title_text='Thall (Thalassemia Type)', tickvals=[0, 1, 2, 3], 
                 ticktext=['None (Normal) (0)', 'Fixed Defect (1)', 'Reversible Defect (2)', 'Thalassemia (3)'])
fig.update_yaxes(title_text='Count')

fig.update_layout(title_font=dict(size=18), title_x=0.5, title_y=0.95)
fig.update_layout(plot_bgcolor='white', paper_bgcolor='white')

fig.show()

---------------------------------------------------------

<a class="anchor"  id="4"></a>
<div style=" text-align: center; justify-content: center; align-items: center; background-color: lightblue; padding: 10px;  border-radius: 10px;">
    <h1 style="color: black;  margin-top: 9px; margin-bottom: 9px; ">4- EDA Conclusion 📈</h1>
</div>

<ul>
  <li>Age has a small effect on heart attack risk, with higher occurrences around ages 44-45 and 52-53. (Males are at higher risk than females.)</li><br>
  <li>Chest pain type (cp) plays a crucial role: People with Value 2 (Non-Anginal pain) face the highest heart attack risk, while those with Value 0 (Typical Angina) have the lowest heart attack risk.</li><br>
  <li>Cholesterol levels (chol) and resting blood pressure (trtbps) exhibit weak correlations with heart attacks.</li><br>
  <li>For people with fasting blood sugar (fbs) <= 120 mg/dL, heart attack risk varies, whereas fbs > 120 mg/dL is associated with lower risk.</li><br>
  <li>ST-T Wave Abnormality (restecg = 1) correlates with a higher likelihood of heart attacks, while Normal (restecg = 0) suggests lower heart attack risk. More data is needed for Probable/Definite LVH (restecg = 2).</li><br>
  <li>A higher maximum heart rate during exercise (thalachh) indicates a higher risk of heart attacks.</li><br>
  <li>People without exercise-induced angina (exng = 0) tend to have a higher chance of heart attacks.</li><br>
  <li>An oldpeak of 0 signifies a greater risk of heart attack, while increasing oldpeak (1 to 6) suggests lower risk.</li><br>
  <li>An upsloping ST segment (slp = 2) indicates a high risk of heart attacks, a flat ST segment (slp = 1) suggests a lower risk, and downsloping (slp = 0) indicates the lowest risk.</li><br>
  <li>People with fewer major vessels (caa) face a much higher chance of heart attacks.</li><br>
  <li>People with Thall = Reversible Defect (2) are associated with a significantly higher risk of heart attacks.</li><br>
</ul>
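
Grouped histograms compare raw counts, which can mislead when categories differ a lot in size. One way to check conclusions like the ones above is to compute the share of `output = 1` inside each category. The sketch below does this on hypothetical rows (not taken from `heart.csv`); the names `toy` and `rates` are illustrative.

```python
import pandas as pd

# Hypothetical rows, not taken from heart.csv.
toy = pd.DataFrame({
    "cp":     [0, 0, 0, 0, 2, 2, 2, 1],
    "output": [0, 0, 1, 0, 1, 1, 0, 1],
})

# Share of output == 1 within each category, alongside the group size.
rates = toy.groupby("cp")["output"].agg(rate="mean", n="size")
print(rates)
```

Reporting the group size next to the rate makes it easier to see when a high or low rate rests on only a handful of rows.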


---------------------------------------------------------------------------------------------

<a class="anchor"  id="5"></a>
<div style=" text-align: center; justify-content: center; align-items: center; background-color: lightblue; padding: 10px;  border-radius: 10px;">
    <h1 style="color: black;  margin-top: 9px; margin-bottom: 9px; ">5- Data Preprocessing ⚒️</h1>
</div>

<a class="anchor"  id="5.1"></a>
## 5.1- Outlier Handling via Removing

In [None]:
# outliers_indices = detect_outliers(df, n=0, features=df.columns)
# len(outliers_indices)

In [None]:
# # # Removing outliers for better model performance
# df.drop(outliers_indices, inplace=True)
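
The removal step is left commented out in the two cells above. For readers who want to see how many values a simple rule would flag before deciding, here is a minimal sketch of the common 1.5 × IQR rule. It is an assumption that this matches what `detect_outliers` from `datasist` does in detail; the helper `iqr_outlier_mask` and the sample values are hypothetical.

```python
import pandas as pd

def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Hypothetical cholesterol-like values; 564 is deliberately extreme.
chol = pd.Series([204, 211, 233, 236, 250, 254, 263, 564], name="chol")
mask = iqr_outlier_mask(chol)
print(chol[mask])
```

Working with a mask rather than dropping rows immediately keeps the decision reversible: the flagged rows can be inspected first and removed only if they look like data errors.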

<a class="anchor"  id="5.2"></a>
## 5.2- Data Split to Train and Test Sets

In [None]:
# First we extract the x Features and y Label
x = df.drop(['output'], axis=1)
y = df['output']

In [None]:
x.shape, y.shape

In [None]:
# Then we Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=1)

X_train.shape,X_test.shape,y_train.shape,y_test.shape
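
With a small dataset, a purely random split can leave the two classes in different proportions in the training and test sets. As a sketch of one option, `train_test_split` accepts a `stratify` argument that preserves the class ratio. The arrays below are synthetic stand-ins; `X_toy` and `y_toy` are hypothetical names.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced labels (70% ones) standing in for df['output'].
y_toy = np.array([1] * 70 + [0] * 30)
X_toy = np.arange(100).reshape(-1, 1)

# stratify=y_toy asks scikit-learn to preserve the class ratio in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.20, random_state=1, stratify=y_toy
)
print(y_tr.mean(), y_te.mean())
```

Whether stratification changes the results on the heart data is not shown here; it mainly guards against an unlucky split.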

<a class="anchor"  id="5.3"></a>
## 5.3- Feature Scaling

### Standardizing data with StandardScaler

In [None]:
scaler = StandardScaler()

scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

In [None]:
X_train
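
Fitting the scaler on the training set only, as done above, keeps test-set statistics out of the preprocessing. If cross-validation or `GridSearchCV` (imported earlier) were added, the same care is needed inside each fold. A minimal sketch, assuming synthetic data and the hypothetical names `X_toy`, `y_toy` and `model`, is to wrap the scaler and the classifier in a pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Synthetic features on very different scales; the label depends on the first column.
X_toy = np.column_stack([rng.normal(0, 1, 200), rng.normal(200, 50, 200)])
y_toy = (X_toy[:, 0] > 0).astype(int)

# Inside the pipeline the scaler is re-fitted on the training folds only,
# so validation folds never influence the mean and variance used for scaling.
model = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(model, X_toy, y_toy, cv=5, scoring="recall")
print(scores.mean())
```

The pipeline object can be passed to `GridSearchCV` in the same way as a bare estimator.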

----------------------------------------------------------------

<a class="anchor"  id="6"></a>
<div style=" text-align: center; justify-content: center; align-items: center; background-color: lightblue; padding: 10px;  border-radius: 10px;">
    <h1 style="color: black;  margin-top: 9px; margin-bottom: 9px; ">6- Models Training and Evaluation ⚙️</h1>
</div>

<a class="anchor"  id="6.1"></a>
## 6.1- Logistic Regression Model

In [None]:
lr=LogisticRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)
print(f'Training Accuracy: {accuracy_score(y_train, lr.predict(X_train))}')
print(f'Testing Accuracy: {accuracy_score(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Confusion Matrix: \n{confusion_matrix(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Classification report: \n{classification_report(y_test, y_pred)}')

<a class="anchor"  id="6.2"></a>
## 6.2- K Nearest Neighbor Model

In [None]:
knn=KNeighborsClassifier()
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(f'Training Accuracy: {accuracy_score(y_train, knn.predict(X_train))}')
print(f'Testing Accuracy: {accuracy_score(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Confusion Matrix: \n{confusion_matrix(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Classification report: \n{classification_report(y_test, y_pred)}')

<a class="anchor"  id="6.3"></a>
## 6.3- Support Vector Classifier Model

In [None]:
svc=SVC()
svc.fit(X_train, y_train)
y_pred = svc.predict(X_test)
print(f'Training Accuracy: {accuracy_score(y_train, svc.predict(X_train))}')
print(f'Testing Accuracy: {accuracy_score(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Confusion Matrix: \n{confusion_matrix(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Classification report: \n{classification_report(y_test, y_pred)}')

<a class="anchor"  id="6.4"></a>
## 6.4- Naive Bayes Classifier Model

In [None]:
nb=GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)
print(f'Training Accuracy: {accuracy_score(y_train, nb.predict(X_train))}')
print(f'Testing Accuracy: {accuracy_score(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Confusion Matrix: \n{confusion_matrix(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Classification report: \n{classification_report(y_test, y_pred)}')

<a class="anchor"  id="6.5"></a>
## 6.5- Decision Tree Classifier Model

In [None]:
dt=DecisionTreeClassifier()
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
print(f'Training Accuracy: {accuracy_score(y_train, dt.predict(X_train))}')
print(f'Testing Accuracy: {accuracy_score(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Confusion Matrix: \n{confusion_matrix(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Classification report: \n{classification_report(y_test, y_pred)}')

<a class="anchor"  id="6.6"></a>
## 6.6- Random Forest Classifier Model

In [None]:
rf=RandomForestClassifier()
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
print(f'Training Accuracy: {accuracy_score(y_train, rf.predict(X_train))}')
print(f'Testing Accuracy: {accuracy_score(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Confusion Matrix: \n{confusion_matrix(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Classification report: \n{classification_report(y_test, y_pred)}')

<a class="anchor"  id="6.7"></a>
## 6.7- Ada Boost Classifier Model

In [None]:
ada=AdaBoostClassifier()
ada.fit(X_train, y_train)
y_pred = ada.predict(X_test)
print(f'Training Accuracy: {accuracy_score(y_train, ada.predict(X_train))}')
print(f'Testing Accuracy: {accuracy_score(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Confusion Matrix: \n{confusion_matrix(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Classification report: \n{classification_report(y_test, y_pred)}')

<a class="anchor"  id="6.8"></a>
## 6.8- Gradient Boosting Classifier Model

In [None]:
gb=GradientBoostingClassifier()
gb.fit(X_train, y_train)
y_pred = gb.predict(X_test)
print(f'Training Accuracy: {accuracy_score(y_train, gb.predict(X_train))}')
print(f'Testing Accuracy: {accuracy_score(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Confusion Matrix: \n{confusion_matrix(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Classification report: \n{classification_report(y_test, y_pred)}')

<a class="anchor"  id="6.9"></a>
## 6.9- XGBoost Classifier Model

In [None]:
xgboost=xgb.XGBClassifier()
xgboost.fit(X_train, y_train)
y_pred = xgboost.predict(X_test)
print(f'Training Accuracy: {accuracy_score(y_train, xgboost.predict(X_train))}')
print(f'Testing Accuracy: {accuracy_score(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Confusion Matrix: \n{confusion_matrix(y_test, y_pred)}')
print('------------------------------------------------------------------')
print(f'Testing Classification report: \n{classification_report(y_test, y_pred)}')
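
The nine cells in this section repeat the same fit, predict and report steps. A shared helper is one way to reduce copy-paste slips, such as predicting with a previously fitted model instead of the current one. The sketch below is an assumption about how that could look, not the notebook's method: `evaluate` is a hypothetical helper, and the data comes from `make_classification` rather than the heart dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit the model and return train accuracy, test accuracy, and test recall."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)  # always predicts with the model that was just fitted
    return {
        "train_acc": accuracy_score(y_train, model.predict(X_train)),
        "test_acc": accuracy_score(y_test, y_pred),
        "test_recall": recall_score(y_test, y_pred),
    }

# Synthetic data as a stand-in for the scaled heart features.
X_toy, y_toy = make_classification(n_samples=300, n_features=13, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.20, random_state=1)

models = {"lr": LogisticRegression(), "knn": KNeighborsClassifier(), "nb": GaussianNB()}
results = {name: evaluate(m, X_tr, X_te, y_tr, y_te) for name, m in models.items()}
for name, scores in results.items():
    print(name, scores)
```

Collecting the scores in one dictionary also makes it straightforward to build a comparison table, for example with `pd.DataFrame(results).T`.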

------------------------------------------------------------------------------

<a class="anchor"  id="7"></a>
<div style=" text-align: center; justify-content: center; align-items: center; background-color: lightblue; padding: 10px;  border-radius: 10px;">
    <h1 style="color: black;  margin-top: 9px; margin-bottom: 9px; ">7- Models Predictions Conclusion💡</h1>
</div>

>**Note :** The key metric for model selection is recall, TP / (TP + FN), which drops as false negatives (Type II errors) increase. The goal is to minimize false negatives, referred to below as recall errors, so that patients at risk of a heart attack are not missed.

<ul>
<li>Logistic Regression achieved good accuracy in both training (86%) and testing (89%) with a low recall error of 3.</li><br>
<li>K-Nearest Neighbors (KNN) performed well in training (89%) but had a slightly lower testing accuracy (82%) and a recall error of 4.</li><br>
<li>Support Vector Classifier (SVC) excelled in training with a 92% accuracy but showed overfitting in testing with 82% accuracy.</li><br>
    <li>Naive Bayes maintained consistent accuracy (84%) but had a higher recall error of 6.</li><br>
<li>Models like Decision Trees, Random Forest, Ada Boost, Gradient Boost, and XGBoost performed exceptionally in training but struggled in testing, possibly due to overfitting.</li><br>
<li>Logistic Regression is the preferred model for this problem, considering its good testing performance and low recall error.</li><br>
</ul>
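
To make the relation between recall and false negatives concrete, here is a small worked sketch. The labels are hypothetical and are not the notebook's results; it only shows how the false-negative count and recall can be read off a binary confusion matrix.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical labels, not the notebook's results.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

# For binary labels, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
recall = tp / (tp + fn)
print(f"false negatives: {fn}, recall: {recall:.2f}")
```

Here two of the five positive cases are missed, so recall is 3 / 5 = 0.60, which agrees with `recall_score(y_true, y_pred)`.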