<a name="top"></a>
# Heart Attack - EDA

## 1. Introduction
- [1.1 Data Dictionary](#11-data-dictionary)
- [1.2 Task](#12-task)

## 2. Preparation
- [2.1 Packages](#21-packages)
- [2.2 Data](#22-data)
- [2.3 Understanding Data](#23-understanding-data)

## 3. Exploratory Data Analysis
- [3.1 Univariate Analysis](#31-univariate-analysis)
- [3.2 Bivariate Analysis](#32-bivariate-analysis)

## 4. Data Preprocessing
- [4.1 Conclusions from the EDA](#41-conclusions-from-the-eda)
- [4.2 Packages](#42-packages)
- [4.3 Making features model ready](#43-making-features-model-ready)

## 5. Modeling
- [5.1 Linear Classifiers](#51-linear-classifiers)
- [5.2 Tree Models](#52-tree-models)




## 1. Introduction
This notebook works with a dataset for heart attack classification.

The dataset can be downloaded from Kaggle: https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-prediction-dataset/code?datasetId=1226038&sortBy=voteCount

[back to top](#top)

### 1.1 Data Dictionary

- `age` - Age of the patient
- `sex` - Sex of the patient
- `cp` - Chest pain type `~` 0 = Typical Angina, 1 = Atypical Angina, 2 = Non-anginal Pain, 3 = Asymptomatic
- `trtbps` - Resting blood pressure (in mm Hg)
- `chol` - Cholesterol in mg/dl fetched via BMI sensor
- `fbs` - (fasting blood sugar > 120 mg/dl) `~` 1 = True, 0 = False
- `restecg` - Resting electrocardiographic results `~` 0 = Normal, 1 = ST-T wave abnormality, 2 = Left ventricular hypertrophy
- `thalachh` - Maximum heart rate achieved
- `oldpeak` - Previous peak (ST depression induced by exercise relative to rest)
- `slp` - Slope of the peak exercise ST segment
- `caa` - Number of major vessels
- `thall` - Thallium Stress Test result `~` integer codes from 0 to 3
- `exng` - Exercise induced angina `~` 1 = Yes, 0 = No
- `output` - Target variable `~` 0 = lower chance of heart attack, 1 = higher chance of heart attack
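
The integer codes in the dictionary above can be turned into readable labels before plotting. The following is a minimal sketch, assuming the code-to-label mappings listed for `cp` and `exng`; the names `CP_LABELS`, `EXNG_LABELS`, and `sample` are illustrative and not part of the original notebook, and the sample values are the `cp` and `exng` entries of the five rows previewed in section 2.3.

```python
import pandas as pd

# Code-to-label mappings copied from the data dictionary (illustrative names).
CP_LABELS = {
    0: "Typical Angina",
    1: "Atypical Angina",
    2: "Non-anginal Pain",
    3: "Asymptomatic",
}
EXNG_LABELS = {0: "No", 1: "Yes"}

# Small stand-in frame: cp and exng values of the five previewed rows.
sample = pd.DataFrame({"cp": [3, 2, 1, 1, 0], "exng": [0, 0, 0, 0, 1]})

# Series.map looks each code up in the dict; a code missing from the dict
# would become NaN, which makes unexpected codes easy to spot.
decoded = sample.assign(
    cp_label=sample["cp"].map(CP_LABELS),
    exng_label=sample["exng"].map(EXNG_LABELS),
)
print(decoded)
```

Keeping the original integer columns next to the label columns leaves the data usable for modeling while the labels serve plots and tables.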

### 1.2 Task
Perform EDA and predict whether a person is prone to a heart attack.

## 2. Preparation 
[back to top](#top)

### 2.1 Packages

In [21]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings("ignore")

### 2.2 Data

In [22]:
df = pd.read_csv('heart.csv')
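
A load step can also verify that the file matches the data dictionary before any analysis runs. The sketch below is one possible way to do that, not the notebook's own method: `load_heart_data` and `EXPECTED_COLUMNS` are hypothetical names, and an in-memory string with the first two previewed rows stands in for `heart.csv` so the example runs without the file.

```python
import io

import pandas as pd

# Column names taken from the data dictionary in section 1.1.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trtbps", "chol", "fbs", "restecg",
    "thalachh", "exng", "oldpeak", "slp", "caa", "thall", "output",
]

def load_heart_data(source):
    """Read the CSV and fail early if an expected column is absent."""
    frame = pd.read_csv(source)
    missing = set(EXPECTED_COLUMNS) - set(frame.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return frame

# Stand-in for heart.csv: the header plus two rows from the preview.
csv_text = (
    "age,sex,cp,trtbps,chol,fbs,restecg,thalachh,exng,oldpeak,slp,caa,thall,output\n"
    "63,1,3,145,233,1,0,150,0,2.3,0,0,1,1\n"
    "37,1,2,130,250,0,1,187,0,3.5,0,0,2,1\n"
)
df_sample = load_heart_data(io.StringIO(csv_text))
print(df_sample.shape)
```

With the real file, the call would be `load_heart_data('heart.csv')`, since `pd.read_csv` accepts both paths and file-like objects.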

### 2.3 Understanding Data 

#### The shape of the data

In [23]:
print(df.shape)
df.head()

(303, 14)


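|   | age | sex | cp | trtbps | chol | fbs | restecg | thalachh | exng | oldpeak | slp | caa | thall | output |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |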


The dataset has 303 rows and 14 columns.

#### 2.3.3 Checking the number of unique values in each column

In [24]:

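# Count the distinct values per column. The name `unique_counts` avoids
# shadowing the built-in `dict`.
unique_counts = {col: df[col].nunique() for col in df.columns}

pd.DataFrame(unique_counts, index=["unique count"]).transpose()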

|   | unique count |
|---|---|
| age | 41 |
| sex | 2 |
| cp | 4 |
| trtbps | 49 |
| chol | 152 |
| fbs | 2 |
| restecg | 3 |
| thalachh | 91 |
| exng | 2 |
| oldpeak | 40 |


#### 2.3.4 Separating the columns into categorical and continuous

In [26]:
cat_cols = ['sex','exng','caa','cp','fbs','restecg','slp','thall']
con_cols = ["age","trtbps","chol","thalachh","oldpeak"]
target_col = ["output"]
print("The categorical cols are : ", cat_cols)
print("The continuous cols are : ", con_cols)
print("The target variable is :  ", target_col)

The categorical cols are :  ['sex', 'exng', 'caa', 'cp', 'fbs', 'restecg', 'slp', 'thall']
The continuous cols are :  ['age', 'trtbps', 'chol', 'thalachh', 'oldpeak']
The target variable is :   ['output']
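
The hand-written lists above can be cross-checked against the unique counts from section 2.3.3. The sketch below assumes a cutoff of 10 distinct values, which is an arbitrary choice for illustration, and uses only the counts shown in that table; `MAX_CATEGORIES` and `suggested_cat` are hypothetical names.

```python
import pandas as pd

# Unique counts as reported in section 2.3.3 (only the columns listed there).
unique_counts = pd.Series({
    "age": 41, "sex": 2, "cp": 4, "trtbps": 49, "chol": 152,
    "fbs": 2, "restecg": 3, "thalachh": 91, "exng": 2, "oldpeak": 40,
})

# Assumed cutoff: columns with at most this many distinct values are
# treated as categorical. The value 10 is illustrative, not from the source.
MAX_CATEGORIES = 10

suggested_cat = unique_counts[unique_counts <= MAX_CATEGORIES].index.tolist()
suggested_con = unique_counts[unique_counts > MAX_CATEGORIES].index.tolist()

print("categorical:", suggested_cat)
print("continuous: ", suggested_con)
```

For the columns covered by the table, this heuristic agrees with the manual split. A cardinality cutoff is only a starting point, because a numeric column with few distinct values can still be a genuine measurement.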


#### 2.3.5 Summary Statistics

In [27]:
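# A notebook cell renders only its last expression, so the continuous-column
# summary on the next line is computed but not shown; wrap it in display()
# to render both tables.
df[con_cols].describe().transpose()
df[cat_cols].describe().transpose()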

|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| sex | 303.0 | 0.683168 | 0.466011 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| exng | 303.0 | 0.326733 | 0.469794 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| caa | 303.0 | 0.729373 | 1.022606 | 0.0 | 0.0 | 0.0 | 1.0 | 4.0 |
| cp | 303.0 | 0.966997 | 1.032052 | 0.0 | 0.0 | 1.0 | 2.0 | 3.0 |
| fbs | 303.0 | 0.148515 | 0.356198 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| restecg | 303.0 | 0.528053 | 0.52586 | 0.0 | 0.0 | 1.0 | 1.0 | 2.0 |
| slp | 303.0 | 1.39934 | 0.616226 | 0.0 | 1.0 | 1.0 | 2.0 | 2.0 |
| thall | 303.0 | 2.313531 | 0.612277 | 0.0 | 2.0 | 2.0 | 3.0 | 3.0 |
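
Because the categorical columns hold integer codes, their mean and standard deviation are hard to interpret, except for 0/1 columns, where the mean equals the share of rows coded 1. A frequency table is often easier to read. The sketch below is one way to build it; `frequency_table` is a hypothetical helper, and the sample uses the `sex` and `cp` values of the five rows previewed earlier rather than the full dataset.

```python
import pandas as pd

# Stand-in data: sex and cp values of the five previewed rows.
sample = pd.DataFrame({"sex": [1, 1, 0, 1, 0], "cp": [3, 2, 1, 1, 0]})

def frequency_table(series):
    """Return the count and share of each category, ordered by category code."""
    counts = series.value_counts().sort_index()
    shares = series.value_counts(normalize=True).sort_index()
    return pd.DataFrame({"count": counts, "share": shares})

cp_table = frequency_table(sample["cp"])
print(cp_table)

# For a 0/1 column the mean is the share of rows coded 1.
sex_share = sample["sex"].mean()
print(sex_share)
```

Applied to the full data, `frequency_table(df["cp"])` would give the distribution of chest pain types directly instead of a mean over arbitrary codes.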


#### 2.3.6 Missing values 

In [28]:
df.isnull().sum()

age         0
sex         0
cp          0
trtbps      0
chol        0
fbs         0
restecg     0
thalachh    0
exng        0
oldpeak     0
slp         0
caa         0
thall       0
output      0
dtype: int64
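
Every count above is zero, so no imputation is needed for this dataset. For data that does contain gaps, the same check can be extended with percentages and a duplicate-row count. The sketch below assumes a hypothetical helper `missing_report` and a toy frame with one deliberately missing value; neither comes from the original notebook.

```python
import numpy as np
import pandas as pd

def missing_report(frame):
    """Count and percentage of missing values per column, worst column first."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": 100 * counts / len(frame),
    })
    return report.sort_values("missing", ascending=False)

# Toy frame with one gap, used only to show what the report looks like.
toy = pd.DataFrame({
    "age": [63, 37, np.nan, 56],
    "chol": [233, 250, 204, 236],
})

report = missing_report(toy)
print(report)

# duplicated() flags rows that repeat an earlier row exactly.
n_duplicates = int(toy.duplicated().sum())
print("duplicate rows:", n_duplicates)
```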

## 3. Exploratory Data Analysis
[back to top](#top)

### 3.1 Univariate Analysis