# Heart Attack Analysis and Prediction

## About Dataset

This dataset contains medical records for a set of patients, with a range of health indicators that can be used to analyze and predict the risk of a heart attack.

## Source

The dataset is available on Kaggle at the following link:
> https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-prediction-dataset

## Data Dictionary

The dataset includes the following features:

- **age**: Age of the patient
- **sex**: Sex of the patient
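- **cp**: Chest pain type (the values below follow the dataset description; the sample rows shown in this notebook include `cp = 0`, so the file itself appears to use a zero-based coding)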
  - Value 1: Typical angina
  - Value 2: Atypical angina
  - Value 3: Non-anginal pain
  - Value 4: Asymptomatic
- **trtbps**: Resting blood pressure (in mm Hg)
- **chol**: Cholesterol in mg/dl fetched via BMI sensor
- **fbs**: Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
- **restecg**: Resting electrocardiographic results
  - Value 0: Normal
  - Value 1: Having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
  - Value 2: Showing probable or definite left ventricular hypertrophy by Estes' criteria
- **thalachh**: Maximum heart rate achieved
- **exng**: Exercise-induced angina (1 = yes; 0 = no)
- **oldpeak**: Numeric. ST depression induced by exercise relative to rest.
- **slp**: Slope of the peak exercise ST segment, coded 0, 1, or 2.
- **caa**: Number of major vessels (0-3)
- **thall**: Categorical. Thalassemia level in the patient's blood, coded 0, 1, 2, or 3.
- **output**: Heart attack risk indicator (0 = less chance of heart attack, 1 = more chance of heart attack)
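
A data dictionary like this can double as a quick validation check. The sketch below is illustrative only: it uses a few columns from the first rows displayed later in this notebook and compares their values against the documented codes. The names `rows`, `allowed`, and `unexpected` are hypothetical helpers, not part of the original notebook.

```python
import pandas as pd

# A subset of columns from the first five rows shown in this notebook.
rows = pd.DataFrame(
    {
        "fbs": [1, 0, 0, 0, 0],
        "exng": [0, 0, 0, 0, 1],
        "slp": [0, 0, 2, 2, 2],
        "caa": [0, 0, 0, 0, 0],
        "output": [1, 1, 1, 1, 1],
    }
)

# Allowed codes, taken from the data dictionary above.
allowed = {
    "fbs": {0, 1},
    "exng": {0, 1},
    "slp": {0, 1, 2},
    "caa": {0, 1, 2, 3},
    "output": {0, 1},
}

# Collect, per column, any values that fall outside the documented codes.
unexpected = {}
for col, codes in allowed.items():
    extra = set(rows[col].tolist()) - codes
    if extra:
        unexpected[col] = sorted(extra)

print(unexpected)  # an empty dict means every value is documented
```

Running the same check on the full DataFrame would flag any column whose values drift from the documented codes.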

## Problem Statements

1. **Feature Engineering**: Encode the categorical features as numerical values using appropriate encoding techniques.
2. **Feature Selection**: Select the most significant features for detecting the chance of a heart attack.
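
As an illustration of the encoding step in the first problem statement, here is a minimal sketch of one-hot encoding with `pd.get_dummies`. The sample frame is hypothetical, and treating `cp` and `slp` as nominal columns is an assumption made for the example; the notebook does not specify which encoding it uses.

```python
import pandas as pd

# Hypothetical sample rows using column names from the data dictionary.
sample = pd.DataFrame(
    {
        "age": [63, 37, 41, 56],
        "cp": [3, 2, 1, 1],
        "slp": [0, 0, 2, 2],
        "output": [1, 1, 1, 1],
    }
)

# Columns treated as nominal here; this choice is an assumption.
categorical_cols = ["cp", "slp"]

# One-hot encode the nominal columns; the other columns pass through unchanged.
encoded = pd.get_dummies(sample, columns=categorical_cols, dtype=int)
print(encoded.columns.tolist())
```

Each category present in the data becomes its own 0/1 column, so a model does not read an artificial ordering into the integer codes.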

### Load Libraries

In [1]:
# General
import pandas as pd
import numpy as np
import os
import warnings

# Feature Selection
from sklearn.feature_selection import SelectKBest, chi2

### Settings

In [2]:
# Suppress warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

# File Path
data_path = "../data"
csv_path = os.path.join(data_path, "heart.csv")

### Load Dataset

In [3]:
df = pd.read_csv(csv_path)

In [4]:
# Check Data
df.head()

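|   | age | sex | cp | trtbps | chol | fbs | restecg | thalachh | exng | oldpeak | slp | caa | thall | output |
|---|-----|-----|----|--------|------|-----|---------|----------|------|---------|-----|-----|-------|--------|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |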


### Feature Selection

In [5]:
# Separate the input features from the output (target) column
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

In [6]:
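# Define the selector: keep the 11 features with the highest chi-squared scores
selector = SelectKBest(chi2, k=11)
# Fit the selector (chi2 requires non-negative feature values)
selector.fit(X, y)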
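
After fitting, the selector exposes the per-feature statistics through `scores_` and `pvalues_`, which helps explain why some features are kept and others dropped. The sketch below uses synthetic stand-in data rather than the heart dataset; `X_demo`, `y_demo`, `selector_demo`, and `ranking` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Synthetic stand-in data (not the heart dataset): one feature that tracks
# the target and one noise feature, both non-negative as chi2 requires.
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 50)
X_demo = pd.DataFrame(
    {
        "informative": y_demo * 10 + rng.integers(0, 3, size=100),
        "noise": rng.integers(0, 3, size=100),
    }
)

selector_demo = SelectKBest(chi2, k=1).fit(X_demo, y_demo)

# scores_ and pvalues_ are aligned with the columns of X_demo.
ranking = pd.DataFrame(
    {"score": selector_demo.scores_, "p_value": selector_demo.pvalues_},
    index=X_demo.columns,
).sort_values("score", ascending=False)
print(ranking)
```

Applying the same pattern to `selector` and `X` from the cell above would rank the heart dataset features by their chi-squared score.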

In [7]:
# Get the index of selected features
selected_feature_index = selector.get_support(indices=True)
# Get the names of the selected features (the indices refer to the columns of X)
selected_features = list(X.columns[selected_feature_index])
# Add output feature
selected_features.append("output")
selected_features

['age',
 'sex',
 'cp',
 'trtbps',
 'chol',
 'thalachh',
 'exng',
 'oldpeak',
 'slp',
 'caa',
 'thall',
 'output']
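
Comparing the selected list with the full set of input columns shows which features the selector left out. A small sketch, with both lists copied from the outputs in this notebook (the variable names are hypothetical):

```python
# Input columns as shown in the df.head() output (target column excluded).
all_features = [
    "age", "sex", "cp", "trtbps", "chol", "fbs", "restecg",
    "thalachh", "exng", "oldpeak", "slp", "caa", "thall",
]

# Features reported by the selector above (target column excluded).
kept_features = [
    "age", "sex", "cp", "trtbps", "chol", "thalachh",
    "exng", "oldpeak", "slp", "caa", "thall",
]

# Features present in the input but absent from the selection.
dropped_features = [col for col in all_features if col not in kept_features]
print(dropped_features)  # ['fbs', 'restecg']
```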

In [8]:
# Get data for selected features
df_selected = df[selected_features]

In [9]:
# Sanity check
df_selected.head()

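|   | age | sex | cp | trtbps | chol | thalachh | exng | oldpeak | slp | caa | thall | output |
|---|-----|-----|----|--------|------|----------|------|---------|-----|-----|-------|--------|
| 0 | 63 | 1 | 3 | 145 | 233 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |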


In [10]:
# Save the selected features to a new CSV file (without the row index)
selected_path = os.path.join(data_path, "heart_selected.csv")
df_selected.to_csv(selected_path, index=False)
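
To confirm that a file written this way reloads cleanly, a round-trip check can help. The sketch below is an assumption-laden illustration: it writes a small hypothetical frame (`demo`, standing in for `df_selected`) to a temporary directory instead of `../data`, then reads it back.

```python
import os
import tempfile

import pandas as pd

# Hypothetical small frame standing in for df_selected.
demo = pd.DataFrame({"age": [63, 37], "oldpeak": [2.3, 3.5], "output": [1, 1]})

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_selected.csv")
    # index=False keeps the row index out of the file.
    demo.to_csv(demo_path, index=False)
    reloaded = pd.read_csv(demo_path)

# The reloaded frame has the same columns, with no extra index column.
print(reloaded.columns.tolist())

# Raises an AssertionError if the round trip changed the data.
pd.testing.assert_frame_equal(reloaded, demo)
```

Without `index=False`, the row index would be written as an unnamed first column and would come back as `Unnamed: 0` on reload.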