# Heart Attack Risk Prediction using AdaBoost Classifier  



## 1. Introduction  

Heart disease is one of the leading causes of death worldwide, and early detection of heart attack risk can significantly help in preventive healthcare. In this project, we use the **AdaBoost Classifier** to predict the risk of a heart attack based on various health parameters.  

## 2. Libraries Used  

The following Python libraries were used in this project:  

- **pandas** - For data loading and manipulation  
- **numpy** - For numerical operations  
- **seaborn** - For data visualization  
- **sklearn.preprocessing.LabelEncoder** - To convert categorical variables into numerical form  
- **sklearn.model_selection.train_test_split** - For splitting the dataset into training and testing sets  
- **sklearn.ensemble.AdaBoostClassifier** - For model training  
- **sklearn.metrics.accuracy_score** - For model evaluation  

## 3. About the Dataset  

The dataset records health and lifestyle factors associated with heart attack risk. It consists of the following columns:  

- **Gender**: Male or Female  
- **Age**: Patient's age  
- **Physical_Activity_Level**: Level of exercise performed  
- **Stress_Level**: Patient's stress level  
- **Chest_Pain_Type**: Type of chest pain experienced  
- **Thalassemia**: Blood disorder information  
- **ECG_Results**: Electrocardiogram readings  
- **Diabetes**: Whether the patient has diabetes (Target Variable)  
- **Heart_Attack_Risk**: Risk classification for heart attack  

### Relationships Between Variables  

- Higher stress levels and lower physical activity levels may indicate an increased risk of heart disease.  
- Abnormal ECG results and certain chest pain types are commonly associated with heart-related conditions.  
- Patients with diabetes often have a higher risk of heart disease.  
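
One way to check relationships like these in the data is a row-normalized crosstab, which gives the share of each risk class within each group. The sketch below runs on a few hypothetical rows; the category labels (`'High'`, `'Low'`) are assumptions, not values taken from the real dataset.

```python
import pandas as pd

# Hypothetical toy rows for illustration only; the real dataset's
# category labels may differ from 'High'/'Low'
toy = pd.DataFrame({
    'Stress_Level': ['High', 'High', 'Low', 'Low', 'High', 'Low'],
    'Heart_Attack_Risk': ['High', 'High', 'Low', 'High', 'Low', 'Low'],
})

# normalize='index' divides each row by its total, so every row sums to 1:
# each cell is the share of a risk class within one stress level
rates = pd.crosstab(toy['Stress_Level'], toy['Heart_Attack_Risk'], normalize='index')
print(rates)
```

On the project's own DataFrame, the same call would take `data['Stress_Level']` and `data['Heart_Attack_Risk']` (or any other pair of columns listed above).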

## 4. Data Preprocessing  

### Checking for Null Values  

The dataset was checked for missing values and none were found, so no imputation was required.  
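
A minimal sketch of such a check, assuming pandas: `isnull().sum()` counts missing values per column, and an all-zero result means nothing needs imputing. The small frame below is hypothetical and stands in for `data`; one value is deliberately missing so the count is non-zero.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`; the NaN is deliberate
frame = pd.DataFrame({
    'Age': [54, 61, np.nan],
    'Gender': ['Male', 'Female', 'Male'],
})

# Missing values per column; all zeros means no imputation is needed
null_counts = frame.isnull().sum()
print(null_counts)
print('Any nulls:', bool(null_counts.any()))
```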

### Encoding Categorical Variables  

Since the dataset contains categorical features, we used **Label Encoding** to convert them into numerical form:  

```python
from sklearn.preprocessing import LabelEncoder

# Assumes `data` is the DataFrame loaded earlier in the notebook
categorical_cols = ['Gender', 'Physical_Activity_Level', 'Stress_Level',
                    'Chest_Pain_Type', 'Thalassemia', 'ECG_Results',
                    'Heart_Attack_Risk']

le = LabelEncoder()
for col in categorical_cols:
    # fit_transform refits the encoder on each column, so `le` only
    # retains the mapping of the last column encoded
    data[col] = le.fit_transform(data[col])
```

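`LabelEncoder` sorts each column's distinct values and assigns them integer codes `0, 1, 2, ...` in that order. Because a single encoder is refit on every column, only the last mapping remains available afterwards. The sketch below uses one encoder per column, which keeps each mapping so it can be inspected or reversed with `inverse_transform`. The frame and its category values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame: column names match the dataset, values are made up
toy = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male'],
    'Chest_Pain_Type': ['Typical', 'Atypical', 'Non-anginal'],
})

encoders = {}
for col in toy.columns:
    # One encoder per column, so each mapping is kept separately
    encoders[col] = LabelEncoder()
    toy[col] = encoders[col].fit_transform(toy[col])

# Classes are stored in sorted order: 'Female' -> 0, 'Male' -> 1
print(list(encoders['Gender'].classes_))
# inverse_transform maps the integer codes back to the original labels
print(encoders['Gender'].inverse_transform(toy['Gender']))
```

The integer codes impose an arbitrary order on nominal categories such as chest pain type. One-hot encoding (for example, `pd.get_dummies`) is a common alternative that avoids this ordering.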
## 5. Model Building

The dataset was split into the feature matrix (`x`) and the target variable (`y`), with the `Diabetes` column as the target.

```python
from sklearn.model_selection import train_test_split

# Features: every column except the target; target: Diabetes
x = data.drop(['Diabetes'], axis=1)
y = data['Diabetes']
```

### Splitting Data into Training and Testing Sets

```python
# Hold out 20% of the rows for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
```
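
If the `Diabetes` classes are imbalanced, a random split can leave the test set with a class mix that differs from the full dataset. Passing `stratify=y` to `train_test_split` keeps the class proportions the same in both parts. This parameter is an optional addition and is not part of the original code. The sketch below uses a hypothetical 80/20 target to show the effect.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced target standing in for data['Diabetes']: 80 zeros, 20 ones
y = pd.Series([0] * 80 + [1] * 20)
x = pd.DataFrame({'feature': np.arange(100)})

# Same call as above, plus stratify=y (an optional addition)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y
)

print(len(x_train), len(x_test))  # 80 training rows, 20 test rows
print(y_test.mean())              # positive-class share in the test set
```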

### Training the AdaBoost Classifier

```python
from sklearn.ensemble import AdaBoostClassifier

# Default settings: up to 50 boosting rounds of depth-1 decision trees (stumps)
adaboost_model = AdaBoostClassifier()
adaboost_model.fit(x_train, y_train)
```
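
The libraries list includes `accuracy_score` for evaluation. A minimal sketch of that step follows: predict on the held-out rows and compare the predictions with the true labels. It runs on synthetic data from `make_classification` rather than the heart-attack dataset, so the accuracy it prints says nothing about the project's results. `n_estimators=50` and `learning_rate=1.0` are scikit-learn's defaults, written out here only to make them visible.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded dataset (not the real data)
x, y = make_classification(n_samples=500, n_features=8, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Defaults written out explicitly; the weak learner is a depth-1 decision tree
adaboost_model = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=42)
adaboost_model.fit(x_train, y_train)

# Score predictions on the held-out 20% with accuracy_score
y_pred = adaboost_model.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Test accuracy: {accuracy:.3f}')
```

On the real data, the last three lines apply unchanged to the `adaboost_model`, `x_test`, and `y_test` defined above.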