The objective of this assignment is to assess your understanding of the process of building, training, and evaluating various machine learning models using a medical dataset. This assignment aims to enhance your skills in data preprocessing, model training, performance evaluation, and result visualization, ultimately enabling you to determine which model performs best for predicting heart disease.

<h2> Dataset: </h2>

Use the `Heart Disease` dataset from this [link](https://drive.google.com/file/d/181O6AUx7naNTXQPYwLb7kC2wYwmHTWAK/view?usp=sharing). This dataset contains information on various medical attributes of patients and whether they have heart disease.

### Tasks:
1. **Data Preparation: (30 points)**
   - Load the Heart Disease dataset.
   - Perform any necessary data preprocessing, including handling missing values, encoding categorical variables, and normalizing/standardizing the data if needed.
   - Write a short description of your preprocessing steps with justification for your actions. Explain how you handled missing values and why you chose to remove or impute certain values, rows, or columns. Additionally, describe the type of encoding you used for categorical variables and the rationale behind your choices.

2. **Data Splitting:(10points)**
   - Split the dataset into training and testing sets using `train_test_split` from the `sklearn.model_selection` module. Use an 80-20 split for training and testing, respectively.

3. **Model Training:(30points)**
   - Initialize and train the following models using the training data:
     - Random Forest
     - Gradient Boosting
     - AdaBoost
     - Logistic Regression

4. **Model Evaluation:(30points)**
   - Evaluate the models on the test data using accuracy score.
   - Create a bar chart to visualize the accuracy of each model. The x-axis should have the model names, and the y-axis should represent the accuracy scores.

**Complete all of the above tasks, and then submit the notebook on Canvas.**

## Task 1

In [9]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.ensemble import RandomForestRegressor
import matplotlib.pyplot as plt

path = 'HeartDisease.csv'
df = pd.read_csv(filepath_or_buffer=path)
df.head()


# Identify numeric and categorical columns
numeric_features = ['Age', 'RestingBP', 'Cholesterol', 'FastingBS', 'Oldpeak', 'MaxHR']
categorical_features = ['Sex', 'ChestPainType', 'RestingECG', 'ST_Slope']

# Define transformers
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('encoder', OneHotEncoder(handle_unknown='ignore'))
])

# Create a ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ]
)



Write your description here:

## Task 2

In [11]:
# Select features and target variable
X = df.drop(columns=['HeartDisease', 'ExerciseAngina'])
y = df['HeartDisease']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## Task 3

In [12]:
# Define the pipeline with preprocessing and model
pipe = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', RandomForestRegressor())
])

# Fit the pipeline to the training data
pipe.fit(X_train, y_train)

pipe.predict(X_test) 


array([0.16, 0.68, 0.99, 0.99, 0.  , 0.63, 0.85, 0.11, 0.67, 0.91, 0.54,
       0.01, 0.81, 0.08, 0.97, 0.88, 0.11, 0.55, 0.92, 0.3 , 0.7 , 0.83,
       0.  , 0.47, 0.71, 0.85, 0.  , 0.73, 0.  , 0.  , 0.95, 0.  , 0.57,
       0.97, 0.97, 0.43, 0.99, 0.  , 0.81, 0.75, 0.85, 0.82, 0.45, 0.  ,
       0.08, 0.59, 0.81, 0.9 , 1.  , 0.23, 0.05, 0.  , 0.95, 0.97, 0.19,
       0.05, 0.51, 0.83, 0.56, 0.9 , 0.34, 0.04, 0.  , 0.92, 0.16, 0.96,
       1.  , 0.94, 1.  , 0.63, 0.12, 0.  , 0.91, 0.49, 0.07, 0.88, 0.28,
       0.72, 0.04, 0.41, 0.73, 0.9 , 0.73, 0.  , 0.93, 0.9 , 0.05, 0.5 ,
       0.02, 0.34, 0.34, 0.94, 0.97, 0.  , 0.29, 0.  , 0.88, 0.45, 0.87,
       0.82, 0.12, 0.98, 0.72, 0.  , 0.58, 0.76, 0.02, 0.7 , 0.96, 0.01,
       0.96, 0.93, 0.05, 0.23, 0.97, 0.34, 0.92, 0.  , 0.96, 0.57, 0.8 ,
       0.59, 0.75, 0.68, 0.13, 0.  , 0.15, 0.26, 0.  , 0.49, 0.05, 1.  ,
       0.92, 0.14, 0.97, 0.09, 0.63, 0.78, 0.  , 0.84, 0.24, 0.19, 0.96,
       0.71, 0.88, 0.87, 0.9 , 0.24, 0.15, 0.39, 0.

## Task 4