# EazyML Modeling: Heart Attack Classification

## Define Imports

In [None]:
!pip install --upgrade eazyml-automl
!pip install gdown python-dotenv

In [None]:
import os
import pandas as pd
import eazyml as ez
import gdown
from dotenv import load_dotenv
load_dotenv()

## 1. Initialize EazyML

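`ez_init` authenticates with the access key passed to it, which this notebook reads from the `EAZYML_ACCESS_KEY` environment variable (loaded from a `.env` file by `load_dotenv()` if one exists). If the variable is not set, EazyML defaults to a trial license.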

In [None]:
ez.ez_init(os.getenv('EAZYML_ACCESS_KEY'))
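
To see which license mode the call above will use, you can inspect the environment variable first. The sketch below is only an illustration: `get_access_key` is a hypothetical helper and not part of EazyML, and treating a blank value as "not set" is an assumption.

```python
import os

def get_access_key(var_name="EAZYML_ACCESS_KEY"):
    """Return the access key from the environment, or None if unset or blank."""
    key = os.getenv(var_name)
    if key is None or not key.strip():
        return None
    return key.strip()

access_key = get_access_key()
if access_key is None:
    print("EAZYML_ACCESS_KEY is not set; expect the trial license.")
else:
    print("EAZYML_ACCESS_KEY found; it can be passed to ez_init.")
```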

## 2. Define Dataset Files, Outcome Variable and Train Model

### 2.1 Define Train Dataset and Other Model Parameters

In [None]:
gdown.download_folder(id='1EobxYR3pg_Z3Sd4sETfe4aJLAsT98fL2')
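
Before reading the CSV files, it can help to confirm that the download produced them. This is a minimal sketch, assuming the folder is saved as `data` in the current working directory (as the `os.path.join('data', ...)` calls in this notebook suggest); `missing_files` is a hypothetical helper, not part of `gdown` or EazyML.

```python
import os

def missing_files(folder, file_names):
    """Return the names from file_names that are not regular files inside folder."""
    return [name for name in file_names
            if not os.path.isfile(os.path.join(folder, name))]

# File names used by the train and test cells of this notebook.
expected = ["Heart_Attack_traindata.csv", "Heart_Attack_testdata.csv"]

missing = missing_files("data", expected)
if missing:
    print("Missing files:", missing)
else:
    print("All expected files are present.")
```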

In [None]:
# classification
file_path = os.path.join('data', "Heart_Attack_traindata.csv")
outcome = "class"

# read the training data into a DataFrame
df = pd.read_csv(file_path)

# define options
options = {'model_type': 'predictive'}
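
Since this is a classification task, a quick look at how the outcome labels are distributed can be useful before training. The helper below is a hypothetical sketch using only the standard library; the labels in the example are made up for illustration and are not taken from the dataset.

```python
from collections import Counter

def class_distribution(values):
    """Return {label: fraction of rows} for an iterable of outcome labels."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no outcome values to summarise")
    return {label: count / total for label, count in counts.items()}

# Made-up labels purely for illustration.
example_labels = ["positive", "negative", "positive", "positive"]
print(class_distribution(example_labels))
```

In the notebook, the same helper could be called as `class_distribution(df[outcome])`.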

### 2.2 Train Model

In [None]:
resp = ez.ez_build_model(df, outcome=outcome, options=options)

### 2.3 Show Model Performance

In [None]:
ez.ez_display_df(resp['model_performance'])

## 3. Dataset Information

The dataset used in this notebook is the **Heart Attack Dataset**. Each row describes one patient through features such as age, gender, blood pressure, and cardiac biomarkers, and the task is to predict whether the patient had a heart attack.

### Columns in the Dataset:
- **age**: The age of the patient, measured in years.
- **gender**: The gender of the patient, represented as a categorical variable (e.g., 1 = male, 0 = female).
- **impulse**: The patient's pulse rate, measured in beats per minute (bpm).
- **pressurehight**: Systolic blood pressure, the higher number in a blood pressure reading (e.g., the 120 in 120/80 mmHg).
- **pressurelow**: Diastolic blood pressure, the lower number in a blood pressure reading (e.g., the 80 in 120/80 mmHg).
- **glucose**: The patient's blood glucose (blood sugar) level.
- **kcm**: A cardiac marker, most likely CK-MB (creatine kinase-MB), an enzyme released into the blood when heart muscle is damaged.
- **troponin**: A protein found in the heart muscle, measured to assess heart damage (especially after a heart attack).
- **class**: The target variable, indicating whether the patient had a heart attack (e.g., 1 = heart attack, 0 = no heart attack).
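
The column list above can be turned into a simple schema check. This is a sketch only: `missing_columns` is a hypothetical helper, and the expected names are copied from the list above, so the spelling in the actual CSV header may differ.

```python
# Column names as listed in this section (an assumption about the CSV header).
EXPECTED_COLUMNS = [
    "age", "gender", "impulse", "pressurehight", "pressurelow",
    "glucose", "kcm", "troponin", "class",
]

def missing_columns(actual_columns, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from actual_columns."""
    actual = {str(col).strip() for col in actual_columns}
    return [col for col in expected if col not in actual]

# Example with a header that lacks two of the expected columns.
print(missing_columns(["age", "gender", "impulse", "pressurehight",
                       "pressurelow", "glucose", "class"]))
```

In the notebook, `missing_columns(df.columns)` would report any listed column that the training file does not contain.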

### 3.1 Display the Dataset

Below is a preview of the dataset:

In [None]:
# Load the dataset from the provided file
train = pd.read_csv(file_path)

# Display the first few rows of the dataset
ez.ez_display_df(train.head())

## 4. Define Test Dataset and Predict on that Dataset

In [None]:
# The build response also contains the model information needed for prediction
model_info = resp["model_info"]

### 4.1 Define Test Dataset

In [None]:
test_file_path = os.path.join('data', "Heart_Attack_testdata.csv")
test_data = pd.read_csv(test_file_path)

### 4.2 Predict on Test Dataset

In [None]:
options = {}
pred_resp = ez.ez_predict(test_data, model_info=model_info, options=options)
pred_df = pred_resp['pred_df']


In [None]:
ez.ez_display_df(pred_df.head())
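
If the test file also contains the true `class` values, the predictions can be scored against them. The sketch below makes two assumptions that this notebook does not confirm: that the test data includes the outcome column, and that `pred_df` exposes the predicted labels in a column whose name you would need to look up. `accuracy` is a hypothetical helper written in plain Python.

```python
def accuracy(actual, predicted):
    """Return the fraction of positions where predicted equals actual."""
    actual = list(actual)
    predicted = list(predicted)
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on empty input")
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)

# Made-up labels purely for illustration: three of the four predictions match.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

With real data, the call would look like `accuracy(test_data[outcome], pred_df[predicted_col])`, where `predicted_col` is a placeholder for the actual prediction column name.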