# EazyML Insights Template

## Define Imports

In [None]:
!pip install --upgrade eazyml-insight
!pip install --upgrade eazyml-automl
!pip install gdown python-dotenv

In [None]:
import os
from eazyml_insight import (
    ez_insight,
    ez_init,
    ez_validate
)

from eazyml import ez_display_df
import gdown
import pandas as pd

from dotenv import load_dotenv
load_dotenv()

## 1. Initialize EazyML

`ez_init` authenticates with the access key passed to it, which this notebook reads from the `EAZYML_ACCESS_KEY` environment variable. If the variable is not set, `os.getenv` returns `None` and EazyML defaults to a trial license.

In [None]:
ez_init(access_key=os.getenv('EAZYML_ACCESS_KEY'))
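
Before calling `ez_init`, it can help to report which mode the notebook is about to run in. The following is a minimal sketch that only inspects the environment. The helper name `license_mode` is hypothetical and not part of EazyML, and treating an empty or whitespace-only value as "not set" is an assumption of this sketch rather than documented EazyML behavior.

```python
import os


def license_mode(env=None, var_name="EAZYML_ACCESS_KEY"):
    """Return 'access key' if the variable holds a non-empty value, else 'trial'.

    Illustrative helper only; it is not an EazyML function.
    """
    env = os.environ if env is None else env
    key = env.get(var_name)
    # Assumption: a blank value is treated the same as a missing one.
    return "access key" if key and key.strip() else "trial"


print(f"EazyML will initialize with: {license_mode()}")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.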

## 2. Define Dataset Files and Outcome Variable

In [None]:
gdown.download_folder(id='1-RO9K9-YYGK7Wp__ioth0xPD8XqtgvKT')

In [None]:
# Paths of the files that will be passed to the EazyML APIs
train_file_path = os.path.join('data', 'IRIS_Train.csv')
test_file_path = os.path.join('data', 'IRIS_Test.csv')

# Column name of the outcome of interest
outcome = "species"
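
If the download step lands the files somewhere other than `data/`, later cells fail with less obvious errors. A small sketch of an early check follows. The helper `missing_files` is hypothetical, and the paths simply mirror the ones defined above.

```python
import os


def missing_files(paths):
    """Return the subset of `paths` that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


# Paths mirror the ones defined above; adjust them if the download lands elsewhere.
expected = [
    os.path.join("data", "IRIS_Train.csv"),
    os.path.join("data", "IRIS_Test.csv"),
]
missing = missing_files(expected)
if missing:
    print("Missing dataset files:", missing)
else:
    print("All dataset files found.")
```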

## 3. Dataset Information

The dataset used in this notebook is the **Iris dataset**, a well-known benchmark in machine learning and statistics. It describes 150 iris flowers, each with four features (sepal length, sepal width, petal length, and petal width) and a species label (setosa, versicolor, or virginica).

You can find more details and download the dataset from Kaggle using the following link:

[Kaggle Iris Dataset](https://www.kaggle.com/datasets/uciml/iris)

### Columns in the Dataset:
- **sepal_length**: Sepal length of the flower (cm)
- **sepal_width**: Sepal width of the flower (cm)
- **petal_length**: Petal length of the flower (cm)
- **petal_width**: Petal width of the flower (cm)
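- **species**: Species of the iris flower (setosa, versicolor, virginica). The filters later in this notebook match labels with an `Iris-` prefix, such as `Iris-virginica` and `Iris-versicolor`.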
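
The column list above can be turned into a quick schema check before any modeling. This is a sketch under assumptions: `check_schema` is a hypothetical helper, and the two-row frame is a stand-in for the real file, which you would load with `pd.read_csv(train_file_path)`.

```python
import pandas as pd

EXPECTED_COLUMNS = [
    "sepal_length",
    "sepal_width",
    "petal_length",
    "petal_width",
    "species",
]


def check_schema(df, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names relative to `expected`."""
    missing = [c for c in expected if c not in df.columns]
    unexpected = [c for c in df.columns if c not in expected]
    return missing, unexpected


# Stand-in frame with illustrative rows; replace with pd.read_csv(train_file_path).
sample = pd.DataFrame(
    [
        [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
        [6.3, 3.3, 6.0, 2.5, "Iris-virginica"],
    ],
    columns=EXPECTED_COLUMNS,
)
missing, unexpected = check_schema(sample)
print("missing:", missing, "unexpected:", unexpected)
```

Both lists being empty means the frame has exactly the columns described above.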

### 3.1 Display the Dataset

Below is a preview of the dataset:

In [None]:
# Load the dataset from the provided file
train = pd.read_csv(train_file_path)

# Display the first few rows of the dataset
train.head()

## 4. EazyML Insights

### 4.1 Auto-derive Insights

#### 4.1.1 Build Insight Model

In [None]:
response = ez_insight(train_file_path, outcome, options={})

#### 4.1.2 Convert Response to DataFrame

In [None]:
insights_df = pd.DataFrame(response['insights']['data'], columns=response['insights']['columns'])
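
The conversion above relies on the response carrying a table as separate `columns` and `data` entries. The sketch below wraps that pattern in a hypothetical helper, `table_to_df`, that fails with a clear message when either key is absent. The mock payload uses invented column names and values for illustration only; it does not reflect the actual columns returned by `ez_insight`.

```python
import pandas as pd


def table_to_df(table):
    """Build a DataFrame from a {'columns': [...], 'data': [[...], ...]} mapping."""
    for key in ("columns", "data"):
        if key not in table:
            raise KeyError(f"expected key {key!r} in table payload")
    return pd.DataFrame(table["data"], columns=table["columns"])


# Mock payload with invented column names and values, for illustration only.
mock_insights = {
    "columns": ["species", "rule"],
    "data": [
        ["Iris-virginica", "placeholder rule A"],
        ["Iris-versicolor", "placeholder rule B"],
    ],
}
mock_df = table_to_df(mock_insights)
print(mock_df.shape)
```

In the notebook, the same helper could be applied to `response['insights']` and later to `val_response['validations']`, assuming both keep this two-key layout.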

#### 4.1.3 Display Augmented Insights

##### 4.1.3.1 For Class Iris-virginica

In [None]:
insights_df1 = insights_df[insights_df[outcome] == 'Iris-virginica']
ez_display_df(insights_df1.head())

##### 4.1.3.2 For Class Iris-versicolor

In [None]:
insights_df0 = insights_df[insights_df[outcome] == 'Iris-versicolor']
ez_display_df(insights_df0.head())
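
The two cells above hard-code one class each. As an alternative, the sketch below groups the rows by the outcome column so that every class present is covered. `split_by_class` is a hypothetical helper, and the mock frame contains placeholder values rather than real EazyML output.

```python
import pandas as pd


def split_by_class(df, outcome_col):
    """Map each distinct value of `outcome_col` to the rows that carry it."""
    return {label: group for label, group in df.groupby(outcome_col)}


# Mock insights frame with placeholder values, for illustration only.
mock = pd.DataFrame(
    {
        "species": ["Iris-virginica", "Iris-versicolor", "Iris-virginica"],
        "rule": ["placeholder A", "placeholder B", "placeholder C"],
    }
)
per_class = split_by_class(mock, "species")
for label, rows in per_class.items():
    print(label, len(rows))
```

In the notebook, the loop body could call `ez_display_df(rows.head())` on each group of `insights_df` instead of printing counts.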

### 4.2 Validation of Insights

#### 4.2.1 Validating Insights

In [None]:
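# Record numbers to validate; the same list drives the filtered-data display in 4.2.4
record_number = [3, 5]
options = {'record_number': record_number}
# Validate the insights derived from the training file against the test file
val_response = ez_validate(train_file_path, outcome, response['insights'], test_file_path, options=options)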

#### 4.2.2 Convert Response to DataFrame

In [None]:
validate_df = pd.DataFrame(val_response['validations']['data'], columns=val_response['validations']['columns'])

#### 4.2.3 Display Validation Metrics

##### 4.2.3.1 For Class Iris-virginica

In [None]:
validate_df1 = validate_df[validate_df[outcome] == 'Iris-virginica']
ez_display_df(validate_df1.head())

##### 4.2.3.2 For Class Iris-versicolor

In [None]:
validate_df0 = validate_df[validate_df[outcome] == 'Iris-versicolor']
ez_display_df(validate_df0.head())

#### 4.2.4 Display Filtered Data for Specific Record Numbers

In [None]:
for i in range(len(record_number)):
    entry = val_response['validation_filter'][i]
    # Print the insight text, then preview the rows it selects
    print(entry['Augmented Intelligence Insights'])
    filtered = entry['filtered_data']
    filter_df = pd.DataFrame(filtered['data'], columns=filtered['columns'])
    ez_display_df(filter_df.head())
    print('\n')