# EazyML Insights Template

## Define Imports

In [None]:
!pip install --upgrade eazyml-insight
!pip install --upgrade eazyml-automl
!pip install gdown python-dotenv

In [None]:
import os
from eazyml_insight import (
    ez_insight,
    ez_init,
    ez_validate
)

from eazyml import ez_display_df
import gdown
import pandas as pd

from dotenv import load_dotenv
load_dotenv()

## 1. Initialize EazyML

The cell below reads the `EAZYML_ACCESS_KEY` environment variable (loaded from a `.env` file by `load_dotenv()` if one exists) and passes it to `ez_init` for authentication. If the variable is not set, `ez_init` defaults to a trial license.

In [None]:
ez_init(access_key=os.getenv('EAZYML_ACCESS_KEY'))
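Because `os.getenv` returns `None` when the variable is missing, it can help to check which mode the notebook is about to run in before calling `ez_init`. The following is a minimal sketch using only the standard library. The helper name `license_mode` is hypothetical and not part of EazyML, and treating an empty string as "not set" is an assumption.

```python
import os

def license_mode(env=None):
    """Return 'access key' if EAZYML_ACCESS_KEY is set and non-empty, else 'trial'."""
    # Hypothetical helper: it only inspects the environment, it does not call EazyML.
    env = os.environ if env is None else env
    key = env.get('EAZYML_ACCESS_KEY')
    # Assumption: an empty string is treated the same as a missing variable.
    return 'access key' if key else 'trial'

print(f"ez_init will be called in {license_mode()} mode")
```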

## 2. Define Dataset Files and Outcome Variable

In [None]:
gdown.download_folder(id='1p7Udh2MjKyJPxI47FS89VowAz9ZEq_hG')

In [None]:
# Paths of the files used by the EazyML APIs
train_file_path = os.path.join('data', "House Price Prediction - Train Data.xlsx")
test_file_path = os.path.join('data', "House Price Prediction - Test Data.xlsx")

# Column name of the outcome of interest
outcome = "House_Price"
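The paths above assume the download landed in a local `data` directory. A quick existence check before calling any EazyML API can surface a failed or misplaced download early. This is a sketch under that assumption. `missing_files` is a hypothetical helper, and the paths are repeated so the block runs on its own.

```python
import os

def missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]

# Same paths as the cell above, repeated so this sketch is self-contained.
dataset_paths = [
    os.path.join('data', "House Price Prediction - Train Data.xlsx"),
    os.path.join('data', "House Price Prediction - Test Data.xlsx"),
]

for path in missing_files(dataset_paths):
    print(f"Missing dataset file: {path}")
```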

## 3. Dataset Information

This notebook uses the **Housing Price Prediction Dataset**. Each row describes one house by its features and its sale price. The goal is to predict the sale price of a house from its other attributes.

### Columns in the Dataset:
- **Square_Footage**: Total area of the house in square feet; larger homes typically have higher prices.
- **Num_Bedrooms**: Number of bedrooms in the house; more bedrooms usually increase the value.
- **Num_Bathrooms**: Number of bathrooms in the house; more bathrooms often correlate with higher prices.
- **Year_Built**: The year the house was built; newer homes may have higher prices due to modern features.
- **Lot_Size**: Size of the property in square feet; larger lots can increase the property's value.
- **Garage_Size**: Size of the garage (e.g., number of cars it can hold); larger garages may increase value.
- **Neighborhood_Quality**: Qualitative rating of the neighborhood; higher quality usually means higher prices.
- **House_Price**: The selling price of the house; this is the target variable for prediction models.

### 3.1 Display the Dataset

Below is a preview of the dataset:

In [None]:
# Load the dataset from the provided file
train = pd.read_excel(train_file_path)

# Display the first few rows of the dataset
ez_display_df(train.head())
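Once the data is loaded, the column list from section 3 can be turned into a simple schema check. The sketch below works on any iterable of column names, so it runs without the Excel file. `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced for illustration.

```python
# Column names as listed in section 3 of this notebook.
EXPECTED_COLUMNS = [
    'Square_Footage', 'Num_Bedrooms', 'Num_Bathrooms', 'Year_Built',
    'Lot_Size', 'Garage_Size', 'Neighborhood_Quality', 'House_Price',
]

def missing_columns(actual, expected=EXPECTED_COLUMNS):
    """Return the expected column names absent from `actual`, in listed order."""
    present = set(actual)
    return [name for name in expected if name not in present]

# With the DataFrame loaded above, this would be missing_columns(train.columns).
print(missing_columns(['Square_Footage', 'House_Price']))
```

An empty list means every documented column, including the outcome `House_Price`, is present.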

## 4. EazyML Insights

### 4.1 Auto-derive Insights

#### 4.1.1 Build Insight Model

In [None]:
response = ez_insight(train_file_path, outcome, options={})

#### 4.1.2 Convert Response to DataFrame

In [None]:
insights_df = pd.DataFrame(response['insights']['data'], columns=response['insights']['columns'])
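The conversion above relies on `response['insights']` holding a `columns` list and a `data` list of rows. The sketch below illustrates that assumed shape with plain Python and fails loudly when a row does not match the header. The example table is a hypothetical stand-in, and the column names returned by EazyML may differ.

```python
def table_to_records(table):
    """Turn {'columns': [...], 'data': [[...], ...]} into a list of row dicts."""
    columns = table['columns']
    records = []
    for row in table['data']:
        # Guard against rows whose length does not match the header.
        if len(row) != len(columns):
            raise ValueError(f"row has {len(row)} values, expected {len(columns)}")
        records.append(dict(zip(columns, row)))
    return records

# Hypothetical stand-in for response['insights']; not real EazyML output.
example_table = {
    'columns': ['Insight', 'Score'],
    'data': [['rule A', 0.9], ['rule B', 0.7]],
}
print(table_to_records(example_table))
```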

#### 4.1.3 Display Augmented Insights

In [None]:
ez_display_df(insights_df.head())

### 4.2 Validation of Insights

#### 4.2.1 Validating Insights

In [None]:
record_number = [3, 5]
options = {'record_number': record_number}
val_response = ez_validate(train_file_path, outcome, response['insights'], test_file_path, options=options)

#### 4.2.2 Convert Response to DataFrame

In [None]:
validate_df = pd.DataFrame(val_response['validations']['data'], columns=val_response['validations']['columns'])

#### 4.2.3 Display Validation Metrics

In [None]:
ez_display_df(validate_df.head())

#### 4.2.4 Display Filtered Data for Specific Record Numbers

In [None]:
for i in range(len(record_number)):
    # Each entry holds the insight text and the rows of data that match it
    entry = val_response['validation_filter'][i]
    print(entry['Augmented Intelligence Insights'])
    filtered = entry['filtered_data']
    filter_df = pd.DataFrame(filtered['data'], columns=filtered['columns'])
    ez_display_df(filter_df.head())
    print('\n')
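The loop above indexes `validation_filter` by position, which implicitly assumes one entry per requested record number, in the same order. Under that assumption, `zip` can pair each record number with its entry directly. The sketch below uses a hypothetical stand-in response containing only the keys the loop reads. `summarize_filters` and the example values are illustrative, not EazyML output.

```python
example_records = [3, 5]

# Hypothetical stand-in for val_response; only the keys used by the loop appear.
example_response = {
    'validation_filter': [
        {'Augmented Intelligence Insights': 'insight for record 3',
         'filtered_data': {'columns': ['Square_Footage'], 'data': [[1200], [1500]]}},
        {'Augmented Intelligence Insights': 'insight for record 5',
         'filtered_data': {'columns': ['Square_Footage'], 'data': [[900]]}},
    ]
}

def summarize_filters(records, response):
    """Pair each record number with its insight text and filtered row count."""
    entries = response['validation_filter']
    # Assumption: entries are returned in the same order as `records`.
    return [
        (number, entry['Augmented Intelligence Insights'],
         len(entry['filtered_data']['data']))
        for number, entry in zip(records, entries)
    ]

for number, insight, n_rows in summarize_filters(example_records, example_response):
    print(f"record {number}: {insight} ({n_rows} filtered rows)")
```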