# EazyML Counterfactual Template

## Define Imports

In [None]:
!pip install --upgrade eazyml-counterfactual
!pip install gdown python-dotenv

In [None]:
import os
import pandas as pd
import eazyml as ez
from eazyml_counterfactual import (
    ez_cf_inference,
    ez_init,
)
import gdown

from dotenv import load_dotenv
load_dotenv()

## 1. Initialize EazyML

`ez_init` authenticates with the access key read from the `EAZYML_ACCESS_KEY` environment variable. If the variable is not set, `os.getenv` returns `None` and EazyML falls back to a trial license.

In [None]:
ez_init(os.getenv('EAZYML_ACCESS_KEY'))
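
Because a missing key falls back to the trial license rather than raising an error, it can help to check up front whether the variable is actually set. The following is a minimal sketch using only the standard library; the helper name `describe_license_mode` is hypothetical and not part of EazyML.

```python
import os


def describe_license_mode(env_var="EAZYML_ACCESS_KEY", environ=None):
    """Report whether an access key is configured.

    Hypothetical helper for illustration; it is not an EazyML API.
    """
    # Allow a custom mapping to be passed in so the check is easy to test.
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if key:
        return "access key found"
    return "no access key set; expect the trial license"


print(describe_license_mode())
```

Running this before `ez_init` makes it obvious which mode the rest of the notebook will use.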

## 2. Define Dataset Files and Outcome Variable

In [None]:
gdown.download_folder(id='1WvIOaIvS7hTlYSkeojYhnBpAWd8HWzDt')

In [None]:
# Defining file paths for training and test datasets and specifying the outcome variable
train_file = os.path.join('data', "airline_train_data.csv")
test_file = os.path.join('data', "airline_test_data.csv")
outcome = "satisfaction"

# Loading the training dataset and the test dataset
train_df = pd.read_csv(train_file)
test_df = pd.read_csv(test_file)

## 3. Dataset Information

The dataset used in this notebook is the **Airline Passenger Satisfaction Dataset**, which contains data on passenger satisfaction with airlines. It includes various features such as the type of flight, passenger demographics, and overall satisfaction with the service.

You can find more details and download the dataset from Kaggle using the following link:

[Kaggle Airline Passenger Satisfaction Dataset](https://www.kaggle.com/datasets/teejmahal20/airline-passenger-satisfaction)

### Columns in the Dataset:
- **id**: Unique identifier for each passenger.
- **Gender**: Gender of the passenger (Male/Female).
- **Customer Type**: Type of customer (Loyal Customer/Disloyal Customer).
- **Age**: Age of the passenger.
- **Type of Travel**: Purpose of travel (Business/Personal).
- **Class**: Flight class (Business/Eco/Eco Plus).
- **Flight Distance**: Distance traveled in kilometers.
- **Inflight wifi service**: Rating of inflight Wi-Fi service (1-5).
- **Departure/Arrival time convenient**: Rating of how convenient the departure and arrival times were (1-5).
- **Ease of Online booking**: Satisfaction level of online booking (1-5).
- **Gate location**: Satisfaction level of Gate location (1-5).
- **Food and drink**: Rating of food and drink (1-5).
- **Online boarding**: Satisfaction level of online boarding (1-5).
- **Seat comfort**: Rating of seat comfort (1-5).
- **Inflight entertainment**: Rating of inflight entertainment (1-5).
- **On-board service**: Satisfaction level of On-board service (1-5).
- **Leg room service**: Satisfaction level of Leg room service (1-5).
- **Baggage handling**: Satisfaction level of baggage handling (1-5).
- **Checkin service**: Satisfaction level of Check-in service (1-5).
- **Inflight service**: Satisfaction level of inflight service (1-5).
- **Cleanliness**: Rating of cleanliness (1-5).
- **Departure Delay in Minutes**: Departure delay, in minutes.
- **Arrival Delay in Minutes**: Arrival delay, in minutes.
- **satisfaction**: Overall satisfaction of the passenger (Satisfied/Neutral or Dissatisfied).
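
The rating columns above are documented as 1-5 scales, so a quick sanity check is to count values that fall outside that range (missing values included). The sketch below is illustrative only: `count_out_of_range` is a hypothetical helper, and it works on plain dictionaries so that it runs without the dataset.

```python
from collections import Counter


def count_out_of_range(rows, columns, low=1, high=5):
    """Count, per column, the values missing or outside [low, high].

    Hypothetical helper for illustration; `rows` is an iterable of dicts.
    """
    bad = Counter()
    for row in rows:
        for col in columns:
            value = row.get(col)
            # A missing value or one outside the documented scale is flagged.
            if value is None or not (low <= value <= high):
                bad[col] += 1
    return dict(bad)


# Toy records standing in for rows of the training data.
demo_rows = [
    {"Seat comfort": 3, "Cleanliness": 0},
    {"Seat comfort": 6, "Cleanliness": 5},
]
print(count_out_of_range(demo_rows, ["Seat comfort", "Cleanliness"]))
# {'Cleanliness': 1, 'Seat comfort': 1}
```

With the real data, the same function could be applied to `train_df.to_dict("records")` once the DataFrame is loaded.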

### 3.1 Display the Dataset

Below is a preview of the dataset:

In [None]:
# Display the first few rows of the training DataFrame for inspection
ez.ez_display_df(train_df)

## 4. EazyML Modeling

### 4.1 Build a Model Using the EazyML Modeling API

In [None]:
# Clean and preprocess training data
train_df.drop(columns=['id'], inplace=True)  # Remove unnecessary columns
train_df.dropna(inplace=True)  # Remove missing values
train_df.reset_index(drop=True, inplace=True)  # Reset index


# Define model parameters
model_options = {
    "model_type": "predictive",
}

# Build predictive model using EazyML API
build_model_response = ez.ez_build_model(train_df, outcome=outcome, options=model_options)

### 4.2 Feature Importance

In [None]:
ez.ez_display_df(build_model_response['global_importance'])

### 4.3 Model Performance

In [None]:
ez.ez_display_df(build_model_response['model_performance'])

### 4.4 Predict Using the Trained EazyML Model

In [None]:
# Extract model information from the response dictionary
model_info = build_model_response["model_info"]

# Reuse the test DataFrame loaded in Section 2 instead of reading the CSV again
test_data = test_df.copy()

# Make predictions using the model, requesting confidence scores and class probabilities
predicted_resp = ez.ez_predict(test_data, model_info, options={"confidence_score": True, "class_probability": True})

# Check if the prediction was successful
if predicted_resp['success']:
    print("Prediction successful")  
    predicted_df = predicted_resp['pred_df']  # Extract the predicted DataFrame
    ez.ez_display_df(predicted_df.head())  # Display the first few rows of the predicted DataFrame
else:
    print("Prediction failed")  
    print(predicted_resp['message'])  
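
The cell above only prints a message on failure, so later cells that use `predicted_df` would fail with a `NameError` if the prediction did not succeed. One way to fail early is sketched below. It assumes only the `success`, `message`, and `pred_df` keys already used in this notebook; `unwrap_response` is a hypothetical helper, not an EazyML function.

```python
def unwrap_response(response, key):
    """Return response[key] on success, otherwise raise with the reported message.

    Hypothetical helper for illustration; it is not an EazyML API.
    """
    if not response.get("success"):
        raise RuntimeError(response.get("message", "call failed without a message"))
    return response[key]


# Toy dictionary standing in for a real response.
demo_response = {"success": True, "pred_df": "placeholder for a DataFrame"}
print(unwrap_response(demo_response, "pred_df"))
```

In the notebook this would read `predicted_df = unwrap_response(predicted_resp, "pred_df")`, which stops execution at the point of failure rather than several cells later.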

## 5. EazyML Counterfactual Inference

### 5.1 Define Counterfactual Inference Configuration

In [None]:
# Define the selected features for prediction
selected_features = ['Gender', 'Customer Type', 'Age', 'Type of Travel', 'Class', 'Flight Distance', 
                     'Inflight wifi service', 'Departure/Arrival time convenient', 'Ease of Online booking', 
                     'Gate location', 'Food and drink', 'Online boarding', 'Seat comfort', 
                     'Inflight entertainment', 'On-board service', 'Leg room service', 
                     'Baggage handling', 'Checkin service', 'Inflight service', 'Cleanliness', 
                     'Departure Delay in Minutes', 'Arrival Delay in Minutes']

# Split features into invariants (held fixed) and variants (allowed to change)
invariants = ['Gender', 'Customer Type', 'Age', 'Type of Travel', 'Class', 'Flight Distance']
variants = [feature for feature in selected_features if feature not in invariants]

# Define configurable parameters for counterfactual inference
cf_options = {   
    "variants": variants,  
    "outcome_ordinality": "SATISFIED",  # Desired outcome 
    "train_data": train_file  
}
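
The variant list is derived by removing the invariants from `selected_features`, so a typo in an invariant name would silently leave that feature modifiable. The sketch below adds a guard against that. `split_features` is a hypothetical helper written for illustration, and the short feature lists are toy values.

```python
def split_features(selected_features, invariants):
    """Return the variant features, preserving the order of selected_features.

    Raises ValueError if an invariant is not among the selected features.
    Hypothetical helper for illustration; it is not an EazyML API.
    """
    unknown = [name for name in invariants if name not in selected_features]
    if unknown:
        raise ValueError(f"Invariants not in selected_features: {unknown}")
    frozen = set(invariants)
    return [name for name in selected_features if name not in frozen]


demo_variants = split_features(
    ["Age", "Class", "Seat comfort", "Cleanliness"],
    ["Age", "Class"],
)
print(demo_variants)  # ['Seat comfort', 'Cleanliness']
```

The result can be passed as the `variants` entry of `cf_options` in the same way as the list built above.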

### 5.2 Perform Counterfactual Inference

In [None]:
# Specify the index of the test record for counterfactual inference
test_index_no = 0  
test_data = predicted_df.loc[[test_index_no]]  

# Perform Inference 
result, optimal_transition_df = ez_cf_inference(
    test_data=test_data,  
    outcome=outcome,  
    selected_features=selected_features,  
    model_info=model_info,  
    options=cf_options  
)

### 5.3 Display Results

In [None]:
# Summarizes whether an optimal transition was found and the improvement in outcome probability.
ez.ez_display_json(result)

In [None]:
# Details the feature changes needed to achieve the optimal outcome.
ez.ez_display_df(optimal_transition_df)