## Setting Up and Importing Libraries
In this step, we'll import the necessary libraries and modules. We'll also ensure the correct path is set to access our custom modules.

In [1]:
# Import necessary libraries
import pandas as pd
import os
import sys
import json


In [2]:
# Move up one level to the project root so the notebook executes from the trustCE folder.
# Note: the path is relative, so re-running this cell moves up one more level each time.
os.chdir(os.path.join(os.getcwd(), '..'))
print(os.getcwd())

/home/rita/TRUST_AI/trustframework/trustCE
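Because `os.chdir` with `'..'` is relative to the current directory, running the cell twice leaves the notebook one level above the project root. A minimal sketch of an idempotent alternative follows. It assumes the project root is the nearest ancestor directory named `trustCE`, as in the path printed above; `find_project_root` is a hypothetical helper, not part of trustce.

```python
from pathlib import Path


def find_project_root(start, root_name="trustCE"):
    """Return the nearest directory named `root_name` at or above `start`."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if candidate.name == root_name:
            return candidate
    raise FileNotFoundError(f"No '{root_name}' directory at or above {start}")


# Hypothetical usage in the notebook:
# import os
# os.chdir(find_project_root(os.getcwd()))
```

Calling the helper again from the root returns the same directory, so the cell can be re-run safely.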


In [3]:
from trustce.cfsearch import CFsearch
from trustce.dataset import Dataset
from trustce.cemodels.base_model import BaseModel
from trustce.cemodels.sklearn_model import SklearnModel
from trustce.ceinstance.instance_sampler import CEInstanceSampler
from trustce.config import Config
from trustce.transformer import Transformer
from trustce.ceinstance.instance_factory import InstanceFactory
from trustce import load_datasets

## Loading Configuration
Here, we load the configuration files that set the parameters for the counterfactual search: a YAML file with dataset details, feature management, and related settings, and a JSON file with feature constraints.

In [4]:
# Load configuration
config_file_path = "config/conf_homeloan_coherence.yaml"
config = Config(config_file_path)

with open("config/constraints_homeloan_ch.json", 'r') as file:
    constraints = json.load(file)

print("Configuration Loaded:")
print(config)

Configuration Loaded:
<trustce.config.Config object at 0x7f4e673814e0>
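Printing the `Config` object only shows its memory address, so it does not confirm that the expected settings are present. The sketch below checks a config section for the keys this notebook later reads from the `cfsearch` section. It assumes that `config.get_config_value("cfsearch")` returns a dict-like object, which the indexing in the later cell suggests; `missing_keys` is a hypothetical helper and the example values are placeholders.

```python
# Keys read from the "cfsearch" section later in this notebook.
REQUIRED_CFSEARCH_KEYS = (
    "optimizer",
    "continuous_distance",
    "categorical_distance",
    "loss_type",
    "coherence",
    "objective_function_weights",
)


def missing_keys(section, required=REQUIRED_CFSEARCH_KEYS):
    """Return the required keys that are absent from a dict-like section."""
    return [key for key in required if key not in section]


# Hypothetical section with placeholder values; real values come from the YAML file.
example_section = {"optimizer": "placeholder", "loss_type": "placeholder"}
print(missing_keys(example_section))
```

In the notebook, the same check could be applied to `config.get_config_value("cfsearch")` before constructing the search object.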


## Preparing Dataset and Model
In this section, we initialize the dataset, the model, the normalization transformer, the instance factory, and the instance sampler. The target instance itself is loaded in the next section.

In [5]:
# Download the homeloan dataset
load_datasets.download("homeloan")

In [6]:
data = Dataset(config.get_config_value("dataset"), "Loan_Status")
normalization_transformer = Transformer(data, config)
instance_factory = InstanceFactory(data)
sampler = CEInstanceSampler(config, normalization_transformer, instance_factory)

model = SklearnModel(config.get_config_value("model"))

Features verified
Continious features: ['ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term']
Categorical features: ['Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'Property_Area', 'Credit_History']
Dataset preprocessed


  y = column_or_1d(y, warn=True)


 MAD for feature %s is 0, so replacing it with 1.0 to avoid error. Loan_Amount_Term
Feature: Married
Range: [0, 1]
Feature: Property_Area
Range: [0, 2]
Feature: ApplicantIncome
Range: [-0.8484208485011338, 12.130392628461765]
Feature: Dependents
Range: [0, 3]
Feature: Self_Employed
Range: [0, 1]
Feature: Loan_Amount_Term
Range: [-5.044846090672822, 2.106513522957335]
Feature: Credit_History
Range: [0, 1]
Feature: Education
Range: [0, 1]
Feature: CoapplicantIncome
Range: [-0.5480568542195732, 13.372167288446013]
Feature: LoanAmount
Range: [-1.5999485916282457, 6.4030605082645256]
Feature: Gender
Range: [0, 1]
Constraint Type: immutable
Sanity check for model
Model input shape is  11
Sanity check prediciton  [1]


https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
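The log above reports that the MAD (median absolute deviation) of `Loan_Amount_Term` is 0 and is replaced with 1.0. The output does not show how trustce uses this value; in counterfactual search, MAD is commonly used to scale distances on continuous features, where a zero would cause a division error. The sketch below illustrates the computation and the fallback. The helper name and the sample values are illustrative assumptions, not taken from the library or the dataset.

```python
from statistics import median


def mad_with_fallback(values, fallback=1.0):
    """Median absolute deviation, replaced by `fallback` when it is zero."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return mad if mad != 0 else fallback


# Illustrative values: most loan terms are identical, so the MAD collapses to 0.
terms = [360, 360, 360, 180, 360, 360, 480]
print(mad_with_fallback(terms))    # 1.0 (fallback applied)

# Illustrative values with more spread, so the MAD is non-zero.
incomes = [2500, 4616, 3000, 6000, 5000]
print(mad_with_fallback(incomes))  # 1384
```

A feature whose values are identical in more than half of the rows always has a MAD of 0, which is why a fallback is needed for columns such as loan term.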


## Finding Counterfactuals
With everything set up, we'll now search for counterfactuals for our sample instance using the CFsearch object.

In [7]:
# Set the target instance path
target_instance_json = "input_instance/instance.json"

In [8]:
# Create a CFsearch object
config_for_cfsearch = config.get_config_value("cfsearch")
search = CFsearch(normalization_transformer, model, sampler, config,
                  optimizer_name=config_for_cfsearch["optimizer"], 
                  distance_continuous=config_for_cfsearch["continuous_distance"], 
                  distance_categorical=config_for_cfsearch["categorical_distance"], 
                  loss_type=config_for_cfsearch["loss_type"], 
                  coherence=config_for_cfsearch["coherence"],
                  objective_function_weights=config_for_cfsearch["objective_function_weights"])

# Load the target instance from its JSON file
with open(target_instance_json, 'r') as file:
    target_instance_json_content = file.read()

target_instance = instance_factory.create_instance_from_json(target_instance_json_content)

In [9]:
counterfactuals = search.find_counterfactuals(
    target_instance,
    number_cf=1,
    desired_class="opposite",
    maxiterations=50,
)

Label encoder:  Male  for feature  Gender  is transformed into  1
Label encoder:  Yes  for feature  Married  is transformed into  1
Label encoder:  2  for feature  Dependents  is transformed into  2
Label encoder:  Graduate  for feature  Education  is transformed into  0
Label encoder:  No  for feature  Self_Employed  is transformed into  0
Label encoder:  Urban  for feature  Property_Area  is transformed into  2
Label encoder:  1.0  for feature  Credit_History  is transformed into  1
Valid counterfactuals were found:  {'Gender': 1, 'Married': 1, 'Dependents': 2, 'Education': 0, 'Self_Employed': 0, 'Property_Area': 2, 'Credit_History': 1, 'ApplicantIncome': 7.619454021865378, 'CoapplicantIncome': 14.15611744060467, 'LoanAmount': 0.05610433455803467, 'Loan_Amount_Term': 0.5913834794718836}
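The counterfactual values above are in normalized units, so they can be compared directly with the feature ranges printed during setup. The sketch below performs that comparison as a manual plausibility check. The ranges and values are copied from the outputs in this notebook; `out_of_range` is a hypothetical helper, and whether trustce treats these ranges as hard bounds is not stated in the output.

```python
# Normalized ranges of the continuous features, copied from the setup log.
feature_ranges = {
    "ApplicantIncome": (-0.8484208485011338, 12.130392628461765),
    "CoapplicantIncome": (-0.5480568542195732, 13.372167288446013),
    "LoanAmount": (-1.5999485916282457, 6.4030605082645256),
    "Loan_Amount_Term": (-5.044846090672822, 2.106513522957335),
}

# Continuous values of the counterfactual, copied from the search output.
counterfactual = {
    "ApplicantIncome": 7.619454021865378,
    "CoapplicantIncome": 14.15611744060467,
    "LoanAmount": 0.05610433455803467,
    "Loan_Amount_Term": 0.5913834794718836,
}


def out_of_range(instance, ranges):
    """Return the features whose value lies outside its (low, high) range."""
    return {
        name: value
        for name, value in instance.items()
        if name in ranges and not ranges[name][0] <= value <= ranges[name][1]
    }


print(out_of_range(counterfactual, feature_ranges))
# {'CoapplicantIncome': 14.15611744060467}
```

Here `CoapplicantIncome` exceeds the upper end of its printed range, which is worth keeping in mind when judging how realistic the counterfactual is.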


## Evaluation and Visualization
Once the counterfactuals are generated, we evaluate and visualize them to see how they differ from the original instance and to assess their quality.

In [10]:
# Evaluate and visualize the counterfactuals
search.evaluate_counterfactuals(target_instance, counterfactuals)

# Display the counterfactuals and original instance in the notebook
display_df = search.visualize_as_dataframe(target_instance, counterfactuals)

Feature ApplicantIncome changed its value from -0.13149591358318322 to 7.619454021865378
probability_sign: [ 1. -1.], type: <class 'numpy.ndarray'>
required_label: 0, type: <class 'numpy.int64'>
Modified required_label: 0, type: <class 'numpy.int64'>
Feature CoapplicantIncome changed its value from -0.5480568542195732 to 14.15611744060467
probability_sign: [-1.  1.], type: <class 'numpy.ndarray'>
required_label: 0, type: <class 'numpy.int64'>
Modified required_label: 0, type: <class 'numpy.int64'>
Feature LoanAmount changed its value from -0.15222625083722335 to 0.05610433455803467
probability_sign: [ 1. -1.], type: <class 'numpy.ndarray'>
required_label: 0, type: <class 'numpy.int64'>
Modified required_label: 0, type: <class 'numpy.int64'>
Feature Loan_Amount_Term changed its value from 0.2728315707444741 to 0.5913834794718836
probability_sign: [-1.  1.], type: <class 'numpy.ndarray'>
required_label: 0, type: <class 'numpy.int64'>
Modified required_label: 0, type: <class 'numpy.int64'>

| | Gender | Married | Dependents | Education | Self_Employed | Property_Area | Credit_History | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Male | Yes | 2 | Graduate | No | Urban | 1.0 | 4616.0 | 2.273737e-13 | 134.0 | 360.0 |



Counterfactual set (new outcome: [0])


| | Gender | Married | Dependents | Education | Self_Employed | Property_Area | Credit_History | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | - | - | - | - | - | - | - | 52899.6357416902 | 44013.57507344848 | 151.98778843198517 | 380.8467062683135 |
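The evaluation log reports changes in normalized units, while the two tables show original units. Assuming the transformer applies an affine (z-score style) scaling per feature, which the output suggests but does not confirm, two (normalized, original) pairs are enough to recover the mapping. The sketch below does this for `ApplicantIncome` using values copied from the output above; `fit_affine` and `denormalize` are hypothetical helpers, not trustce functions.

```python
def fit_affine(z1, x1, z2, x2):
    """Fit x = offset + scale * z through two (normalized, original) pairs."""
    scale = (x2 - x1) / (z2 - z1)
    offset = x1 - scale * z1
    return scale, offset


def denormalize(z, scale, offset):
    """Map a normalized value back to original units."""
    return offset + scale * z


# ApplicantIncome: original instance and counterfactual, from the output above.
scale, offset = fit_affine(
    -0.13149591358318322, 4616.0,
    7.619454021865378, 52899.6357416902,
)

# Under the z-score assumption, offset is the feature mean and scale its
# standard deviation.
print(f"offset={offset:.1f}, scale={scale:.1f}")
print(denormalize(7.619454021865378, scale, offset))
```

The same two-point fit can be repeated for the other continuous features to sanity-check that the normalized and original values agree.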


## Storing the Results
For reproducibility and further analysis, we'll store the counterfactuals and their evaluations in designated folders.

In [11]:
# Store results
search.store_counterfactuals(config.get_config_value("output_folder"), "homeloan_first_test")
search.store_evaluations(config.get_config_value("output_folder"), "homeloan_first_eval")

Store counterfactuals to  results/homeloan_first_test_0.json
Store counterfactuals evaluation to  results/homeloan_first_eval_eval_0.json
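For later analysis, the stored files can be read back with the standard library. The sketch below collects every JSON file in a folder whose name starts with a given prefix, following the `<name>_<index>.json` pattern seen in the output above. The schema of these files is not shown in this notebook, so the contents are returned as parsed JSON without interpretation; `load_results` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_results(folder, prefix):
    """Load all JSON files in `folder` named `<prefix>_*.json`, keyed by file name."""
    results = {}
    for path in sorted(Path(folder).glob(f"{prefix}_*.json")):
        with path.open("r") as handle:
            results[path.name] = json.load(handle)
    return results


# Hypothetical usage, with the folder and names used in this notebook:
# stored = load_results("results", "homeloan_first_test")
# print(list(stored))
```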
