## Setting Up and Importing Libraries
In this step, we import the required libraries and set the working directory so that the custom `codice` modules can be imported.

In [1]:
# Import necessary libraries
import pandas as pd
import os
import sys
import json


In [2]:
# Move up one directory so that the working directory is the project root
os.chdir(os.path.join(os.getcwd(), '..'))
print(os.getcwd())

/home/rita/TRUST_AI/trustframework/codice
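
Because the `os.chdir` call above is relative to wherever the kernel currently is, re-running that cell climbs one more directory each time. A minimal sketch of an idempotent alternative, assuming the notebook sits one level below the project root; the helper `use_project_root` and the variable `NOTEBOOK_DIR` are hypothetical names, not part of the project:

```python
import os
import sys
from pathlib import Path


def use_project_root(start_dir):
    """Switch to the parent of start_dir and make it importable.

    start_dir is the directory the notebook was launched from. The caller
    captures it once, so re-running the cell cannot climb further up.
    """
    root = Path(start_dir).resolve().parent
    os.chdir(root)
    # Put the root on sys.path only once, so imports resolve from there.
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# Capture the launch directory only on the first run of the cell.
if "NOTEBOOK_DIR" not in globals():
    NOTEBOOK_DIR = Path.cwd()

project_root = use_project_root(NOTEBOOK_DIR)
print(project_root)
```

Running the cell a second time leaves the working directory unchanged, since `NOTEBOOK_DIR` keeps its first value.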


In [3]:
from codice.cfsearch import CFsearch
from codice.dataset import Dataset
from codice.explainable_model import ExplainableModel
from codice.ceinstance.instance_sampler import CEInstanceSampler
from codice.config import Config
from codice.transformer import Transformer
from codice.ceinstance.instance_factory import InstanceFactory
from codice import load_datasets

## Loading Configuration
Here we load the configuration files that control the counterfactual search: a YAML file and a JSON constraints file. They cover dataset details, feature management, and related settings.

In [4]:
# Load configuration
config_file_path = "config/conf_diabetes.yaml"
config = Config(config_file_path)

with open("config/constraints_conf_diabetes.json", 'r') as file:
    constraints = json.load(file)

print("Configuration Loaded:")
print(config)

Configuration Loaded:
<codice.config.Config object at 0x7f42dd4be2c0>

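`print(config)` shows only the default object representation, so it does not confirm what was loaded. The sketch below uses a hypothetical stand-in class, `DemoConfig`, to show how a key-lookup wrapper with a readable `__repr__` might behave. The real `codice.config.Config` may be implemented differently, and the keys and values here are illustrative placeholders rather than the contents of `conf_diabetes.yaml`.

```python
import json


class DemoConfig:
    """Hypothetical stand-in for a config wrapper (not codice.config.Config)."""

    def __init__(self, values):
        self._values = dict(values)

    def get_config_value(self, key):
        # Fail loudly, listing the known keys, when a key is missing.
        if key not in self._values:
            known = ", ".join(sorted(self._values))
            raise KeyError(f"unknown config key {key!r}; known keys: {known}")
        return self._values[key]

    def __repr__(self):
        # Pretty-print the contents instead of the default object address.
        body = json.dumps(self._values, indent=2, sort_keys=True)
        return f"DemoConfig({body})"


# Illustrative placeholder values only.
demo = DemoConfig({"output_folder": "results", "cfsearch": {"optimizer": "demo"}})
print(demo)
print(demo.get_config_value("output_folder"))
```

With the real object, printing individual entries such as `config.get_config_value("cfsearch")` is a quick way to check that the expected settings were read.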

## Preparing Dataset and Model
In this section, we download the dataset and initialize the dataset, transformer, instance factory, sampler, and model objects. We also set the path to the sample instance for which we want to find counterfactuals.

In [5]:
# Set the target instance path
target_instance_json = "input_instance/instance_diabetes.json"

# Download the diabetes dataset
load_datasets.download("diabetes")

In [6]:
data = Dataset(config.get_config_value("dataset"), "Class variable")
normalization_transformer = Transformer(data, config)
instance_factory = InstanceFactory(data)
sampler = CEInstanceSampler(config, normalization_transformer, instance_factory)

model = ExplainableModel(config.get_config_value("model"))

Features verified
Continious features: ['Number of times pregnant', 'Plasma glucose concentration a 2 hours in an oral glucose tolerance test', 'Diastolic blood pressure (mm Hg)', 'Triceps skin fold thickness (mm)', '2-Hour serum insulin (mu U/ml)', 'Body mass index (weight in kg/(height in m)^2)', 'Diabetes pedigree function', 'Age (years)']
Categorical features: []
Dataset preprocessed
Feature: Triceps skin fold thickness (mm)
Range: [-1.2873732599334597, 4.918660451660556]
Feature: Body mass index (weight in kg/(height in m)^2)
Range: [-4.057829473903507, 4.452905629565911]
Feature: Diastolic blood pressure (mm Hg)
Range: [-3.5702705725858896, 2.732747375693038]
Feature: Age (years)
Range: [-1.0408711235255357, 4.061069241548041]
Feature: Diabetes pedigree function
Range: [-1.1887784755437325, 5.879733072364472]
Feature: 2-Hour serum insulin (mu U/ml)
Range: [-0.6924393247241302, 6.648506691892384]
Feature: Plasma glucose concentration a 2 hours in an oral glucose tolerance test
Ran
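
The printed ranges are on a normalized scale. They look consistent with z-score standardization (subtract the mean, divide by the standard deviation), although the source does not state which scheme `Transformer` uses. A minimal sketch under that assumption, with hypothetical helper names and made-up sample values:

```python
import statistics


def fit_zscore(values):
    """Return (mean, sample standard deviation) for one feature column."""
    return statistics.mean(values), statistics.stdev(values)


def normalize(value, mean, std):
    """Map a raw value onto the standardized scale."""
    return (value - mean) / std


def denormalize(z, mean, std):
    """Map a standardized value back to original units."""
    return z * std + mean


# Made-up sample values for a single feature, for illustration only.
column = [20.0, 25.0, 30.0, 35.0, 40.0]
mean, std = fit_zscore(column)

z_values = [normalize(v, mean, std) for v in column]
print("normalized range:", [min(z_values), max(z_values)])

# Round trip: denormalizing recovers the original value.
restored = denormalize(normalize(35.0, mean, std), mean, std)
print(restored)
```

The inverse mapping matters later: a counterfactual found on the normalized scale has to be converted back before it can be read in original units.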



## Finding Counterfactuals
With everything set up, we'll now search for counterfactuals for our sample instance using the CFsearch object.

In [7]:
# Create a CFsearch object
config_for_cfsearch = config.get_config_value("cfsearch")
search = CFsearch(normalization_transformer, model, sampler, config,
                  optimizer_name=config_for_cfsearch["optimizer"], 
                  distance_continuous=config_for_cfsearch["continuous_distance"], 
                  distance_categorical=config_for_cfsearch["categorical_distance"], 
                  loss_type=config_for_cfsearch["loss_type"], 
                  coherence=config_for_cfsearch["coherence"],
                  objective_function_weights=config_for_cfsearch["objective_function_weights"])

# Load the target instance from its JSON file
with open(target_instance_json, 'r') as file:
    target_instance_json_content = file.read()

target_instance = instance_factory.create_instance_from_json(target_instance_json_content)

In [8]:
counterfactuals = search.find_counterfactuals(target_instance, number_cf=1, desired_class="opposite", maxiterations=50)
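
To make the idea concrete: a counterfactual is a nearby instance that the model assigns to a different class. The toy sketch below runs a random search against a hand-written linear classifier. It is not the algorithm inside `CFsearch`, and `toy_predict`, `toy_find_counterfactual`, and all of their parameters are hypothetical.

```python
import random


def toy_predict(x, weights=(1.0, 1.0), bias=0.0):
    """Hand-written linear classifier: class 1 if w.x + b > 0, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def l1_distance(a, b):
    """Sum of absolute per-feature differences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def toy_find_counterfactual(x, n_samples=500, radius=2.0, seed=0):
    """Return the closest sampled point whose class differs from x's, or None."""
    rng = random.Random(seed)
    original_class = toy_predict(x)
    best, best_dist = None, float("inf")
    for _ in range(n_samples):
        # Perturb each feature uniformly within [-radius, radius].
        candidate = [xi + rng.uniform(-radius, radius) for xi in x]
        if toy_predict(candidate) == original_class:
            continue  # prediction did not flip, so not a counterfactual
        dist = l1_distance(x, candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


instance = [0.5, 0.5]
cf = toy_find_counterfactual(instance)
print("original class:", toy_predict(instance))
print("counterfactual:", cf, "class:", toy_predict(cf))
```

The sketch trades off only two things, a flipped prediction and a small distance; the `CFsearch` arguments above (distance functions, loss type, coherence, objective weights) suggest the real search balances more terms.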

## Evaluation and Visualization
Once the counterfactuals are generated, we evaluate and visualize them to see how they differ from the original instance and to assess their quality. In the output below, the log lines report feature values on the normalized scale, while the tables show the same instances in original units.

In [9]:
# Evaluate and visualize the counterfactuals
search.evaluate_counterfactuals(target_instance, counterfactuals)

# Display the counterfactuals and original instance in the notebook
display_df = search.visualize_as_dataframe(target_instance, counterfactuals)
display(display_df)

Feature Number of times pregnant changed its value from -1.1411078811017759 to -1.5250032433334426
Feature Plasma glucose concentration a 2 hours in an oral glucose tolerance test changed its value from 0.5037269282016452 to -0.01829821323868308
Feature Diastolic blood pressure (mm Hg) changed its value from -1.5037073108550938 to -0.6374735357298377
Feature Triceps skin fold thickness (mm) changed its value from 0.9066790623472528 to 2.0482357783581544
Feature 2-Hour serum insulin (mu U/ml) changed its value from 0.7653371892139008 to 1.033344751472744
Feature Body mass index (weight in kg/(height in m)^2) changed its value from 1.4088275001580721 to -1.81636552293408
Feature Diabetes pedigree function changed its value from 5.481337032943515 to -0.05373047482266624
Feature Age (years) changed its value from -0.020483050510820364 to 1.4243972663735833
CF instance:  {'Number of times pregnant': -1.5250032433334426, 'Plasma glucose concentration a 2 hours in an oral glucose tolerance te

| | Number of times pregnant | Plasma glucose concentration a 2 hours in an oral glucose tolerance test | Diastolic blood pressure (mm Hg) | Triceps skin fold thickness (mm) | 2-Hour serum insulin (mu U/ml) | Body mass index (weight in kg/(height in m)^2) | Diabetes pedigree function | Age (years) |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 137.0 | 40.0 | 35.0 | 168.0 | 43.1 | 2.288 | 33.0 |



Counterfactual set (new outcome: 0)


| | Number of times pregnant | Plasma glucose concentration a 2 hours in an oral glucose tolerance test | Diastolic blood pressure (mm Hg) | Triceps skin fold thickness (mm) | 2-Hour serum insulin (mu U/ml) | Body mass index (weight in kg/(height in m)^2) | Diabetes pedigree function | Age (years) |
|---|---|---|---|---|---|---|---|---|
| 0 | -1.2935653909476603 | 120.3094894644664 | 56.766653916024126 | 53.210361099706574 | 198.88626413513447 | 17.67206114178513 | 0.45407385935097 | 49.99212707512977 |


None
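
Two simple quantities for judging a counterfactual are how many features changed and how far each one moved. The sketch below computes both from the two tables above, with the counterfactual values rounded for readability. It is an illustration rather than a description of what `evaluate_counterfactuals` reports, and `summarize_changes` is a hypothetical helper.

```python
def summarize_changes(original, counterfactual, tol=1e-6):
    """Return per-feature deltas for features that moved by more than tol."""
    deltas = {}
    for name, before in original.items():
        after = counterfactual[name]
        if abs(after - before) > tol:
            deltas[name] = after - before
    return deltas


GLUCOSE = "Plasma glucose concentration a 2 hours in an oral glucose tolerance test"

# Original instance, in original units, copied from the first table.
original = {
    "Number of times pregnant": 0.0,
    GLUCOSE: 137.0,
    "Diastolic blood pressure (mm Hg)": 40.0,
    "Triceps skin fold thickness (mm)": 35.0,
    "2-Hour serum insulin (mu U/ml)": 168.0,
    "Body mass index (weight in kg/(height in m)^2)": 43.1,
    "Diabetes pedigree function": 2.288,
    "Age (years)": 33.0,
}

# Counterfactual from the second table, rounded to two or three decimals.
counterfactual = {
    "Number of times pregnant": -1.29,
    GLUCOSE: 120.31,
    "Diastolic blood pressure (mm Hg)": 56.77,
    "Triceps skin fold thickness (mm)": 53.21,
    "2-Hour serum insulin (mu U/ml)": 198.89,
    "Body mass index (weight in kg/(height in m)^2)": 17.67,
    "Diabetes pedigree function": 0.454,
    "Age (years)": 49.99,
}

deltas = summarize_changes(original, counterfactual)
print(f"{len(deltas)} of {len(original)} features changed")
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

Deltas in original units are not comparable across features (insulin and the pedigree function live on very different scales), which is one reason distances are usually computed on normalized values.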

## Storing the Results
For reproducibility and further analysis, we store the counterfactuals and their evaluations in the output folder named in the configuration.

In [10]:
# Store results
search.store_counterfactuals(config.get_config_value("output_folder"), "diabetes_first_test")
search.store_evaluations(config.get_config_value("output_folder"), "diabetes_first_test")
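
The source does not show the file layout these calls produce. As a rough sketch of the general pattern (write each result to a named file inside an output folder), the snippet below saves a record as JSON. `store_json`, the temporary folder, and the record contents are hypothetical and unrelated to the files `store_counterfactuals` actually writes.

```python
import json
import tempfile
from pathlib import Path


def store_json(record, output_folder, name):
    """Write record to <output_folder>/<name>.json and return the path."""
    folder = Path(output_folder)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    path = folder / f"{name}.json"
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2)
    return path


# Hypothetical record; a temporary folder stands in for the output folder.
record = {"desired_class": "opposite", "number_cf": 1}
with tempfile.TemporaryDirectory() as tmp:
    saved = store_json(record, Path(tmp) / "results", "diabetes_first_test")
    with open(saved, encoding="utf-8") as handle:
        reloaded = json.load(handle)
    print(saved.name, reloaded == record)
```

Reading the file back, as done here, is a cheap check that what was stored can be reloaded for later analysis.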