## GLANCE: Global Actions In A Nutshell for Counterfactual Explainability

**GLANCE** is a versatile and adaptive framework for generating *global counterfactual explanations*. <br>
These explanations are expressed as actions that offer recourse to large population subgroups.<br> The framework aims to provide explanations and insights, ensuring that the actions benefit as many individuals as possible.

## Preliminaries

### Import Dependencies 
As usual in Python, the first step is to import all necessary packages.



In [1]:
from xgboost import XGBClassifier
import pandas as pd
from glance.glance.glance import GLANCE
from utils import load_models, preprocess_datasets

## Load Data and Model to be used for explanations
This model serves as the demonstration model: we treat it as a black box and apply our algorithm to it.
Of course, any model can be used in its place.



In [2]:
dataset = "compas"
model_name = "xgb"

train_dataset, data, X_train, y_train, X_test, y_test, affected, _unaffected, model, feat_to_vary, target_name = (
    preprocess_datasets(dataset, load_models(dataset, model_name), model_name)
)

Accuracy: 0.68


## GLANCE 
GLANCE is a clustering-based algorithm designed to generate global counterfactual explanations. <br>It starts by forming initial clusters and gradually merges them until the number of clusters matches the user-defined `final_clusters` parameter.<br> From each of these final clusters, the best action is selected; together, these actions form the global explanation.
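
The merging step described above can be pictured with a small sketch. It assumes that each round merges the two clusters whose centroids are closest in Euclidean distance. The merge criterion GLANCE actually uses is not stated here, so `merge_clusters` is a hypothetical helper for illustration, not the library's implementation.

```python
import math
from itertools import combinations


def merge_clusters(clusters, final_clusters):
    """Greedily merge the two closest clusters until `final_clusters` remain.

    Each cluster is a list of points (tuples of floats). Closeness is the
    Euclidean distance between centroids, which is an assumption made for
    this sketch and not necessarily the criterion GLANCE uses.
    """
    clusters = [list(c) for c in clusters]  # copy so the input is not mutated

    def centroid(points):
        dim = len(points[0])
        return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

    while len(clusters) > final_clusters:
        # Find the pair of clusters whose centroids are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: math.dist(
                centroid(clusters[ij[0]]), centroid(clusters[ij[1]])
            ),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy data: five singleton clusters in 2D, reduced to three.
initial = [[(0.0, 0.0)], [(0.1, 0.0)], [(5.0, 5.0)], [(5.2, 5.1)], [(10.0, 0.0)]]
merged = merge_clusters(initial, final_clusters=3)
print([len(c) for c in merged])
```

In this toy run the two tight pairs are merged first and the isolated point stays on its own, which mirrors how a large number of initial clusters collapses into a few final ones.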

The GLANCE framework is initialized with:
 - the model to be explained,
 - the number of initial clusters,
 - the number of final clusters, from each of which the best action is extracted,
 - the number of local counterfactuals that the local counterfactual method generates for each centroid of the initial clusters.

The GLANCE algorithm lets users specify the number of global actions to generate and serves as a tool for explaining and debugging ML models.

In [3]:
global_method = GLANCE(
    model,
    initial_clusters=100,
    final_clusters=3,
    num_local_counterfactuals=10,
)
global_method.fit(
    data.drop(columns=["Status"]),
    data["Status"],
    train_dataset,
    feat_to_vary,
)

<glance.glance.glance.GLANCE at 0x3035df970>

In [4]:
clusters, clusters_res = global_method.explain_group(affected)

Action 1
Age_Cat = Greater than 45

Effectiveness: 99.42%	Cost: 1.00


Action 2
Race = Hispanic
Priors_Count -24.2

Effectiveness: 98.53%	Cost: 7.28


Action 3
Race = Asian
Priors_Count -7.5

Effectiveness: 100.00%	Cost: 2.87


TOTAL EFFECTIVENESS: 99.60%
TOTAL COST: 1.94





## GLANCE Output
GLANCE generates a set of final actions, with a focus on their overall impact when applied to the entire affected population. While each action is initially associated with a specific cluster, the key metrics we prioritize are the *Total Effectiveness* and *Total Cost* across the whole population.

- *Total Effectiveness* is the percentage of individuals in the affected population who achieve the favorable outcome when the full set of final actions is applied to the whole affected population.
- *Total Cost* is the mean recourse cost incurred by the full set of final actions over the entire affected population.

For each generated action, the output also reports the suggested feature changes, together with the *effectiveness* and *cost* the action achieves on the cluster it was extracted from. More specifically:

- *Effectiveness*, for each cluster-action pair ($C$, $a$), is the percentage of individuals in $C$ who obtain the favorable outcome when the action $a$ is applied.
- *Cost*, for each cluster-action pair ($C$, $a$), is the mean recourse cost incurred when the action $a$ is applied to the individuals of cluster $C$.
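
To make these metrics concrete, here is a minimal sketch under stated assumptions: individuals are dictionaries of numeric features, an action is a dictionary of additive feature changes, the cost of an action is the sum of its absolute changes, and, for the totals, each individual uses the cheapest action that flips its prediction. None of these conventions are confirmed by the text above, and all function names are hypothetical.

```python
def apply_action(individual, action):
    """Return a copy of `individual` with the action's feature changes added."""
    changed = dict(individual)
    for feature, delta in action.items():
        changed[feature] += delta
    return changed


def action_cost(action):
    """Hypothetical cost: the sum of absolute feature changes."""
    return sum(abs(delta) for delta in action.values())


def cluster_effectiveness(predict, cluster, action):
    """Percentage of individuals in `cluster` flipped to the favorable class (1)."""
    flipped = sum(predict(apply_action(x, action)) == 1 for x in cluster)
    return 100.0 * flipped / len(cluster)


def total_effectiveness_and_cost(predict, population, actions):
    """Totals, assuming each individual uses its cheapest successful action."""
    costs = []
    for x in population:
        valid = [action_cost(a) for a in actions if predict(apply_action(x, a)) == 1]
        if valid:
            costs.append(min(valid))
    effectiveness = 100.0 * len(costs) / len(population)
    mean_cost = sum(costs) / len(costs) if costs else float("nan")
    return effectiveness, mean_cost


def predict(x):
    """Toy black box: favorable outcome when there are at most 2 priors."""
    return 1 if x["priors"] <= 2 else 0


population = [{"priors": 3}, {"priors": 5}, {"priors": 9}]
actions = [{"priors": -1}, {"priors": -4}]

total_eff, total_cost = total_effectiveness_and_cost(predict, population, actions)
print(f"Total effectiveness: {total_eff:.2f}%  Total cost: {total_cost:.2f}")
```

Because each action here has the same cost for every individual, the per-cluster mean cost of an action equals `action_cost(action)`. With categorical features the cost would generally differ between individuals and would need to be averaged.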

## GLANCE Modularity
Our framework is highly **modular**, allowing users to customize various aspects of it. <br>

Specifically:
- **Choice of Local Counterfactual Methods**: Users can select from a variety of local counterfactual methods to generate candidate counterfactual explanations, such as:
    - **NearestNeighbors**: When queried for *k* counterfactuals for an affected individual, it retrieves the *k* unaffected instances that are closest to that individual.
    - **Random Sampling**: To find counterfactuals for an affected instance, this method iteratively modifies its features. The process begins by randomly altering one feature at a time, generating multiple new candidate instances.

- **Strategy for Selecting Actions**: Additionally, users can choose different strategies for selecting the best actions from the generated counterfactuals. This enables fine-tuning of the process, allowing for the optimal balance between effectiveness and recourse cost, based on user-defined preferences.
    - **max-eff**: Selects the action that maximizes effectiveness.
    - **low-cost**: Selects the action with the lowest cost that flips a sufficient number of instances.
    - **mean-act**: Selects the mean action from a set of candidate actions.
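
The three selection strategies listed above can be sketched as follows. This is an illustration under assumptions, not the library's code: candidates are dictionaries with hypothetical `effectiveness` and `cost` fields, "sufficient" is modeled as a user-chosen `min_effectiveness` threshold, and the mean action is a feature-wise average of numeric changes. The feature names come from the COMPAS example; the numbers are made up.

```python
def select_max_eff(candidates):
    """Pick the candidate with the highest effectiveness; ties go to lower cost."""
    return max(candidates, key=lambda c: (c["effectiveness"], -c["cost"]))


def select_low_cost(candidates, min_effectiveness):
    """Pick the cheapest candidate whose effectiveness reaches the threshold."""
    eligible = [c for c in candidates if c["effectiveness"] >= min_effectiveness]
    if not eligible:
        return None  # no candidate flips enough instances
    return min(eligible, key=lambda c: c["cost"])


def mean_action(actions):
    """Average numeric feature changes; a feature missing from an action counts as 0."""
    features = sorted({f for a in actions for f in a})
    return {f: sum(a.get(f, 0.0) for a in actions) / len(actions) for f in features}


# Hypothetical candidates for one cluster.
candidates = [
    {"action": {"Priors_Count": -3.0}, "effectiveness": 0.80, "cost": 1.0},
    {"action": {"Priors_Count": -7.0}, "effectiveness": 0.95, "cost": 3.0},
    {"action": {"Priors_Count": -5.0, "Time_Served": -1.0}, "effectiveness": 0.95, "cost": 2.5},
]

print(select_max_eff(candidates)["action"])
print(select_low_cost(candidates, min_effectiveness=0.75)["action"])
print(mean_action([c["action"] for c in candidates]))
```

The sketch shows the trade-off: maximizing effectiveness tends to pick larger changes, while the low-cost rule accepts a less effective action as long as it clears the threshold.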

To use them, pass the **cf_generator** and **cluster_action_choice_algo** arguments to the **fit** method, choosing the methods you prefer.

In [5]:
global_method = GLANCE(
    model, initial_clusters=100, final_clusters=3, num_local_counterfactuals=10
)
global_method.fit(
    data.drop(columns=["Status"]),
    data["Status"],
    train_dataset,
    feat_to_vary,
    cf_generator="NearestNeighbors",
)

<glance.glance.glance.GLANCE at 0x3035ec190>
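
The idea behind the `NearestNeighbors` generator chosen above can be sketched in a few lines. The sketch assumes purely numeric features and Euclidean distance; how the library encodes categorical features or measures proximity is not stated here, and `nearest_unaffected` and `to_actions` are hypothetical helper names.

```python
import math


def nearest_unaffected(individual, unaffected, k):
    """Return the k unaffected points closest to `individual` (Euclidean distance)."""
    return sorted(unaffected, key=lambda u: math.dist(individual, u))[:k]


def to_actions(individual, counterfactuals):
    """Express each counterfactual as per-feature changes from the individual."""
    return [
        tuple(c - x for x, c in zip(individual, cf)) for cf in counterfactuals
    ]


# Toy data: two numeric features per instance.
affected_individual = (4.0, 10.0)
unaffected_pool = [(1.0, 10.0), (4.0, 2.0), (3.0, 9.0), (0.0, 0.0)]

neighbors = nearest_unaffected(affected_individual, unaffected_pool, k=2)
candidate_actions = to_actions(affected_individual, neighbors)
print(neighbors)
print(candidate_actions)
```

Each returned neighbor already has the favorable outcome, so the difference between it and the affected individual is a candidate action.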

In [6]:
clusters, clusters_res = global_method.explain_group(affected)

Action 1
Race = Hispanic
Priors_Count -25.0
Time_Served +20.0

Effectiveness: 100.00%	Cost: 7.76


Action 2
Age_Cat = Greater than 45
Race = Hispanic
Priors_Count -13.0
Time_Served -1.0

Effectiveness: 100.00%	Cost: 5.37


Action 3
Age_Cat = Greater than 45
Priors_Count -3.0

Effectiveness: 100.00%	Cost: 1.75


TOTAL EFFECTIVENESS: 100.00%
TOTAL COST: 2.48



