This notebook shows how to use the CARLA library.

# How to use CARLA


In [None]:
from IPython.display import display

import warnings
warnings.filterwarnings('ignore')

## Data

Before we can do anything else, we need some data.
You could import one of the datasets in our [OnlineCatalog](https://carla-counterfactual-and-recourse-library.readthedocs.io/en/latest/data.html#module-data.catalog.online_catalog),
but if you would rather use your own data, you can do so with the [CsvCatalog](https://carla-counterfactual-and-recourse-library.readthedocs.io/en/latest/data.html#module-data.catalog.csv_catalog).

The CsvCatalog takes five attributes.
*file_path* is the path to the CSV file you want to use.
Features come in two types, *continuous* and *categorical*, and some features of either type can be marked as *immutables*.
Finally, *target* is the column that contains the targets/labels.
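
If your CSV has many columns, typing the feature lists by hand gets tedious. The following is a minimal sketch, assuming that numeric columns should be treated as continuous and all other columns as categorical. The toy values are made up for illustration, and the split is done with plain pandas; it is not part of CARLA. Numerically encoded categories would be misclassified by this heuristic, so check the result before passing it to the CsvCatalog.

```python
import pandas as pd

# Toy frame standing in for a user's CSV. The column names are borrowed from
# the example in this notebook; the values are made up for illustration.
raw = pd.DataFrame({
    "age": [39, 50, 38],
    "hours-per-week": [40, 13, 40],
    "sex": ["Male", "Male", "Female"],
    "workclass": ["State-gov", "Private", "Private"],
    "income": [0, 0, 1],
})

target = "income"
features = raw.drop(columns=[target])

# Assumption: numeric columns are continuous, everything else is categorical.
continuous = features.select_dtypes(include="number").columns.tolist()
categorical = features.select_dtypes(exclude="number").columns.tolist()

# Immutability is a modelling decision, so these are still listed by hand.
immutable = ["age", "sex"]
assert set(immutable) <= set(continuous) | set(categorical)

print(continuous)
print(categorical)
```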


In [2]:
from carla.data.catalog import CsvCatalog

continuous = ["age", "fnlwgt", "education-num", "capital-gain", "hours-per-week", "capital-loss"]
categorical = ["marital-status", "native-country", "occupation", "race", "relationship", "sex", "workclass"]
immutable = ["age", "sex"]

dataset = CsvCatalog(file_path="adult.csv",
                     continuous=continuous,
                     categorical=categorical,
                     immutables=immutable,
                     target='income')



display(dataset.df)

Unnamed: 0,age,fnlwgt,education-num,capital-gain,capital-loss,...,occupation_Other,race_White,relationship_Non-Husband,sex_Male,workclass_Private
0,0.301370,0.044131,0.800000,0.021740,0.0,...,0.0,1.0,1.0,1.0,0.0
1,0.452055,0.048052,0.800000,0.000000,0.0,...,0.0,1.0,0.0,1.0,0.0
2,0.287671,0.137581,0.533333,0.000000,0.0,...,1.0,1.0,1.0,1.0,1.0
3,0.493151,0.150486,0.400000,0.000000,0.0,...,1.0,0.0,0.0,1.0,1.0
4,0.150685,0.220635,0.800000,0.000000,0.0,...,0.0,0.0,1.0,0.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...
48827,0.301370,0.137428,0.800000,0.000000,0.0,...,0.0,1.0,1.0,0.0,1.0
48828,0.643836,0.209130,0.533333,0.000000,0.0,...,0.0,0.0,1.0,1.0,1.0
48829,0.287671,0.245379,0.800000,0.000000,0.0,...,0.0,1.0,0.0,1.0,1.0
48830,0.369863,0.048444,0.800000,0.054551,0.0,...,0.0,0.0,1.0,1.0,1.0


## Model

Now that the data is loaded, we also need a classification model.
You could define your own [model](https://carla-counterfactual-and-recourse-library.readthedocs.io/en/latest/examples.html#black-box-model),
but here we show how to train one of our [catalog](https://carla-counterfactual-and-recourse-library.readthedocs.io/en/latest/mlmodel.html#module-models.catalog.catalog) models.
Note that, depending on your data, you might need to tweak the training hyperparameters.

For example, for the ANN used here we need to define the *learning rate*, the *number of epochs*, the *batch size*, and the *sizes of the hidden layers*.
After defining the model with *MLModelCatalog*, call its *train* method with those parameters and you are good to go!
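
To get a feel for what the *sizes of the hidden layers* control, here is a small sketch that lists the weight-matrix shapes of a fully connected network and counts its parameters. It is pure Python and does not use CARLA. The input width of 13 and the output width of 2 are hypothetical values chosen for illustration; the actual widths depend on your encoded data and on how the catalog model is built.

```python
def layer_shapes(n_inputs, hidden_size, n_outputs):
    """Return (fan_in, fan_out) for each dense layer of a plain MLP."""
    dims = [n_inputs, *hidden_size, n_outputs]
    return list(zip(dims[:-1], dims[1:]))


def n_parameters(shapes):
    """Count the weights plus biases over all dense layers."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)


# hidden_size matches the notebook; n_inputs and n_outputs are assumptions.
shapes = layer_shapes(n_inputs=13, hidden_size=[18, 9, 3], n_outputs=2)
total = n_parameters(shapes)

print(shapes)  # [(13, 18), (18, 9), (9, 3), (3, 2)]
print(total)   # 461
```

Under these assumptions the network is tiny, so a large batch size and a handful of epochs train quickly.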

In [3]:
from carla.models.catalog import MLModelCatalog

In [4]:
training_params = {"lr": 0.002, "epochs": 10, "batch_size": 1024, "hidden_size": [18, 9, 3]}

ml_model = MLModelCatalog(
    dataset, model_type="ann", load_online=False, backend="pytorch"
)
ml_model.train(
    learning_rate=training_params["lr"],
    epochs=training_params["epochs"],
    batch_size=training_params["batch_size"],
    hidden_size=training_params["hidden_size"]
)

Loaded model from /home/johan/carla/models/custom/ann_layers_18_9_3.pt
test accuracy for model: 0.8352719528178244


## Recourse

Now that we have both the data and a model, we can start using CARLA to generate counterfactuals.
You can pick a [recourse method](https://carla-counterfactual-and-recourse-library.readthedocs.io/en/latest/recourse.html) from the catalog, or implement one yourself.
In the following example, we select samples that the model predicts as negative; these are the samples for which we want counterfactuals.
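
Conceptually, selecting negative instances amounts to filtering the rows that the model assigns to the negative class. The sketch below illustrates that idea with plain pandas and NumPy. `select_negative` and `toy_predict_proba` are hypothetical helpers, and the 0.5 threshold is an assumption; CARLA's own `predict_negative_instances` may differ in detail.

```python
import numpy as np
import pandas as pd


def select_negative(df, predict_proba, threshold=0.5):
    """Keep rows whose predicted positive-class probability is below threshold."""
    scores = np.asarray(predict_proba(df))
    return df[scores < threshold]


def toy_predict_proba(df):
    # Hypothetical stand-in for a trained classifier: the mean of both features.
    return df[["x1", "x2"]].mean(axis=1).to_numpy()


toy = pd.DataFrame({"x1": [0.1, 0.9, 0.4, 0.7], "x2": [0.2, 0.8, 0.3, 0.9]})
negatives = select_negative(toy, toy_predict_proba)

print(negatives.index.tolist())  # [0, 2]
```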

In [5]:
from carla.models.negative_instances import predict_negative_instances
import carla.recourse_methods.catalog as recourse_catalog

In [6]:
factuals = predict_negative_instances(ml_model, dataset.df)
test_factual = factuals.iloc[:5]

display(test_factual)

Unnamed: 0,age,fnlwgt,education-num,capital-gain,capital-loss,...,occupation_Other,race_White,relationship_Non-Husband,sex_Male,workclass_Private
0,0.30137,0.044131,0.8,0.02174,0.0,...,0.0,1.0,1.0,1.0,0.0
1,0.452055,0.048052,0.8,0.0,0.0,...,0.0,1.0,0.0,1.0,0.0
2,0.287671,0.137581,0.533333,0.0,0.0,...,1.0,1.0,1.0,1.0,1.0
3,0.493151,0.150486,0.4,0.0,0.0,...,1.0,0.0,0.0,1.0,1.0
4,0.150685,0.220635,0.8,0.0,0.0,...,0.0,0.0,1.0,0.0,1.0


### Wachter (gradient method)

In [7]:
hyperparams = {"loss_type": "BCE", "binary_cat_features": False}
recourse_method = recourse_catalog.Wachter(ml_model, hyperparams)
df_cfs = recourse_method.get_counterfactuals(test_factual)

display(df_cfs)

[INFO] Counterfactual Explanation Found [wachter.py wachter_recourse]
[INFO] Counterfactual Explanation Found [wachter.py wachter_recourse]
[INFO] Counterfactual Explanation Found [wachter.py wachter_recourse]
[INFO] Counterfactual Explanation Found [wachter.py wachter_recourse]
[INFO] Counterfactual Explanation Found [wachter.py wachter_recourse]


Unnamed: 0,age,capital-gain,capital-loss,education-num,fnlwgt,...,race_White,relationship_Non-Husband,sex_Male,workclass_Private,income
0,0.367432,0.089006,0.066957,0.867384,0.112537,...,1.0,1.0,0.0,-0.066813,1.0
1,0.481818,0.029809,0.029803,0.829806,0.077834,...,1.0,0.0,1.0,0.026993,1.0
2,0.443121,0.155943,0.155183,0.689581,0.18522,...,0.0,1.0,0.0,0.847155,1.0
3,0.557433,0.066598,0.06698,0.467272,0.217566,...,0.0,0.0,1.0,0.935313,1.0
4,0.293216,0.14294,0.141745,0.943278,0.328395,...,0.0,1.0,0.0,0.860906,1.0
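
As a rough illustration of what a gradient method does, the sketch below runs gradient descent on a toy logistic model: it pushes the input towards the positive class while penalising the squared distance from the original point. This is not CARLA's implementation. The weights, the penalty weight `lam`, the learning rate, and the squared-distance penalty are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical linear classifier; the weights are made up for illustration.
w = np.array([2.0, -1.0])
b = -1.0


def predict_proba(x):
    return sigmoid(w @ x + b)


def gradient_counterfactual(x, lam=0.1, lr=0.1, max_iter=1000):
    """Move x towards the positive class while penalising distance from x.

    Returns the last iterate; the search may fail to cross the decision
    boundary within max_iter steps, so check the prediction afterwards.
    """
    x_cf = x.copy()
    for _ in range(max_iter):
        p = predict_proba(x_cf)
        if p > 0.5:
            break
        # Gradient of  -log(p) + lam * ||x_cf - x||^2  with respect to x_cf.
        grad = -(1.0 - p) * w + 2.0 * lam * (x_cf - x)
        x_cf = x_cf - lr * grad
    return x_cf


factual = np.array([0.2, 0.5])
counterfactual = gradient_counterfactual(factual)

print(predict_proba(factual) < 0.5)         # True
print(predict_proba(counterfactual) > 0.5)  # True
```

Because nothing constrains the search to valid feature values, a plain gradient search like this can return fractional values for features that are binary in the data.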


### CCHVAE (manifold method)

In [8]:
hyperparams = {
    "data_name": dataset.name,
    "n_search_samples": 100,
    "p_norm": 1,
    "step": 0.1,
    "max_iter": 1000,
    "clamp": True,
    "binary_cat_features": False,
    "vae_params": {
        "layers": [len(ml_model.feature_input_order), 512, 256, 8],
        "train": True,
        "lambda_reg": 1e-6,
        "epochs": 5,
        "lr": 1e-3,
        "batch_size": 32,
    },
}

cchvae = recourse_catalog.CCHVAE(ml_model, hyperparams)
df_cfs = cchvae.get_counterfactuals(test_factual)

display(df_cfs)

[INFO] Start training of Variational Autoencoder... [models.py fit]
[INFO] [Epoch: 0/5] [objective: 0.381] [models.py fit]
[INFO] [ELBO train: 0.38] [models.py fit]
[INFO] [ELBO train: 0.14] [models.py fit]
[INFO] [ELBO train: 0.12] [models.py fit]
[INFO] [ELBO train: 0.12] [models.py fit]
[INFO] [ELBO train: 0.12] [models.py fit]
[INFO] ... finished training of Variational Autoencoder. [models.py fit]


Unnamed: 0,age,capital-gain,capital-loss,education-num,fnlwgt,...,race_White,relationship_Non-Husband,sex_Male,workclass_Private,income
0,0.296436,0.036346,0.039863,0.611104,0.119577,...,1.0,0.0,1.0,0.735209,1.0
1,0.29644,0.036346,0.039861,0.611104,0.119578,...,1.0,0.0,1.0,0.735207,1.0
2,0.296438,0.036347,0.039863,0.611091,0.119577,...,1.0,0.0,1.0,0.735207,1.0
3,0.296442,0.036346,0.039863,0.611091,0.119577,...,1.0,0.0,1.0,0.735208,1.0
4,0.296436,0.036347,0.039862,0.611106,0.119577,...,1.0,0.0,1.0,0.735207,1.0


### FOCUS (tree method)

Tree methods require a tree-based model, so we replace the ANN with an XGBoost model.

In [9]:
from carla.recourse_methods.catalog.focus.tree_model import ForestModel, XGBoostModel
ml_model = XGBoostModel(dataset)

factuals = predict_negative_instances(ml_model, dataset.df)
test_factual = factuals.iloc[:5]

display(test_factual)

[0]	validation_0-logloss:0.58430	validation_1-logloss:0.58264
[1]	validation_0-logloss:0.52405	validation_1-logloss:0.52231
[2]	validation_0-logloss:0.48641	validation_1-logloss:0.48288
[3]	validation_0-logloss:0.46130	validation_1-logloss:0.45807
[4]	validation_0-logloss:0.44276	validation_1-logloss:0.43849


Unnamed: 0,age,fnlwgt,education-num,capital-gain,capital-loss,...,occupation_Other,race_White,relationship_Non-Husband,sex_Male,workclass_Private
0,0.30137,0.044131,0.8,0.02174,0.0,...,0.0,1.0,1.0,1.0,0.0
1,0.452055,0.048052,0.8,0.0,0.0,...,0.0,1.0,0.0,1.0,0.0
2,0.287671,0.137581,0.533333,0.0,0.0,...,1.0,1.0,1.0,1.0,1.0
3,0.493151,0.150486,0.4,0.0,0.0,...,1.0,0.0,0.0,1.0,1.0
4,0.150685,0.220635,0.8,0.0,0.0,...,0.0,0.0,1.0,0.0,1.0


In [11]:
hyperparams = {
    "optimizer": "adam",
    "lr": 0.001,
    "n_class": 2,
    "n_iter": 1000,
    "sigma": 1.0,
    "temperature": 1.0,
    "distance_weight": 0.01,
    "distance_func": "l1",
}

focus = recourse_catalog.FOCUS(ml_model, hyperparams)
df_cfs = focus.get_counterfactuals(test_factual)
display(df_cfs)


Unnamed: 0,age,fnlwgt,education-num,capital-gain,hours-per-week,capital-loss
0,0.301334,0.044131,0.800122,0.050959,0.397957,0.0
1,0.452,0.048052,0.799939,0.050966,0.122447,0.0
2,0.287669,0.137581,0.533282,0.051229,0.397948,0.0
3,0.493188,0.150486,0.399974,0.051238,0.397911,0.0
4,0.150717,0.220635,0.800025,0.07059,0.39796,0.0
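
To judge how far a counterfactual moved from its factual, you can compute a distance between the two rows. The sketch below does this for the first row of the factual and FOCUS tables shown above, using the L1 distance (the same `distance_func` chosen in the hyperparameters). It uses only the continuous columns that are visible in both tables, so *hours-per-week* is left out and the result is a partial distance.

```python
import pandas as pd

columns = ["age", "fnlwgt", "education-num", "capital-gain", "capital-loss"]

# First row of the factual table and of the FOCUS counterfactual table,
# restricted to the continuous columns displayed in both.
factual = pd.Series([0.30137, 0.044131, 0.8, 0.02174, 0.0], index=columns)
counterfactual = pd.Series(
    [0.301334, 0.044131, 0.800122, 0.050959, 0.0], index=columns
)

delta = (counterfactual - factual).abs()
l1_distance = delta.sum()
most_changed = delta.idxmax()

print(f"L1 distance: {l1_distance:.6f}")
print(f"largest change: {most_changed}")
```

For this row, almost all of the visible change comes from *capital-gain*.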
