This notebook shows how to use the CARLA library: loading data, training a model, and generating counterfactuals with several recourse methods.

# Notebook

## Data

Before we can do anything else, we need some data.
You could import one of the datasets in our [OnlineCatalog](https://carla-counterfactual-and-recourse-library.readthedocs.io/en/latest/data.html#module-data.catalog.online_catalog),
but if you would rather use your own data, you can load it with the [CsvCatalog](https://carla-counterfactual-and-recourse-library.readthedocs.io/en/latest/data.html#module-data.catalog.csv_catalog).

In [3]:
import warnings
warnings.filterwarnings('ignore')

from carla.data.catalog import CsvCatalog

In [2]:
continuous = ["age", "fnlwgt", "education-num", "capital-gain", "hours-per-week", "capital-loss"]
categorical = ["marital-status", "native-country", "occupation", "race", "relationship", "sex", "workclass"]
immutable = ["age", "sex"]

dataset = CsvCatalog(file_path="adult.csv",
                     continuous=continuous,
                     categorical=categorical,
                     immutables=immutable,
                     target='income')

print(dataset.df)

            age    fnlwgt  education-num  capital-gain  capital-loss  ...  \
0      0.301370  0.044131       0.800000      0.021740           0.0  ...   
1      0.452055  0.048052       0.800000      0.000000           0.0  ...   
2      0.287671  0.137581       0.533333      0.000000           0.0  ...   
3      0.493151  0.150486       0.400000      0.000000           0.0  ...   
4      0.150685  0.220635       0.800000      0.000000           0.0  ...   
...         ...       ...            ...           ...           ...  ...   
48827  0.301370  0.137428       0.800000      0.000000           0.0  ...   
48828  0.643836  0.209130       0.533333      0.000000           0.0  ...   
48829  0.287671  0.245379       0.800000      0.000000           0.0  ...   
48830  0.369863  0.048444       0.800000      0.054551           0.0  ...   
48831  0.246575  0.114919       0.800000      0.000000           0.0  ...   

       occupation_Other  race_White  relationship_Non-Husband  sex_Male  \
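
The printed frame hints at what happened during loading: the continuous columns now lie in [0, 1], and the categorical columns show up as indicator columns such as `sex_Male`. The sketch below reproduces that kind of preprocessing on a toy record in plain Python. It assumes min-max scaling and one binary indicator per categorical feature; the value range for `age` and the helper names are hypothetical, and this is not CARLA's implementation.

```python
def min_max_scale(value, lo, hi):
    """Map a value from the range [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)


def binary_indicator(value, positive):
    """Return 1.0 when value equals the positive category, else 0.0."""
    return 1.0 if value == positive else 0.0


# Toy record; the age range 17..90 is assumed for illustration only.
record = {"age": 39, "sex": "Male"}
encoded = {
    "age": min_max_scale(record["age"], lo=17, hi=90),
    "sex_Male": binary_indicator(record["sex"], positive="Male"),
}
print(encoded)
```

Scaling every continuous feature to the same range keeps distance-based recourse objectives from being dominated by features with large raw values such as `fnlwgt`.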


## Model

Now that the data is loaded, we also need a classification model.
You could define your own [model](https://carla-counterfactual-and-recourse-library.readthedocs.io/en/latest/examples.html#black-box-model),
but here we show how to train one of our [catalog](https://carla-counterfactual-and-recourse-library.readthedocs.io/en/latest/mlmodel.html#module-models.catalog.catalog) models.
Note that, depending on your data, you might need to tweak the training hyperparameters.

In [4]:
from carla.models.catalog import MLModelCatalog

In [9]:
training_params = {"lr": 0.002, "epochs": 10, "batch_size": 1024, "hidden_size": [18, 9, 3]}

ml_model = MLModelCatalog(
    dataset, model_type="ann", load_online=False, backend="pytorch"
)
ml_model.train(
    learning_rate=training_params["lr"],
    epochs=training_params["epochs"],
    batch_size=training_params["batch_size"],
    hidden_size=training_params["hidden_size"]
)

balance on test set 0.2397608125819135, balance on test set 0.23804062909567497
Epoch 0/9
----------
train Loss: 0.4660 Acc: 0.7750

test Loss: 0.4003 Acc: 0.8048

Epoch 1/9
----------
train Loss: 0.3926 Acc: 0.8112

test Loss: 0.3731 Acc: 0.8213

Epoch 2/9
----------
train Loss: 0.3726 Acc: 0.8233

test Loss: 0.3570 Acc: 0.8304

Epoch 3/9
----------
train Loss: 0.3608 Acc: 0.8304

test Loss: 0.3534 Acc: 0.8316

Epoch 4/9
----------
train Loss: 0.3527 Acc: 0.8334

test Loss: 0.3430 Acc: 0.8369

Epoch 5/9
----------
train Loss: 0.3489 Acc: 0.8343

test Loss: 0.3543 Acc: 0.8351

Epoch 6/9
----------
train Loss: 0.3439 Acc: 0.8368

test Loss: 0.3350 Acc: 0.8410

Epoch 7/9
----------
train Loss: 0.3410 Acc: 0.8371

test Loss: 0.3450 Acc: 0.8351

Epoch 8/9
----------
train Loss: 0.3376 Acc: 0.8405

test Loss: 0.3308 Acc: 0.8439

Epoch 9/9
----------
train Loss: 0.3386 Acc: 0.8401

test Loss: 0.3403 Acc: 0.8403
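
If the accuracy above is not good enough for your data, one simple way to organize the tweaking is to enumerate a few candidate settings and train one model per setting. The sketch below only builds the candidate dictionaries, using the same keys as `training_params`. The candidate values are arbitrary placeholders and `expand_grid` is a hypothetical helper; the training call is left as a comment because it depends on the CARLA objects defined earlier.

```python
from itertools import product

# Candidate values are placeholders, not recommendations.
grid = {
    "lr": [0.001, 0.002],
    "epochs": [10, 20],
    "batch_size": [1024],
    "hidden_size": [[18, 9, 3], [32, 16]],
}


def expand_grid(grid):
    """Return one dict per combination of the candidate values."""
    keys = list(grid)
    combos = product(*(grid[key] for key in keys))
    return [dict(zip(keys, values)) for values in combos]


candidates = expand_grid(grid)
for params in candidates:
    # Here one would create a fresh MLModelCatalog and call
    # ml_model.train(learning_rate=params["lr"], ...) as in the cell above.
    print(params)
```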



## Recourse

Now that we have both the data and a model, we can start using CARLA to generate counterfactuals.
You can pick a [recourse method](https://carla-counterfactual-and-recourse-library.readthedocs.io/en/latest/recourse.html) from the catalog, or implement one yourself.
In the following example, we select the samples that the model labels as negative. These are the factuals for which we want counterfactuals.

In [10]:
from carla.models.negative_instances import predict_negative_instances
import carla.recourse_methods.catalog as recourse_catalog

In [11]:
factuals = predict_negative_instances(ml_model, dataset.df)
test_factual = factuals.iloc[:5]
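
Conceptually, selecting factuals means keeping the rows that the model assigns to the negative class. Here is a minimal sketch of that filter in plain Python, assuming a 0.5 decision threshold on the predicted probability of the positive class. `toy_predict`, `negative_instances`, and the sample rows are hypothetical stand-ins, not part of CARLA.

```python
def toy_predict(row):
    """Hypothetical model: probability of the positive class."""
    score = 0.5 * row["education-num"] + 0.5 * row["hours-per-week"]
    return min(1.0, max(0.0, score))


def negative_instances(predict, rows, threshold=0.5):
    """Keep the rows whose predicted probability is below the threshold."""
    return [row for row in rows if predict(row) < threshold]


rows = [
    {"education-num": 0.8, "hours-per-week": 0.6},  # predicted positive
    {"education-num": 0.4, "hours-per-week": 0.3},  # predicted negative
    {"education-num": 0.2, "hours-per-week": 0.5},  # predicted negative
]
toy_factuals = negative_instances(toy_predict, rows)
print(len(toy_factuals))
```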

### Wachter (gradient method)

In [None]:
hyperparams = {"loss_type": "BCE", "binary_cat_features": False}
recourse_method = recourse_catalog.Wachter(ml_model, hyperparams)
df_cfs = recourse_method.get_counterfactuals(test_factual)

print(df_cfs)

[INFO] Counterfactual Explanation Found [wachter.py wachter_recourse]
[INFO] Counterfactual Explanation Found [wachter.py wachter_recourse]
[INFO] Counterfactual Explanation Found [wachter.py wachter_recourse]
[INFO] Counterfactual Explanation Found [wachter.py wachter_recourse]
[INFO] Counterfactual Explanation Found [wachter.py wachter_recourse]
        age  capital-gain  capital-loss  education-num    fnlwgt  ...  \
0  0.367432      0.089006      0.066957       0.867384  0.112537  ...   
1  0.481818      0.029809      0.029803       0.829806  0.077834  ...   
2  0.443121      0.155943      0.155183       0.689581  0.185220  ...   
3  0.557433      0.066598      0.066980       0.467272  0.217566  ...   
4  0.293216      0.142940      0.141745       0.943278  0.328395  ...   

   race_White  relationship_Non-Husband  sex_Male  workclass_Private  income  
0         1.0                       1.0       0.0          -0.066813     1.0  
1         1.0                       0.0       1.0    
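
To give a feel for what a gradient method does, the sketch below runs plain gradient descent on a counterfactual candidate until a toy classifier flips its prediction. The objective combines a binary cross-entropy term that pulls the prediction towards the positive class with an L1 penalty that keeps the candidate close to the factual. The logistic model, its weights, the function names, and the values of `lam` and `lr` are all assumptions made for illustration; CARLA's `Wachter` class works on the trained network and differs in its details.

```python
import math

WEIGHTS = [2.0, -1.0]  # hypothetical logistic-regression weights
BIAS = -1.5


def predict(x):
    """Probability of the positive class under the toy model."""
    z = sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def gradient_counterfactual(factual, lam=0.1, lr=0.1, max_iter=1000):
    """Descend on BCE(predict(cf), 1) + lam * ||cf - factual||_1."""
    cf = list(factual)
    for _ in range(max_iter):
        p = predict(cf)
        if p > 0.5:  # prediction flipped, stop searching
            break
        # Gradient of the BCE term is (p - 1) * w; the L1 term
        # contributes the subgradient lam * sign(cf - factual).
        grad = [
            (p - 1.0) * w + lam * sign(c - f)
            for w, c, f in zip(WEIGHTS, cf, factual)
        ]
        cf = [c - lr * g for c, g in zip(cf, grad)]
    return cf


factual = [0.2, 0.8]
counterfactual = gradient_counterfactual(factual)
print(predict(factual), predict(counterfactual))
```

Because nothing restricts the update to valid feature values, a search like this can leave the original data range, which may be why one of the indicator columns in the output above holds a value that is neither 0 nor 1.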

### CCHVAE (manifold method)

In [None]:
hyperparams = {
    "data_name": dataset.name,
    "n_search_samples": 100,
    "p_norm": 1,
    "step": 0.1,
    "max_iter": 1000,
    "clamp": True,
    "binary_cat_features": False,
    "vae_params": {
        "layers": [len(ml_model.feature_input_order), 512, 256, 8],
        "train": True,
        "lambda_reg": 1e-6,
        "epochs": 5,
        "lr": 1e-3,
        "batch_size": 32,
    },
}

cchvae = recourse_catalog.CCHVAE(ml_model, hyperparams)
df_cfs = cchvae.get_counterfactuals(test_factual)

print(df_cfs)
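
Manifold methods search in the latent space of a generative model rather than in the input space, so that decoded candidates stay close to the data distribution. The sketch below illustrates the general idea with a growing-shell search around the encoded factual. It reuses the names `n_search_samples`, `p_norm`, `step`, and `max_iter` from the cell above, but how they are used here is an assumption. The linear `encode` and `decode` functions are hypothetical stand-ins for a trained VAE, and the classifier is a toy rule; none of this is CARLA's implementation.

```python
import math
import random


def predict(x):
    """Toy classifier: positive when the two features sum above 1."""
    return 1 if x[0] + x[1] > 1.0 else 0


def encode(x):
    """Hypothetical stand-in for a VAE encoder."""
    return [2.0 * v - 1.0 for v in x]


def decode(z):
    """Hypothetical stand-in for a VAE decoder (inverse of encode)."""
    return [(v + 1.0) / 2.0 for v in z]


def latent_search(factual, n_search_samples=100, p_norm=1, step=0.1,
                  max_iter=1000, seed=0):
    """Sample latent points in growing shells until the label flips."""
    rng = random.Random(seed)
    z0 = encode(factual)
    low, high = 0.0, step
    for _ in range(max_iter):
        found = []
        for _ in range(n_search_samples):
            direction = [rng.gauss(0.0, 1.0) for _ in z0]
            norm = math.sqrt(sum(d * d for d in direction)) or 1.0
            radius = rng.uniform(low, high)
            z = [a + radius * d / norm for a, d in zip(z0, direction)]
            x = decode(z)
            if predict(x) == 1:
                dist = sum(abs(a - b) ** p_norm
                           for a, b in zip(x, factual)) ** (1.0 / p_norm)
                found.append((dist, x))
        if found:
            # Return the candidate closest to the factual in input space.
            return min(found, key=lambda item: item[0])[1]
        low, high = high, high + step  # widen the search shell
    return None


factual = [0.3, 0.4]
counterfactual = latent_search(factual)
print(counterfactual)
```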

### FOCUS (tree method)

Tree methods such as FOCUS require a tree model, so we replace the neural network with an XGBoost model and recompute the factuals.

In [None]:
from carla.recourse_methods.catalog.focus.tree_model import ForestModel, XGBoostModel
ml_model = XGBoostModel(dataset)

factuals = predict_negative_instances(ml_model, dataset.df)
test_factual = factuals.iloc[:5]

In [None]:
hyperparams = {
    "optimizer": "adam",
    "lr": 0.001,
    "n_class": 2,
    "n_iter": 1000,
    "sigma": 1.0,
    "temperature": 1.0,
    "distance_weight": 0.01,
    "distance_func": "l1",
}

focus = recourse_catalog.FOCUS(ml_model, dataset, hyperparams)
df_cfs = focus.get_counterfactuals(test_factual)
print(df_cfs)
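
Tree models make piecewise-constant predictions, so a gradient-based search cannot be applied to them directly. As we understand the method, FOCUS works around this by replacing each hard split with a sigmoid, which makes the model output differentiable with respect to the input. The sketch below compares a hard split with its sigmoid approximation for a single threshold. The function names are hypothetical, and relating the sharpness parameter to the `sigma` hyperparameter above is our reading rather than a description of CARLA's code.

```python
import math


def hard_split(x, threshold):
    """Indicator of the right branch of a tree split."""
    return 1.0 if x > threshold else 0.0


def soft_split(x, threshold, sigma=1.0):
    """Sigmoid approximation of the indicator; larger sigma is sharper."""
    return 1.0 / (1.0 + math.exp(-sigma * (x - threshold)))


for sigma in (1.0, 10.0, 100.0):
    print(sigma, hard_split(0.6, 0.5), soft_split(0.6, 0.5, sigma))
```

A small `sigma` gives smooth, informative gradients but a loose approximation of the tree, while a large `sigma` tracks the tree closely at the cost of gradients that vanish away from the thresholds.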