# Benchmarking different methods

The library provides several evaluators for benchmarking recourse methods. Here we first show how to use the DistanceEvaluator, and then how to run a quick comparison of a range of recourse methods on a given dataset.

In [1]:
# Depending on whether you want example or custom dataset, import relevant module
import os
import sys
sys.path.append(os.path.join(os.path.dirname('rocelib'), '..'))

from rocelib.datasets.ExampleDatasets import get_example_dataset
from rocelib.datasets.custom_datasets.CsvDatasetLoader import CsvDatasetLoader

# Import your model
from rocelib.lib.models.pytorch_models.SimpleNNModel import SimpleNNModel

# Import the ClassificationTask module required for recourse generation
from rocelib.lib.tasks.ClassificationTask import ClassificationTask

# Import the recourse method you wish to use
from rocelib.generators.recourse_methods.BinaryLinearSearch import BinaryLinearSearch

# Import evaluator you wish to use
from rocelib.evaluations.DistanceEvaluator import DistanceEvaluator

You can import your own custom dataset. These datasets must be preprocessed to work with the provided recourse methods.

In [2]:
# Custom dataset
dl = CsvDatasetLoader("../tests/assets/standardized_recruitment_data.csv", target_column="HiringDecision")

print(dl.data.head())

        Age  Gender  EducationLevel  ExperienceYears  PreviousCompanies  \
0 -0.989083       1               2        -1.658237          -0.001418   
1  0.416376       1               4         0.928044          -0.001418   
2  1.389387       0               2        -1.011667          -0.710538   
3 -0.124185       1               2        -0.580620          -0.710538   
4 -0.556634       0               1        -0.365097          -1.419657   

   DistanceFromCompany  InterviewScore  SkillScore  PersonalityScore  \
0             0.087792       -0.089598    0.916174          1.418126   
1             0.024537       -0.543879    0.575386          1.043255   
2            -1.070200       -1.068049    0.541307         -1.240051   
3            -1.311444       -0.508934   -0.821844          0.702463   
4             1.208598       -0.963215    0.030126          1.213651   

   RecruitmentStrategy  HiringDecision  
0                    1               1  
1                    2            
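The values printed above (for example `Age` and `ExperienceYears` centred around zero) suggest that the numeric columns were standardized before loading. As a rough illustration of that kind of preprocessing, here is a minimal z-score sketch in plain Python. The `standardize` helper and the sample ages are made up for this example; how the CSV file was actually prepared is not stated in this notebook.

```python
import statistics


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical raw ages, not taken from the recruitment dataset.
raw_ages = [26, 39, 48, 34, 30]
scaled_ages = standardize(raw_ages)
print([round(z, 3) for z in scaled_ages])
```

After this transformation the column has mean 0 and standard deviation 1, which keeps distance-based comparisons from being dominated by features with large raw ranges.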

Alternatively, you can use an example dataset. Remember to preprocess these too!

In [3]:
# Load the dataset, here we are using an example one, you can import a custom CSV too
dl = get_example_dataset("ionosphere")

print(dl.data.head())

dl.default_preprocess()
print(dl.data.head())

   feature_0  feature_1  feature_2  feature_3  feature_4  feature_5  \
0          1          0    0.99539   -0.05889    0.85243    0.02306   
1          1          0    1.00000   -0.18829    0.93035   -0.36156   
2          1          0    1.00000   -0.03365    1.00000    0.00485   
3          1          0    1.00000   -0.45161    1.00000    1.00000   
4          1          0    1.00000   -0.02401    0.94140    0.06531   

   feature_6  feature_7  feature_8  feature_9  ...  feature_25  feature_26  \
0    0.83398   -0.37708    1.00000    0.03760  ...    -0.51171     0.41078   
1   -0.10868   -0.93597    1.00000   -0.04549  ...    -0.26569    -0.20468   
2    1.00000   -0.12062    0.88965    0.01198  ...    -0.40220     0.58984   
3    0.71216   -1.00000    0.00000    0.00000  ...     0.90695     0.51613   
4    0.92106   -0.23255    0.77152   -0.16399  ...    -0.65158     0.13290   

   feature_27  feature_28  feature_29  feature_30  feature_31  feature_32  \
0    -0.46168     0.21266  

Now create your model. In this case we are using a neural network. Be aware that certain recourse methods may not work with some models, e.g., MCE and other MILP-based methods only work with neural networks.

In [4]:
# Create the PyTorch NN: 34 input features, one hidden layer of 10 units, 1 output
model = SimpleNNModel(34, [10], 1)

Create a ClassificationTask from the model and dataset, then call its train method to train the model on the dataset.

In [5]:
task = ClassificationTask(model, dl)

task.train()

Instantiate your recourse generator

In [6]:
# Remember, the recourse generator takes in the task during instantiation
generator = BinaryLinearSearch(task)
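To give some intuition for what a search-based generator does, here is a toy sketch of a binary search along the line between a negative instance and a known positive one. This is only an assumption based on the method's name, not the library's BinaryLinearSearch implementation; `binary_line_search` and `toy_predict` are hypothetical names invented for this example.

```python
def binary_line_search(predict, negative, positive, tol=1e-4):
    """Search the segment negative -> positive for a point just on the positive side."""

    def point(t):
        # t = 0 gives the negative instance, t = 1 the positive one.
        return [n + t * (p - n) for n, p in zip(negative, positive)]

    lo, hi = 0.0, 1.0  # invariant: point(hi) is classified as positive
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if predict(point(mid)) == 1:
            hi = mid  # still positive, so move closer to the negative instance
        else:
            lo = mid  # crossed the boundary, so back off
    return point(hi)


def toy_predict(x):
    """Hypothetical classifier: positive when the two features sum to more than 1."""
    return 1 if x[0] + x[1] > 1.0 else 0


toy_recourse = binary_line_search(toy_predict, negative=[0.0, 0.0], positive=[2.0, 2.0])
print(toy_recourse)
```

The returned point is classified as positive while staying close to the decision boundary, which is the general goal of a recourse: change the outcome with as small a change to the input as possible.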

Now generate!

In [7]:
# Have a look at the RecourseGenerator notebook to see the other generation functions you can use!
recourses = generator.generate_for_all(neg_value=0)

print(recourses.head())

   feature_0  feature_1  feature_2  feature_3  feature_4  feature_5  \
1   0.348433        0.0   0.721648   0.034454   0.270610  -0.609526   
3   0.348433        0.0  -0.009441  -0.399661   0.693481   0.398504   
5   0.348433        0.0  -0.322288  -0.038271  -0.356447  -0.178465   
7  -1.009339        0.0  -0.127197  -0.024435  -0.105169  -0.158651   
9   0.348433        0.0  -0.386766  -0.174533  -0.243983  -0.237226   

   feature_6  feature_7  feature_8  feature_9  ...  feature_26  feature_27  \
1   0.202667  -0.400303   0.496796  -0.372911  ...   -0.340909    0.334512   
3   0.038809  -0.929796  -0.549571  -0.402190  ...    0.183315    0.549624   
5  -0.173591  -0.113428   0.004847   0.010167  ...   -0.832162    0.833594   
7   0.913967  -0.893725   0.130893  -0.398197  ...    0.759366    1.142818   
9  -0.149285  -0.216060   0.006989  -0.753203  ...    0.008015   -0.004477   

   feature_28  feature_29  feature_30  feature_31  feature_32  feature_33  \
1   -0.501047    0.538691  

Now you can use the evaluator you imported to assess one particular aspect of the recourses, in this case their distance from the original instances.

In [8]:
# The evaluators also take the task as an argument
distance_eval = DistanceEvaluator(task)

print(recourses["loss"].mean())

# Provide the recourses to evaluate
print(f"Average distance: {distance_eval.evaluate(recourses)}")

5.515753650817961
Average distance: 5.515753650817961
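The two printed numbers match, which suggests that the `loss` column of the recourses holds the distance the evaluator averages. As a rough picture of such a metric, the sketch below averages the Euclidean distance between each original instance and its recourse. Which distance DistanceEvaluator actually uses is not stated here, so the choice of Euclidean distance is an assumption, and `average_distance` and the toy points are made up for this example.

```python
import math


def average_distance(originals, counterfactuals):
    """Mean Euclidean distance between paired original and recourse points."""
    distances = [math.dist(o, c) for o, c in zip(originals, counterfactuals)]
    return sum(distances) / len(distances)


# Toy data: the pairs are 5.0 and 1.0 apart respectively.
toy_originals = [[0.0, 0.0], [1.0, 1.0]]
toy_counterfactuals = [[3.0, 4.0], [1.0, 2.0]]

avg = average_distance(toy_originals, toy_counterfactuals)
print(f"Average distance: {avg}")  # (5.0 + 1.0) / 2 = 3.0
```

A lower average means the recourses ask for smaller changes to the input, which is usually what you want as long as they remain valid.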


# Using QuickTabulate

But what if you want to evaluate and benchmark several methods in one go? You can use QuickTabulate!

First import the quick_tabulate function and all relevant generation methods

In [11]:
from rocelib.lib.QuickTabulate import quick_tabulate

from rocelib.generators.recourse_methods.Wachter import Wachter
from rocelib.generators.robust_recourse_methods.MCER import MCER
from rocelib.generators.recourse_methods.MCE import MCE

Now create a dictionary that maps a display name for each recourse generation method to its class. Note that we pass the classes themselves, not instances (i.e., we haven't used () after the class name)!

In [12]:
# The keys are the names you would like to show in the table produced
methods = {"BLS": BinaryLinearSearch, "MCE": MCE, "MCER": MCER, "Wachter": Wachter}
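Passing classes rather than instances lets the benchmarking code construct each generator itself, for example with a task it has just trained. The sketch below shows that pattern with stand-in classes. `ToyGenerator`, `GeneratorA`, `GeneratorB` and `instantiate_all` are hypothetical names for this example, and this is not a description of how quick_tabulate is implemented.

```python
class ToyGenerator:
    """Stand-in for a recourse generator that is constructed from a task."""

    def __init__(self, task):
        self.task = task


class GeneratorA(ToyGenerator):
    pass


class GeneratorB(ToyGenerator):
    pass


def instantiate_all(method_classes, task):
    """Build one generator per entry, all sharing the same task."""
    return {name: cls(task) for name, cls in method_classes.items()}


toy_methods = {"A": GeneratorA, "B": GeneratorB}  # classes, not instances
generators = instantiate_all(toy_methods, task="toy-task")
print({name: type(gen).__name__ for name, gen in generators.items()})
```

Because every generator is built from the same task, the methods are compared against the same model and data.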

And finally, call the quick_tabulate function!

Note that each recourse generator may take different arguments. These arguments have default values in each generator, but you can override them by passing keyword arguments to quick_tabulate, as we have done for delta here.

Furthermore, if you are using an example dataset, you can set the preprocess parameter to True to apply the default preprocessing to the dataset.

In [13]:
quick_tabulate(dl, model, methods, neg_value=0, column_name="target", preprocess=False, delta=0.005)

Restricted license - for non-production use only - expires 2025-11-24
No solution found using MCE!
No possible solution for given parameters - maybe your delta is TOO HIGH!
+----------+----------------------+-----------------------+--------------------+-------------------------+
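| Method   |   Execution Time (s) |   Validity proportion |   Average Distance |   Robustness proportion |
+==========+======================+=======================+====================+=========================+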
| BLS      |             0.404113 |                 1     |            5.35259 |                   0.152 |
+----------+----------------------+-----------------------+--------------------+-------------------------+
| MCE      |             1.98107  |                 1     |            8.14444 |                   0     |
+----------+----------------------+-----------------------+--------------------+-------------------------+
| MCER     |            22.4142   |                 0.992 |           12.4409  |                   0.992 |
+----------+----------------------+-----------------------+-------------------
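One way to picture a benchmarking loop with shared keyword arguments is sketched below: every method receives the same keyword arguments, uses the ones it knows, and ignores the rest, while the loop times each method. This is a toy under stated assumptions. `toy_tabulate` and the two toy methods are hypothetical, and whether quick_tabulate routes arguments this way is not stated in this notebook.

```python
import time


def method_with_delta(instance, delta=0.05, **kwargs):
    """Toy method that uses `delta` and ignores any other keyword argument."""
    return [x + delta for x in instance]


def method_without_delta(instance, **kwargs):
    """Toy method that has no `delta` parameter and ignores all extras."""
    return [x + 1.0 for x in instance]


def toy_tabulate(methods, instances, **kwargs):
    """Run every method on every instance and record the elapsed time."""
    rows = []
    for name, method in methods.items():
        start = time.perf_counter()
        results = [method(inst, **kwargs) for inst in instances]
        elapsed = time.perf_counter() - start
        rows.append((name, elapsed, results))
    return rows


rows = toy_tabulate(
    {"with_delta": method_with_delta, "without_delta": method_without_delta},
    instances=[[0.0, 1.0]],
    delta=0.005,
)
for name, elapsed, results in rows:
    print(name, f"{elapsed:.6f}s", results)
```

Timing each method with the same inputs and arguments is what makes the execution times in a comparison table meaningful.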

If subset is not specified, quick_tabulate assumes you would like to produce recourses for all negative instances. You may specify a subset as shown below.

In [14]:
subset = dl.get_negative_instances(neg_value=0, column_name="target").sample(n=50)

quick_tabulate(dl, model, methods, subset=subset, neg_value=0, column_name="target", preprocess=False, delta=0.005)

+----------+----------------------+-----------------------+--------------------+-------------------------+
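| Method   |   Execution Time (s) |   Validity proportion |   Average Distance |   Robustness proportion |
+==========+======================+=======================+====================+=========================+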
| BLS      |             0.17358  |                     1 |            4.85582 |                    0.08 |
+----------+----------------------+-----------------------+--------------------+-------------------------+
| MCE      |             0.779986 |                     1 |            7.73284 |                    0    |
+----------+----------------------+-----------------------+--------------------+-------------------------+
| MCER     |            14.088    |                     1 |           12.6518  |                    1    |
+----------+----------------------+-----------------------+--------------------+-------------------------+
| Wachter  |             0.254442 |                     1 |            1.70592 |                    0.16 |
+----------+----------------------+--
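To read the "Validity proportion" column, it can help to think of it as the share of generated recourses that the model classifies as the positive class. That reading is an assumption rather than something this notebook defines, and the helper below, `validity_proportion`, is a hypothetical sketch of it with a toy classifier.

```python
def validity_proportion(predict, candidates, positive_label=1):
    """Fraction of candidate recourses that the model labels as positive."""
    valid = sum(1 for c in candidates if predict(c) == positive_label)
    return valid / len(candidates)


def toy_classifier(x):
    """Hypothetical model: positive when the first feature is above zero."""
    return 1 if x[0] > 0 else 0


toy_candidates = [[1.0], [2.0], [-1.0], [3.0]]
proportion = validity_proportion(toy_classifier, toy_candidates)
print(proportion)  # 3 of the 4 candidates are classified as positive: 0.75
```

Under this reading, a value of 1 means every recourse flipped the model's prediction.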

And there you have it! That's how you can compare various methods on a single dataset.