<a href="https://colab.research.google.com/github/craigybaeb/MANIC/blob/main/experiments/blood_alcohol.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Heart Disease

The UCI Cleveland Heart Disease dataset is a widely used benchmark for binary classification in machine learning. It was originally compiled by researchers associated with the Cleveland Clinic Foundation and is publicly available through the UCI Machine Learning Repository.

The dataset comprises 303 instances (rows) and 14 attributes (columns), including the target variable, and centers on diagnosing the presence or absence of heart disease. Each instance describes a patient who underwent various medical examinations, and the attributes record their physiological and diagnostic measurements. The target variable is binary, indicating whether the patient has heart disease.

The dataset encompasses the subsequent attributes:

1. `Age`: Age of the patient (years).
2. `Sex`: Gender of the patient (0: female, 1: male).
3. `CP` (Chest Pain Type): Type of chest pain experienced by the patient.
4. `Trestbps` (Resting Blood Pressure): Resting blood pressure (mm Hg).
5. `Chol` (Serum Cholesterol): Serum cholesterol level (mg/dl).
6. `Fbs` (Fasting Blood Sugar): Fasting blood sugar level (1 if > 120 mg/dl, 0 otherwise).
7. `Restecg` (Resting Electrocardiographic Results): Resting electrocardiographic results.
8. `Thalach` (Maximum Heart Rate Achieved): Maximum heart rate achieved during exercise.
9. `Exang` (Exercise Induced Angina): Presence of exercise-induced angina (1: yes, 0: no).
10. `Oldpeak`: ST depression induced by exercise relative to rest.
11. `Slope`: Slope of the peak exercise ST segment.
12. `Ca` (Number of Major Vessels): Number of major vessels colored by fluoroscopy.
13. `Thal`: Thallium stress test result.
14. `Target`: Target variable, 1 if the patient has heart disease, 0 otherwise.

The primary objective is to build a binary classification model that predicts the likelihood of heart disease from the medical and diagnostic attributes above. The Cleveland Heart Disease dataset is widely used to assess the performance of machine learning algorithms, particularly classifiers.

## Table of Contents

>[Moodle](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=NUkgJBhgIFi4)

>>[Overview](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=QYAZZhGaIJKN)

>>[Dataset Details](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=QYAZZhGaIJKN)

>>[Features](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=QYAZZhGaIJKN)

>>[Target Variable](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=QYAZZhGaIJKN)

>>[Installation of Required Packages](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=gS6CjATdJjXm)

>>>[Import Packages](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=0JfhOD590dWO)

>>[Recording System Specification](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=-lhdpJhrJpir)

>>[Data Loading & Pre-Processing](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=PPfiWfryJwca)

>>[Training the Classifiers](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=r6WpQLIFJ5fH)

>>>[Random Forest (RF)](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=9yKYai4qJ73_)

>>>[Multi-layer Perceptron (MLP)](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=lZGw6t6SJ-JO)

>>>[Support Vector Classifier (SVC)](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=cQEUnfxcKDz4)

>>[Model Explanation](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=YtWaWgcwKI-W)

>>>[DiCE](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=dCJmnJC7QsFg)

>>>>[DiCE (Random Search)](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=5Qpj9MOHQxl8)

>>>>[DiCE (Genetic Algorithm)](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=7XQskKPnWP1l)

>>>>[DiCE (KD Tree)](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=Y1g9JPYOYUoK)

>>>[DisCERN](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=FiX6d_lIYzjS)

>>>>[DisCERN (LIME)](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=D3Mmrp1lZHMC)

>>>>[DisCERN (SHAP)](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=6r9_mjklpZBL)

>>>[NICE](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=xPLx2-oNwG8s)

>>>>[NICE (Sparsity)](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=OkAseMwIwkKa)

>>>>[NICE (Proximity)](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=37WMzigP4Hi9)

>>>[Counterfactuals Guided By Prototypes (CFProto)](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=xsZDcgK18er_)

>>[Experiments](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=1zrHQgS58-mb)

>>>[Selection of Counterfactual Based On Proximity](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=f4PWgWpT9MqE)

>>>[Selection of Counterfactual Based On Proximity and Disagreement](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=Ouvp-YClgbKP)

>>>[Metacounterfactual Generation](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=onQ9a8qu9K-E)

>>>[Population Ablation Study](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=mnUnbYFFguQG)

>>[References](#updateTitle=true&folderId=1UuNwfjM9jRGNVKDtnrdXSMowsG8AW0TH&scrollTo=sTT-YiC4gzhF)



## Installation of Required Packages

In [None]:
%%capture

# DICE, NICE, CFPROTO
!pip install dice_ml
!pip install NICEx
!pip install alibi

# DisCERN
!pip install lime
!pip install shap
!pip install discern-xai

# MANIC (Our contribution)
!pip install manic-xai==1.0.83

### Import Packages

In [None]:
# Import required libraries
import os
import json
import random
import pickle
import time

# For building models
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score, precision_score, recall_score, f1_score, classification_report, accuracy_score
from sklearn import preprocessing

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf

# Import XAI packages
from alibi.explainers import CounterfactualProto
from discern.discern_tabular import DisCERNTabular
from nice import NICE
import dice_ml

# Import significance testing packages
from scipy.stats import f_oneway

# MANIC (Our contribution)
from manic import Manic

UserConfigValidationException will be deprecated from dice_ml.utils. Please import UserConfigValidationException from raiutils.exceptions.


In [None]:
# Mount to Google Drive to save/fetch progress
from google.colab import drive
drive.mount('/content/gdrive')

# Folder to save files in/fetch files from
filepath = '/content/gdrive/MyDrive/PhD/MANIC/blood_alcohol/'

Mounted at /content/gdrive


## Recording System Specification

The system specification is recorded, as it may be interesting to note how long MANIC takes to run.

In [None]:
# Show the number of CPU cores active
num_cores = os.cpu_count()
print("Number of CPU cores:", num_cores)

Number of CPU cores: 4


In [None]:
# Show GPU information
!nvidia-smi

Tue Aug 15 08:20:00 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    26W / 300W |      2MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
# Show memory information
!cat /proc/meminfo

MemTotal:       26687688 kB
MemFree:        20498552 kB
MemAvailable:   24595320 kB
Buffers:          107800 kB
Cached:          4244320 kB
SwapCached:            0 kB
Active:           809796 kB
Inactive:        4929068 kB
Active(anon):       1524 kB
Inactive(anon):  1387092 kB
Active(file):     808272 kB
Inactive(file):  3541976 kB
Unevictable:          12 kB
Mlocked:              12 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:              8408 kB
Writeback:             0 kB
AnonPages:       1385528 kB
Mapped:           824544 kB
Shmem:              1832 kB
KReclaimable:     138432 kB
Slab:             188968 kB
SReclaimable:     138432 kB
SUnreclaim:        50536 kB
KernelStack:        6336 kB
PageTables:        23400 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    13343844 kB
Committed_AS:    4137080 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       77940 kB
VmallocChunk:          0 kB
Percpu:          

## Data Loading & Pre-Processing

In [None]:
# We uploaded the datasets to GitHub
df = pd.read_csv('https://github.com/craigybaeb/MANIC/blob/main/experiments/data/blood_alcohol.csv?raw=true')

# Check the data loaded correctly
df.head()

Unnamed: 0,ID,weight,units,duration,gender,mr,meal,peak,BAC,limit,bac_sh
0,1,40,1,30,Male,0.015,Full,30,0.018352,under,0.02
1,2,40,1,45,Male,0.015,Full,30,0.014602,under,0.01
2,3,40,1,60,Male,0.015,Full,30,0.010852,under,0.01
3,4,40,1,75,Male,0.015,Full,30,0.007102,under,0.01
4,5,40,1,90,Male,0.015,Full,30,0.003352,under,0.0


Once `limit` is label encoded, a value of 1 corresponds to the `under` label shown above and a value of 0 corresponds to the other class. The purpose of the counterfactuals will be to demonstrate how to move an instance from one class to the other.

In [None]:
# Drop the columns we don't want
df.drop(columns=['ID','BAC','bac_sh'], inplace=True)

In [None]:
df.head()

Unnamed: 0,weight,units,duration,gender,mr,meal,peak,limit
0,40,1,30,Male,0.015,Full,30,under
1,40,1,45,Male,0.015,Full,30,under
2,40,1,60,Male,0.015,Full,30,under
3,40,1,75,Male,0.015,Full,30,under
4,40,1,90,Male,0.015,Full,30,under


In [None]:
# Label encode the dataframe
le = preprocessing.LabelEncoder()
df['gender'] = le.fit_transform(df['gender'])
df['meal'] = le.fit_transform(df['meal'])
df['limit'] = le.fit_transform(df['limit'])

In [None]:
df.head()

Unnamed: 0,weight,units,duration,gender,mr,meal,peak,limit
0,40,1,30,1,0.015,1,30,1
1,40,1,45,1,0.015,1,30,1
2,40,1,60,1,0.015,1,30,1
3,40,1,75,1,0.015,1,30,1
4,40,1,90,1,0.015,1,30,1


There are 97,980 instances and 7 features (excluding the target) in the dataset.

In [None]:
# Print the number of instances in the dataset
len(df)

97980

In [None]:
# Show dataset statistics to get an idea of the feature ranges
df.describe()

Unnamed: 0,weight,units,duration,gender,mr,meal,peak,limit
count,97980.0,97980.0,97980.0,97980.0,97980.0,97980.0,97980.0,97980.0
mean,75.0,8.0,195.0,0.5,0.016,0.5,15.0,0.435599
std,20.494006,4.320516,99.499251,0.500003,0.001,0.500003,15.000077,0.495838
min,40.0,1.0,30.0,0.0,0.015,0.0,0.0,0.0
25%,57.0,4.0,105.0,0.0,0.015,0.0,0.0,0.0
50%,75.0,8.0,195.0,0.5,0.016,0.5,15.0,0.0
75%,93.0,12.0,285.0,1.0,0.017,1.0,30.0,1.0
max,110.0,15.0,360.0,1.0,0.017,1.0,30.0,1.0


In [None]:
# Show data types and nulls
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97980 entries, 0 to 97979
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   weight    97980 non-null  int64  
 1   units     97980 non-null  int64  
 2   duration  97980 non-null  int64  
 3   gender    97980 non-null  int64  
 4   mr        97980 non-null  float64
 5   meal      97980 non-null  int64  
 6   peak      97980 non-null  int64  
 7   limit     97980 non-null  int64  
dtypes: float64(1), int64(7)
memory usage: 6.0 MB


In [None]:
df['gender'] = df['gender'].astype(int)
df['meal'] = df['meal'].astype(int)
df['limit'] = df['limit'].astype(int)

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97980 entries, 0 to 97979
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   weight    97980 non-null  int64  
 1   units     97980 non-null  int64  
 2   duration  97980 non-null  int64  
 3   gender    97980 non-null  int64  
 4   mr        97980 non-null  float64
 5   meal      97980 non-null  int64  
 6   peak      97980 non-null  int64  
 7   limit     97980 non-null  int64  
dtypes: float64(1), int64(7)
memory usage: 6.0 MB


There are no missing values in the dataset, and every column is numeric: seven `int64` columns and one `float64` column (`mr`).

In [None]:
# Cleaner check for nulls
df.isnull().sum(axis = 0)

weight      0
units       0
duration    0
gender      0
mr          0
meal        0
peak        0
limit       0
dtype: int64

In [None]:
# Check for NaN values in any column
df.isna().any().any()

False

All features are numeric (integer, except for the float-valued `mr`), suitable for our counterfactual methods and MANIC.

In [None]:
# Set seed for reproducibility
seed = 42
np.random.seed(seed)
random.seed(seed)
tf.random.set_seed(seed)

In [None]:
# Number of unique random numbers you want to generate
num_unique_numbers = 20

# Range of random numbers (you can adjust the range as needed)
lower_bound = 1
upper_bound = 100

# We only want unique splits
unique_numbers_set = set()

# Loop until we have 20 unique seeds (one per data split)
while len(unique_numbers_set) < num_unique_numbers:
    # Generate a random number within the specified range
    random_number = np.random.randint(lower_bound, upper_bound + 1)
    unique_numbers_set.add(random_number)

# Convert the set of unique numbers to a Python list
repeats = list(unique_numbers_set)

# Show seeds to create the new splits
print(repeats)

[2, 3, 15, 21, 22, 24, 30, 38, 52, 53, 60, 61, 64, 72, 75, 83, 87, 88, 93, 100]
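
The loop above keeps drawing until the set holds 20 distinct values. As an alternative sketch (not the notebook's method, and it will not reproduce the seeds printed above), sampling without replacement gives the same guarantee in one call. The function name `draw_unique_seeds` and its defaults are illustrative.

```python
import random

def draw_unique_seeds(n_seeds=20, lower=1, upper=100, seed=42):
    """Draw n_seeds distinct integers from [lower, upper] without a retry loop."""
    rng = random.Random(seed)  # local generator, leaves the global random state untouched
    return sorted(rng.sample(range(lower, upper + 1), n_seeds))

seeds = draw_unique_seeds()
print(len(seeds), len(set(seeds)))  # both counts equal n_seeds
```

Using a local `random.Random` instance keeps the seed selection independent of any other code that consumes the global random state.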


In [None]:
# Split the dataset into features (X) and target (y)
X = df.drop(columns=["limit"])
y = df["limit"]

# Our dataset is very large; 1% of the data (the "test" side of this split) should be enough for our experiments
X_discarded, X_reduced, y_discarded, y_reduced = train_test_split(X, y, test_size=0.01, random_state=seed, stratify=y)

# Check new dataset size
print(f'Reduced set size: ({len(X_reduced)}, {len(y_reduced)})')

Reduced set size: (980, 980)


In [None]:
# Initialise our dataset splits
repeat_train_data, repeat_test_data = [], []
repeat_y_train, repeat_y_test = [], []

# Create the repeated splits
for repeat_seed in repeats:

  # Split the dataset into training and test sets
  train_data, test_data, y_train, y_test = train_test_split(X_reduced, y_reduced, test_size=0.3, random_state=repeat_seed, stratify=y_reduced)

  # Save the splits
  repeat_train_data.append(train_data)
  repeat_test_data.append(test_data)
  repeat_y_train.append(y_train)
  repeat_y_test.append(y_test)
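
The sizes produced by the two splits can be checked by hand. The sketch below assumes that a fractional `test_size` is rounded up to a whole number of samples and that the remainder goes to the other side, which matches the reduced set size of 980 printed above and the support of 294 (166 + 128) in the classification reports. The helper `split_sizes` is hypothetical.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test), assuming the test share is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

discarded, reduced = split_sizes(97980, 0.01)  # 1% subsample kept for the experiments
train, test = split_sizes(reduced, 0.3)        # 70/30 split inside each repetition
print(reduced, train, test)  # 980 686 294
```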

In [None]:
# Initialize the stratified k-fold cross-validator (5 folds)
skf = StratifiedKFold(n_splits=5)

Stratified 5-fold cross validation is adopted to ensure reliability in our results over a repeated number of trials.
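
To illustrate what stratification buys, here is a simplified pure-Python sketch that deals the indices of each class round-robin into five folds so that every fold keeps the overall class balance. It is an illustration of the idea, not scikit-learn's exact assignment procedure, and the toy labels are made up.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, n_splits=5):
    """Assign each sample index to a fold, dealing each class out round-robin."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds

# Toy labels with a 60/40 class balance (illustrative, not the notebook's data)
labels = [0] * 60 + [1] * 40
folds = stratified_folds(labels)
for fold in folds:
    print(Counter(labels[i] for i in fold))  # each fold holds 12 of class 0 and 8 of class 1
```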

## Training the Classifiers

Two popular black-box models are implemented. For each model, we use stratified 5-fold cross-validation and a grid search to optimise the hyperparameters. The best classifier is the one trained with the best parameter configuration found by the grid search, and we report its test accuracy on the hold-out dataset. A summary of the searched parameters is as follows:


1.   **Random Forest (RF)**:
      - *n_estimators*: the number of decision trees in the forest (100, 300, 500).
      - *max_depth*: the maximum depth of a decision tree in the forest (None, 10, 20).
      - *min_samples_split*: the minimum number of samples required to split an internal node (2, 5, 19).
2.   **Multi-layer Perceptron (MLP)**:
      - *hidden_layer_sizes*: the number of nodes in each hidden layer ((128, 64), (256, 64)).
      - *activation*: the activation function used (relu).
      - *solver*: the optimisation function used (adam).
      - *learning_rate*: the learning rate schedule for weight updates (adaptive).
      - *learning_rate_init*: the initial learning rate, which controls the step size when updating the weights (0.001, 0.01, 0.1).
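
The size of each search follows directly from the lists above. The sketch below enumerates the candidates with `itertools.product`, using the values exactly as listed; the dictionary names and the helper `expand_grid` are illustrative, and the fit count assumes the 5-fold splitter defined earlier, counted per repetition and excluding the final refit.

```python
from itertools import product

# Values copied from the parameter summary above
rf_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5, 19],
}
mlp_grid = {
    "hidden_layer_sizes": [(128, 64), (256, 64)],
    "learning_rate_init": [0.001, 0.01, 0.1],
}

def expand_grid(grid):
    """Return every parameter combination in the grid as a dict."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*grid.values())]

n_folds = 5  # assumption: matches StratifiedKFold(n_splits=5)
for name, grid in [("RF", rf_grid), ("MLP", mlp_grid)]:
    candidates = expand_grid(grid)
    print(name, len(candidates), "candidates,", len(candidates) * n_folds, "cross-validation fits")
```

Parameters with a single value (activation, solver, learning rate schedule) do not multiply the number of candidates, which is why they are left out of the grids here.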

### Multi-layer Perceptron (MLP)
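
The grid search in this section is refit on precision, and its log repeatedly warns that precision is ill-defined. A minimal pure-Python sketch of the cause, assuming the standard definition TP / (TP + FP): when a candidate predicts no positive samples in a fold, the denominator is zero, and a fallback value (0.0 by default in scikit-learn, controlled by `zero_division`) is used instead. The function below is a hypothetical stand-in, not the library implementation.

```python
def precision(y_true, y_pred, zero_division=0.0):
    """Binary precision for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fp == 0:
        # No predicted positives: precision is undefined, so fall back
        return zero_division
    return tp / (tp + fp)

y_true = [0, 1, 1, 0, 1]
all_negative = precision(y_true, [0, 0, 0, 0, 0])  # no positives predicted
mixed = precision(y_true, [0, 1, 1, 1, 0])         # 2 true positives, 1 false positive
print(all_negative)  # 0.0
print(mixed)         # 2 / 3
```

A candidate that collapses to a single class therefore scores 0.0 on precision and is unlikely to be selected by the refit.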

In [None]:
# Reset seed to ensure reproducibility
np.random.seed(seed)
random.seed(seed)
tf.random.set_seed(seed)

In [None]:
# Repeat the MLP training 20 times to get different models and predictions
repeat_mlp_classifiers = []
repeat_mlp_predictions = []

# Initialise the repeated train scores
mlp_train_accuracies = []
mlp_train_f1_scores = []
mlp_train_precision_scores = []
mlp_train_recall_scores = []
mlp_train_roc_scores = []

# Initialise the repeated test scores
mlp_test_accuracies = []
mlp_test_f1_scores = []
mlp_test_precision_scores = []
mlp_test_recall_scores = []
mlp_test_roc_scores = []

# Train a model on each split (20 total)
for i, repeat_seed in enumerate(repeats):

  # Log repetition
  print(f'Repetition: {i + 1}')

  # Initialize the Multi-layer Perceptron classifier
  mlp_classifier = MLPClassifier(activation='relu', solver='adam', learning_rate='adaptive', random_state=seed)

  # Define the hyperparameter grid for the Multi-layer Perceptron
  mlp_param_grid = {
      'hidden_layer_sizes': [(128,64), (256,64)],
      'learning_rate_init': [0.001, 0.01, 0.1]
  }

  # Initialize the Grid Search with cross-validation
  mlp_grid_search = GridSearchCV(mlp_classifier, mlp_param_grid, cv=skf, scoring=['accuracy', 'precision', 'recall', 'f1', 'roc_auc'], refit='precision')

  # Perform the Grid Search with cross-validation on the training data
  mlp_grid_search.fit(repeat_train_data[i], repeat_y_train[i])

  # Get the best hyperparameters from the Grid Search
  mlp_best_params = mlp_grid_search.best_params_
  print("Best Hyperparameters:", mlp_best_params)

  # Retrieve the results of the cross-validation from the Grid Search
  mlp_cv_results = pd.DataFrame(mlp_grid_search.cv_results_)

  # Store the best mean cross-validation score for each metric (the maximum is taken per metric, over all candidates)
  mlp_train_accuracies.append(np.max(mlp_cv_results['mean_test_accuracy']))
  mlp_train_precision_scores.append(np.max(mlp_cv_results['mean_test_precision']))
  mlp_train_recall_scores.append(np.max(mlp_cv_results['mean_test_recall']))
  mlp_train_f1_scores.append(np.max(mlp_cv_results['mean_test_f1']))
  mlp_train_roc_scores.append(np.max(mlp_cv_results['mean_test_roc_auc']))

  # Train the Multi-layer Perceptron classifier on the entire training data using the best hyperparameters
  best_mlp_classifier = MLPClassifier(**mlp_best_params, random_state=seed)
  best_mlp_classifier.fit(repeat_train_data[i], repeat_y_train[i])

  # Evaluate the final model on the test set (`score` returns mean accuracy, despite the variable name)
  final_mlp_test_precision = best_mlp_classifier.score(repeat_test_data[i], repeat_y_test[i])
  print("Final Test Precision:", final_mlp_test_precision)

  # Get MLP predictions for the test set
  mlp_predictions = best_mlp_classifier.predict(repeat_test_data[i])

  # Calculate ROC-AUC (from hard label predictions), precision, recall, F1 score and accuracy
  mlp_test_roc_auc = roc_auc_score(repeat_y_test[i], mlp_predictions)
  mlp_test_precision = precision_score(repeat_y_test[i], mlp_predictions, average='weighted')
  mlp_test_recall = recall_score(repeat_y_test[i], mlp_predictions, average='weighted')
  mlp_test_f1_score = f1_score(repeat_y_test[i], mlp_predictions, average='weighted')
  mlp_test_accuracy = accuracy_score(repeat_y_test[i], mlp_predictions)

  # Print the classification report for the test set
  print("Classification Report:")
  print(classification_report(repeat_y_test[i], mlp_predictions))

  # Append to list to use later
  repeat_mlp_classifiers.append(best_mlp_classifier)
  repeat_mlp_predictions.append(mlp_predictions)

  # Record test scores for the repetition
  mlp_test_accuracies.append(mlp_test_accuracy)
  mlp_test_precision_scores.append(mlp_test_precision)
  mlp_test_roc_scores.append(mlp_test_roc_auc)
  mlp_test_recall_scores.append(mlp_test_recall)
  mlp_test_f1_scores.append(mlp_test_f1_score)

  # Separate repetitions for debugging
  print('--------------------------')

Repetition: 1


Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.


Best Hyperparameters: {'hidden_layer_sizes': (256, 64), 'learning_rate_init': 0.001}
Final Test Precision: 0.9727891156462585
Classification Report:
              precision    recall  f1-score   support

           0       0.96      0.99      0.98       166
           1       0.99      0.95      0.97       128

    accuracy                           0.97       294
   macro avg       0.98      0.97      0.97       294
weighted avg       0.97      0.97      0.97       294

--------------------------
Repetition: 2


Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.


Best Hyperparameters: {'hidden_layer_sizes': (128, 64), 'learning_rate_init': 0.001}
Final Test Precision: 0.9523809523809523
Classification Report:
              precision    recall  f1-score   support

           0       0.99      0.93      0.96       166
           1       0.91      0.98      0.95       128

    accuracy                           0.95       294
   macro avg       0.95      0.96      0.95       294
weighted avg       0.95      0.95      0.95       294

--------------------------
Repetition: 3


Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.


Best Hyperparameters: {'hidden_layer_sizes': (256, 64), 'learning_rate_init': 0.001}
Final Test Precision: 0.9591836734693877
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.93      0.96       166
           1       0.91      1.00      0.96       128

    accuracy                           0.96       294
   macro avg       0.96      0.96      0.96       294
weighted avg       0.96      0.96      0.96       294

--------------------------
Repetition: 4


Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.


Best Hyperparameters: {'hidden_layer_sizes': (256, 64), 'learning_rate_init': 0.001}
Final Test Precision: 0.9727891156462585
Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.97      0.98       166
           1       0.96      0.98      0.97       128

    accuracy                           0.97       294
   macro avg       0.97      0.97      0.97       294
weighted avg       0.97      0.97      0.97       294

--------------------------
Repetition: 5


Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.


Best Hyperparameters: {'hidden_layer_sizes': (256, 64), 'learning_rate_init': 0.001}
Final Test Precision: 0.9591836734693877
Classification Report:
              precision    recall  f1-score   support

           0       0.99      0.93      0.96       166
           1       0.92      0.99      0.95       128

    accuracy                           0.96       294
   macro avg       0.96      0.96      0.96       294
weighted avg       0.96      0.96      0.96       294

--------------------------
Repetition: 6


Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.

Best Hyperparameters: {'hidden_layer_sizes': (256, 64), 'learning_rate_init': 0.01}
Final Test Precision: 0.9387755102040817
Classification Report:
              precision    recall  f1-score   support

           0       0.90      1.00      0.95       166
           1       1.00      0.86      0.92       128

    accuracy                           0.94       294
   macro avg       0.95      0.93      0.94       294
weighted avg       0.94      0.94      0.94       294

--------------------------
Repetition: 7


Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.

Best Hyperparameters: {'hidden_layer_sizes': (256, 64), 'learning_rate_init': 0.001}
Final Test Precision: 0.9795918367346939
Classification Report:
              precision    recall  f1-score   support

           0       0.97      1.00      0.98       166
           1       1.00      0.95      0.98       128

    accuracy                           0.98       294
   macro avg       0.98      0.98      0.98       294
weighted avg       0.98      0.98      0.98       294

--------------------------
Repetition: 8


Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.

Best Hyperparameters: {'hidden_layer_sizes': (256, 64), 'learning_rate_init': 0.001}
Final Test Precision: 0.9489795918367347
Classification Report:
              precision    recall  f1-score   support

           0       0.92      1.00      0.96       166
           1       1.00      0.88      0.94       128

    accuracy                           0.95       294
   macro avg       0.96      0.94      0.95       294
weighted avg       0.95      0.95      0.95       294

--------------------------
Repetition: 9


Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.

Best Hyperparameters: {'hidden_layer_sizes': (128, 64), 'learning_rate_init': 0.01}
Final Test Precision: 0.9115646258503401
Classification Report:
              precision    recall  f1-score   support

           0       0.86      1.00      0.93       166
           1       1.00      0.80      0.89       128

    accuracy                           0.91       294
   macro avg       0.93      0.90      0.91       294
weighted avg       0.92      0.91      0.91       294

--------------------------
Repetition: 10


Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.


Best Hyperparameters: {'hidden_layer_sizes': (128, 64), 'learning_rate_init': 0.01}
Final Test Precision: 0.9523809523809523
Classification Report:
              precision    recall  f1-score   support

           0       0.92      1.00      0.96       166
           1       1.00      0.89      0.94       128

    accuracy                           0.95       294
   macro avg       0.96      0.95      0.95       294
weighted avg       0.96      0.95      0.95       294

--------------------------
Repetition: 11


Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.


Best Hyperparameters: {'hidden_layer_sizes': (128, 64), 'learning_rate_init': 0.001}
Final Test Precision: 0.9523809523809523
Classification Report:
              precision    recall  f1-score   support

           0       0.92      1.00      0.96       166
           1       1.00      0.89      0.94       128

    accuracy                           0.95       294
   macro avg       0.96      0.95      0.95       294
weighted avg       0.96      0.95      0.95       294

--------------------------
Repetition: 12


Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.

Best Hyperparameters: {'hidden_layer_sizes': (256, 64), 'learning_rate_init': 0.001}
Final Test Precision: 0.9659863945578231
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.94      0.97       166
           1       0.93      1.00      0.96       128

    accuracy                           0.97       294
   macro avg       0.96      0.97      0.97       294
weighted avg       0.97      0.97      0.97       294

--------------------------
Repetition: 13


Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.

Best Hyperparameters: {'hidden_layer_sizes': (256, 64), 'learning_rate_init': 0.01}
Final Test Precision: 0.9591836734693877
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.93      0.96       166
           1       0.91      1.00      0.96       128

    accuracy                           0.96       294
   macro avg       0.96      0.96      0.96       294
weighted avg       0.96      0.96      0.96       294

--------------------------
Repetition: 14


Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.


Best Hyperparameters: {'hidden_layer_sizes': (128, 64), 'learning_rate_init': 0.01}
Final Test Precision: 0.9897959183673469
Classification Report:
              precision    recall  f1-score   support

           0       0.99      0.99      0.99       166
           1       0.99      0.98      0.99       128

    accuracy                           0.99       294
   macro avg       0.99      0.99      0.99       294
weighted avg       0.99      0.99      0.99       294

--------------------------
Repetition: 15


Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.


Best Hyperparameters: {'hidden_layer_sizes': (256, 64), 'learning_rate_init': 0.001}
Final Test Precision: 0.9251700680272109
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.87      0.93       166
           1       0.85      1.00      0.92       128

    accuracy                           0.93       294
   macro avg       0.93      0.93      0.92       294
weighted avg       0.94      0.93      0.93       294

--------------------------
Repetition: 16


Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.

Best Hyperparameters: {'hidden_layer_sizes': (128, 64), 'learning_rate_init': 0.001}
Final Test Precision: 0.9081632653061225
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.84      0.91       166
           1       0.83      1.00      0.90       128

    accuracy                           0.91       294
   macro avg       0.91      0.92      0.91       294
weighted avg       0.92      0.91      0.91       294

--------------------------
Repetition: 17


Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.

Best Hyperparameters: {'hidden_layer_sizes': (256, 64), 'learning_rate_init': 0.01}
Final Test Precision: 0.9591836734693877
Classification Report:
              precision    recall  f1-score   support

           0       0.94      0.99      0.96       166
           1       0.98      0.92      0.95       128

    accuracy                           0.96       294
   macro avg       0.96      0.95      0.96       294
weighted avg       0.96      0.96      0.96       294

--------------------------
Repetition: 18


Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.

Best Hyperparameters: {'hidden_layer_sizes': (256, 64), 'learning_rate_init': 0.001}
Final Test Precision: 0.9387755102040817
Classification Report:
              precision    recall  f1-score   support

           0       0.90      1.00      0.95       166
           1       1.00      0.86      0.92       128

    accuracy                           0.94       294
   macro avg       0.95      0.93      0.94       294
weighted avg       0.94      0.94      0.94       294

--------------------------
Repetition: 19


Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.


Best Hyperparameters: {'hidden_layer_sizes': (128, 64), 'learning_rate_init': 0.001}
Final Test Precision: 0.9625850340136054
Classification Report:
              precision    recall  f1-score   support

           0       0.99      0.94      0.97       166
           1       0.93      0.99      0.96       128

    accuracy                           0.96       294
   macro avg       0.96      0.97      0.96       294
weighted avg       0.96      0.96      0.96       294

--------------------------
Repetition: 20


Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.

Best Hyperparameters: {'hidden_layer_sizes': (128, 64), 'learning_rate_init': 0.01}
Final Test Precision: 0.9081632653061225
Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.84      0.91       166
           1       0.83      1.00      0.90       128

    accuracy                           0.91       294
   macro avg       0.91      0.92      0.91       294
weighted avg       0.92      0.91      0.91       294

--------------------------
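
The repeated `Precision is ill-defined` warning in the logs above is raised when a model predicts no positive samples, so precision would divide by zero. In scikit-learn, the `zero_division` argument named in the warning chooses the value reported in that case. The pure-Python sketch below mirrors that behaviour for a single class; the function name `binary_precision` is hypothetical and is not part of the notebook.

```python
def binary_precision(y_true, y_pred, positive=1, zero_division=0.0):
    """Precision for one class, returning `zero_division` when nothing is predicted positive."""
    true_positives = sum(
        1 for t, p in zip(y_true, y_pred) if p == positive and t == positive
    )
    predicted_positives = sum(1 for p in y_pred if p == positive)
    if predicted_positives == 0:
        # No positive predictions: precision is undefined, so fall back explicitly
        return zero_division
    return true_positives / predicted_positives


# A fold where the model predicts only the negative class
print(binary_precision([1, 0, 1], [0, 0, 0]))
# A fold with two positive predictions, one of them correct
print(binary_precision([1, 0, 1], [1, 1, 0]))
```

Passing an explicit fallback value makes the choice visible in the code instead of relying on a warning in the logs.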


In [None]:
# Show the test results
print(f'Mean test accuracy over all repetitions: {np.mean(mlp_test_accuracies)}')
print(f'Mean test precision over all repetitions: {np.mean(mlp_test_precision_scores)}')
print(f'Mean test recall over all repetitions: {np.mean(mlp_test_recall_scores)}')
print(f'Mean test f1 score over all repetitions: {np.mean(mlp_test_f1_scores)}')
print(f'Mean test ROC AUC over all repetitions: {np.mean(mlp_test_roc_scores)}')

Mean test accuracy over all repetitions: 0.9508503401360544
Mean test precision over all repetitions: 0.9557359687060023
Mean test recall over all repetitions: 0.9508503401360544
Mean test f1 score over all repetitions: 0.9506932619958809
Mean test ROC AUC over all repetitions: 0.9503506212349396
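
Means alone hide how much the scores vary between repetitions. The following is a minimal sketch that reports the spread alongside the mean, using three of the `Final Test Precision` values printed in the logs above as sample input. The helper name `summarise` is an assumption, not something defined in the notebook.

```python
from statistics import mean, stdev


def summarise(scores):
    """Return the mean, sample standard deviation, minimum and maximum of a list of scores."""
    return {
        'mean': mean(scores),
        'std': stdev(scores),
        'min': min(scores),
        'max': max(scores),
    }


# Three per-repetition precision values copied from the logs above
sample_precision = [0.9795918367346939, 0.9489795918367347, 0.9115646258503401]
summary = summarise(sample_precision)
print(f"precision: {summary['mean']:.4f} +/- {summary['std']:.4f} "
      f"(min {summary['min']:.4f}, max {summary['max']:.4f})")
```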


In [None]:
# Show the train results
print(f'Mean train accuracy over all repetitions: {np.mean(mlp_train_accuracies)}')
print(f'Mean train precision over all repetitions: {np.mean(mlp_train_precision_scores)}')
print(f'Mean train recall over all repetitions: {np.mean(mlp_train_recall_scores)}')
print(f'Mean train f1 score over all repetitions: {np.mean(mlp_train_f1_scores)}')
print(f'Mean train ROC AUC over all repetitions: {np.mean(mlp_train_roc_scores)}')

Mean train accuracy over all repetitions: 0.9700465460700307
Mean train precision over all repetitions: 0.9718525961614771
Mean train recall over all repetitions: 0.9816158192090395
Mean train f1 score over all repetitions: 0.9660273594973893
Mean train ROC AUC over all repetitions: 0.9980612951079054


In [None]:
# # Store train scores in a dictionary
# mlp_train_scores = {
#     'accuracy': mlp_train_accuracies,
#     'precision': mlp_train_precision_scores,
#     'recall': mlp_train_recall_scores,
#     'f1': mlp_train_f1_scores,
#     'roc_auc': mlp_train_roc_scores
# }

# # Store test scores in a dictionary
# mlp_test_scores = {
#     'accuracy': mlp_test_accuracies,
#     'precision': mlp_test_precision_scores,
#     'recall': mlp_test_recall_scores,
#     'f1': mlp_test_f1_scores,
#     'roc_auc': mlp_test_roc_scores
# }

# # Save models to Drive
# file = open(f'{filepath}repeat_mlp_classifiers', 'wb')
# pickle.dump(repeat_mlp_classifiers, file)

# # Save predictions to Drive
# file = open(f'{filepath}repeat_mlp_predictions', 'wb')
# pickle.dump(repeat_mlp_predictions, file)

# # Save train scores to Drive
# file = open(f'{filepath}mlp_train_scores', 'wb')
# pickle.dump(mlp_train_scores, file)

# # Save test scores to Drive
# file = open(f'{filepath}mlp_test_scores', 'wb')
# pickle.dump(mlp_test_scores, file)

In [None]:
# Load the Multi-layer Perceptron train scores from Drive
with open(f'{filepath}mlp_train_scores', 'rb') as f:
  mlp_train_scores = pickle.load(f)

In [None]:
# Load the Multi-layer Perceptron test scores from Drive
with open(f'{filepath}mlp_test_scores', 'rb') as f:
  mlp_test_scores = pickle.load(f)

In [None]:
# Load the Multi-layer Perceptron models from Drive
with open(f'{filepath}repeat_mlp_classifiers', 'rb') as f:
  repeat_mlp_classifiers = pickle.load(f)

In [None]:
# Load the Multi-layer Perceptron predictions from Drive
with open(f'{filepath}repeat_mlp_predictions', 'rb') as f:
  repeat_mlp_predictions = pickle.load(f)

In [None]:
# Data for boxplots, read from the score dictionary loaded from Drive above
data = [
    mlp_test_scores['accuracy'],
    mlp_test_scores['precision'],
    mlp_test_scores['roc_auc'],
    mlp_test_scores['recall'],
    mlp_test_scores['f1']
]

# Labels for boxplots
labels = ['Test Accuracy', 'Precision', 'ROC-AUC', 'Recall', 'F1 Score']

# Boxplot
plt.figure(figsize=(10, 6))
plt.boxplot(data, labels=labels)
plt.xlabel('Metrics')
plt.ylabel('Score')
plt.title('Multi-layer Perceptron Model Evaluation')
plt.show()

NameError: ignored

In [None]:
# Combine the testing precision scores for both models
all_test_precision_scores = [rf_test_precision_scores, mlp_test_precision_scores]

# Create a boxplot to visualize the testing precision of each model
plt.figure(figsize=(8, 6))
plt.boxplot(all_test_precision_scores, labels=['Random Forest', 'MLP'])
plt.xlabel('Model')
plt.ylabel('Testing Precision')
plt.title('Testing Precision Scores for Random Forest and MLP')
plt.show()

NameError: ignored

## Model Explanation

We now gather the base set of counterfactuals for each model using existing methods. The parameters below are set once so that they stay consistent across the different methods. `immutable_features` lists the features that should not change.

In [None]:
# gender and mr are immutable
immutable_features = ['gender', 'mr']
immutable_feature_idxs = [3,4]
idxs_to_vary = [1,2,5,6]
features_to_vary = ['weight', 'units', 'duration', 'meal', 'peak']

# Feature names
feature_names = df.columns[:-1]
feature_idxs = np.arange(len(feature_names))

# Get the categorical features from the data frame
categorical_feature_idxs = [3,5]
categorical_features = ['gender', 'meal']

# Continuous features
continuous_feature_idxs = [0,1,2,4,6]
continuous_features = ['weight', 'units', 'duration', 'mr', 'peak']

# Target variable
class_name = "limit"
desired_idx = 1 # 1 indicates that the user likely is under the limit
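
Hand-written index lists can drift out of sync with the name lists they describe. As a sketch, the indices can be derived from the names instead. The column order used below is an assumption inferred from the index lists in the cell above; in the notebook it would come from `df.columns[:-1]`. The helper `indices_of` is hypothetical.

```python
# Assumed column order, inferred from the index lists above
feature_names = ['weight', 'units', 'duration', 'gender', 'mr', 'meal', 'peak']


def indices_of(names, all_names):
    """Map feature names to column positions; list.index raises ValueError on unknown names."""
    return [all_names.index(name) for name in names]


immutable_features = ['gender', 'mr']
categorical_features = ['gender', 'meal']

immutable_feature_idxs = indices_of(immutable_features, feature_names)
categorical_feature_idxs = indices_of(categorical_features, feature_names)

# Everything that is not immutable may be varied
features_to_vary = [name for name in feature_names if name not in immutable_features]
idxs_to_vary = indices_of(features_to_vary, feature_names)

print(immutable_feature_idxs, categorical_feature_idxs, idxs_to_vary)
```

Derived this way, `idxs_to_vary` always agrees with `features_to_vary`, including the index of `weight`.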

We produce counterfactuals that explain how to move from an undesirable outcome to a desirable one. Therefore, we prepare a dataset of desirable and undesirable query cases.

In [None]:
# Initialise the lists to hold the positive and negative instances for each repeat
repeat_mlp_positive_instances, repeat_mlp_negative_instances = [], []

# Initialise the lists to hold the positive and negative predictions for each repeat
repeat_mlp_positive_predictions, repeat_mlp_negative_predictions = [], []

# Get the positive and negative instances and predictions for each repetition
for repetition in range(len(repeats)):

  # Initialise the indices that will be used to filter the dataset
  mlp_positive_idxs, mlp_negative_idxs = [], []

  # Find the positive and negative cases
  for i, instance in enumerate(repeat_test_data[repetition].iterrows()):

    # Filter out misclassified samples from the Multi-layer Perceptron data
    if(repeat_mlp_predictions[repetition][i] == repeat_y_train[repetition].to_numpy()[i]):
      if(repeat_mlp_predictions[repetition][i] == desired_idx):
        mlp_positive_idxs.append(i)
      else:
        mlp_negative_idxs.append(i)

  # Filter out the positive instances, keep the negative
  mlp_negative_instances = repeat_test_data[repetition].iloc[mlp_negative_idxs]

  # Filter out the negative instances, keep the positive
  mlp_positive_instances = repeat_test_data[repetition].iloc[mlp_positive_idxs]

  # Since all negative predictions should be 0, we can use np.zeros()
  mlp_negative_predictions = np.zeros(len(mlp_negative_idxs))

  # Since all desired predictions should be 1, we can use np.ones()
  # rf_positive_predictions = np.ones(len(rf_positive_idxs))
  mlp_positive_predictions = np.ones(len(mlp_positive_idxs))

  # Store the Multi-layer Perceptron positive and negative instances and predictions
  repeat_mlp_positive_instances.append(mlp_positive_instances)
  repeat_mlp_negative_instances.append(mlp_negative_instances)
  repeat_mlp_positive_predictions.append(mlp_positive_predictions)
  repeat_mlp_negative_predictions.append(mlp_negative_predictions)

  # Print the filtered set sizes
  # print(f'Number of Random Forest Positive Cases: {len(rf_positive_instances)}, Negative Cases: {len(rf_negative_instances)}')
  print(f'Number of Multi-layer Perceptron Positive Cases: {len(mlp_positive_instances)}, Negative Cases: {len(mlp_negative_instances)}\n')

Number of Multi-layer Perceptron Positive Cases: 56, Negative Cases: 107

Number of Multi-layer Perceptron Positive Cases: 55, Negative Cases: 92

Number of Multi-layer Perceptron Positive Cases: 53, Negative Cases: 86

Number of Multi-layer Perceptron Positive Cases: 57, Negative Cases: 89

Number of Multi-layer Perceptron Positive Cases: 59, Negative Cases: 90

Number of Multi-layer Perceptron Positive Cases: 51, Negative Cases: 94

Number of Multi-layer Perceptron Positive Cases: 53, Negative Cases: 102

Number of Multi-layer Perceptron Positive Cases: 44, Negative Cases: 99

Number of Multi-layer Perceptron Positive Cases: 43, Negative Cases: 104

Number of Multi-layer Perceptron Positive Cases: 40, Negative Cases: 96

Number of Multi-layer Perceptron Positive Cases: 49, Negative Cases: 108

Number of Multi-layer Perceptron Positive Cases: 59, Negative Cases: 83

Number of Multi-layer Perceptron Positive Cases: 63, Negative Cases: 88

Number of Multi-layer Perceptron Positive Cases
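
The per-instance loop above can also be expressed with boolean masks. The following is a minimal sketch on toy arrays; the names `y_true` and `y_pred` and their values are illustrative and are not notebook data.

```python
import numpy as np

desired_idx = 1  # class treated as the desirable outcome

# Toy labels and predictions for six test instances
y_true = np.array([1, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 0, 0, 1, 1])

# Keep only correctly classified instances, then split by predicted class
correct = y_pred == y_true
positive_idxs = np.flatnonzero(correct & (y_pred == desired_idx))
negative_idxs = np.flatnonzero(correct & (y_pred != desired_idx))

print(positive_idxs.tolist(), negative_idxs.tolist())
```

The resulting index arrays can be passed to `DataFrame.iloc` in the same way as the lists built by the loop.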

In [None]:
def get_bounds(data):
  # Collect the [min, max] range of every column in a 2D array
  feature_ranges = []
  for i in range(len(data[0])):

    lower_bound = min(data[:, i])
    upper_bound = max(data[:, i])

    feature_ranges.append([lower_bound, upper_bound])

  return feature_ranges
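
For larger arrays, the same per-column ranges can be computed without a Python loop. This is a sketch of an equivalent vectorised form, assuming the input is a numeric 2D array; `get_bounds_vectorised` is a hypothetical name used only here.

```python
import numpy as np


def get_bounds_vectorised(data):
    """Return [[min, max], ...] with one pair per column of a 2D array."""
    data = np.asarray(data)
    lower = data.min(axis=0)  # column-wise minima
    upper = data.max(axis=0)  # column-wise maxima
    return np.column_stack([lower, upper]).tolist()


toy = np.array([[70.0, 2.0], [85.0, 6.0], [60.0, 4.0]])
print(get_bounds_vectorised(toy))
```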

In [None]:
from scipy.spatial.distance import euclidean
import heapq
import random

def nearest_unlike_neighbors(data, labels, data_instance, target_class):
        unlike_neighbors = []
        distances = []

        for i, instance in enumerate(data):
            if target_class == labels[i]:
                distance = euclidean(data_instance, instance)
                distances.append((distance, i))  # Store both distance and index

        # Use heapq to efficiently find the n smallest distances and their corresponding indices
        smallest_distances = heapq.nsmallest(1, distances)

        # Get the actual instances for the smallest distances
        for distance, index in smallest_distances:
            neighbor = data[index]
            unlike_neighbors.append(neighbor)

        return random.choice(unlike_neighbors)
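
To make the behaviour concrete, here is a self-contained sketch of the same nearest-unlike-neighbour idea on toy data. It uses `math.dist` from the standard library rather than SciPy, and the function name `nearest_with_label` is hypothetical.

```python
import math


def nearest_with_label(data, labels, query, target_class):
    """Return the instance labelled `target_class` that is closest to `query` (Euclidean)."""
    candidates = [row for row, label in zip(data, labels) if label == target_class]
    if not candidates:
        raise ValueError(f'no instance has label {target_class!r}')
    return min(candidates, key=lambda row: math.dist(query, row))


data = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [2.0, 2.0]]
labels = [0, 1, 1, 0]

# The query sits near the class-0 points; look for the closest class-1 instance
print(nearest_with_label(data, labels, [0.5, 0.0], target_class=1))
```

Raising an error when no candidate exists makes an empty target class easier to diagnose than a failure inside `random.choice`.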

### DiCE

Three separate configurations of DiCE are used in our experiments: Random Search, Genetic Algorithm and KD Tree. Although their fitness functions are similar, their optimisation strategies differ significantly. As such, we believe that the counterfactuals produced by the different configurations will not significantly bias the results of aggregation.

In [None]:
# DiCE outputs counterfactual as JSON string, so need to convert it back to an object
def extract_dice_counterfactual(dice_explanation):
  # JSON-formatted string
  json_string = dice_explanation.to_json()

  # Convert the JSON string to a Python dictionary
  data_dict = json.loads(json_string)

  return data_dict['cfs_list'][0][0]
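
The indexing `['cfs_list'][0][0]` selects the first counterfactual for the first query instance. The sketch below applies the same lookup to a hand-built JSON string. The payload is illustrative only: it is not real DiCE output, which contains additional keys.

```python
import json

# Hand-built stand-in: one query instance with two counterfactuals,
# each row holding the feature values followed by the predicted class
json_string = json.dumps({
    'cfs_list': [
        [
            [70.0, 2.0, 3.0, 1],
            [70.0, 1.0, 4.0, 1],
        ]
    ]
})

data_dict = json.loads(json_string)
first_counterfactual = data_dict['cfs_list'][0][0]

# Drop the final entry (the class) to keep only the feature values
print(first_counterfactual[:-1])
```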

In [None]:
# Generate a DiCE counterfactual for each instance in the filtered set
def get_dice_explanations(dice_data, dice_model, dice_test_data, method,  dice_feature_ranges = {}, verbose=False):

  # DiCE explanation instance
  dice_instance = dice_ml.Dice(dice_data, dice_model, method=method)

  # Initialise the DiCE counterfactuals list
  dice_counterfactuals = []

  # Get counterfactual for every test instance
  for i, _ in enumerate(dice_test_data.iterrows()):

    try:
      # Get the explanation
      dice_explanation = dice_instance.generate_counterfactuals(dice_test_data[i:i+1], total_CFs=1, desired_class='opposite', features_to_vary=features_to_vary, permitted_range=dice_feature_ranges)

      # Extract the counterfactual from the DiCE instance
      dice_counterfactual = extract_dice_counterfactual(dice_explanation)

      # Remove the last column as it's the class
      dice_counterfactuals.append(dice_counterfactual[:-1])

    except Exception:
      # For debugging, print if no counterfactual was found
      if(verbose):
        print(f"No counterfactual found for instance: {i}")

      # Append empty array as placeholder to filter out later
      dice_counterfactuals.append([np.zeros(len(dice_test_data.to_numpy()[0][:-1]))])

  return dice_counterfactuals
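
Failed searches are stored as all-zero placeholders so that list positions still line up with the query instances. The following is a hedged sketch of how such placeholders might be filtered out later. `is_placeholder` and `drop_placeholders` are hypothetical helpers, and the sketch assumes that all values are numeric and that a genuine counterfactual is never entirely zero.

```python
def is_placeholder(counterfactual):
    """True when every value is zero; accepts a flat row or a row wrapped in one extra list."""
    values = []
    for item in counterfactual:
        # Unwrap one level of nesting, as produced by wrapping an array in a list
        values.extend(item if hasattr(item, '__iter__') else [item])
    return all(float(v) == 0.0 for v in values)


def drop_placeholders(queries, counterfactuals):
    """Keep only the (query, counterfactual) pairs where a counterfactual was found."""
    return [(q, cf) for q, cf in zip(queries, counterfactuals) if not is_placeholder(cf)]


queries = ['a', 'b', 'c']
counterfactuals = [[70.0, 1.0, 3.0], [[0.0, 0.0, 0.0]], [65.0, 2.0, 0.0]]
print(drop_placeholders(queries, counterfactuals))
```

Filtering queries and counterfactuals together keeps each remaining counterfactual paired with the instance it explains.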

In [None]:
# Initialise the lists for storing the DiCE counterfactuals of each repetition
# repeat_dice_rf_counterfactuals = []
repeat_dice_mlp_counterfactuals = []

# Get the DiCE counterfactuals for each repetition
for repetition in range(len(repeats)):

  # Log repetition
  print(f'Repetition {repetition}')

  # Reset seed to ensure reproducibility
  np.random.seed(seed)
  random.seed(seed)
  tf.random.set_seed(seed)

  bounds = get_bounds(repeat_train_data[repetition].to_numpy())
  dice_feature_ranges = {}

  for i, col in enumerate(feature_names):
    if i not in immutable_feature_idxs and i not in categorical_feature_idxs:
      dice_feature_ranges[col] = bounds[i]

  # Prepare data for DiCE
  dice_train_data = pd.DataFrame(data=repeat_train_data[repetition], columns=df.columns)
  # dice_rf_test_data = pd.DataFrame(data=repeat_rf_negative_instances[repetition], columns=features_to_vary)
  dice_mlp_test_data = pd.DataFrame(data=repeat_mlp_negative_instances[repetition], columns=feature_names)

  # Initialise DiCE data
  dice_data_instance = dice_ml.Data(
      dataframe=dice_train_data,
      continuous_features=continuous_features,
      outcome_name=class_name
      )

  # Initialise DiCE models
  # dice_rf_model = dice_ml.Model(model=repeat_rf_classifiers[repetition], backend='sklearn')
  dice_mlp_model = dice_ml.Model(model=repeat_mlp_classifiers[repetition], backend='sklearn')

  # Get counterfactuals for every instance in the test set for both models
  # dice_rf_counterfactuals = get_dice_explanations(dice_data_instance, dice_rf_model, dice_rf_test_data, "random")
  dice_mlp_counterfactuals = get_dice_explanations(dice_data_instance, dice_mlp_model, dice_mlp_test_data, "random", dice_feature_ranges)

  # Store the counterfactuals for this repetition
  # repeat_dice_rf_counterfactuals.append(dice_rf_counterfactuals)
  repeat_dice_mlp_counterfactuals.append(dice_mlp_counterfactuals)

  # Separate the logs for each repetition
  print('--------------------------')

Repetition 0



--------------------------
Repetition 1


100%|██████████| 1/1 [00:00<00:00,  6.43it/s]
100%|██████████| 1/1 [00:00<00:00,  6.61it/s]
100%|██████████| 1/1 [00:00<00:00,  7.84it/s]
100%|██████████| 1/1 [00:00<00:00,  7.85it/s]
100%|██████████| 1/1 [00:00<00:00,  7.89it/s]
100%|██████████| 1/1 [00:00<00:00,  7.40it/s]
100%|██████████| 1/1 [00:00<00:00,  8.17it/s]
100%|██████████| 1/1 [00:00<00:00,  8.48it/s]
100%|██████████| 1/1 [00:00<00:00,  8.56it/s]
100%|██████████| 1/1 [00:00<00:00,  8.24it/s]
100%|██████████| 1/1 [00:00<00:00,  8.40it/s]
100%|██████████| 1/1 [00:00<00:00,  8.83it/s]
100%|██████████| 1/1 [00:00<00:00,  8.53it/s]
100%|██████████| 1/1 [00:00<00:00,  8.36it/s]
100%|██████████| 1/1 [00:00<00:00,  8.36it/s]
100%|██████████| 1/1 [00:00<00:00,  8.67it/s]
100%|██████████| 1/1 [00:00<00:00,  8.51it/s]
100%|██████████| 1/1 [00:00<00:00,  6.14it/s]
100%|██████████| 1/1 [00:00<00:00,  8.51it/s]
100%|██████████| 1/1 [00:00<00:00,  7.89it/s]
100%|██████████| 1/1 [00:00<00:00,  8.31it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 2


100%|██████████| 1/1 [00:00<00:00,  8.55it/s]
100%|██████████| 1/1 [00:00<00:00,  8.23it/s]
100%|██████████| 1/1 [00:00<00:00,  7.92it/s]
100%|██████████| 1/1 [00:00<00:00,  7.19it/s]
100%|██████████| 1/1 [00:00<00:00,  8.05it/s]
100%|██████████| 1/1 [00:00<00:00,  7.90it/s]
100%|██████████| 1/1 [00:00<00:00,  7.87it/s]
100%|██████████| 1/1 [00:00<00:00,  7.30it/s]
100%|██████████| 1/1 [00:00<00:00,  7.88it/s]
100%|██████████| 1/1 [00:00<00:00,  8.07it/s]
100%|██████████| 1/1 [00:00<00:00,  8.40it/s]
100%|██████████| 1/1 [00:00<00:00,  8.18it/s]
100%|██████████| 1/1 [00:00<00:00,  8.25it/s]
100%|██████████| 1/1 [00:00<00:00,  8.15it/s]
100%|██████████| 1/1 [00:00<00:00,  7.41it/s]
100%|██████████| 1/1 [00:00<00:00,  7.85it/s]
100%|██████████| 1/1 [00:00<00:00,  8.20it/s]
100%|██████████| 1/1 [00:00<00:00,  8.36it/s]
100%|██████████| 1/1 [00:00<00:00,  8.65it/s]
100%|██████████| 1/1 [00:00<00:00,  8.28it/s]
100%|██████████| 1/1 [00:00<00:00,  8.17it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 3


100%|██████████| 1/1 [00:00<00:00,  7.21it/s]
100%|██████████| 1/1 [00:00<00:00,  4.96it/s]
100%|██████████| 1/1 [00:00<00:00,  7.57it/s]
100%|██████████| 1/1 [00:00<00:00,  6.48it/s]
100%|██████████| 1/1 [00:00<00:00,  5.79it/s]
100%|██████████| 1/1 [00:00<00:00,  8.56it/s]
100%|██████████| 1/1 [00:00<00:00,  8.37it/s]
100%|██████████| 1/1 [00:00<00:00,  8.45it/s]
100%|██████████| 1/1 [00:00<00:00,  8.42it/s]
100%|██████████| 1/1 [00:00<00:00,  8.54it/s]
100%|██████████| 1/1 [00:00<00:00,  7.76it/s]
100%|██████████| 1/1 [00:00<00:00,  8.14it/s]
100%|██████████| 1/1 [00:00<00:00,  8.18it/s]
100%|██████████| 1/1 [00:00<00:00,  8.35it/s]
100%|██████████| 1/1 [00:00<00:00,  7.70it/s]
100%|██████████| 1/1 [00:00<00:00,  8.36it/s]
100%|██████████| 1/1 [00:00<00:00,  8.69it/s]
100%|██████████| 1/1 [00:00<00:00,  8.71it/s]
100%|██████████| 1/1 [00:00<00:00,  7.85it/s]
100%|██████████| 1/1 [00:00<00:00,  8.28it/s]
100%|██████████| 1/1 [00:00<00:00,  8.52it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 4


100%|██████████| 1/1 [00:00<00:00,  5.31it/s]
100%|██████████| 1/1 [00:00<00:00,  4.55it/s]
100%|██████████| 1/1 [00:00<00:00,  6.26it/s]
100%|██████████| 1/1 [00:00<00:00,  5.58it/s]
100%|██████████| 1/1 [00:00<00:00,  7.73it/s]
100%|██████████| 1/1 [00:00<00:00,  7.22it/s]
100%|██████████| 1/1 [00:00<00:00,  6.31it/s]
100%|██████████| 1/1 [00:00<00:00,  3.96it/s]
100%|██████████| 1/1 [00:00<00:00,  2.76it/s]
100%|██████████| 1/1 [00:00<00:00,  3.26it/s]
100%|██████████| 1/1 [00:00<00:00,  5.10it/s]
100%|██████████| 1/1 [00:00<00:00,  6.61it/s]
100%|██████████| 1/1 [00:00<00:00,  5.13it/s]
100%|██████████| 1/1 [00:00<00:00,  2.28it/s]
100%|██████████| 1/1 [00:00<00:00,  3.56it/s]
100%|██████████| 1/1 [00:00<00:00,  7.59it/s]
100%|██████████| 1/1 [00:00<00:00,  6.20it/s]
100%|██████████| 1/1 [00:00<00:00,  4.13it/s]
100%|██████████| 1/1 [00:00<00:00,  2.51it/s]
100%|██████████| 1/1 [00:00<00:00,  2.92it/s]
100%|██████████| 1/1 [00:00<00:00,  7.08it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 5


100%|██████████| 1/1 [00:00<00:00,  7.13it/s]
100%|██████████| 1/1 [00:00<00:00,  4.25it/s]
100%|██████████| 1/1 [00:00<00:00,  4.60it/s]
100%|██████████| 1/1 [00:00<00:00,  4.34it/s]
100%|██████████| 1/1 [00:00<00:00,  5.30it/s]
100%|██████████| 1/1 [00:00<00:00,  4.75it/s]
100%|██████████| 1/1 [00:00<00:00,  6.46it/s]
100%|██████████| 1/1 [00:00<00:00,  5.64it/s]
100%|██████████| 1/1 [00:00<00:00,  6.38it/s]
100%|██████████| 1/1 [00:00<00:00,  7.08it/s]
100%|██████████| 1/1 [00:00<00:00,  5.25it/s]
100%|██████████| 1/1 [00:00<00:00,  6.79it/s]
100%|██████████| 1/1 [00:00<00:00,  7.42it/s]
100%|██████████| 1/1 [00:00<00:00,  8.37it/s]
100%|██████████| 1/1 [00:00<00:00,  4.60it/s]
100%|██████████| 1/1 [00:00<00:00,  4.32it/s]
100%|██████████| 1/1 [00:00<00:00,  3.97it/s]
100%|██████████| 1/1 [00:00<00:00,  6.30it/s]
100%|██████████| 1/1 [00:00<00:00,  6.07it/s]
100%|██████████| 1/1 [00:00<00:00,  4.22it/s]
100%|██████████| 1/1 [00:00<00:00,  6.34it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 6


100%|██████████| 1/1 [00:00<00:00,  8.25it/s]
100%|██████████| 1/1 [00:00<00:00,  7.94it/s]
100%|██████████| 1/1 [00:00<00:00,  8.09it/s]
100%|██████████| 1/1 [00:00<00:00,  7.86it/s]
100%|██████████| 1/1 [00:00<00:00,  6.33it/s]
100%|██████████| 1/1 [00:00<00:00,  6.61it/s]
100%|██████████| 1/1 [00:00<00:00,  5.79it/s]
100%|██████████| 1/1 [00:00<00:00,  5.59it/s]
100%|██████████| 1/1 [00:00<00:00,  4.99it/s]
100%|██████████| 1/1 [00:00<00:00,  4.32it/s]
100%|██████████| 1/1 [00:00<00:00,  4.31it/s]
100%|██████████| 1/1 [00:00<00:00,  4.37it/s]
100%|██████████| 1/1 [00:00<00:00,  7.05it/s]
100%|██████████| 1/1 [00:00<00:00,  4.17it/s]
100%|██████████| 1/1 [00:00<00:00,  6.75it/s]
100%|██████████| 1/1 [00:00<00:00,  7.27it/s]
100%|██████████| 1/1 [00:00<00:00,  4.44it/s]
100%|██████████| 1/1 [00:00<00:00,  5.53it/s]
100%|██████████| 1/1 [00:00<00:00,  3.82it/s]
100%|██████████| 1/1 [00:00<00:00,  5.01it/s]
100%|██████████| 1/1 [00:00<00:00,  5.63it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 7


100%|██████████| 1/1 [00:00<00:00,  8.30it/s]
100%|██████████| 1/1 [00:00<00:00,  8.25it/s]
100%|██████████| 1/1 [00:00<00:00,  5.71it/s]
100%|██████████| 1/1 [00:00<00:00,  6.87it/s]
100%|██████████| 1/1 [00:00<00:00,  6.91it/s]
100%|██████████| 1/1 [00:00<00:00,  4.57it/s]
100%|██████████| 1/1 [00:00<00:00,  4.32it/s]
100%|██████████| 1/1 [00:00<00:00,  4.94it/s]
100%|██████████| 1/1 [00:00<00:00,  2.91it/s]
100%|██████████| 1/1 [00:00<00:00,  8.02it/s]
100%|██████████| 1/1 [00:00<00:00,  4.25it/s]
100%|██████████| 1/1 [00:00<00:00,  4.28it/s]
100%|██████████| 1/1 [00:00<00:00,  5.59it/s]
100%|██████████| 1/1 [00:00<00:00,  6.02it/s]
100%|██████████| 1/1 [00:00<00:00,  7.83it/s]
100%|██████████| 1/1 [00:00<00:00,  5.40it/s]
100%|██████████| 1/1 [00:00<00:00,  5.37it/s]
100%|██████████| 1/1 [00:00<00:00,  4.04it/s]
100%|██████████| 1/1 [00:00<00:00,  4.62it/s]
100%|██████████| 1/1 [00:00<00:00,  3.97it/s]
100%|██████████| 1/1 [00:00<00:00,  4.84it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 8


100%|██████████| 1/1 [00:00<00:00,  7.86it/s]
100%|██████████| 1/1 [00:00<00:00,  7.67it/s]
100%|██████████| 1/1 [00:00<00:00,  6.80it/s]
100%|██████████| 1/1 [00:00<00:00,  7.88it/s]
100%|██████████| 1/1 [00:00<00:00,  8.20it/s]
100%|██████████| 1/1 [00:00<00:00,  7.73it/s]
100%|██████████| 1/1 [00:00<00:00,  5.94it/s]
100%|██████████| 1/1 [00:00<00:00,  6.57it/s]
100%|██████████| 1/1 [00:00<00:00,  4.81it/s]
100%|██████████| 1/1 [00:00<00:00,  6.58it/s]
100%|██████████| 1/1 [00:00<00:00,  4.26it/s]
100%|██████████| 1/1 [00:00<00:00,  5.14it/s]
100%|██████████| 1/1 [00:00<00:00,  4.76it/s]
100%|██████████| 1/1 [00:00<00:00,  5.27it/s]
100%|██████████| 1/1 [00:00<00:00,  5.23it/s]
100%|██████████| 1/1 [00:00<00:00,  6.11it/s]
100%|██████████| 1/1 [00:00<00:00,  7.87it/s]
100%|██████████| 1/1 [00:00<00:00,  4.21it/s]
100%|██████████| 1/1 [00:00<00:00,  6.88it/s]
100%|██████████| 1/1 [00:00<00:00,  6.51it/s]
100%|██████████| 1/1 [00:00<00:00,  8.09it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 9


100%|██████████| 1/1 [00:00<00:00,  7.74it/s]
100%|██████████| 1/1 [00:00<00:00,  8.32it/s]
100%|██████████| 1/1 [00:00<00:00,  8.49it/s]
100%|██████████| 1/1 [00:00<00:00,  8.48it/s]
100%|██████████| 1/1 [00:00<00:00,  8.53it/s]
100%|██████████| 1/1 [00:00<00:00,  8.48it/s]
100%|██████████| 1/1 [00:00<00:00,  8.28it/s]
100%|██████████| 1/1 [00:00<00:00,  8.34it/s]
100%|██████████| 1/1 [00:00<00:00,  7.34it/s]
100%|██████████| 1/1 [00:00<00:00,  8.24it/s]
100%|██████████| 1/1 [00:00<00:00,  6.58it/s]
100%|██████████| 1/1 [00:00<00:00,  7.73it/s]
100%|██████████| 1/1 [00:00<00:00,  6.80it/s]
100%|██████████| 1/1 [00:00<00:00,  4.60it/s]
100%|██████████| 1/1 [00:00<00:00,  7.00it/s]
100%|██████████| 1/1 [00:00<00:00,  7.19it/s]
100%|██████████| 1/1 [00:00<00:00,  7.37it/s]
100%|██████████| 1/1 [00:00<00:00,  7.16it/s]
100%|██████████| 1/1 [00:00<00:00,  7.08it/s]
100%|██████████| 1/1 [00:00<00:00,  6.86it/s]
100%|██████████| 1/1 [00:00<00:00,  6.05it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 10


100%|██████████| 1/1 [00:00<00:00,  7.65it/s]
100%|██████████| 1/1 [00:00<00:00,  7.70it/s]
100%|██████████| 1/1 [00:00<00:00,  7.98it/s]
100%|██████████| 1/1 [00:00<00:00,  8.04it/s]
100%|██████████| 1/1 [00:00<00:00,  8.19it/s]
100%|██████████| 1/1 [00:00<00:00,  5.90it/s]
100%|██████████| 1/1 [00:00<00:00,  8.05it/s]
100%|██████████| 1/1 [00:00<00:00,  8.49it/s]
100%|██████████| 1/1 [00:00<00:00,  7.76it/s]
100%|██████████| 1/1 [00:00<00:00,  7.55it/s]
100%|██████████| 1/1 [00:00<00:00,  6.86it/s]
100%|██████████| 1/1 [00:00<00:00,  7.37it/s]
100%|██████████| 1/1 [00:00<00:00,  7.80it/s]
100%|██████████| 1/1 [00:00<00:00,  6.86it/s]
100%|██████████| 1/1 [00:00<00:00,  7.93it/s]
100%|██████████| 1/1 [00:00<00:00,  8.22it/s]
100%|██████████| 1/1 [00:00<00:00,  7.95it/s]
100%|██████████| 1/1 [00:00<00:00,  7.87it/s]
100%|██████████| 1/1 [00:00<00:00,  7.79it/s]
100%|██████████| 1/1 [00:00<00:00,  8.26it/s]
100%|██████████| 1/1 [00:00<00:00,  8.09it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 11


100%|██████████| 1/1 [00:00<00:00,  8.12it/s]
100%|██████████| 1/1 [00:00<00:00,  7.90it/s]
100%|██████████| 1/1 [00:00<00:00,  7.80it/s]
100%|██████████| 1/1 [00:00<00:00,  8.33it/s]
100%|██████████| 1/1 [00:00<00:00,  7.22it/s]
100%|██████████| 1/1 [00:00<00:00,  8.06it/s]
100%|██████████| 1/1 [00:00<00:00,  8.16it/s]
100%|██████████| 1/1 [00:00<00:00,  8.19it/s]
100%|██████████| 1/1 [00:00<00:00,  8.13it/s]
100%|██████████| 1/1 [00:00<00:00,  8.02it/s]
100%|██████████| 1/1 [00:00<00:00,  7.84it/s]
100%|██████████| 1/1 [00:00<00:00,  7.78it/s]
100%|██████████| 1/1 [00:00<00:00,  7.36it/s]
100%|██████████| 1/1 [00:00<00:00,  5.81it/s]
100%|██████████| 1/1 [00:00<00:00,  8.03it/s]
100%|██████████| 1/1 [00:00<00:00,  7.85it/s]
100%|██████████| 1/1 [00:00<00:00,  7.90it/s]
100%|██████████| 1/1 [00:00<00:00,  7.96it/s]
100%|██████████| 1/1 [00:00<00:00,  7.69it/s]
100%|██████████| 1/1 [00:00<00:00,  7.28it/s]
100%|██████████| 1/1 [00:00<00:00,  7.54it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 12


100%|██████████| 1/1 [00:00<00:00,  8.25it/s]
100%|██████████| 1/1 [00:00<00:00,  6.65it/s]
100%|██████████| 1/1 [00:00<00:00,  7.91it/s]
100%|██████████| 1/1 [00:00<00:00,  8.03it/s]
100%|██████████| 1/1 [00:00<00:00,  6.80it/s]
100%|██████████| 1/1 [00:00<00:00,  7.41it/s]
100%|██████████| 1/1 [00:00<00:00,  8.20it/s]
100%|██████████| 1/1 [00:00<00:00,  8.16it/s]
100%|██████████| 1/1 [00:00<00:00,  7.43it/s]
100%|██████████| 1/1 [00:00<00:00,  8.02it/s]
100%|██████████| 1/1 [00:00<00:00,  7.24it/s]
100%|██████████| 1/1 [00:00<00:00,  6.97it/s]
100%|██████████| 1/1 [00:00<00:00,  6.32it/s]
100%|██████████| 1/1 [00:00<00:00,  7.14it/s]
100%|██████████| 1/1 [00:00<00:00,  7.60it/s]
100%|██████████| 1/1 [00:00<00:00,  7.40it/s]
100%|██████████| 1/1 [00:00<00:00,  7.69it/s]
100%|██████████| 1/1 [00:00<00:00,  7.66it/s]
100%|██████████| 1/1 [00:00<00:00,  5.59it/s]
100%|██████████| 1/1 [00:00<00:00,  7.89it/s]
100%|██████████| 1/1 [00:00<00:00,  7.97it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 13


100%|██████████| 1/1 [00:00<00:00,  8.07it/s]
100%|██████████| 1/1 [00:00<00:00,  7.50it/s]
100%|██████████| 1/1 [00:00<00:00,  8.34it/s]
100%|██████████| 1/1 [00:00<00:00,  8.21it/s]
100%|██████████| 1/1 [00:00<00:00,  7.99it/s]
100%|██████████| 1/1 [00:00<00:00,  7.85it/s]
100%|██████████| 1/1 [00:00<00:00,  8.42it/s]
100%|██████████| 1/1 [00:00<00:00,  7.85it/s]
100%|██████████| 1/1 [00:00<00:00,  7.36it/s]
100%|██████████| 1/1 [00:00<00:00,  8.26it/s]
100%|██████████| 1/1 [00:00<00:00,  7.94it/s]
100%|██████████| 1/1 [00:00<00:00,  7.97it/s]
100%|██████████| 1/1 [00:00<00:00,  7.55it/s]
100%|██████████| 1/1 [00:00<00:00,  8.07it/s]
100%|██████████| 1/1 [00:00<00:00,  7.82it/s]
100%|██████████| 1/1 [00:00<00:00,  7.78it/s]
100%|██████████| 1/1 [00:00<00:00,  8.29it/s]
100%|██████████| 1/1 [00:00<00:00,  7.91it/s]
100%|██████████| 1/1 [00:00<00:00,  8.28it/s]
100%|██████████| 1/1 [00:00<00:00,  7.88it/s]
100%|██████████| 1/1 [00:00<00:00,  8.01it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 14


100%|██████████| 1/1 [00:00<00:00,  6.43it/s]
100%|██████████| 1/1 [00:00<00:00,  7.29it/s]
100%|██████████| 1/1 [00:00<00:00,  8.49it/s]
100%|██████████| 1/1 [00:00<00:00,  8.49it/s]
100%|██████████| 1/1 [00:00<00:00,  8.31it/s]
100%|██████████| 1/1 [00:00<00:00,  8.03it/s]
100%|██████████| 1/1 [00:00<00:00,  8.22it/s]
100%|██████████| 1/1 [00:00<00:00,  7.94it/s]
100%|██████████| 1/1 [00:00<00:00,  7.73it/s]
100%|██████████| 1/1 [00:00<00:00,  8.11it/s]
100%|██████████| 1/1 [00:00<00:00,  8.12it/s]
100%|██████████| 1/1 [00:00<00:00,  7.60it/s]
100%|██████████| 1/1 [00:00<00:00,  7.26it/s]
100%|██████████| 1/1 [00:00<00:00,  8.37it/s]
100%|██████████| 1/1 [00:00<00:00,  8.23it/s]
100%|██████████| 1/1 [00:00<00:00,  7.46it/s]
100%|██████████| 1/1 [00:00<00:00,  8.34it/s]
100%|██████████| 1/1 [00:00<00:00,  8.28it/s]
100%|██████████| 1/1 [00:00<00:00,  8.26it/s]
100%|██████████| 1/1 [00:00<00:00,  8.22it/s]
100%|██████████| 1/1 [00:00<00:00,  8.11it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 15


100%|██████████| 1/1 [00:00<00:00,  6.86it/s]
100%|██████████| 1/1 [00:00<00:00,  7.94it/s]
100%|██████████| 1/1 [00:00<00:00,  6.59it/s]
100%|██████████| 1/1 [00:00<00:00,  7.83it/s]
100%|██████████| 1/1 [00:00<00:00,  6.05it/s]
100%|██████████| 1/1 [00:00<00:00,  5.73it/s]
100%|██████████| 1/1 [00:00<00:00,  8.23it/s]
100%|██████████| 1/1 [00:00<00:00,  6.95it/s]
100%|██████████| 1/1 [00:00<00:00,  6.65it/s]
100%|██████████| 1/1 [00:00<00:00,  5.57it/s]
100%|██████████| 1/1 [00:00<00:00,  4.30it/s]
100%|██████████| 1/1 [00:00<00:00,  6.53it/s]
100%|██████████| 1/1 [00:00<00:00,  6.01it/s]
100%|██████████| 1/1 [00:00<00:00,  4.37it/s]
100%|██████████| 1/1 [00:00<00:00,  5.20it/s]
100%|██████████| 1/1 [00:00<00:00,  5.49it/s]
100%|██████████| 1/1 [00:00<00:00,  3.97it/s]
100%|██████████| 1/1 [00:00<00:00,  5.16it/s]
100%|██████████| 1/1 [00:00<00:00,  4.74it/s]
100%|██████████| 1/1 [00:00<00:00,  4.72it/s]
100%|██████████| 1/1 [00:00<00:00,  5.06it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 16


100%|██████████| 1/1 [00:00<00:00,  7.44it/s]
100%|██████████| 1/1 [00:00<00:00,  7.60it/s]
100%|██████████| 1/1 [00:00<00:00,  5.62it/s]
100%|██████████| 1/1 [00:00<00:00,  8.06it/s]
100%|██████████| 1/1 [00:00<00:00,  7.59it/s]
100%|██████████| 1/1 [00:00<00:00,  8.11it/s]
100%|██████████| 1/1 [00:00<00:00,  8.04it/s]
100%|██████████| 1/1 [00:00<00:00,  8.19it/s]
100%|██████████| 1/1 [00:00<00:00,  7.97it/s]
100%|██████████| 1/1 [00:00<00:00,  8.29it/s]
100%|██████████| 1/1 [00:00<00:00,  8.34it/s]
100%|██████████| 1/1 [00:00<00:00,  7.35it/s]
100%|██████████| 1/1 [00:00<00:00,  8.07it/s]
100%|██████████| 1/1 [00:00<00:00,  7.41it/s]
100%|██████████| 1/1 [00:00<00:00,  8.28it/s]
100%|██████████| 1/1 [00:00<00:00,  7.84it/s]
100%|██████████| 1/1 [00:00<00:00,  8.10it/s]
100%|██████████| 1/1 [00:00<00:00,  8.07it/s]
100%|██████████| 1/1 [00:00<00:00,  7.35it/s]
100%|██████████| 1/1 [00:00<00:00,  8.16it/s]
100%|██████████| 1/1 [00:00<00:00,  7.58it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 17


100%|██████████| 1/1 [00:00<00:00,  8.06it/s]
100%|██████████| 1/1 [00:00<00:00,  7.41it/s]
100%|██████████| 1/1 [00:00<00:00,  8.20it/s]
100%|██████████| 1/1 [00:00<00:00,  8.07it/s]
100%|██████████| 1/1 [00:00<00:00,  8.28it/s]
100%|██████████| 1/1 [00:00<00:00,  7.73it/s]
100%|██████████| 1/1 [00:00<00:00,  6.17it/s]
100%|██████████| 1/1 [00:00<00:00,  8.40it/s]
100%|██████████| 1/1 [00:00<00:00,  8.59it/s]
100%|██████████| 1/1 [00:00<00:00,  8.45it/s]
100%|██████████| 1/1 [00:00<00:00,  8.36it/s]
100%|██████████| 1/1 [00:00<00:00,  8.11it/s]
100%|██████████| 1/1 [00:00<00:00,  7.68it/s]
100%|██████████| 1/1 [00:00<00:00,  7.96it/s]
100%|██████████| 1/1 [00:00<00:00,  8.14it/s]
100%|██████████| 1/1 [00:00<00:00,  7.49it/s]
100%|██████████| 1/1 [00:00<00:00,  7.43it/s]
100%|██████████| 1/1 [00:00<00:00,  8.22it/s]
100%|██████████| 1/1 [00:00<00:00,  7.13it/s]
100%|██████████| 1/1 [00:00<00:00,  7.68it/s]
100%|██████████| 1/1 [00:00<00:00,  6.88it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 18


100%|██████████| 1/1 [00:00<00:00,  6.43it/s]
100%|██████████| 1/1 [00:00<00:00,  8.59it/s]
100%|██████████| 1/1 [00:00<00:00,  7.06it/s]
100%|██████████| 1/1 [00:00<00:00,  7.96it/s]
100%|██████████| 1/1 [00:00<00:00,  8.05it/s]
100%|██████████| 1/1 [00:00<00:00,  7.61it/s]
100%|██████████| 1/1 [00:00<00:00,  8.52it/s]
100%|██████████| 1/1 [00:00<00:00,  8.29it/s]
100%|██████████| 1/1 [00:00<00:00,  8.40it/s]
100%|██████████| 1/1 [00:00<00:00,  8.30it/s]
100%|██████████| 1/1 [00:00<00:00,  8.36it/s]
100%|██████████| 1/1 [00:00<00:00,  6.60it/s]
100%|██████████| 1/1 [00:00<00:00,  7.89it/s]
100%|██████████| 1/1 [00:00<00:00,  7.93it/s]
100%|██████████| 1/1 [00:00<00:00,  7.91it/s]
100%|██████████| 1/1 [00:00<00:00,  7.46it/s]
100%|██████████| 1/1 [00:00<00:00,  7.83it/s]
100%|██████████| 1/1 [00:00<00:00,  7.91it/s]
100%|██████████| 1/1 [00:00<00:00,  8.32it/s]
100%|██████████| 1/1 [00:00<00:00,  7.12it/s]
100%|██████████| 1/1 [00:00<00:00,  8.55it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------
Repetition 19


100%|██████████| 1/1 [00:00<00:00,  5.85it/s]
100%|██████████| 1/1 [00:00<00:00,  5.80it/s]
100%|██████████| 1/1 [00:00<00:00,  8.52it/s]
100%|██████████| 1/1 [00:00<00:00,  8.46it/s]
100%|██████████| 1/1 [00:00<00:00,  7.18it/s]
100%|██████████| 1/1 [00:00<00:00,  7.48it/s]
100%|██████████| 1/1 [00:00<00:00,  7.05it/s]
100%|██████████| 1/1 [00:00<00:00,  8.24it/s]
100%|██████████| 1/1 [00:00<00:00,  8.11it/s]
100%|██████████| 1/1 [00:00<00:00,  8.01it/s]
100%|██████████| 1/1 [00:00<00:00,  8.49it/s]
100%|██████████| 1/1 [00:00<00:00,  7.99it/s]
100%|██████████| 1/1 [00:00<00:00,  7.15it/s]
100%|██████████| 1/1 [00:00<00:00,  7.68it/s]
100%|██████████| 1/1 [00:00<00:00,  7.83it/s]
100%|██████████| 1/1 [00:00<00:00,  8.27it/s]
100%|██████████| 1/1 [00:00<00:00,  8.43it/s]
100%|██████████| 1/1 [00:00<00:00,  8.35it/s]
100%|██████████| 1/1 [00:00<00:00,  8.39it/s]
100%|██████████| 1/1 [00:00<00:00,  7.54it/s]
100%|██████████| 1/1 [00:00<00:00,  8.28it/s]
100%|██████████| 1/1 [00:00<00:00,

--------------------------





In [None]:
# Save Random Forest DiCE counterfactuals to Drive
# with open(f'{filepath}repeat_dice_rf_counterfactuals', 'wb') as f:
#   pickle.dump(repeat_dice_rf_counterfactuals, f)

# Save Multi-layer Perceptron DiCE counterfactuals to Drive
# with open(f'{filepath}repeat_dice_mlp_counterfactuals', 'wb') as f:
#   pickle.dump(repeat_dice_mlp_counterfactuals, f)

In [None]:
# Load the Random Forest DiCE counterfactuals from Drive
# with open(f'{filepath}repeat_dice_rf_counterfactuals', 'rb') as f:
#   repeat_dice_rf_counterfactuals = pickle.load(f)

# Load the Multi-layer Perceptron DiCE counterfactuals from Drive
# (the with block closes the file, so no explicit close() is needed)
with open(f'{filepath}repeat_dice_mlp_counterfactuals', 'rb') as f:
  repeat_dice_mlp_counterfactuals = pickle.load(f)
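
The save and load cells in this notebook repeat the same pickle pattern for each explainer. A minimal sketch of how that pattern could be wrapped in two helpers is shown here. The names `save_counterfactuals` and `load_counterfactuals` are hypothetical and do not appear elsewhere in the notebook, and the temporary directory stands in for the Drive `filepath`.

```python
import os
import pickle
import tempfile


def save_counterfactuals(obj, path):
    """Pickle `obj` to `path`; the with block closes the file on exit."""
    with open(path, 'wb') as f:
        pickle.dump(obj, f)


def load_counterfactuals(path):
    """Load and return a pickled object from `path`."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Round-trip a small stand-in list (hypothetical data, not notebook results)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo_counterfactuals')
    save_counterfactuals([[1.0, 2.0], [3.0, 4.0]], demo_path)
    restored = load_counterfactuals(demo_path)
```

Using a context manager for both directions means the file handle is released even if pickling raises an exception.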

### DisCERN

In [None]:
# Function to get a DisCERN counterfactual explanation for each instance in the filtered set
def get_discern_explanations(discern_instance, test_data, predictions, train_data, train_predictions):
  discern_counterfactuals = []

  # Generate a DisCERN counterfactual for each instance in the filtered set
  for test_instance, test_label in zip(test_data, predictions):

    # Reset seed to ensure reproducibility
    np.random.seed(seed)
    random.seed(seed)
    tf.random.set_seed(seed)

    try:
      # find_cf returns the counterfactual, its label, sparsity and proximity
      (
        discern_counterfactual,
        discern_label,
        discern_sparsity,
        discern_proximity
      ) = discern_instance.find_cf(test_instance, test_label, cf_label='opposite')
      discern_counterfactuals.append(discern_counterfactual)
    except Exception as error:
      # Log the failure, then fall back to the nearest unlike neighbour (NUN)
      print(f"No counterfactual found for {test_instance}")
      print(f'Error: {error}')
      nun = nearest_unlike_neighbors(train_data, train_predictions, test_instance, desired_idx)
      print(f'Nun: {nun}')
      discern_counterfactuals.append(nun)

  return discern_counterfactuals
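
When DisCERN fails, the function above falls back to a nearest unlike neighbour (NUN). The notebook's own `nearest_unlike_neighbors` helper is defined elsewhere, so the following is only a sketch of the general idea, under two assumptions: distances are Euclidean, and "unlike" means "predicted as the desired class". The name `nearest_unlike_neighbour_sketch` and the toy data are hypothetical.

```python
import numpy as np


def nearest_unlike_neighbour_sketch(train_data, train_predictions, query, desired_class):
    """Return the training row predicted as `desired_class` that lies closest to `query`.

    Assumption: plain Euclidean distance on unscaled features.
    Returns None when no training row has the desired prediction.
    """
    train_data = np.asarray(train_data, dtype=float)
    train_predictions = np.asarray(train_predictions)

    # Keep only rows the model assigns to the desired class
    candidates = train_data[train_predictions == desired_class]
    if len(candidates) == 0:
        return None

    # Pick the candidate with the smallest distance to the query
    distances = np.linalg.norm(candidates - np.asarray(query, dtype=float), axis=1)
    return candidates[np.argmin(distances)]


# Toy example: two rows are predicted as class 0, one as class 1
X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
preds = np.array([1, 0, 0])
nun = nearest_unlike_neighbour_sketch(X, preds, [0.2, 0.1], desired_class=0)
```

A NUN is always a real training instance, so it is a valid (if not necessarily sparse) counterfactual whenever the search itself raises an error.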

In [None]:
# Initialise the lists to store the DisCERN counterfactuals for each repetition
# repeat_discern_rf_counterfactuals = []
repeat_discern_mlp_counterfactuals = []

# Get the DisCERN counterfactuals for each repetition
for repetition in range(len(repeats)):

  # Log repetition
  print(f'Repetition {repetition + 1}')

  # Reset seed to ensure reproducibility
  np.random.seed(seed)
  random.seed(seed)
  tf.random.set_seed(seed)

  train_predictions = repeat_mlp_classifiers[repetition].predict(repeat_train_data[repetition])

  # Instantiate the DisCERN classes
  # discern_rf = DisCERNTabular(repeat_rf_classifiers[repetition], "LIME")
  discern_mlp = DisCERNTabular(repeat_mlp_classifiers[repetition], "LIME")

  # Initialise the DisCERN instances
  # discern_rf.init_data(repeat_train_data[repetition].to_numpy(), repeat_y_train[repetition].to_numpy(), df.columns[:-1], [class_name], cat_feature_indices=categorical_feature_idxs, immutable_feature_indices=immutable_feature_idxs)
  discern_mlp.init_data(repeat_train_data[repetition].to_numpy(), repeat_y_train[repetition].to_numpy(), df.columns[:-1], [class_name], cat_feature_indices=categorical_feature_idxs, immutable_feature_indices=immutable_feature_idxs)

  # # Get DisCERN explanations for Random Forest
  # discern_rf_counterfactuals = get_discern_explanations(
  #     discern_rf,
  #     repeat_rf_negative_instances[repetition].to_numpy(),
  #     repeat_rf_negative_predictions[repetition]
  #     )

  # Get DisCERN explanations for Multi-layer Perceptron
  discern_mlp_counterfactuals = get_discern_explanations(
      discern_mlp,
      repeat_mlp_negative_instances[repetition].to_numpy(),
      repeat_mlp_negative_predictions[repetition],
      repeat_train_data[repetition].to_numpy(),
      train_predictions
      )

  # # Save counterfactuals for the repetition
  # repeat_discern_rf_counterfactuals.append(discern_rf_counterfactuals)
  repeat_discern_mlp_counterfactuals.append(discern_mlp_counterfactuals)

  # Separate repetition logs
  print('----------------------------')

Streaming output truncated to the last 5000 lines.
74.0, 11.0, 225.0, 0.0, 0.017, 0.0, 0.0
81.0, 9.0, 225.0, 0.0, 0.017, 0.0, 0.0
new_label:  0 nun_label:  0 test_label:  1.0
44.0, 11.0, 315.0, 1.0, 0.015, 0.0, 0.0
45.0, 9.0, 315.0, 0.0, 0.017, 0.0, 0.0
new_label:  0 nun_label:  0 test_label:  1.0
102.0, 11.0, 135.0, 0.0, 0.017, 0.0, 0.0
100.0, 11.0, 135.0, 0.0, 0.017, 0.0, 0.0
new_label:  0 nun_label:  0 test_label:  1.0
No counterfactual found for [1.00e+02 1.10e+01 1.35e+02 0.00e+00 1.70e-02 0.00e+00 0.00e+00]
Error: float division by zero
Nun: [9.90e+01 4.00e+00 1.35e+02 1.00e+00 1.50e-02 0.00e+00 0.00e+00]
47.0, 10.0, 120.0, 0.0, 0.017, 1.0, 30.0
46.0, 9.0, 120.0, 1.0, 0.015, 1.0, 30.0
new_label:  0 nun_label:  0 test_label:  1.0
41.0, 8.0, 255.0, 0.0, 0.017, 0.0, 0.0
41.0, 6.0, 255.0, 0.0, 0.017, 0.0, 0.0
new_label:  0 nun_label:  0 test_label:  1.0
No counterfactual found for [4.10e+01 6.00e+00 2.55e+02 0.00e+00 1.70e-02 0.00e+00 0.00e+00]
Error: float division by 

In [None]:
# Save Random Forest DisCERN counterfactuals to Drive
# with open(f'{filepath}repeat_discern_rf_counterfactuals', 'wb') as f:
#   pickle.dump(repeat_discern_rf_counterfactuals, f)

# Save Multi-layer Perceptron DisCERN counterfactuals to Drive
# with open(f'{filepath}repeat_discern_mlp_counterfactuals', 'wb') as f:
#   pickle.dump(repeat_discern_mlp_counterfactuals, f)

In [None]:
# Load the Random Forest DisCERN counterfactuals from Drive
# with open(f'{filepath}repeat_discern_rf_counterfactuals', 'rb') as f:
#   repeat_discern_rf_counterfactuals = pickle.load(f)

# Load the Multi-layer Perceptron DisCERN counterfactuals from Drive
# (the with block closes the file, so no explicit close() is needed)
with open(f'{filepath}repeat_discern_mlp_counterfactuals', 'rb') as f:
  repeat_discern_mlp_counterfactuals = pickle.load(f)

### NICE

In [None]:
# Initialise the lists to store the NICE counterfactuals for each repetition
repeat_nice_mlp_counterfactuals = []

# Initialise the avg time to find NICE counterfactuals
repeat_avg_time = 0

# Get the NICE Multi-layer Perceptron counterfactuals for each repetition
for repetition in range(len(repeats)):

  # Log the repetition
  print(f'Repetition {repetition}')

  # Reset seed to ensure reproducibility
  np.random.seed(seed)
  random.seed(seed)
  tf.random.set_seed(seed)

  # Initialize NICE for Multi-layer Perceptron
  nice_mlp_explainer = NICE(
      X_train=repeat_train_data[repetition].to_numpy(),
      predict_fn=repeat_mlp_classifiers[repetition].predict_proba,
      y_train=repeat_y_train[repetition],
      cat_feat=categorical_feature_idxs,
      num_feat=continuous_feature_idxs,
      justified_cf=True,
      optimization='proximity'
  )

  # Initialise the NICE Multi-layer Perceptron counterfactuals list
  nice_mlp_counterfactuals = []

  # We will calculate total time to find all NICE counterfactuals
  total_time = 0

  # Generate a NICE counterfactual for each undesirable outcome for Multi-layer Perceptron
  for i in range(len(repeat_mlp_negative_instances[repetition])):

    # The query instance
    instance_to_explain = repeat_mlp_negative_instances[repetition].to_numpy()[i]

    # Explain the instance
    start_time = time.time()
    nice_counterfactual = nice_mlp_explainer.explain(np.array(instance_to_explain).reshape(1, -1))

    # Calculate time taken
    end_time = time.time()
    time_elapsed = end_time - start_time
    total_time += time_elapsed

    nice_mlp_counterfactuals.append(nice_counterfactual[0])

  # Calculate average time taken
  avg_time = total_time / len(repeat_mlp_negative_instances[repetition])
  repeat_avg_time += avg_time
  print(f'Average time taken to find counterfactual: {avg_time}')

  # Store the NICE counterfactuals for this repetition
  repeat_nice_mlp_counterfactuals.append(nice_mlp_counterfactuals)

  # Separate the repetition logs
  print('---------------------------------------')

# Calculate average time taken over all repetitions
repeat_avg_time = repeat_avg_time / len(repeats)

Repetition 0
Average time taken to find counterfactual: 0.005568428574321426
---------------------------------------
Repetition 1
Average time taken to find counterfactual: 0.0023902006771253505
---------------------------------------
Repetition 2
Average time taken to find counterfactual: 0.004055641418279603
---------------------------------------
Repetition 3
Average time taken to find counterfactual: 0.004907996466990267
---------------------------------------
Repetition 4
Average time taken to find counterfactual: 0.0038479434119330514
---------------------------------------
Repetition 5
Average time taken to find counterfactual: 0.004904407135983731
---------------------------------------
Repetition 6
Average time taken to find counterfactual: 0.004974430682612401
---------------------------------------
Repetition 7
Average time taken to find counterfactual: 0.005558558184691149
---------------------------------------
Repetition 8
Average time taken to find counterfactual: 0.0024

In [None]:
repeat_nice_mlp_counterfactuals

[[array([ 44. ,   1. ,   0. , 130. , 249. ,   0. ,   0. , 144. ,   0. ,
           0.8,   2. ,   0. ,   3. ]),
  array([ 59. ,   1. ,   3. , 170. , 293. ,   0. ,   0. , 159. ,   0. ,
           1.2,   1. ,   2. ,   3. ]),
  array([ 51. ,   0. ,   2. , 120. , 295. ,   0. ,   0. , 126. ,   0. ,
           0.6,   2. ,   3. ,   2. ]),
  array([ 57. ,   1. ,   1. , 154. , 232. ,   0. ,   0. , 126. ,   0. ,
           0.8,   2. ,   1. ,   2. ]),
  array([ 59. ,   1. ,   0. , 134. , 204. ,   0. ,   1. , 162. ,   0. ,
           0.8,   2. ,   2. ,   2. ]),
  array([ 39. ,   1. ,   2. ,  94. , 199. ,   0. ,   1. , 126. ,   0. ,
           0.8,   2. ,   3. ,   2. ]),
  array([ 56. ,   0. ,   1. , 140. , 294. ,   0. ,   0. , 103. ,   0. ,
           1.3,   1. ,   0. ,   2. ]),
  array([ 54.,   1.,   1., 125., 309.,   0.,   1., 156.,   0.,   1.,   2.,
           2.,   3.]),
  array([ 59.,   0.,   0., 174., 249.,   0.,   1., 123.,   1.,   0.,   1.,
           0.,   2.]),
  array([ 51. ,   1. ,   2.

In [None]:
# Save Multi-layer Perceptron NICE counterfactuals to Drive
# with open(f'{filepath}repeat_nice_mlp_counterfactuals', 'wb') as f:
#   pickle.dump(repeat_nice_mlp_counterfactuals, f)

In [None]:
# Load the Multi-layer Perceptron NICE counterfactuals from Drive
# (assign to the NICE list so the DisCERN counterfactuals are not overwritten)
with open(f'{filepath}repeat_nice_mlp_counterfactuals', 'rb') as f:
  repeat_nice_mlp_counterfactuals = pickle.load(f)

### Wachter CF

In [None]:
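base_counterfactuals = np.array([])  # No base counterfactuals are supplied
weights = [1, 0, 0]  # We only want random instances to run Wachter CF
feature_ranges = {}
verbose = 2
num_generations = 100  # Not used by the loop below, which passes num_generations=300 explicitly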
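
For context, the objective proposed by Wachter et al. trades off how close the model's output is to the target against how far the candidate moves from the query: `lam * (f(x') - y')**2 + d(x, x')`, where `d` is an L1 distance scaled per feature by the median absolute deviation (MAD). The sketch below illustrates only that formula. It is not the `Manic` implementation, and the function name, argument names and toy values are hypothetical.

```python
import numpy as np


def wachter_loss_sketch(query, candidate, predicted_proba, target_proba=1.0, lam=1.0, mad=None):
    """Wachter-style loss: lam * (f(x') - y')^2 + sum_k |x_k - x'_k| / MAD_k.

    `predicted_proba` is the model output f(x') for the candidate.
    Assumption: when `mad` is None, every feature is scaled by 1.
    """
    query = np.asarray(query, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    mad = np.ones_like(query) if mad is None else np.asarray(mad, dtype=float)

    # Penalise predictions that are far from the desired outcome
    prediction_term = lam * (predicted_proba - target_proba) ** 2

    # Penalise candidates that move far from the query (MAD-scaled L1 distance)
    distance_term = np.sum(np.abs(query - candidate) / mad)

    return prediction_term + distance_term


# Toy example with made-up numbers
loss = wachter_loss_sketch([1.0, 2.0], [1.5, 2.0], predicted_proba=0.5, target_proba=1.0, lam=2.0)
```

A larger `lam` pushes the search towards candidates that flip the prediction, while the distance term keeps them near the original instance.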

In [None]:
# Initialise the list to store the Wachter counterfactuals for each repetition
repeat_wachter_mlp_counterfactuals = []

for repetition in range(len(repeats)):
  print(f'Repetition: {repetition}')

  # Wrap the classifier so that it can score one instance at a time
  predict_fn = lambda instance: repeat_mlp_classifiers[repetition].predict([instance])[0]
  predict_proba_fn = lambda instance: repeat_mlp_classifiers[repetition].predict_proba([instance])[0]
  class_labels = repeat_y_train[repetition]
  data = repeat_train_data[repetition].to_numpy()
  train_predictions = repeat_mlp_classifiers[repetition].predict(data)
  wachter_mlp_counterfactuals = []

  # Search for a counterfactual for each undesirable outcome
  for i in range(len(repeat_mlp_negative_instances[repetition])):
    data_to_explain = repeat_mlp_negative_instances[repetition].to_numpy()[i]
    wachter = Manic(
        data_to_explain,
        base_counterfactuals,
        categorical_feature_idxs,
        immutable_feature_idxs,
        feature_ranges,
        data,
        predict_fn,
        predict_proba_fn,
        class_labels,
        weights=weights,
        wachter=True,
        verbose=verbose,
        num_generations=300,
        population_size=100,
        feature_entropy=0.3,
        perturbation_fraction=0.1,
        early_stopping={'found': False, 'patience_generations': 10},
        sparse=False
        )
    wachter_mlp_counterfactual_instance = wachter.generate_counterfactuals()
    wachter_mlp_counterfactual = wachter_mlp_counterfactual_instance['best_counterfactual']

    if wachter_mlp_counterfactual is not None:
      wachter_mlp_counterfactuals.append(wachter_mlp_counterfactual)
    else:
      nun = nearest_unlike_neighbors(data, train_predictions, data_to_explain, desired_idx)
      wachter_mlp_counterfactuals.append(nun)

  repeat_wachter_mlp_counterfactuals.append(wachter_mlp_counterfactuals)
  print('----------------------------------------')

Streaming output truncated to the last 5000 lines.
Generation 25: Best Counterfactual = [69.0, 6.4, 214.3, 0, 0.017, 0, 0.1], Fitness = 0.0038347592299969434
Generation 26: Best Counterfactual = [69.0, 6.4, 214.3, 0, 0.017, 0, 0.1], Fitness = 0.0038347592299969434
Generation 27: Best Counterfactual = [69.9, 6.4, 318.9, 0, 0.017, 0, 0.1], Fitness = 0.0038283343519008896
Generation 28: Best Counterfactual = [69.9, 6.4, 318.9, 0, 0.017, 0, 0.1], Fitness = 0.0038283343519008896
Generation 29: Best Counterfactual = [69.9, 6.4, 318.9, 0, 0.017, 0, 0.1], Fitness = 0.0038283343519008896
Generation 30: Best Counterfactual = [69.9, 6.4, 318.9, 0, 0.017, 0, 0.1], Fitness = 0.0038283343519008896
Generation 31: Best Counterfactual = [69.9, 6.6, 318.9, 0, 0.017, 0, 0.1], Fitness = 0.0038063148139644066
Generation 32: Best Counterfactual = [69.9, 6.6, 318.9, 0, 0.017, 0, 0.1], Fitness = 0.0038063148139644066
Generation 33: Best Counterfactual = [69.9, 6.6, 318.9, 0, 0.017, 0, 0.1], Fitn

In [None]:
len(repeat_wachter_mlp_counterfactuals[0])

107

In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:
# Save the Multi-layer Perceptron Wachter counterfactuals to Drive
with open(f'{filepath}repeat_wachter_mlp_counterfactuals', 'wb') as file:
  pickle.dump(repeat_wachter_mlp_counterfactuals, file)

In [None]:
# Load the Multi-layer Perceptron Wachter counterfactuals from Drive
with open(f'{filepath}repeat_wachter_mlp_counterfactuals', 'rb') as f:
  repeat_wachter_mlp_counterfactuals = pickle.load(f)

FileNotFoundError: ignored
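
A `FileNotFoundError` like the one above can arise when the save and load cells do not point at the same path, and an unclosed file handle can leave a pickle incomplete. The sketch below shows one way to keep both sides consistent. The helper names `save_pickle` and `load_pickle` are hypothetical and not part of the notebook, and a temporary directory stands in for the Drive folder.

```python
import pickle
import tempfile
from pathlib import Path

def save_pickle(obj, path):
    """Pickle obj to path; the with-block closes and flushes the file."""
    with Path(path).open('wb') as f:
        pickle.dump(obj, f)

def load_pickle(path):
    """Load a pickled object, raising FileNotFoundError if path is missing."""
    with Path(path).open('rb') as f:
        return pickle.load(f)

# Round trip in a temporary directory (a stand-in for the Drive folder).
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / 'repeat_wachter_mlp_counterfactuals'
    example = [[[69.9, 6.6, 318.9]], [[76.0, 6.8, 143.7]]]  # illustrative values
    save_pickle(example, target)
    restored = load_pickle(target)
```

Using the same path expression for both calls makes a mismatch between the two cells harder to introduce.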

### Experiments

In [None]:
def normalize_instance(instance, bounds):
  normalized_instance = []

  for i, (min_val, max_val) in enumerate(bounds):
      # Normalize the feature value using min-max scaling
      feature_value = instance[i]
      if min_val == max_val:
          normalized_instance.append(feature_value)
      else:
          normalized_value = (feature_value - min_val) / (max_val - min_val)
          normalized_instance.append(normalized_value)

  return normalized_instance

In [None]:
from scipy.spatial.distance import cityblock
def calculate_manhattan_distance(instance1, instance2):
    return cityblock(instance1, instance2)
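
As a worked example of how the two helpers above combine into a proximity score, the sketch below normalises a query and a counterfactual with min-max scaling and takes the Manhattan distance divided by the number of features. The bounds and feature values are made up for illustration, `min_max_normalize` is a hypothetical vectorised stand-in for `normalize_instance`, and the distance is computed with NumPy directly rather than `cityblock`.

```python
import numpy as np

def min_max_normalize(instance, bounds):
    """Scale each feature to [0, 1]; constant features are left unchanged."""
    instance = np.asarray(instance, dtype=float)
    lows = np.array([low for low, _ in bounds], dtype=float)
    highs = np.array([high for _, high in bounds], dtype=float)
    span = highs - lows
    safe_span = np.where(span == 0, 1.0, span)  # avoid division by zero
    return np.where(span == 0, instance, (instance - lows) / safe_span)

# Hypothetical (min, max) bounds for three features.
bounds = [(20.0, 80.0), (0.0, 1.0), (100.0, 400.0)]
query = [50.0, 1.0, 250.0]
counterfactual = [56.0, 1.0, 220.0]

normalized_query = min_max_normalize(query, bounds)        # [0.5, 1.0, 0.5]
normalized_cf = min_max_normalize(counterfactual, bounds)  # [0.6, 1.0, 0.4]

# Manhattan (L1) distance, averaged over the number of features.
proximity = np.abs(normalized_query - normalized_cf).sum() / len(query)
```

Normalising first keeps a feature with a wide range, such as the third one here, from dominating the distance.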

In [None]:
def most_rad_counterfactual(base_counterfactuals, labels, data_instance, data):

        cf_actions = []
        for base_counterfactual in base_counterfactuals:
            actions = []
            for feature in range(len(data_instance)):
                difference = data_instance[feature] - base_counterfactual[feature]

                if(difference < 0):
                    actions.append("INC")
                elif(difference > 0):
                    actions.append("DEC")
                else:
                    actions.append("NONE")
            # Record the actions once per counterfactual, after every feature is compared
            cf_actions.append(actions)
        direction_overlap_scores = np.zeros(len(base_counterfactuals))

        for i in range(len(base_counterfactuals)):
            for j in range(i + 1, len(base_counterfactuals)):
                # print(data_instance)
                # print(i,j)
                # print(base_counterfactuals[i])
                # print(base_counterfactuals[j])
                matching_count = sum(1 for counterfactual_i_feature, counterfactual_j_feature in zip(cf_actions[i], cf_actions[j]) if counterfactual_i_feature == counterfactual_j_feature)
                # print(matching_count)
                direction_overlap = 1 - (matching_count / len(data_instance))
                # print(direction_overlap)
                bounds = get_bounds(data)
                normalized_i = normalize_instance(base_counterfactuals[i], bounds)
                normalized_j = normalize_instance(base_counterfactuals[j], bounds)
                manhattan_distance = cityblock(normalized_i, normalized_j)
                # print(manhattan_distance)
                manhattan_direction_overlap = (0.3 * manhattan_distance) + (0.7 * direction_overlap)
                # print(manhattan_direction_overlap)
                direction_overlap_scores[i] += manhattan_direction_overlap
                direction_overlap_scores[j] += manhattan_direction_overlap
                # print(direction_overlap_scores)

        # print(direction_overlap_scores)

        average_direction_overlap_scores = [item / (len(base_counterfactuals) - 1) for item in direction_overlap_scores]
        # print(average_direction_overlap_scores)
        best_index = np.argsort(average_direction_overlap_scores)[0]
        best_counterfactual = base_counterfactuals[best_index]
        best_counterfactual_label = labels[best_index]

        return best_counterfactual
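
The direction-overlap part of the disagreement score can be illustrated on its own. The sketch below encodes each feature change as +1 (increase), -1 (decrease) or 0 (no change) and measures the fraction of features on which two counterfactuals move in different directions. The function names and the example values are illustrative assumptions, and the Manhattan term of the combined score is omitted.

```python
import numpy as np

def change_directions(instance, counterfactual):
    """+1 where the counterfactual raises a feature, -1 where it lowers it, 0 otherwise."""
    instance = np.asarray(instance, dtype=float)
    counterfactual = np.asarray(counterfactual, dtype=float)
    return np.sign(counterfactual - instance)

def direction_disagreement(instance, cf_a, cf_b):
    """Fraction of features whose change direction differs between cf_a and cf_b."""
    directions_a = change_directions(instance, cf_a)
    directions_b = change_directions(instance, cf_b)
    return 1.0 - np.mean(directions_a == directions_b)

instance = [50.0, 1.0, 250.0, 0.0]
cf_a = [56.0, 1.0, 220.0, 0.0]  # INC, NONE, DEC, NONE
cf_b = [52.0, 0.0, 260.0, 0.0]  # INC, DEC, INC, NONE

score = direction_disagreement(instance, cf_a, cf_b)  # 2 of 4 features differ
```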

### Selection of Counterfactual Based On Proximity
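
Each selection cell in this part of the notebook first discards base counterfactuals whose values are all zero, which the notebook treats as "no counterfactual found". Removing items from a list while looping over it is easy to get wrong, so the sketch below filters into new lists instead. The helper name `drop_missing_counterfactuals` and the example values are hypothetical.

```python
def drop_missing_counterfactuals(counterfactuals, labels):
    """Split explainers into those with a counterfactual and those without.

    An all-zero vector is taken to mean that the explainer found nothing.
    """
    kept_counterfactuals, kept_labels, dropped_labels = [], [], []
    for counterfactual, label in zip(counterfactuals, labels):
        if all(value == 0 for value in counterfactual):
            dropped_labels.append(label)
        else:
            kept_counterfactuals.append(counterfactual)
            kept_labels.append(label)
    return kept_counterfactuals, kept_labels, dropped_labels

candidates = [[0, 0, 0], [56.0, 1.0, 220.0], [0, 0, 0], [52.0, 0.0, 260.0]]
names = ['Discern', 'Dice', 'Nice', 'Wachter']

kept, kept_names, dropped = drop_missing_counterfactuals(candidates, names)
for name in dropped:
    print(f'Removing {name} as does not have counterfactual.')
```

Returning the dropped labels separately keeps the logging out of the filtering logic, and the two kept lists stay aligned by construction.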

In [None]:
repeat_mlp_proximal_counterfactuals = []

for repetition in range(len(repeats)):
  print(f'Repetition: {repetition + 1}')
  predict_fn = lambda instance: repeat_mlp_classifiers[repetition].predict([instance])[0]
  predict_proba_fn = lambda instance: repeat_mlp_classifiers[repetition].predict_proba([instance])[0]
  class_labels = repeat_y_train[repetition]
  data = repeat_train_data[repetition].to_numpy()
  mlp_proximal_counterfactuals = []
  for i in range(len(repeat_mlp_negative_instances[repetition])):
    base_counterfactuals = [repeat_discern_mlp_counterfactuals[repetition][i], np.array(repeat_dice_mlp_counterfactuals[repetition][i]).astype('float'), repeat_nice_mlp_counterfactuals[repetition][i], repeat_wachter_mlp_counterfactuals[repetition][i]]
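    labels = ['Discern', 'Dice', 'Nice', 'Wachter']
    # Drop explainers whose counterfactual is all zeros (none found). Build new
    # lists instead of popping while iterating, which skips or misplaces entries.
    found = [not all(value == 0 for value in cf) for cf in base_counterfactuals]
    for label, has_cf in zip(labels, found):
      if not has_cf:
        print(f'Removing {label} as does not have counterfactual.')
    base_counterfactuals = [cf for cf, has_cf in zip(base_counterfactuals, found) if has_cf]
    labels = [label for label, has_cf in zip(labels, found) if has_cf]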

    data_to_explain = repeat_mlp_negative_instances[repetition].to_numpy()[i]
    baseline = Manic(
        data_to_explain,
        base_counterfactuals,
        categorical_feature_idxs,
        immutable_feature_idxs,
        feature_ranges,
        data,
        predict_fn,
        predict_proba_fn,
        class_labels,
        weights=weights,
        wachter=False,
        verbose=verbose,
        num_generations=num_generations,
        labels=labels
        )

    mlp_proximal_counterfactual = baseline.baseline.most_proximal_counterfactual()
    mlp_proximal_counterfactuals.append(mlp_proximal_counterfactual)

  repeat_mlp_proximal_counterfactuals.append(mlp_proximal_counterfactuals)
  print('----------------------------------------')

Repetition: 1
----------------------------------------
Repetition: 2
----------------------------------------
Repetition: 3
----------------------------------------
Repetition: 4
----------------------------------------
Repetition: 5
----------------------------------------
Repetition: 6
----------------------------------------
Repetition: 7
----------------------------------------
Repetition: 8
----------------------------------------
Repetition: 9
----------------------------------------
Repetition: 10
----------------------------------------
Repetition: 11
----------------------------------------
Repetition: 12
----------------------------------------
Repetition: 13
----------------------------------------
Repetition: 14
----------------------------------------
Repetition: 15
----------------------------------------
Repetition: 16
----------------------------------------
Repetition: 17
----------------------------------------
Repetition: 18
----------------------------------------
R

In [None]:
# Save Multi-layer Perceptron proximity metacounterfactuals to Drive
with open(f'{filepath}repeat_mlp_proximal_counterfactuals', 'wb') as file:
  pickle.dump(repeat_mlp_proximal_counterfactuals, file)

In [None]:
# Load the Multi-layer Perceptron proximity metacounterfactuals from Drive
with open(f'{filepath}repeat_mlp_proximal_counterfactuals', 'rb') as f:
  repeat_mlp_proximal_counterfactuals = pickle.load(f)

FileNotFoundError: ignored

### Selection of Counterfactual Based On Disagreement

In [None]:
repeat_mlp_agreeable_counterfactuals = []
repeat_labels = []
for repetition in range(len(repeats)):
  print(f'Repetition: {repetition + 1}')
  predict_fn = lambda instance: repeat_mlp_classifiers[repetition].predict([instance])[0]
  predict_proba_fn = lambda instance: repeat_mlp_classifiers[repetition].predict_proba([instance])[0]
  class_labels = repeat_y_train[repetition]
  data = repeat_train_data[repetition].to_numpy()
  mlp_agreeable_counterfactuals = []
  mlp_agreeable_counterfactuals_labels = []
  for i in range(len(repeat_mlp_negative_instances[repetition])):
    base_counterfactuals = [repeat_discern_mlp_counterfactuals[repetition][i], np.array(repeat_dice_mlp_counterfactuals[repetition][i]).astype('float'), repeat_nice_mlp_counterfactuals[repetition][i], repeat_wachter_mlp_counterfactuals[repetition][i]]
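    labels = ['Discern', 'Dice', 'Nice', 'Wachter']
    # Drop explainers whose counterfactual is all zeros (none found). Build new
    # lists instead of popping while iterating, which skips or misplaces entries.
    found = [not all(value == 0 for value in cf) for cf in base_counterfactuals]
    for label, has_cf in zip(labels, found):
      if not has_cf:
        print(f'Removing {label} as does not have counterfactual.')
    base_counterfactuals = [cf for cf, has_cf in zip(base_counterfactuals, found) if has_cf]
    labels = [label for label, has_cf in zip(labels, found) if has_cf]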

    data_to_explain = repeat_mlp_negative_instances[repetition].to_numpy()[i]
    # baseline = Manic(
    #     data_to_explain,
    #     base_counterfactuals,
    #     categorical_feature_idxs,
    #     immutable_feature_idxs,
    #     feature_ranges,
    #     data,
    #     predict_fn,
    #     predict_proba_fn,
    #     class_labels,
    #     weights=weights,
    #     wachter=False,
    #     verbose=verbose,
    #     num_generations=num_generations,
    #     labels=labels,
    #     disagreement_method="direction_overlap"
    #     )

    mlp_agreeable_counterfactual = most_rad_counterfactual(base_counterfactuals, labels, data_to_explain, repeat_train_data[repetition].to_numpy())
    # break
    mlp_agreeable_counterfactuals.append(mlp_agreeable_counterfactual)
    # mlp_agreeable_counterfactuals_labels.append(label)

#   print(mlp_agreeable_counterfactuals_labels)
  repeat_mlp_agreeable_counterfactuals.append(mlp_agreeable_counterfactuals)
#   repeat_labels.append(mlp_agreeable_counterfactuals_labels)
  print('----------------------------------------')

Repetition: 1
----------------------------------------
Repetition: 2
----------------------------------------
Repetition: 3
----------------------------------------
Repetition: 4
----------------------------------------
Repetition: 5
----------------------------------------
Repetition: 6
----------------------------------------
Repetition: 7
----------------------------------------
Repetition: 8
----------------------------------------
Repetition: 9
----------------------------------------
Repetition: 10
----------------------------------------
Repetition: 11
----------------------------------------
Repetition: 12
----------------------------------------
Repetition: 13
----------------------------------------
Repetition: 14
----------------------------------------
Repetition: 15
----------------------------------------
Repetition: 16
----------------------------------------
Repetition: 17
----------------------------------------
Repetition: 18
----------------------------------------
R

In [None]:
# Save the Multi-layer Perceptron disagreement metacounterfactuals to Drive
with open(f'{filepath}repeat_mlp_agreeable_counterfactuals', 'wb') as file:
  pickle.dump(repeat_mlp_agreeable_counterfactuals, file)

### Selection of Counterfactual Based on Sparsity

In [None]:
repeat_mlp_sparse_counterfactuals = []

for repetition in range(len(repeats)):
  print(f'Repetition: {repetition + 1}')
  predict_fn = lambda instance: repeat_mlp_classifiers[repetition].predict([instance])[0]
  predict_proba_fn = lambda instance: repeat_mlp_classifiers[repetition].predict_proba([instance])[0]
  class_labels = repeat_y_train[repetition]
  data = repeat_train_data[repetition].to_numpy()
  mlp_sparse_counterfactuals = []

  for i in range(len(repeat_mlp_negative_instances[repetition])):
    base_counterfactuals = [repeat_discern_mlp_counterfactuals[repetition][i], np.array(repeat_dice_mlp_counterfactuals[repetition][i]).astype('float'), repeat_nice_mlp_counterfactuals[repetition][i], repeat_wachter_mlp_counterfactuals[repetition][i]]

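    labels = ['Discern', 'Dice', 'Nice', 'Wachter']
    # Drop explainers whose counterfactual is all zeros (none found). Build new
    # lists instead of popping while iterating, which skips or misplaces entries.
    found = [not all(value == 0 for value in cf) for cf in base_counterfactuals]
    for label, has_cf in zip(labels, found):
      if not has_cf:
        print(f'Removing {label} as does not have counterfactual.')
    base_counterfactuals = [cf for cf, has_cf in zip(base_counterfactuals, found) if has_cf]
    labels = [label for label, has_cf in zip(labels, found) if has_cf]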

    data_to_explain = repeat_mlp_negative_instances[repetition].to_numpy()[i]
    baseline = Manic(
        data_to_explain,
        base_counterfactuals,
        categorical_feature_idxs,
        immutable_feature_idxs,
        feature_ranges,
        data,
        predict_fn,
        predict_proba_fn,
        class_labels,
        weights=weights,
        wachter=False,
        verbose=verbose,
        num_generations=num_generations,
        labels=labels
        )

    mlp_sparse_counterfactual = baseline.baseline.most_sparse_counterfactual()
    mlp_sparse_counterfactuals.append(mlp_sparse_counterfactual)

  repeat_mlp_sparse_counterfactuals.append(mlp_sparse_counterfactuals)
  print('----------------------------------------')

Repetition: 1
----------------------------------------
Repetition: 2
----------------------------------------
Repetition: 3
----------------------------------------
Repetition: 4
----------------------------------------
Repetition: 5
----------------------------------------
Repetition: 6
----------------------------------------
Repetition: 7
----------------------------------------
Repetition: 8
----------------------------------------
Repetition: 9
----------------------------------------
Repetition: 10
----------------------------------------
Repetition: 11
----------------------------------------
Repetition: 12
----------------------------------------
Repetition: 13
----------------------------------------
Repetition: 14
----------------------------------------
Repetition: 15
----------------------------------------
Repetition: 16
----------------------------------------
Repetition: 17
----------------------------------------
Repetition: 18
----------------------------------------
R

In [None]:
# file = open(f'{filepath}repeat_mlp_sparse_counterfactuals', 'wb')
# pickle.dump(repeat_mlp_sparse_counterfactuals, file)

In [None]:
# Load the Multi-layer Perceptron sparsity metacounterfactuals from Drive
with open(f'{filepath}repeat_mlp_sparse_counterfactuals', 'rb') as f:
  repeat_mlp_sparse_counterfactuals = pickle.load(f)

EOFError: ignored

### Selection of Counterfactual Based on Rank Average

In [None]:
def average_metacounterfactual(data_instance, base_counterfactuals, labels, data):
  sparsity_scores = []
  proximity_scores = []
  disagreement_scores = []

  bounds = get_bounds(data)

  direction_overlap_scores = np.zeros(len(base_counterfactuals))

  cf_actions =[]
  for base_counterfactual in base_counterfactuals:
    actions = []
    for feature in range(len(data_instance)):
        difference = data_instance[feature] - base_counterfactual[feature]

        if(difference < 0):
            actions.append("INC")
        elif(difference > 0):
            actions.append("DEC")
        else:
            actions.append("NONE")
    # Record the actions once per counterfactual, after every feature is compared
    cf_actions.append(actions)

  for i in range(len(base_counterfactuals)):

    sparsity_score = calculate_sparsity(data_instance, base_counterfactuals[i])

    normalized_counterfactual = normalize_instance(base_counterfactuals[i], bounds)
    normalized_instance = normalize_instance(data_instance, bounds)
    proximity_score = cityblock(normalized_instance, normalized_counterfactual)

    for j in range(i + 1, len(base_counterfactuals)):
      matching_count = sum(1 for counterfactual_i_feature, counterfactual_j_feature in zip(cf_actions[i], cf_actions[j]) if counterfactual_i_feature == counterfactual_j_feature)
      direction_overlap = 1 - (matching_count / len(data_instance))

      bounds = get_bounds(data)
      normalized_i = normalize_instance(base_counterfactuals[i], bounds)
      normalized_j = normalize_instance(base_counterfactuals[j], bounds)

      manhattan_distance = cityblock(normalized_i, normalized_j)

      manhattan_direction_overlap = (0.3 * manhattan_distance) + (0.7 * direction_overlap)

      direction_overlap_scores[i] += manhattan_direction_overlap
      direction_overlap_scores[j] += manhattan_direction_overlap

    average_direction_overlap_scores = [item / (len(base_counterfactuals) - 1) for item in direction_overlap_scores]

    sparsity_scores.append(sparsity_score)
    proximity_scores.append(proximity_score)
    disagreement_scores = average_direction_overlap_scores

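  # argsort alone returns an ordering of indices, not a rank per counterfactual;
  # applying it twice gives each counterfactual its rank (0 = best) on that metric.
  sparsity_ranking = np.argsort(np.argsort(sparsity_scores))
  proximity_ranking = np.argsort(np.argsort(proximity_scores))
  disagreement_ranking = np.argsort(np.argsort(disagreement_scores))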

  average_ranking = np.mean([sparsity_ranking, proximity_ranking, disagreement_ranking], axis=0)
  best_counterfactual_index = np.argmin(average_ranking)

  return base_counterfactuals[best_counterfactual_index]
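
Rank averaging depends on each counterfactual receiving a rank per metric. `np.argsort` on its own returns the indices that would sort the scores, which is a different thing from a per-item rank; applying it twice yields the ranks. The sketch below shows the distinction on made-up scores for three counterfactuals. The helper name `ranks` is hypothetical, and ties are broken by position rather than shared.

```python
import numpy as np

def ranks(scores):
    """Rank of each score, where 0 is the smallest (best) value."""
    return np.argsort(np.argsort(scores))

# Made-up scores for three candidate counterfactuals (lower is better).
sparsity = [0.4, 0.1, 0.3]
proximity = [0.2, 0.5, 0.1]
disagreement = [0.3, 0.2, 0.6]

ordering = np.argsort(sparsity)      # [1, 2, 0]: indices from best to worst
sparsity_ranks = ranks(sparsity)     # [2, 0, 1]: rank held by each candidate

average_rank = np.mean(
    [ranks(sparsity), ranks(proximity), ranks(disagreement)], axis=0
)
best_index = int(np.argmin(average_rank))
```

Here the second candidate wins with an average rank of 2/3, even though it has the worst proximity, because it ranks first on the other two metrics.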

In [None]:
repeat_mlp_average_counterfactuals = []

for repetition in range(len(repeats)):
  print(f'Repetition: {repetition + 1}')
  predict_fn = lambda instance: repeat_mlp_classifiers[repetition].predict([instance])[0]
  predict_proba_fn = lambda instance: repeat_mlp_classifiers[repetition].predict_proba([instance])[0]
  class_labels = repeat_y_train[repetition]
  data = repeat_train_data[repetition].to_numpy()
  mlp_average_counterfactuals = []

  for i in range(len(repeat_mlp_negative_instances[repetition])):
    base_counterfactuals = [repeat_discern_mlp_counterfactuals[repetition][i], np.array(repeat_dice_mlp_counterfactuals[repetition][i]).astype('float'), repeat_nice_mlp_counterfactuals[repetition][i], repeat_wachter_mlp_counterfactuals[repetition][i]]

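    labels = ['Discern', 'Dice', 'Nice', 'Wachter']
    # Drop explainers whose counterfactual is all zeros (none found). Build new
    # lists instead of popping while iterating, which skips or misplaces entries.
    found = [not all(value == 0 for value in cf) for cf in base_counterfactuals]
    for label, has_cf in zip(labels, found):
      if not has_cf:
        print(f'Removing {label} as does not have counterfactual.')
    base_counterfactuals = [cf for cf, has_cf in zip(base_counterfactuals, found) if has_cf]
    labels = [label for label, has_cf in zip(labels, found) if has_cf]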

    data_to_explain = repeat_mlp_negative_instances[repetition].to_numpy()[i]
    # baseline = Manic(
    #     data_to_explain,
    #     base_counterfactuals,
    #     categorical_feature_idxs,
    #     immutable_feature_idxs,
    #     feature_ranges,
    #     data,
    #     predict_fn,
    #     predict_proba_fn,
    #     class_labels,
    #     weights=weights,
    #     wachter=False,
    #     verbose=verbose,
    #     num_generations=num_generations,
    #     labels=labels
    #     )

    mlp_average_counterfactual = average_metacounterfactual(data_to_explain, base_counterfactuals, labels, repeat_train_data[repetition].to_numpy())
    mlp_average_counterfactuals.append(mlp_average_counterfactual)

  repeat_mlp_average_counterfactuals.append(mlp_average_counterfactuals)
  print('----------------------------------------')

Repetition: 1
----------------------------------------
Repetition: 2
----------------------------------------
Repetition: 3
----------------------------------------
Repetition: 4
----------------------------------------
Repetition: 5
----------------------------------------
Repetition: 6
----------------------------------------
Repetition: 7
----------------------------------------
Repetition: 8
----------------------------------------
Repetition: 9
----------------------------------------
Repetition: 10
----------------------------------------
Repetition: 11
----------------------------------------
Repetition: 12
----------------------------------------
Repetition: 13
----------------------------------------
Repetition: 14
----------------------------------------
Repetition: 15
----------------------------------------
Repetition: 16
----------------------------------------
Repetition: 17
----------------------------------------
Repetition: 18
----------------------------------------
R

In [None]:
# Save the Multi-layer Perceptron rank-average metacounterfactuals to Drive
with open(f'{filepath}repeat_mlp_average_counterfactuals', 'wb') as file:
  pickle.dump(repeat_mlp_average_counterfactuals, file)

In [None]:
repeat_mlp_average_c

### Metacounterfactual Generation

In [None]:
repeat_mlp_manic_counterfactuals = []

for repetition in range(len(repeats)):
  print(f'Repetition: {repetition + 1}')
  predict_fn = lambda instance: repeat_mlp_classifiers[repetition].predict([instance])[0]
  predict_proba_fn = lambda instance: repeat_mlp_classifiers[repetition].predict_proba([instance])[0]
  class_labels = repeat_y_train[repetition]
  data = repeat_train_data[repetition].to_numpy()
  mlp_manic_counterfactuals = []

  train_predictions = repeat_mlp_classifiers[repetition].predict(repeat_train_data[repetition])

  for i in range(len(repeat_mlp_negative_instances[repetition])):
    base_counterfactuals = [np.array(repeat_discern_mlp_counterfactuals[repetition][i]), np.array(repeat_dice_mlp_counterfactuals[repetition][i]).astype('float'), np.array(repeat_nice_mlp_counterfactuals[repetition][i]), np.array(repeat_wachter_mlp_counterfactuals[repetition][i])]

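    labels = ['Discern', 'Dice', 'Nice', 'Wachter']
    # Drop explainers whose counterfactual is all zeros (none found). Build new
    # lists instead of popping while iterating, which skips or misplaces entries.
    found = [not all(value == 0 for value in cf) for cf in base_counterfactuals]
    for label, has_cf in zip(labels, found):
      if not has_cf:
        print(f'Removing {label} as does not have counterfactual.')
    base_counterfactuals = [cf for cf, has_cf in zip(base_counterfactuals, found) if has_cf]
    labels = [label for label, has_cf in zip(labels, found) if has_cf]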
    bounds = get_bounds(repeat_train_data[repetition].to_numpy())
    feature_ranges = {}

    data_to_explain = repeat_mlp_negative_instances[repetition].to_numpy()[i]

    manic = Manic(
        data_to_explain,
        base_counterfactuals,
        categorical_feature_idxs,
        immutable_feature_idxs,
        feature_ranges,
        data,
        predict_fn,
        predict_proba_fn,
        class_labels.to_numpy(),
        weights=[1/3, 1/3, 1/3],
        wachter=False,
        # sparse=True,
        verbose=2,
        num_generations=300,
        labels=labels,
        gamma=0.35,
        alpha=0.35,
        theta=0.1,
        beta=0.2,
        disagreement_method="direction_overlap",
        feature_entropy=0.3,
        perturbation_fraction=0.1,
        early_stopping = {'found': False, 'patience_generations': 10})
    # manic_counterfactual = mlp_manic_counterfactuals['best_counterfactual']

    mlp_manic_counterfactual = manic.generate_counterfactuals()['best_counterfactual']
    if mlp_manic_counterfactual is None:
        mlp_manic_counterfactual = nearest_unlike_neighbors(repeat_train_data[repetition].to_numpy(), train_predictions, data_to_explain, desired_idx)
    mlp_manic_counterfactuals.append(mlp_manic_counterfactual)

  repeat_mlp_manic_counterfactuals.append(mlp_manic_counterfactuals)
  print('----------------------------------------')

Streaming output truncated to the last 5000 lines.
Sparsity: 0.14285714285714285
Number of changes made to produce the counterfactual: 1
Disagreement Score against Base Counterfactuals: 0.19807421150278295
Number of Generations: 17
Counterfactual found after 7 generations
Fitness Score: 0.12896907002369568
Time taken to find counterfactual: 0.5691 seconds
Total time searched: 1.3588 seconds
Total CPU cycles ran: 137.7700
------ End of Results ------

Generation 1: Best Counterfactual = None, Fitness = inf
Generation 2: Best Counterfactual = None, Fitness = inf
Generation 3: Best Counterfactual = None, Fitness = inf
Generation 4: Best Counterfactual = None, Fitness = inf
Generation 5: Best Counterfactual = None, Fitness = inf
Generation 6: Best Counterfactual = [76.0, 6.8, 143.7, 1, 0.015, 0, 0.0], Fitness = 0.12083000611967878
Generation 7: Best Counterfactual = [76.0, 6.8, 143.7, 1, 0.015, 0, 0.0], Fitness = 0.12083000611967878
Generation 8: Best Counterfactual = [76.0, 

In [None]:
# Save Multi-layer Perceptron MANIC counterfactuals to Drive
with open(f'{filepath}repeat_mlp_manic_counterfactuals', 'wb') as file:
  pickle.dump(repeat_mlp_manic_counterfactuals, file)

In [None]:
# Save Multi-layer Perceptron Wachter counterfactuals to Drive
with open(f'{filepath}repeat_mlp_wachter_counterfactuals', 'wb') as file:
  pickle.dump(repeat_wachter_mlp_counterfactuals, file)

In [None]:
with open('/content/gdrive/MyDrive/Craig/diabetes/repeat_mlp_manic_counterfactuals_e03', 'wb') as file:
  pickle.dump(repeat_mlp_manic_counterfactuals, file)

In [None]:
# Load the Multi-layer Perceptron MANIC metacounterfactuals from the working directory
with open('repeat_mlp_manic_counterfactuals', 'rb') as f:
  repeat_mlp_manic_counterfactuals = pickle.load(f)

## Evaluation

In [None]:
# with open('/content/gdrive/MyDrive/Craig/diabetes/repeat_wachter_mlp_counterfactuals', 'rb') as f:
#   repeat_wachter_mlp_counterfactuals = pickle.load(f)
#   f.close()

# with open('/content/gdrive/MyDrive/Craig/diabetes/repeat_dice_mlp_counterfactuals', 'rb') as f:
#   repeat_dice_mlp_counterfactuals = pickle.load(f)
#   f.close()

# with open('/content/gdrive/MyDrive/Craig/diabetes/repeat_discern_mlp_counterfactuals', 'rb') as f:
#   repeat_discern_mlp_counterfactuals = pickle.load(f)
#   f.close()

# with open('/content/repeat_mlp_agreeable_counterfactuals', 'rb') as f:
#   repeat_mcf_e_disagreement_mlp_counterfactuals = pickle.load(f)
#   f.close()

# with open('/content/repeat_mlp_average_counterfactuals', 'rb') as f:
#   repeat_mcf_e_all_mlp_counterfactuals = pickle.load(f)
#   f.close()

# with open('/content/repeat_mlp_manic_counterfactuals', 'rb') as f:
#   repeat_mcf_o_mlp_counterfactuals = pickle.load(f)
#   f.close()

In [None]:
def calculate_sparsity(counterfactual, data_instance):
        num_changes = 0

        for i in range(len(counterfactual)):
            if counterfactual[i] != data_instance[i]:
                num_changes += 1

        sparsity = num_changes / len(counterfactual)

        return sparsity
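
`calculate_sparsity` compares features with exact inequality, so a value that differs only by floating-point rounding counts as a change. If that matters for a given explainer's output, a tolerance-based variant is one option. The sketch below is an assumption about how such a variant could look, not part of the notebook; the function name, the tolerances and the example values are illustrative.

```python
import math

def sparsity_with_tolerance(counterfactual, data_instance, rel_tol=1e-9, abs_tol=1e-9):
    """Fraction of features that differ by more than a small tolerance."""
    num_changes = sum(
        not math.isclose(cf_value, value, rel_tol=rel_tol, abs_tol=abs_tol)
        for cf_value, value in zip(counterfactual, data_instance)
    )
    return num_changes / len(counterfactual)

data_instance = [50.0, 1.0, 250.0, 0.3]
# The last feature equals 0.3 up to rounding error (0.1 + 0.2 != 0.3 exactly).
counterfactual = [50.0, 1.0, 220.0, 0.1 + 0.2]

tolerant = sparsity_with_tolerance(counterfactual, data_instance)
exact = sum(c != x for c, x in zip(counterfactual, data_instance)) / len(counterfactual)
```

With these values the tolerant version reports one changed feature out of four, while exact comparison reports two.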

In [None]:
def calculate_direction_overlap(data_instance, counterfactual, base_counterfactuals, data):
  cf_actions = {}
  base_cf_actions = {}
  base_explainer_names = ["Dice", "DisCERN", "Nice", "Wachter"]

  actions = []
  for feature in range(len(data_instance)):
    difference = data_instance[feature] - counterfactual[feature]

    if(difference < 0):
        actions.append("INC")
    elif(difference > 0):
        actions.append("DEC")
    else:
        actions.append("NONE")
  cf_actions = actions

  for b, base_counterfactual in enumerate(base_counterfactuals):
    actions = []
    for feature in range(len(data_instance)):
      difference = data_instance[feature] - base_counterfactual[feature]

      if(difference < 0):
          actions.append("INC")
      elif(difference > 0):
          actions.append("DEC")
      else:
          actions.append("NONE")
    base_cf_actions[base_explainer_names[b]] = actions



  direction_overlap_scores = []

  bounds = get_bounds(data)

  for i in range(len(base_counterfactuals)):

    matching_count = sum(1 for counterfactual_feature, base_counterfactual_feature in zip(cf_actions, base_cf_actions[base_explainer_names[i]]) if counterfactual_feature == base_counterfactual_feature)
    # print(matching_count)
    direction_overlap = 1 - (matching_count / len(data_instance))

    normalised_counterfactual = normalize_instance(counterfactual, bounds)
    normalised_base_counterfactual = normalize_instance(base_counterfactuals[i], bounds)
    manhattan_distance = calculate_manhattan_distance(normalised_counterfactual, normalised_base_counterfactual) / len(data_instance)

    manhattan_direction_overlap = (0.3 * manhattan_distance) + (0.7 * direction_overlap)

    direction_overlap_scores.append(manhattan_direction_overlap)

  return np.mean(direction_overlap_scores)

In [None]:
import copy
def getCounterfactualResults(counterfactuals, labels, base_counterfactuals):
  explainers = counterfactuals
  sparsity_results = []
  proximity_results = []
  disagreement_results = []

  sparsity_averages = []
  proximity_averages = []
  disagreement_averages = []

  filter = []

  for i, explainer in enumerate(explainers):
    explainer_sparsity_total = 0
    explainer_proximity_total = 0
    explainer_disagreement_total = 0

    explainer_sparsity_averages = []
    explainer_proximity_averages = []
    explainer_disagreement_averages = []


    repetition_explainers = []

    for repetition in range(len(repeats)):
      repetition_sparsity_total = 0
      repetition_proximity_total = 0
      repetition_disagreement_total = 0

      bounds = get_bounds(np.array(repeat_train_data[repetition]))

      for j in range(len(repeat_mlp_negative_instances[repetition])):
        counterfactual = explainer[repetition][j]

        data_instance = repeat_mlp_negative_instances[repetition].to_numpy()[j]
        sparsity = calculate_sparsity(counterfactual, data_instance)

        normalised_data_instance = normalize_instance(data_instance, bounds)
        normalised_counterfactual = normalize_instance(counterfactual, bounds)

        proximity = calculate_manhattan_distance(normalised_counterfactual, normalised_data_instance) / len(data_instance)

        base_explainer_cfs = []
        for b in base_counterfactuals:
          base_explainer_cfs.append(b[repetition][j])

        # print(len(base_explainer_cfs), labels[i])
        total_disagreement = calculate_direction_overlap(data_instance, counterfactual, base_explainer_cfs, repeat_train_data[repetition].to_numpy())


        repetition_sparsity_total += sparsity
        repetition_proximity_total += proximity
        repetition_disagreement_total += total_disagreement

      repetition_sparsity_total = repetition_sparsity_total / len(repeat_mlp_negative_instances[repetition])
      repetition_proximity_total = repetition_proximity_total / len(repeat_mlp_negative_instances[repetition])
      repetition_disagreement_total = repetition_disagreement_total / len(repeat_mlp_negative_instances[repetition])

      explainer_sparsity_averages.append(repetition_sparsity_total)
      explainer_proximity_averages.append(repetition_proximity_total)
      explainer_disagreement_averages.append(repetition_disagreement_total)

      explainer_sparsity_total += repetition_sparsity_total
      explainer_proximity_total += repetition_proximity_total
      explainer_disagreement_total += repetition_disagreement_total

    explainer_sparsity_total = explainer_sparsity_total / len(repeats)
    explainer_proximity_total = explainer_proximity_total / len(repeats)
    explainer_disagreement_total = explainer_disagreement_total / len(repeats)


    sparsity_averages.append(explainer_sparsity_averages)
    proximity_averages.append(explainer_proximity_averages)
    disagreement_averages.append(explainer_disagreement_averages)


    sparsity_results.append(explainer_sparsity_total)
    proximity_results.append(explainer_proximity_total)
    disagreement_results.append(explainer_disagreement_total)

  return proximity_results, sparsity_results, disagreement_results
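
The proximity score computed above is a Manhattan distance between the normalised counterfactual and the normalised query, divided by the number of features. The sketch below illustrates that calculation in isolation. The helpers `normalize_instance` and `calculate_manhattan_distance` are defined elsewhere in the notebook, so `min_max_normalise` and `mean_manhattan_proximity` are hypothetical stand-ins, and min-max scaling from `(lower, upper)` bounds is an assumption about how the normalisation works.

```python
def min_max_normalise(instance, bounds):
    """Scale each feature to [0, 1] given (lower, upper) bounds per feature (assumed scheme)."""
    normalised = []
    for value, (lower, upper) in zip(instance, bounds):
        span = upper - lower
        # Guard against constant features, whose span would be zero
        normalised.append((value - lower) / span if span else 0.0)
    return normalised


def mean_manhattan_proximity(query, counterfactual, bounds):
    """Manhattan distance between the normalised vectors, divided by the feature count."""
    q = min_max_normalise(query, bounds)
    c = min_max_normalise(counterfactual, bounds)
    distance = sum(abs(a - b) for a, b in zip(q, c))
    return distance / len(query)


# Hypothetical three-feature example (values and bounds are illustrative only)
bounds = [(0, 100), (0, 10), (0, 1)]
query = [50, 5, 0]
counterfactual = [60, 5, 1]

proximity = mean_manhattan_proximity(query, counterfactual, bounds)
print(f"Proximity: {proximity:.3f}")
```

Dividing by the feature count keeps the score comparable across datasets with different numbers of attributes; lower values mean the counterfactual stays closer to the query.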

In [None]:
# eval_proximal = []
# eval_sparse = []
eval_agreeable = []
eval_average = []
eval_dice = []
for repetition in range(len(repeats)):
  # repeat_proximal = []
  # repeat_sparse = []
  repeat_agreeable = []
  repeat_average = []
  repeat_dice = []

  for i in range(len(repeat_mlp_negative_instances[repetition])):
    # repeat_proximal.append(repeat_mlp_proximal_counterfactuals[repetition][i][0])
    # repeat_sparse.append(repeat_mlp_sparse_counterfactuals[repetition][i][0])
    # repeat_agreeable.append(repeat_mcf_e_disageement_counterfactuals[repetition][i])
    # repeat_average.append(repeat_mcf_e_all_mlp_counterfactuals[repetition][i][0])
    repeat_dice.append(np.array(repeat_dice_mlp_counterfactuals[repetition][i]).astype('float'))

  # eval_proximal.append(repeat_proximal)
  # eval_sparse.append(repeat_sparse)
  eval_agreeable.append(repeat_agreeable)
  eval_average.append(repeat_average)
  eval_dice.append(repeat_dice)

In [None]:
eval_dice

[[array([9.1e+01, 5.0e+00, 6.8e+01, 1.0e+00, 1.5e-02, 0.0e+00, 0.0e+00]),
  array([7.90e+01, 5.00e+00, 2.23e+02, 0.00e+00, 1.70e-02, 0.00e+00,
         2.80e+01]),
  array([1.09e+02, 8.00e+00, 1.50e+02, 0.00e+00, 1.70e-02, 1.00e+00,
         3.00e+01]),
  array([4.10e+01, 1.00e+00, 3.33e+02, 0.00e+00, 1.70e-02, 0.00e+00,
         0.00e+00]),
  array([8.70e+01, 9.00e+00, 3.45e+02, 1.00e+00, 1.50e-02, 0.00e+00,
         1.80e+01]),
  array([5.10e+01, 3.00e+00, 1.35e+02, 0.00e+00, 1.70e-02, 0.00e+00,
         2.70e+01]),
  array([8.7e+01, 3.0e+00, 9.0e+01, 1.0e+00, 1.5e-02, 1.0e+00, 3.0e+01]),
  array([7.7e+01, 7.0e+00, 2.4e+02, 0.0e+00, 1.7e-02, 0.0e+00, 1.2e+01]),
  array([8.4e+01, 3.0e+00, 3.6e+02, 1.0e+00, 1.5e-02, 1.0e+00, 2.8e+01]),
  array([7.90e+01, 3.00e+00, 3.34e+02, 1.00e+00, 1.50e-02, 0.00e+00,
         0.00e+00]),
  array([4.0e+01, 1.0e+00, 3.0e+02, 1.0e+00, 1.5e-02, 1.0e+00, 2.5e+01]),
  array([6.80e+01, 2.00e+00, 3.15e+02, 0.00e+00, 1.70e-02, 1.00e+00,
         3.00e+01]),


In [None]:
labels = ['Dice', 'Discern', 'Nice', 'Wachter', 'Agreeable', 'Average', 'Manic']
proximity, sparsity, disagreement = getCounterfactualResults([eval_dice, repeat_discern_mlp_counterfactuals, repeat_nice_mlp_counterfactuals, repeat_wachter_mlp_counterfactuals, repeat_mlp_agreeable_counterfactuals, repeat_mlp_average_counterfactuals, repeat_mlp_manic_counterfactuals], labels,
 [eval_dice, repeat_discern_mlp_counterfactuals, repeat_nice_mlp_counterfactuals, repeat_wachter_mlp_counterfactuals])

for label_idx, label in enumerate(labels):
  print(f'Counterfactual: {label}, Proximity: {proximity[label_idx]:.3f}, Sparsity: {sparsity[label_idx]:.3f}, Disagreement: {disagreement[label_idx]:.3f}')

Counterfactual: Dice, Proximity: 0.133, Sparsity: 0.249, Disagreement: 0.173
Counterfactual: Discern, Proximity: 0.068, Sparsity: 0.216, Disagreement: 0.183
Counterfactual: Nice, Proximity: 0.060, Sparsity: 0.188, Disagreement: 0.148
Counterfactual: Wachter, Proximity: 0.154, Sparsity: 0.523, Disagreement: 0.234
Counterfactual: Agreeable, Proximity: 0.063, Sparsity: 0.202, Disagreement: 0.146
Counterfactual: Average, Proximity: 0.047, Sparsity: 0.191, Disagreement: 0.172
Counterfactual: Manic, Proximity: 0.092, Sparsity: 0.227, Disagreement: 0.171
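
Each figure above is a mean of per-repetition means: `getCounterfactualResults` first averages over the instances within a repetition and then averages those values over the repetitions. The sketch below, using made-up scores, shows how this differs from pooling all instances when repetitions contain different numbers of instances. The variable names are hypothetical and not taken from the notebook.

```python
from statistics import mean

# Hypothetical per-instance sparsity scores for two repetitions of unequal size
per_instance_scores = [
    [0.2, 0.4],
    [0.1, 0.1, 0.1, 0.1],
]

# Step 1: average within each repetition
repetition_means = [mean(scores) for scores in per_instance_scores]

# Step 2: average the repetition means (the aggregation used for the reported scores)
mean_of_means = mean(repetition_means)

# For comparison: pool every instance and average once
pooled_mean = mean([s for scores in per_instance_scores for s in scores])

print(f"Mean of repetition means: {mean_of_means:.3f}")
print(f"Pooled mean:              {pooled_mean:.3f}")
```

The mean of means weights every repetition equally, whereas the pooled mean weights every instance equally, so the two only coincide when all repetitions have the same number of instances.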


#### Statistical Testing

In [None]:
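stat_sparsity_averages = []

for explainer in range(len(explainers)):
  for repetition in range(len(repeats)):
    # Iterate over the negative instances of this repetition
    for i in range(len(repeat_mlp_negative_instances[repetition])):
      # TODO: the per-instance value to collect is not specified
      stat_sparsity_averages.append()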


In [None]:
import numpy as np
from scipy.stats import ttest_rel

# Per-repetition average sparsity scores for each method
dice_scores = sparsity_averages[0]
discern_scores = sparsity_averages[1]
nice_scores = sparsity_averages[2]

# Paired t-test of Discern against each of the other methods
methods = [dice_scores, nice_scores]
method_names = ['Dice', 'Nice']

alpha = 0.05  # Significance level

for i, base_scores in enumerate(methods):
    t_statistic, p_value = ttest_rel(discern_scores, base_scores)

    print(f"Paired t-test Results for {method_names[i]}:")
    print("T-statistic:", t_statistic)
    print("P-value:", p_value)

    # The test is two-sided: a small p-value indicates a difference in means,
    # and the sign of the t-statistic gives its direction (Discern minus base).
    if p_value < alpha:
        print("Reject null hypothesis: The mean sparsity differs significantly.")
    else:
        print("Fail to reject null hypothesis: No significant difference.")
    print()


Paired t-test Results for Dice:
T-statistic: -13.656810061481984
P-value: 2.832107447559744e-11
Reject null hypothesis: The mean sparsity differs significantly.

Paired t-test Results for Nice:
T-statistic: -15.234527383189734
P-value: 4.188183107933548e-12
Reject null hypothesis: The mean sparsity differs significantly.
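
A two-sided paired t-test only establishes that the means differ; the direction comes from the sign of the t-statistic, which `ttest_rel(a, b)` computes on the differences `a - b`. The sketch below reads the values reported above and, because two comparisons are made, also applies a Bonferroni correction. The correction is an addition for illustration and is not part of the original analysis; the dictionary layout and variable names are hypothetical.

```python
# t-statistics and p-values copied from the output above
reported = {
    "Dice": {"t": -13.656810061481984, "p": 2.832107447559744e-11},
    "Nice": {"t": -15.234527383189734, "p": 4.188183107933548e-12},
}

alpha = 0.05
n_comparisons = len(reported)

summary = {}
for name, result in reported.items():
    # Bonferroni: multiply each p-value by the number of comparisons, capped at 1
    adjusted_p = min(1.0, result["p"] * n_comparisons)
    # Negative t means the first sample (Discern) has the lower mean
    direction = "lower" if result["t"] < 0 else "higher"
    summary[name] = {
        "adjusted_p": adjusted_p,
        "significant": adjusted_p < alpha,
        "direction": direction,
    }
    print(f"Discern vs {name}: adjusted p = {adjusted_p:.3e}, "
          f"significant = {adjusted_p < alpha}, Discern mean is {direction}")
```

Whether a lower mean counts as better depends on how the sparsity score is defined earlier in the notebook, so the sketch reports only the direction.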



## References



[1] Mothilal, Ramaravind K., Amit Sharma, and Chenhao Tan. "Explaining machine learning classifiers through diverse counterfactual explanations." Proceedings of the 2020 conference on fairness, accountability, and transparency. 2020.

[2] Brughmans, Dieter, Pieter Leyman, and David Martens. "Nice: an algorithm for nearest instance counterfactual explanations." Data Mining and Knowledge Discovery (2023): 1-39.

[3] Wijekoon, Anjana, et al. "How Close Is Too Close? The Role of Feature Attributions in Discovering Counterfactual Explanations." International Conference on Case-Based Reasoning. Cham: Springer International Publishing, 2022.

[4] Van Looveren, Arnaud, and Janis Klaise. "Interpretable counterfactual explanations guided by prototypes." Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Cham: Springer International Publishing, 2021.

[5] Becker, Barry, and Ronny Kohavi. "Adult." UCI Machine Learning Repository, 1996. https://doi.org/10.24432/C5XW20.



In [None]:
def calculate_entropy(query, base_counterfactuals):
    """Mean Shannon entropy of the change directions (INC/DEC/NONE) across base counterfactuals."""
    base_counterfactual_entropies = []

    # Loop over each base counterfactual
    for base_counterfactual in base_counterfactuals:
        actions = []

        # Record the direction in which each feature changes relative to the query
        for i in range(len(query)):
            difference = query[i] - base_counterfactual[i]

            if difference < 0:
                actions.append("INC")
            elif difference > 0:
                actions.append("DEC")
            else:
                actions.append("NONE")

        total_actions = len(actions)

        # Entropy of the action distribution. This value is the same for every
        # feature, so it is computed once per counterfactual.
        base_counterfactual_entropy = 0
        for action in ("INC", "DEC", "NONE"):
            count = actions.count(action)
            if count != 0:
                base_counterfactual_entropy += -(count / total_actions) * np.log2(count / total_actions)

        base_counterfactual_entropies.append(base_counterfactual_entropy)

    # Disagreement is the mean entropy over all base counterfactuals
    disagreement = np.mean(base_counterfactual_entropies)

    return disagreement

calculate_entropy([1,2,3,4,5], [[2,3,4,5,6], [2,3,4,5,4], [1,2,3,5,4]])
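
To make the call above easier to follow, here is a minimal standalone sketch of the same idea using only the standard library: each counterfactual is reduced to a list of change directions, and the Shannon entropy of that list is taken in bits. The function name `direction_entropy` is hypothetical and is not part of the notebook.

```python
from collections import Counter
from math import log2


def direction_entropy(query, counterfactual):
    """Shannon entropy (in bits) of the INC/DEC/NONE change directions."""
    actions = []
    for q, c in zip(query, counterfactual):
        if c > q:
            actions.append("INC")
        elif c < q:
            actions.append("DEC")
        else:
            actions.append("NONE")

    counts = Counter(actions)
    total = len(actions)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


query = [1, 2, 3, 4, 5]
base_counterfactuals = [[2, 3, 4, 5, 6], [2, 3, 4, 5, 4], [1, 2, 3, 5, 4]]

# One entropy per base counterfactual, then their mean as the disagreement score
entropies = [direction_entropy(query, cf) for cf in base_counterfactuals]
disagreement = sum(entropies) / len(entropies)

for cf, h in zip(base_counterfactuals, entropies):
    print(f"{cf}: {h:.3f} bits")
print(f"Disagreement: {disagreement:.3f}")
```

The first counterfactual increases every feature, so its directions are uniform and its entropy is 0. The second increases four features and decreases one, giving roughly 0.722 bits. Mixing all three directions, as in the third, pushes the entropy higher still.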