# DiCE ML Comparison with Optimal Point Method 

In this notebook, we compare DiCE's model-agnostic counterfactual methods with the Optimal Point method.

1. First, we import DiCE ML and its model-agnostic methods.
2. Second, we import the package files needed to run the Optimal Point methodology.
3. Third, we run the experiments with different models, such as an SVM and a random forest classifier, and compare the mean counterfactual distances of both methods.
4. Finally, we compare the runtimes of DiCE and the Optimal Point methodology.
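
Both methods are scored the same way in the cells below: each counterfactual's Euclidean distance to the query point is recorded, and the mean over all sampled points is printed (DiCE contributes several counterfactuals per query). A lower mean distance means the counterfactuals stay closer to the original points. The minimal sketch below computes that metric in plain Python; `euclidean` and `mean_cf_distance` are hypothetical names, not helpers from this notebook's package.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain L2 distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_cf_distance(queries, counterfactuals) -> float:
    """Average L2 distance between each query point and its counterfactual.

    `queries` and `counterfactuals` are parallel lists of feature vectors.
    """
    if not queries or len(queries) != len(counterfactuals):
        raise ValueError("need one counterfactual per query")
    dists = [euclidean(q, cf) for q, cf in zip(queries, counterfactuals)]
    return sum(dists) / len(dists)


# Two queries and their counterfactuals: the distances are 5.0 and 1.0
print(mean_cf_distance([[0, 0], [1, 1]], [[3, 4], [1, 2]]))  # 3.0
```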

Note: Running the experiments on the adult income dataset can take hours. Please be mindful of the runtime of these experiments.

# Step 1: Importing DiCE ML and helper functions 

Below, we import DiCE ML and its helper functions, together with scikit-learn and the classifiers and dataset utilities needed to run the experiments.

In [None]:
# import DiCE
import dice_ml
from dice_ml.utils import helpers

from sklearn.datasets import fetch_openml
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import svm

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # used by optimal_point() when plot=True

import json
import datetime
import random
import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="sklearn")

import os
import sys
# Make the parent directory importable so the local `files` package resolves
nb_dir = os.path.split(os.getcwd())[0]
if nb_dir not in sys.path:
    sys.path.append(nb_dir)

import ssl
# Disables HTTPS certificate verification for urllib downloads in this process (e.g., dataset downloads).
# This is insecure and should be avoided where possible.
ssl._create_default_https_context = ssl._create_unverified_context


# Step 2: Import the functionality needed for the Optimal Point methodology

We import the helper functions that ```optimal_point()``` (defined below) relies on from ```files.common_functions```, and ```multi_decision_boundary``` from the ```binary_search_optimal_point``` module.
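
The helpers in ```files.common_functions``` are not shown in this notebook. For orientation only, here is a minimal NumPy sketch of what ```euclidean_distance``` and ```closest_point``` are assumed to compute, inferred from how they are called below (a point of shape ```(1, n)``` and a ```contour``` array of candidate boundary points). The ```_sketch``` functions are hypothetical stand-ins, not the package's code.

```python
import numpy as np


def euclidean_distance_sketch(a, b) -> float:
    """L2 distance between two points given as (n,) or (1, n) arrays."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.linalg.norm(a - b))


def closest_point_sketch(point, contour):
    """Return the row of `contour` (shape (m, n)) nearest to `point`, as shape (1, n)."""
    point = np.asarray(point, dtype=float).reshape(1, -1)
    contour = np.asarray(contour, dtype=float)
    # Squared distances from the point to every candidate boundary point
    sq_dists = np.sum((contour - point) ** 2, axis=1)
    return contour[np.argmin(sq_dists)].reshape(1, -1)


boundary = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
query = np.array([[0.2, 0.1]])
nearest = closest_point_sketch(query, boundary)
print(nearest, euclidean_distance_sketch(query, nearest))
```

Comparing squared distances is enough to pick the nearest row, since the square root does not change the ordering.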

In [None]:
from files.common_functions import euclidean_distance, closest_border_point, closest_point, move_from_A_to_B_with_x1_displacement
from files.common_functions import get_multi_dim_border_points, det_constraints, real_world_constraints, constraint_bounds
from files.common_functions import balance_dataset, check_class_balance, convert_columns
from files.binary_search_optimal_point import multi_decision_boundary

In [None]:
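# Load cuML's GPU acceleration for scikit-learn (cuml.accel).
# It patches scikit-learn when scikit-learn is imported, so it may need to run before sklearn is imported to take effect.
%load_ext cuml.accel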

In [None]:
df = pd.read_csv('../../toy_dataset.csv')
# SVM classifier with polynomial decision boundary
svm_classifier = svm.SVC(kernel='poly', C=10, degree=2, probability=True, random_state=1)
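
With ```kernel='poly'``` and ```degree=2```, scikit-learn's ```SVC``` uses the polynomial kernel below, where $\gamma$ is the ```gamma``` parameter (default ```'scale'```), $r$ is ```coef0``` (default 0), and the dual coefficients satisfy $0 \le \alpha_i \le C$ with $C = 10$ here. The decision boundary is the zero set of $f$. With $d = 2$, $f$ is a quadratic polynomial in $\mathbf{x}$, so on the two-feature toy dataset the boundary is a quadratic curve (a conic section).

```latex
\begin{aligned}
K(\mathbf{x}, \mathbf{x}') &= \left(\gamma\,\langle \mathbf{x}, \mathbf{x}' \rangle + r\right)^{d}, \qquad d = 2,\\
f(\mathbf{x}) &= \sum_{i \in \mathrm{SV}} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b,\\
\text{decision boundary} &= \{\mathbf{x} : f(\mathbf{x}) = 0\}.
\end{aligned}
```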

In [None]:
def optimal_point(dataset, model, desired_class, original_class, chosen_row=-1, threshold=10000, point_epsilon=0.1, epsilon=0.01, constraints=[], delta=15, plot=False, step=0.5):
    """
    Finds the closest point on the decision boundary to an undesired point,
    optionally constrained by real-world conditions.
    This yields a counterfactual explanation for the chosen point by minimizing its distance to the decision boundary.
    This version addresses a key problem with the original optimal_point() function, which generated an R^n grid that then had to be iterated over.
    For high-dimensional data (e.g., 20, 30 or 40 features), such a grid eventually exhausts memory and the function crashes.
    In addition, because the number of grid points grows exponentially with the number of features, the grid quickly becomes infeasible to search (curse of dimensionality).

    Parameters
    ----------
    dataset : pd.DataFrame
        Full dataset containing features and a final column with class labels.

    model : sklearn-like classifier
        A binary classification model with `.fit()` and `.predict()` methods.

    desired_class : int or label
        The target class we want the corrected point to belong to.

    original_class : int or label
        The actual class label of the undesired point.

    chosen_row : int, optional
        Position of the dataset row to find the counterfactual explanation for. Default is -1 (the last row).

    threshold : int, optional
        Max number of decision boundary points to sample. Default is 10000.

    point_epsilon : float, optional
        Precision used to estimate decision boundary points. Default is 0.1.

    epsilon : float, optional
        Relative overshoot applied when displacing the point toward the decision boundary
        (the displacement is scaled by 1 + epsilon). Default is 0.01.

    constraints : list of (feature, kind) tuples, optional
        Real-world constraints on the features, e.g. ("age", "equal"). Default is [].

    delta : int, optional
        Tolerance or maximum displacement for each continuous feature. Default is 15.
        It is also returned so that run_dice_cfs() can apply the same bound.

    plot : bool, optional
        Whether to plot the bounded candidate points (requires matplotlib). Default is False.

    step : float, optional
        Currently unused; the bounded candidate grid uses a fixed step of 0.1. Default is 0.5.

    Returns
    -------
    tuple
        (dataset, model, query_instance, final_optimal_datapt, distance, delta, boundary_points), where
        dataset is the class-balanced dataset, model is the fitted classifier, query_instance holds the
        feature values of the chosen row, final_optimal_datapt is the counterfactual (with categorical
        codes mapped back to their labels), distance is the Euclidean distance between the query and the
        counterfactual, delta is passed through unchanged, and boundary_points are the sampled
        decision-boundary points.

    Raises
    ------
    RuntimeError
        If the classes cannot be balanced for binary classification.
    Exception
        If the number of constraints exceeds the number of features.

    Notes
    -----
    - This function trains the model on a balanced sample of the dataset, generates boundary points using
      `multi_decision_boundary`, applies constraints, and finds the closest optimal point.
    - Assumes binary classification and relies on external helpers such as `real_world_constraints`,
      `closest_point` and `move_from_A_to_B_with_x1_displacement` from `files.common_functions`.
    - `plot=True` plots the bounded candidate points; the remaining plotting code is commented out but can be enabled.
    - Print statements are for progress tracking.

    Examples
    --------
    >>> df = pd.read_csv('../../toy_dataset.csv')
    >>> model = svm.SVC(kernel='poly', C=10, degree=2, probability=True, random_state=1)
    >>> _, model, query, cf, dist, _, boundary = optimal_point(df, model, desired_class=-1, original_class=1, chosen_row=0)
    >>> print(cf, dist)
    """

    # Convert categorical columns if needed (before balancing)
    inv_col_map = convert_columns(dataset)

    # Extract features and labels before balancing
    X_orig = dataset.iloc[:, :-1]

    # Save the original row's feature values
    undesired_coords = X_orig.iloc[chosen_row, :].copy()

    # Balance the dataset
    dataset = balance_dataset(df=dataset, target=dataset.columns[-1])

    if not check_class_balance(dataset, target=dataset.columns[-1]):
        raise RuntimeError("Failed to balance classes for binary classification")

    sampled_dataset = dataset.sample(n=min(dataset.shape[0], 20000))

    # Extract new training features/labels after balancing
    X_train = sampled_dataset.iloc[:, :-1]
    y_train = sampled_dataset.iloc[:, -1]
    # Train the model
    print("Fitting model...")
    model.fit(X_train, y_train)
    print("Model training complete.")

    # -------------------------------
    # STEP 2: Find decision boundary
    # -------------------------------
    print("boundary points started generation...")

    # This step uses binary interpolation to get points close to the decision boundary
    boundary_points = multi_decision_boundary(model, X_train, y_train,
                                             threshold=threshold, epsilon=point_epsilon)
    print("boundary points finished.")
    print(boundary_points.shape)
    # Detect categorical features (assumed as int columns)
    categorical_features = X_train.select_dtypes(include=['int32', 'int64', 'int8']).columns.tolist()

    # Round categoricals to int for discrete values
    for col in categorical_features:
        boundary_points[col] = boundary_points[col].astype(int)

    # -------------------------------
    # STEP 3: Apply real-world constraints (optional)
    # -------------------------------
    # Reduce boundary points based on external rules (e.g., cost limits, physics constraints)
    contours_pd = real_world_constraints(points=boundary_points,
                                      undesired_coords=undesired_coords,
                                      constraints=constraints)
    undesired_datapt = np.reshape(undesired_coords, (1, -1))  # Reshape undesired point to 2D array

    # if plot:
    #     plt.plot(contours[:,0], contours[:,1], lw=0.5, color='red')  # Commented: Plot contours for visualization

    # -------------------------------
    # STEP 4: Find closest point on constrained boundary
    # -------------------------------
    if contours_pd is not None and desired_class != original_class:
        contours = contours_pd.to_numpy()
        print("Finding the closest point from the contour line to the point...")
        contours_pd.reset_index(drop=True, inplace=True)
        optimal_datapt = closest_point(undesired_datapt, contour=contours)
        print("Found the closest point from the contour line to the point.")
        D = optimal_datapt - undesired_datapt  # Compute direction vector
        deltas = D * (1+epsilon)  # Scale by (1 + epsilon) to overshoot
        optimal_datapt = move_from_A_to_B_with_x1_displacement(undesired_datapt, optimal_datapt, deltas=deltas)
    elif desired_class == original_class or contours_pd is None:
        # If we want to *stay within* the same class (more constrained)
        all_constrained_feats = [var for (var,_) in constraints]
        closest_boundedpt = None
        vars = set(X_train.columns) - set(all_constrained_feats)
        cont_mutable_vars = [X_train.columns.get_loc(col) for col in vars]
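        # `deltas` is not defined in this branch; assumption: det_constraints takes the per-feature tolerance `delta`
        deltas, len_constr = det_constraints(datapt=undesired_datapt[0], vars=cont_mutable_vars, deltas=delta)  # Determine constraints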

        if len_constr > X_train.shape[1]:
            raise Exception("There cannot be more constraints than features")
        else:
            # All n dimensions are constrained, so generate an exact grid of boundary candidates
            bounded_contour_pts = get_multi_dim_border_points(center=undesired_datapt[0],
                                                              extents=deltas,
                                                              step=0.1)
            np_bounded_contour = np.array(bounded_contour_pts)  # Convert to NumPy array
            if plot:
                x_values, y_values = np_bounded_contour[:, 0], np_bounded_contour[:, 1]  # Extract x/y for plotting
                plt.scatter(x_values, y_values, marker='o')  # Plot bounded points

            closest_boundedpt = closest_border_point(np_bounded_contour, contour=boundary_points)  # Find closest on border
            print(closest_boundedpt)
        D = closest_boundedpt - undesired_datapt  # Compute direction
        optimal_datapt = move_from_A_to_B_with_x1_displacement(undesired_datapt, closest_boundedpt, deltas=D)  # Move point
    # Plot original and optimal points with connecting line
    # if plot:
    #     plt.scatter(undesired_datapt[0][0], undesired_datapt[0][1], c = 'r')  # Plot undesired point
    #     plt.text(undesired_datapt[0][0]+0.002, undesired_datapt[0][1]+0.002, 'NH')  # Label 'NH' (e.g., Non-Healthy)
    #     plt.scatter(optimal_datapt[0][0], optimal_datapt[0][1], c = 'g')  # Plot optimal point (changed to green for distinction)
    #     plt.text(optimal_datapt[0][0]+0.002, optimal_datapt[0][1]+0.002, 'H')  # Label 'H' (e.g., Healthy)
    #     plt.plot([undesired_datapt[0][0], optimal_datapt[0][0]], [undesired_datapt[0][1],optimal_datapt[0][1]], linestyle='--')  # Dashed line between points

    categorical_features = [col for col in inv_col_map.keys()]
    final_optimal_datapt = []

    for col in X_train.columns:
        if col in categorical_features:
            idx = int(optimal_datapt[0,X_train.columns.get_loc(col)])
            final_optimal_datapt.append(inv_col_map[col][idx])
        else:
            final_optimal_datapt.append(optimal_datapt[0,X_train.columns.get_loc(col)])
    query_instance = undesired_coords
    return dataset, model, query_instance, final_optimal_datapt, euclidean_distance(undesired_datapt, optimal_datapt), delta, boundary_points

In [None]:
def clamp_vec_per_axis(v, ref_point, bool_vec, frac=0.05):
    """
    Clamp the endpoint ref_point + v per axis so that each mutable coordinate stays
    within +/- (frac/2) * |ref_point_i| of ref_point.

    Args:
      v         : array-like shape (n,) or (1, n), displacement from ref_point
      ref_point : array-like shape (n,) or (1, n)
      bool_vec  : 0/1 (or boolean) vector; 1 marks numeric features that may change,
                  0 marks categorical features that are held at their ref_point value
      frac      : total width of the allowed interval as a fraction of |ref_point_i| (default 0.05)

    Returns:
      the clamped endpoint, bounded within half of the interval on both sides of ref_point
    """
    ref_point = np.asarray(ref_point, dtype=float)
    mutable = np.asarray(bool_vec, dtype=bool)
    half_width = (frac / 2) * np.abs(ref_point)

    # Masked (categorical) features get a zero-width interval, so they stay at ref_point
    lower_bounds = np.where(mutable, ref_point - half_width, ref_point)
    upper_bounds = np.where(mutable, ref_point + half_width, ref_point)

    # Clamp the endpoint into the allowed box
    return np.clip(ref_point + np.asarray(v, dtype=float), lower_bounds, upper_bounds)
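
With ```delta=20``` (as in the adult income and heart disease runs), ```frac = delta/100 = 0.2```, so each numeric coordinate may move at most ```0.1 * |x_i|``` in either direction. The worked example below uses a standalone copy of the clamping rule as the docstring describes it (```v``` is a displacement, and features with ```bool_vec == 0``` are held fixed); ```clamp_endpoint``` is a hypothetical name and the numbers are illustrative.

```python
import numpy as np


def clamp_endpoint(v, ref_point, mutable, frac):
    """Clamp ref_point + v to ref_point +/- (frac/2)*|ref_point| on mutable axes;
    immutable axes are held at ref_point."""
    ref_point = np.asarray(ref_point, dtype=float)
    mutable = np.asarray(mutable, dtype=bool)
    half_width = (frac / 2) * np.abs(ref_point)
    lower = np.where(mutable, ref_point - half_width, ref_point)
    upper = np.where(mutable, ref_point + half_width, ref_point)
    return np.clip(ref_point + np.asarray(v, dtype=float), lower, upper)


ref = np.array([40.0, 50.0, 3.0])   # two numeric features and one encoded categorical
cf = np.array([55.0, 48.0, 5.0])    # a candidate counterfactual
mutable = [1, 1, 0]                 # the categorical feature is frozen
clamped = clamp_endpoint(cf - ref, ref, mutable, frac=0.20)
print(clamped)  # [44. 48.  3.]
```

The allowed boxes are [36, 44] and [45, 55] for the numeric features, so 55 is clipped to 44, 48 is kept, and the categorical value returns to 3.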

In [None]:
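def run_dice_cfs(df, model, query_instance, method, continuous_features, categorical_features, target, chosen_row, contours, plot=False, total_CFs=1, delta=100):
    """
    Generate `total_CFs` DiCE counterfactuals for `query_instance` and return their distances and the runtime.

    If delta == 100, each distance is measured between a counterfactual and the query instance.
    Otherwise, each counterfactual is first clamped to +/- (delta/2)% of the query's numeric features,
    and the distance is measured from the clamped point to the nearest point in `contours`.
    Returns (list of distances, elapsed seconds).
    """
    start = datetime.datetime.now()

    x_train = df.iloc[:, :-1]
    backend = 'sklearn'

    d = dice_ml.Data(dataframe=df, continuous_features=continuous_features, categorical_features=categorical_features, outcome_name=target)
    m = dice_ml.Model(model=model, backend=backend)

    exp_dice = dice_ml.Dice(d, m, method=method)

    # Use the query instance returned by optimal_point(); after class balancing,
    # row `chosen_row` of `df` is not guaranteed to be the same point
    query_instance = np.asarray(query_instance).reshape(1, -1)

    dice_cfs = exp_dice.generate_counterfactuals(pd.DataFrame(data=query_instance, columns=x_train.columns),
                                                 total_CFs=total_CFs, desired_class="opposite")

    cfs_list = json.loads(dice_cfs.to_json())['cfs_list']
    dist_cfs = []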
    # np_bounded_contour = np.array(bounded_contour_pts)  # Convert to NumPy array
    # x_values, y_values = np_bounded_contour[:, 0], np_bounded_contour[:, 1]  # Extract x/y for plotting
    # if plot:
    #     plt.scatter(x_values, y_values, marker='o')  # Plot bounded points
    contours = contours.reset_index(drop=True)

    bool_vec = []
    for col in df.iloc[:, :-1].columns:
        if np.issubdtype(df[col].dtype, np.number):
            bool_vec.append(1)   # numeric -> allow changes
        else:
            bool_vec.append(0)   # categorical -> mask out

    if delta == 100:
        # No clamping: distance from each counterfactual to the query instance
        for point in cfs_list[0]:
            point_vec = [float(value) for value in point[:-1]]  # drop the outcome column
            dist_cfs.append(euclidean_distance(np.array(point_vec), query_instance))
    else:
        # Clamp each counterfactual around the query, then measure its distance to the nearest boundary point
        for point in cfs_list[0]:
            point_vec = np.array([float(value) for value in point[:-1]]).reshape(1, -1)
            displacement = point_vec - query_instance  # clamp_vec_per_axis expects a displacement
            endpoint_vec = clamp_vec_per_axis(displacement, ref_point=query_instance, bool_vec=bool_vec, frac=delta/100)
            closest_pt = closest_point(endpoint_vec, contour=contours.to_numpy())
            dist_cfs.append(euclidean_distance(closest_pt, endpoint_vec))

    # if plot:
    #     for point in cfs_list[0][:5]:
    #         x,y = point[0], point[1]
    #         print("EUCLIDEAN DISTANCE:", euclidean_distance(delta*np.array((x,y)), query_instance))
    #         plt.scatter(x,y, c = 'yellow')  # Plot optimal point (changed to green for distinction)
    #         plt.text(x+0.002, y+0.002, 'H')  # Label 'H' (e.g., Healthy; adjusted from duplicate 'NH')
    #         plt.plot([x,query_instance[0][0]], [y, query_instance[0][1]], linestyle='--')  # Dashed line between points
    end = datetime.datetime.now()
    diff = end - start
    print(f"Elapsed time: {diff}")

    return dist_cfs, diff.total_seconds()

In [None]:
def exps(dataset, model, method, target, x_train, y_train, continuous_features, categorical_features, inv_map, num_samples, delta=100, constraints=[], threshold=25000):
    """
    Run the Optimal Point method and a DiCE method on `num_samples` random class-1 rows.
    Returns (optimal_dists, dice_dists): one distance per sampled row for Optimal Point,
    and `total_CFs` (10) distances per sampled row for DiCE.
    """
    dice_dists, optimal_dists = [], []
    sub_dataset = dataset[dataset[target] == 1]
    # Sample distinct positions within the class-1 subset (including its last row)
    random_integers = random.sample(range(sub_dataset.shape[0]), num_samples)

    for i in random_integers:
        real_idx = sub_dataset.index[i]
        chosen_row = real_idx  # index label; equals the row position for a default RangeIndex
        df, model, query_instance, opt_point, dist, exp_delta, boundary_points = optimal_point(
            dataset, model, desired_class=inv_map[1], original_class=1, threshold=threshold,
            chosen_row=chosen_row, point_epsilon=1e-3, epsilon=0.01, constraints=constraints,
            delta=delta, step=0.1)
        optimal_dists.append(dist)
        dist_cfs, _ = run_dice_cfs(df=df, model=model, query_instance=query_instance, method=method,
                                   continuous_features=continuous_features, categorical_features=categorical_features,
                                   target=target, contours=boundary_points, chosen_row=chosen_row,
                                   delta=exp_delta, total_CFs=10)
        dice_dists.extend(dist_cfs)

    return optimal_dists, dice_dists

# Step 3: Toy Dataset

We run experiments on the toy dataset and compare the mean distance of the counterfactuals found by the Optimal Point method with that of each DiCE model-agnostic method (```kdtree```, ```random``` and ```genetic```).

In [None]:
inv_map = {
    1: -1,
    -1: 1
}
x_train = df.iloc[:, :-1]
y_train = df.iloc[:, -1]
continuous_features = ['x1', 'x2']
categorical_features = []
target = 'y'

In [None]:
optimal_dists, dice_dists = exps(df, model=svm_classifier, method='kdtree',
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=500,
                                 target=target)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

In [None]:
optimal_dists, dice_dists = exps(df, model=svm_classifier, method='random',
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=500,
                                 target=target)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

In [None]:
optimal_dists, dice_dists = exps(df, model=svm_classifier, method='genetic',
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=500,
                                 target=target)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

# Step 3: Adult Income Dataset Experiments

We run a few experiments using the adult income dataset comparing DiCE model-agnostic methodologies and the Optimal Point method.

The experiments follow these steps:

1. Load the dataset with DiCE's ```helpers.load_adult_income_dataset()```
2. Initialize the classifier, a random forest classifier in this case
3. Run the Optimal Point method on ```num_samples``` randomly selected class-1 points (500 in the cells below)
4. After each Optimal Point run, call ```run_dice_cfs()``` to generate counterfactuals for the same point with the chosen DiCE model-agnostic method (```kdtree```, ```random``` or ```genetic```)

In [None]:
dataset = helpers.load_adult_income_dataset()

In [None]:
dataset.head()

In [None]:
# Random forest classifier used for the adult income experiments
clf = RandomForestClassifier()

In [None]:
x_train = dataset.iloc[:,:-1]
y_train = dataset.iloc[:,-1]
continuous_features=["age", "hours_per_week"]
categorical_features = ['marital_status', 'workclass', 'education', 'race', 'gender', 'occupation']
target='income'
inv_map = {
    0: 1,
    1: 0
}
constraints = [
    ("age", "equal"),
    ("workclass", "equal"),
    ("education", "equal"),
    ("race", "equal"),
    ("gender", "equal"),
    ("occupation", "equal"),
]
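
Each ```("feature", "equal")``` pair above asks that the counterfactual keep the query's value for that feature. How ```real_world_constraints``` enforces this is not shown in this notebook; one plausible reading, sketched below purely as an assumption, is to discard candidate boundary points whose constrained features differ from the query. ```filter_equal_constraints``` is a hypothetical name, categorical values are assumed to be integer-encoded, and only the ```"equal"``` kind is handled.

```python
import numpy as np


def filter_equal_constraints(points, query, columns, constraints):
    """Keep candidate points whose 'equal'-constrained features match the query exactly."""
    points = np.asarray(points, dtype=float)
    query = np.asarray(query, dtype=float).ravel()
    keep = np.ones(len(points), dtype=bool)
    for feature, kind in constraints:
        if kind != "equal":
            raise ValueError(f"unsupported constraint kind: {kind}")
        j = columns.index(feature)
        keep &= points[:, j] == query[j]
    return points[keep]


columns = ["age", "hours_per_week", "gender"]
candidates = np.array([[39, 45, 1], [40, 42, 0], [39, 50, 0]])
query = [39, 40, 0]
kept = filter_equal_constraints(candidates, query, columns, [("age", "equal"), ("gender", "equal")])
print(kept)
```

Only the last candidate keeps both the query's age and gender, so it is the only one that survives.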

In [None]:
optimal_dists, dice_dists = exps(dataset, clf, 'kdtree', target=target,
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=500, delta=20,
                                 constraints=constraints)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

In [None]:
optimal_dists, dice_dists = exps(dataset, clf, 'random', target=target,
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=500, delta=20,
                                 constraints=constraints)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

In [None]:
optimal_dists, dice_dists = exps(dataset, clf, 'genetic', target=target,
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=500, delta=20,
                                 constraints=constraints)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

# Step 3: Heart Disease Dataset Experiments

We run a few experiments using the heart disease dataset comparing DiCE model-agnostic methodologies and the Optimal Point method.

The experiments follow these steps:

1. Load the dataset from ```heart.csv``` with pandas
2. Reuse the polynomial-kernel SVM classifier defined earlier
3. Run the Optimal Point method on ```num_samples``` randomly selected class-1 points (100 for ```kdtree``` and 500 for the other methods in the cells below)
4. After each Optimal Point run, call ```run_dice_cfs()``` to generate counterfactuals for the same point with the chosen DiCE model-agnostic method (```kdtree```, ```random``` or ```genetic```)

In [None]:
heart_disease = pd.read_csv('../../heart.csv')

In [None]:
print(heart_disease.dtypes)

In [None]:
constraints = [
    ("age", "equal"),
    ("sex", "equal"),
    ("cp", "equal"),
    ("fbs", "equal"),
    ("restecg", "equal"),
    ("exang", "equal"),
    ("slope", "equal"),
    ("thal", "equal")
]

In [None]:
x_train = heart_disease.iloc[:,:-1]
y_train = heart_disease.iloc[:,-1]
continuous_features=["age", "trestbps", "thalach", "oldpeak", "chol"]
categorical_features=['sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'thal']
target='target'
inv_map = {
    0: 1,
    1: 0
}

In [None]:
optimal_dists, dice_dists = exps(dataset=heart_disease, model=svm_classifier, method='kdtree', target=target,
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=100, delta=20,
                                 constraints=constraints)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

In [None]:
optimal_dists, dice_dists = exps(dataset=heart_disease, model=svm_classifier, method='random', target=target,
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=500, delta=20,
                                 constraints=constraints)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

In [None]:
optimal_dists, dice_dists = exps(dataset=heart_disease, model=svm_classifier, method='genetic', target=target,
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=500, delta=20,
                                 constraints=constraints)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

# Comparing DiCE on a large synthetic dataset of numerical features

We use scikit-learn's ```make_classification``` function to generate a synthetic dataset of 2,000 samples with 20 informative numerical features, on which we compare both methodologies.

In [None]:
# 2,000 samples with 20 informative numerical features and a binary label
X, y = make_classification(n_samples=2000, n_features=20, n_informative=20, n_redundant=0, random_state=42, n_classes=2)
y = y.reshape(-1, 1)
columns = ["x"+str(i) for i in range(20)]
columns.append('y')
dataset = pd.DataFrame(data=np.hstack((X, y)), columns=columns)
model = LogisticRegression()
continuous_features = columns[:-1]
categorical_features = []
target = 'y'
# Defined here so this cell does not depend on the heart disease cells
inv_map = {
    0: 1,
    1: 0
}
x_train = dataset.iloc[:, :-1]
y_train = dataset.iloc[:, -1]

In [None]:
optimal_dists, dice_dists = exps(dataset=dataset, model=model,
                                 method='kdtree', target=target,
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=500)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

In [None]:
optimal_dists, dice_dists = exps(dataset=dataset, model=model,
                                 method='random', target=target,
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=500)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

In [None]:
optimal_dists, dice_dists = exps(dataset=dataset, model=model,
                                 method='genetic', target=target,
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=500)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

# Step 4: Runtime Tests 

The ```runtime_tests()``` function compares the runtime of DiCE's model-agnostic approaches and the Optimal Point method on synthetic datasets with 10 and 50 features, using a logistic regression classifier.

In [None]:
def runtime_tests(number_of_features, method, total_random=100):
    X, y = make_classification(n_samples=5000, n_features=number_of_features, n_informative=number_of_features,
                               n_redundant=0, n_classes=2, random_state=42)
    y = np.reshape(y, (-1, 1))
    columns = ["x"+str(i) for i in range(1, X.shape[1]+1)]
    columns.append('y')
    dataset = pd.DataFrame(data=np.hstack((X, y)), columns=columns)
    continuous_features = columns[:-1]
    target = 'y'
    inv_map = {
        0: 1,
        1: 0
    }
    dice_dists, optimal_dists = [], []
    dice_runtime, optimal_runtime = [], []
    sub_dataset = dataset[dataset[target] == 0]
    random_integers = random.sample(range(sub_dataset.shape[0]), total_random)
    clf = LogisticRegression()

    for i in random_integers:
        chosen_row = sub_dataset.index[i]
        label = y[chosen_row:chosen_row+1]
        # Time the Optimal Point method (including model fitting and boundary sampling)
        start = datetime.datetime.now()
        df, model, query_instance, opt_point, dist, _, contours = optimal_point(dataset, clf, desired_class=inv_map[label.item()], original_class=label.item(), threshold=5000, chosen_row=chosen_row, point_epsilon=1e-3, epsilon=0.01, constraints=[])
        optimal_runtime.append((datetime.datetime.now() - start).total_seconds())
        optimal_dists.append(dist)
        dist_cfs, total_seconds = run_dice_cfs(df=df, contours=contours, model=model, query_instance=query_instance, method=method, continuous_features=continuous_features, categorical_features=[], target=target, chosen_row=chosen_row)
        dice_dists.extend(dist_cfs)
        dice_runtime.append(total_seconds)

    print(f"Mean Optimal Point runtime: {np.mean(optimal_runtime):.3f} s")
    print(f"Mean DiCE ({method}) runtime: {np.mean(dice_runtime):.3f} s")

In [None]:
runtime_tests(number_of_features=10, method='kdtree')

In [None]:
runtime_tests(number_of_features=50, method='kdtree')

In [None]:
runtime_tests(number_of_features=10, method='random')

In [None]:
runtime_tests(number_of_features=50, method='random')

In [None]:
runtime_tests(number_of_features=10, method='genetic')

In [None]:
runtime_tests(number_of_features=50, method='genetic')