# DiCE ML Comparison with Optimal Point Method 

In this notebook we compare the DiCE model-agnostic methods with the Optimal Point method.

1. First, we import the DiCE ML model-agnostic methods.
2. Second, we import the packaged files needed to run the Optimal Point methodology.
3. Third, we run the experiments with different models, such as an SVM and a random forest classifier, and compare the results.
4. Finally, we compare the runtimes of DiCE and the Optimal Point methodology.

Note: The experiments on the Adult Income dataset can take hours to run, so plan for the runtime accordingly.

# Step 1: Importing DiCE ML and helper functions 

Below we import DiCE ML and its helper functions, along with the scikit-learn estimators and dataset utilities needed to run the experiments.

In [58]:
# import DiCE
import dice_ml

from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn import svm

import pandas as pd
import numpy as np
import json
import datetime

import warnings
import random
warnings.filterwarnings("ignore", category=UserWarning, module="sklearn")

import os
import sys
nb_dir = os.path.split(os.getcwd())[0]
if nb_dir not in sys.path:
    sys.path.append(nb_dir)

import urllib.request
from urllib.request import urlopen
import ssl
# Disables HTTPS certificate verification for dataset downloads; use only with trusted sources
ssl._create_default_https_context = ssl._create_unverified_context
from dice_ml.utils import helpers


# Step 2: Import the necessary functionality to make Optimal Point methodology work 

Below we import the helper functions that ```optimal_point()``` relies on from ```files.common_functions```, and ```multi_decision_boundary``` from ```files.binary_search_optimal_point```.

In [59]:
from files.common_functions import euclidean_distance, closest_border_point, closest_point, move_from_A_to_B_with_x1_displacement
from files.common_functions import get_multi_dim_border_points, det_constraints, real_world_constraints, constraint_bounds
from files.common_functions import balance_dataset, check_class_balance, convert_columns
from files.binary_search_optimal_point import multi_decision_boundary
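
The boundary search itself lives in ```files.binary_search_optimal_point``` and is not shown in this notebook. As a rough illustration of the idea, the sketch below bisects the segment between two points of opposite predicted class until the bracket is shorter than a tolerance. The function name `bisect_boundary`, the `predict` callable, and the toy circular classifier are assumptions made for illustration, not the packaged implementation.

```python
import math

def bisect_boundary(predict, a, b, tol=1e-3):
    """Return a point near the decision boundary on the segment a-b.

    Hypothetical helper: `predict` maps a point (tuple of floats) to a label,
    and `a` and `b` must receive different labels.
    """
    label_a = predict(a)
    if label_a == predict(b):
        raise ValueError("a and b must lie on opposite sides of the boundary")
    a, b = tuple(a), tuple(b)
    while math.dist(a, b) > tol:
        mid = tuple((ai + bi) / 2 for ai, bi in zip(a, b))
        if predict(mid) == label_a:
            a = mid  # midpoint is still on a's side, so move a forward
        else:
            b = mid  # midpoint crossed the boundary, so pull b back
    return tuple((ai + bi) / 2 for ai, bi in zip(a, b))

# Toy classifier (an assumption): label 1 inside the unit circle, -1 outside.
def toy_predict(p):
    return 1 if p[0] ** 2 + p[1] ** 2 < 1.0 else -1

boundary_pt = bisect_boundary(toy_predict, (0.0, 0.0), (2.0, 0.0))
print(boundary_pt)
```

Repeating this over many opposite-class pairs would yield a cloud of boundary points, which is roughly the role the `threshold` and `point_epsilon` arguments appear to play in ```optimal_point()```.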

In [60]:
# Loading cuML GPU acceleration library
%load_ext cuml.accel

The cuml.accel extension is already loaded. To reload it, use:
  %reload_ext cuml.accel


In [4]:
df=pd.read_csv('../../toy_dataset.csv')
# SVM classifier with polynomial decision boundary
svm_classifier = svm.SVC(kernel='poly',C=10, degree=2, probability=True, random_state=1)

In [56]:
def optimal_point(dataset, model, desired_class, original_class, chosen_row=-1, threshold=10000, point_epsilon=0.1, epsilon=0.01, constraints=[], deltas=[], plot=False):
    """
    Finds the closest point on the decision boundary to an undesired point,
    optionally constrained by real-world conditions.
    The result is a counterfactual explanation for the chosen row: the nearest point at which the model's prediction changes.
    This version avoids a key problem with the original optimal_point() function, which generated a grid over R^n and then iterated over it.
    With 20, 30, or 40 features, such a grid eventually triggers a memory error and crashes the function.
    In addition, the number of grid cells grows exponentially with the number of features, so the search becomes infeasible (curse of dimensionality).

    Parameters
    ----------
    dataset : pd.DataFrame
        Full dataset containing features and a final column with class labels.

    model : sklearn-like classifier
        A binary classification model with a `.fit()` and `.predict()` method.

    desired_class : int or label
        The target class we want the corrected point to belong to.

    original_class : int or label
        The actual class label of the undesired point.

    chosen_row : int, optional
        Positional index of the row to find the counterfactual explanation for. Default is -1 (the last row).

    threshold : int, optional
        Max number of decision boundary points to generate. Default is 10000.

    point_epsilon : float, optional
        Precision used to estimate decision boundary points. Default is 0.1.

    epsilon : float, optional
        Step size used when displacing a point toward the decision boundary. Default is 0.01.

    constraints : list, optional
        A list of real-world constraints on the features (e.g., ranges, logic constraints). Default is [].

    deltas : list, optional
        Tolerances or maximum displacements for each continuous feature. Only works for continuous features. Default is [].

    plot : bool, optional
        Whether to plot the results. Currently unused in the function body. Default is False.

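    Returns
    -------
    tuple
        (dataset, model, query_instance, final_optimal_datapt, distance, deltas[0], boundary_points):
        the balanced dataset, the fitted model, the original feature values of the chosen row,
        the counterfactual with categorical values decoded, the Euclidean distance between the
        original and counterfactual points, the first element of `deltas`, and the generated
        boundary points.

    Raises
    ------
    RuntimeError
        If the classes cannot be balanced.
    Exception
        If the number of constraints exceeds the number of features.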

    Notes
    -----
    - This function trains the model on the provided dataset, generates boundary points using
      `multi_decision_boundary`, applies constraints, and finds the closest optimal point.
    - Assumes binary classification and relies on external functions like `real_world_constraints`,
      `closest_point`, `move_from_A_to_B_with_x1_displacement`, etc., which must be defined elsewhere.
    - The `plot` flag is accepted, but no plotting is performed in this function.
    - The function blends boundary approximation with counterfactual generation, useful for explainable AI.
    - Print statements are for progress tracking.
    - Usage: Call with a dataset and model to generate counterfactuals, e.g., for model interpretation or optimization.

    Examples
    --------
    >>> import pandas as pd
    >>> from sklearn.linear_model import LogisticRegression
    >>> dataset = pd.DataFrame({'feat1': [0, 1, 2], 'feat2': [0, 1, 0], 'label': [0, 1, 0]})
    >>> model = LogisticRegression()
    >>> # Explain row 2 (features [2, 0]), which belongs to class 0
    >>> result = optimal_point(dataset, model, desired_class=1, original_class=0, chosen_row=2)
    >>> counterfactual, distance = result[3], result[4]
    >>> print(counterfactual, distance)
    """

    # Convert categorical columns if needed (before balancing)
    inv_col_map = convert_columns(dataset)

    # Extract features and labels before balancing
    X_orig = dataset.iloc[:, :-1]

    # Save the original row's feature values
    undesired_coords = X_orig.iloc[chosen_row, :].copy()

    # Balance the dataset
    dataset = balance_dataset(df=dataset, target=dataset.columns[-1])

    if not check_class_balance(dataset, target=dataset.columns[-1]):
        raise RuntimeError("Failed to balance classes for binary classification")

    sampled_dataset = dataset.sample(n=min(dataset.shape[0], 20000))

    # Extract new training features/labels after balancing
    X_train = sampled_dataset.iloc[:, :-1]
    y_train = sampled_dataset.iloc[:, -1]
    # Train the model
    print("Fitting model...")
    model.fit(X_train, y_train)
    print("Model training complete.")

    # -------------------------------
    # STEP 2: Find decision boundary
    # -------------------------------
    print("boundary points started generation...")

    # This step uses binary interpolation to get points close to the decision boundary
    boundary_points = multi_decision_boundary(model, X_train, y_train,
                                             threshold=threshold, epsilon=point_epsilon)
    print("boundary points finished.")
    print(boundary_points.shape)

    # -------------------------------
    # STEP 3: Apply real-world constraints (optional)
    # -------------------------------
    # Reduce boundary points based on external rules (e.g., cost limits, physics constraints)
    contours_pd = real_world_constraints(points=boundary_points, undesired_coords=undesired_coords, constraints=constraints)

    # contours = boundary_points  # (Commented: Alternative to use raw boundary)
    undesired_datapt = np.reshape(np.array(list(undesired_coords)), (1, -1))  # Reshape undesired point to 2D array

    # -------------------------------
    # STEP 4: Find closest point on constrained boundary
    # -------------------------------
    if contours_pd is not None and desired_class != original_class:
        contours = contours_pd.to_numpy()
        print("Finding the closest point from the contour line to the point...")
        contours_pd.reset_index(drop=True, inplace=True)
        optimal_datapt = closest_point(undesired_datapt, contour=contours)
        print("Found the closest point from the contour line to the point.")
        D = optimal_datapt - undesired_datapt  # Compute direction vector
        deltas = D * (1+epsilon)  # Scale by (1 + epsilon) to overshoot
        optimal_datapt = move_from_A_to_B_with_x1_displacement(undesired_datapt, optimal_datapt, deltas=deltas)
    elif desired_class == original_class or contours_pd is None:
        # If we want to *stay within* the same class (more constrained)
        all_constrained_feats = [var for (var,_) in constraints]
        closest_boundedpt = None
        vars = set(X_train.columns) - set(all_constrained_feats)
        cont_mutable_vars = [X_train.columns.get_loc(col) for col in vars]
        print(len(deltas), undesired_datapt[0].shape)
        deltas, len_constr = det_constraints(datapt=undesired_datapt[0], vars=cont_mutable_vars, deltas=deltas)  # Determine constraints

        if len_constr > X_train.shape[1]:
            raise Exception("There cannot be more constraints than features")
        else:
            # All n dimensions are constrained, so generate an exact grid of boundary candidates
            print(undesired_datapt[0], deltas)
            bounded_contour_pts = get_multi_dim_border_points(center=undesired_datapt[0],
                                                              extents=deltas,
                                                              step=0.01)
            np_bounded_contour = np.array(bounded_contour_pts)  # Convert to NumPy array
            print(np_bounded_contour.shape)
            closest_boundedpt = closest_border_point(np_bounded_contour, contour=boundary_points)  # Find closest on border
            print(closest_boundedpt)
        D = closest_boundedpt - undesired_datapt  # Compute direction
        optimal_datapt = move_from_A_to_B_with_x1_displacement(undesired_datapt, closest_boundedpt, deltas=D)  # Move point

    categorical_features = [col for col in inv_col_map.keys()]
    final_optimal_datapt = []

    for col in X_train.columns:
        if col in categorical_features:
            idx = int(optimal_datapt[0,X_train.columns.get_loc(col)])
            final_optimal_datapt.append(inv_col_map[col][idx])
        else:
            final_optimal_datapt.append(optimal_datapt[0,X_train.columns.get_loc(col)])

    query_instance = undesired_coords
    return dataset, model, query_instance, final_optimal_datapt, euclidean_distance(undesired_datapt, optimal_datapt), deltas[0], boundary_points
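
The core of the unconstrained branch is a nearest-point lookup on the sampled boundary followed by a small overshoot, so that the result lands just past the boundary. The sketch below reproduces that arithmetic with plain NumPy. `nearest_boundary_point` is a hypothetical stand-in for `closest_point`, whose real implementation lives in ```files.common_functions``` and may differ, and the last step simply adds the scaled displacement, which is an assumption about what `move_from_A_to_B_with_x1_displacement` does.

```python
import numpy as np

def nearest_boundary_point(query, contour):
    """Hypothetical stand-in for closest_point: nearest row of `contour` to `query`."""
    dists = np.linalg.norm(contour - query, axis=1)  # one distance per boundary point
    return contour[np.argmin(dists)].reshape(1, -1)

query = np.array([[0.0, 0.0]])                            # undesired point, shape (1, n)
contour = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])  # sampled boundary points

epsilon = 0.01
nearest = nearest_boundary_point(query, contour)  # closest sampled boundary point
D = nearest - query                               # direction vector to the boundary
overshoot = query + D * (1 + epsilon)             # step slightly past the boundary

print(nearest, overshoot)
```

Because the lookup only considers sampled points, the quality of the counterfactual depends on how densely the boundary is sampled, which is why the experiments vary `threshold`.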

In [6]:
def clamp_vec_per_axis(v, ref_point, bool_vec, frac=0.05):
    """
    Clamp the endpoint ref_point + v per axis so that each coordinate stays within
    +/- (frac/2) * |ref_point_i| of ref_point_i.

    Args:
      v         : array-like shape (n,), displacement added to ref_point
      ref_point : array-like shape (n,), point the allowed band is centered on
      bool_vec  : 0/1 mask per feature; a 0 (categorical) entry zeroes both bounds for that axis
      frac      : total width of the allowed band as a fraction of |ref_point_i| (default 0.05)

    Returns:
      the clamped endpoint (not the displacement), limited to half of the band on either side in each dimension
    """
    # allowable bounds for the final point
    lower_bounds = (ref_point - (frac/2) * np.abs(ref_point)) * bool_vec
    upper_bounds = (ref_point + (frac/2) * np.abs(ref_point)) * bool_vec

    # clamp the endpoint; np.clip is idempotent, so a single call is enough
    return np.clip(ref_point + v, lower_bounds, upper_bounds)
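
To make the clamping concrete, the sketch below applies the same `np.clip` pattern to a two-feature point with `frac=0.1`, so each coordinate may move by at most 5% of its magnitude in either direction. The helper name `clamp_to_band` and the sample values are illustrative assumptions; unlike `clamp_vec_per_axis`, it takes the candidate endpoint directly and has no categorical mask.

```python
import numpy as np

def clamp_to_band(candidate, ref_point, frac=0.05):
    """Hypothetical helper: keep `candidate` within +/- (frac/2)*|ref_point| per axis."""
    half = (frac / 2) * np.abs(ref_point)
    return np.clip(candidate, ref_point - half, ref_point + half)

ref_point = np.array([10.0, -4.0])
candidate = np.array([12.0, -4.1])

# Allowed bands: [9.5, 10.5] for the first axis and [-4.2, -3.8] for the second
clamped = clamp_to_band(candidate, ref_point, frac=0.1)
print(clamped)
```

The first coordinate is pulled back to the edge of its band, while the second is already inside its band and is left unchanged.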

In [7]:
def run_dice_cfs(df, model, query_instance, method, continuous_features, categorical_features, target, chosen_row, contours, plot=False, total_CFs=1, delta=100):
    start = datetime.datetime.now()

    x_train = df.iloc[:, :-1]
    backend='sklearn'

    d = dice_ml.Data(dataframe=df, continuous_features=continuous_features, categorical_features=categorical_features, outcome_name=target)
    m = dice_ml.Model(model=model, backend=backend)

    exp_dice = dice_ml.Dice(d, m, method=method)

    query_instance = x_train.iloc[[chosen_row]].to_numpy()

    dice_cfs = exp_dice.generate_counterfactuals(pd.DataFrame(data=query_instance, columns=x_train.columns),
                                                        total_CFs=total_CFs, desired_class="opposite")

    cfs_list = json.loads(dice_cfs.to_json())['cfs_list']
    dist_cfs = []

    # np_bounded_contour = np.array(bounded_contour_pts)  # Convert to NumPy array
    # x_values, y_values = np_bounded_contour[:, 0], np_bounded_contour[:, 1]  # Extract x/y for plotting
    # if plot:
    #     plt.scatter(x_values, y_values, marker='o')  # Plot bounded points
    contours = contours.reset_index(drop=True)

    bool_vec = []
    for col in df.iloc[:, :-1].columns:
        if np.issubdtype(df[col].dtype, np.number):
            bool_vec.append(1)   # numeric -> allow changes
        else:
            bool_vec.append(0)   # categorical -> mask out

    if np.all(np.asarray(delta) == 100):  # default delta of 100: use the unclamped distance
        for point in cfs_list[0]:
            point_vec = [float(point[i]) for i in range(len(point[:-1]))]
            dist_cfs.append(euclidean_distance(np.array(point_vec), query_instance))
    else:
        for point in cfs_list[0]:
            point_vec = [float(point[i]) for i in range(len(point[:-1]))]
            point_vec = np.reshape(np.array(point_vec), (1, -1))
            endpoint_vec = clamp_vec_per_axis(point_vec, ref_point=query_instance, bool_vec=bool_vec, frac=delta/100)
            closest_pt = closest_point(endpoint_vec, contour=contours.to_numpy())
            dist_cfs.append(euclidean_distance(closest_pt, endpoint_vec))

    # if plot:
    #     for point in cfs_list[0][:5]:
    #         x,y = point[0], point[1]
    #         print("EUCLIDEAN DISTANCE:", euclidean_distance(delta*np.array((x,y)), query_instance))
    #         plt.scatter(x,y, c = 'yellow')  # Plot optimal point (changed to green for distinction)
    #         plt.text(x+0.002, y+0.002, 'H')  # Label 'H' (e.g., Healthy; adjusted from duplicate 'NH')
    #         plt.plot([x,query_instance[0][0]], [y, query_instance[0][1]], linestyle='--')  # Dashed line between points
    end = datetime.datetime.now()
    diff = end - start
    print(f"Elapsed time: {diff}")

    return dist_cfs, diff.total_seconds()
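
`run_dice_cfs` times itself with `datetime.datetime.now()`, which follows the wall clock. For the runtime comparison, a monotonic clock such as `time.perf_counter()` is a common alternative because it is unaffected by system clock adjustments. The sketch below wraps an arbitrary callable; the `timed` helper is a hypothetical name and is not part of DiCE or the Optimal Point package.

```python
import time

def timed(fn, *args, **kwargs):
    """Hypothetical helper: run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in workload; in the notebook this could be a counterfactual search call
total, seconds = timed(sum, range(1000))
print(total, f"{seconds:.6f}s")
```

Wrapping both methods with the same helper keeps the measurement overhead identical on each side of the comparison.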

In [22]:
def exps(dataset, model, method, target, x_train, y_train, continuous_features, categorical_features, inv_map, num_samples, deltas=[], constraints=[], threshold=25000):
    dice_dists, optimal_dists = [], []
    sub_dataset = dataset[dataset[target] == 1]
    random_integers = random.sample(range(0, sub_dataset.shape[0]-1), num_samples)

    thresholds = [threshold, threshold * 10, threshold * 100]

    for threshold in thresholds:
        for i in random_integers:
            real_idx = sub_dataset.index[i]
            chosen_row=real_idx
            query_instance=x_train.iloc[chosen_row:chosen_row+1,:]
            label = y_train.iloc[chosen_row:chosen_row+1]
            df, model, query_instance, opt_point, dist, exp_delta, boundary_points = optimal_point(dataset, model, desired_class=inv_map[label.item()], original_class=label.item(),
                                                                                                   threshold=threshold, chosen_row=chosen_row, point_epsilon=1e-3,
                                                                                                   epsilon=0.01, constraints=constraints, deltas=deltas)
            optimal_dists.append(dist)
            dist_cfs, _ = run_dice_cfs(df=df, model=model, query_instance=query_instance,method=method,
                                       continuous_features=continuous_features, categorical_features=categorical_features,
                                       target=target, contours=boundary_points, chosen_row=chosen_row,
                                       delta=exp_delta, total_CFs=10)
            dice_dists.extend(dist_cfs)
    return optimal_dists, dice_dists
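
`exps` draws its query rows from the global `random` state, so each run compares a different set of points unless a seed is set elsewhere. If repeatable comparisons are wanted, one option is a dedicated seeded generator, as sketched below. The helper name `sample_rows` and the seed value are assumptions for illustration.

```python
import random

def sample_rows(n_rows, num_samples, seed=0):
    """Hypothetical helper: draw distinct row positions reproducibly."""
    rng = random.Random(seed)  # private generator, leaves the global state untouched
    return rng.sample(range(n_rows), num_samples)

first = sample_rows(10, 3)
second = sample_rows(10, 3)
print(first, second)  # both draws contain the same positions
```

A private `random.Random` instance also avoids interference from other code that consumes the global generator between runs.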

# Step 3: Toy Dataset

We run a few experiments on the toy dataset and compare the results visually for both the Optimal Point method and the DiCE model-agnostic methods.

In [23]:
inv_map = {
    1: -1,
    -1: 1
}
x_train = df.iloc[:,:-1]
y_train  = df.iloc[:,-1]
continuous_features=['x1', 'x2']
categorical_features=[]
target='y'

In [27]:
optimal_dists, dice_dists = exps(df, model=svm_classifier, method='kdtree',
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=9,
                                 target=target)

  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01671689 1.00931098]]
[[-0.15136719  1.46484375]]


100%|██████████| 1/1 [00:00<00:00, 53.40it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.023641
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00529942 1.00959663]]
[[0.21386719 2.50292969]]


100%|██████████| 1/1 [00:00<00:00, 58.63it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.022066
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00269212 1.00981475]]
[[0.13720703 5.45117188]]


100%|██████████| 1/1 [00:00<00:00, 59.34it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.022655
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00640294 1.00938876]]
[[0.27978516 1.65136719]]


100%|██████████| 1/1 [00:00<00:00, 58.10it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.022280
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00815733 1.0083413 ]]
[[0.54711914 0.60791016]]


100%|██████████| 1/1 [00:00<00:00,  7.44it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.139050
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00841258 1.00920567]]
[[0.63525391 1.27050781]]


100%|██████████| 1/1 [00:00<00:00, 52.73it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.023216
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01636549 1.0097085 ]]
[[-0.15966797  3.46386719]]


100%|██████████| 1/1 [00:00<00:00, 50.52it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.025403
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00825304 1.0047772 ]]
[[0.57714844 0.19238281]]


100%|██████████| 1/1 [00:00<00:00, 31.66it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.036352
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00717053 1.00891762]]
[[0.35595703 0.93212891]]


100%|██████████| 1/1 [00:00<00:00, 55.52it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.022562
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01636549 1.00931052]]
[[-0.15966797  1.46386719]]


100%|██████████| 1/1 [00:00<00:00, 58.97it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.021907
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00500425 1.0095964 ]]
[[0.20117188 2.50146484]]


100%|██████████| 1/1 [00:00<00:00, 57.71it/s]

Elapsed time: 0:00:00.022188
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...



Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00702398 1.0098149 ]]
[[0.33837891 5.45556641]]


100%|██████████| 1/1 [00:00<00:00, 61.28it/s]

Elapsed time: 0:00:00.021573
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64



Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00640294 1.00938876]]
[[0.27978516 1.65136719]]


100%|██████████| 1/1 [00:00<00:00, 38.70it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.030989
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00815733 1.0083413 ]]
[[0.54711914 0.60791016]]


100%|██████████| 1/1 [00:00<00:00, 54.88it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.023252
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00841258 1.00920567]]
[[0.63525391 1.27050781]]


100%|██████████| 1/1 [00:00<00:00, 57.38it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.022511
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01416233 1.00970834]]
[[-0.24365234  3.46191406]]


100%|██████████| 1/1 [00:00<00:00, 60.04it/s]


Elapsed time: 0:00:00.021690
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...


boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00833863 1.00092963]]
[[0.60693359 0.11035156]]


100%|██████████| 1/1 [00:00<00:00, 29.64it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.038146
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00633279 1.0089495 ]]
[[0.27441406 0.96044922]]


100%|██████████| 1/1 [00:00<00:00, 50.93it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.024668
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01636549 1.00931052]]
[[-0.15966797  1.46386719]]


100%|██████████| 1/1 [00:00<00:00, 54.57it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.023141
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00529942 1.00959663]]
[[0.21386719 2.50292969]]


100%|██████████| 1/1 [00:00<00:00, 55.44it/s]

Elapsed time: 0:00:00.022844
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...



boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[0.         1.00981485]]
[[0.         5.45410156]]


100%|██████████| 1/1 [00:00<00:00, 53.59it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.023349
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00640294 1.00938876]]
[[0.27978516 1.65136719]]


100%|██████████| 1/1 [00:00<00:00, 47.58it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.025430
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00815733 1.0083413 ]]
[[0.54711914 0.60791016]]


100%|██████████| 1/1 [00:00<00:00, 57.77it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.021650
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00841258 1.00920567]]
[[0.63525391 1.27050781]]


100%|██████████| 1/1 [00:00<00:00, 53.23it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.023615
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01636549 1.0097085 ]]
[[-0.15966797  3.46386719]]


100%|██████████| 1/1 [00:00<00:00, 54.33it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.023655
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00825304 1.0047772 ]]
[[0.57714844 0.19238281]]


100%|██████████| 1/1 [00:00<00:00, 27.74it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.041297
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00734827 1.00890327]]
[[0.37988281 0.91992188]]


100%|██████████| 1/1 [00:00<00:00, 46.73it/s]

Elapsed time: 0:00:00.026149





In [28]:
print(np.mean(optimal_dists), np.mean(dice_dists))

2.067291564909288 1.9591353451520552
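
A mean alone can hide skew from a few distant counterfactuals, so it may help to report the median and spread as well. The sketch below uses the standard-library `statistics` module; the `summarize` helper and the sample distances are made up for illustration and are not results from this notebook.

```python
import statistics

def summarize(dists):
    """Hypothetical helper: mean, median, and sample standard deviation of distances."""
    return {
        "mean": statistics.mean(dists),
        "median": statistics.median(dists),
        "stdev": statistics.stdev(dists) if len(dists) > 1 else 0.0,
    }

sample = [1.0, 2.0, 2.0, 7.0]  # illustrative values only
summary = summarize(sample)
print(summary)
```

In this notebook the helper could be applied to `optimal_dists` and `dice_dists`, assuming both lists hold plain floats.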


In [29]:
optimal_dists, dice_dists = exps(df, model=svm_classifier, method='random',
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=9,
                                 target=target)

  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00717053 1.00959482]]
[[0.35595703 2.49169922]]


100%|██████████| 1/1 [00:00<00:00, 24.03it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.047173
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00734827 1.00890327]]
[[0.37988281 0.91992188]]


100%|██████████| 1/1 [00:00<00:00, 23.68it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.047300
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00841258 1.00920567]]
[[0.63525391 1.27050781]]


100%|██████████| 1/1 [00:00<00:00, 23.74it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.047435
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00815733 1.0083413 ]]
[[0.54711914 0.60791016]]


100%|██████████| 1/1 [00:00<00:00, 21.96it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.050502
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01636549 1.00931052]]
[[-0.15966797  1.46386719]]


100%|██████████| 1/1 [00:00<00:00, 22.71it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.048723
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00703251 1.00938549]]
[[0.33935547 1.64257812]]


100%|██████████| 1/1 [00:00<00:00, 26.33it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.042496
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00825304 1.0047772 ]]
[[0.57714844 0.19238281]]


100%|██████████| 1/1 [00:00<00:00, 23.14it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.047126
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01636549 1.0097085 ]]
[[-0.15966797  3.46386719]]


100%|██████████| 1/1 [00:00<00:00, 24.29it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.045139
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00269212 1.00981475]]
[[0.13720703 5.45117188]]


100%|██████████| 1/1 [00:00<00:00, 27.06it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.041369
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00500425 1.0095964 ]]
[[0.20117188 2.50146484]]


100%|██████████| 1/1 [00:00<00:00, 28.04it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.040221
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00734827 1.00890327]]
[[0.37988281 0.91992188]]


100%|██████████| 1/1 [00:00<00:00, 27.06it/s]

Elapsed time: 0:00:00.041213
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...



  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00841258 1.00920567]]
[[0.63525391 1.27050781]]


100%|██████████| 1/1 [00:00<00:00, 27.98it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.040005
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00815733 1.0083413 ]]
[[0.54711914 0.60791016]]


100%|██████████| 1/1 [00:00<00:00, 26.39it/s]


Elapsed time: 0:00:00.043030
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...


  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01636549 1.00931052]]
[[-0.15966797  1.46386719]]


100%|██████████| 1/1 [00:00<00:00, 25.88it/s]


Elapsed time: 0:00:00.042498
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...


  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00640294 1.00938876]]
[[0.27978516 1.65136719]]


100%|██████████| 1/1 [00:00<00:00, 28.83it/s]


Elapsed time: 0:00:00.039382
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...


  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00825304 1.0047772 ]]
[[0.57714844 0.19238281]]


100%|██████████| 1/1 [00:00<00:00, 25.54it/s]

Elapsed time: 0:00:00.043158
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...



  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01416233 1.00970834]]
[[-0.24365234  3.46191406]]


100%|██████████| 1/1 [00:00<00:00, 27.84it/s]


Elapsed time: 0:00:00.040665
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...


  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00555212 1.00981479]]
[[0.22607422 5.45214844]]


100%|██████████| 1/1 [00:00<00:00, 26.71it/s]

Elapsed time: 0:00:00.041967
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...



  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00500425 1.0095964 ]]
[[0.20117188 2.50146484]]


100%|██████████| 1/1 [00:00<00:00, 28.36it/s]

Elapsed time: 0:00:00.039378
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64



  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00734827 1.00890327]]
[[0.37988281 0.91992188]]


100%|██████████| 1/1 [00:00<00:00, 27.69it/s]


Elapsed time: 0:00:00.040635
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...


  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00841258 1.00920567]]
[[0.63525391 1.27050781]]


100%|██████████| 1/1 [00:00<00:00, 23.47it/s]


Elapsed time: 0:00:00.047401
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...


  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00815733 1.0083413 ]]
[[0.54711914 0.60791016]]


100%|██████████| 1/1 [00:00<00:00, 29.26it/s]

Elapsed time: 0:00:00.038146
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...



  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01671689 1.00931098]]
[[-0.15136719  1.46484375]]


100%|██████████| 1/1 [00:00<00:00, 28.01it/s]

Elapsed time: 0:00:00.040141
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...



  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00640294 1.00938876]]
[[0.27978516 1.65136719]]


100%|██████████| 1/1 [00:00<00:00, 25.46it/s]

Elapsed time: 0:00:00.044020
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...



  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00833863 1.00092963]]
[[0.60693359 0.11035156]]


100%|██████████| 1/1 [00:00<00:00, 22.02it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.050553
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01671689 1.00970858]]
[[-0.15136719  3.46484375]]


100%|██████████| 1/1 [00:00<00:00, 27.39it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.041067
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00269212 1.00981475]]
[[0.13720703 5.45117188]]


100%|██████████| 1/1 [00:00<00:00, 25.99it/s]


Elapsed time: 0:00:00.042711


In [30]:
print(np.mean(optimal_dists), np.mean(dice_dists))

2.0668688706598375 1.9726283729555825


In [31]:
optimal_dists, dice_dists = exps(df, model=svm_classifier, method='genetic',
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=9,
                                 target=target)

  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00815733 1.0083413 ]]
[[0.54711914 0.60791016]]


100%|██████████| 1/1 [00:00<00:00,  1.51it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.668635
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01636549 1.00931052]]
[[-0.15966797  1.46386719]]


100%|██████████| 1/1 [00:00<00:00, 13.10it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.080267
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00500425 1.0095964 ]]
[[0.20117188 2.50146484]]


100%|██████████| 1/1 [00:00<00:00, 12.25it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.085985
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01416233 1.00970834]]
[[-0.24365234  3.46191406]]


100%|██████████| 1/1 [00:00<00:00, 10.71it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.097494
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[0.         1.00981485]]
[[0.         5.45410156]]


100%|██████████| 1/1 [00:00<00:00, 11.37it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.091917
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00833863 1.00092963]]
[[0.60693359 0.11035156]]


100%|██████████| 1/1 [00:00<00:00, 11.42it/s]

Elapsed time: 0:00:00.091743
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...



  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00717053 1.00891762]]
[[0.35595703 0.93212891]]


100%|██████████| 1/1 [00:00<00:00, 13.46it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.078295
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00640294 1.00938876]]
[[0.27978516 1.65136719]]


100%|██████████| 1/1 [00:00<00:00, 11.03it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.094495
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00841258 1.00920567]]
[[0.63525391 1.27050781]]


100%|██████████| 1/1 [00:00<00:00, 11.02it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.095216
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00861413 1.00746214]]
[[0.7277832  0.39697266]]


100%|██████████| 1/1 [00:00<00:00, 12.73it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.082669
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01636549 1.00931052]]
[[-0.15966797  1.46386719]]


100%|██████████| 1/1 [00:00<00:00, 11.93it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.087677
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00500425 1.0095964 ]]
[[0.20117188 2.50146484]]


100%|██████████| 1/1 [00:00<00:00, 11.91it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.088551
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01671689 1.00970858]]
[[-0.15136719  3.46484375]]


100%|██████████| 1/1 [00:00<00:00, 12.54it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.084090
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[0.         1.00981485]]
[[0.         5.45410156]]


100%|██████████| 1/1 [00:00<00:00, 10.49it/s]


Elapsed time: 0:00:00.099654
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...


  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.0083916  0.99103147]]
[[0.62695312 0.05224609]]


100%|██████████| 1/1 [00:00<00:00,  1.58it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.637150
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00734827 1.00890327]]
[[0.37988281 0.91992188]]


100%|██████████| 1/1 [00:00<00:00, 13.59it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.077605
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00640294 1.00938876]]
[[0.27978516 1.65136719]]


100%|██████████| 1/1 [00:00<00:00, 10.86it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.096281
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00841258 1.00920567]]
[[0.63525391 1.27050781]]


100%|██████████| 1/1 [00:00<00:00, 11.50it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.091124
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00815733 1.0083413 ]]
[[0.54711914 0.60791016]]


100%|██████████| 1/1 [00:00<00:00, 11.60it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.090702
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[0.99421586 1.00931599]]
[[0.06298828 1.47558594]]


100%|██████████| 1/1 [00:00<00:00, 12.89it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.081385
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00500425 1.0095964 ]]
[[0.20117188 2.50146484]]


100%|██████████| 1/1 [00:00<00:00, 12.18it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.085962
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01671689 1.00970858]]
[[-0.15136719  3.46484375]]


100%|██████████| 1/1 [00:00<00:00, 11.26it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.093033
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00674792 1.00981479]]
[[0.30957031 5.45214844]]


100%|██████████| 1/1 [00:00<00:00, 13.52it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.077880
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00833863 1.00092963]]
[[0.60693359 0.11035156]]


100%|██████████| 1/1 [00:00<00:00,  1.60it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.628792
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00717053 1.00891762]]
[[0.35595703 0.93212891]]


100%|██████████| 1/1 [00:00<00:00, 13.00it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.081158
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00517891 1.00939484]]
[[0.20849609 1.66796875]]


100%|██████████| 1/1 [00:00<00:00, 11.75it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.089200
Class counts:
 y
-1    10
 1    10
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(100, 2)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.00841258 1.00920567]]
[[0.63525391 1.27050781]]


100%|██████████| 1/1 [00:00<00:00, 11.43it/s]

Elapsed time: 0:00:00.091788





In [32]:
print(np.mean(optimal_dists), np.mean(dice_dists))

2.068867865588009 2.005418811207778
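
To put the printed averages on a common scale, the gap can be expressed relative to the DiCE mean. The sketch below hard-codes the means printed by the `random` and `genetic` runs above; the helper name `relative_gap` is our own and is not part of DiCE or the Optimal Point code. It assumes both lists hold distances between each query point and its counterfactual.

```python
def relative_gap(optimal_mean, dice_mean):
    """Return (optimal - dice) / dice, the gap as a fraction of the DiCE mean."""
    return (optimal_mean - dice_mean) / dice_mean


# Means copied from the printed outputs of the 'random' and 'genetic' runs above.
printed_means = {
    "random": (2.0668688706598375, 1.9726283729555825),
    "genetic": (2.068867865588009, 2.005418811207778),
}

gaps = {
    method: relative_gap(opt, dice)
    for method, (opt, dice) in printed_means.items()
}
for method, gap in gaps.items():
    print(f"{method}: Optimal Point mean distance exceeds DiCE's by {gap:.1%}")
```

A positive gap means that, in that run, the Optimal Point distances were larger on average than the DiCE distances.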


# Step 3: Adult Income Dataset Experiments

We run experiments on the Adult Income dataset, comparing DiCE's model-agnostic methods with the Optimal Point method.

The procedure is as follows:

1. Import the dataset using the `helpers` module from DiCE.
2. Initialize the classifier, which is a random forest classifier in this case.
3. Iterate over 50 or 100 randomly selected points, generating a counterfactual for each with the Optimal Point method.
4. After each Optimal Point iteration, call `run_dice_cfs` to generate counterfactuals with the selected DiCE model-agnostic method.
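
Steps 3 and 4 can be sketched as a loop that runs two counterfactual generators on the same randomly chosen points and records a distance for each. This is a toy illustration in plain Python, not the notebook's `exps` function: `project_onto_boundary` and `move_along_first_axis` are hypothetical stand-ins for the Optimal Point call and the DiCE call, and the decision boundary `x0 + x1 = 1` is invented for the example.

```python
import math
import random
from statistics import mean


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def project_onto_boundary(p):
    """Hypothetical stand-in for the Optimal Point step:
    orthogonal projection of p onto the toy boundary x0 + x1 = 1."""
    shift = (p[0] + p[1] - 1.0) / 2.0
    return (p[0] - shift, p[1] - shift)


def move_along_first_axis(p):
    """Hypothetical stand-in for a DiCE call: reach the same boundary
    by changing only the first feature."""
    return (1.0 - p[1], p[1])


def compare_methods(points, method_a, method_b, num_points, seed=0):
    """Pick points at random, run both methods on each, and record the
    distance from the point to each counterfactual."""
    rng = random.Random(seed)
    dists_a, dists_b = [], []
    for p in rng.sample(points, num_points):
        dists_a.append(euclidean(p, method_a(p)))
        dists_b.append(euclidean(p, method_b(p)))
    return dists_a, dists_b


# Toy query points, all on the same side of the boundary.
data_rng = random.Random(42)
toy_points = [(data_rng.uniform(-2, 0), data_rng.uniform(-2, 0)) for _ in range(200)]

proj_dists, axis_dists = compare_methods(
    toy_points, project_onto_boundary, move_along_first_axis, num_points=50
)
print(mean(proj_dists), mean(axis_dists))
```

In this toy setup the projection is shorter by construction, so the numbers say nothing about how the real methods compare; the sketch only shows the shape of the loop and of the two returned lists.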

In [61]:
dataset = helpers.load_adult_income_dataset()

  adult_data = adult_data.replace({'income': {'<=50K': 0, '>50K': 1}})


In [62]:
dataset.head()

Unnamed: 0,age,workclass,education,marital_status,occupation,race,gender,hours_per_week,income
0,28,Private,Bachelors,Single,White-Collar,White,Female,60,0
1,30,Self-Employed,Assoc,Married,Professional,White,Male,65,1
2,32,Private,Some-college,Married,White-Collar,White,Male,50,0
3,20,Private,Some-college,Single,Service,White,Female,35,0
4,41,Self-Employed,Some-college,Married,White-Collar,White,Male,50,0


In [63]:
# Initialize a random forest classifier with scikit-learn's default hyperparameters.
clf = RandomForestClassifier()

In [64]:
# Features are every column except the last; the last column is the label.
x_train = dataset.iloc[:, :-1]
y_train = dataset.iloc[:, -1]
continuous_features = ["age", "hours_per_week"]
categorical_features = ['marital_status', 'workclass', 'education', 'race', 'gender', 'occupation']
target = 'income'
# Map each class label to the opposite class.
inv_map = {
    0: 1,
    1: 0
}
# Features paired with the "equal" constraint type.
constraints = [
    ("age", "equal"),
    ("workclass", "equal"),
    ("education", "equal"),
    ("race", "equal"),
    ("gender", "equal"),
    ("occupation", "equal"),
]
# One delta value per feature column.
deltas = [15] * x_train.shape[1]
print(deltas)

[15, 15, 15, 15, 15, 15, 15, 15]
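
Because the feature lists are typed by hand, it can help to confirm that every feature column appears in exactly one of them. The sketch below is one way to do that in plain Python; `check_feature_partition` is a hypothetical helper, not part of DiCE or the Optimal Point code, and the column names are the ones shown by `dataset.head()` above.

```python
def check_feature_partition(columns, continuous, categorical, target):
    """Report feature columns missing from both lists, names listed more
    than once, and listed names that are not feature columns."""
    features = [c for c in columns if c != target]
    listed = list(continuous) + list(categorical)
    missing = [c for c in features if c not in listed]
    duplicated = sorted({c for c in listed if listed.count(c) > 1})
    unknown = [c for c in listed if c not in features]
    return missing, duplicated, unknown


# Column names as shown by dataset.head() above.
adult_columns = ["age", "workclass", "education", "marital_status",
                 "occupation", "race", "gender", "hours_per_week", "income"]

missing, duplicated, unknown = check_feature_partition(
    adult_columns,
    continuous=["age", "hours_per_week"],
    categorical=["marital_status", "workclass", "education",
                 "race", "gender", "occupation"],
    target="income",
)
print(missing, duplicated, unknown)
```

All three lists come back empty for the Adult Income columns. The same check could be run on any other dataset by passing `list(df.columns)` together with its own feature lists.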


In [65]:
optimal_dists, dice_dists = exps(dataset, clf, 'kdtree', target=target,
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=500, deltas=deltas,
                                 constraints=constraints)

  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Class counts:
 income
0    19820
1    19820
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(25000, 8)
8 (8,)
[40.  2.  1.  1.  2.  1.  1. 60.] [-1, -1, -1, np.float64(0.15), -1, -1, -1, np.float64(9.0)]
(0,)
None


  select_pts = points.loc[points[constraint[0]] == undesired_coords[points.columns.get_loc(constraint[0])], :]


TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

In [None]:
optimal_dists, dice_dists = exps(dataset, clf, 'random', target=target,
                                 x_train=x_train, y_train=y_train,
                                 continuous_features=continuous_features,
                                 categorical_features=categorical_features,
                                 inv_map=inv_map, num_samples=500, deltas=deltas,
                                 constraints=constraints)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

In [None]:
optimal_dists, dice_dists = exps(dataset, clf, 'genetic', target=target,
                                  x_train=x_train, y_train=y_train,
                                  continuous_features=continuous_features,
                                    categorical_features=categorical_features,
                                    inv_map=inv_map, num_samples=500, deltas=deltas,
                                    constraints=constraints)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

# Step 3: Heart Disease Dataset Experiments

We run experiments on the heart disease dataset to compare DiCE's model-agnostic methods with the Optimal Point method.

The steps are:

1. Load the dataset from a local CSV file with ```pd.read_csv```
2. Initialize the classifier (the cells below pass ```svm_classifier```)
3. Run the Optimal Point method on randomly selected points (```num_samples``` in the calls below)
4. After each Optimal Point run, call ```run_dice_cfs``` to generate counterfactuals for the same point with the chosen DiCE model-agnostic method
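
The experiment loop in the steps above can be sketched as follows. This is a toy version under stated assumptions. `compare_methods`, `project_to_boundary`, and `jump_to_fixed_point` are hypothetical stand-ins for the Optimal Point method and `run_dice_cfs`, and the "classifier" is simply the rule `x0 > 0`.

```python
import math
import random
from statistics import mean

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def compare_methods(points, method_a, method_b, num_samples, seed=0):
    """Sample query points and record each method's counterfactual distance."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(points)), num_samples)
    dists_a, dists_b = [], []
    for idx in chosen:
        query = points[idx]
        dists_a.append(euclidean(query, method_a(query)))
        dists_b.append(euclidean(query, method_b(query)))
    return dists_a, dists_b

# Toy setting: the decision boundary is the line x0 = 0.
def project_to_boundary(query):
    # Nearest point on the boundary: drop x0 to 0 and keep x1.
    return (0.0, query[1])

def jump_to_fixed_point(query):
    # A valid but generally farther boundary point.
    return (0.0, 0.0)

data_rng = random.Random(42)
# All query points lie on the x0 < 0 side of the boundary.
points = [(-data_rng.random() - 0.1, data_rng.uniform(-1, 1)) for _ in range(200)]

proj_dists, fixed_dists = compare_methods(
    points, project_to_boundary, jump_to_fixed_point, num_samples=50
)
print(mean(proj_dists), mean(fixed_dists))
```

Both methods are evaluated on the same sampled queries, so the two mean distances are directly comparable.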

In [None]:
heart_disease = pd.read_csv(
'../../heart.csv'
)

In [None]:
print(heart_disease.dtypes)

In [None]:
constraints = [
    ("age", "equal"),
    ("sex", "equal"),
    ("cp", "equal"),
    ("fbs", "equal"),
    ("restecg", "equal"),
    ("exang", "equal"),
    ("slope", "equal"),
    ("thal", "equal")
]

In [None]:
x_train = heart_disease.iloc[:,:-1]
y_train = heart_disease.iloc[:,-1]
continuous_features=["age", "trestbps", "thalach", "oldpeak", "chol"]
categorical_features=['sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'thal']
target='target'
inv_map = {
    0: 1,
    1: 0
}

In [None]:
optimal_dists, dice_dists = exps(dataset=heart_disease, model=svm_classifier, method='kdtree', target=target,
                                  x_train=x_train, y_train=y_train,
                                  continuous_features=continuous_features,
                                    categorical_features=categorical_features,
                                    inv_map=inv_map, num_samples=100, delta=20,
                                    constraints=constraints)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

In [None]:
optimal_dists, dice_dists = exps(dataset=heart_disease, model=svm_classifier, method='random', target=target,
                                  x_train=x_train, y_train=y_train,
                                  continuous_features=continuous_features,
                                    categorical_features=categorical_features,
                                    inv_map=inv_map, num_samples=500, delta=20,
                                    constraints=constraints)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

In [None]:
optimal_dists, dice_dists = exps(dataset=heart_disease, model=svm_classifier, method='genetic', target=target,
                                  x_train=x_train, y_train=y_train,
                                  continuous_features=continuous_features,
                                    categorical_features=categorical_features,
                                    inv_map=inv_map, num_samples=500, delta=20,
                                    constraints=constraints)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

# Comparing DiCE on a high-dimensional dataset of numerical features

We use scikit-learn's ```make_classification``` function to generate a synthetic dataset of numerical features on which we can compare both methodologies.

In [32]:
X, y = make_classification(n_samples=2000, n_features=20, n_informative=20, n_redundant=0, random_state=42, n_classes=2)
y = y.reshape(-1,1)
columns = ["x"+str(i) for i in range(20)]
columns.append('y')
dataset = pd.DataFrame(data=np.hstack((X,y)), columns=columns)
model = LogisticRegression()
continuous_features = columns[:-1]
categorical_features=[]
target = 'y'
x_train = dataset.iloc[:,:-1]
y_train = dataset.iloc[:,-1]
inv_map = {
    0: 1,
    1: 0
}

In [None]:
optimal_dists, dice_dists = exps(dataset=dataset, model=model,
                                 method='kdtree', target=target,
                                  x_train=x_train, y_train=y_train,
                                  continuous_features=continuous_features,
                                    categorical_features=categorical_features,
                                    inv_map=inv_map, num_samples=500)

TypeError: optimal_point() got an unexpected keyword argument 'threshold'

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

In [None]:
optimal_dists, dice_dists = exps(dataset=dataset, model=model,
                                 method='random', target=target,
                                  x_train=x_train, y_train=y_train,
                                  continuous_features=continuous_features,
                                    categorical_features=categorical_features,
                                    inv_map=inv_map, num_samples=1000)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

In [None]:
optimal_dists, dice_dists = exps(dataset=dataset, model=model,
                                 method='genetic', target=target,
                                  x_train=x_train, y_train=y_train,
                                  continuous_features=continuous_features,
                                    categorical_features=categorical_features,
                                    inv_map=inv_map, num_samples=1000)

In [None]:
print(np.mean(optimal_dists), np.mean(dice_dists))

# Step 4: Runtime Tests 

The function ```runtime_tests()``` is used to compare the runtime of DiCE's model-agnostic approaches with that of the Optimal Point method. We use a logistic regression classifier for these runtime tests.

In [15]:
def runtime_tests(number_of_features, method, total_random=100):
    # Synthetic binary dataset in which every feature is informative.
    X, y = make_classification(n_samples=5000, n_features=number_of_features,
                               n_informative=number_of_features,
                               n_redundant=0, n_classes=2, random_state=42)
    y = np.reshape(y, (-1, 1))
    continuous_features = ["x"+str(i) for i in range(1, X.shape[1]+1)]
    columns = continuous_features + ['y']
    dataset = pd.DataFrame(data=np.hstack((X, y)), columns=columns)
    target = 'y'
    inv_map = {
        0: 1,
        1: 0
    }
    dice_dists, optimal_dists = [], []
    dice_runtime = []
    # Query points are drawn from class 0 only.
    sub_dataset = dataset[dataset[target] == 0]
    random_integers = random.sample(range(1, sub_dataset.shape[0]), total_random)
    clf = LogisticRegression()

    for i in random_integers:
        # Map the position within sub_dataset back to a row of the full dataset.
        chosen_row = sub_dataset.index[i]
        query_instance = X[chosen_row:chosen_row+1, :]
        label = y[chosen_row:chosen_row+1]
        # Run the Optimal Point method for this query.
        df, model, query_instance, opt_point, dist, _, contours = optimal_point(
            dataset, clf, desired_class=inv_map[label.item()],
            original_class=label.item(), threshold=5000, chosen_row=chosen_row,
            point_epsilon=1e-3, epsilon=0.01, constraints=[])
        optimal_dists.append(dist)
        # Run DiCE for the same query and record its elapsed time.
        dist_cfs, total_seconds = run_dice_cfs(
            df=df, contours=contours, model=model, query_instance=query_instance,
            method=method, continuous_features=continuous_features,
            categorical_features=[], target=target, chosen_row=chosen_row)
        dice_dists.extend(dist_cfs)
        dice_runtime.append(total_seconds)

    # Mean DiCE runtime per query.
    print(np.mean(dice_runtime))
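
`runtime_tests()` prints only the mean DiCE runtime. To put both methods on the same footing, each call can be timed in the same way on the same queries. A minimal sketch with `time.perf_counter` follows. `time_calls`, `fast_method`, and `slow_method` are hypothetical stand-ins and not functions from this notebook.

```python
import time
from statistics import mean

def time_calls(func, inputs):
    """Return the wall-clock seconds taken by func on each input."""
    durations = []
    for item in inputs:
        start = time.perf_counter()
        func(item)
        durations.append(time.perf_counter() - start)
    return durations

# Stand-ins for the two methods being compared.
def fast_method(n):
    return sum(range(n))

def slow_method(n):
    return sum(i * i for i in range(n) for _ in range(5))

inputs = [10_000] * 20
fast_times = time_calls(fast_method, inputs)
slow_times = time_calls(slow_method, inputs)
print(f"fast: {mean(fast_times):.6f}s  slow: {mean(slow_times):.6f}s")
```

`time.perf_counter` is a monotonic, high-resolution clock, which makes it better suited to short intervals than differences of `datetime.datetime.now()`.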

In [16]:
runtime_tests(number_of_features=10, method='kdtree')

  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Class counts:
 y
1    2511
0    2511
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(5000, 10)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[1.01103093 1.01078872 0.99914874 0.97103053 1.01354818 1.01589574
  1.02808711 1.00900411 0.98566825 1.00952284]]
[[-0.98070084 -1.28155805  0.09207676  0.02491773 -0.28565322 -0.17231007
  -0.05684086  1.01316647  0.04050955  2.11568903]]


100%|██████████| 1/1 [00:00<00:00, 22.14it/s]
  balanced_dataset = pd.concat([balanced_dataset, upsampled_class], ignore_index=True)


Elapsed time: 0:00:00.052449
Class counts:
 y
1    2511
0    2511
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
(5000, 10)
Finding the closest point from the contour line to the point...
Found the closest point from the contour line to the point.
[[0.95513066 1.016791   1.01132655 1.00730793 1.00880275 1.00779514
  1.01434738 1.01283316 1.00867557 1.00693296]]
[[ 0.01740737 -0.14972635 -0.76237403  0.37417548  0.84259825  0.4570788
  -0.23332379 -0.35749264  0.76159341  0.32830743]]


  0%|          | 0/1 [00:01<?, ?it/s]


KeyboardInterrupt: 

In [None]:
runtime_tests(number_of_features=50, method='kdtree')

In [None]:
runtime_tests(number_of_features=10, method='random')

In [None]:
runtime_tests(number_of_features=50, method='random')

In [None]:
runtime_tests(number_of_features=10, method='genetic')

In [None]:
runtime_tests(number_of_features=50, method='genetic')