# Observing the Effect of Changing the Resolution Size for the Grid-based Method

As we can see below, if the `resolution` parameter is too low, the number of boundary points drops off, and below a certain resolution no boundary points are found at all. Increasing the resolution makes the grid denser. It reduces the spacing between neighbouring grid points in $\mathbb{R}^f$ (the $f$-dimensional feature space), which increases the likelihood that some grid points lie close enough to the decision boundary to be detected. The trade-off is runtime, which grows quickly with the resolution.
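The sparsity argument can be made concrete with a small calculation. The sketch below assumes a regular grid with `resolution` evenly spaced points per axis, built with `np.linspace` over an illustrative feature range of $[-4, 4]$. Both the range and the helper `grid_stats` are hypothetical, since the grid construction inside `optimal_point` is not shown here. Under this assumption the spacing shrinks roughly as $1/R$, while the total number of grid points grows as $R^f$.

```python
import numpy as np

def grid_stats(x_min, x_max, resolution, n_features):
    """Return (spacing between neighbouring grid points along one axis, total grid points).

    Hypothetical helper: assumes `resolution` evenly spaced points per axis,
    built with np.linspace, which may differ from what optimal_point does.
    """
    axis = np.linspace(x_min, x_max, resolution)
    spacing = axis[1] - axis[0]
    return spacing, resolution ** n_features

# Illustrative feature range [-4, 4] in 2 dimensions (an assumption, not taken from df1).
for r in (10, 50, 100, 150, 200, 250):
    spacing, n_points = grid_stats(-4.0, 4.0, r, n_features=2)
    print(f"R={r:>3}  spacing={spacing:.4f}  grid points={n_points}")
```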

Here we use a logistic regression classifier because its decision boundary points are easy to identify from the predicted class probabilities: with two classes, the decision boundary is where both classes are predicted with probability 0.5.
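To illustrate why the probabilities make this easy, the following sketch marks as boundary points those grid points whose predicted class-1 probability lies within `epsilon` of 0.5. The helper `boundary_points` is hypothetical and is not the implementation in `files.grid_optimal_point`, whose source is not shown. It assumes two features and a grid spanning the bounding box of the data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def boundary_points(model, X, resolution, epsilon=0.01):
    """Grid points whose predicted P(class 1) lies within epsilon of 0.5.

    Hypothetical helper: builds a resolution x resolution grid over the
    bounding box of a 2-feature dataset X.
    """
    xs = np.linspace(X[:, 0].min(), X[:, 0].max(), resolution)
    ys = np.linspace(X[:, 1].min(), X[:, 1].max(), resolution)
    xx, yy = np.meshgrid(xs, ys)
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    proba = model.predict_proba(grid)[:, 1]  # P(class 1) at every grid point
    return grid[np.abs(proba - 0.5) < epsilon]

# Same synthetic data as in the notebook below.
X, y = make_classification(n_samples=2000, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42, n_classes=2)
model = LogisticRegression().fit(X, y)

for r in (10, 50, 100, 250):
    print(r, len(boundary_points(model, X, r)))
```

With this criterion a coarse grid can easily have no points inside the thin probability band around 0.5, which is consistent with the behavior reported in the ablation below.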

In [None]:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

import numpy as np
import pandas as pd
import warnings
import random

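random.seed(0)  # seeds Python's built-in RNG only; NumPy/scikit-learn randomness is set via random_state
warnings.filterwarnings('ignore', category=UserWarning)

import os
import sys
nb_dir = os.path.split(os.getcwd())[0]  # parent directory of the notebook's working directory
if nb_dir not in sys.path:
    sys.path.append(nb_dir)  # so that modules in the parent directory can be imported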

In [3]:
X, y = make_classification(n_samples=2000, n_features=2, n_informative=2, n_redundant=0, random_state=42, n_classes=2)
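model = LogisticRegression()  # left unfitted here; optimal_point fits it (see the "Fitting model..." log)
y = y.reshape(-1,1)  # column vector so it can be stacked next to X
df1 = pd.DataFrame(data=np.hstack((X,y)))  # columns 0 and 1: features; column 2: class label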

In [4]:
print(df1.head(n=10))

          0         1    2
0  0.800062 -0.957489  1.0
1  1.187099  1.159787  1.0
2  0.154512  1.217520  0.0
3  0.179014 -0.852832  1.0
4 -0.735827 -0.245366  0.0
5  0.039487  1.320957  1.0
6 -1.482199  0.419738  0.0
7 -0.622829 -0.803223  0.0
8  0.965721 -1.068587  1.0
9  0.798459 -1.022348  1.0


In [5]:
from files.grid_optimal_point import optimal_point

# Ablation Study: Effect of Resolution on the Number of Boundary Points and Runtime

| Resolution $R$ | Boundary points found | Runtime (s) |
|---:|---:|---:|
| 10 | 0 | 0.6 |
| 50 | 0 | 0.2 |
| 100 | 104 | 1.0 |
| 150 | 311 | 15.1 |
| 200 | 824 | 43.4 |
| 250 | 1612 | 108.7 |
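One way to put a number on how quickly the counts and runtime grow is a least-squares fit of $\log y$ against $\log R$ over the runs that found boundary points; the slope is the exponent $k$ in $y \approx c R^k$. The values below are copied from the table. The power-law fit is an illustrative choice rather than part of the original analysis, and with only four points the exponents are rough.

```python
import math

# Values copied from the table (R = 100 ... 250; the runs with 0 boundary points are excluded).
resolutions = [100, 150, 200, 250]
boundary_counts = [104, 311, 824, 1612]
runtimes_s = [1.0, 15.1, 43.4, 108.7]

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. k in y ~ c * x**k."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

print(f"boundary points grow roughly as R^{loglog_slope(resolutions, boundary_counts):.2f}")
print(f"runtime grows roughly as R^{loglog_slope(resolutions, runtimes_s):.2f}")
```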

In [5]:
optimal_point(df1, model=model, desired_class=0, original_class=1, resolution=10, chosen_row=1, point_epsilon=0.1, epsilon=0.01, plot=False)


Class counts:
 2
1    1000
0    1000
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...


Exception: No Boundary Points Found. The DataFrame is empty. Please try to change the parameters.

In [6]:
optimal_point(df1, model=model, desired_class=0, original_class=1, resolution=50, chosen_row=1, point_epsilon=0.1, epsilon=0.01, plot=False)


Class counts:
 2
1    1000
0    1000
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...


Exception: No Boundary Points Found. The DataFrame is empty. Please try to change the parameters.

In [7]:
optimal_point(df1, model=model, desired_class=0, original_class=1, resolution=100, chosen_row=1, point_epsilon=0.1, epsilon=0.01, plot=False)


Class counts:
 2
1    1000
0    1000
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
Number of boundary points
(104, 2)
Finding the closest point from the contour line to the point...
Finding the closest point from the contour line to the point.
[[1.01087229 1.03644213]]
[[-1.15887253 -0.03919662]]


[0.01562654829664578, 1.1191621371582787]

In [8]:
optimal_point(df1, model=model, desired_class=0, original_class=1, resolution=150, chosen_row=1, point_epsilon=0.1, epsilon=0.01, plot=False)


Class counts:
 2
1    1000
0    1000
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
Number of boundary points
(311, 2)
Finding the closest point from the contour line to the point...
Finding the closest point from the contour line to the point.
[[1.0108713  0.82710625]]
[[-1.16018181  0.00452233]]


[0.01430417981153509, 1.1635276106048833]

In [9]:
optimal_point(df1, model=model, desired_class=0, original_class=1, resolution=200, chosen_row=1, point_epsilon=0.1, epsilon=0.01, plot=False)


Class counts:
 2
1    1000
0    1000
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
Number of boundary points
(824, 2)
Finding the closest point from the contour line to the point...
Finding the closest point from the contour line to the point.
[[1.01088942 1.06309174]]
[[-1.13657042 -0.02002368]]


[0.03815166640198053, 1.1385001569298325]

In [10]:
optimal_point(df1, model=model, desired_class=0, original_class=1, resolution=250, chosen_row=1, point_epsilon=0.1, epsilon=0.01, plot=False)


Class counts:
 2
1    1000
0    1000
Name: count, dtype: int64
Fitting model...
Model training complete.
boundary points started generation...
boundary points finished.
Number of boundary points
(1612, 2)
Finding the closest point from the contour line to the point...
Finding the closest point from the contour line to the point.
[[1.01088532 0.7027732 ]]
[[-1.14183222  0.00228747]]


[0.03283725358485956, 1.1613947366266304]
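The six `optimal_point` calls above differ only in `resolution`, so the ablation could also be run as a single loop. The sketch below is a hypothetical helper `sweep` that times an arbitrary callable for each resolution and records whether it raised an exception (in the runs above, `optimal_point` raises a plain `Exception` when no boundary points are found). A stand-in function `fake_run` replaces `optimal_point` so the block runs on its own; its behavior is invented purely for illustration. Because the logs show the model being refitted on every call, timings measured this way would include model fitting.

```python
import time
from typing import Callable, Iterable, List, Tuple

def sweep(run: Callable[[int], object],
          resolutions: Iterable[int]) -> List[Tuple[int, bool, float]]:
    """Call run(resolution) for each resolution; return (resolution, succeeded, seconds)."""
    results = []
    for r in resolutions:
        start = time.perf_counter()
        try:
            run(r)
            succeeded = True
        except Exception:  # broad on purpose: the "No Boundary Points Found" error is a plain Exception
            succeeded = False
        results.append((r, succeeded, time.perf_counter() - start))
    return results

# Hypothetical stand-in for optimal_point so the sketch is self-contained.
def fake_run(resolution):
    if resolution < 100:
        raise Exception("No Boundary Points Found.")
    return [0.0, 1.0]

for r, ok, secs in sweep(fake_run, [10, 50, 100, 150]):
    print(f"R={r:>3}  found={ok}  {secs:.4f} s")

# In the notebook, the real sweep would look like this (commented out here):
# sweep(lambda r: optimal_point(df1, model=model, desired_class=0, original_class=1,
#                               resolution=r, chosen_row=1, point_epsilon=0.1,
#                               epsilon=0.01, plot=False),
#       [10, 50, 100, 150, 200, 250])
```

Catching every `Exception` also hides unrelated errors, so in practice it may be safer to check the exception message as well.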