# Generate near-boundary points

## Introduction

An estimate of a basin boundary is good if it classifies points near the boundary correctly. A dataset containing many near-boundary points is therefore desirable for training our model. We adopt the following approach. Consider an attractor of interest of the given dynamical system. We start from points sampled uniformly from the region of interest and separate them into two classes (+ and -), depending on whether the trajectory starting from the point tends to the attractor. We then consider all pairs of data points in which one point is taken from the positive class and the other from the negative class. Since the two points of such a pair belong to different classes, the segment joining them must cross the boundary. We bisect this segment, narrowing the margin of estimation until it becomes smaller than $\delta$ ($\delta$ is a hyperparameter that controls the precision of the estimate). We then save the resulting points close to the boundary as a dataset.
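
The procedure can be sketched end to end on a toy problem. This is a minimal illustration, not the notebook's pipeline: the hypothetical `in_basin` function (membership in the unit ball) stands in for the trajectory simulation, and `bisect_pair` and `near_boundary_points` are illustrative names.

```python
import numpy as np

def in_basin(p):
    """Toy stand-in for the simulation: the 'basin' is the open unit ball (assumption)."""
    return np.linalg.norm(p) < 1.0

def bisect_pair(a, b, delta, classify):
    """Shrink the segment [a, b] until it is shorter than delta.

    a must be classified True and b False; this invariant holds at every step.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    while np.linalg.norm(a - b) >= delta:
        c = (a + b) / 2
        if classify(c):
            a = c  # midpoint is positive: it replaces the positive endpoint
        else:
            b = c  # midpoint is negative: it replaces the negative endpoint
    return a, b

def near_boundary_points(points, delta, classify):
    """Bisect every (positive, negative) pair of the sampled points."""
    labels = np.array([classify(p) for p in points])
    pos, neg = points[labels], points[~labels]
    return [bisect_pair(a, b, delta, classify) for a in pos for b in neg]

rng = np.random.default_rng(0)
samples = rng.uniform(-2.0, 2.0, size=(200, 3))  # uniform samples from the region of interest
pairs = near_boundary_points(samples, delta=1e-3, classify=in_basin)
print(f"{len(pairs)} near-boundary pairs generated")
```

Each returned pair straddles the toy boundary (the unit sphere) with a gap smaller than `delta`; in the notebook the classifier is the much more expensive trajectory simulation.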

## Lorenz system and parameters

We model the Lorenz system, which consists of the following three ODEs: 

$$\begin{aligned}
x' &= \sigma(y-x) \\
y' &= rx-y-xz \\
z' &= xy-\beta z
\end{aligned}$$

In the script, we set $\sigma=10$, $\beta = \frac{8}{3}$, $r=10$. With these parameters, $(\sqrt{24}, \sqrt{24}, 9)$ is a stable equilibrium. We therefore obtain a dataset by simulating the trajectories of 500 sample points. If a trajectory is attracted to this equilibrium, we label its starting point +1, otherwise -1. We have obtained an initial dataset of 100,000 points by sampling $(x_{0},y_{0},z_{0}) \in (-50,50)\times (-50,50) \times (-50,50)$ uniformly at random and computing their trajectories.
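
The equilibrium and its stability can be checked numerically. The sketch below (the helper names `lorenz_rhs` and `lorenz_jacobian` are illustrative, not part of the script) evaluates the right-hand side at $(\sqrt{24}, \sqrt{24}, 9)$ and inspects the eigenvalues of the Jacobian there. If all real parts are negative, the equilibrium is locally asymptotically stable.

```python
import numpy as np

sigma, beta, r = 10.0, 8.0 / 3.0, 10.0

def lorenz_rhs(state):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return np.array([sigma * (y - x), r * x - y - x * z, x * y - beta * z])

def lorenz_jacobian(state):
    """Jacobian of the right-hand side with respect to (x, y, z)."""
    x, y, z = state
    return np.array([
        [-sigma, sigma, 0.0],
        [r - z, -1.0, -x],
        [y, x, -beta],
    ])

equilibrium = np.array([np.sqrt(24.0), np.sqrt(24.0), 9.0])
residual = lorenz_rhs(equilibrium)                               # should vanish up to rounding
eigenvalues = np.linalg.eigvals(lorenz_jacobian(equilibrium))

print("right-hand side at equilibrium:", residual)
print("eigenvalue real parts:", eigenvalues.real)
```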

In [23]:
import numpy as np
import math
import pandas as pd
from scipy.integrate import solve_ivp

In [10]:
## Load the dataset
df = pd.read_csv('dataset_large.csv')

In [11]:
## Split the dataset by class label. The original dataset yields more than two billion (+, -) pairs,
## which is too many to process, so we sample 100 entries from each class. The resulting 10,000 pairs
## generate 20,000 near-boundary points.
df_n1 = df[df['attracted'] == -1].sample(n=100)
df_1 = df[df['attracted'] == 1].sample(n=100)

In [12]:
## Lorenz system
def lorenz(t, X, sigma, beta, r):
    """The Lorenz equations."""
    x, y, z = X
    xp = sigma*(y - x)
    yp = r*x - y - x*z
    zp = -beta*z + x*y
    return xp, yp, zp

sigma, beta, r = 10, 8/3, 10

In [13]:
# Check whether a state lies within 0.01 of the stable equilibrium of interest
def euclidean_distance(point1, point2):
    point1, point2 = np.array(point1), np.array(point2)
    return np.sqrt(np.sum((point1 - point2) ** 2))

lorenz_attractor = (math.sqrt(24), math.sqrt(24), 9)

def is_attracted(x, y, z):
    return euclidean_distance((x, y, z), lorenz_attractor) < 0.01

In [14]:
## Integrate the trajectory up to tmax and decide whether its final state has reached the equilibrium
def simulation(x0, y0, z0):
    tmax = 1500
    soln = solve_ivp(lorenz, (0, tmax), (x0, y0, z0), args=(sigma, beta, r))
    # Only the state at t = tmax is needed, so take the last solver step
    # instead of evaluating a dense interpolant on a fine time grid.
    x, y, z = soln.y[:, -1]
    return is_attracted(x, y, z)

We define the bisection routine as follows. It accepts two points $a$ and $b$ and a precision threshold $\delta$ as input, where $a$ is from the positive class and $b$ is from the negative class. The algorithm terminates when the distance between $a$ and $b$ is less than $\delta$, which means the boundary is located within a margin of length less than $\delta$; $a$ and $b$ are then returned as near-boundary points. Otherwise, the algorithm takes the midpoint $c$, checks which class $c$ belongs to, and recurses on $(c, b)$ if $c$ is positive or on $(a, c)$ if $c$ is negative. The margin is thus halved at each step.

In [15]:
def bisection(a, b, delta):
    distance = np.linalg.norm(np.array([a[0]-b[0], a[1]-b[1], a[2]-b[2]]))
    if distance < delta:
        return (a, b)
    else:
        c = ((a[0]+b[0])/2, (a[1]+b[1])/2, (a[2]+b[2])/2)
        if simulation(c[0], c[1], c[2]):
            return bisection(c, b, delta)
        else:
            return bisection(a, c, delta)

In [21]:
## Test bisection
a = (19.37110577290963,-11.711808886806338,21.152897915080075) ## label +1
b = (11.598790495406433,40.44625804241089,-47.255588938326824) ## label -1

a_near, b_near = bisection(a, b, 0.01)
print(simulation(a_near[0], a_near[1], a_near[2]))
print(simulation(b_near[0], b_near[1], b_near[2]))
print(a_near)
print(b_near)

True
False
(16.934667097051694, 4.638522640809609, -0.29155938955236094)
(16.93419271257431, 4.641706116574332, -0.29573471223628467)
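
As a sanity check on this output, the number of halvings follows from the initial distance $d$ between $a$ and $b$: bisection stops at the smallest $k$ with $d/2^k < \delta$. The sketch below (the helper `halvings_needed` is an illustrative name, not part of the script) computes $k$ and the final gap for the test pair above.

```python
import math

def halvings_needed(d, delta):
    """Return (k, gap): the smallest k with d / 2**k < delta, and the gap d / 2**k."""
    k = 0
    while d >= delta:
        d /= 2
        k += 1
    return k, d

a = (19.37110577290963, -11.711808886806338, 21.152897915080075)
b = (11.598790495406433, 40.44625804241089, -47.255588938326824)

d0 = math.dist(a, b)  # initial distance between the two labeled points
steps, final_gap = halvings_needed(d0, 0.01)
print(steps, final_gap)
```

For this pair the initial distance is about 86.4, so 14 halvings leave a gap of about 0.0053, which matches the distance between the printed `a_near` and `b_near`. Each halving costs one full trajectory simulation.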


In [17]:
# Initialize an empty list to store the rows
rows = []

# Get the total number of iterations
total_iterations = len(df_1) * len(df_n1)
current_iteration = 0

# Iterate over each possible pair to generate near-boundary points
for i, row_a in df_1.iterrows():
    for j, row_b in df_n1.iterrows():
        a = (row_a['x0'], row_a['y0'], row_a['z0'])
        b = (row_b['x0'], row_b['y0'], row_b['z0'])
        a_near, b_near = bisection(a, b, 0.01)

        # Append a_near and b_near to the list as dictionaries
        rows.append({'x0': a_near[0], 'y0': a_near[1], 'z0': a_near[2], 'attracted': 1})
        rows.append({'x0': b_near[0], 'y0': b_near[1], 'z0': b_near[2], 'attracted': -1})

        # Update and print the progress every 10 iterations
        current_iteration += 1
        if current_iteration % 10 == 0:
            print(f'Progress: {current_iteration}/{total_iterations}')

# Convert the list of dictionaries to a DataFrame
df_near = pd.DataFrame(rows)

Progress: 10/10000
Progress: 20/10000
Progress: 30/10000
...

In [19]:
df_near.to_csv('dataset_near.csv', index=False)

In [20]:
## Verify the generated near-boundary points
df_near = pd.read_csv('dataset_near.csv')
df_near_1 = df_near[df_near['attracted'] == 1]
df_near_n1 = df_near[df_near['attracted'] == -1]

for i, row in df_near_1.iterrows():
    if not simulation(row['x0'], row['y0'], row['z0']):
        print(f"Error: {row['x0']}, {row['y0']}, {row['z0']}")
print("Done with label 1")

for i, row in df_near_n1.iterrows():
    if simulation(row['x0'], row['y0'], row['z0']):
        print(f"Error: {row['x0']}, {row['y0']}, {row['z0']}")
print("Done with label -1")

Done with label 1
Done with label -1
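
The check above confirms the labels. A complementary check is that the two points of each pair lie within $\delta$ of each other. The sketch below assumes the rows are stored in the alternating (+1, -1) order produced by the generation loop. The function name `pair_gaps` is illustrative, and the sample array reuses the two points printed by the bisection test.

```python
import numpy as np

def pair_gaps(points):
    """Distances between consecutive (positive, negative) rows of a (2k, 3) array."""
    points = np.asarray(points, dtype=float)
    pos, neg = points[0::2], points[1::2]  # even rows: +1, odd rows: -1 (assumed order)
    return np.linalg.norm(pos - neg, axis=1)

sample = np.array([
    [16.934667097051694, 4.638522640809609, -0.29155938955236094],  # label +1
    [16.93419271257431, 4.641706116574332, -0.29573471223628467],   # label -1
])
gaps = pair_gaps(sample)
print(gaps, bool(np.all(gaps < 0.01)))
```

On the saved dataset, `pair_gaps(df_near[['x0', 'y0', 'z0']].to_numpy())` would apply the same check, provided the row order is unchanged.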


In [24]:
print(simulation(-10.5140963715,32.0892188389,12.8194874549))

True
