The `ckmeans` algorithm solves the same problem as `separability`: finding the exact solution to k-means in one dimension when k=2. The question is, which one is faster? Let's run some simple benchmarks.
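
For reference, here is a minimal NumPy sketch of the quantity being compared, assuming it is the within-cluster sum of squares of the best two-cluster split divided by the total sum of squares (this is inferred from the `tot_withinss / totss` ratio used for `ckmeans` further down, not taken from the `ntarp` source). The function name `normalized_withinss_2means` is hypothetical. In one dimension the optimal clusters are contiguous in sorted order, so it is enough to try every split point of the sorted values.

```python
import numpy as np

def normalized_withinss_2means(row):
    """Exact 1-D 2-means by trying every split of the sorted values.

    Hypothetical reference helper, not part of ntarp or ckmeans_1d_dp.
    Returns (within-cluster sum of squares) / (total sum of squares).
    """
    v = np.sort(np.asarray(row, dtype=float))
    n = v.size
    totss = np.sum((v - v.mean()) ** 2)

    # Prefix sums give the sum and sum of squares of every left part at once.
    s1 = np.cumsum(v)
    s2 = np.cumsum(v ** 2)

    m = np.arange(1, n)  # size of the left cluster: 1 .. n-1
    left_ss = s2[m - 1] - s1[m - 1] ** 2 / m
    right_sum = s1[-1] - s1[m - 1]
    right_ss = (s2[-1] - s2[m - 1]) - right_sum ** 2 / (n - m)

    # The best split minimizes the combined within-cluster sum of squares.
    return (left_ss + right_ss).min() / totss
```

This is only meant to pin down the definition for a single row; it makes no attempt to match the speed of either library.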

In [1]:
from ntarp import separability as sep
from ckmeans_1d_dp import ckmeans

In [2]:
from sklearn.datasets import make_blobs
import numpy as np

In [3]:
# Each row is an independent 1-D sample of 100 points drawn from two blobs.
x = np.zeros((10_000, 100))
for i in range(10_000):
    samples, _ = make_blobs(n_samples=100, n_features=1, centers=2)
    x[i] = samples.ravel()

In [4]:
%%timeit
sep.w(x)

44.8 ms ± 369 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [5]:
def their_normalized_withinss(x):
    # Within-cluster sum of squares as a fraction of the total sum of squares.
    result = ckmeans(x, k=2)
    return result.tot_withinss / result.totss

In [6]:
%%timeit
their_normalized_withinss(x)

111 ms ± 335 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


Our version is faster by a factor of about 2.5 (44.8 ms versus 111 ms per loop). That's good news for us.

In [7]:
my_w = sep.w(x)
my_w

array([0.02323745, 0.35250037, 0.32114736, ..., 0.05860749, 0.0526579 ,
       0.34088928])

In [8]:
their_w = their_normalized_withinss(x)
their_w

array([0.02323745, 0.35250037, 0.32114736, ..., 0.05860749, 0.0526579 ,
       0.34088928])

In [9]:
my_w - their_w

array([ 8.67361738e-17,  5.55111512e-16,  4.71844785e-15, ...,
        3.05311332e-16,  7.63278329e-17, -2.22044605e-16])

In [10]:
np.allclose(my_w, their_w)

True

In [11]:
np.log2(np.abs(my_w - their_w).max())

-46.38529015588479
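
To put that last number in context, the short sketch below converts the base-2 logarithm back into an absolute difference and expresses it in units of double-precision machine epsilon. It only reuses the value printed above; the variable names are illustrative.

```python
import numpy as np

# Value printed by the previous cell: log2 of the largest absolute difference.
log2_max_diff = -46.38529015588479

max_diff = 2.0 ** log2_max_diff      # on the order of 1e-14
eps = np.finfo(np.float64).eps       # 2**-52 for float64
ratio = max_diff / eps               # a few tens of machine epsilons

print(f"largest difference: {max_diff:.3e}")
print(f"in units of machine epsilon: {ratio:.1f}")
```

A gap of a few tens of machine epsilons is what one would expect from two implementations summing the same floating-point values in a different order, and it is far below the default `np.allclose` tolerances (`rtol=1e-05`, `atol=1e-08`).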