# Mapper algorithm
The mapper algorithm is a "modular" algorithm for creating a low-dimensional representation (usually a graph) of a large and/or high-dimensional (point cloud) dataset while preserving interesting topological characteristics. The results of the algorithm depend on three choices that we can make for it:

1. A (combination of) *filter* function(s), mapping the dataset to a metric space (called the *parameter space*) based on some distance/similarity metric (e.g. height of points or angle relative to some center).
2. An *open cover* of the parameter space, usually consisting of overlapping open intervals. The pre-images of the cover sets under the filter determine an open cover of the original point cloud.
3. A *clustering algorithm* to divide the covering sets into clusters/"connected components". The graph will be constructed based on overlap of these clusters.

The default choice for the parameter space is $\mathbb{R}$, which produces a graph as a result. However, Mapper can also be extended to $S^1$ (producing a graph with cycles) or $\mathbb{R}^M$ (producing a simplicial complex of dimension $\leq M$). For specific choices of filter and parameter space, cover, and clustering algorithm, we retrieve other well-known TDA algorithms for representing high-dimensional data, such as density clustering trees, disconnectivity graphs, and Reeb graphs; mapper is essentially a generalization of these techniques.

### Filters, covers & clusters
The *filter* function essentially reduces data points to only their relevant characteristics (e.g. reducing a 3d point $(x,y,z)$ to only its height $y$). Given a set $X$ of $N$ points, the filter is a function $f: X \to Z$ (usually the parameter space is $Z = \mathbb{R}$, but it can also be $\mathbb{R}^2$ or $S^1$) which assigns a value to each of the $N$ points. 
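As a small illustration, the two kinds of filter mentioned so far (height, and angle relative to some center) could be written as follows. This is only a sketch: the function names and the `center` parameter are made up for this example, and the angle is measured in the $xz$-plane under the assumption that $y$ is the height.

```python
import numpy as np

def height_filter(points):
    """Maps each point (x, y, z) to its height y, giving values in R."""
    return np.asarray(points, dtype=float)[:, 1]

def angle_filter(points, center=(0.0, 0.0)):
    """Maps each point to its angle around a vertical axis through `center` (given as (x, z)).
    The result lies in [-pi, pi] and can be read as a position on the circle S^1."""
    points = np.asarray(points, dtype=float)
    return np.arctan2(points[:, 2] - center[1], points[:, 0] - center[0])

points = np.array([[1.0, 2.0, 0.0], [0.0, 5.0, 1.0]])
heights = height_filter(points)  # one real value per point
angles = angle_filter(points)    # one angle (in radians) per point
```

Both functions return one value per point, so either can play the role of $f: X \to Z$.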

We then partition the *parameter space* $Z$ based on the range $I$ of $f$; this is typically done using a set $S$ of smaller, overlapping intervals, defined by a length $l$ and a percentage of overlap $p$. For example, if $I = [0,2]$, $l = 1$ and $p = 2/3$, then $S = \{[0,1], [1/3, 4/3], [2/3, 5/3], [1, 2]\}$.
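The interval construction in this example can be reproduced with a few lines of Python. The sketch below is illustrative only (the function name is hypothetical) and uses `fractions.Fraction` so the endpoints come out exactly; consecutive intervals start $l(1-p)$ apart.

```python
from fractions import Fraction

def overlapping_intervals(lo, hi, length, overlap):
    """Covers [lo, hi] with intervals of the given length, where consecutive
    intervals share a fraction `overlap` (0 <= overlap < 1) of their length."""
    step = length * (1 - overlap)  # distance between consecutive starting points
    intervals = []
    start = lo
    while True:
        intervals.append((start, start + length))
        if start + length >= hi:   # the range is covered once an interval reaches hi
            break
        start += step
    return intervals

S = overlapping_intervals(Fraction(0), Fraction(2), Fraction(1), Fraction(2, 3))
for a, b in S:
    print("[{}, {}]".format(a, b))
```

With $l = 1$ and $p = 2/3$ the step is $1/3$, which yields the four intervals listed above.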

Then, we use this partitioning of $Z$ to derive a cover $\mathcal{U}$ of $X$, defined as the collection of pre-images $\mathcal{U} = \{f^{-1}(I_j) : I_j \in S\}$. We also divide the *covering sets* $X_j = f^{-1}(I_j)$ further into clusters $X_{jk}$, which can be thought of as representing connected components. Ultimately, each cluster will be a vertex in our complex and represents the points within it. We draw an edge between clusters $X_{jk}$ and $X_{j'k'}$ if their intersection is non-empty, i.e. if there is a point which exists in both clusters (see also Figure 1 in the mapper paper). Note that a clustering algorithm usually assigns each point within an $X_j$ to *one* unique cluster; this means there will be no edges between clusters within the same covering set $X_j$.
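The edge rule can be sketched with plain Python sets. In this toy example (cluster names and point indices are invented for illustration), each cluster is the set of indices of the points it contains, and two clusters are joined when they share at least one index.

```python
from itertools import combinations

# Hypothetical clusters, named by (covering set j, cluster k), holding point indices
clusters = {
    "X_00": {0, 1},
    "X_10": {1, 2},
    "X_11": {3},
    "X_20": {2, 3},
}

# Draw an edge whenever two clusters have a non-empty intersection
edges = [(a, b) for a, b in combinations(sorted(clusters), 2)
         if clusters[a] & clusters[b]]
print(edges)
```

Note that `X_10` and `X_11` belong to the same covering set and share no points, so no edge appears between them.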

But how do we divide these $X_j$ into clusters? This is done by a user-defined *clustering algorithm*. Mapper does not place any requirements on this, so any (domain-specific) algorithm will work. This appears to be the part of the algorithm that Mapper is most sensitive to, so it's a good idea to test a couple of different clustering algorithms; in particular, we should try a variable number of clusters for different covering sets.

### Example
An example of how the filter and clustering could work for our data, assuming that the point clouds are rotated such that the tree always grows in the $+y$ direction:
- For the filter $f$, we use $f(x,y,z) = y$, i.e. each point is mapped to its height above the ground.
- We then partition the resulting range (which is $[\text{min height}, \text{max height}]$) into intervals with overlap; the values of $l$ and $p$ will have to be tuned.
- We then cluster points into balls, (rotated) boxes, or similar shapes. The key requirement is that *clusters that belong to the same leader/support branch should overlap, while clusters that belong to different branches should not*. This requires tuning the size of the balls/boxes.
- From these clusters and their overlap, we can construct the mapper graph.
- In the graph, we should be able to identify the four key parts of the tree:
    - There should be only two intervals ($\textcolor{red}{red}$ and $\textcolor{orange}{orange}$ in the example) which have only one connected component. The one with many clusters ($\textcolor{orange}{orange}$) is the support, while the one with only a few clusters ($\textcolor{red}{red}$) is the trunk.
    - The leaders are formed by long, "straight" connected components spanning multiple intervals. Side branches are clusters which have a sharp angle to one of these leader "chains".
    - Of course, one of the biggest challenges will be finding better heuristics for identifying parts of the tree which are robust against various types of noise.

This would produce a result that looks something like this:

![mapper graph](Mapper_Mockup_Real.png)

Again, note that there are no connections between points within the same covering set (color), so the result doesn't quite look like a tree; our challenge is to find a good filter, cover and clustering algorithm that *does* produce a tree-like shape, and ideally one where we can actually label the branches too.
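The pipeline described in this example can be tried on toy data with the self-contained sketch below. It is not the implementation used later in this notebook: the function name, the Y-shaped toy "tree", the closed intervals, and the clustering rule (points closer than `radius` are chained into the same cluster, a simple stand-in for balls/boxes) are all assumptions made for illustration.

```python
import numpy as np
from itertools import combinations

def toy_mapper(points, length, overlap, radius):
    """Height filter, overlapping closed intervals, and distance-threshold clustering."""
    points = np.asarray(points, dtype=float)
    heights = points[:, 1]                  # filter: f(x, y, z) = y
    step = length * (1 - overlap)

    # Cover [min height, max height] with overlapping intervals
    intervals, start = [], heights.min()
    while True:
        intervals.append((start, start + length))
        if start + length >= heights.max():
            break
        start += step

    # Cluster each covering set: chain together points closer than `radius`
    nodes = []                              # each node is a frozenset of point indices
    for lo, hi in intervals:
        unassigned = set(np.flatnonzero((heights >= lo) & (heights <= hi)).tolist())
        while unassigned:
            seed = unassigned.pop()
            cluster, frontier = {seed}, [seed]
            while frontier:
                current = frontier.pop()
                near = {j for j in unassigned
                        if np.linalg.norm(points[current] - points[j]) < radius}
                unassigned -= near
                cluster |= near
                frontier.extend(near)
            nodes.append(frozenset(cluster))

    # Connect clusters that share at least one point
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]]
    return nodes, edges

# Toy "tree": a vertical trunk (4 points) that splits into a left and a right branch
trunk = [[0, 0, 0], [0, 0.5, 0], [0, 1, 0], [0, 1.5, 0]]
left = [[-0.5, 2, 0], [-1, 2.5, 0], [-1.5, 3, 0]]
right = [[0.5, 2, 0], [1, 2.5, 0], [1.5, 3, 0]]
nodes, edges = toy_mapper(trunk + left + right, length=1.0, overlap=0.25, radius=0.8)
print(len(nodes), "nodes, edges:", edges)
```

On this toy input the lower intervals each give a single cluster, while the top interval splits into two clusters (one per branch), so the graph is a chain that forks at the top. Choosing `radius` too large would merge the two branches again, which mirrors the tuning issue described above.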

# Implementation
Assumption: we are given a point cloud dataset $X$ of $N$ points, each of which is an array of the form [x,y,z].

Things we can tweak in the algorithm:
1. The filter function
2. The partition function
3. The clustering algorithm

For now we'll only consider $Z = \mathbb{R}$ as a parameter space; the algorithm can be extended to higher dimensions to produce not just graphs (1-dimensional simplicial complexes) but higher-dimensional simplicial complexes as well.
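To give an idea of what the higher-dimensional case involves: besides edges for pairs of overlapping clusters, one would also add a triangle for every triple of clusters with a common point, and so on. The sketch below is not part of this notebook's implementation; the function name and the toy clusters (sets of point indices) are assumptions for illustration.

```python
from itertools import combinations

def nerve_simplices(clusters, max_dim=2):
    """Lists all simplices up to dimension max_dim: a group of clusters forms
    a simplex iff the clusters have at least one point in common."""
    names = sorted(clusters)
    simplices = []
    for size in range(1, max_dim + 2):        # size 1 = vertices, 2 = edges, 3 = triangles
        for combo in combinations(names, size):
            if set.intersection(*(clusters[name] for name in combo)):
                simplices.append(combo)
    return simplices

clusters = {"a": {1, 2}, "b": {2, 3}, "c": {2, 4}, "d": {4}}
simplices = nerve_simplices(clusters)
triangles = [s for s in simplices if len(s) == 3]
print(triangles)
```

Here point 2 lies in `a`, `b` and `c`, so those three clusters span a triangle; with $Z = \mathbb{R}$ and hard clustering such triples only occur if three intervals overlap.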

In [1]:
### IMPORTS & GENERAL SETTINGS ###
import numpy as np
from tqdm import tqdm

### Filter
For now we just use a point's height.
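Filters that depend on the distances between a point and all other points (such as eccentricity) need the whole point cloud as input rather than a single point. A minimal sketch, assuming eccentricity is taken as the mean Euclidean distance to all points; the function name is hypothetical and not used elsewhere in this notebook:

```python
import numpy as np

def filter_eccentricity(points):
    """Returns, for each point, its mean Euclidean distance to all points in the cloud.
    Note: this builds the full N x N distance matrix, so it only suits small N."""
    points = np.asarray(points, dtype=float)
    diffs = points[:, None, :] - points[None, :, :]   # pairwise difference vectors
    dists = np.linalg.norm(diffs, axis=-1)            # N x N distance matrix
    return dists.mean(axis=1)

ecc = filter_eccentricity([[0, 0, 0], [1, 0, 0], [2, 0, 0]])
print(ecc)
```

Points near the "center" of the cloud get low values and outlying points get high values, so the middle point of the three collinear points above has the smallest eccentricity.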

In [2]:
### FILTER FUNCTION ###
# Input: depends on exact function; usually either the coordinates of one point 
#        or the distances between one point and all other points (for e.g. density or eccentricity filtering)
# Output: a real value

def filter_height(point):
    '''Filters a point based on its height (y value)'''
    return point[1]

# @ Timo add additional filters here

### Cover
For now we use the default overlapping intervals.
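Because the intervals are treated as open, a filter value that coincides exactly with an interval endpoint can end up in no interval at all. A quick sanity check along the following lines can catch that; the helper name is made up for this sketch, and it assumes intervals are given as `(start, end)` tuples.

```python
def uncovered_values(filter_values, intervals):
    """Returns the filter values that do not lie strictly inside any of the open intervals."""
    return [v for v in filter_values
            if not any(lo < v < hi for lo, hi in intervals)]

intervals = [(-1e-8, 1.0), (0.5, 1.5)]
missing = uncovered_values([0.0, 0.7, 1.5], intervals)
print(missing)  # values that no interval contains
```

An empty result means every point of the dataset will be assigned to at least one covering set.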

In [3]:
### COVERING FUNCTION ###
# Covers the image of the filter function with overlapping intervals
# Input: minimum filter function value, max filter function value, interval length l, interval overlap fraction p
# Output: a dictionary whose keys are the overlapping intervals and whose values are empty lists.
# Example with min = 0, max = 2, l = 1 and p = 0.5 (ignoring the small epsilon shift):
# {(0,1): [], (0.5,1.5): [], (1, 2): [], (1.5, 2.5): []}
def cover_intervals(fmin, fmax, l, p):
    '''If the filter function f has values between fmin and fmax, this function covers the range [fmin, fmax] 
    with open intervals of length l which overlap for a fraction 0 < p < 1.
    Example output for fmin = 0, fmax = 2, l = 1 and p = 0.5 (ignoring the small epsilon shift): 
    {(0,1): [], (0.5,1.5): [], (1, 2): [], (1.5, 2.5): []}'''
    if l <= 0 or p >= 1:
        raise ValueError("Need l > 0 and p < 1, otherwise the intervals never advance past fmax.")
    output = {}
    epsilon = 1e-8
    overlap_length = p*l
    I_start = fmin - epsilon  # Starting point of current interval. 
                              # We subtract a small constant so fmin itself is also included in the first interval.
    output[(I_start, I_start + l)] = []

    # Add new intervals until we've covered the whole range [fmin, fmax]
    # Note: the last interval extends past fmax; this ensures fmax lies strictly inside an (open) interval.
    while I_start + l <= fmax:
        I_start += l - overlap_length
        output[(I_start, I_start + l)] = [] # Add interval to output dict with empty list as value

    return output

# TEST
cover = cover_intervals(0, 5, 1.2, 0.3)
print(cover)

{(-1e-08, 1.19999999): [], (0.8399999899999999, 2.03999999): [], (1.6799999899999998, 2.87999999): [], (2.5199999899999996, 3.71999999): [], (3.3599999899999995, 4.55999999): [], (4.199999989999999, 5.3999999899999995): []}


In [4]:
### COVER POINT CLOUD ACCORDING TO FILTER FUNCTION & COVER IN PARAMETER SPACE ###
# Cover the original point cloud data according to the filtering function and the above cover of the parameter space.
# Input: point cloud dataset X
# Output: dictionary whose keys are the intervals in Z as found above, 
# and whose values are the points that are mapped to these intervals by the filter function.
# Note: a point may occur in multiple intervals (in that case they overlap).
# Example output if X = {a, b, c, d} with filter values respectively {0.4, 0.7, 1.3, 1.8}:
# {(0,1): [a, b], (0.5, 1.5): [b, c], (1, 2): [c, d]}
def apply_covering(X, filter_values, cover_dict):
    '''Cover point cloud dataset X whose points have filter values filter_values, using the intervals in cover_dict.keys().
    Returns a new dictionary; cover_dict itself is left unchanged.
    Note: the indices in X and filter_values are expected to match.'''
    intervals = list(cover_dict.keys())
    output = {interval: [] for interval in intervals}  # Fresh lists, so repeated calls don't accumulate points in cover_dict

    for i, filter_value in tqdm(enumerate(filter_values)):
        # Add point to each interval that its filter value falls into
        for interval in intervals:
            if interval[0] < filter_value < interval[1]:
                output[interval].append(X[i])
    
    return output

# TEST
l = 1
p = 0.6
X = [[0, 0.4, 0.2], [0.3, 0.7, 1.0], [1.6, 1.3, 0.2], [2.4, 1.8, -0.4]]
filter_values = [filter_height(point) for point in X]
fmin = min(filter_values)
fmax = max(filter_values)
cover_dict = cover_intervals(fmin, fmax, l, p)
print("Cover of parameter space: ", cover_dict)
X_cover = apply_covering(X, filter_values, cover_dict)
print("Cover of X: ", X_cover)

Cover of parameter space:  {(0.39999999, 1.39999999): [], (0.7999999900000001, 1.79999999): [], (1.1999999900000002, 2.1999999900000002): []}


4it [00:00, ?it/s]

Cover of X:  {(0.39999999, 1.39999999): [[0, 0.4, 0.2], [0.3, 0.7, 1.0], [1.6, 1.3, 0.2]], (0.7999999900000001, 1.79999999): [[1.6, 1.3, 0.2]], (1.1999999900000002, 2.1999999900000002): [[1.6, 1.3, 0.2], [2.4, 1.8, -0.4]]}





### Clustering
For now we use k-means.
Major point of improvement: use different k for different covering sets based on the nr of points (and maybe also the shape?) of the covering set.
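One simple way to vary k with the size of the covering set is sketched below. The rule and its default values are placeholders (not tuned on our data), and the function name is hypothetical.

```python
def choose_k(n_points, points_per_cluster=50, k_max=20):
    """Hypothetical heuristic: aim for roughly `points_per_cluster` points per cluster,
    while keeping 1 <= k <= k_max and k <= n_points."""
    if n_points <= 0:
        raise ValueError("A covering set needs at least one point to be clustered.")
    k = n_points // points_per_cluster
    return max(1, min(k, k_max, n_points))

for n in (10, 500, 5000):
    print(n, "points ->", choose_k(n), "clusters")
```

The result could then be passed as the `k` argument of `cluster_kmeans` for each covering set; a shape-aware rule would need more than the point count.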

In [5]:
### CLUSTERING ALGORITHM ###
# Clusters the points in a covering set of X according to some clustering algorithm.
# Input: subset of points in X, additional parameters determining size and shape of clusters
# Output: an array/dictionary where each entry represents a cluster (and whose value is a list containing all points in that cluster).

from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering

def cluster_kmeans(covering_set, k=20):
    '''Clusters the points in the covering set using the k-means algorithm.
    Output: a dictionary with the cluster centroids as keys and the points belonging to each cluster as values.
    Note: k is capped at the number of points, since k-means cannot form more clusters than there are points.'''
    covering_set = np.asarray(covering_set)  # The boolean-mask indexing below requires a numpy array
    k = min(k, len(covering_set))
    kmeans = KMeans(n_clusters=k, random_state=0).fit(covering_set)
    centroids = kmeans.cluster_centers_
    labels = kmeans.labels_  # For each point in covering_set, gives the index of the cluster that point belongs to
    output = {}
    for i in range(k):
        output[tuple(centroids[i])] = covering_set[labels == i]

    return output



def cluster_linkage(covering_set, distance_threshold=0.5):
    '''Clusters the points in covering_set using agglomerative clustering with a distance threshold
    instead of a fixed number of clusters.'''
    # Note: linkage could be made a parameter
    linkage = AgglomerativeClustering(n_clusters=None, linkage='ward', distance_threshold=distance_threshold).fit(covering_set)

    labels = linkage.labels_ 
    output = {}

    # For each cluster, calculate the cluster centroid as the mean of the points in that cluster
    # and assign all points in the cluster to that centroid in the output dictionary.
    for i in np.unique(labels):
        cluster_points = covering_set[labels == i]
        centroid = np.average(cluster_points, axis=0)
        output[tuple(centroid)] = cluster_points

    return output



def cluster_dbscan(covering_set, eps=0.5, min_samples=3):
    '''
    Clusters the points in the covering set using the DBSCAN algorithm.

    Parameters
    ------------
    eps : max distance between two samples for one to be considered in the neighborhood of the other.
        The bigger epsilon, the larger the clusters will be on average; we want to tune epsilon so that points
        within a branch have distance < eps, while points from different branches have distance > eps.
    min_samples : the minimum number of samples that have to be in the neighborhood of a point
        for it to be considered a "core sample"
    '''
    dbscan = DBSCAN(eps=eps, min_samples=min_samples).fit(covering_set)
    labels = dbscan.labels_  # Note: DBSCAN also labels some points as "noise" (-1)
    output = {}

    # For each cluster, calculate the cluster centroid as the mean of the points in that cluster (this is not done by DBSCAN)
    # and assign all points in the cluster to that centroid in the output dictionary.
    # Note: points labeled as noise are not included in any cluster.
    for i in np.unique(labels):
        if i < 0:
            # If we wanna do something with the noise points, do it here
            continue
        cluster_points = covering_set[labels == i]
        centroid = np.average(cluster_points, axis=0)
        output[tuple(centroid)] = cluster_points

    return output

# TEST
X = np.array([[0,0,0], [1,1.5,1], [0,0.5,0], [1,1,1], [1.5,1,1], [0.5,0,0]])
#clusters = cluster_kmeans(X, 2)
clusters = cluster_linkage(X, 1)
#clusters = cluster_dbscan(X, 0.5, 3)
print(clusters)

{(1.1666666666666667, 1.1666666666666667, 1.0): array([[1. , 1.5, 1. ],
       [1. , 1. , 1. ],
       [1.5, 1. , 1. ]]), (0.16666666666666666, 0.16666666666666666, 0.0): array([[0. , 0. , 0. ],
       [0. , 0.5, 0. ],
       [0.5, 0. , 0. ]])}


### Generate graph 
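The edge computation can also be organized around the points instead of around pairs of clusters: first record for every point which nodes contain it, then connect all nodes that were recorded for the same point. The sketch below assumes the same nested `cluster_dict` layout and `'coverset_nr-cluster_nr'` node IDs as the cell that follows; the function name is made up, and it is an alternative sketch rather than the code used in this notebook.

```python
from collections import defaultdict
from itertools import combinations
import numpy as np

def edges_by_shared_points(cluster_dict):
    """Returns the sorted list of node-ID pairs whose clusters share at least one point."""
    # Map every point (as a tuple of coordinates) to the IDs of all nodes containing it
    point_to_nodes = defaultdict(set)
    for i, coverset in enumerate(cluster_dict.values()):
        for j, cluster_points in enumerate(coverset.values()):
            for point in np.asarray(cluster_points):
                point_to_nodes[tuple(point.tolist())].add('{}-{}'.format(i, j))

    # Any two nodes that contain the same point are connected
    edges = set()
    for node_ids in point_to_nodes.values():
        for a, b in combinations(sorted(node_ids), 2):
            edges.add((a, b))
    return sorted(edges)

# Toy input: keys of the inner dictionaries stand in for cluster centroids
toy_clusters = {
    (0, 0.7): {'c00': np.array([[0.7, 0.5, -0.2], [0.8, 0.6, -0.1]]),
               'c01': np.array([[0.1, 0.1, 0.4], [0.0, 0.2, 0.5]])},
    (0.3, 1): {'c10': np.array([[0.2, 0.9, 0.8], [0.1, 0.8, 0.9]]),
               'c11': np.array([[0.7, 0.5, -0.2], [0.8, 0.6, -0.1]])},
    (0.6, 1.3): {'c20': np.array([[0.8, 0.6, -0.1], [0.2, 0.9, 0.8], [0.1, 0.8, 0.9]]),
                 'c21': np.array([[-0.6, 1.2, -0.5], [-0.7, 1.1, -0.4]])},
}
toy_edges = edges_by_shared_points(toy_clusters)
print(toy_edges)
```

This visits each point once instead of comparing every pair of clusters point by point. Like the pairwise comparison, it relies on shared points having exactly identical coordinates in both clusters.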

In [6]:
### GENERATE GRAPH ###
# Compute edges between clusters based on common points, and use this to produce a graph, visualized using the PyVis library.
# If the filter value of a point x is in the overlap of intervals I and J, then there is a cluster in the pre-image of I
# and a cluster in the pre-image of J that both contain x; these clusters will then be connected.
# Note: since every point in a covering set is in *one* unique cluster within that covering set, there are no edges
# between clusters within the same covering set (this is the whole point; to detect different connected components)

import pyvis as pv
import matplotlib
from matplotlib.colors import rgb2hex

def compute_graph(cluster_dict):
    '''Converts the given dictionary of clusters into a PyVis network.
    cluster_dict is assumed to be a nested dictionary of the following form:
    {cover_interval_1: {cluster_1: [point1, point2, ...], cluster_2: [point5, point7, ...], ..., cluster_k: [point4, point12, ...]},
     cover_interval_2: {cluster_1: [point1, point56, ...], cluster_2: [point32, point4, ...], ..., cluster_k: [point41, point95, ...]},
     ...}'''
    graph = pv.network.Network(notebook=True, cdn_resources='in_line')
    graph.toggle_physics(False)  # Disable the physics-based layout since we want to put nodes at custom positions (namely cluster centroids)
    # Use the colormap registry (matplotlib >= 3.6); cm.get_cmap is deprecated and removed in newer matplotlib versions
    colormap = matplotlib.colormaps['viridis'].resampled(len(cluster_dict))

    # Add nodes; these are the cluster centroids
    for i, coverset in enumerate(cluster_dict.values()):
        nodes = list(coverset.keys()) 
        graph.add_nodes(
            ['{}-{}'.format(i, j) for j in range(len(nodes))],       # ID = coverset_nr-cluster_nr
            color=[rgb2hex(colormap(i)) for _ in range(len(nodes))], # Color by coverset
            x = [node[0]*1000 for node in nodes],                    # Set x,y position manually as centroid position (*1000 to show them properly separated)
            y = [node[1]*1000 for node in nodes]
        ) 

    # Add edges (can only safely be done after all nodes have been added)
    # For each cluster, see if any of the cluster's points can also be found in any other clusters; if so, add an edge.
    # Since every point within a coverset has one unique cluster, we don't have to check the cluster's own coverset.
    # Also, since the graph is undirected, we don't need to check previous coversets.
    # But we are still left with a 4-layer for loop; maybe see if we can implement this more efficiently.
    for iA, coversetA in tqdm(enumerate(list(cluster_dict.values())[:-1])): # Don't need to check last coverset since graph is undirected
        for jA, clusterA in enumerate(coversetA.values()):
            #print("Cluster A ({}-{}): ".format(iA, jA), clusterA) # DEBUG
            # See which clusters in other coversets share an index with this cluster
            for iB, coversetB in enumerate(list(cluster_dict.values())[iA+1:]): # Previous coversets have already been checked
                iB += iA+1  # Correct index
                for jB, clusterB in enumerate(coversetB.values()):
                    #print("\t Cluster B ({}-{}): ".format(iB, jB), clusterB) # DEBUG
                    #print("\t Common points: ", arrays_intersect(clusterA, clusterB)) # DEBUG
                    if arrays_intersect(clusterA, clusterB):
                        graph.add_edge('{}-{}'.format(iA, jA), '{}-{}'.format(iB, jB))

    return graph

def arrays_intersect(A, B):
    '''Returns true iff arrays A and B have at least one element in common.'''
    # any(point in clusterB for point in clusterA) # This returns incorrect results because numpy is funny :tm:
    for x in A:
        if np.any(np.all(x == B, axis=1)): # If all coordinates of x match with all coordinates of any point in B, return True
            return True
    return False
    

# TEST (filter value is y-coordinate)
X = {(0, 0.7): np.array([[0.1, 0.1, 0.4], [0, 0.2, 0.5], [0.7, 0.5, -0.2], [0.8, 0.6, -0.1]]),
     (0.3, 1): np.array([[0.7, 0.5, -0.2], [0.8, 0.6, -0.1], [0.2, 0.9, 0.8], [0.1, 0.8, 0.9]]),
     (0.6, 1.3): np.array([[0.8, 0.6, -0.1], [0.2, 0.9, 0.8], [0.1, 0.8, 0.9], [-0.6, 1.2, -0.5], [-0.7, 1.1, -0.4]])}
cluster_dict = {}
for cover_interval, points in X.items():
    cluster_dict[cover_interval] = cluster_kmeans(points, 2)
print(cluster_dict)

compute_graph(cluster_dict)



{(0, 0.7): {(0.75, 0.55, -0.15): array([[ 0.7,  0.5, -0.2],
       [ 0.8,  0.6, -0.1]]), (0.04999999999999999, 0.15000000000000002, 0.44999999999999996): array([[0.1, 0.1, 0.4],
       [0. , 0.2, 0.5]])}, (0.3, 1): {(0.15000000000000002, 0.8500000000000001, 0.85): array([[0.2, 0.9, 0.8],
       [0.1, 0.8, 0.9]]), (0.75, 0.55, -0.15000000000000002): array([[ 0.7,  0.5, -0.2],
       [ 0.8,  0.6, -0.1]])}, (0.6, 1.3): {(0.36666666666666664, 0.7666666666666667, 0.5333333333333334): array([[ 0.8,  0.6, -0.1],
       [ 0.2,  0.9,  0.8],
       [ 0.1,  0.8,  0.9]]), (-0.65, 1.15, -0.45000000000000007): array([[-0.6,  1.2, -0.5],
       [-0.7,  1.1, -0.4]])}}


2it [00:00, 3949.44it/s]


<class 'pyvis.network.Network'> |N|=6 |E|=4

### Full algorithm

In [7]:
### FULL MAPPER ALGORITHM ##
def mapper(X, filter_alg='height', cluster_alg = 'kmeans', bag_nr = 1):
    '''Runs the mapper algorithm on point cloud dataset X. Make sure to have run the cells above.
    
    Parameters
    ------------
    X : The point cloud dataset. Points are assumed to be arrays of the form [x,y,z].
    filter_alg : 'height' # @Timo add new ones here
        The filter to use. 
    cluster_alg : 'kmeans', 'linkage' or 'dbscan'
        The clustering algorithm to use.
    bag_nr : index of the tree (included in file name in order to not overwrite files for different bags)
    
    Outputs
    ------------
    Creates an HTML file in the folder mapper_outputs/ containing the output graph; the file name encodes the parameter values.
    If it does not open automatically, open it in your browser manually.
    '''
    print("### INITIALIZING MAPPER ###")
    file_name = 'MAPPER_BAG={}'.format(bag_nr)  # Parameter values are stored in the filename

    # 1: Calculate filter values
    match filter_alg:
        case 'height':
            filter_fn = filter_height
            file_name += '_FILTER=height'
        # @Timo add new filters here (see also clustering algorithm below for inspiration)
        case _:
            raise ValueError("Filter function not recognized!")

    filter_vals = [filter_fn(point) for point in X]
    fmin = min(filter_vals)
    fmax = max(filter_vals)
    #print("Filter values: ", filter_vals) # DEBUG
    
    # 2: Cover parameter space with open intervals
    l = 0.4 # @Timo to get a better-looking picture, you may want to tweak these l and p parameters depending on the filter values
    p = 0.4
    cover_dict = cover_intervals(fmin, fmax, l, p)
    file_name += '_PARTITION=intervals_l={}_p={}'.format(l, p)
    #print("Cover of parameter space: ", cover_dict) # DEBUG

    # 3: Assign points in X to covering sets based on these intervals
    print("GENERATING COVER FOR X")
    X_cover = apply_covering(X, filter_vals, cover_dict)
    #print("Cover of X: ", X_cover) # DEBUG

    # 4: Apply clustering on each covering set
    print("CLUSTERING COVERING SETS")

    # Select clustering algorithm based on user input
    match cluster_alg: 
        case 'kmeans':
            k = 12 # Define kmeans function args here
            cluster_fn = lambda points : cluster_kmeans(points, k)
            file_name += '_CLUSTER=kmeans_k={}'.format(k)
        case 'linkage':
            distance_threshold = 2 # Define linkage function args here
            cluster_fn = lambda points : cluster_linkage(points, distance_threshold)
            file_name += '_CLUSTER=linkage_dthreshold={}'.format(distance_threshold)
        case 'dbscan':
            eps = 0.1 # Define dbscan function args here
            min_samples = 3
            cluster_fn = lambda points : cluster_dbscan(points, eps, min_samples)
            file_name += '_CLUSTER=dbscan_eps={}_minsamples={}'.format(eps, min_samples)
        case _:
            raise ValueError("Clustering algorithm not recognized!")
    
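    X_clustered = {}
    for cover_interval, points in tqdm(X_cover.items()):
        if len(points) == 0:
            continue  # Skip empty covering sets (gaps in the filter values); the clustering algorithms need at least one point
        X_clustered[cover_interval] = cluster_fn(np.array(points))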
    #print("X with covering sets clustered: ", X_clustered) # DEBUG

    # 5: Use clusters to compute mapper graph
    print("COMPUTING GRAPH")
    file_name += '.html'
    graph = compute_graph(X_clustered)
    graph.prep_notebook()
    graph.show('mapper_outputs/' + file_name, notebook=True)

    # @Kishan add function here to show the graph overlayed over the point cloud.
    # Nodes correspond to clusters and should be plotted in 3D at the cluster center's coordinate.
    
    print("Successfully generated graph ", file_name)
    print("If it does not open automatically, open the html in your browser.")
    print("###########################")

# TEST
#X = [[0.1, 0.1, 0.4], [0, 0.2, 0.5], [0.7, 0.5, -0.2], [0.8, 0.6, -0.1], 
#     [0.2, 0.9, 0.8], [0.1, 0.8, 0.9], [-0.6, 1.2, -0.5], [-0.7, 1.1, -0.4]]
#mapper(X)

# Experiments

In [8]:
# Now with an actual point cloud dataset (code modified from view_pointcloud_with_superpoints.py)
import open3d as o3d

bag_nr = 27  # SET BAG NR HERE

FILE = "data/bag_{}/cloud_final.ply".format(bag_nr)
VOXEL = 0.002                     # 2 mm down-sample (tweak)

# --- LOAD ---
pcd = o3d.io.read_point_cloud(FILE)
print(pcd)                        # point count

# --- DOWNSAMPLE ---
pcd = pcd.voxel_down_sample(VOXEL)
pts = np.asarray(pcd.points)
print(pts)

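# Generate superpoints
def superpoint_selection(pts, r_super=0.1):
    '''Greedily covers pts with superpoints: repeatedly pick a random uncovered point, average all uncovered points
    within Manhattan (L1) distance r_super of it into one superpoint, and mark those points as covered.'''
    bool_pts = np.zeros(pts.shape[0], dtype=bool)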
    super_points = []
    while not np.all(bool_pts):
        remaining_indices = np.where(~bool_pts)[0] # Remaining Indices
        print(np.sum(~bool_pts))
        pts_remain = pts[~bool_pts] # Uncovered Points
        rand_super_pt = pts_remain[np.random.choice(pts_remain.shape[0])] # Randomly Chosen Super Point

        bool_pts_super_pt = np.sum(np.abs(pts_remain - rand_super_pt), axis=1) < r_super # Subset of Uncovered Points
        super_pt = np.mean(pts_remain[bool_pts_super_pt], axis=0) # Mean of Subset of Uncovered Points
        super_points.append(super_pt)

        bool_pts[remaining_indices[bool_pts_super_pt]] = True # Change set of Covered Points accordingly

    return np.array(super_points)

r_super = 0.05
try: # Don't regenerate superpoints if we already have them, as this takes some time
    print(super_points)
    print("Already generated superpoints")
except NameError: # super_points has not been defined yet in this session
    super_points = superpoint_selection(pts, r_super)

Jupyter environment detected. Enabling Open3D WebVisualizer.
[Open3D INFO] WebRTC GUI backend enabled.
[Open3D INFO] WebRTCWindowSystem: HTTP handshake server disabled.
PointCloud with 353443 points.
[[ 0.2747663   1.2261113   1.0956359 ]
 [-0.4813996  -0.29149929  1.2761585 ]
 [ 0.28657991  1.3112668   0.6753301 ]
 ...
 [-0.50263989  0.0758308   2.2249382 ]
 [-0.48745045  0.04330866  2.2272356 ]
 [-0.49915734  0.07589275  2.2268715 ]]
348046
347893
347435
...
321801


In [11]:
# Run mapper algorithm
print("Nr of superpoints: ", len(super_points))

filter_alg = 'height'
cluster_alg = 'dbscan'
# Set bag nr in previous cell

mapper(super_points, filter_alg=filter_alg, cluster_alg=cluster_alg, bag_nr=bag_nr)

Nr of superpoints:  4818
### INITIALIZING MAPPER ###
GENERATING COVER FOR X


4818it [00:00, 718716.67it/s]


CLUSTERING COVERING SETS


100%|██████████| 9/9 [00:00<00:00, 161.85it/s]


COMPUTING GRAPH


8it [00:03,  2.18it/s]

mapper_outputs/MAPPER_BAG=27_FILTER=height_PARTITION=intervals_l=0.4_p=0.4_CLUSTER=dbscan_eps=0.1_minsamples=3.html
Successfully generated graph  MAPPER_BAG=27_FILTER=height_PARTITION=intervals_l=0.4_p=0.4_CLUSTER=dbscan_eps=0.1_minsamples=3.html
If it does not open automatically, open the html in your browser.
###########################



