This function, count_bumps_DBSCAN, is designed to count the number of structures in a 3D point cloud that rise above (or fall below) the flat 2D plane at height 0.  

The function requires numpy imported as np, pandas imported as pd, and the DBSCAN class imported from sklearn.cluster.

The function first removes all datapoints at height 0. It then passes the lateral coordinates (components 0 and 1) of the remaining points to a Density-Based Spatial Clustering of Applications with Noise (DBSCAN) model, https://en.wikipedia.org/wiki/DBSCAN.
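The preprocessing step can be sketched on a tiny example. The five-point CSV below is invented for illustration and stands in for a real point-cloud file. Reading it with `header=None` makes pandas name the columns 0, 1, and 2. The filter then keeps the x and y coordinates of only the points whose height is nonzero.

```python
import io

import pandas as pd

# Invented stand-in for a point-cloud file: one "x,y,z" row per line, no header.
csv_text = """0,0,0
1,0,0
2,0,1.5
3,0,-0.5
4,0,0
"""

# header=None makes pandas number the columns 0, 1, 2 instead of using row 1 as names.
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(df.columns))  # [0, 1, 2]

# Keep only points off the plane (z != 0), then take their lateral (x, y) coordinates.
df_bumps = df[df[2] != 0]
lateral = df_bumps[[0, 1]].to_numpy()
print(lateral.tolist())  # [[2, 0], [3, 0]]
```

Both the raised point (z = 1.5) and the sunken point (z = -0.5) survive the filter. This is why dips are clustered along with bumps.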

The DBSCAN model picks a point that has not yet been assigned to a cluster and counts how many points lie within a distance set by the epsilon (eps) hyperparameter. If that count is at least the min_samples hyperparameter (scikit-learn counts the point itself), the point is a core point. The model then starts a cluster at that point and keeps adding every point that lies within eps of a core point already in the cluster. Once no more points can be added, the model moves on to another unassigned point and repeats the process. Points that are neither core points nor within eps of a core point are labeled as noise with -1 and are not added to any cluster.
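The procedure above can be written out as a short from-scratch sketch. It uses plain NumPy and is meant only to make the steps concrete. It is not scikit-learn's implementation, and the name `dbscan_sketch` is hypothetical. The sketch follows the scikit-learn convention that a point counts toward its own neighborhood, and it visits points in index order rather than at random.

```python
import numpy as np

NOISE = -1
UNASSIGNED = -2


def dbscan_sketch(points, eps, min_samples):
    """Hypothetical minimal DBSCAN: one integer label per point, -1 for noise."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances (O(n^2) memory, fine for small inputs).
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Each neighborhood includes the point itself (distance 0).
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, UNASSIGNED)
    cluster_id = 0
    for i in range(n):
        # Clusters are only started from unassigned core points.
        if labels[i] != UNASSIGNED or not is_core[i]:
            continue
        labels[i] = cluster_id
        stack = [i]
        while stack:
            j = stack.pop()  # j is always a core point here
            for k in neighbors[j]:
                if labels[k] == UNASSIGNED:
                    labels[k] = cluster_id
                    # Only core points extend the cluster further;
                    # border points join but are not expanded.
                    if is_core[k]:
                        stack.append(k)
        cluster_id += 1

    # Points never reached from any core point are noise.
    labels[labels == UNASSIGNED] = NOISE
    return labels


pts = [[0, 0], [1, 0], [2, 0], [10, 10], [11, 10], [50, 50]]
labels = dbscan_sketch(pts, eps=1.5, min_samples=2)
print(labels.tolist())  # [0, 0, 0, 1, 1, -1]
```

The first three points form one cluster and the next two form another. The isolated point at (50, 50) has only itself in its neighborhood, so it is labeled noise.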

This function does not use the heights of the bumps, only their lateral positions. It therefore counts connected structures of any shape: snaking hills, long walls, and donuts each count as one bump, as does a bump that has smaller bumps on top of it. It also finds dips below height 0 (holes and valleys) and counts them the same way as bumps.
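The shape behavior can be checked on synthetic data. The sketch below uses a hypothetical in-memory helper, `count_bumps_array`, which applies the same filter-then-cluster steps to an (n, 3) NumPy array instead of a CSV file. The point cloud is made up: a ring of raised points (a donut outline), a small sunken patch, and some flat ground.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def count_bumps_array(points, eps=6, min_samples=2):
    """Hypothetical in-memory variant: points is an (n, 3) array of x, y, z."""
    points = np.asarray(points, dtype=float)
    off_plane = points[points[:, 2] != 0]  # drop the flat plane at height 0
    if len(off_plane) == 0:
        return 0
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(off_plane[:, :2])
    return len(np.unique(labels[labels != -1]))  # ignore noise (-1)


# A "donut": 36 raised points on a circle of radius 5 around the origin.
angles = np.linspace(0, 2 * np.pi, 36, endpoint=False)
ring = np.column_stack([5 * np.cos(angles), 5 * np.sin(angles), np.ones(36)])

# A small "hole": a 3 x 3 patch of points at height -1 near (30, 30).
gx, gy = np.meshgrid(np.arange(30, 33), np.arange(30, 33))
hole = np.column_stack([gx.ravel(), gy.ravel(), -np.ones(9)])

# Flat ground at height 0, away from both features; it is filtered out.
fx, fy = np.meshgrid(np.arange(-40, -20, 2), np.arange(-40, -20, 2))
ground = np.column_stack([fx.ravel(), fy.ravel(), np.zeros(fx.size)])

cloud = np.vstack([ring, hole, ground])
print(count_bumps_array(cloud))  # 2: the ring counts once, and so does the hole
```

Adjacent ring points are less than 1 unit apart, so the whole ring chains into a single cluster, even though its center is empty.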

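Please note that this function only works on datasets that meet two conditions. First, the majority of datapoints must lie on a flat 2D plane at exactly height 0, because the filter keeps any point whose z value is not exactly 0. Second, the closest points of any two bumps must be more than 6 units (the eps value) apart; bumps that are closer together are merged into a single cluster and counted once.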
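A small sketch makes the spacing limit concrete. It places two identical made-up bumps, each a 2 x 2 patch of points, at two different separations and clusters their lateral coordinates with the same eps=6 and min_samples=2 used by the function. The helpers `count_clusters` and `two_bumps` are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def count_clusters(xy, eps=6, min_samples=2):
    """Count DBSCAN clusters in lateral (x, y) coordinates, ignoring noise (-1)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(xy)
    return len(np.unique(labels[labels != -1]))


# One small bump: a 2 x 2 patch of raised points spanning (0, 0) to (1, 1).
bump = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)


def two_bumps(offset):
    """Two copies of the bump, the second shifted right by `offset` units."""
    return np.vstack([bump, bump + [offset, 0]])


# Shift 5: the nearest points, (1, y) and (5, y), are 4 units apart, within eps.
print(count_clusters(two_bumps(5)))   # 1 (merged)
# Shift 12: the nearest points are 11 units apart, beyond eps.
print(count_clusters(two_bumps(12)))  # 2 (counted separately)
```

The gap that matters is between the closest points of the two bumps, not between their centers.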

In [1]:
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

def count_bumps_DBSCAN(point_cloud_csv_path, eps=6, min_samples=2):
    # Create a DataFrame with 3 columns: 0, 1, and 2 (corresponding to x, y, and z).
    df = pd.read_csv(point_cloud_csv_path, header=None)
    
    # Remove all datapoints whose component 2 (height) is exactly 0.
    df_bumps = df[df[2] != 0]
    
    # With no points off the plane there is nothing to cluster (DBSCAN rejects empty input).
    if df_bumps.empty:
        return 0
    
    # Define the Density-Based Spatial Clustering of Applications with Noise model.
    # The default hyperparameters were manually discovered using the bumps_16.txt training data.
    dbscan_cluster = DBSCAN(eps=eps, min_samples=min_samples)
    
    # Fit the model on the lateral (x, y) coordinates and get a cluster label for each datapoint.
    clusters = dbscan_cluster.fit_predict(df_bumps[[0, 1]])
    
    # Collect the unique cluster labels, excluding the noise label -1.
    bump_labels = np.unique(clusters[clusters != -1])
    
    # Return that count as the number of bumps.
    return len(bump_labels)
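DBSCAN gives noise points the label -1. Counting the distinct labels with `np.unique` therefore includes -1 whenever at least one noise point exists. With min_samples=2, any nonzero point lying more than eps from every other nonzero point becomes noise in this way. The sketch below uses made-up coordinates: two genuine pairs of points and one stray point.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight pairs of points plus one stray point far from everything else.
xy = np.array([[0, 0], [1, 0], [20, 0], [21, 0], [50, 50]], dtype=float)
labels = DBSCAN(eps=6, min_samples=2).fit_predict(xy)
print(labels.tolist())  # [0, 0, 1, 1, -1]

# Counting every distinct label also counts the noise label -1.
print(len(np.unique(labels)))                # 3
# Dropping -1 first counts only the actual clusters.
print(len(np.unique(labels[labels != -1])))  # 2
```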

In [2]:
count_bumps_DBSCAN('bumps_16.txt')

16

In [3]:
import time

# takes ~ 2 seconds
time_start = time.time()

count = count_bumps_DBSCAN('bumps.txt')

print('Time elapsed: {} seconds'.format(time.time()-time_start))
print('Bumps counted: {}'.format(count))

Time elapsed: 2.3794944286346436 seconds
Bumps counted: 193
