## PointCloudsSegmentation: K-means and DBSCAN

* This project aims to explore two popular clustering algorithms, K-means and DBSCAN, for segmenting point clouds.
* Point clouds are widely used in various fields, such as computer vision, robotics, and geospatial data analysis.
* Segmentation of point clouds is essential for object recognition, scene understanding, and more.

### Project Objectives

- Implement the K-means clustering algorithm to segment point clouds.
- Implement the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm for point cloud segmentation.
- Evaluate the performance of both algorithms in terms of accuracy, runtime, and robustness.

## Data

* The ShapeNet dataset (https://shapenet.org/) is used for testing and evaluating both algorithms.
* The chosen class is the Airplane class.
* Each airplane's ground-truth labels define 4 clusters: head, tail, wings, and body of the airplane.

The objective is to cluster each plane into these 4 parts and to show the clustering results visually.

## Implementation

The project is implemented in Python using Open3D for point cloud manipulation, NumPy for array handling, and scikit-learn for both K-means and DBSCAN.

## Evaluation

To assess the performance of the clustering algorithms, we will use the following metrics:

- Silhouette Score
- Adjusted Rand Index (ARI)
- Runtime Analysis
- Jaccard Score
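
The metrics listed above can be sketched on synthetic data as follows. This is a minimal illustration, not the notebook's evaluation code: the four Gaussian blobs stand in for one labeled airplane, and `align_labels` is a hypothetical helper introduced here. It is needed because cluster ids are arbitrary, so the Jaccard score only makes sense after each predicted cluster is renamed to the ground-truth label it overlaps most. The Silhouette Score needs no ground truth, and ARI is already invariant to label names.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, jaccard_score, silhouette_score


def align_labels(y_true, y_pred):
    """Rename predicted cluster ids to the true labels they overlap most."""
    true_ids, pred_ids = np.unique(y_true), np.unique(y_pred)
    # Overlap table: rows are true labels, columns are predicted clusters.
    overlap = np.array(
        [[np.sum((y_true == t) & (y_pred == p)) for p in pred_ids] for t in true_ids]
    )
    # Hungarian matching on the negated table maximizes the total overlap.
    rows, cols = linear_sum_assignment(-overlap)
    mapping = {pred_ids[c]: true_ids[r] for r, c in zip(rows, cols)}
    # Clusters left unmatched (more clusters than true labels) are mapped to -1.
    return np.array([mapping.get(p, -1) for p in y_pred])


# Synthetic stand-in for one labeled point cloud: 4 well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [10, 0, 0], [0, 10, 0], [0, 0, 10]], dtype=float)
points = np.vstack([c + 0.3 * rng.normal(size=(50, 3)) for c in centers])
y_true = np.repeat(np.arange(4), 50)

y_pred = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(points)

silhouette = silhouette_score(points, y_pred)  # uses geometry only
ari = adjusted_rand_score(y_true, y_pred)      # compares partitions, ignores label names
jaccard = jaccard_score(y_true, align_labels(y_true, y_pred), average="macro")

print(f"silhouette={silhouette:.3f}  ARI={ari:.3f}  Jaccard={jaccard:.3f}")
```

On real data, `y_true` would come from the `.seg` file of the same airplane and `points` from its point cloud.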


## Results

Segmentation results are presented for both K-means and DBSCAN, and their performance is compared using the chosen evaluation metrics.
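
For the runtime part of the comparison, one possible approach is to time only the clustering call with `time.perf_counter`. The sketch below is an assumption about how this could be done rather than the notebook's own measurement: `time_clustering` is a hypothetical helper, the points are random, and the DBSCAN parameters simply mirror the values used later in the notebook.

```python
import time

import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import StandardScaler


def time_clustering(model, points):
    """Return (labels, seconds) for fitting `model` on standardized points."""
    scaled = StandardScaler().fit_transform(points)
    start = time.perf_counter()
    labels = model.fit_predict(scaled)  # only the clustering step is timed
    elapsed = time.perf_counter() - start
    return labels, elapsed


# Random points as a stand-in for one airplane point cloud.
rng = np.random.default_rng(0)
points = rng.normal(size=(2000, 3))

kmeans_labels, kmeans_time = time_clustering(
    KMeans(n_clusters=4, n_init=10, random_state=0), points
)
dbscan_labels, dbscan_time = time_clustering(DBSCAN(eps=0.3, min_samples=10), points)

print(f"K-means: {kmeans_time:.4f} s, DBSCAN: {dbscan_time:.4f} s")
```

A single timing is noisy, so averaging over several point clouds or repeated runs gives a fairer comparison.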

## Conclusion

The project aims to provide insights into the effectiveness of K-means and DBSCAN for point cloud segmentation.




In [None]:
!pip install open3d

#### 1. K-means

In [None]:
# Necessary Imports
import numpy as np
#pip install open3d
import open3d as o3d
from sklearn.cluster import KMeans, DBSCAN, OPTICS
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

In [None]:
import keras

In [None]:
# Download Shapenet dataset of point clouds
dataset_url = "https://git.io/JiY4i"

dataset_path = keras.utils.get_file(
    fname="shapenet.zip",
    origin=dataset_url,
    cache_subdir="datasets",
    hash_algorithm="auto",
    extract=True,
    archive_format="auto",
    cache_dir="datasets",
)

In [None]:
dataset_path

In [None]:
import zipfile
import os

In [None]:
# Define the directory where you want to extract the contents
zip_file_path = dataset_path
extracted_dir = 'dataset'
# Extract the zip file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(extracted_dir)



In [None]:
import json

# View dataset metadata (read from the directory the archive was extracted to above)
with open(os.path.join(extracted_dir, "PartAnnotation", "metadata.json")) as json_file:
    metadata = json.load(json_file)

print(metadata)

In [None]:
Airplane_pts_path ='/content/dataset/PartAnnotation/02691156/points/'

In [None]:
# List the files in the extracted directory
Airplane_pcd_files = os.listdir(Airplane_pts_path)

In [None]:
Airplane_pcd_files[0:5]

In [None]:
# Define the output directory for .pcd files
output_directory = 'Data_pcd/'
# Create the output directory if it doesn't exist
os.makedirs(output_directory, exist_ok=True)

In [None]:
# Loop through the list of .pts files
for filename in Airplane_pcd_files[:100]:  # Choose the first 100 files
    # Construct the full path to the .pts file
    pts_path = os.path.join(Airplane_pts_path, filename)

    # Read the .pts file (assuming it contains XYZ points)
    with open(pts_path, 'r') as file:
        lines = file.readlines()

    # Parse the points from the .pts file
    points = []
    for line in lines:
        if line.strip():  # Ignore empty lines
            x, y, z = map(float, line.split())
            points.append([x, y, z])

    # Create an Open3D point cloud
    point_cloud = o3d.geometry.PointCloud()
    point_cloud.points = o3d.utility.Vector3dVector(np.array(points))

    # Define the path to the output .pcd file
    pcd_path = os.path.join(output_directory, f'{filename[:-4]}.pcd')

    # Save the point cloud in .pcd format
    o3d.io.write_point_cloud(pcd_path, point_cloud)

    print(f"Conversion complete for {filename}. Point cloud saved as {pcd_path}")

In [None]:
# List all .pcd files in the directory (sorted, because os.listdir order is arbitrary)
# and keep 10 of them
pcd_files = sorted(
    os.path.join(output_directory, filename)
    for filename in os.listdir(output_directory)
    if filename.endswith('.pcd')
)[10:20]

# Create an empty list to store the loaded point clouds
point_clouds = []

# Read and append the point clouds to the list
for pcd_file in pcd_files:
    pcd = o3d.io.read_point_cloud(pcd_file)
    point_clouds.append(pcd)
    #o3d.visualization.draw_plotly([pcd],window_name=pcd_file)



In [None]:

# Directory containing ground truth labels
labels_directory = '/content/dataset/PartAnnotation/02691156/expert_verified/points_label'

# List of .seg files in the directory (sorted, because os.listdir order is arbitrary).
# Each label file shares its base name with the matching point cloud file.
seg_files = sorted(
    os.path.join(labels_directory, filename)
    for filename in os.listdir(labels_directory)
    if filename.endswith('.seg')
)



# Initialize a dictionary to store ground truth labels for each point cloud
ground_truth_labels = {}
# Load ground truth labels for each point cloud
for seg_file in seg_files[:100]:
    # Extract the name
    filename = os.path.splitext(os.path.basename(seg_file))[0]

    with open(seg_file, 'r') as file:
        labels = [int(line.strip()) for line in file]

    ground_truth_labels[filename] = labels
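
Because the labels are stored under the file's base name, a point cloud can be paired with its labels by name instead of by list position. The helper below is a small sketch of that lookup; `stem`, `pair_with_labels`, and the example file names are hypothetical, and it assumes that some point clouds may have no label file, in which case they are skipped.

```python
import os


def stem(path):
    """Base name without directory or extension, e.g. 'Data_pcd/a1.pcd' -> 'a1'."""
    return os.path.splitext(os.path.basename(path))[0]


def pair_with_labels(pcd_paths, labels_by_name):
    """Return (path, labels) pairs for the point clouds that have labels."""
    pairs = []
    for path in pcd_paths:
        labels = labels_by_name.get(stem(path))
        if labels is not None:  # skip point clouds without a label file
            pairs.append((path, labels))
    return pairs


# Hypothetical file names, used only to show the lookup.
example_labels = {"a1": [1, 1, 2], "b2": [3, 4, 4]}
example_paths = ["Data_pcd/b2.pcd", "Data_pcd/c3.pcd", "Data_pcd/a1.pcd"]
pairs = pair_with_labels(example_paths, example_labels)
print(pairs)
```

In the notebook this would be called as `pair_with_labels(pcd_files, ground_truth_labels)`.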

In [None]:
ground_truth_labels.keys()

In [None]:
# Get points and transform it to a numpy array:
points = [np.asarray(point.points).copy() for point in point_clouds]

In [None]:
points

In [None]:
n_clusters =4

In [None]:
# Iterate through each point cloud
for i in range(len(points)):
  # Select the point cloud that matches points[i]
  pcd = point_clouds[i]

  # Normalization
  scaled_points = StandardScaler().fit_transform(points[i])

  # Clustering with K-Means
  kmeans_model = KMeans(n_clusters=n_clusters).fit(scaled_points)

  # Get labels
  cluster_labels = kmeans_model.labels_

  # Number of distinct labels (kept separate so n_clusters is not overwritten)
  n_colors = len(set(cluster_labels))

  # Mapping the label classes to a color map
  colors = plt.get_cmap("tab20")(cluster_labels / (n_colors if n_colors > 0 else 1))

  # Color noise points (negative labels) black; K-means produces none, so this is a no-op here
  colors[cluster_labels < 0] = [0, 0, 0, 0]

  # Update points colors
  pcd.colors = o3d.utility.Vector3dVector(colors[:, :3])

  # Display the individual point cloud
  o3d.visualization.draw_plotly([pcd])

In [None]:
# Iterate through each point cloud
for i, pcd in enumerate(point_clouds):
  # Normalization
  scaled_points = StandardScaler().fit_transform(points[i])

  # Clustering with K-Means
  kmeans_model = KMeans(n_clusters=n_clusters).fit(scaled_points)

  # Get labels
  cluster_labels = kmeans_model.labels_

  # Number of distinct labels (kept separate so n_clusters is not overwritten)
  n_colors = len(set(cluster_labels))

  # Mapping the label classes to a color map
  colors = plt.get_cmap("tab20")(cluster_labels / (n_colors if n_colors > 0 else 1))

  # Color noise points (negative labels) black; K-means produces none, so this is a no-op here
  colors[cluster_labels < 0] = [0, 0, 0, 0]

  # Update points colors
  pcd.colors = o3d.utility.Vector3dVector(colors[:, :3])

  # Display the individual point cloud
  o3d.visualization.draw_plotly([pcd])

In [None]:
# Define the output directory for .pcd files
new_directory = 'New_pcd/'
# Create the output directory if it doesn't exist
os.makedirs(new_directory, exist_ok=True)

In [None]:
# Generate new points (example: random points)
new_points = np.random.rand(10, 3)  # 10 new points with (x, y, z) coordinates
new_point_clouds = []
# Loop through the list of point clouds and build an extended copy of each
for i in range(len(point_clouds)):
    # Append the new points to the existing points
    combined_points = np.vstack((points[i], new_points))

    # Use a new PointCloud so the originals in point_clouds stay aligned with points
    new_point_cloud = o3d.geometry.PointCloud()
    new_point_cloud.points = o3d.utility.Vector3dVector(combined_points)

    # Save the extended point cloud to a new file
    o3d.io.write_point_cloud(os.path.join(new_directory, f"updated_point_cloud{i+1}.pcd"), new_point_cloud)
    o3d.visualization.draw_plotly([new_point_cloud])
    new_point_clouds.append(new_point_cloud)


In [None]:
new_points = [np.asarray(point.points).copy() for point in new_point_clouds]

In [None]:
# Iterate through each extended point cloud
for i in range(len(new_points)):
  # Select the point cloud that matches new_points[i]
  pcd = new_point_clouds[i]

  # Normalization
  scaled_points = StandardScaler().fit_transform(new_points[i])

  # Clustering with K-Means
  kmeans_model = KMeans(n_clusters=n_clusters).fit(scaled_points)

  # Get labels
  cluster_labels = kmeans_model.labels_

  # Number of distinct labels (kept separate so n_clusters is not overwritten)
  n_colors = len(set(cluster_labels))

  # Mapping the label classes to a color map
  colors = plt.get_cmap("tab20")(cluster_labels / (n_colors if n_colors > 0 else 1))

  # Color noise points (negative labels) black; K-means produces none, so this is a no-op here
  colors[cluster_labels < 0] = [0, 0, 0, 0]

  # Update points colors
  pcd.colors = o3d.utility.Vector3dVector(colors[:, :3])

  # Display the individual point cloud
  o3d.visualization.draw_plotly([pcd])

#### 2. DBSCAN

In [None]:
# Initialize DBSCAN parameters
eps = 0.3  # Maximum distance between two samples for one to be considered as in the neighborhood of the other
min_samples = 10  # The number of samples (or total weight) in a neighborhood for a point to be considered as a core point

# Iterate through each point cloud
for i, pcd in enumerate(point_clouds):
    # Normalization
    scaled_points = StandardScaler().fit_transform(points[i])

    # Clustering with DBSCAN
    dbscan_model = DBSCAN(eps=eps, min_samples=min_samples).fit(scaled_points)

    # Get labels
    cluster_labels = dbscan_model.labels_

    # Number of distinct labels, including the noise label -1 if present
    n_labels = len(set(cluster_labels))

    # Mapping the label classes to a color map
    colors = plt.get_cmap("tab20")(cluster_labels / (n_labels if n_labels > 0 else 1))

    # Color noise points (label -1) black
    colors[cluster_labels == -1] = [0, 0, 0, 0]

    # Update points colors
    pcd.colors = o3d.utility.Vector3dVector(colors[:, :3])

    # Display the individual point cloud
    o3d.visualization.draw_plotly([pcd])
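
The value `eps = 0.3` above is fixed by hand. A common heuristic for choosing it, which this notebook does not use, is the k-distance curve: compute each point's distance to its k-th nearest neighbor, sort these distances, and look for the elbow where they start to rise sharply. The sketch below assumes `k` is set to `min_samples` as a rule of thumb; `k_distance_curve` is a hypothetical helper and the random points stand in for one airplane.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def k_distance_curve(points, k=10):
    """Sorted distance from each standardized point to its k-th nearest other point."""
    scaled = StandardScaler().fit_transform(points)
    # Ask for k + 1 neighbors: each query point is returned as its own
    # nearest neighbor at distance 0.
    neighbors = NearestNeighbors(n_neighbors=k + 1).fit(scaled)
    distances, _ = neighbors.kneighbors(scaled)
    return np.sort(distances[:, -1])


# Random points as a stand-in for one airplane point cloud.
rng = np.random.default_rng(0)
example_points = rng.normal(size=(500, 3))
curve = k_distance_curve(example_points, k=10)

# Candidate eps values are read off near the elbow of the plotted curve.
print(f"median k-distance: {np.median(curve):.3f}, max: {curve[-1]:.3f}")
```

Plotting `curve` with `plt.plot(curve)` makes the elbow visible. Since the shapes differ from plane to plane, the suggested `eps` may differ as well.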

In [None]:
!pip install nbconvert


In [None]:
%%shell
jupyter nbconvert --to html /content/PointCloudsSegmentation.ipynb