<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#State-of-the-art" data-toc-modified-id="State-of-the-art-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>State of the art</a></span><ul class="toc-item"><li><span><a href="#Point-Cloud-Methods" data-toc-modified-id="Point-Cloud-Methods-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Point Cloud Methods</a></span></li></ul></li><li><span><a href="#Taxonomy-of-point-based-methods-for-semantic-segmentation" data-toc-modified-id="Taxonomy-of-point-based-methods-for-semantic-segmentation-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Taxonomy of point based methods for semantic segmentation</a></span></li><li><span><a href="#3D-scene-semantic-segmentation-using-pointnet-in-pytorch" data-toc-modified-id="3D-scene-semantic-segmentation-using-pointnet-in-pytorch-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>3D scene semantic segmentation using pointnet in pytorch</a></span></li></ul></div>

# Introduction

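A point cloud is a set of 3D points sampled from the surfaces of objects or a scene, and it is one of the simplest 3D representations. Point clouds can be generated using:
 * LiDAR
 * Infrared (depth) sensors
 * Stereo cameras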
 
Working with point clouds poses several challenges, such as limited resolution, occlusion, and noise.

Unlike images, which are defined on a regular pixel grid, a point cloud is an unordered set of points, so standard convolution cannot be applied to it directly.
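A small NumPy illustration of why ordering matters (random points, purely illustrative): reversing the rows produces a different array that still describes the same point cloud. An order-dependent operation such as flattening the array gives a different result, while a symmetric function such as the per-coordinate maximum does not. This order invariance is the property that PointNet-style max-pooling relies on.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 3))  # 5 points, each with (x, y, z) coordinates
shuffled = points[::-1]           # same points, different row order

# Order-dependent: flattening (as a fully connected layer would see the input)
# yields a different vector for the same set of points.
flat_a, flat_b = points.ravel(), shuffled.ravel()

# Order-invariant: a symmetric function (max, sum, mean) over the point axis
# gives the same result for any ordering of the points.
max_a, max_b = points.max(axis=0), shuffled.max(axis=0)

print(np.array_equal(flat_a, flat_b))  # False
print(np.array_equal(max_a, max_b))    # True
```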

# State of the art

Recently, considerable effort has gone into transferring the success of 2D scene understanding to the 3D world.

* **Voxelized Methods:** The most straightforward way to apply CNNs in 3D space is to first convert the point cloud into a voxel representation and then apply 3D convolutions to that representation. However, 3D convolutions have drawbacks: memory and computation grow cubically with the grid resolution, which restricts these approaches to coarse voxel grids. Coarse grids, in turn, introduce discretization artifacts (especially for thin structures) and lose geometric information such as point density.

* **Point Cloud Methods:** Methods that operate directly on the point cloud representation have produced promising results. Rather than voxelizing the points or projecting them onto a regular grid, they process the unstructured 3D points themselves. One of the most influential methods in this group is PointNet, which extracts per-point features with a shared MLP applied to each point independently, followed by a max-pooling operation that aggregates them into a global descriptor of the whole point cloud.

    Subsequent methods have built on this approach by partitioning the point cloud space in more meaningful ways, such as using octrees or kd-trees. Others have incorporated local geometry and surface information into the feature extraction process through clustering, hierarchical grouping of points, or graph neural networks.

    The advantage of point cloud methods is that they can preserve the original spatial structure of the data without the need for voxelization or projection. This allows for more accurate representation of the underlying geometry and better preservation of fine-grained details. However, point cloud methods can be more computationally expensive due to the lack of structure and the need for additional processing steps to group nearby points and extract features.
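The cubic cost of voxelized methods can be seen with a minimal voxelization sketch. The `voxelize` helper below is hypothetical (not taken from a specific library) and assumes a binary occupancy grid over the cloud's cubic bounding box; uniformly random points stand in for a real scan. The number of cells grows as r³ with the resolution r, while at coarse resolutions many points fall into the same voxel, which is where density information is lost.

```python
import numpy as np


def voxelize(points: np.ndarray, resolution: int) -> np.ndarray:
    """Map an (N, 3) point cloud to a binary occupancy grid of shape (r, r, r)."""
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()  # largest side, so voxels are cubes
    # Scale coordinates into [0, resolution) and take the integer cell index.
    idx = ((points - mins) / (extent + 1e-9) * resolution).astype(int)
    idx = np.clip(idx, 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True  # several points may share a cell
    return grid


rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(10_000, 3))

for r in (16, 32, 64, 128):
    grid = voxelize(points, r)
    # Total cells grow as r**3 while the number of input points stays fixed.
    print(f"r={r:4d}  cells={grid.size:>9,}  occupied={int(grid.sum()):>6,}")
```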
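The core PointNet idea (a shared per-point MLP followed by max-pooling) can be sketched in PyTorch as below. `PointNetEncoder` is a simplified, hypothetical module: it omits the input and feature transformation networks (T-Nets) of the original architecture, and its layer widths are illustrative. The shared MLP is implemented with `nn.Conv1d(kernel_size=1)`, which applies the same weights to every point, so shuffling the points leaves the global feature unchanged.

```python
import torch
import torch.nn as nn


class PointNetEncoder(nn.Module):
    """Simplified PointNet feature extractor (no T-Nets, illustrative widths)."""

    def __init__(self, in_dim: int = 3, global_dim: int = 1024):
        super().__init__()
        # kernel_size=1 convolutions = the same MLP applied to each point independently
        self.mlp = nn.Sequential(
            nn.Conv1d(in_dim, 64, kernel_size=1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, global_dim, kernel_size=1), nn.BatchNorm1d(global_dim), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim, num_points)
        point_features = self.mlp(x)             # (batch, global_dim, num_points)
        return point_features.max(dim=2).values  # (batch, global_dim), order-invariant


torch.manual_seed(0)
encoder = PointNetEncoder().eval()  # eval(): BatchNorm uses its running statistics
points = torch.randn(2, 3, 500)     # 2 random clouds of 500 XYZ points
perm = torch.randperm(500)
with torch.no_grad():
    g1 = encoder(points)
    g2 = encoder(points[:, :, perm])  # same clouds, points shuffled
print(g1.shape)                # torch.Size([2, 1024])
print(torch.allclose(g1, g2))  # True
```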
    
## Point Cloud Methods 

3D point cloud segmentation is the process of partitioning a point cloud into homogeneous regions, such that points within the same meaningful region share similar properties. It is a challenging task because point cloud data is highly redundant, has uneven sampling density, and lacks explicit structure. Segmenting a point cloud into foreground and background is a fundamental step in 3D point cloud processing.

3D point cloud segmentation can be performed at three levels:

1. **scene level (semantic segmentation)**
2. **object level (instance segmentation)**
3. **part level (part segmentation)**

Semantic segmentation assigns a class label to every point (or every pixel, in the 2D case) and treats multiple objects of the same class as a single entity; distinguishing individual objects of the same class is the task of instance segmentation.
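To make the distinction concrete, here is a toy example with hypothetical labels: six points, where class 0 is floor and class 1 is chair, and two separate chairs are present. Semantic labels give both chairs the same class id, whereas instance labels separate them. A semantic segmentation network typically outputs an (N, C) array of per-point class scores (the values below are made up), and the predicted label of each point is the arg-max over classes.

```python
import numpy as np

# Hypothetical toy scene: class 0 = floor, class 1 = chair.
# Points 2-3 belong to one chair and points 4-5 to a second chair.
semantic_labels = np.array([0, 0, 1, 1, 1, 1])  # both chairs share class 1
instance_labels = np.array([0, 0, 1, 1, 2, 2])  # each chair gets its own id

# Made-up per-point class scores of shape (N, C) = (6, 2).
logits = np.array([[2.0, -1.0], [1.5, 0.2], [-0.3, 0.9],
                   [0.1, 0.4], [-2.0, 1.0], [0.0, 3.0]])
pred = logits.argmax(axis=1)  # one class label per point

print(pred)  # [0 0 1 1 1 1]
print(len(np.unique(semantic_labels)), len(np.unique(instance_labels)))  # 2 3
```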

# Taxonomy of point based methods for semantic segmentation

![taxonomy](images/taxonomy.png)

Point-based 3D semantic segmentation methods can be grouped into four paradigms:

* (a) Point-wise MLP
* (b) Point Convolution
* (c) RNN-based
* (d) Graph-based
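As an illustration of the graph-based paradigm (d), the sketch below implements an EdgeConv-style layer, the operator introduced in DGCNN (Dynamic Graph CNN): each point is connected to its k nearest neighbours, an MLP is applied to every edge feature (x_i, x_j - x_i), and the results are max-pooled over the neighbours. This is a simplified, hypothetical implementation (a single linear layer, no batch normalization), not the reference code; the `knn` helper also shows the neighbourhood grouping that makes such methods more expensive than a plain per-point MLP.

```python
import torch
import torch.nn as nn


def knn(x: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k nearest neighbours of every point: (B, N, D) -> (B, N, k)."""
    dist = torch.cdist(x, x)  # (B, N, N) pairwise Euclidean distances
    # The smallest distance is normally the point to itself, so it is included.
    return dist.topk(k, dim=-1, largest=False).indices


class EdgeConv(nn.Module):
    """EdgeConv-style layer: MLP on edge features (x_i, x_j - x_i), max over neighbours."""

    def __init__(self, in_dim: int, out_dim: int, k: int = 16):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, D) -> (B, N, out_dim)
        idx = knn(x, self.k)                                      # (B, N, k)
        batch = torch.arange(x.shape[0], device=x.device).view(-1, 1, 1)
        neighbours = x[batch, idx]                                # (B, N, k, D)
        center = x.unsqueeze(2).expand(-1, -1, self.k, -1)        # (B, N, k, D)
        edges = torch.cat([center, neighbours - center], dim=-1)  # (B, N, k, 2D)
        return self.mlp(edges).max(dim=2).values                  # (B, N, out_dim)


torch.manual_seed(0)
layer = EdgeConv(in_dim=3, out_dim=64, k=16)
points = torch.randn(2, 1024, 3)  # 2 random clouds of 1024 XYZ points
features = layer(points)
print(features.shape)  # torch.Size([2, 1024, 64])
```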



# 3D scene semantic segmentation using pointnet in pytorch

* Install PyTorch: The first step is to install PyTorch, a popular general-purpose deep learning library. Point clouds can be handled directly as tensors, so PointNet needs no special 3D support. Follow the installation instructions on the PyTorch website, choosing the build that matches your platform and CUDA version.

* Install other necessary packages: Depending on your project, you may also need packages such as NumPy for array manipulation and Open3D or PyVista for reading, visualizing, and preprocessing 3D data.

* Load and preprocess data: Once the packages are installed, load your 3D data and convert it into the format PointNet expects: for each sample, a fixed number of points with their coordinates (and optionally other per-point features such as color), together with one class label per point. Typical preprocessing steps include normalization (centering and scaling the points) and data augmentation (for example, random rotations and jitter).

* Implement PointNet: Next, implement the PointNet model in PyTorch. The authors' original implementation on GitHub is written in TensorFlow, so you can either port it or start from one of the existing open-source PyTorch reimplementations. Then adapt the code to your requirements (for example, the number of input channels and classes) and integrate it with your data loading and preprocessing code.

* Train the model: Once you have implemented the PointNet model, you can start training it on your data. You will need to specify a loss function, an optimizer, and other hyperparameters, as well as define a training loop that iteratively updates the model weights based on the training data.

* Evaluate the model: After training, evaluate the model on a separate validation or test set. Useful metrics include accuracy, precision, recall, and F1 score, and, for semantic segmentation in particular, per-class intersection over union (IoU) and its mean over classes (mIoU).

* Fine-tune the model: Depending on the results of your evaluation, you may need to fine-tune the model by adjusting hyperparameters, changing the architecture, or performing other optimizations.
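A minimal preprocessing sketch for the loading step, assuming each sample is an (N, 3) array of XYZ coordinates with one integer label per point. The helpers `normalize`, `sample_fixed`, and `augment` are hypothetical, and random data stands in for a real dataset. Sampling a fixed number of points lets samples be stacked into batches; the same indices are applied to the labels so points and labels stay aligned. Rotating only about the vertical (z) axis is a common choice when the up direction is known, but it is an assumption here.

```python
import numpy as np


def normalize(points: np.ndarray) -> np.ndarray:
    """Center an (N, 3) cloud at the origin and scale it into the unit sphere."""
    centered = points - points.mean(axis=0)
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / scale if scale > 0 else centered


def sample_fixed(points: np.ndarray, labels: np.ndarray, n: int, rng: np.random.Generator):
    """Randomly sample exactly n points (with replacement only if the cloud is smaller)."""
    idx = rng.choice(len(points), size=n, replace=len(points) < n)
    return points[idx], labels[idx]  # same indices keep points and labels aligned


def augment(points: np.ndarray, rng: np.random.Generator, sigma: float = 0.01) -> np.ndarray:
    """Random rotation about the vertical (z) axis plus small Gaussian jitter."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot_z.T + rng.normal(scale=sigma, size=points.shape)


rng = np.random.default_rng(0)
# Stand-in for a real scan: 5,000 random points with hypothetical class labels.
raw_points = rng.uniform(0.0, 10.0, size=(5000, 3))
raw_labels = rng.integers(0, 5, size=5000)

pts, lbl = sample_fixed(raw_points, raw_labels, n=4096, rng=rng)
pts = augment(normalize(pts), rng)
print(pts.shape, lbl.shape)  # (4096, 3) (4096,)
```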
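The implementation and training steps can be sketched together. `PointNetSeg` below is a simplified, hypothetical PointNet-style segmentation network, not the authors' code: it omits the T-Nets and uses illustrative layer widths, but follows the PointNet idea of concatenating each point's local feature with the max-pooled global feature before a per-point classifier. The loop runs a few Adam steps with per-point cross-entropy on one synthetic batch; the number of classes, batch size, and learning rate are arbitrary placeholders.

```python
import torch
import torch.nn as nn


def shared_mlp(in_ch: int, out_ch: int) -> nn.Sequential:
    """One shared-MLP layer: a 1x1 convolution applies the same weights to every point."""
    return nn.Sequential(nn.Conv1d(in_ch, out_ch, kernel_size=1), nn.BatchNorm1d(out_ch), nn.ReLU())


class PointNetSeg(nn.Module):
    """Simplified PointNet-style segmentation network (no T-Nets, illustrative widths)."""

    def __init__(self, num_classes: int, in_dim: int = 3):
        super().__init__()
        self.local = nn.Sequential(shared_mlp(in_dim, 64), shared_mlp(64, 64))
        self.global_mlp = nn.Sequential(shared_mlp(64, 128), shared_mlp(128, 1024))
        self.head = nn.Sequential(
            shared_mlp(64 + 1024, 512), shared_mlp(512, 256), shared_mlp(256, 128),
            nn.Conv1d(128, num_classes, kernel_size=1),  # per-point class scores
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, in_dim, N) -> logits: (B, num_classes, N)
        local = self.local(x)                                          # (B, 64, N)
        glob = self.global_mlp(local).max(dim=2, keepdim=True).values  # (B, 1024, 1)
        glob = glob.expand(-1, -1, x.shape[2])                         # copy to every point
        return self.head(torch.cat([local, glob], dim=1))


# Synthetic stand-in data: replace with batches from a real DataLoader.
torch.manual_seed(0)
num_classes, batch_size, num_points = 4, 4, 512
points = torch.randn(batch_size, 3, num_points)
labels = torch.randint(0, num_classes, (batch_size, num_points))

model = PointNetSeg(num_classes)
criterion = nn.CrossEntropyLoss()  # per-point cross-entropy on (B, C, N) logits
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

model.train()
for step in range(5):
    optimizer.zero_grad()
    logits = model(points)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.4f}")
```

`nn.CrossEntropyLoss` accepts logits of shape (B, C, N) with targets of shape (B, N), so no reshaping is needed for per-point classification. In a real project, the synthetic tensors would be replaced by batches from a `torch.utils.data.DataLoader`, with a validation pass after each epoch.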
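For evaluation, all of the listed metrics can be derived from a confusion matrix. The sketch below (the `segmentation_metrics` helper is hypothetical) computes overall accuracy and per-class precision, recall, and F1, and additionally per-class IoU and mean IoU (mIoU), which are commonly reported for semantic segmentation. The tiny arrays are made up for illustration.

```python
import numpy as np


def segmentation_metrics(pred: np.ndarray, target: np.ndarray, num_classes: int) -> dict:
    """Per-point metrics from a confusion matrix (rows = ground truth, columns = prediction)."""
    cm = np.bincount(num_classes * target + pred, minlength=num_classes**2)
    cm = cm.reshape(num_classes, num_classes)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class c, but labelled otherwise
    fn = cm.sum(axis=1) - tp  # labelled as class c, but predicted otherwise
    with np.errstate(divide="ignore", invalid="ignore"):  # absent classes give NaN
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        iou = tp / (tp + fp + fn)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "iou": iou,
        "mIoU": np.nanmean(iou),  # mean over classes, ignoring NaN entries
    }


# Made-up ground truth and predictions for 6 points and 3 classes.
target = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])
m = segmentation_metrics(pred, target, num_classes=3)
print(f"accuracy={m['accuracy']:.3f}  mIoU={m['mIoU']:.3f}")  # accuracy=0.667  mIoU=0.500
```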