<a href="https://colab.research.google.com/github/cagBRT/PointCloud/blob/main/fast_point_cloud_segmentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fast Point Cloud Segmentation with Model Predictions

Annotating point clouds is difficult and time-consuming, especially if you have to annotate every single point (for semantic/panoptic segmentation). To label a large dataset, you need either a large labeling team or a lot of patience.

Luckily, you can speed up the labeling process with model predictions. Instead of annotating every point cloud from scratch, you first train a model on a small number of labeled point clouds and then use its predictions as a starting point for the remaining ones. After correcting some of the model predictions, you can retrain your model to improve the predictions, and so on. This is called **model-assisted labeling**.

This notebook shows how you can set up model-assisted labeling on [Segments.ai](https://segments.ai?utm_source=guide&utm_medium=colab&utm_campaign=mal-pc-seg) using its simple Python SDK. As an example, we'll label a set of diverse frames of the [SemanticKITTI](http://www.semantic-kitti.org/) dataset by using model predictions from [SqueezeSegV3](https://arxiv.org/abs/2004.01803).

If you want to label objects using cuboids (3D bounding boxes) instead, take a look at our other [notebook](https://colab.research.google.com/drive/1OGHJeaVU3geXmQDuW4UdhLprbNyfbflg).

## 1. Set-up

We'll start by installing the Python SDK and cloning the demo repository from GitHub.

If you're using Colab, be sure to use a GPU-powered runtime, so you can run the segmentation model later. You can change your runtime type by clicking on `Runtime > Change runtime type` in the top bar.

In [1]:
! pip install segments-ai -q
! git clone https://github.com/segments-ai/demo-pointcloud-segmentation.git -q
%cd demo-pointcloud-segmentation/


Once the SDK is installed, we can initialize the client with an API key. You can find your API keys in your [account settings](https://segments.ai/account). If you don't have an account yet, you can create one [here](https://segments.ai/join).

In [None]:
from segments import SegmentsClient
from getpass import getpass

api_key = getpass('Enter your API key: ')
client = SegmentsClient(api_key)

Next, we'll clone an example dataset containing some SemanticKITTI frames.

If you want to create a dataset containing your own point clouds, check out the code snippets [in the appendix](#create-your-own-dataset).

In [None]:
clone = client.clone_dataset(
    "admin-tobias/fast-labeling-semantickitti",
    new_name="fast-labeling-semantickitti",
    new_public=False,
)
dataset_identifier = f"{clone.owner.username}/{clone.name}"

## 2. Manual labeling

Before we can train a segmentation model, we need some data. Thus, we'll start by labeling a subset of our data manually.

Open [Segments.ai](https://segments.ai/home) in your browser and navigate to your newly created dataset. Press "Start Labeling" to open the labeling interface and label your first point cloud. If you don't know how to use the labeling interface, have a look at [the docs](https://docs.segments.ai/guides/use-the-labeling-interfaces/3d-point-cloud-segmentation-interface). Keep in mind you can use hotkeys (and [customize them](https://docs.segments.ai/guides/customize-hotkeys)) to label faster.

## 3. Train a model

Now that we have some labeled data, we can train a model. To do this, we'll first create a release of our dataset, and turn it into a [`SegmentsDataset`](https://segments-python-sdk.readthedocs.io/en/latest/dataset.html) with the labeled samples. Then we'll train a segmentation model on these labeled samples.

In [None]:
release_name = "v0.1"
client.add_release(dataset_identifier, release_name)

*Creating a release can take a short while, so you might run into problems if you immediately execute the next cell. You can check the status on the releases tab of your dataset in the web interface.*

In [None]:
release = client.get_release(dataset_identifier, release_name)
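If you would rather wait for the release in code than check the web interface, a small polling helper is one option. The sketch below is not part of the SDK: `wait_for_release` is a hypothetical helper, and it assumes the object returned by `client.get_release()` has a `status` attribute that reads `PENDING` while the release is being created and `SUCCEEDED` or `FAILED` afterwards.

```python
import time


def wait_for_release(fetch_release, poll_seconds=5.0, timeout_seconds=300.0, sleep=time.sleep):
    """Call fetch_release() repeatedly until the release is no longer pending.

    fetch_release is a zero-argument callable that returns an object with a
    `status` attribute. Assumed status values: PENDING, SUCCEEDED, FAILED.
    """
    waited = 0.0
    while True:
        release = fetch_release()
        # str() of an enum member can look like "ReleaseStatus.SUCCEEDED",
        # so compare on the suffix to handle both plain strings and enums.
        status = str(getattr(release, "status", ""))
        if status.endswith("SUCCEEDED"):
            return release
        if status.endswith("FAILED"):
            raise RuntimeError("Release creation failed")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Release still pending after {timeout_seconds} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

In this notebook it could be used as `release = wait_for_release(lambda: client.get_release(dataset_identifier, release_name))`.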

In [None]:
from segments import SegmentsDataset

dataset = SegmentsDataset(
    release, labelset="ground-truth", filter_by=["LABELED", "REVIEWED"]
)

For demonstration purposes, we'll cheat and simply use a pretrained [SqueezeSegV3](https://github.com/chenfengxu714/SqueezeSegV3) model here. Run the next cell to install the requirements for this model.

In [None]:
! git clone https://github.com/chenfengxu714/SqueezeSegV3.git -q
! pip install -r SqueezeSegV3/requirements.txt -q

## 4. Generate and upload label predictions

Now that we have a trained model, we can run it on the unlabeled point clouds to generate label predictions. Then we can upload these predictions to Segments.ai to correct any mistakes our model still made.

We'll start by creating a new `SegmentsDataset` containing the unlabeled frames, and downloading the point clouds.

In [None]:
dataset = SegmentsDataset(release, labelset="ground-truth", filter_by="UNLABELED")

In [None]:
import urllib.request
import os

dataset_path = "./unlabeled_data"

download_path = os.path.join(dataset_path, "sequences", "00", "velodyne")
os.makedirs(download_path, exist_ok=True)

for sample in dataset:
    # Download each point cloud
    sample_url = sample["attributes"]["pcd"]["url"]
    urllib.request.urlretrieve(
        sample_url, os.path.join(download_path, f'{sample["name"]}.bin')
    )
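To sanity-check a downloaded file, it can help to decode a few points. The sketch below assumes the KITTI-style binary XYZI layout, where each point is stored as four little-endian 32-bit floats (x, y, z, intensity). `parse_xyzi` is a hypothetical helper, not part of the demo repository.

```python
import struct


def parse_xyzi(raw_bytes):
    """Decode a binary XYZI buffer into a list of (x, y, z, intensity) tuples."""
    point_size = 16  # four float32 values per point
    if len(raw_bytes) % point_size != 0:
        raise ValueError("Buffer size is not a multiple of 16 bytes")
    return list(struct.iter_unpack("<4f", raw_bytes))
```

For example, reading one of the downloaded files with `open(path, "rb").read()` and passing the bytes to `parse_xyzi` gives the number of points via `len(...)`.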

Now, we can run the model on the unlabeled frames.

In [None]:
from utils import run_model

output_path = "./output"
run_model(dataset_path, output_path)

Finally, we can upload the predictions to Segments.ai using [`client.add_label()`](https://segments-python-sdk.readthedocs.io/en/latest/client.html#create-a-label).

In [None]:
from utils import get_prediction

predictions_path = os.path.join(output_path, "sequences", "00", "predictions")

for sample in dataset:
    name = sample["name"]
    label_path = os.path.join(predictions_path, name + ".label")
    annotations, point_annotations = get_prediction(label_path)

    # Upload the predictions to Segments.ai
    attributes = {
        "format_version": "0.2",
        "annotations": annotations,
        "point_annotations": point_annotations,
    }
    client.add_label(
        sample["uuid"], "ground-truth", attributes, label_status="PRELABELED"
    )
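The `attributes` dictionary pairs a list of annotation objects with one object id per point. To make that structure concrete, here is a rough sketch of how per-point predictions could be converted into it. This is not the code behind `get_prediction` in `utils`; both functions are hypothetical. It assumes SemanticKITTI `.label` files (one little-endian `uint32` per point, with the semantic class in the lower 16 bits), that class `0` means unlabeled, and that class ids can be used directly as the dataset's category ids.

```python
import struct


def decode_semantickitti_labels(raw_bytes):
    """Return the semantic class id of every point in a .label buffer."""
    # Each point is one uint32: the lower 16 bits hold the semantic class,
    # the upper 16 bits hold the instance id (ignored here).
    return [value & 0xFFFF for (value,) in struct.iter_unpack("<I", raw_bytes)]


def to_segments_label(class_ids):
    """Build (annotations, point_annotations) from per-point class ids.

    Every distinct class becomes one annotation object. Object ids start at 1;
    0 is kept for unlabeled points.
    """
    annotations = []
    point_annotations = []
    object_id_for_class = {}
    for class_id in class_ids:
        if class_id == 0:
            point_annotations.append(0)
            continue
        if class_id not in object_id_for_class:
            object_id = len(object_id_for_class) + 1
            object_id_for_class[class_id] = object_id
            annotations.append({"id": object_id, "category_id": class_id})
        point_annotations.append(object_id_for_class[class_id])
    return annotations, point_annotations
```

With this shape, `point_annotations` has exactly one entry per point in the cloud, and every non-zero entry refers to an `id` in `annotations`.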

## 5. Correct and repeat

Now go back to [Segments.ai](https://segments.ai/home) and click the "Start labeling" button to start labeling again. This time, your job is quite a bit easier: instead of having to label each point cloud from scratch, you can simply correct the mistakes your model made.

After labeling some more point clouds, you can go back to step 3 and retrain your model on the larger labeled set, then generate new predictions as in step 4. This way, it will become increasingly easy to label point clouds. After some iterations, you might reach a point where you're mostly just verifying the model's predictions, only having to correct the occasional mistake on hard edge cases.
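One possible way to judge whether you have reached that point is to measure how many points had to be changed during correction. The helper below is a hypothetical sketch, not part of the SDK. It assumes you have the predicted and the corrected category id of every point in a frame as two equally long lists.

```python
def correction_rate(predicted, corrected):
    """Fraction of points whose category changed during manual correction."""
    if len(predicted) != len(corrected):
        raise ValueError("Both label lists must cover the same points")
    if not predicted:
        return 0.0
    changed = sum(1 for p, c in zip(predicted, corrected) if p != c)
    return changed / len(predicted)
```

A rate that stays low over several iterations suggests that annotators are mostly verifying predictions rather than relabeling.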

## Appendix

### Create your own dataset

Start by creating an empty dataset using [`client.add_dataset()`](https://segments-python-sdk.readthedocs.io/en/latest/client.html#create-a-dataset). You have to specify a name, the task type (point cloud segmentation), and the categories you want to annotate in the point clouds.

In [None]:
name = "my-point-clouds"
task_type = "pointcloud-segmentation"
task_attributes = {
    "format_version": "0.1",
    "categories": [
        {"name": "ground", "id": 1},
        {"name": "obstacle", "id": 2},
    ],
}

dataset = client.add_dataset(name, task_type=task_type, task_attributes=task_attributes)
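If you have many categories, writing the `categories` list by hand gets tedious. The sketch below builds the same structure from a list of names, assigning ids from 1 upwards. `make_task_attributes` is a hypothetical helper, and the default `format_version` simply mirrors the value used in the cell above.

```python
def make_task_attributes(category_names, format_version="0.1"):
    """Build a task_attributes dict with sequential category ids starting at 1."""
    names = list(category_names)
    if len(set(names)) != len(names):
        raise ValueError("Category names must be unique")
    return {
        "format_version": format_version,
        "categories": [
            {"name": name, "id": index} for index, name in enumerate(names, start=1)
        ],
    }
```

Calling `make_task_attributes(["ground", "obstacle"])` produces the same dictionary as the hand-written `task_attributes` above.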

Next, import your data. You can either upload point cloud files to Segments.ai's asset storage service, or pass URLs to files stored in other cloud buckets. You can find more information in [the docs](https://docs.segments.ai/guides/import-data).

Segments.ai currently supports [PCD](https://docs.segments.ai/reference/sample-types/supported-file-formats#pcd-point-cloud-data) and [binary XYZI(R)](https://docs.segments.ai/reference/sample-types/supported-file-formats#binary-xyzi-r-kitti-nuscenes) files.

In [None]:
# Upload a local file
filename = "ca9a282c9e77460f8360f564131a8af5_nuscenes.bin"

with open(f"path/to/{filename}", "rb") as f:
    asset = client.upload_asset(f, filename=filename)

point_cloud_url = asset.url

In [None]:
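# Replace with the identifier of your own dataset: "<username>/<dataset name>"
dataset = "tobias-admin/my-point-clouds"

# Sample name, here the filename without its extension
name = "ca9a282c9e77460f8360f564131a8af5_nuscenes"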

attributes = {
    "pcd": {"url": point_cloud_url, "type": "nuscenes"},
}

sample = client.add_sample(dataset, name, attributes)