# 3D Data Processing

---
A.A. 2021/22 (6 CFU) - Dr. Daniel Fusaro
---


## 3DP Lab4 - Point Cloud Segmentation

Original paper: [PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation](https://web.stanford.edu/~rqi/pointnet/)

Dataset link: https://drive.google.com/drive/folders/1_xPLa_rMIT3ggSSnp1W5mB74mojlWqvH?usp=sharing

To add a link to the dataset in your Google Drive main folder, you need to:


1.   Click on the link
2.   Right click on "dataset"
3.   Click Add shortcut to Drive

When you mount your Drive folder in Colab, you will find this folder there without having to re-upload it.





## Segment what?

In this laboratory you will segment a point cloud taken from the well-known [Semantic-Kitti dataset](http://semantic-kitti.org/) using PointNet.

The original dataset has about 30 labels (listed in the color maps later in this notebook), but you will remap them to only 3:


*   Traversable (road, parking, sidewalk, etc.)
*   Not-Traversable (cars, trucks, fences, trees, people, objects)
*   Unknown (outliers)

The remapping is done with a key-value dictionary that maps each original label to the corresponding label of the reduced set.
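
As a minimal sketch of this dictionary-based remapping, the snippet below uses a small, hypothetical subset of the labels. The names `example_remap` and `remap_labels`, and the fallback to "Unknown" for labels missing from the dictionary, are assumptions made for illustration and are not part of the lab code.

```python
# Hypothetical subset of the label mapping, for illustration only.
UNKNOWN, TRAVERSABLE, NOT_TRAVERSABLE = 0, 1, 2

example_remap = {
    0: UNKNOWN,           # "unlabeled"
    40: TRAVERSABLE,      # "road"
    48: TRAVERSABLE,      # "sidewalk"
    10: NOT_TRAVERSABLE,  # "car"
    70: NOT_TRAVERSABLE,  # "vegetation"
}

def remap_labels(original_labels, mapping, default=UNKNOWN):
    """Map each original label to its reduced label; unseen labels fall back to `default`."""
    return [mapping.get(label, default) for label in original_labels]

reduced = remap_labels([40, 10, 48, 0, 70, 999], example_remap)
print(reduced)  # [1, 2, 1, 0, 2, 0]
```

The lab code indexes the dictionary directly, so a label missing from it would raise a `KeyError` there; the `default` used here is only a convenience for the sketch.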




# Install/import required packages

In [2]:
# useful for visualization
## note: it's not necessary to restart the notebook environment after installation
!pip install open3d

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [4]:
# Google Colab pyTorch needs this Pillow version
## note: it's not necessary to restart the notebook environment after installation
!pip install Pillow==9.0.0

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [5]:
import numpy as np
import random
import math
import time
import struct
import os

# pyTorch imports
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

# a nice training progress bar
from tqdm import tqdm

# visualization
import open3d as o3d
import plotly.graph_objects as go


# Connect and mount your Google Drive

In [6]:
from google.colab import drive
drive_path = '/content/drive'
drive.mount(drive_path)

Mounted at /content/drive


# Select the first GPU (if available)

In [7]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

cuda:0


# General Parameters

In [8]:
numpoints = 20000 # [number of points]
max_dist = 15     # [meters]
min_dist = 4      # [meters]

# transform distances to squares (code optimization)
max_dist *= max_dist
min_dist *= min_dist

size_float = 4
size_small_int = 2

dataset_path = os.path.join(drive_path, "MyDrive", "dataset")
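
The thresholds are squared so that a point can be filtered by comparing `x*x + y*y + z*z` against them, without taking a square root for every point. The sketch below checks that the two comparisons agree on a few made-up points; the helper names `in_range_sqrt` and `in_range_squared` are hypothetical.

```python
import math

MIN_DIST, MAX_DIST = 4.0, 15.0            # thresholds in meters, as in the parameters above
MIN_DIST_SQ, MAX_DIST_SQ = MIN_DIST ** 2, MAX_DIST ** 2

def in_range_sqrt(x, y, z):
    """Reference check: compare the actual Euclidean distance."""
    return MIN_DIST < math.sqrt(x * x + y * y + z * z) < MAX_DIST

def in_range_squared(x, y, z):
    """Equivalent check on squared distances (no square root needed)."""
    return MIN_DIST_SQ < x * x + y * y + z * z < MAX_DIST_SQ

points = [(1.0, 1.0, 1.0), (5.0, 0.0, 0.0), (10.0, 10.0, 0.0), (20.0, 0.0, 0.0)]
for p in points:
    assert in_range_sqrt(*p) == in_range_squared(*p)

kept = [p for p in points if in_range_squared(*p)]
print(kept)  # [(5.0, 0.0, 0.0), (10.0, 10.0, 0.0)]
```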

# Read Data utilities

In [9]:
def sample(pointcloud, labels, numpoints_to_sample):
  """
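    Randomly draw numpoints_to_sample points (with replacement) together with their labels.

    INPUT
        pointcloud          : numpy array of 3D points, shape (N, 3)
        labels              : numpy array of integer labels, shape (N,)
        numpoints_to_sample : number of points to sample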
  """
  tensor = np.concatenate((pointcloud, np.reshape(labels, (labels.shape[0], 1))), axis= 1)
  tensor = np.asarray(random.choices(tensor, weights=None, cum_weights=None, k=numpoints_to_sample))
  pointcloud_ = tensor[:, 0:3]
  labels_ = tensor[:, 3]
  labels_ = np.array(labels_, dtype=np.int_)
  return pointcloud_, labels_

In [10]:
def readpc(pcpath, labelpath, reduced_labels=True):
  """
    INPUT
        pcpath         : path to the point cloud ".bin" file
        labelpath      : path to the labels ".label" file
        reduced_labels : flag to select which label encoding to return
                        [True]  -> values in range [0, 1, 2]   -- default
                        [False] -> all Semantic-Kitti dataset original labels
  """

  pointcloud, labels = [], []

  with open(pcpath, "rb") as pc_file, open(labelpath, "rb") as label_file:
    byte = pc_file.read(size_float*4)
    label_byte = label_file.read(size_small_int)
    _ = label_file.read(size_small_int)

    while byte:
      x,y,z, _ = struct.unpack("ffff", byte)      # unpack 4 float values (the 4th one is ignored)
      label = struct.unpack("H", label_byte)[0]   # unpack 1 unsigned short value
      
      d = x*x + y*y + z*z       # squared Euclidean distance (min_dist and max_dist are squared too)

      if min_dist<d<max_dist:
          pointcloud.append([x, y, z])
          if reduced_labels:            # for reduced labels range
            labels.append(label_remap[label])
          else:                         # for full labels range
            labels.append(label)
      
      byte = pc_file.read(size_float*4)
      label_byte = label_file.read(size_small_int)
      _ = label_file.read(size_small_int)
  

  pointcloud  = np.array(pointcloud)
  labels      = np.array(labels)

  # return fixed_sized lists of points/labels (fixed size: numpoints)
  return sample(pointcloud, labels, numpoints)
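
The reader above relies on the binary layout of the Semantic-Kitti files: each point is stored as four 32-bit floats, and each label as a 32-bit record whose lower 16 bits hold the semantic label (the upper 16 bits, which `readpc` skips, hold the instance id). The sketch below reproduces that record-by-record reading on in-memory buffers. The sample values and the `read_records` helper are made up for illustration, and little-endian byte order is assumed.

```python
import io
import struct

POINT_FMT = "<ffff"   # x, y, z and a 4th float that is ignored here
LABEL_FMT = "<HH"     # semantic label (lower 16 bits), then the 16 bits that readpc skips

# Build two fake records in memory (illustrative values, not real dataset content).
pc_bytes = struct.pack(POINT_FMT, 5.0, 0.0, -1.5, 0.25) + struct.pack(POINT_FMT, 1.0, 2.0, 3.0, 0.5)
lb_bytes = struct.pack(LABEL_FMT, 40, 0) + struct.pack(LABEL_FMT, 10, 7)

def read_records(pc_file, label_file):
    """Yield ((x, y, z), semantic_label) pairs until either file is exhausted."""
    point_size = struct.calcsize(POINT_FMT)   # 16 bytes
    label_size = struct.calcsize(LABEL_FMT)   # 4 bytes
    while True:
        raw_point = pc_file.read(point_size)
        raw_label = label_file.read(label_size)
        if len(raw_point) < point_size or len(raw_label) < label_size:
            break
        x, y, z, _ = struct.unpack(POINT_FMT, raw_point)
        semantic_label, _ = struct.unpack(LABEL_FMT, raw_label)
        yield (x, y, z), semantic_label

records = list(read_records(io.BytesIO(pc_bytes), io.BytesIO(lb_bytes)))
print(records)  # [((5.0, 0.0, -1.5), 40), ((1.0, 2.0, 3.0), 10)]
```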


# Data visualization

## Define Color Maps for visualization

Let's define some color mapping to associate an integer value to an RGB color scheme



*   **semantic_kitti_color_scheme**: original Semantic-Kitti color scheme (see [here](https://github.com/PRBonn/semantic-kitti-api/blob/master/config/semantic-kitti-all.yaml))
*   **label_remap**: remapping of Semantic-Kitti labels to "*Unknown*", "*Traversable*", "*Not-Traversable*"
*   **remap_color_scheme**: color scheme for rendering these 3 labels



In [11]:
semantic_kitti_color_scheme = {
  0 : [0, 0, 0],        # "unlabeled"
  1 : [0, 0, 255],      # "outlier"
  10: [245, 150, 100],  # "car"
  11: [245, 230, 100],  # "bicycle"
  13: [250, 80, 100],   # "bus"
  15: [150, 60, 30],    # "motorcycle"
  16: [255, 0, 0],      # "on-rails"
  18: [180, 30, 80],    # "truck"
  20: [255, 0, 0],      # "other-vehicle"
  30: [30, 30, 255],    # "person"
  31: [200, 40, 255],   # "bicyclist"
  32: [90, 30, 150],    # "motorcyclist"
  40: [255, 0, 255],    # "road"
  44: [255, 150, 255],  # "parking"
  48: [75, 0, 75],      # "sidewalk"
  49: [75, 0, 175],     # "other-ground"
  50: [0, 200, 255],    # "building"
  51: [50, 120, 255],   # "fence"
  52: [0, 150, 255],    # "other-structure"
  60: [170, 255, 150],  # "lane-marking"
  70: [0, 175, 0],      # "vegetation"
  71: [0, 60, 135],     # "trunk"
  72: [80, 240, 150],   # "terrain"
  80: [150, 240, 255],  # "pole"
  81: [0, 0, 255],      # "traffic-sign"
  99: [255, 255, 50],   # "other-object"
  252: [245, 150, 100], # "moving-car"
  253: [200, 40, 255],  # "moving-bicyclist"
  254: [30, 30, 255],   # "moving-person"
  255: [90, 30, 150],   # "moving-motorcyclist"
  256: [255, 0, 0],     # "moving-on-rails"
  257: [250, 80, 100],  # "moving-bus"
  258: [180, 30, 80],   # "moving-truck"
  259: [255, 0, 0],     # "moving-other-vehicle"
}

In [12]:
label_remap = {
  0 :  0, # "unlabeled"
  1 :  0, # "outlier"
  10:  2, # "car"
  11:  2, # "bicycle"
  13:  2, # "bus"
  15:  2, # "motorcycle"
  16:  2, # "on-rails"
  18:  2, # "truck"
  20:  2, # "other-vehicle"
  30:  2, # "person"
  31:  2, # "bicyclist"
  32:  2, # "motorcyclist"
  40:  1, # "road"
  44:  1, # "parking"
  48:  1, # "sidewalk"
  49:  1, # "other-ground"
  50:  2, # "building"
  51:  2, # "fence"
  52:  2, # "other-structure"
  60:  1, # "lane-marking"
  70:  2, # "vegetation"
  71:  2, # "trunk"
  72:  2, # "terrain"
  80:  2, # "pole"
  81:  2, # "traffic-sign"
  99:  2, # "other-object"
  252: 2, # "moving-car"
  253: 2, # "moving-bicyclist"
  254: 2, # "moving-person"
  255: 2, # "moving-motorcyclist"
  256: 2, # "moving-on-rails"
  257: 2, # "moving-bus"
  258: 2, # "moving-truck"
  259: 2, # "moving-other-vehicle"
}

In [13]:
remap_color_scheme = [
  [0, 0, 0],
  [0, 255, 0],
  [0, 0, 255]
]

In [14]:
def remap_to_bgr(integer_labels, color_scheme):
  bgr_labels = []
  for n in integer_labels:
    bgr_labels.append(color_scheme[int(n)][::-1])
  np_bgr_labels = np.array(bgr_labels)
  return np_bgr_labels

## Visualization utilities

To visualize colored point clouds we use the Python package *Open3D*.

Unfortunately, the original Open3D drawing function doesn't run on Colab.
So we replace it with a custom one (*draw_geometries*), built on Plotly, that renders inside the notebook.



In [15]:
def draw_geometries(geometries):
    graph_objects = []

    for geometry in geometries:
        geometry_type = geometry.get_geometry_type()
        
        if geometry_type == o3d.geometry.Geometry.Type.PointCloud:
            points = np.asarray(geometry.points)
            colors = None
            if geometry.has_colors():
                colors = np.asarray(geometry.colors)
            elif geometry.has_normals():
                colors = (0.5, 0.5, 0.5) + np.asarray(geometry.normals) * 0.5
            else:
                geometry.paint_uniform_color((1.0, 0.0, 0.0))
                colors = np.asarray(geometry.colors)

            scatter_3d = go.Scatter3d(x=points[:,0], y=points[:,1], z=points[:,2], mode='markers', marker=dict(size=1, color=colors))
            graph_objects.append(scatter_3d)

        if geometry_type == o3d.geometry.Geometry.Type.TriangleMesh:
            triangles = np.asarray(geometry.triangles)
            vertices = np.asarray(geometry.vertices)
            colors = None
            if geometry.has_triangle_normals():
                colors = (0.5, 0.5, 0.5) + np.asarray(geometry.triangle_normals) * 0.5
                colors = tuple(map(tuple, colors))
            else:
                colors = (1.0, 0.0, 0.0)
            
            mesh_3d = go.Mesh3d(x=vertices[:,0], y=vertices[:,1], z=vertices[:,2], i=triangles[:,0], j=triangles[:,1], k=triangles[:,2], facecolor=colors, opacity=0.50)
            graph_objects.append(mesh_3d)
        
    fig = go.Figure(
        data=graph_objects,
        layout=dict(
            scene=dict(
                xaxis=dict(visible=False),
                yaxis=dict(visible=False),
                zaxis=dict(visible=False),
                aspectmode='data'
            )
        )
    )
    fig.show()

In [16]:
def visualize3DPointCloud(np_pointcloud, np_labels):
  """
  INPUT
      np_pointcloud : numpy array of 3D points, shape (N, 3)
      np_labels     : numpy array of per-point colors, shape (N, 3), values in [0, 255]
  """
  assert(len(np_pointcloud) == len(np_labels))

  
  pcd = o3d.geometry.PointCloud()
  v3d = o3d.utility.Vector3dVector

  # set geometry point cloud points
  pcd.points = v3d(np_pointcloud)
  
  # scale color values in range [0:1]
  pcd.colors = o3d.utility.Vector3dVector(np_labels / 255.0)

  # replace rendering function
  o3d.visualization.draw_geometries = draw_geometries

  # visualize the colored point cloud
  o3d.visualization.draw_geometries([pcd])

## Visualization Example

Let's try to visualize an example point cloud

In [17]:
# define point cloud example index and absolute paths
pointcloud_index = 700
pcpath    = os.path.join(dataset_path, "sequences", "00", "velodyne", str(pointcloud_index).zfill(6) + ".bin"  )
labelpath = os.path.join(dataset_path, "sequences", "00", "labels",   str(pointcloud_index).zfill(6) + ".label")

# load pointcloud and labels with original Semantic-Kitti labels
pointcloud, labels = readpc(pcpath, labelpath, False)
labels = remap_to_bgr(labels, semantic_kitti_color_scheme)
print("Semantic-Kitti original color scheme")
visualize3DPointCloud(pointcloud, labels)

# load pointcloud and labels with remapped labels
pointcloud, labels = readpc(pcpath, labelpath)
labels = remap_to_bgr(labels, remap_color_scheme)
print("Remapped color scheme")
visualize3DPointCloud(pointcloud, labels)

Semantic-Kitti original color scheme


Remapped color scheme


## Data Transformation

In [18]:
class Normalize(object):
    def __call__(self, pointcloud):
        assert len(pointcloud.shape)==2
        
        norm_pointcloud = pointcloud - np.mean(pointcloud, axis=0) 
        norm_pointcloud /= np.max(np.linalg.norm(norm_pointcloud, axis=1))

        return  norm_pointcloud

In [19]:
class ToTensor(object):
    def __call__(self, pointcloud):
        assert len(pointcloud.shape)==2

        return torch.from_numpy(pointcloud)

In [20]:
def default_transforms():
    return transforms.Compose([
                                Normalize(),
                                ToTensor()
                              ])
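
`Normalize` centers the cloud on its centroid and scales it so that the farthest point lies on the unit sphere. The following quick numerical check is a sketch on a made-up four-point cloud; the standalone `normalize` function mirrors the class above and is only for illustration.

```python
import numpy as np

def normalize(pointcloud):
    """Center on the centroid, then divide by the largest distance from it."""
    centered = pointcloud - np.mean(pointcloud, axis=0)
    return centered / np.max(np.linalg.norm(centered, axis=1))

cloud = np.array([[0.0, 0.0, 0.0],
                  [2.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [2.0, 2.0, 0.0]])
normalized = normalize(cloud)

print(normalized.mean(axis=0))                    # centroid is (numerically) at the origin
print(np.linalg.norm(normalized, axis=1).max())   # farthest point has norm 1
```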

# PointCloud Dataset

`torch.utils.data.Dataset` is an abstract class representing a dataset. Your custom dataset should inherit `Dataset` and override the following methods:

* `__init__` to initialize your dataset. For example, if your dataset fits in memory, you can load the entire dataset in a list, or you can just store the list of dataset files.
* `__len__` so that len(dataset) returns the size of the dataset.
* `__getitem__` to support indexing such that `dataset[i]` can be used to get  the i-th sample

Therefore, the structure of the class is:

```
class CustomDataset(Dataset):

    def __init__(self, init_parameters, transform=None):
        self.transform = transform
        [...]

    def __len__(self):
        [...]

    def __getitem__(self, idx):
        [...]
        if self.transform:
            sample = self.transform(sample)

        return sample
```

Typically, a `transform` function is provided during initialization. This function is applied to each sample at runtime, so it is executed every time you load a sample from the dataset. This is really helpful, for example, to add random data transformation during training, such as random image rotation, random noise...
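
To make the role of `transform` concrete, here is a dependency-free sketch that follows the same three-method protocol. It deliberately does not inherit from `torch.utils.data.Dataset`, so that it runs without PyTorch; `ToyPointDataset` and `add_offset` are hypothetical names used only for this example.

```python
class ToyPointDataset:
    """Map-style dataset sketch: stores samples in memory, applies `transform` on access."""

    def __init__(self, samples, transform=None):
        self.samples = list(samples)
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample = self.samples[idx]
        if self.transform:
            sample = self.transform(sample)   # runs every time the sample is loaded
        return sample


def add_offset(point):
    """Hypothetical transform: shift a 3D point by one unit along x."""
    x, y, z = point
    return (x + 1.0, y, z)


raw = ToyPointDataset([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)])
shifted = ToyPointDataset([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)], transform=add_offset)

print(len(raw), raw[1], shifted[1])  # 2 (1.0, 2.0, 3.0) (2.0, 2.0, 3.0)
```

Because the transform is applied inside `__getitem__`, the stored samples are never modified, and a random transform would produce a different result at every access.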

In [21]:
class PointCloudData(Dataset):
    def __init__(self, dataset_path, transform=default_transforms(), start=0, end=1000):
        """
          INPUT
              dataset_path: path to the dataset folder
              transform   : transform function to apply to point cloud
              start       : index of the first file that belongs to dataset
              end         : index of the first file that does not belong to the dataset
        """
        self.dataset_path = dataset_path
        self.transforms = transform

        self.pc_path = os.path.join(self.dataset_path, "sequences", "00", "velodyne")
        self.lb_path = os.path.join(self.dataset_path, "sequences", "00", "labels")

        self.pc_paths = os.listdir(self.pc_path)
        self.lb_paths = os.listdir(self.lb_path)
        assert(len(self.pc_paths) == len(self.lb_paths))

        self.start = start
        self.end   = end

        # clip paths according to the start and end ranges provided in input
        self.pc_paths = self.pc_paths[start: end]
        self.lb_paths = self.lb_paths[start: end]

    def __len__(self):
        return len(self.pc_paths)

    def __getitem__(self, idx):
      item_name = str(idx + self.start).zfill(6)
      pcpath = os.path.join(self.pc_path, item_name + ".bin")
      lbpath = os.path.join(self.lb_path, item_name + ".label")
      
      # load points and labels
      pointcloud, labels = readpc(pcpath, lbpath)

      # convert to torch tensors (note: self.transforms is stored but not applied here)
      torch_pointcloud  = torch.from_numpy(pointcloud)
      torch_labels      = torch.from_numpy(labels)

      return torch_pointcloud, torch_labels

# Dataset Creation

Now we can instantiate our training, validation, and test dataset objects.

In [22]:
train_ds  = PointCloudData(dataset_path, start=0,   end=100)
val_ds    = PointCloudData(dataset_path, start=100, end=120)
test_ds   = PointCloudData(dataset_path, start=120, end=150)

Creating a `Dataset` class may seem unnecessary for the most basic problems. But it really helps when the dataset and the training procedure start to get more complex.

One of the most useful benefits of defining a `Dataset` class is the possibility of using the PyTorch `DataLoader` class.

If we iterate over the dataset directly with a simple for loop, we miss out on several features. In particular:

* Batching the data
* Shuffling the data
* Loading the data in parallel using multiprocessing workers

`torch.utils.data.DataLoader` is an iterable that provides all these features. The parameters used below should be self-explanatory.
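
Batching and shuffling can be sketched in a few lines of plain Python. This is a conceptual stand-in only, not how `DataLoader` is implemented: it does not collate samples into tensors and has no worker processes. The `iterate_batches` helper and its `seed` argument are hypothetical.

```python
import random

def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield lists of samples; a conceptual stand-in for batching and shuffling only."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

toy_dataset = list(range(12))          # any object with __len__ and __getitem__ works

ordered = list(iterate_batches(toy_dataset, batch_size=5))
shuffled = list(iterate_batches(toy_dataset, batch_size=5, shuffle=True, seed=0))

print(ordered)                         # last batch is smaller: 12 is not a multiple of 5
print([len(batch) for batch in shuffled])
```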

In [23]:
# warning: the training batch_size needs to be at least 2 (BatchNorm needs more than one sample per batch in training mode)
train_loader  = DataLoader( dataset=train_ds,  batch_size=5, shuffle=True  )
val_loader    = DataLoader( dataset=val_ds,    batch_size=5, shuffle=False )
test_loader   = DataLoader( dataset=test_ds,   batch_size=1, shuffle=False )

# Network Definition

## Network Base Module

A network is defined by extending the `torch.nn.Module` class. The basic structure is:

```
class Net(nn.Module):
    
    def __init__(self, input_parameters):
        super().__init__() # This executes the parent __init__ method
        [...]

    def forward(self, x, optional_parameters):
        [...]
        return out # return the output of the network
```

You need to define two methods:
*   **\_\_init\_\_**: The constructor method. It is executed when the object is instantiated (no need to call it explicitly). Here you have to instantiate all the network's layers and parameters. PyTorch provides ready-made modules for most of the commonly used deep learning layers.
*   **forward**: Here you define the forward pass of the network, from the input *x* to the output (the method must return the network output). You only need to define the forward pass; back-propagation is handled automatically by the framework!

In [24]:
# Multi Layer Perceptron layer shared across points (implemented as a 1x1 Conv1d + BatchNorm + ReLU)
class MLP(nn.Module):
   def __init__(self, input_size, output_size):
     super().__init__()
     self.input_size   = input_size
     self.output_size  = output_size
     self.conv  = nn.Conv1d(self.input_size, self.output_size, 1)
     self.bn    = nn.BatchNorm1d(self.output_size)

   def forward(self, input):
     return F.relu(self.bn(self.conv(input)))

# Fully Connected with Batch Normalization
class FC_BN(nn.Module):
   def __init__(self, input_size, output_size):
     super().__init__()
     self.input_size   = input_size
     self.output_size  = output_size
     self.lin  = nn.Linear(self.input_size, self.output_size)
     self.bn    = nn.BatchNorm1d(self.output_size)

   def forward(self, input):
     return F.relu(self.bn(self.lin(input)))

class TNet(nn.Module):
   def __init__(self, k=3):
      super().__init__()
      self.k=k

      self.mlp1 = MLP(self.k, 64)
      self.mlp2 = MLP(64, 128)
      self.mlp3 = MLP(128, 1024)

      self.fc_bn1 = FC_BN(1024, 512)
      self.fc_bn2 = FC_BN(512,256)

      self.fc3 = nn.Linear(256,k*k)
    

   def forward(self, input):
      # input.shape == (batch_size, k, n): channels first, as required by Conv1d
      
      bs = input.size(0)
      xb = self.mlp1(input)
      xb = self.mlp2(xb)
      xb = self.mlp3(xb)

      pool = nn.MaxPool1d(xb.size(-1))(xb)
      flat = nn.Flatten(1)(pool)

      xb = self.fc_bn1(flat)
      xb = self.fc_bn2(xb)
      
      # initialize as identity
      init = torch.eye(self.k, requires_grad=True).repeat(bs,1,1)
      if xb.is_cuda:
        init=init.cuda()
      matrix = self.fc3(xb).view(-1,self.k,self.k) + init
      return matrix

### PointNet Module

Here you need to complete the network module by implementing the **\_\_init\_\_** and **forward** methods.

Refer to the previous cells for a description of these methods and an example implementation.

> **!! Please note:**

*   the input and output sizes of the layers must match those given in the PointNet paper
*   use the MLP and TNet modules to complete the code
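
The k x k matrix predicted by a `TNet` is applied to every point of the cloud with a batched matrix product. The shape bookkeeping can be sketched with NumPy, used here instead of PyTorch only to keep the example light; `apply_transform` is a hypothetical helper, and the rotation matrix is a made-up test input rather than a network output.

```python
import numpy as np

def apply_transform(features, matrices):
    """features: (B, k, N) channels-first; matrices: (B, k, k). Returns (B, k, N)."""
    # (B, N, k) @ (B, k, k) -> (B, N, k), then back to channels-first
    return np.matmul(features.transpose(0, 2, 1), matrices).transpose(0, 2, 1)

rng = np.random.default_rng(0)
B, k, N = 2, 3, 5
features = rng.normal(size=(B, k, N))

# The identity matrix leaves the points unchanged.
identity = np.tile(np.eye(k), (B, 1, 1))
same = apply_transform(features, identity)

# This matrix maps the row vector (x, y, z) to (-y, x, z): a 90-degree rotation about z.
rot_z = np.array([[0.0, 1.0, 0.0],
                  [-1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
rotated = apply_transform(features, np.tile(rot_z, (B, 1, 1)))

print(same.shape, rotated.shape)  # (2, 3, 5) (2, 3, 5)
```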

In [25]:
class PointNet(nn.Module):
   def __init__(self):
        super().__init__()
        self.input_transform = TNet(k=3)
        self.feature_transform = TNet(k=64)

        self.mlp1 = MLP(3, 64)
        self.mlp2 = MLP(64, 64)
        self.mlp3 = MLP(64,64)
        self.mlp4 = MLP(64, 128)
        self.mlp5 = MLP(128, 1024)

   def forward(self, input):
        n_pts = input.size()[2]
        matrix3x3 = self.input_transform(input)
        input_transform_output = torch.bmm(torch.transpose(input, 1, 2), matrix3x3).transpose(1, 2)

        mlp1_output = self.mlp1(input_transform_output)
        mlp2_output=self.mlp2(mlp1_output)

        matrix64x64 = self.feature_transform(mlp2_output)
        feature_transform_output= torch.bmm(torch.transpose(mlp2_output, 1, 2), matrix64x64).transpose(1, 2)

        mlp3_output = self.mlp3(feature_transform_output)
        mlp4_output = self.mlp4(mlp3_output)
        mlp5_output = self.mlp5(mlp4_output)
 

        global_feature = nn.MaxPool1d(mlp5_output.size(-1))(mlp5_output)
        
        global_feature_repeated = nn.Flatten(1)(global_feature).repeat(n_pts,1,1).transpose(0,2).transpose(0,1)

        return [feature_transform_output, global_feature_repeated], matrix3x3, matrix64x64

### PointNetSeg Module

Here too you need to complete the network module by implementing the **\_\_init\_\_** and **forward** methods.

Refer to the previous cells for a description of these methods and an example implementation.

> **!! Please note**

*   the input and output sizes of the layers must match those given in the PointNet paper
*   use the MLP module to complete the code
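
The segmentation head takes 1088 input channels because each point's 64-dimensional local feature is concatenated with the 1024-dimensional global feature of its cloud. The sketch below walks through the shapes with NumPy on toy arrays. All variable names and sizes other than 64, 1024, and 1088 are assumptions for illustration.

```python
import numpy as np

B, N = 2, 6                                   # toy batch size and number of points
point_features = np.ones((B, 64, N))          # stand-in for the per-point features
high_features = np.arange(B * 1024 * N, dtype=float).reshape(B, 1024, N)

# Max pooling over the points axis gives one 1024-d global descriptor per cloud.
global_feature = high_features.max(axis=2)    # (B, 1024)

# Repeat the descriptor for every point, then stack it under the local features.
global_repeated = np.repeat(global_feature[:, :, None], N, axis=2)   # (B, 1024, N)
stacked = np.concatenate([point_features, global_repeated], axis=1)  # (B, 1088, N)

print(stacked.shape)  # (2, 1088, 6)
```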

In [26]:
class PointNetSeg(nn.Module):
    def __init__(self, classes = 3):
        super().__init__()
        self.pointnet = PointNet()

      
        self.mlp1 = MLP(1088, 512)
        self.mlp2 = MLP(512,256)
        self.mlp3 = MLP(256,128)
        self.mlp4 = MLP(128,128)
        self.mlp5 = MLP(128, classes)

        
       
        self.logsoftmax = nn.LogSoftmax(dim=1)


    def forward(self, input):
        inputs, matrix3x3, matrix64x64 = self.pointnet(input)
        stack = torch.cat(inputs, 1)
        
        

        mlp1_output = self.mlp1(stack)
        mlp2_output = self.mlp2(mlp1_output)
        mlp3_output = self.mlp3(mlp2_output) 
        mlp4_output = self.mlp4(mlp3_output)
        mlp5_output = self.mlp5(mlp4_output)
        
        
        return self.logsoftmax(mlp5_output), matrix3x3, matrix64x64
    
    

# Loss Function

This is the loss used to update the model weights during the training loop.

For details, please refer to the PointNet paper.
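
Read directly from the `pointNetLoss` implementation in this notebook (and not necessarily identical to the formulation in the paper), the quantity being minimized can be written as follows. Here $B$ is the batch size, $A^{(3)}_b$ and $A^{(64)}_b$ are the matrices predicted by the two `TNet` modules for sample $b$, and each square root reflects that `torch.norm` with default arguments is taken over all entries of the whole batch:

$$
\mathcal{L} = \mathrm{NLL}(\text{outputs}, \text{labels}) + \frac{\alpha}{B}\left( \sqrt{\sum_{b=1}^{B} \left\lVert I_3 - A^{(3)}_b {A^{(3)}_b}^{\top} \right\rVert_F^2} + \sqrt{\sum_{b=1}^{B} \left\lVert I_{64} - A^{(64)}_b {A^{(64)}_b}^{\top} \right\rVert_F^2} \right)
$$

The second term pushes both matrices towards orthogonality; with the default $\alpha = 0.0001$ it acts as a light regularizer next to the classification term.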

In [27]:
def pointNetLoss(outputs, labels, m3x3, m64x64, alpha = 0.0001):
    criterion = torch.nn.NLLLoss()
    bs=outputs.size(0)
    id3x3 = torch.eye(3, requires_grad=True).repeat(bs,1,1)
    id64x64 = torch.eye(64, requires_grad=True).repeat(bs,1,1)
    if outputs.is_cuda:
        id3x3=id3x3.cuda()
        id64x64=id64x64.cuda()
    diff3x3 = id3x3-torch.bmm(m3x3,m3x3.transpose(1,2))
    diff64x64 = id64x64-torch.bmm(m64x64,m64x64.transpose(1,2))
    return criterion(outputs, labels) + alpha * (torch.norm(diff3x3)+torch.norm(diff64x64)) / float(bs)

# Training loop

In [28]:
pointnet = PointNetSeg()
pointnet.to(device);

In [29]:
optimizer = torch.optim.Adam(pointnet.parameters(), lr=0.005)

In [30]:
def train(model, train_loader, val_loader=None,  epochs=15, save=True):
    best_val_acc = -1.0
    for epoch in range(epochs): 
        model.train()
        running_loss = 0.0

        for i, data in enumerate(train_loader, 0):
            inputs, labels = data
            inputs = inputs.to(device).float()
            labels = labels.to(device)
            optimizer.zero_grad()
            outputs, m3x3, m64x64 = model(inputs.transpose(1,2))
            loss = pointNetLoss(outputs, labels, m3x3, m64x64)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            if i % 10 == 9 or True:    # "or True" forces a print at every mini-batch
                    # note: the value printed is the mini-batch loss divided by 10
                    print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 10))
                    running_loss = 0.0

        model.eval()
        correct = total = 0

        # validation
        with torch.no_grad():
            for data in val_loader:
                inputs, labels = data
                inputs = inputs.to(device).float()
                labels = labels.to(device)
                outputs, __, __ = model(inputs.transpose(1,2))
                _, predicted = torch.max(outputs.data, 1)
                
                total   += labels.size(0) * labels.size(1)
                correct += (predicted == labels).sum().item()

        print("correct", correct, "/", total)
        val_acc = 100.0 * correct / total
        print('Valid accuracy: %d %%' % val_acc)

        # save the model
        if save and val_acc > best_val_acc:
            best_val_acc = val_acc
            path = os.path.join(drive_path, "MyDrive", "pointnetmodel.yml")
            print("best_val_acc:", val_acc, "saving model at", path)
            torch.save(model.state_dict(), path)
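
The validation accuracy computed in `train` is a per-point accuracy: the predicted class of each point is the one with the highest log-probability along the class axis, and the total is batch size times number of points. The sketch below reproduces the same bookkeeping with NumPy on made-up values.

```python
import numpy as np

# Toy log-probabilities with shape (batch, classes, points) = (1, 3, 4).
log_probs = np.log(np.array([[[0.7, 0.1, 0.2, 0.3],
                              [0.2, 0.8, 0.3, 0.3],
                              [0.1, 0.1, 0.5, 0.4]]]))
labels = np.array([[0, 1, 2, 1]])            # ground truth, shape (batch, points)

predicted = log_probs.argmax(axis=1)         # most likely class for every point
correct = int((predicted == labels).sum())
total = labels.shape[0] * labels.shape[1]
accuracy = 100.0 * correct / total

print(predicted, correct, total, accuracy)   # [[0 1 2 2]] 3 4 75.0
```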

In [31]:
train(pointnet, train_loader, val_loader, save=True)

[1,     1] loss: 0.120
[1,     2] loss: 0.116
[1,     3] loss: 0.108
[1,     4] loss: 0.111
[1,     5] loss: 0.104
[1,     6] loss: 0.102
[1,     7] loss: 0.103
[1,     8] loss: 0.102
[1,     9] loss: 0.099
[1,    10] loss: 0.097
[1,    11] loss: 0.096
[1,    12] loss: 0.099
[1,    13] loss: 0.096
[1,    14] loss: 0.093
[1,    15] loss: 0.093
[1,    16] loss: 0.089
[1,    17] loss: 0.090
[1,    18] loss: 0.091
[1,    19] loss: 0.093
[1,    20] loss: 0.085
correct 5888 / 400000
Valid accuracy: 1 %
best_val_acc: 1.472 saving model at /content/drive/MyDrive/pointnetmodel.yml
[2,     1] loss: 0.086
[2,     2] loss: 0.091
[2,     3] loss: 0.084
[2,     4] loss: 0.087
[2,     5] loss: 0.081
[2,     6] loss: 0.083
[2,     7] loss: 0.085
[2,     8] loss: 0.086
[2,     9] loss: 0.082
[2,    10] loss: 0.084
[2,    11] loss: 0.083
[2,    12] loss: 0.085
[2,    13] loss: 0.083
[2,    14] loss: 0.077
[2,    15] loss: 0.080
[2,    16] loss: 0.079
[2,    17] loss: 0.081
[2,    18] loss: 0.078
[2,    

# Test
Let's compute the model's test metrics.

> First we need to load the best model weights





In [32]:
# create a new instantiation of PointNetSeg model
pointnet = PointNetSeg()

# load pyTorch model weights
model_path = os.path.join(drive_path, "MyDrive", "pointnetmodel.yml")
pointnet.load_state_dict(torch.load(model_path, map_location=device))

# move the model to the selected device (GPU if available)
pointnet.to(device)

PointNetSeg(
  (pointnet): PointNet(
    (input_transform): TNet(
      (mlp1): MLP(
        (conv): Conv1d(3, 64, kernel_size=(1,), stride=(1,))
        (bn): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (mlp2): MLP(
        (conv): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
        (bn): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (mlp3): MLP(
        (conv): Conv1d(128, 1024, kernel_size=(1,), stride=(1,))
        (bn): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (fc_bn1): FC_BN(
        (lin): Linear(in_features=1024, out_features=512, bias=True)
        (bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (fc_bn2): FC_BN(
        (lin): Linear(in_features=512, out_features=256, bias=True)
        (bn): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True

In [33]:
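def compute_stats(true_labels, pred_labels):
  # per-class point counts in the ground truth (not used in the returned statistics)
  unk     = np.count_nonzero(true_labels == 0)
  trav    = np.count_nonzero(true_labels == 1)
  nontrav = np.count_nonzero(true_labels == 2)

  # use the function argument here, not the global `labels` variable
  total_predictions = true_labels.shape[1]*true_labels.shape[0]
  correct = (true_labels == pred_labels).sum().item()

  return correct, total_predictions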

In [34]:
pointnet.eval()
total_correct_predictions = total_predictions = 0

start = time.time()

for i, data in tqdm(enumerate(test_loader, 0)):
  inputs, labels = data
  inputs = inputs.to(device).float()
  labels = labels.to(device)
  outputs, __, __ = pointnet(inputs.transpose(1,2))  
  _, predicted = torch.max(outputs.data, 1)

  # visualize results
  remapped_pred = remap_to_bgr(predicted[0].cpu().numpy(), remap_color_scheme)
  np_pointcloud = inputs[0].cpu().numpy()
  # visualize3DPointCloud(np_pointcloud, remapped_pred)
  
  # compute statistics
  ground_truth_labels = labels.cpu()
  predicted_labels    = predicted.cpu()
  correct, total = compute_stats(ground_truth_labels, predicted_labels)

  total_correct_predictions += correct
  total_predictions         += total

end = time.time()

# nice layout after tqdm
print()
print()

test_acc    = 100. * total_correct_predictions / total_predictions
tot_latency = end-start
avg_latency = tot_latency / len(test_loader.dataset)

print('Test accuracy:', test_acc, "%")
print('total time:',    tot_latency, " [s]")
print('avg time  :',    avg_latency, " [s]")


30it [00:21,  1.41it/s]



Test accuracy: 91.952 %
total time: 21.273188591003418  [s]
avg time  : 0.7091062863667806  [s]





> **Note**: you need to write *Test accuracy* and *avg time* in your laboratory report



If you want to experiment with other network structures, feel free to modify the original PointNetSeg implementation.

For example, you may change the input and output sizes of the layers, or add layers directly. Be very careful when doing so: the output size of each layer must match the input size of the next one.

These experiments are not mandatory, but mentioning them in the report will be appreciated. **Note that the report and the source code that you deliver must also contain the original version required by this laboratory.**

**So any modification you make must come after this cell: create new cells if needed.**

In [35]:
class PointNet(nn.Module):
   def __init__(self):
        super().__init__()
        self.input_transform = TNet(k=3)
        self.feature_transform = TNet(k=64)

        self.mlp1 = MLP(3, 64)
        self.mlp2 = MLP(64, 64)
        self.mlp3 = MLP(64, 64)
        self.mlp4 = MLP(64, 128)
        self.mlp5 = MLP(128, 1024)

   def forward(self, input):
        n_pts = input.size()[2]
        matrix3x3 = self.input_transform(input)
        input_transform_output = torch.bmm(torch.transpose(input, 1, 2), matrix3x3).transpose(1, 2)

        mlp1_output = self.mlp1(input_transform_output)
        mlp2_output = self.mlp2(mlp1_output)

        matrix64x64 = self.feature_transform(mlp2_output)
        feature_transform_output= torch.bmm(torch.transpose(mlp1_output, 1, 2), matrix64x64).transpose(1, 2)

        mlp3_output = self.mlp3(feature_transform_output)
        mlp4_output = self.mlp4(mlp3_output)
        mlp5_output = self.mlp5(mlp4_output)
 

        global_feature = nn.MaxPool1d(mlp5_output.size(-1))(mlp5_output)
        
        global_feature_repeated = nn.Flatten(1)(global_feature).repeat(n_pts,1,1).transpose(0,2).transpose(0,1)

        return [feature_transform_output, global_feature_repeated], matrix3x3, matrix64x64


class PointNetSeg(nn.Module):
    def __init__(self, classes = 3):
        super().__init__()
        self.pointnet = PointNet()

      
        self.mlp1 = MLP(1088, 512)
        self.mlp2 = MLP(512,256)
        self.mlp3 = MLP(256,128)
        self.mlp4 = MLP(128, classes)

        
       
        self.logsoftmax = nn.LogSoftmax(dim=1)


    def forward(self, input):
        inputs, matrix3x3, matrix64x64 = self.pointnet(input)
        stack = torch.cat(inputs, 1)
        
        

        mlp1_output = self.mlp1(stack)
        mlp2_output = self.mlp2(mlp1_output)
        mlp3_output = self.mlp3(mlp2_output)
        mlp4_output = self.mlp4(mlp3_output)
        
        
        return self.logsoftmax(mlp4_output), matrix3x3, matrix64x64


In [36]:
pointnet = PointNetSeg()
pointnet.to(device);
optimizer = torch.optim.Adam(pointnet.parameters(), lr=0.005)

train(pointnet, train_loader, val_loader, save=True)

[1,     1] loss: 0.123
[1,     2] loss: 0.113
[1,     3] loss: 0.104
[1,     4] loss: 0.106
[1,     5] loss: 0.111
[1,     6] loss: 0.105
[1,     7] loss: 0.099
[1,     8] loss: 0.106
[1,     9] loss: 0.104
[1,    10] loss: 0.101
[1,    11] loss: 0.102
[1,    12] loss: 0.098
[1,    13] loss: 0.098
[1,    14] loss: 0.095
[1,    15] loss: 0.097
[1,    16] loss: 0.099
[1,    17] loss: 0.093
[1,    18] loss: 0.096
[1,    19] loss: 0.093
[1,    20] loss: 0.092
correct 264598 / 400000
Valid accuracy: 66 %
best_val_acc: 66.1495 saving model at /content/drive/MyDrive/pointnetmodel.yml
[2,     1] loss: 0.092
[2,     2] loss: 0.088
[2,     3] loss: 0.092
[2,     4] loss: 0.088
[2,     5] loss: 0.085
[2,     6] loss: 0.091
[2,     7] loss: 0.087
[2,     8] loss: 0.086
[2,     9] loss: 0.080
[2,    10] loss: 0.078
[2,    11] loss: 0.082
[2,    12] loss: 0.080
[2,    13] loss: 0.077
[2,    14] loss: 0.075
[2,    15] loss: 0.083
[2,    16] loss: 0.074
[2,    17] loss: 0.075
[2,    18] loss: 0.076
[2

In [37]:
# create a new instantiation of PointNetSeg model
pointnet = PointNetSeg()

# load pyTorch model weights
model_path = os.path.join(drive_path, "MyDrive", "pointnetmodel.yml")
pointnet.load_state_dict(torch.load(model_path, map_location=device))

# move the model to the selected device (GPU if available)
pointnet.to(device)




pointnet.eval()
total_correct_predictions = total_predictions = 0

start = time.time()

for i, data in tqdm(enumerate(test_loader, 0)):
  inputs, labels = data
  inputs = inputs.to(device).float()
  labels = labels.to(device)
  outputs, __, __ = pointnet(inputs.transpose(1,2))  
  _, predicted = torch.max(outputs.data, 1)

  # visualize results
  remapped_pred = remap_to_bgr(predicted[0].cpu().numpy(), remap_color_scheme)
  np_pointcloud = inputs[0].cpu().numpy()
  # visualize3DPointCloud(np_pointcloud, remapped_pred)
  
  # compute statistics
  ground_truth_labels = labels.cpu()
  predicted_labels    = predicted.cpu()
  correct, total = compute_stats(ground_truth_labels, predicted_labels)

  total_correct_predictions += correct
  total_predictions         += total

end = time.time()

# nice layout after tqdm
print()
print()

test_acc    = 100. * total_correct_predictions / total_predictions
tot_latency = end-start
avg_latency = tot_latency / len(test_loader.dataset)

print('Test accuracy:', test_acc, "%")
print('total time:',    tot_latency, " [s]")
print('avg time  :',    avg_latency, " [s]")

30it [00:12,  2.34it/s]



Test accuracy: 85.49033333333334 %
total time: 12.830955982208252  [s]
avg time  : 0.4276985327402751  [s]



