# **Point Transformers**

Transformers have outperformed convolutional neural networks and recurrent neural networks in many applications across domains, including natural language processing, image classification and medical image segmentation. Point Transformer adds further evidence by bringing the Transformer design to 3D point cloud data, where it established state-of-the-art results. It performs well on several tasks, including 3D semantic segmentation, 3D shape classification and 3D part segmentation.

To read more about it, please refer to [this](https://analyticsindiamag.com/how-point-transformer-excels-in-3d-image-processing/) article.

## **Python Implementation of Point Transformer**

A PyTorch implementation of Point Transformer is available on PyPI as the `point-transformer-pytorch` package and can be installed with pip. It requires Python 3.7+, PyTorch 1.6+ and einops 0.3+.
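Before installing, it can help to check what is already present. The snippet below is a small sketch, not part of the package: it uses the standard-library `importlib.metadata` module (available from Python 3.8) to report the installed versions of the three required packages. The helper name `installed_version` is hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# the package itself needs Python 3.7+; importlib.metadata needs 3.8+
print("Python", ".".join(map(str, sys.version_info[:3])))
for package in ("torch", "einops", "point-transformer-pytorch"):
    print(package, installed_version(package) or "not installed")
```

Any package reported as "not installed" is fetched by the pip command in the next cell.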

```python
# upgrade pip, then install only the packages this notebook needs
# (the PyPI package "sklearn" is deprecated and no longer installs, so it is not requested)
!python -m pip install pip --upgrade --user -q --no-warn-script-location
!python -m pip install torch einops point-transformer-pytorch --user -q --no-warn-script-location

# restart the kernel so the freshly installed (--user) packages can be imported
import IPython
IPython.Application.instance().kernel.do_shutdown(True)
```


Import the necessary libraries and modules.

```python
import torch
from point_transformer_pytorch import PointTransformerLayer
```

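The following code creates a Point Transformer layer and applies it to a random input: one point cloud of 16 points, each with a 128-dimensional feature vector (`feats`) and 3D coordinates (`pos`). The boolean `mask` marks which points are valid; here all of them are.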

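```python
# a single Point Transformer layer operating on 128-dimensional point features
attn = PointTransformerLayer(
    dim = 128,                  # feature dimension of each point
    pos_mlp_hidden_dim = 64,    # hidden size of the relative-position MLP
    attn_mlp_hidden_mult = 4    # attention MLP hidden size = dim * 4
)

feats = torch.randn(1, 16, 128)     # (batch, points, features)
pos = torch.randn(1, 16, 3)         # (batch, points, xyz coordinates)
mask = torch.ones(1, 16).bool()     # True marks a valid point

attn(feats, pos, mask = mask) # (1, 16, 128)
```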
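The `mask` argument matters when point clouds of different sizes are batched together: shorter clouds are zero-padded to a common length and their padded slots are marked `False`. Below is a minimal sketch of that padding step, assuming each cloud is given as a `(feats, pos)` pair; the helper `pad_point_clouds` is hypothetical and not part of `point_transformer_pytorch`.

```python
import torch


def pad_point_clouds(clouds):
    """Pad a list of (feats, pos) pairs to a common number of points.

    feats has shape (n_i, dim) and pos has shape (n_i, 3). Returns batched
    feats (b, n_max, dim), pos (b, n_max, 3) and a boolean mask (b, n_max)
    that is True for real points and False for padding.
    """
    n_max = max(feats.shape[0] for feats, _ in clouds)
    dim = clouds[0][0].shape[1]
    b = len(clouds)

    feats_out = torch.zeros(b, n_max, dim)
    pos_out = torch.zeros(b, n_max, 3)
    mask = torch.zeros(b, n_max, dtype=torch.bool)

    for i, (feats, pos) in enumerate(clouds):
        n = feats.shape[0]
        feats_out[i, :n] = feats   # copy real points; the rest stays zero
        pos_out[i, :n] = pos
        mask[i, :n] = True
    return feats_out, pos_out, mask


# two clouds with 16 and 10 points, 128 features each
clouds = [(torch.randn(16, 128), torch.randn(16, 3)),
          (torch.randn(10, 128), torch.randn(10, 3))]
batch_feats, batch_pos, batch_mask = pad_point_clouds(clouds)
print(batch_feats.shape, batch_mask.sum(dim=1))  # torch.Size([2, 16, 128]) tensor([16, 10])
```

The padded batch can then be passed to the layer above as `attn(batch_feats, batch_pos, mask = batch_mask)`.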

The number of nearest neighbors is set with the `num_neighbors` argument of `PointTransformerLayer`. In the following example it is set to 16, so for each of the 2,048 points the layer attends only to its 16 nearest points in 3D space rather than to every point in the cloud.

```python
attn = PointTransformerLayer(
    dim = 128,
    pos_mlp_hidden_dim = 64,
    attn_mlp_hidden_mult = 4,
    num_neighbors = 16          # only the 16 nearest neighbors are attended to for each point
)

feats = torch.randn(1, 2048, 128)
pos = torch.randn(1, 2048, 3)
mask = torch.ones(1, 2048).bool()

attn(feats, pos, mask = mask) # (1, 2048, 128)
```
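To make the neighbor selection concrete, the sketch below isolates it: pairwise Euclidean distances are computed from the relative positions, and `topk(..., largest=False)` returns the indices of the `k` closest points for each point. This mirrors the selection step inside `PointTransformerLayer` when `num_neighbors` is set, but the function name `knn_indices` is hypothetical. Because every point is at distance 0 from itself, each point's own index comes first.

```python
import torch


def knn_indices(pos, k):
    """Indices of the k nearest points for every point.

    pos: (batch, n, 3) coordinates. Returns a (batch, n, k) LongTensor,
    sorted from nearest to farthest; each point's own index comes first.
    """
    rel_pos = pos[:, :, None, :] - pos[:, None, :, :]   # (batch, n, n, 3)
    rel_dist = rel_pos.norm(dim=-1)                      # (batch, n, n)
    return rel_dist.topk(k, dim=-1, largest=False).indices


points = torch.randn(1, 2048, 3)
idx = knn_indices(points, k=16)
print(idx.shape)  # torch.Size([1, 2048, 16])
```

For three points on a line at x = 0, 1 and 3, `knn_indices(..., 2)` pairs each point with itself and its closest other point: `[[0, 1], [1, 0], [2, 1]]`.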

The underlying source code of `PointTransformerLayer` is reproduced below, starting with the required imports.

```python
import torch
from torch import nn, einsum
from einops import repeat
```

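The layer relies on three helper functions: `exists` checks whether a value is not `None`, `max_value` returns the largest finite value representable in a tensor's dtype (used as a fill value when masking), and `batched_index_select` gathers the entries of `values` selected by `indices` along dimension `dim`, independently for each batch element.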

```python
def exists(val):
    # True if the optional argument was provided
    return val is not None


def max_value(t):
    # largest finite value of t's dtype, used as a fill value for masking
    return torch.finfo(t.dtype).max


def batched_index_select(values, indices, dim = 1):
    # e.g. values (b, i, j, d) and indices (b, i, k) with dim = 2 -> (b, i, k, d)
    value_dims = values.shape[(dim + 1):]
    values_shape, indices_shape = map(lambda t: list(t.shape), (values, indices))
    indices = indices[(..., *((None,) * len(value_dims)))]
    indices = indices.expand(*((-1,) * len(indices_shape)), *value_dims)
    value_expand_len = len(indices_shape) - (dim + 1)
    values = values[(*((slice(None),) * dim), *((None,) * value_expand_len), ...)]
    value_expand_shape = [-1] * len(values.shape)
    expand_slice = slice(dim, (dim + value_expand_len))
    value_expand_shape[expand_slice] = indices.shape[expand_slice]
    values = values.expand(*value_expand_shape)
    dim += value_expand_len
    return values.gather(dim, indices)
```
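For the case the layer actually uses (values of shape `(b, i, j, d)`, indices of shape `(b, i, k)`, `dim = 2`), `batched_index_select` amounts to broadcasting the indices over the feature axis and calling `torch.gather`. The sketch below is an illustrative re-derivation, not code from the package; `gather_neighbors` is a hypothetical name, and the result is checked against a plain Python loop.

```python
import torch


def gather_neighbors(values, indices):
    """Select neighbor entries along dim 2.

    values: (b, i, j, d), indices: (b, i, k) -> returns (b, i, k, d),
    where out[b, i, n] = values[b, i, indices[b, i, n]].
    """
    d = values.shape[-1]
    expanded = indices[..., None].expand(-1, -1, -1, d)  # (b, i, k, d)
    return values.gather(2, expanded)


values = torch.randn(2, 5, 7, 4)
indices = torch.randint(0, 7, (2, 5, 3))
out = gather_neighbors(values, indices)

# reference: explicit loops over batch, query point and neighbor slot
ref = torch.empty(2, 5, 3, 4)
for b in range(2):
    for i in range(5):
        for n in range(3):
            ref[b, i, n] = values[b, i, indices[b, i, n]]

print(torch.equal(out, ref))  # True
```

The generic helper above additionally handles other `dim` values and index tensors with extra dimensions, which is why its body is longer.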

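Finally, the layer is defined as a subclass of PyTorch's `nn.Module`. Its `forward` method computes queries, keys and values, builds relative positional embeddings, optionally restricts each point to its k nearest neighbors, applies the mask, and aggregates the values with vector attention (a separate attention weight for each feature channel).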
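In equation form, the `forward` method implements the vector attention described in the Point Transformer paper. Writing $\varphi$, $\psi$, $\alpha$ for the query, key and value projections (`to_qkv`), $\theta$ for `pos_mlp`, $\gamma$ for `attn_mlp`, $p_i$ for the coordinates of point $i$, and $\mathcal{X}(i)$ for the points that point $i$ attends to (all points, or its `num_neighbors` nearest), the output for point $i$ is:

$$
\delta_{ij} = \theta(p_i - p_j), \qquad
y_i = \sum_{j \in \mathcal{X}(i)} \operatorname{softmax}_j\!\big(\gamma(\varphi(x_i) - \psi(x_j) + \delta_{ij})\big) \odot \big(\alpha(x_j) + \delta_{ij}\big)
$$

The softmax runs over $j$ separately for each of the `dim` channels, which is `sim.softmax(dim = -2)` in the code, and $\odot$ followed by the sum over $j$ is what the final `einsum` computes.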

```python
class PointTransformerLayer(nn.Module):
    def __init__(
        self,
        *,
        dim,
        pos_mlp_hidden_dim = 64,
        attn_mlp_hidden_mult = 4,
        num_neighbors = None
    ):
        super().__init__()
        self.num_neighbors = num_neighbors

        # one linear projection produces queries, keys and values
        self.to_qkv = nn.Linear(dim, dim * 3, bias = False)

        # MLP mapping relative 3D positions to dim-dimensional positional embeddings
        self.pos_mlp = nn.Sequential(
            nn.Linear(3, pos_mlp_hidden_dim),
            nn.ReLU(),
            nn.Linear(pos_mlp_hidden_dim, dim)
        )

        # MLP producing per-channel attention logits from query-key relations
        self.attn_mlp = nn.Sequential(
            nn.Linear(dim, dim * attn_mlp_hidden_mult),
            nn.ReLU(),
            nn.Linear(dim * attn_mlp_hidden_mult, dim),
        )

    def forward(self, x, pos, mask = None):
        n, num_neighbors = x.shape[1], self.num_neighbors

        # get queries, keys, values
        q, k, v = self.to_qkv(x).chunk(3, dim = -1)

        # calculate relative positional embeddings
        rel_pos = pos[:, :, None, :] - pos[:, None, :, :]
        rel_pos_emb = self.pos_mlp(rel_pos)

        # subtract keys from queries instead of taking a dot product;
        # this is assumed to be a better inductive bias for point clouds
        qk_rel = q[:, :, None, :] - k[:, None, :, :]

        # prepare a pairwise mask: pair (i, j) is valid only if both points are valid
        if exists(mask):
            mask = mask[:, :, None] * mask[:, None, :]

        # expand values so that each query point i sees all values j
        v = repeat(v, 'b j d -> b i j d', i = n)

        # determine k nearest neighbors for each point, if specified
        if exists(num_neighbors) and num_neighbors < n:
            rel_dist = rel_pos.norm(dim = -1)

            if exists(mask):
                # give masked pairs the maximum distance so valid points are selected first
                mask_value = max_value(rel_dist)
                rel_dist.masked_fill_(~mask, mask_value)

            dist, indices = rel_dist.topk(num_neighbors, largest = False)

            v = batched_index_select(v, indices, dim = 2)
            qk_rel = batched_index_select(qk_rel, indices, dim = 2)
            rel_pos_emb = batched_index_select(rel_pos_emb, indices, dim = 2)
            mask = batched_index_select(mask, indices, dim = 2) if exists(mask) else None

        # add relative positional embeddings to value
        v = v + rel_pos_emb

        # use attention mlp, making sure to add relative positional embedding first
        sim = self.attn_mlp(qk_rel + rel_pos_emb)

        # masking
        if exists(mask):
            mask_value = -max_value(sim)
            sim.masked_fill_(~mask[..., None], mask_value)

        # attention: softmax over the neighbor axis j, separately for each channel
        attn = sim.softmax(dim = -2)

        # aggregate
        agg = einsum('b i j d, b i j d -> b i d', attn, v)
        return agg
```
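The `forward` method above computes `rel_pos_emb` and `qk_rel` as full `(b, n, n, dim)` tensors before it selects neighbors, so memory grows with the square of the number of points even when `num_neighbors` is small. The sketch below is a hypothetical variant, not the package's API: it selects the k nearest neighbors from scalar distances first and only then runs the positional and attention MLPs on `(b, n, k, dim)` tensors. For brevity it omits masking, and the class name `NeighborFirstPointTransformerLayer` is our own.

```python
import torch
from torch import nn


class NeighborFirstPointTransformerLayer(nn.Module):
    """Hypothetical variant: pick k nearest neighbors before running the MLPs.

    Same hyperparameters as PointTransformerLayer, but num_neighbors is
    required and masking is not supported in this sketch.
    """

    def __init__(self, *, dim, num_neighbors, pos_mlp_hidden_dim=64, attn_mlp_hidden_mult=4):
        super().__init__()
        self.num_neighbors = num_neighbors
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.pos_mlp = nn.Sequential(
            nn.Linear(3, pos_mlp_hidden_dim),
            nn.ReLU(),
            nn.Linear(pos_mlp_hidden_dim, dim),
        )
        self.attn_mlp = nn.Sequential(
            nn.Linear(dim, dim * attn_mlp_hidden_mult),
            nn.ReLU(),
            nn.Linear(dim * attn_mlp_hidden_mult, dim),
        )

    def forward(self, x, pos):
        b, n, _ = x.shape
        k = min(self.num_neighbors, n)
        q, key, v = self.to_qkv(x).chunk(3, dim=-1)

        # scalar pairwise distances (b, n, n); no dim-sized pairwise tensors yet
        rel_dist = (pos[:, :, None, :] - pos[:, None, :, :]).norm(dim=-1)
        idx = rel_dist.topk(k, dim=-1, largest=False).indices        # (b, n, k)

        # gather neighbor keys, values and positions with advanced indexing
        batch = torch.arange(b, device=x.device)[:, None, None]      # (b, 1, 1)
        key_nb = key[batch, idx]                                     # (b, n, k, dim)
        v_nb = v[batch, idx]                                         # (b, n, k, dim)
        pos_nb = pos[batch, idx]                                     # (b, n, k, 3)

        # relative positional embedding theta(p_i - p_j), only for the k neighbors
        rel_pos_emb = self.pos_mlp(pos[:, :, None, :] - pos_nb)      # (b, n, k, dim)

        # vector attention over the k neighbors, softmax per channel
        sim = self.attn_mlp(q[:, :, None, :] - key_nb + rel_pos_emb)
        attn = sim.softmax(dim=-2)
        return (attn * (v_nb + rel_pos_emb)).sum(dim=-2)             # (b, n, dim)


layer = NeighborFirstPointTransformerLayer(dim=128, num_neighbors=16)
out = layer(torch.randn(1, 2048, 128), torch.randn(1, 2048, 3))
print(out.shape)  # torch.Size([1, 2048, 128])
```

When `num_neighbors` is at least the number of points, every point attends to the whole cloud, matching the unrestricted behaviour of the original layer; the only pairwise tensor of size `n × n` left is the distance matrix (plus its 3-channel difference tensor).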

# **Related Articles:**

> * [Point Transformers](https://analyticsindiamag.com/how-point-transformer-excels-in-3d-image-processing/)

> * [Comparison of Transfer Learning with Multi Class Classification](https://analyticsindiamag.com/practical-comparison-of-transfer-learning-models-in-multi-class-image-classification/)

> * [Fruit Recognition with CNN](https://analyticsindiamag.com/fruit-recognition-using-the-convolutional-neural-network/)

> * [Semantic Segmentation Using TensorFlow Keras](https://analyticsindiamag.com/semantic-segmentation-using-tensorflow-keras/)

> * [Convert Image to Pencil Sketch](https://analyticsindiamag.com/converting-image-into-a-pencil-sketch-in-python/)

> * [Image Classification Task with and without Data Augmentation](https://analyticsindiamag.com/image-data-augmentation-impacts-performance-of-image-classification-with-codes/)
