# Implementation of PointNet
We will use the tensorflow.keras Functional API to build PointNet from the original paper: “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation” by Charles R. Qi*, Hao Su*, Kaichun Mo,and Leonidas J. Guibas. (* indicates equal contributions)

## Architecture diagram
![architecture.png](architecture.png)

## Implementation details
In the paper we can get the implementation details of PointNet classification and segmentation arhitecture where they share a portion of architecture (shown in below figure)

**PointNet classification**

(1) Input: the input is Nx3.(Note: additional features can be added for each point, the input can be denoted as NxD)

(2)	Input transform. Apply a T-Net module (which outputs a 3x3 transformation matrix) to standardize the input.

(3)	Two point-wise convolution. shared mlp(64,64)

(4)	Feature transform. Apply a T-Net module (which outputs a 64x64 transformation matrix) to standardize the feature.

(5)	Three point-wise convolution. shared mlp(64,128,1024)

(6)	Max pooling to aggregate information over all points to gain the global descriptor (1024 vector)

(7)	Output: three fully connected layers to predict output scores. mlp(512,256,k)

Note: Batchnorm is used for all layers with ReLU, and dropout layers are used for the last fully connected layer.

**PointNet segmentation**

(1)-(6)

(8)	Concatenate the global descriptor (1024 vector) with 3rd shared mlp layer (Nx64) to gain the local-global features(Nx1088) which contain both local and global structural information

(9)	Five point-wise convolution with shared mlp(512,256,128,128,m) to output the segmentation scores.
 

## Workflow of PointNet classification
**classification**
1. import the neccesary layers
2. create the input layer
3. tranform the input using T-Net(3x3)
4. apply two point-wise convolution: mlp(64,64)
5. tranform the feature input using T-Net(64x64)
6. apply two point-wise convolution: mlp(64,128,1024)
7. Aggregate over all points to gain the global descriptor (1024 vector)
8. Classifiy with three fully connected layers: FC(512,256,k) 


### 1.imports


In [3]:
import numpy as np 
import tensorflow as tf 
from tensorflow import keras
from tnet import TNet
from utils import custom_conv, custom_dense

### 2.input

In [4]:
num_points=4096
num_channels=3
num_classes=40
bn_momentum=0.99
input = keras.Input(shape=(num_points,num_channels))
print(input.shape)

(None, 4096, 3)


### 3.tranform the input using T-Net(3x3)

In [5]:
x= TNet(add_regularization=False,bn_momentum=bn_momentum)(input)
print(x.shape)

(None, 4096, 3)


### 4.apply two point-wise convolution: mlp(64,64)

In [6]:
x = custom_conv(x,64)
x = custom_conv(x,64)
print(x.shape)

(None, 4096, 64)


### 5.tranform the feature input using T-Net(64x64)

In [7]:
x= TNet(add_regularization=True,bn_momentum=bn_momentum)(x)
print(x.shape)

(None, 4096, 64)


### 6.apply two point-wise convolution: mlp(64,128,1024)

In [8]:
x = custom_conv(x,64)
x = custom_conv(x,128)
x = custom_conv(x,1024)
print(x.shape)

(None, 4096, 1024)


### 7.Aggregate over all points to gain the global descriptor (1024 vector)

In [9]:
x= keras.layers.GlobalMaxPool1D()(x)
print(x.shape)

(None, 1024)


### 8.Classifiy with three fully connected layers: FC(512,256,k) and create the model

In [10]:
x = custom_dense(x,512)
x = custom_dense(x,256)
x = keras.layers.Dropout(rate=0.3)(x)
output = keras.layers.Dense(units=num_classes, activation='softmax')(x)
print(output.shape)

(None, 256)


In [11]:
# build the model
model = keras.models.Model(inputs=input, outputs=output)

In [12]:
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         [(None, 4096, 3)]         0         
_________________________________________________________________
t_net (TNet)                 (None, 4096, 3)           801408    
_________________________________________________________________
conv1d (Conv1D)              (None, 4096, 64)          256       
_________________________________________________________________
batch_normalization_2 (Batch (None, 4096, 64)          256       
_________________________________________________________________
activation (Activation)      (None, 4096, 64)          0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 4096, 64)          4160      
_________________________________________________________________
batch_normalization_3 (Batch (None, 4096, 64)          256   

## workflow of PointNet segmentation
**segmentation**

1. 1 to 7 steps of Classification.
2. Concatenate the global descriptor (1024 vector) with 3rd shared mlp layer (Nx64) to gain the local-global features(Nx1088)
3. apply five point-wise convolution-mlp(512,256,128,128,m) to output the segmentation scores.

### 1.1- 7 steps of classifcaiton

In [13]:
# (1) input
input = keras.Input(shape=(num_points,num_channels))
# (2)	Input transform. Apply a T-Net module (which outputs a 3x3 transformation matrix) to standardize the input.
x= TNet(add_regularization=False,bn_momentum=bn_momentum)(input)

# (3)	Two point-wise convolution. shared mlp(64,64)
x = custom_conv(input,64)
x = custom_conv(x,64)

# (4)	Feature transform. Apply a T-Net module (which outputs a 64x64 transformation matrix) to standardize the feature.
x= TNet(add_regularization=True,bn_momentum=bn_momentum)(x)

# (5)	Three point-wise convolution. shared mlp(64,128,1024)
local_feat = custom_conv(x,64)
x = custom_conv(local_feat,128)
x = custom_conv(x,1024)

# (6)	Max pooling to aggregate information over all points to gain the global descriptor (1024 vector)
# TODO: compare GlobalMaxPool1D with MaxPool1D
global_feat= keras.layers.GlobalMaxPool1D()(x)
print(x)


Tensor("activation_11/Identity:0", shape=(None, 4096, 1024), dtype=float32)


### 2.Concatenate the global descriptor (1024 vector) with 3rd shared mlp layer (Nx64) to gain the local-global features(Nx1088)

In [14]:
global_feat=tf.expand_dims(global_feat,axis=1)
global_feat=tf.tile(global_feat,[1,num_points,1])
x=tf.concat([local_feat,global_feat],axis=-1)
print(x)

Tensor("concat:0", shape=(None, 4096, 1088), dtype=float32)


### 3.apply five point-wise convolution-mlp(512,256,128,128,m) to output the segmentation scores and create the model

In [15]:
x = custom_conv(x,512)
x = custom_conv(x,256)
x = custom_conv(x,128)
x = custom_conv(x,128)
output = custom_conv(x,num_classes,activation='softmax')
print(output.shape)

(None, 4096, 40)


In [16]:
# build the model
model = keras.models.Model(inputs=input, outputs=output)
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_3 (InputLayer)            [(None, 4096, 3)]    0                                            
__________________________________________________________________________________________________
conv1d_5 (Conv1D)               (None, 4096, 64)     256         input_3[0][0]                    
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, 4096, 64)     256         conv1d_5[0][0]                   
__________________________________________________________________________________________________
activation_7 (Activation)       (None, 4096, 64)     0           batch_normalization_13[0][0]     
____________________________________________________________________________________________

## final code

### classification

In [21]:
# (1) input
input = keras.Input(shape=(num_points,num_channels))
# (2)	Input transform. Apply a T-Net module (which outputs a 3x3 transformation matrix) to standardize the input.
x= TNet(add_regularization=False,bn_momentum=bn_momentum)(input)

# (3)	Two point-wise convolution. shared mlp(64,64)
x = custom_conv(input,64)
x = custom_conv(x,64)

# (4)	Feature transform. Apply a T-Net module (which outputs a 64x64 transformation matrix) to standardize the feature.
x= TNet(add_regularization=True,bn_momentum=bn_momentum)(x)

# (5)	Three point-wise convolution. shared mlp(64,128,1024)
x = custom_conv(x,64)
x = custom_conv(x,128)
x = custom_conv(x,1024)

# (6)	Max pooling to aggregate information over all points to gain the global descriptor (1024 vector)
# TODO: compare GlobalMaxPool1D with MaxPool1D
x= keras.layers.GlobalMaxPool1D()(x)
# (7)	Output: three fully connected layers to predict output scores. mlp(512,256,k)
x = custom_dense(x,512)
x = custom_dense(x,256)
x = keras.layers.Dropout(rate=0.3)(x)
output = keras.layers.Dense(units=num_classes, activation='softmax')(x)
# build the model
model = keras.models.Model(inputs=input, outputs=output)
model.summary()

Model: "model_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_5 (InputLayer)         [(None, 4096, 3)]         0         
_________________________________________________________________
conv1d_29 (Conv1D)           (None, 4096, 64)          256       
_________________________________________________________________
batch_normalization_51 (Batc (None, 4096, 64)          256       
_________________________________________________________________
activation_33 (Activation)   (None, 4096, 64)          0         
_________________________________________________________________
conv1d_30 (Conv1D)           (None, 4096, 64)          4160      
_________________________________________________________________
batch_normalization_52 (Batc (None, 4096, 64)          256       
_________________________________________________________________
activation_34 (Activation)   (None, 4096, 64)          0   

### segmentation

In [22]:
# (1) input
input = keras.Input(shape=(num_points,num_channels))
# (2)	Input transform. Apply a T-Net module (which outputs a 3x3 transformation matrix) to standardize the input.
x= TNet(add_regularization=False,bn_momentum=bn_momentum)(input)

# (3)	Two point-wise convolution. shared mlp(64,64)
x = custom_conv(input,64)
x = custom_conv(x,64)

# (4)	Feature transform. Apply a T-Net module (which outputs a 64x64 transformation matrix) to standardize the feature.
x= TNet(add_regularization=True,bn_momentum=bn_momentum)(x)

# (5)	Three point-wise convolution. shared mlp(64,128,1024)
local_feat = custom_conv(x,64)
x = custom_conv(local_feat,128)
x = custom_conv(x,1024)

# (6)	Max pooling to aggregate information over all points to gain the global descriptor (1024 vector)
# TODO: compare GlobalMaxPool1D with MaxPool1D
global_feat= keras.layers.GlobalMaxPool1D()(x)

(7)
global_feat=tf.expand_dims(global_feat,axis=1)
global_feat=tf.tile(global_feat,[1,num_points,1])
x=tf.concat([local_feat,global_feat],axis=-1)

(8)
x = custom_conv(x,512)
x = custom_conv(x,256)
x = custom_conv(x,128)
x = custom_conv(x,128)
output = custom_conv(x,num_classes,activation='softmax')

# build the model
model = keras.models.Model(inputs=input, outputs=output)
model.summary()

Model: "model_4"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_6 (InputLayer)            [(None, 4096, 3)]    0                                            
__________________________________________________________________________________________________
conv1d_34 (Conv1D)              (None, 4096, 64)     256         input_6[0][0]                    
__________________________________________________________________________________________________
batch_normalization_62 (BatchNo (None, 4096, 64)     256         conv1d_34[0][0]                  
__________________________________________________________________________________________________
activation_40 (Activation)      (None, 4096, 64)     0           batch_normalization_62[0][0]     
____________________________________________________________________________________________

## model diagram

### classification



### segmentation
