# Introduction
In this project, you will implement the [PointNet](https://arxiv.org/abs/1612.00593) architecture and train a classification network (left) and a segmentation network (middle).
![title](img/cls_sem.jpg)

### Grading Points
* Task 1.1 - 5
* Task 1.2 - 5
* Task 2.1 - 10
* Task 2.2 - 5
* Task 2.3 - 5
* Task 2.4 - 5
* Task 2.5 - 5
* Task 2.6 - 10
* Task 2.7 - 5
* Task 2.8 - 10
* Task 2.9 - 10
* Task 2.10 - 5 
* Task 2.11 - 5
* Task 2.12 - 5
* Task 2.13 - 10

In [1]:
# autoreload reloads modules automatically before entering the execution of code typed at the IPython prompt.
%load_ext autoreload

In [2]:
%autoreload 2

In [1]:
import random
import math
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as data

from torchvision.transforms import Compose

import dataset # custom dataset for ModelNet10 and ShapeNet

# 1. Data Loading

Mathematically, a point cloud is written as $X\in\mathbb{R}^{N\times 3}$. In code, we use the `B x 3 x N` layout, where `B` is the batch size and `N` is the number of points in a single point cloud.
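As a minimal sketch of the two layouts (the tensor names below are illustrative and not part of the assignment code), a batch stored point-by-point as `B x N x 3` can be converted to the channel-first `B x 3 x N` layout with a transpose. The channel-first layout is also what `nn.Conv1d` expects.

```python
import torch

B, N = 4, 16                         # illustrative batch size and number of points
points_math = torch.randn(B, N, 3)   # math layout: one row per point
# swap the last two axes to obtain the channel-first layout used in this notebook
points_prog = points_math.transpose(1, 2).contiguous()

print(points_prog.shape)  # torch.Size([4, 3, 16])
# the n-th point of cloud b is the column points_prog[b, :, n]
print(torch.equal(points_prog[0, :, 5], points_math[0, 5]))  # True
```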

## 1.1 Jitter the position of each point by a zero-mean Gaussian
For input $X\in\mathbb{R}^{N\times 3}$, we transform $X$ by $X \leftarrow X + \mathcal{N}(0, \sigma^2)$.

In [2]:
class RandomJitter(object):
    def __init__(self, sigma):
        self.sigma = sigma
        
    def __call__(self, data):
        ## hint: useful function `torch.randn` and `torch.randn_like`
        ## TASK 1.1
        ## This function takes as input a point cloud of layout `3 x N`, 
        ## and output the jittered point cloud of layout `3 x N`.
        # torch.randn_like samples N(0, 1); scaling by sigma gives N(0, sigma^2)
        # on the same device and with the same dtype as `data`
        noise = torch.randn_like(data) * self.sigma
        data = data + noise
        
        return data

In [None]:
## randomly generate data and test your transform here

In [5]:
randomJitter = RandomJitter(0.5)

In [6]:
randomJitter(torch.randn(3,16))

tensor([[-0.3520,  0.5141,  0.2169, -2.6240, -0.4214,  0.4747, -0.0183, -0.3933,
          0.7387, -0.8958, -0.2121,  1.9848,  0.2685, -1.2039,  1.6044, -0.9679],
        [-0.9044,  0.1868, -0.5167, -0.9403, -0.7028, -0.1019,  0.6148, -1.0564,
         -0.8722,  1.3013, -1.1787, -0.0405,  1.0275,  0.3444, -1.9572,  0.9243],
        [ 0.1299, -1.1597, -0.5881, -0.3834, -0.8663,  0.1068, -0.7399, -0.1256,
         -1.3237, -1.5778, -0.0985, -0.5886,  1.2592, -0.8350, -0.1757,  2.6175]])
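A quick statistical check can complement the visual inspection above. The sketch below does not use the `RandomJitter` class; it repeats the same operation on an all-zero cloud (an assumption made only so that the output is pure noise) and verifies that the empirical mean and standard deviation are close to `0` and `sigma`.

```python
import torch

torch.manual_seed(0)
sigma = 0.5
points = torch.zeros(3, 10000)             # zeros, so the result is pure noise
noise = torch.randn_like(points) * sigma   # zero-mean Gaussian with std sigma
jittered = points + noise

print(jittered.shape)  # torch.Size([3, 10000])
# both statistics should be close to 0 and sigma respectively
print(jittered.mean().item(), jittered.std().item())
```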

## 1.2 Rotate the object along the z-axis randomly
For input $X\in\mathbb{R}^{N\times 3}$, we rotate all points about the z-axis (the up-axis) by an angle $\theta$.


Suppose $T$ is the transformation matrix,
$$X\leftarrow XT,$$
where $$T=\begin{bmatrix}\cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

$[x,y,z]\rightarrow[x\cos\theta-y\sin\theta,\ x\sin\theta+y\cos\theta,\ z]$
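The row-vector product with $T$ can be checked numerically. The sketch below uses plain Python (the helper `rotate` is hypothetical, introduced only for this illustration) and rotates the point $(1, 0, 2)$ by $90^\circ$; it also confirms that the rotation leaves the length of the point unchanged.

```python
import math

theta = math.radians(90.0)
c, s = math.cos(theta), math.sin(theta)
T = [[c, s, 0.0],
     [-s, c, 0.0],
     [0.0, 0.0, 1.0]]

def rotate(p, T):
    """Row vector times matrix: returns p T for a 3-element point p."""
    return [sum(p[i] * T[i][j] for i in range(3)) for j in range(3)]

x, y, z = 1.0, 0.0, 2.0
rotated = rotate([x, y, z], T)
expected = [x * c - y * s, x * s + y * c, z]

print([round(v, 6) for v in rotated])  # [0.0, 1.0, 2.0]
# a rotation preserves the Euclidean norm of the point
print(math.isclose(math.hypot(*rotated), math.hypot(x, y, z)))  # True
```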

In [3]:
class RandomZRotate(object):
    def __init__(self, degrees):
        ## here `self.degrees` is a tuple (0, 360) which defines the range of degree
        self.degrees = degrees
        
    def __call__(self, data):
        ## TASK 1.2
        ## This function takes as input a point cloud of layout `3 x N`, 
        ## and output the rotated point cloud of layout `3 x N`.
        ##
        ## The rotation is along z-axis, and the degree is uniformly distributed
        ## between [0, 360]
        ##
        ## hint: useful function `torch.randn`, `torch.randn_like` and `torch.matmul`
        ##
        ## Notice:   
        ## Different from its math notation `N x 3`, the input has size of `3 x N`
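        low, high = self.degrees
        # sample an angle uniformly from [low, high] degrees and convert it to radians
        theta = math.radians(random.uniform(low, high))
        c, s = math.cos(theta), math.sin(theta)
        T = torch.tensor([[c, s, 0.0],
                          [-s, c, 0.0],
                          [0.0, 0.0, 1.0]], dtype=data.dtype, device=data.device)
        # X <- X T in the `N x 3` layout corresponds to T^T X in the `3 x N` layout
        data = torch.matmul(T.t(), data)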
        
        return data

In [28]:
## randomly generate data and test your transform here

In [8]:
randomZRotate = RandomZRotate((0,360))

In [9]:
randomZRotate(torch.randn(3,16))

tensor([[ 0.7704,  1.1422, -0.6398,  0.4064, -1.0655, -1.3393,  0.5662,  0.6398,
         -0.1455,  1.3356, -1.0211, -1.1103,  0.8489,  0.1762,  0.7887, -0.5296],
        [-1.2052,  0.0097,  0.2326, -0.9892,  0.0609,  1.3887,  1.2696,  0.2261,
         -2.5262,  0.9990,  0.2980,  0.4682,  0.8535,  0.4311, -0.5221, -1.3735],
        [ 1.3465,  1.0376,  0.5592,  1.5081, -0.5739, -0.3559,  0.8704,  0.0045,
          0.2031, -0.1129, -0.6363, -2.5462, -0.4523,  0.2583, -0.3420, -0.0498]])

## 1.3 Load dataset ModelNet10 for Point Cloud Classification

### ModelNet10
By loading this dataset, we have data of shape `B x 3 x N` and label of shape `B`.

In [4]:
# It may take some time to download and pre-process the dataset.
train_transform = Compose([RandomZRotate((0, 360)), RandomJitter(0.02)])
train_cls_dataset = dataset.ModelNet(root='./ModelNet10', transform=train_transform, train=True)
test_cls_dataset = dataset.ModelNet(root='./ModelNet10', train=False)
train_cls_loader = data.DataLoader(
    train_cls_dataset,
    batch_size=16,
    shuffle=True,
    num_workers=1,
)
test_cls_loader = data.DataLoader(
    test_cls_dataset,
    batch_size=1,
    shuffle=False,
    num_workers=1,
)

In [5]:
print(train_cls_dataset.num_classes)

10


## ShapeNet
By loading this dataset, we have data of shape `B x 3 x N` and target of shape `B x N`.

Here is the list of categories:
['Airplane', 'Bag', 'Cap', 'Car', 'Chair', 'Earphone', 'Guitar', 'Knife', 'Lamp', 'Laptop', 'Motorbike', 'Mug', 'Pistol', 'Rocket', 'Skateboard', 'Table']
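Because the target holds one label per point, segmentation can be trained as a classification problem over every point. The sketch below uses random tensors and illustrative sizes (not the real dataset) to show that `nn.CrossEntropyLoss` accepts scores of shape `B x C x N` together with targets of shape `B x N`. Note that it expects unnormalized scores, since it applies log-softmax internally.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, C, N = 2, 5, 8                      # illustrative sizes: batch, classes, points
logits = torch.randn(B, C, N)          # unnormalized per-point class scores
target = torch.randint(0, C, (B, N))   # one integer label per point

loss = nn.CrossEntropyLoss()(logits, target)   # averaged over all B * N points
pred = logits.argmax(dim=1)                    # B x N predicted labels

print(loss.item())
print(pred.shape)  # torch.Size([2, 8])
```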

<font color="green">classification for every point</font>

In [6]:
## Here, as an example, we choose the category 'Airplane'
category = 'Airplane'
train_seg_dataset = dataset.ShapeNet(root='./ShapeNet', category=category, train=True)
test_seg_dataset = dataset.ShapeNet(root='./ShapeNet', category=category, train=False)
train_seg_loader = data.DataLoader(
    train_seg_dataset,
    batch_size=16,
    shuffle=True,
    num_workers=1,
)
test_seg_loader = data.DataLoader(
    test_seg_dataset,
    batch_size=1,
    shuffle=False,
    num_workers=1,
)

In [7]:
print(train_seg_dataset.num_classes)

5


# 2 PointNet Architecture (Read Section 4.2 and Appendix C)
In this section, you will be asked to implement classification and segmentation step by step.
![pointnet](img/pointnet.jpg)

## 2.1 Joint Alignment Network 
This mini-network takes as input a matrix of size $N \times K$, and outputs a transformation matrix of size $K \times K$. 

In programming, the input size of this module is `B x K x N` and output size is `B x K x K`.

For the shared MLP, use structure like this `(FC(64), BN, ReLU, FC(128), BN, ReLU, FC(1024), BN, ReLU)`.

For the MLP after global max pooling, use structure like this `(FC(512), BN, ReLU, FC(256), BN, ReLU, FC(K*K))`.
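A shared MLP applies the same fully connected layer to every point. The sketch below, with illustrative sizes, checks that `nn.Conv1d` with kernel size 1 on a `B x K x N` tensor gives the same result as an `nn.Linear` with identical weights applied to each point separately, which is one reason a 1D convolution is a convenient way to write the shared layers.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, K, N = 2, 3, 5                 # illustrative sizes
x = torch.randn(B, K, N)

conv = nn.Conv1d(K, 64, kernel_size=1)
fc = nn.Linear(K, 64)
# copy the convolution parameters into the linear layer
with torch.no_grad():
    fc.weight.copy_(conv.weight.squeeze(-1))   # (64, K, 1) -> (64, K)
    fc.bias.copy_(conv.bias)

out_conv = conv(x)                               # B x 64 x N
out_fc = fc(x.transpose(1, 2)).transpose(1, 2)   # the same FC applied to each point

print(out_conv.shape)  # torch.Size([2, 64, 5])
print(torch.allclose(out_conv, out_fc, atol=1e-5))  # True
```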


In [8]:
class Transformation(nn.Module):
    def __init__(self, k=3):
        super(Transformation, self).__init__()
        
        self.k = k
        
        ## TASK 2.1
        
        ## define your network layers here
        ## shared mlp
        ## input size: B x K x N
        ## output size: B x 1024 x N
        ## hint: you may want to use `nn.Conv1d` here. Why?
        # `nn.Conv1d` with kernel size 1 applies the same fully connected layer to every point independently, which is exactly a shared MLP
        self.share_mlp = nn.Sequential(nn.Conv1d(k, 64, 1, stride=1),
                                      nn.BatchNorm1d(64),
                                      nn.ReLU(),
                                      nn.Conv1d(64, 64*2,1, stride=1),
                                      nn.BatchNorm1d(64*2),
                                      nn.ReLU(),
                                      nn.Conv1d(64*2, 64*16, 1, stride=1),
                                      nn.BatchNorm1d(64*16),
                                      nn.ReLU())
        ## define your network layers here
        ## mlp
        ## input size: B x 1024
        ## output size: B x (K*K)
        self.mlp = nn.Sequential(nn.Linear(64*16, 64*8), 
                                 nn.BatchNorm1d(64*8),
                                 nn.ReLU(),
                                 nn.Linear(64*8, 64*4),
                                 # BatchNorm1d needs a batch size larger than 1 in training mode, otherwise it raises an error
                                 nn.BatchNorm1d(64*4),
                                 nn.ReLU(),
                                 nn.Linear(64*4, k**2))
        
    
    def forward(self, x):
        B, K, N = x.shape # batch-size, dim, number of points
        ## TASK 2.1
        self.k = K
        ## forward of shared mlp
        # input - B x K x N
        # output - B x 1024 x N
        x = self.share_mlp(x)
        ## global max pooling
        # input - B x 1024 x N
        # output - B x 1024
        x = nn.MaxPool1d(N)(x)
        x = x.view(-1,1024)
        
        ## mlp
        # input - B x 1024
        # output - B x (K*K)
        x = self.mlp(x)
        
        ## reshape the transformation matrix to B x K x K
        identity = torch.eye(self.k, device=x.device)
        x = x.view(B, self.k, self.k) + identity[None]
        return x

In [None]:
## randomly generate data and test this network

In [31]:
transformation = Transformation()

In [32]:
transformation(torch.randn(5,3,8)).shape

torch.Size([5, 3, 3])

## 2.2 Regularization Loss
$$L_{reg}=\|I-TT^\intercal\|^2_F$$

The output of the `Transformation` network is of size `B x K x K`. The module `OrthoLoss` has no trainable parameters; it only computes this norm.
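To build intuition for $L_{reg}$, the sketch below evaluates the squared Frobenius norm for two hand-picked matrices. The helper `ortho_penalty` is hypothetical and written only for this illustration; it is not the `OrthoLoss` module. An orthogonal matrix (a rotation) gives a penalty of about zero, while $T=2I$ gives $\|I-4I\|_F^2 = 3\cdot 9 = 27$.

```python
import math
import torch

def ortho_penalty(T):
    """Squared Frobenius norm of I - T T^T for a batch of matrices (B x K x K)."""
    eye = torch.eye(T.shape[1], device=T.device)[None]
    diff = eye - torch.bmm(T, T.transpose(1, 2))
    return diff.pow(2).sum(dim=(1, 2))

theta = math.radians(30.0)
c, s = math.cos(theta), math.sin(theta)
rotation = torch.tensor([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])
scaled = 2.0 * torch.eye(3)
batch = torch.stack([rotation, scaled])   # 2 x 3 x 3

penalty = ortho_penalty(batch)
# approximately 0 for the rotation and 27 for 2I
print(penalty)
```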

In [9]:
class OrthoLoss(nn.Module):
    def __init__(self):
        super(OrthoLoss, self).__init__()
        
    def forward(self, x):
        ## hint: useful function `torch.bmm` and `torch.matmul`
        ## TASK 2.2
        ## compute the batched matrix product T T^T (B x K x K)
        prod = torch.bmm(x, torch.transpose(x, 1, 2))
        ## note: `torch.norm` over dim=(1,2) returns the Frobenius norm itself;
        ## squaring it would match the formula for L_reg exactly
        norm = torch.norm(prod - torch.eye(x.shape[1], device=x.device)[None], dim=(1,2))
        return norm.mean()

In [None]:
## randomly generate data and test this network

In [12]:
orthoLoss = OrthoLoss()

In [25]:
(-1.7844)**2+(-1.3648)**2

5.0467623999999995

In [24]:
orthoLoss(torch.randn(2,2,2))

tensor([[[-1.7844, -1.3648],
         [-0.6096, -1.7659]],

        [[-1.1674, -0.6116],
         [-0.6308, -0.4505]]])
tensor([[[-1.7844, -0.6096],
         [-1.3648, -1.7659]],

        [[-1.1674, -0.6308],
         [-0.6116, -0.4505]]])
tensor([[[5.0470, 3.4978],
         [3.4978, 3.4899]],

        [[1.7370, 1.0120],
         [1.0120, 0.6009]]])


tensor(4.2588)

## 2.3 Feature Network
In this subsection, you will be asked to implement the feature network (the top branch).

Local features are a matrix of size `B x 64 x N`, which will be used in the segmentation task.

Global features are a matrix of size `B x 1024`, which will be used in the classification task.

In [10]:
class Feature(nn.Module):
    def __init__(self, alignment=False):
        super(Feature, self).__init__()
        
        self.alignment = alignment
        
        ## `input_transform` calculates the input transform matrix of size `3 x 3`
        if self.alignment:
            self.input_transform = Transformation(3)
        
        ## TASK 2.3
        ## define your network layers here
        ## local feature
        ## shared mlp
        ## input size: B x 3 x N
        ## output size: B x 64 x N
        ## hint: you may want to use `nn.Conv1d` here.
        self.local_feature = nn.Sequential(nn.Conv1d(3, 64, 1, stride=1),
                                      nn.BatchNorm1d(64),
                                      nn.ReLU(),
                                      nn.Conv1d(64, 64, 1, stride=1),
                                      nn.BatchNorm1d(64),
                                      nn.ReLU())
        ## `feature_transform` calculates the feature transform matrix of size `64 x 64`
        if self.alignment:
            self.feature_transform = Transformation(64)
        
        ## TASK 2.4
        ## define your network layers here
        ## global feature
        ## shared mlp
        ## input size: B x 64 x N
        ## output size: B x 1024 x N
        self.global_feature = nn.Sequential(
                              nn.Conv1d(64, 64*2,1, stride=1),
                              nn.BatchNorm1d(64*2),
                              nn.ReLU(),
                              nn.Conv1d(64*2, 64*16, 1, stride=1),
                              nn.BatchNorm1d(64*16),
                              nn.ReLU())
    
    def forward(self, x):
        B,K,N = x.shape
        ## apply the input transform
        if self.alignment:
            transform = self.input_transform(x)
            ## TASK 2.5
            ## apply the input transform
            x = torch.bmm(transform,x)
    

        ## TASK 2.3
        ## forward of shared mlp
        # input - B x K x N
        # output - B x 64 x N
        x = self.local_feature(x)
        
        if self.alignment:
            transform = self.feature_transform(x)
            ## TASK 2.5
            ## apply the feature transform
            x = torch.bmm(transform,x)
        else:
            ## do not modify this line
            transform = None
        
        local_feature = x
        
        ## TASK 2.4
        ## forward of shared mlp
        # input - B x 64 x N
        # output - B x 1024 x N
        x = self.global_feature(x)
        
        
        ## TASK 2.4
        ## global max pooling
        # input - B x 1024 x N
        # output - B x 1024
        x = nn.MaxPool1d(N)(x)
        x = x.view(-1,1024)
        global_feature = x
        ## summary:
        ## global_feature: B x 1024
        ## local_feature: B x 64 x N
        ## transform: B x K x K
        return global_feature, local_feature, transform

In [None]:
## randomly generate data and test this network

In [46]:
feature = Feature(True)

In [47]:
result = feature(torch.randn(3,3,5))

In [48]:
## `transform` is the last one computed, i.e. the `64 x 64` feature transform, so its size is B x 64 x 64
result[2].shape

torch.Size([3, 64, 64])

## 2.4 Classification Network
In this network, you will use the global features generated by the `Feature` network defined above.

In [11]:
class Classification(nn.Module):
    def __init__(self, num_classes, alignment=False):
        super(Classification, self).__init__()
                
        self.feature = Feature(alignment=alignment)
        
        ## TASK 2.6
        ## define your network layers here
        ## mlp
        ## input size: B x 1024
        ## output size: B x num_classes=10
        self.mlp = nn.Sequential(nn.Linear(64*16, 64*8), 
                         nn.BatchNorm1d(64*8),
                         nn.ReLU(),
                         nn.Linear(64*8, 64*4),
                         # BatchNorm1d needs a batch size larger than 1 in training mode, otherwise it raises an error
                         nn.BatchNorm1d(64*4),
                         nn.ReLU(),
                         nn.Linear(64*4, 64*2),
                         nn.BatchNorm1d(64*2),
                         nn.ReLU(),
                         nn.Linear(64*2, 64),
                         nn.BatchNorm1d(64),
                         nn.ReLU(),
                         nn.Linear(64, num_classes))

    def forward(self, x):
        # x is the input point cloud of size B x 3 x N
        # the feature network returns the global feature of size B x 1024
        # the local feature matrix is not used here
        x, _, trans = self.feature(x)
        
        ## TASK 2.6
        ## forward of mlp
        # input - B x 1024
        # output - B x num_classes        
        x = self.mlp(x)
        ## x: B x num_classes
        ## trans: B x K x K
        # return raw scores; `nn.CrossEntropyLoss` applies log-softmax internally
        return x, trans

In [None]:
## randomly generate data and test this network

In [52]:
# num_classes=10
classification = Classification(10)
result = classification(torch.randn(3,3,2))

In [53]:
result[0].shape

torch.Size([3, 10])

### 2.4.1 Train this network on ModelNet10

In [12]:
# main train function for classification
def train_cls(train_loader, test_loader, network, optimizer, epochs, scheduler):
    reg = OrthoLoss()
    for epoch in range(epochs):
        print('Epoch:[{:02d}/{:02d}]'.format(epoch+1, epochs))
        print('Training...')
        network.train()
        train_loss = 0
        correct = 0
        for batch, (pos, label) in enumerate(train_loader):
            network.zero_grad()
            pos, label = pos.cuda(), label.cuda()
            
            ## TASK 2.7
            ## forward propagation
            output, trans = network(pos)
            loss = nn.CrossEntropyLoss()(output, label)
            ##########
            
            ## regularizer
            if trans is not None:
                loss += reg(trans) * 0.001

            pred = output.max(1)[1]
            correct += pred.eq(label).sum().item()

            loss.backward()
            optimizer.step()
            train_loss += loss.item()
            print('\rIter: [{:03d}/{:03d}] Loss: {:.4f}'.format(batch+1, len(train_loader), loss.item()), end='', flush=True)
        
        scheduler.step()
        print('\nAverage Train Loss: {:.4f}; Train Acc: {:.4f}'.format(train_loss/len(train_loader), correct/len(train_loader.dataset) * 100))
        
        print('\nTesting...')
        with torch.no_grad():
            network.eval()
            test_loss = 0
            correct = 0
            for batch, (pos, label) in enumerate(test_loader):
                pos, label = pos.cuda(), label.cuda()
    
                ## TASK 2.7
                ## forward propagation
                output, trans = network(pos)
                loss = nn.CrossEntropyLoss()(output,label)
                ##########

                if trans is not None:
                    loss += reg(trans) * 0.001

                pred = output.max(1)[1]
                correct += pred.eq(label).sum().item()

                test_loss += loss.item()
                print('\rIter: [{:03d}/{:03d}] Loss: {:.4f}'.format(batch+1, len(test_loader), loss.item()), end='', flush=True)

            print('\nAverage Test Loss: {:.4f}; Test Acc: {:.4f}'.format(test_loss/len(test_loader), correct/len(test_loader.dataset) * 100))
        print('-------------------------------------------')


In [11]:
network = Classification(10, alignment=True).cuda()
epochs = 60 # you can change the value to a small number for debugging

## TASK 2.8
# see Appendix C
# choose an optimizer and an initial learning rate
# lr=0.001 is the default value
optimizer = torch.optim.Adam(network.parameters(),lr=0.001)
# choose a lr scheduler
# The learning rate is divided by 2 every 20 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
##########

# start training
train_cls(train_cls_loader, test_cls_loader, network, optimizer, epochs, scheduler)

Epoch:[01/60]
Training...
Iter: [250/250] Loss: 2.5480
Average Train Loss: 1.3437; Train Acc: 60.9622

Testing...
Iter: [908/908] Loss: 0.2261
Average Test Loss: 1.1003; Test Acc: 68.2819
-------------------------------------------
Epoch:[02/60]
Training...
Iter: [250/250] Loss: 0.9268
Average Train Loss: 0.9260; Train Acc: 72.5883

Testing...
Iter: [908/908] Loss: 0.6264
Average Test Loss: 1.3723; Test Acc: 55.0661
-------------------------------------------
Epoch:[03/60]
Training...
Iter: [250/250] Loss: 2.0243
Average Train Loss: 0.8142; Train Acc: 75.4197

Testing...
Iter: [908/908] Loss: 0.3723
Average Test Loss: 0.7383; Test Acc: 73.7885
-------------------------------------------
Epoch:[04/60]
Training...
Iter: [250/250] Loss: 0.9356
Average Train Loss: 0.6601; Train Acc: 80.2305

Testing...
Iter: [908/908] Loss: 0.1054
Average Test Loss: 0.7166; Test Acc: 72.0264
-------------------------------------------
Epoch:[05/60]
Training...
Iter: [250/250] Loss: 1.0837
Average Train Los

Iter: [908/908] Loss: 0.00358
Average Test Loss: 0.3067; Test Acc: 89.6476
-------------------------------------------
Epoch:[37/60]
Training...
Iter: [250/250] Loss: 0.5369
Average Train Loss: 0.1196; Train Acc: 95.9659

Testing...
Iter: [908/908] Loss: 0.00630
Average Test Loss: 0.4259; Test Acc: 86.8943
-------------------------------------------
Epoch:[38/60]
Training...
Iter: [250/250] Loss: 0.9147
Average Train Loss: 0.1356; Train Acc: 95.7905

Testing...
Iter: [908/908] Loss: 0.0016
Average Test Loss: 0.3755; Test Acc: 87.0044
-------------------------------------------
Epoch:[39/60]
Training...
Iter: [250/250] Loss: 0.1215
Average Train Loss: 0.1306; Train Acc: 95.8657

Testing...
Iter: [908/908] Loss: 0.0018
Average Test Loss: 0.2585; Test Acc: 91.6300
-------------------------------------------
Epoch:[40/60]
Training...
Iter: [250/250] Loss: 0.0555
Average Train Loss: 0.1151; Train Acc: 96.1914

Testing...
Iter: [908/908] Loss: 0.0057
Average Test Loss: 0.8041; Test Acc: 77.6

In [None]:
# dropout could be added to mitigate the overfitting problem
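The step schedule used in the training cell above (initial rate `0.001`, halved every 20 epochs) can be inspected in isolation. The sketch below uses a dummy parameter instead of the real network, an assumption made only so that the optimizer can be built without any data, and records the learning rate in effect during each of the 60 epochs.

```python
import torch

param = torch.nn.Parameter(torch.zeros(1))   # dummy parameter, only needed to build the optimizer
optimizer = torch.optim.Adam([param], lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

lrs = []
for epoch in range(60):
    lrs.append(optimizer.param_groups[0]['lr'])  # learning rate used during this epoch
    optimizer.step()                             # stands in for one epoch of training
    scheduler.step()

# roughly 0.001 for epochs 1-20, 0.0005 for epochs 21-40, 0.00025 for epochs 41-60
print(lrs[0], lrs[20], lrs[40])
```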

### Report the best test accuracy you can get.

<table>
  <tr>
    <th>Date</th>
    <th>Best test accuracy</th>
  </tr>
  <tr>
    <td>2019-11-21</td>
    <td>91.8502</td>
  </tr>
</table>

## 2.5 Segmentation Network
In this network, you will use the global features and local features generated by the `Feature` network defined above.

The global feature matrix is of size `B x 1024` and the local feature matrix is of size `B x 64 x N`.

They need to be stacked together into a new matrix of size `B x 1088 x N` (How?). 
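One possible way to do the stacking, sketched here with random tensors and illustrative sizes, is to add a point axis to the global feature, broadcast it to all `N` points, and concatenate along the channel axis. The variable names are assumptions made for this example only.

```python
import torch

B, N = 2, 10
global_feat = torch.randn(B, 1024)   # one vector per point cloud
local_feat = torch.randn(B, 64, N)   # one vector per point

# add a point axis and broadcast the global vector to every point
global_tiled = global_feat.unsqueeze(2).expand(-1, -1, N)   # B x 1024 x N
combined = torch.cat((local_feat, global_tiled), dim=1)     # B x 1088 x N

print(combined.shape)  # torch.Size([2, 1088, 10])
```

Every point then carries its own 64 local channels followed by the same 1024 global channels.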

In [13]:
# segmentation network
class Segmentation(nn.Module):
    def __init__(self, num_classes, alignment=False):
        super(Segmentation, self).__init__()
               
        self.feature = Feature(alignment=alignment)

        ## TASK 2.9
        ## shared mlp
        ## input size: B x 1088 x N
        ## output size: B x num_classes x N
        self.shared_mlp = nn.Sequential(nn.Conv1d(1088, 512, 1, stride=1),
                              nn.BatchNorm1d(512),
                              nn.ReLU(),
                              nn.Conv1d(512, 256, 1, stride=1),
                              nn.BatchNorm1d(256),
                              nn.ReLU(),
                              nn.Conv1d(256, 128, 1, stride=1), 
                              nn.BatchNorm1d(128), 
                              nn.ReLU(), 
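                              nn.Conv1d(128, num_classes, 1, stride=1),
                              # note: `nn.CrossEntropyLoss` expects raw scores and applies
                              # log-softmax itself, so with this layer softmax is applied twice during training
                              nn.Softmax(dim=1))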
        
    def forward(self, x):
        g, l, trans = self.feature(x)
        _,_,N = l.shape
        ## TASK 2.10
        # concat global features and local features to a single matrix
        # g - B x 1024, global features
        # l - B x 64 x N, local features
        # x - B x 1088 x N, concatenated features
        g = g.view(-1,1024,1)
        g = torch.repeat_interleave(g, repeats=N, dim=2)
        x = torch.cat((l, g),dim=1)
        ## TASK 2.9
        ## forward of shared mlp
        # input - B x 1088 x N
        # output - B x num_classes x N  
        x = self.shared_mlp(x)
        
        return x, trans

In [None]:
## randomly generate data and test this network

In [92]:
segmentation = Segmentation(5)
# B x 3 x N 
segmentation(torch.randn(2,3,10))[0]

torch.Size([2, 64, 10])
torch.Size([2, 1024])
torch.Size([2, 1024, 10])
torch.Size([2, 1088, 10])


tensor([[[0.3085, 0.3580, 0.2055, 0.3077, 0.1827, 0.2007, 0.2444, 0.1436,
          0.2685, 0.2685],
         [0.1180, 0.0993, 0.1350, 0.1060, 0.1260, 0.1242, 0.2211, 0.2402,
          0.1519, 0.1605],
         [0.1610, 0.1450, 0.1800, 0.1362, 0.1675, 0.2704, 0.0992, 0.2245,
          0.1378, 0.1410],
         [0.1881, 0.2316, 0.1807, 0.1233, 0.2530, 0.1836, 0.1218, 0.1965,
          0.1605, 0.1336],
         [0.2244, 0.1660, 0.2988, 0.3267, 0.2707, 0.2212, 0.3136, 0.1951,
          0.2813, 0.2965]],

        [[0.1676, 0.2040, 0.1574, 0.2405, 0.2126, 0.1016, 0.1611, 0.2288,
          0.1782, 0.1799],
         [0.1393, 0.1075, 0.1346, 0.1307, 0.0588, 0.0962, 0.1500, 0.1068,
          0.1625, 0.1115],
         [0.1833, 0.2392, 0.2997, 0.2236, 0.1437, 0.2246, 0.2608, 0.1987,
          0.1709, 0.1799],
         [0.1369, 0.2200, 0.0913, 0.1767, 0.2773, 0.3039, 0.1483, 0.1531,
          0.1404, 0.2840],
         [0.3730, 0.2293, 0.3171, 0.2286, 0.3076, 0.2737, 0.2799, 0.3125,
          0.348

### 2.5.1 Calculating Intersection over Union (IoU) 
For 2D images, the IoU is calculated as follows,
![iou](img/iou.png)

How is it used in the literature of point clouds?
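For point clouds, the areas are replaced by point counts. One common convention, assumed in the sketch below, is to compute for each class the number of points labelled with that class in both prediction and target, divided by the number labelled with it in either, and then to average over classes. The helper `per_class_iou` is hypothetical and differs from the `get_iou` function of this notebook.

```python
import torch

def per_class_iou(pred, target, num_classes):
    """IoU of each class for one point cloud; pred and target are 1-D label tensors."""
    ious = []
    for c in range(num_classes):
        inter = ((pred == c) & (target == c)).sum().item()
        union = ((pred == c) | (target == c)).sum().item()
        # convention assumed here: a class absent from both counts as a perfect match
        ious.append(1.0 if union == 0 else inter / union)
    return ious

pred = torch.tensor([0, 0, 1, 1, 2, 2])
target = torch.tensor([0, 0, 1, 2, 2, 2])
ious = per_class_iou(pred, target, num_classes=3)

print(ious)                   # [1.0, 0.5, 0.6666666666666666]
print(sum(ious) / len(ious))  # mean IoU over the three classes
```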

In [41]:
## TASK 2.11
# implement the helper functions to calculate the IoU
def get_i_and_u(pred, target, num_classes):
    """Calculate intersection and union between pred and target.
    
    pred -- B x N matrix
    target -- B x N matrix
    num_classes -- number of classes
    
    return i, u
    i -- B x num_classes matrix, intersection, i[b, c] is the number of points in cloud b where both pred and target equal class c.
    u -- B x num_classes matrix, union, u[b, c] is the number of points in cloud b where pred or target equals class c.
    """
    ## TASK 2.11
    ## calculate i and u here
    ## hint: useful function `F.one_hot`    
    ## hint: use element-wise logical tensor operation (`&` and `|`)
    target_onehot = F.one_hot(target, num_classes=num_classes)
    pre_onehot = F.one_hot(pred, num_classes=num_classes)
    
    i = torch.sum((target_onehot & pre_onehot).type(torch.float64), dim=1)
    u = torch.sum((target_onehot | pre_onehot).type(torch.float64), dim=1)

    return i, u

def get_iou(pred, target, num_classes):
    """Calculate IoU
    pred -- B x N matrix
    target -- B x N matrix
    num_classes -- number of classes
    
    return iou
    iou -- vector of length B, iou[b] is the IoU of the b-th point cloud in this batch
    """
    
    ## use the helper function `get_i_and_u` defined above
    i, u = get_i_and_u(pred, target, num_classes)
    
    ## TASK 2.11
    ## calculate iou
    iou = torch.sum(i,dim=1) / torch.sum(u,dim=1)
    
    return iou

In [48]:
a = torch.LongTensor([[2,3],[1,2]])
b = torch.LongTensor([[2,3],[1,2]])
num_classes = 5

In [49]:
a_ = F.one_hot(a, num_classes=num_classes)

In [50]:
b_ = F.one_hot(b, num_classes=num_classes)

In [51]:
print(a_) # prediction
print(b_) # target

tensor([[[0, 0, 1, 0, 0],
         [0, 0, 0, 1, 0]],

        [[0, 1, 0, 0, 0],
         [0, 0, 1, 0, 0]]])
tensor([[[0, 0, 1, 0, 0],
         [0, 0, 0, 1, 0]],

        [[0, 1, 0, 0, 0],
         [0, 0, 1, 0, 0]]])


In [52]:
get_i_and_u(a, b, 5)

(tensor([[0., 0., 1., 1., 0.],
         [0., 1., 1., 0., 0.]], dtype=torch.float64),
 tensor([[0., 0., 1., 1., 0.],
         [0., 1., 1., 0., 0.]], dtype=torch.float64))

In [53]:
get_iou(a, b, 5) # largest iou is one

tensor([1., 1.], dtype=torch.float64)

### 2.5.2 Train this network on ShapeNet

In [54]:
# main train function for segmentation
def train_seg(train_loader, test_loader, network, optimizer, epochs, scheduler):  
    reg = OrthoLoss()
    for epoch in range(epochs):
        print('Epoch:[{:02d}/{:02d}]'.format(epoch+1, epochs))
        print('Training...')
        network.train()
        train_loss = 0
        correct = 0
        total = 0
        ious = []
        for batch, (pos, label) in enumerate(train_loader):
            network.zero_grad()
            pos, label = pos.cuda(), label.cuda()
            ## TASK 2.12
            ## forward propagation
            output, trans = network(pos)
            loss = nn.CrossEntropyLoss()(output, label)
            ##########
            if trans is not None:
                loss += reg(trans) * 0.001        

            pred = output.max(1)[1]
            # count the correctly predicted points
            correct += pred.eq(label).sum().item()
            total += label.numel()

            loss.backward()
            optimizer.step()
            train_loss += loss.item()

#             ious += [get_iou(pred, label, train_loader.dataset.num_classes)]
            print('\rIter: [{:03d}/{:03d}] Loss: {:.4f}'.format(batch+1, len(train_loader), loss.item()), end='', flush=True)
        
        scheduler.step()
        print('\nAverage Train Loss: {:.4f}; Train Acc: {:.4f}'.format(train_loss/len(train_loader), correct/total * 100))
#         print('\nAverage Train Loss: {:.4f}; Train Acc: {:.4f}; Train mean IoU: {:.4f}'.format(train_loss/len(train_loader), correct/total * 100, torch.cat(ious, dim=0).mean().item()))

        print('\nTesting...')
        with torch.no_grad():
            network.eval()
            test_loss = 0
            correct = 0
            total = 0
            ious = []
            for batch, (pos, label) in enumerate(test_loader):
                pos, label = pos.cuda(), label.cuda()
                
                ## TASK 2.12
                ## forward propagation
                output, trans = network(pos)
                loss = nn.CrossEntropyLoss()(output,label)
                ##########
                
                if trans is not None:
                    loss += reg(trans) * 0.001   

                pred = output.max(1)[1]
                correct += pred.eq(label).sum().item()
                total += label.numel()

                test_loss += loss.item()

                ious += [get_iou(pred, label, test_loader.dataset.num_classes)]
                print('\rIter: [{:03d}/{:03d}] Loss: {:.4f}'.format(batch+1, len(test_loader), loss.item()), end='', flush=True)
            print('\nAverage Test Loss: {:.4f}; Test Acc: {:.4f}'.format(test_loss/len(test_loader), correct/total * 100))

            print('\nAverage Test Loss: {:.4f}; Test Acc: {:.4f}; Test mean IoU: {:.4f}'.format(test_loss/len(test_loader), correct/total * 100, torch.cat(ious, dim=0).mean().item()))
        print('-------------------------------------------')

In [None]:
network = Segmentation(train_seg_dataset.num_classes, alignment=True).cuda()
epochs = 12 # you can change the value to a small number for debugging
# Training parameters are the same as the classification network.
## TASK 2.13
# see Appendix C
# choose an optimizer and an initial learning rate
optimizer = torch.optim.Adam(network.parameters(),lr=0.001)
# choose a lr scheduler
# The learning rate is divided by 2 every 20 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
##########

train_seg(train_seg_loader, test_seg_loader, network, optimizer, epochs, scheduler)

Epoch:[01/12]
Training...
Iter: [147/147] Loss: 1.1134
Average Train Loss: 1.1840; Train Acc: 76.6360

Testing...
Iter: [341/341] Loss: 1.0973
Average Test Loss: 1.1420; Test Acc: 77.2640

Average Test Loss: 1.1420; Test Acc: 77.2640; Test mean IoU: 0.6351
-------------------------------------------
Epoch:[02/12]
Training...
Iter: [147/147] Loss: 1.0977
Average Train Loss: 1.1130; Train Acc: 80.8585

Testing...
Iter: [341/341] Loss: 1.0536
Average Test Loss: 1.1153; Test Acc: 80.1705

Average Test Loss: 1.1153; Test Acc: 80.1705; Test mean IoU: 0.6781
-------------------------------------------
Epoch:[03/12]
Training...
Iter: [147/147] Loss: 1.1047
Average Train Loss: 1.1018; Train Acc: 81.3716

Testing...
Iter: [341/341] Loss: 1.0599
Average Test Loss: 1.0977; Test Acc: 81.6090

Average Test Loss: 1.0977; Test Acc: 81.6090; Test mean IoU: 0.6965
-------------------------------------------
Epoch:[04/12]
Training...
Iter: [147/147] Loss: 1.1003
Average Train Loss: 1.0992; Train Acc: 81.

### Report the best test mIoU you can get.

0.7175