# PointNets -> Frustum PointNets
The latter builds on the concepts of the former, so we take a look at PointNet first.

## PointNet:
PointNet takes a point cloud as input and outputs either a class label for the entire input or a segment/part label for each point of the input.
Each point is processed independently and is represented by just its three coordinates (x, y, z).

Key contributions of the paper:

1. A network architecture that consumes unordered point sets in 3D.
2. Applications to 3D shape classification, shape part segmentation, and scene semantic parsing.
3. A detailed empirical and theoretical analysis of the stability and efficiency of the method.
4. Illustrations of the 3D features computed by selected neurons in the net, with intuitive explanations for its performance.

## Problem
Point cloud: {P_i | i = 1, ..., n},
where each point P_i is a vector of its (x, y, z) coordinates plus optional extra channels such as color.
For simplicity, only the (x, y, z) coordinates are used as each point's channels.


### Euclidean Space:
A space in which points are represented by coordinates (one per dimension) and the distance between two points is given by the Euclidean distance formula.
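
As a quick illustration of the distance formula, here is a minimal sketch in plain Python. The function name `euclidean_distance` and the sample points are made up for this note.

```python
import math


def euclidean_distance(p, q):
    """Distance between two points given as equal-length coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


p = (0.0, 0.0, 0.0)
q = (1.0, 2.0, 2.0)
print(euclidean_distance(p, q))  # 3.0, since sqrt(1 + 4 + 4) = 3
```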

In Euclidean space, point sets have the following properties: <br/>

**Unordered.** Unlike pixel arrays in images, a point cloud is a set of points without a specific order. A network that consumes N 3D points therefore needs to be invariant to the N! permutations of the input set in data feeding order. <br/>
**Interaction among points.** Because the points live in a space with a distance metric, they are not isolated: neighboring points form meaningful local structures that the model should capture. <br/>
**Invariance under transformations.** The learned representation of a point cloud should be invariant to certain transformations. For example, rotating and translating all points together should change neither the global point cloud category nor the segmentation of the points.

# PointNet Architecture

![PointNet](PointNet.png)


The PointNet architecture consists of a classification network and a segmentation network. The input is sampled from the point cloud and passed into the classification network.

**Symmetry Function for Unordered Input:**
To make the model invariant to input permutations, three strategies exist:
1. Sort the input into a canonical order.
2. Treat the input as a sequence to train an RNN, and augment the training data with all kinds of permutations of the point cloud.
3. Use a simple symmetric function to aggregate the information from each point. A symmetric function takes n vectors as input and produces an output that is invariant to the order of those vectors. For example, + and * are symmetric binary functions.
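
To make the third strategy concrete, the sketch below checks a tiny, made-up point set against all N! orderings. `sum_pool`, `max_pool`, and `concat` are hypothetical helper names: the first two are symmetric, while plain concatenation depends on the order.

```python
import itertools

# Three made-up points; the values are chosen so float sums are exact.
points = [(0.0, 1.0, 2.0), (3.0, -1.0, 0.5), (-2.0, 4.0, 1.0)]


def sum_pool(pts):
    """Coordinate-wise sum over the set (symmetric)."""
    return tuple(sum(column) for column in zip(*pts))


def max_pool(pts):
    """Coordinate-wise maximum over the set (symmetric)."""
    return tuple(max(column) for column in zip(*pts))


def concat(pts):
    """Flatten the points in feeding order (not symmetric)."""
    return tuple(value for point in pts for value in point)


perms = list(itertools.permutations(points))  # all 3! = 6 orderings
print(len({sum_pool(p) for p in perms}))  # 1 distinct result
print(len({max_pool(p) for p in perms}))  # 1 distinct result
print(len({concat(p) for p in perms}))    # 6 distinct results
```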

**Sorting** The issue with sorting is that, in high-dimensional space, no ordering exists that is stable with respect to point perturbations. Requiring the ordering to stay stable under perturbations amounts to requiring a map from the high-dimensional space to a 1D line that preserves spatial proximity as the dimension is reduced, which cannot be achieved in general. <br/>
**RNNs** RNNs are also not ideal: they work well on short sequences but are hard to scale to sequences as long as point clouds. <br/>
**Symmetric function** A symmetric function takes in a set of n points and produces an output that does not depend on the order in which the points are fed in. <br/>
Simple model: each (transformed) input point is passed through a function h, which is a perceptron network shared across all points; the outputs then go through an activation function and are aggregated by max pooling. Through a collection of different h functions, the network can learn different properties of the set (in other words, it is a neural network).
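
The simple model can be sketched in plain Python as a shared per-point layer followed by max pooling. This is only an illustration under assumptions: the layer has a single random, untrained set of weights, and the names `make_layer`, `h`, and `global_feature` are invented here rather than taken from any PointNet code.

```python
import random


def make_layer(n_in, n_out, rng):
    """Random weights and biases for one shared layer (hypothetical, untrained)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases


def h(point, layer):
    """Apply the shared layer to one point, followed by a ReLU activation."""
    weights, biases = layer
    return [max(0.0, sum(w * x for w, x in zip(row, point)) + b)
            for row, b in zip(weights, biases)]


def global_feature(points, layer):
    """Max-pool the per-point features channel by channel."""
    per_point = [h(p, layer) for p in points]
    return [max(channel) for channel in zip(*per_point)]


rng = random.Random(0)
layer = make_layer(3, 8, rng)
cloud = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(16)]

shuffled = cloud[:]
rng.shuffle(shuffled)

# Feeding the same points in a different order gives the same global feature.
print(global_feature(cloud, layer) == global_feature(shuffled, layer))  # True
```

Because the maximum over a set does not depend on the order of its elements, the result is identical for every ordering, not just the one shuffled here.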



![functions](functions.png)

# Local and Global Information Aggregation:
The output of the function f({x_1, ..., x_n}) forms a vector [f_1, ..., f_K], which is a global signature of the input set. We can easily train an SVM or a multi-layer perceptron classifier on these global features. Point segmentation, however, needs local as well as global information. We get it as follows:
after computing the global point cloud feature vector, we concatenate it with each of the per-point features. Then we extract new per-point features from the combined vectors, which now contain both local and global information.
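
The concatenation step can be sketched with toy numbers. The feature sizes here (2 local channels, 2 global channels) are made up to keep the output readable, and `concat_local_global` is a hypothetical helper name.

```python
def concat_local_global(per_point_features, global_feature):
    """Append the same global feature vector to every per-point feature."""
    return [list(local) + list(global_feature) for local in per_point_features]


# Toy per-point features for three points.
per_point = [[0.1, 0.2], [0.3, 0.0], [0.5, 0.9]]

# Global feature obtained by max pooling over the points.
global_feat = [max(channel) for channel in zip(*per_point)]

combined = concat_local_global(per_point, global_feat)
print(global_feat)   # [0.5, 0.9]
print(combined[0])   # [0.1, 0.2, 0.5, 0.9]
```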

# Frustum PointNets

Frustum PointNets build on the PointNet architecture but deviate from it because of some basic considerations.
PointNet focuses on semantic segmentation of the points in a point cloud. In contrast, Frustum PointNets address instance segmentation and focus on detecting 3D objects in 3D space using the PointNet architecture.

## Amodal detection: 
Detecting the whole object in 3D even when parts of it are hidden behind another object.
Frustum PointNets use two variants of PointNet:
### Segmentation network:
Detects the 3D mask of the object of interest, i.e. instance segmentation.
### Regression Network:
Estimates the amodal 3D bounding box.

Frustum PointNets (FP): lift the 2D image data to a 3D point cloud, then use 3D techniques.


# GOAL: 
Using **RGB-D** data, **classify** and **localize** objects in 3D space.

We do so by:
1. Converting the RGB-D data into 3D data.
2. Using the PointNet model architecture, in two variations, to perform classification and amodal 3D box detection.
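
Step 1 can be sketched as a back-projection of a depth map with a pinhole camera model. This is an assumption about how the conversion might look, not something stated in these notes: the intrinsics `fx`, `fy`, `cx`, `cy`, the toy depth map, and the rule that a non-positive depth marks a missing reading are all hypothetical.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (list of rows) to camera-frame (x, y, z) points."""
    points = []
    for v, row in enumerate(depth):          # v: pixel row
        for u, z in enumerate(row):          # u: pixel column, z: depth
            if z <= 0:                       # assumption: no valid depth reading
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# 2x2 toy depth map with one missing reading.
depth = [[2.0, 2.0],
         [0.0, 4.0]]
points = depth_to_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(points)  # [(-1.0, -1.0, 2.0), (1.0, -1.0, 2.0), (2.0, 2.0, 4.0)]
```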

The object is represented by (x, y, z) for the center, (w, h, l) for the object dimensions, and an orientation. For orientation we use only the heading angle theta around the up-axis; a full 3D orientation would need two further angles, which are ignored here.
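
To see what this parameterization encodes, the sketch below expands a box into its 8 corners. The axis convention is an assumption made for this example (length along x, height along y, width along z, with theta rotating about the y axis); datasets differ on this, so check the convention of the data you use. `box_corners` is a hypothetical helper.

```python
import math


def box_corners(center, size, theta):
    """Return the 8 corners of a box given center, (w, h, l) and heading theta."""
    cx, cy, cz = center
    w, h, l = size
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    for dx in (-l / 2, l / 2):
        for dy in (-h / 2, h / 2):
            for dz in (-w / 2, w / 2):
                # Rotate the offset about the y axis (assumed up-axis), then shift.
                rx = cos_t * dx + sin_t * dz
                rz = -sin_t * dx + cos_t * dz
                corners.append((cx + rx, cy + dy, cz + rz))
    return corners


corners = box_corners(center=(0.0, 0.0, 10.0), size=(2.0, 1.5, 4.0), theta=0.0)
print(len(corners))                    # 8
print(min(c[0] for c in corners),
      max(c[0] for c in corners))      # -2.0 2.0 (half the length on each side)
```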

# Frustum Proposal Generation



With a known camera projection matrix, a 2D bounding box can be lifted to a frustum that defines a 3D search space for the object.
Q: Do we not lift the whole image to create a point cloud?
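
One way to picture the frustum is as the set of 3D points whose image projection falls inside the 2D box. The sketch below follows that reading; it assumes the points are already in the camera frame and that `P` is a 3x4 projection matrix. The matrix values, box, and function names are all made up for illustration.

```python
def project(P, point):
    """Project a 3D point with a 3x4 matrix; returns (u, v, w) before division."""
    hom = (point[0], point[1], point[2], 1.0)
    return tuple(sum(p * c for p, c in zip(row, hom)) for row in P)


def points_in_frustum(points, P, box2d):
    """Keep the points whose projection lies inside box2d = (u_min, v_min, u_max, v_max)."""
    u_min, v_min, u_max, v_max = box2d
    kept = []
    for pt in points:
        u, v, w = project(P, pt)
        if w <= 0:                 # point is behind the camera
            continue
        if u_min <= u / w <= u_max and v_min <= v / w <= v_max:
            kept.append(pt)
    return kept


# Hypothetical projection matrix: focal length 100, principal point (50, 50).
P = [[100.0, 0.0, 50.0, 0.0],
     [0.0, 100.0, 50.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
cloud = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 0.4, 10.0), (0.0, 0.0, -5.0)]
kept = points_in_frustum(cloud, P, box2d=(40.0, 40.0, 60.0, 60.0))
print(kept)  # [(0.0, 0.0, 5.0), (0.0, 0.4, 10.0)]
```

Note that points at different depths (5 and 10 here) both survive, which is what makes the selected region a frustum rather than a box.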

The frustums created from 2D boxes can point in many different directions, depending on where the box lies in the image. This results in large variation in the placement of the point clouds across frustums.
Solution: We rotate each frustum toward a center view such that the center axis of the frustum is orthogonal to the image plane.
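
A minimal sketch of this normalization, assuming a camera frame with z pointing forward (orthogonal to the image plane) and y as the vertical axis: the points are rotated about y until the frustum's center ray lies along +z. The function name and the sample values are hypothetical.

```python
import math


def rotate_to_center_view(points, center_ray):
    """Rotate points about the y axis so that center_ray points along +z."""
    angle = math.atan2(center_ray[0], center_ray[2])  # heading of the ray in the x-z plane
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * z, y, s * x + c * z) for x, y, z in points]


# A frustum whose center ray points 45 degrees to the right of the optical axis.
center_ray = (1.0, 0.0, 1.0)
frustum_points = [(2.0, 0.5, 2.0), (3.0, -0.2, 2.5)]
rotated = rotate_to_center_view(frustum_points, center_ray)

# The first point lay on the center ray, so its x becomes (numerically) zero.
print(abs(rotated[0][0]) < 1e-9)  # True
```

The rotation preserves distances from the camera, so only the direction of the frustum changes, not its shape.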


# Pipeline:
Look into:
- How to create 3D point clouds from images
- How to pass those point clouds into the model. What do the LiDAR data processing models look like?
- A bit more information on object detection losses. 


In [None]:
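from typing import List


class Solution:
    def containsDuplicate(self, nums: List[int]) -> bool:
        # A set drops repeated values, so it is shorter than the list
        # exactly when the list contains a duplicate.
        num_set = set(nums)
        return len(num_set) < len(nums)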

In [1]:
nums = [1, 2, 3, 1]
num_set = set(nums)
print(num_set)

{1, 2, 3}


In [None]:
num_set = set(nums)
# The list is longer than its set only when some value repeats.
if len(nums) > len(num_set):
    print("nums contains duplicates")
