# 16-825 Learning for 3D Vision
# Assignment 2: Single View to 3D

## 1. Exploring loss functions

### 1.1. Fitting a voxel grid (5 points)

In this section, I implemented binary cross entropy loss using `torch.nn.BCEWithLogitsLoss` to help fit a 3D binary voxel grid.
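As a rough illustration of what this loss computes, here is a minimal dependency-free sketch of binary cross entropy on raw logits. It is not the assignment code: the function name `bce_with_logits` and the toy values are made up for this example, and the voxel grid is assumed to be flattened into a plain list.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross entropy over flat lists of logits and 0/1 targets."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Numerically stable form of
        # -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

# Hypothetical flattened 2x2x1 occupancy grid (1 = occupied voxel).
targets = [1, 0, 0, 1]
loss_untrained = bce_with_logits([0.0, 0.0, 0.0, 0.0], targets)
loss_fitted = bce_with_logits([4.0, -4.0, -4.0, 4.0], targets)
print(loss_untrained, loss_fitted)
```

The loss falls as the logits move toward the correct sign for each voxel, which is what the optimizer exploits when fitting the source grid to the target.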


<div style="display: flex; justify-content: space-around; align-items: flex-start; margin: 20px 0;">
  <figure style="text-align: center; margin: 0;">
    <img src="results/1_voxelsrc.gif" alt="Source Voxel" style="height: 300px;">
    <figcaption>Source Voxel</figcaption>
  </figure>
  <figure style="text-align: center; margin: 0;">
    <img src="results/1_voxelstgt.gif" alt="Target Voxel" style="height: 300px;">
    <figcaption>Target Voxel</figcaption>
  </figure>
</div>


### 1.2. Fitting a point cloud (5 points)

In this section, I implemented Chamfer loss, which measures the distance between two point clouds by averaging, in both directions, the distance from each point to its closest point in the other cloud.
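The bidirectional nearest-neighbour idea can be sketched in plain Python as below. This is an illustrative brute-force version, not the assignment implementation: it assumes squared Euclidean distances and sums the two directional averages, and the names `sq_dist` and `chamfer_distance` are chosen for this example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def chamfer_distance(src, tgt):
    """Sum of the two directional mean nearest-neighbour squared distances."""
    # For every source point, distance to its closest target point.
    src_to_tgt = sum(min(sq_dist(p, q) for q in tgt) for p in src) / len(src)
    # For every target point, distance to its closest source point.
    tgt_to_src = sum(min(sq_dist(q, p) for p in src) for q in tgt) / len(tgt)
    return src_to_tgt + tgt_to_src

# Toy clouds with two points each.
src = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
tgt = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(chamfer_distance(src, tgt))
```

The brute-force search costs O(N·M) distance evaluations, so a practical version would normally rely on a batched nearest-neighbour routine instead.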


<div style="display: flex; justify-content: space-around; align-items: flex-start; margin: 20px 0;">
  <figure style="text-align: center; margin: 0;">
    <img src="results/1_pointcloudsrc.gif" alt="Source Point Cloud" style="height: 300px;">
    <figcaption>Source Point Cloud</figcaption>
  </figure>
  <figure style="text-align: center; margin: 0;">
    <img src="results/1_pointcloudtgt.gif" alt="Target Point Cloud" style="height: 300px;">
    <figcaption>Target Point Cloud</figcaption>
  </figure>
</div>

### 1.3. Fitting a mesh (5 points)

In this section, I implemented Laplacian loss using PyTorch3D's built-in `pytorch3d.loss.mesh_laplacian_smoothing()` function to help fit a mesh.
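To give an intuition for what a Laplacian smoothness term penalizes, the sketch below computes a uniform-weight version in plain Python: each vertex is compared with the centroid of its neighbours. It is an assumption-laden illustration, not the PyTorch3D implementation; the function name `uniform_laplacian_loss` and the toy mesh are invented for this example, and uniform weighting is assumed.

```python
import math

def uniform_laplacian_loss(verts, faces):
    """Mean distance between each vertex and the centroid of its neighbours."""
    neighbors = [set() for _ in verts]
    for a, b, c in faces:
        # Every triangle edge makes its two endpoints neighbours.
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    total = 0.0
    for v, nbrs in zip(verts, neighbors):
        if not nbrs:
            continue  # isolated vertex contributes nothing
        centroid = [sum(verts[j][k] for j in nbrs) / len(nbrs) for k in range(3)]
        total += math.dist(centroid, v)
    return total / len(verts)

# A flat square made of two triangles, and the same square with one spiked vertex.
faces = [(0, 1, 2), (0, 2, 3)]
flat = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
spiky = [(0.0, 0.0, 0.0), (1.0, 0.0, 5.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
loss_flat = uniform_laplacian_loss(flat, faces)
loss_spiky = uniform_laplacian_loss(spiky, faces)
print(loss_flat, loss_spiky)
```

The spiked mesh yields the larger value, which is why adding this term to the fitting objective discourages jagged surfaces.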


<div style="display: flex; justify-content: space-around; align-items: flex-start; margin: 20px 0;">
  <figure style="text-align: center; margin: 0;">
    <img src="results/1_meshsrc.gif" alt="Source Mesh" style="height: 300px;">
    <figcaption>Source Mesh</figcaption>
  </figure>
  <figure style="text-align: center; margin: 0;">
    <img src="results/1_meshtgt.gif" alt="Target Mesh" style="height: 300px;">
    <figcaption>Target Mesh</figcaption>
  </figure>
</div>


## 2. Reconstructing 3D from single view

### 2.1. Image to voxel grid (20 points)

The network architecture was inspired by Pix2Vox's structure, with some minor extensions. Shown below are the input RGB image, a render of the predicted 3D voxel grid, and a render of the ground truth mesh.

<table style="width: 100%; border-collapse: collapse; text-align: center;">
  <thead>
    <tr>
      <th style="border: 1px solid #ddd; padding: 10px;">Type</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Sample 0</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Sample 500</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Sample 600</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">Input RGB</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/mesh/0_gt_img.png" alt="Sample 0 RGB" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/point/500_gt_img.png" alt="Sample 500 RGB" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/mesh/600_gt_img.png" alt="Sample 600 RGB" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">Ground Truth Mesh</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/vox/0_gt_mesh.gif" alt="Sample 0 Ground Truth Mesh" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/vox/500_gt_mesh.gif" alt="Sample 500 Ground Truth Mesh" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/vox/600_gt_mesh.gif" alt="Sample 600 Ground Truth Mesh" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">Voxel Grid Prediction</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/vox/0_vox.gif" alt="Sample 0 Voxel" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/vox/500_vox.gif" alt="Sample 500 Voxel" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/vox/600_vox.gif" alt="Sample 600 Voxel" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
    
   
  </tbody>
</table>


The voxel predictions are sub-optimal. I trained the model for 10,000 iterations with a learning rate of 0.001 and a batch size of 16; given the limited time and resources, this was the best result I could achieve.



### 2.2. Image to point cloud (20 points)

In this section, I trained the model for 10,000 iterations with a learning rate of 0.001 and a batch size of 16.

To reduce the training loss, I changed the network architecture by switching from ReLU to LeakyReLU and adding layers, and tuned hyperparameters such as `lr`, `n_points` and `batch_size`. Note that I used 1000 points to render the point cloud, so the point clouds are a bit sparse. Given the limited time and resources, I chose 1000 points to get results relatively quickly.
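For readers unfamiliar with the activation swap, here is a small sketch of how the two functions differ on negative inputs. The slope of 0.01 is an assumption for illustration (it matches the default of PyTorch's `nn.LeakyReLU`); the report does not state which slope was used.

```python
def relu(x):
    """Standard ReLU: negative inputs are clamped to zero."""
    return max(0.0, x)

def leaky_relu(x, negative_slope=0.01):
    """LeakyReLU: negative inputs are scaled down instead of zeroed."""
    return x if x >= 0 else negative_slope * x

inputs = [-2.0, -0.5, 0.0, 1.5]
relu_out = [relu(x) for x in inputs]
leaky_out = [leaky_relu(x) for x in inputs]
print(relu_out)
print(leaky_out)
```

Because LeakyReLU keeps a small non-zero response for negative inputs, units cannot get stuck with a zero gradient there, which can make a deeper decoder easier to train.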


<table style="width: 100%; border-collapse: collapse; text-align: center;">
  <thead>
    <tr>
      <th style="border: 1px solid #ddd; padding: 10px;">Type</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Sample 0</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Sample 500</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Sample 600</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">Input RGB</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/mesh/0_gt_img.png" alt="Sample 0 RGB" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/point/500_gt_img.png" alt="Sample 500 RGB" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/mesh/600_gt_img.png" alt="Sample 600 RGB" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">Ground Truth Mesh</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/vox/0_gt_mesh.gif" alt="Sample 0 Ground Truth Mesh" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/vox/500_gt_mesh.gif" alt="Sample 500 Ground Truth Mesh" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/vox/600_gt_mesh.gif" alt="Sample 600 Ground Truth Mesh" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">Point Cloud Prediction</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/point/0_point.gif" alt="Sample 0 Point Cloud" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/point/500_point.gif" alt="Sample 500 Point Cloud" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/point/600_point.gif" alt="Sample 600 Point Cloud" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
    
   
  </tbody>
</table>


Even though the predicted point clouds are sparse, they capture the overall structure of each chair, such as the wider base of Sample 0 and the thinner frame of Sample 500.




### 2.3. Image to mesh (20 points)

In this section I trained the model for 10,000 iterations with a learning rate of 0.001 and a batch size of 16.


<table style="width: 100%; border-collapse: collapse; text-align: center;">
  <thead>
    <tr>
      <th style="border: 1px solid #ddd; padding: 10px;">Type</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Sample 0</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Sample 500</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Sample 600</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">Input RGB</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/mesh/0_gt_img.png" alt="Sample 0 RGB" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/point/500_gt_img.png" alt="Sample 500 RGB" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/mesh/600_gt_img.png" alt="Sample 600 RGB" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">Ground Truth Mesh</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/vox/0_gt_mesh.gif" alt="Sample 0 Ground Truth Mesh" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/vox/500_gt_mesh.gif" alt="Sample 500 Ground Truth Mesh" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/vox/600_gt_mesh.gif" alt="Sample 600 Ground Truth Mesh" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">Mesh Prediction</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/mesh/0_mesh.gif" alt="Sample 0 Mesh" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/mesh/500_mesh.gif" alt="Sample 500 Mesh" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/mesh/600_mesh.gif" alt="Sample 600 Mesh" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
    
   
  </tbody>
</table>


The visualizations show that the mesh reconstruction is poor, as is evident from the intersecting and oversized faces.



### 2.4. Quantitative comparisons (10 points)

To compare the three representations quantitatively, let's look at their F1 scores.

![SegmentLocal](eval_vox.png "segment")

![SegmentLocal](eval_point.png "segment")

![SegmentLocal](eval_mesh.png "segment")

The F1 scores show a large gap among the three representations. Point clouds score higher than both voxel grids and meshes, probably because of their inherent ability to represent 3D shapes at a higher level of detail. Meshes score lower than point clouds, failing to capture the nuances and complexity of the 3D object, probably due to the approximations made to create surfaces. They still outperform voxel grids, whose binary, grid-like structure oversimplifies the shape.

To conclude, point clouds seem to be the most suitable choice for representing 3D shapes in this comparison, while voxels and meshes are more limited in their ability to reconstruct complex forms.
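For context on the metric, the sketch below shows one common way to compute an F1 score between a predicted and a ground-truth point set at a fixed distance threshold. The report does not spell out the evaluation procedure, so this is an assumption: the function name `f1_score`, the threshold of 0.05, and the toy points are all hypothetical.

```python
import math

def f1_score(pred, gt, threshold=0.05):
    """F1 between two point sets at a nearest-neighbour distance threshold."""
    # Precision: fraction of predicted points lying close to some ground-truth point.
    precision = sum(
        min(math.dist(p, g) for g in gt) < threshold for p in pred
    ) / len(pred)
    # Recall: fraction of ground-truth points lying close to some predicted point.
    recall = sum(
        min(math.dist(g, p) for p in pred) < threshold for g in gt
    ) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
gt = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
score = f1_score(pred, gt)
print(score)
```

Under this definition, a prediction is rewarded only when it both avoids stray points (precision) and covers the ground-truth surface (recall), which is why sparse or oversimplified outputs score poorly.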



### 2.5. Analyse effects of hyperparams variations (10 points)

The hyperparameter chosen was `n_points`, which I tuned by testing with 1000, 5000 and 10000 points. All the models were run for 10,000 iterations with a learning rate of 0.001 and batch size of 16. The table below consolidates the results I achieved. 



<table style="width: 100%; border-collapse: collapse; text-align: center;">
  <thead>
    <tr>
      <th style="border: 1px solid #ddd; padding: 10px;">Type</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Sample 0 - 1000 points</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Sample 0 - 5000 points</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Sample 0 - 10000 points</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">Input RGB</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/point/0_gt_img.png" alt="Sample 0 RGB" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/5000_points/0_gt_img.png" alt="Sample 0 RGB" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/10000_points/0_gt_img.png" alt="Sample 0 RGB" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">Ground Truth Mesh</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/point/0_gt_mesh.gif" alt="Sample 0 Ground Truth Mesh" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/5000_points/0_gt_mesh.gif" alt="Sample 0 Ground Truth Mesh" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/10000_points/0_gt_mesh.gif" alt="Sample 0 Ground Truth Mesh" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">Point Cloud Prediction</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/point/0_point.gif" alt="Sample 0 Point Cloud (1000 points)" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/5000_points/0_point.gif" alt="Sample 0 Point Cloud (5000 points)" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="vis/10000_points/0_point.gif" alt="Sample 0 Point Cloud (10000 points)" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>

    
   
  </tbody>
</table>


As the results show, the more points the network predicts, the more accurately it reconstructs the 3D shape. The reconstructions become clearer and match the ground truth much more closely than in the first case with 1000 points. With more points, the network also captures the small nuances and complexities of the 3D forms. Unsurprisingly, the F1 scores shown below also increase with the number of points. The main drawback of increasing the number of points is that it requires more compute and time to run.



<table style="width: 100%; border-collapse: collapse; text-align: center;">
  <thead>
    <tr>
      <th style="border: 1px solid #ddd; padding: 10px;">Type</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Sample 0 - 1000 points</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Sample 0 - 5000 points</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Sample 0 - 10000 points</th>
    </tr>
  </thead>
  <tbody>
    <tr>
          <td style="border: 1px solid #ddd; padding: 10px;">F1 Score Curves</td>
          <td style="border: 1px solid #ddd; padding: 10px;">
            <img src="eval_point_1000.png" alt="F1 score curve (1000 points)" style="display: block; margin: auto; width: 80%;">
          </td>
          <td style="border: 1px solid #ddd; padding: 10px;">
            <img src="eval_point_5000.png" alt="F1 score curve (5000 points)" style="display: block; margin: auto; width: 80%;">
          </td>
          <td style="border: 1px solid #ddd; padding: 10px;">
            <img src="eval_model.png" alt="F1 score curve (10000 points)" style="display: block; margin: auto; width: 80%;">
          </td>
    </tr>
  </tbody>
</table>
