# 16-825 Learning for 3D Vision
# Assignment 3: Neural Volume Rendering and Surface Rendering

## A. Neural Volume Rendering
### 0. Transmittance Calculation

![SegmentLocal](images/a3_transmittance.jpg "segment")


## 1. Differentiable Volume Rendering
### 1.3. Ray sampling

<div style="display: flex; justify-content: space-around; align-items: flex-start; margin: 20px 0;">
  <figure style="text-align: center; margin: 0;">
    <img src="images/1.3_grid.png" alt="Grid" style="height: 300px;">
    <figcaption>Grid</figcaption>
  </figure>
  <figure style="text-align: center; margin: 0;">
    <img src="images/1.3_rays.png" alt="Rays" style="height: 300px;">
    <figcaption>Rays</figcaption>
  </figure>
</div>


### 1.4. Point sampling

`Point Sampling Visualization`

![SegmentLocal](images/1.4_sample_pts.png "segment")


### 1.5. Volume rendering

<div style="display: flex; justify-content: space-around; align-items: flex-start; margin: 20px 0;">
  <figure style="text-align: center; margin: 0;">
    <img src="images/part_1.gif" alt="Box" style="height: 300px;">
    <figcaption>Rendered Box</figcaption>
  </figure>
  <figure style="text-align: center; margin: 0;">
    <img src="images/1.5_depth.png" alt="Depth" style="height: 300px;">
    <figcaption>Depth</figcaption>
  </figure>
</div>




## 2. Optimizing a basic implicit volume
### 2.1. Random ray sampling & 2.2. Loss and training

Predicted box dimensions:

`Box center:` (0.25, 0.25, 0.00)

`Box side lengths:` (2.01, 1.50, 1.50)

### 2.3. Visualization

`Rendered Box Visualization`

![SegmentLocal](images/part_2.gif "segment")

## 3. Optimizing a Neural Radiance Field (NeRF)

My network follows a structure similar to the one described in the NeRF paper. I used `MLPWithInputSkips` to build it.

Below is the visualization on the nerf_lego (lowres) dataset:

![SegmentLocal](images/part_3_loweres.gif "segment")

The NeRF model was trained for 250 epochs on 128x128 images.




## 4. NeRF Extras

View dependence was implemented by adding a few layers to the network created for Section 3.

Below is the visualization on the nerf_materials (lowres) dataset:

![SegmentLocal](images/part_4_materials_lowres_240.gif "segment")



<table style="width: 100%; border-collapse: collapse; text-align: center;">
  <thead>
    <tr>
      <th style="border: 1px solid #ddd; padding: 10px;">Epoch</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Epoch 20</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Epoch 80</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Epoch 160</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Epoch 240</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">Rendered GIFs</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_4_materials_lowres_20.gif" alt="Epoch 20" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_4_materials_lowres_80.gif" alt="Epoch 80" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_4_materials_lowres_160.gif" alt="Epoch 160" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_4_materials_lowres_240.gif" alt="Epoch 240" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
   
  </tbody>
</table>


Trade-offs:
Incorporating view dependence lets the model render more realistic images with reflections and shadows. However, the model can sometimes overfit and learn view-specific characteristics that reduce generalization to unseen views. Generalization can also be limited if the training data covers only a small variety of viewing directions.

## B. Neural Surface Rendering
## 5. Sphere Tracing
In my implementation of the `sphere_tracing` function, I iteratively compute ray-surface intersections. Starting at the ray origins, I advance each point along its ray direction by the SDF value at the current point. At each step, I check whether the SDF value is below a small threshold ($\epsilon$), which indicates a surface hit. The loop terminates when all rays have converged or the maximum number of iterations is reached. The function returns the intersection points (`points`) and a boolean mask (`mask`) identifying which rays hit the surface. Because each step is no larger than the distance to the nearest surface, the march does not overshoot the surface when the SDF is exact.
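
The loop described above can be sketched for a single ray in plain Python. This is a minimal illustration, not the assignment's batched implementation: the torus radii, the `far` cutoff, and the names `torus_sdf` and `sphere_trace` are assumptions made for the example.

```python
import math

def torus_sdf(p, major_radius=1.0, minor_radius=0.25):
    """Signed distance from p = (x, y, z) to a torus lying in the xy-plane (assumed radii)."""
    x, y, z = p
    q = math.hypot(x, y) - major_radius
    return math.hypot(q, z) - minor_radius

def sphere_trace(origin, direction, sdf, max_iters=64, eps=1e-5, far=10.0):
    """March one ray with a unit-length direction; return (point, hit)."""
    t = 0.0
    point = tuple(origin)
    for _ in range(max_iters):
        point = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(point)
        if dist < eps:      # SDF below the threshold: surface hit
            return point, True
        t += dist           # step forward by the SDF value
        if t > far:         # hypothetical far bound so missed rays stop early
            break
    return point, False

# A ray aimed at the tube hits it; a ray through the hole misses.
hit_point, hit = sphere_trace((3.0, 0.0, 0.0), (-1.0, 0.0, 0.0), torus_sdf)
_, hit_through_hole = sphere_trace((0.0, 0.0, 3.0), (0.0, 0.0, -1.0), torus_sdf)
```

In a batched version, `hit` would correspond to one entry of the returned `mask`, and `hit_point` to one row of `points`.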

Below is the visualization of the torus:

![SegmentLocal](images/part_5.gif "segment")



## 6. Optimizing a Neural SDF

The network structure was inspired by the NeRF network implemented in Section 3. The main change is that the final ReLU layer is omitted, since the output of the network is a signed distance, which can be negative. The network parameters are initialized dynamically from the configuration (.yaml) file.

Eikonal loss is computed as the mean squared error between the L2 norm of the SDF's gradients with respect to the input points and 1.0. This encourages the SDF to have a gradient norm of 1.0 everywhere.
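
As a rough illustration of this loss, the sketch below computes it for a list of gradient vectors. It uses central differences as a stand-in for the autograd gradients a real network would provide. The helper names (`eikonal_loss`, `numerical_gradient`, `sphere_sdf`) and the sample points are assumptions for the example, not the assignment code.

```python
import math

def eikonal_loss(gradients):
    """Mean of (||g||_2 - 1)^2 over a list of gradient vectors."""
    penalties = [(math.sqrt(sum(c * c for c in g)) - 1.0) ** 2 for g in gradients]
    return sum(penalties) / len(penalties)

def numerical_gradient(sdf, p, h=1e-5):
    """Central-difference gradient of sdf at p (stand-in for autograd)."""
    grad = []
    for i in range(len(p)):
        forward = list(p)
        backward = list(p)
        forward[i] += h
        backward[i] -= h
        grad.append((sdf(forward) - sdf(backward)) / (2.0 * h))
    return grad

def sphere_sdf(p, radius=1.0):
    """Exact SDF of a sphere at the origin; its gradient has unit norm."""
    return math.sqrt(sum(c * c for c in p)) - radius

points = [(0.5, 0.2, -0.1), (2.0, 1.0, 0.0), (-1.5, 0.3, 0.8)]

# An exact SDF gives a loss near zero.
true_loss = eikonal_loss([numerical_gradient(sphere_sdf, p) for p in points])

# Scaling the SDF by 2 gives gradient norm 2, so the loss is near (2 - 1)^2 = 1.
scaled_loss = eikonal_loss(
    [numerical_gradient(lambda q: 2.0 * sphere_sdf(q), p) for p in points]
)
```

The scaled case shows what the penalty reacts to: a function with the right zero level set but the wrong gradient magnitude is still penalized.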

Below is the visualization:

<div style="display: flex; justify-content: space-around; align-items: flex-start; margin: 20px 0;">
  <figure style="text-align: center; margin: 0;">
    <img src="images/part_6_input.gif" alt="Input" style="height: 300px;">
    <figcaption>Input</figcaption>
  </figure>
  <figure style="text-align: center; margin: 0;">
    <img src="images/part_6.gif" alt="SDF" style="height: 300px;">
    <figcaption>Rendered SDF</figcaption>
  </figure>
</div>

## 7. VolSDF
$\alpha$ and $\beta$ are parameters that control the conversion of SDF to volume density during rendering.

$\alpha$: This parameter controls how opaque the rendered object is. Larger $\alpha$ values make the surface of the object more opaque.

$\beta$: This parameter controls how smooth the transition from low-density to high-density regions is. Lower $\beta$ values result in sharp changes, while higher $\beta$ values result in a blurrier, more gradual transition.

My explanation is based on what I observed by experimenting with different values of $\alpha$ and $\beta$.
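
For reference, the VolSDF paper defines the density as $\sigma(x) = \alpha \, \Psi_\beta(-d(x))$, where $\Psi_\beta$ is the CDF of a zero-mean Laplace distribution with scale $\beta$. The sketch below evaluates that formula for a single signed distance. The function name and default values are assumptions for the example, and it may differ from the assignment's tensor implementation.

```python
import math

def volsdf_density(signed_distance, alpha=10.0, beta=0.05):
    """sigma = alpha * Psi_beta(-d), with Psi_beta the Laplace CDF (scale beta)."""
    s = -signed_distance
    if s <= 0:
        # On or outside the surface: density decays exponentially with distance.
        return alpha * 0.5 * math.exp(s / beta)
    # Inside the surface: density saturates toward alpha.
    return alpha * (1.0 - 0.5 * math.exp(-s / beta))

# Density a short distance (0.05) outside the surface for the three beta values tried.
outside = {beta: volsdf_density(0.05, alpha=10.0, beta=beta) for beta in (0.01, 0.05, 0.1)}
```

Both branches only call `exp` on non-positive arguments, so this scalar form does not overflow even for small $\beta$. The density equals $\alpha / 2$ exactly at the surface, and a larger $\beta$ leaves more density outside the surface, which matches the blurrier transition described above.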

Questions:
1. How does high $\beta$ bias your learned SDF? What about low $\beta$?

    A high $\beta$ results in smoother transitions between dense and non-dense regions. As a result, the learned SDF has a blurred surface, with gradual density changes that span a large region.
    On the other hand, a low $\beta$ results in sharper transitions between dense and non-dense regions, which gives a more precise learned SDF representation. However, in my experiments, reducing $\beta$ too much produced a loss of `NaN`, which made training unstable and potentially unable to render the SDF.

2. Would an SDF be easier to train with volume rendering and low $\beta$ or high $\beta$? Why?

    An SDF would be easier to train with high $\beta$ during volume rendering. A high $\beta$ smooths out the transition between dense and non-dense regions, which makes gradients smoother and training more stable, especially in the early stages when the network is still learning coarse geometry.

3. Would you be more likely to learn an accurate surface with high $\beta$ or low $\beta$? Why?

    You would be more likely to learn an accurate surface with low $\beta$, as it enforces sharper transitions in density that align closely with the true surface.

    
The color layers I added have a structure similar to the one created in Section 3. I also added a skip connection at layer 3 (volsdf_surface.yaml) to improve the geometric quality of the output. For Trial III, the loss became `NaN`, which I suspect is because $\beta$ was too low.


<table style="width: 100%; border-collapse: collapse; text-align: center;">
  <thead>
    <tr>
      <th style="border: 1px solid #ddd; padding: 10px;">Trial</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Trial I</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Trial II</th>
      <th style="border: 1px solid #ddd; padding: 10px;">Trial III</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">Rendered GIFs</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_7_10.0alpha_0.05beta.gif" alt="Trial I rendering (beta = 0.05)" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_7_10.0alpha_0.1beta.gif" alt="Trial II rendering (beta = 0.1)" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_7_10.0alpha_0.01beta.gif" alt="Trial III rendering (beta = 0.01)" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
     <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">Geometry</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_7_geometry_10.0alpha_0.05beta.gif" alt="Trial I geometry (beta = 0.05)" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_7_geometry_10.0alpha_0.1beta.gif" alt="Trial II geometry (beta = 0.1)" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_7_10.0alpha_0.01beta.gif" alt="Trial III (beta = 0.01)" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">alpha</td>
      <td style="border: 1px solid #ddd; padding: 10px;">10.0</td>
      <td style="border: 1px solid #ddd; padding: 10px;">10.0</td>
      <td style="border: 1px solid #ddd; padding: 10px;">10.0</td>
    </tr>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">beta</td>
      <td style="border: 1px solid #ddd; padding: 10px;">0.05</td>
      <td style="border: 1px solid #ddd; padding: 10px;">0.1</td>
      <td style="border: 1px solid #ddd; padding: 10px;">0.01</td>
    </tr>
   
  </tbody>
</table>

`BEST RESULT`

<div style="display: flex; justify-content: space-around; align-items: flex-start; margin: 20px 0;">
  <figure style="text-align: center; margin: 0;">
    <img src="images/part_7_10.0alpha_0.05beta.gif" alt="Best rendering (alpha = 10.0, beta = 0.05)" style="height: 300px;">
    <figcaption>SDF</figcaption>
  </figure>
  <figure style="text-align: center; margin: 0;">
    <img src="images/part_7_geometry_10.0alpha_0.05beta.gif" alt="Best geometry (alpha = 10.0, beta = 0.05)" style="height: 300px;">
    <figcaption>Geometry</figcaption>
  </figure>
</div>

## 8. Neural Surface Extras

### 8.1. Render a Large Scene with Sphere Tracing

I rendered a complex scene by adding a cube at the center of the torus, along with two spheres above and below the center of the torus.

![SegmentLocal](images/part_8_complex_scene.gif "segment")
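
One common way to build such a scene is to take the pointwise minimum of the individual SDFs, which gives the union of the shapes. The sketch below assumes that approach. The sizes, positions, and torus orientation are made-up values for illustration and are not the ones used for the render above.

```python
import math

def sphere_sdf(p, center, radius):
    """Signed distance to a sphere."""
    return math.dist(p, center) - radius

def box_sdf(p, center, half_extent):
    """Signed distance to an axis-aligned cube with the given half side length."""
    q = [abs(pi - ci) - half_extent for pi, ci in zip(p, center)]
    outside = math.sqrt(sum(max(qi, 0.0) ** 2 for qi in q))
    inside = min(max(q), 0.0)
    return outside + inside

def torus_sdf(p, major_radius, minor_radius):
    """Signed distance to a torus centered at the origin in the xy-plane."""
    x, y, z = p
    return math.hypot(math.hypot(x, y) - major_radius, z) - minor_radius

def scene_sdf(p):
    """Union of all primitives: the minimum of their signed distances."""
    return min(
        torus_sdf(p, major_radius=1.0, minor_radius=0.25),   # assumed radii
        box_sdf(p, center=(0.0, 0.0, 0.0), half_extent=0.3),  # cube in the torus hole
        sphere_sdf(p, center=(0.0, 0.0, 0.8), radius=0.2),    # sphere above the center
        sphere_sdf(p, center=(0.0, 0.0, -0.8), radius=0.2),   # sphere below the center
    )
```

Because `scene_sdf` has the same signature as a single-shape SDF, it can be passed to a sphere tracer unchanged.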

### 8.2. Fewer Training Views

I trained the network with fewer training views as shown in the table below:

<table style="width: 100%; border-collapse: collapse; text-align: center;">
  <thead>
    <tr>
      <th style="border: 1px solid #ddd; padding: 10px;">Number of Views</th>
      <th style="border: 1px solid #ddd; padding: 10px;">VolSDF</th>
      <th style="border: 1px solid #ddd; padding: 10px;">NeRF</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">10</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_8_SDF_10view.gif" alt="VolSDF, 10 views" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_8_NeRF_10view.gif" alt="NeRF, 10 views" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
     <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">50</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_8_SDF_50view.gif" alt="VolSDF, 50 views" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_8_NeRF_50view.gif" alt="NeRF, 50 views" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
     <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">100</td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_8_SDF_100view.gif" alt="VolSDF, 100 views" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_3_loweres.gif" alt="NeRF, 100 views" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
   
  </tbody>
</table>



In this comparison, NeRF produces sharper images than VolSDF when trained on fewer views.



### 8.3. Alternate SDF to Density Conversions

I used the SDF-to-density conversion from the NeuS paper, based on the formula below:

$$
\phi_s(f(x)) = \frac{s \cdot \exp\left(-s f(x)\right)}{\left(1 + \exp\left(-s f(x)\right)\right)^2}
$$
 
where:

$f(x)$ is the Signed Distance Function (SDF),

$s = \frac{1}{\beta}$ is the sharpness parameter controlling the steepness of the density transition near the surface.
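
The formula above can be evaluated for a single signed distance as in the sketch below. The function name and the default $\beta$ are assumptions for the example. Since $\phi_s$ is an even function of $f(x)$, the sketch uses $|f(x)|$ inside the exponential, which gives the same value while keeping the argument of `exp` non-positive.

```python
import math

def logistic_density(signed_distance, beta=0.05):
    """phi_s(f) = s * exp(-s f) / (1 + exp(-s f))^2 with s = 1 / beta."""
    s = 1.0 / beta
    # phi_s is symmetric in f, so |f| gives the same result without overflow.
    e = math.exp(-s * abs(signed_distance))
    return s * e / (1.0 + e) ** 2

# The density peaks at the surface with value s / 4 and decays on both sides.
peak = logistic_density(0.0, beta=0.05)
just_outside = logistic_density(0.05, beta=0.05)
just_inside = logistic_density(-0.05, beta=0.05)
```

Unlike the VolSDF conversion, which saturates inside the object, this density is concentrated in a band around the zero level set, and the band narrows as $\beta$ decreases.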


<table style="width: 100%; border-collapse: collapse; text-align: center;">
  <thead>
    <tr>
      <th style="border: 1px solid #ddd; padding: 10px;">VolSDF</th>
      <th style="border: 1px solid #ddd; padding: 10px;">NeuS</th>
      <th style="border: 1px solid #ddd; padding: 10px;">NeuS - Geometry</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_7_10.0alpha_0.05beta.gif" alt="VolSDF rendering" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_8_NeUS_SDF.gif" alt="NeuS rendering" style="display: block; margin: auto; width: 80%;">
      </td>
      <td style="border: 1px solid #ddd; padding: 10px;">
        <img src="images/part_8_geometry_NeUS_SDF.gif" alt="NeuS geometry" style="display: block; margin: auto; width: 80%;">
      </td>
    </tr>
  </tbody>
</table>