# Exercise 3: Shape Reconstruction

**Submission Deadline**: 13.06.2023, 23:55

We will take a look at two major approaches for 3D shape reconstruction in this last exercise.

Note that training reconstruction methods generally takes relatively long, even for simple shape completion. Training the generalization experiments will take a few hours. **Thus, please make sure to start training well before the submission deadline.**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
#!git status
!ls
# %cd E3

sample_data


In [None]:
!ls
!git status
# %cd drive/MyDrive/3DML/E2
# !git clone https://<GITHUB_TOKEN>@github.com/Streakfull/3DML.git
# %cd 3DML
# !git checkout ys-e3

!git add .
!git config --global user.email "youssef.ahmedyoussef98@gmail.com"
!git config --global user.name "Streakfull"
!git commit -m "finalizes shapenet model"
!git push origin ys-e3

exercise_3  exercise_3.ipynb  pyproject.toml  requirements.txt
On branch ys-e3
Your branch is up to date with 'origin/ys-e3'.

nothing to commit, working tree clean
On branch ys-e3
Your branch is up to date with 'origin/ys-e3'.

nothing to commit, working tree clean
Everything up-to-date


## 3.0. Running this notebook
We recommend running this notebook on a CUDA-compatible local GPU. You can also run training on a CPU; it will just take longer.

You have three options for running this exercise on a GPU. Choose one of them and then start the exercise in the section "Imports" below:
1. Locally on your own GPU
2. On our dedicated compute cluster
3. On Google Colab

We describe every option in more detail below:

---

### (a) Local Execution

If you run this notebook locally, you first have to install the Python dependencies again. They are the same as for exercise 1, so you can re-use the environment you used last time. If you use [poetry](https://python-poetry.org), you can also simply re-install everything (`poetry install`) and then run this notebook via `poetry run jupyter notebook`.

In case you are working with an RTX 3000-series GPU, you need to install a patched version of PyTorch:

In [None]:
%pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113

### (b) Compute Cluster

We provide access to a small compute cluster for the exercises and projects, consisting of a login node and 4 compute nodes with one dedicated RTX 3090 GPU each.
Please send us a short email with your name and preferred username so we can add you as a user.

We uploaded a PDF to Moodle with detailed information on how to access and use the cluster.

Since the cluster contains RTX 3000-series GPUs, you will need to install a patched version of PyTorch:

In [None]:
%pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113

### (c) Google Colab

If you don't have access to a GPU and don't want to use our cluster, you can also use Google Colab. However, we found that inline visualization of shapes and inline images did not work on Colab, so keep that in mind.
Alternatively, you can train the networks on Colab only, download the checkpoint, and visualize inference locally.

In case you're using Google Colab, you can upload the exercise folder (containing `exercise_3.ipynb`, the directory `exercise_3`, and the file `requirements.txt`) as `3d-machine-learning` to Google Drive (make sure you don't upload extracted dataset files).
Additionally, you need to open the notebook `exercise_3.ipynb` in Colab using `File > Open Notebook > Upload`.

Next you'll need to run these two cells for setting up the environment. Before you do that make sure your instance has a GPU.

In [None]:
import os
#from google.colab import drive
#drive.mount('/content/drive', force_remount=True)

# We assume you uploaded the exercise folder in root Google Drive folder

#!cp -r /content/drive/MyDrive/3d-machine-learning 3d-machine-learning/
#os.chdir('/content/3d-machine-learning/')
#os.chdir('/content/drive/MyDrive/3DML/E3')
os.chdir('/content/3DML/E3')
print('Installing requirements')
%pip install -r requirements.txt

# Make sure you restart runtime when directed by Colab

Installing requirements
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting jupyter>=1.0.0 (from -r requirements.txt (line 1))
  Downloading jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
Collecting K3D>=2.9.4 (from -r requirements.txt (line 2))
  Downloading k3d-2.15.3-py3-none-any.whl (23.0 MB)
Collecting trimesh>=3.9.14 (from -r requirements.txt (line 4))
  Downloading trimesh-3.22.0-py3-none-any.whl (681 kB)
Collecting pytorch-lightning>=1.2.8 (from -r requirements.txt (line 6))
  Downloading pytorch_lightning-2.0.3-py3-none-any.whl (720 kB)
Collecting pyrender>=0.1.43 (f

Run this cell after restarting your colab runtime

In [None]:
import os
import sys
import torch
# os.chdir('/content/3d-machine-learning/')
# sys.path.insert(1, "/content/3d-machine-learning/")
os.chdir('/content/3DML/E3')
sys.path.insert(1, "/content/3DML/E3")
print('CUDA availability:', torch.cuda.is_available())

CUDA availability: True


---

### Imports

The following imports should work regardless of whether you are using Colab or local execution.

In [None]:
%load_ext autoreload
%autoreload 2
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
import k3d
import trimesh
import torch
import skimage
import tqdm

Use the next cell to test whether a GPU was detected by pytorch.

In [None]:
torch.cuda.is_available()

True

## 3.1 Shape Reconstruction from 3D SDF grids with 3D-EPN

In the first part of this exercise, we will take a look at shape completion using [3D-EPN](https://arxiv.org/abs/1612.00101). This approach was also introduced in the lecture.

The visualization below shows an overview of the method: From an incomplete shape observation (which you would get when scanning an object with a depth sensor for example), we use a 3D encoder-predictor network that first encodes the incomplete shapes into a common latent space using several 3D convolution layers and then decodes them again using multiple 3D transpose convolutions.

This way, we get from a 32^3 SDF voxel grid to a 32^3 DF (unsigned) voxel grid that represents the completed shape. We only focus on this part here; in the original implementation, this 32^3 completed prediction would then be further improved (in an offline step after inference) by sampling parts from a shape database to get the final resolution to 128^3.

<img src="exercise_3/images/3depn_teaser.png" alt="3D-EPN Teaser" style="width: 800px;"/>

The next steps will follow the structure we established in exercise 2: Taking a look at the dataset structure and downloading the data; then, implementing dataset, model, and training loop.

### (a) Downloading the data
We will use the original dataset used in the official implementation. It consists of SDF and DF grids (representing incomplete input data and complete target data) with a resolution of 32^3 each. Each input-target pair is generated from a ShapeNet shape.

The incomplete SDF data are generated by sampling virtual camera trajectories around every object. Each trajectory is assigned an ID which is part of the file names (see below). The camera views for each trajectory are combined into a common SDF grid by volumetric fusion. It is easy to generate an SDF here since we know both camera location and object surface: Everything between camera and surface is known free space and outside the object, leading to a positive SDF sign. Everything behind the surface has a negative sign. For the complete shapes, however, deciding whether a voxel in the DF grid is inside or outside an object is not a trivial problem. This is why we use unsigned distance fields as target and prediction representation instead. This still encodes the distance to the closest surface but does not contain explicit information about the inside/outside location.

In terms of dataset layout, we follow the ShapeNet directory structure as seen in the last exercise:
Each folder in the `exercise_3/data/shapenet_dim32_sdf` and `exercise_3/data/shapenet_dim32_df` directories contains one shape category represented by a number, e.g. `02691156`.
We provide the mapping between these numbers and the corresponding names in `exercise_3/data/shape_info.json`. Each of these shape category folders contains lots of shapes in sdf or df format. In addition to that, every shape now also contains multiple trajectories: 0 to 7, encoded as `__0__` to `__7__`. These 8 files are just different input representations, meaning they vary in the level of completeness and location of missing parts; they all map to the `.df` file with corresponding shape ID and `__0__` at the end.

```
# contents of exercise_3/data/shapenet_dim32_sdf
02691156/                                           # Shape category folder with all its shapes
    ├── 10155655850468db78d106ce0a280f87__0__.sdf   # Trajectory 0 for a shape of the category
    ├── 10155655850468db78d106ce0a280f87__1__.sdf   # Trajectory 1 for the same shape
    ├── :                                      
    ├── 10155655850468db78d106ce0a280f87__7__.sdf   # Trajectory 7 for the same shape
    ├── 1021a0914a7207aff927ed529ad90a11__0__.sdf   # Trajectory 0 for another shape
    ├── :                                           # And so on ...
02933112/                                           # Another shape category folder
02958343/                                           # In total you should have 8 shape category folders
:

# contents of exercise_3/data/shapenet_dim32_df
02691156/                                           # Shape category folder with all its shapes
    ├── 10155655850468db78d106ce0a280f87__0__.df    # A single shape of the category
    ├── 1021a0914a7207aff927ed529ad90a11__0__.df    # Another shape of the category
    ├── :                                           # And so on ...
02933112/                                           # Another shape category folder
02958343/                                           # In total you should have 55 shape category folders
:
```
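
The naming scheme in the listing above implies a simple rule for pairing inputs with targets: keep the category and shape ID and replace the trajectory index with 0. The following is a minimal sketch of that rule; the helper name `target_name_for` and the regular expression are illustrative assumptions, not part of the provided code.

```python
import re

# Assumed naming scheme, taken from the listing above:
# "<category>/<shape_id>__<trajectory>__" with a single-digit trajectory index.
_NAME_PATTERN = re.compile(
    r"^(?P<category>\d+)/(?P<shape_id>[0-9a-f]+)__(?P<trajectory>\d)__$"
)


def target_name_for(input_name: str) -> str:
    """Map an input trajectory name to the name of its complete target shape."""
    match = _NAME_PATTERN.match(input_name)
    if match is None:
        raise ValueError(f"Unexpected shape name: {input_name!r}")
    # All trajectories of a shape share the single target stored under index 0.
    return f"{match['category']}/{match['shape_id']}__0__"


print(target_name_for("03001627/798a46965d9e0edfcea003eff0268278__3__"))
# 03001627/798a46965d9e0edfcea003eff0268278__0__
```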

Download and extract the data with the code cell below.

**Note**: If you are training on Google Colab and are running out of disk space, you can do the following:
- Only download the zip files below without extracting them (comment out all lines after `print('Extracting ...')`)
- Change `from exercise_3.data.shapenet import ShapeNet` to `from exercise_3.data.shapenet_zip import ShapeNet`
- Implement your dataset in `shapenet_zip.py`. This implementation extracts the data on-the-fly without taking up any additional disk space. Your training will therefore run a bit slower.
- Make sure you uncomment the lines setting the worker_init_fn in `train_3depn.py` (marked with TODOs)

In [None]:
print('Downloading ...')
# File sizes: 11GB for shapenet_dim32_sdf.zip (incomplete scans), 4GB for shapenet_dim32_df.zip (target shapes)
!wget http://kaldir.vc.in.tum.de/adai/CNNComplete/shapenet_dim32_sdf.zip -P exercise_3/data
!wget http://kaldir.vc.in.tum.de/adai/CNNComplete/shapenet_dim32_df.zip -P exercise_3/data
print('Extracting ...')
!unzip -q exercise_3/data/shapenet_dim32_sdf.zip -d exercise_3/data
!unzip -q exercise_3/data/shapenet_dim32_df.zip -d exercise_3/data
!rm exercise_3/data/shapenet_dim32_sdf.zip
!rm exercise_3/data/shapenet_dim32_df.zip
print('Done.')

Downloading ...
--2023-06-08 09:45:29--  http://kaldir.vc.in.tum.de/adai/CNNComplete/shapenet_dim32_sdf.zip
Resolving kaldir.vc.in.tum.de (kaldir.vc.in.tum.de)... 131.159.98.128
Connecting to kaldir.vc.in.tum.de (kaldir.vc.in.tum.de)|131.159.98.128|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://kaldir.vc.in.tum.de:443/adai/CNNComplete/shapenet_dim32_sdf.zip [following]
--2023-06-08 09:45:30--  https://kaldir.vc.in.tum.de/adai/CNNComplete/shapenet_dim32_sdf.zip
Connecting to kaldir.vc.in.tum.de (kaldir.vc.in.tum.de)|131.159.98.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11741813909 (11G) [application/zip]
Saving to: ‘exercise_3/data/shapenet_dim32_sdf.zip’


2023-06-08 09:52:26 (26.9 MB/s) - ‘exercise_3/data/shapenet_dim32_sdf.zip’ saved [11741813909/11741813909]

--2023-06-08 09:52:27--  http://kaldir.vc.in.tum.de/adai/CNNComplete/shapenet_dim32_df.zip
Resolving kaldir.vc.in.tum.de (kaldir.vc.in.tum.de

### (b) Dataset

The dataset implementation follows the same general structure as in exercise 2. We prepared an initial implementation already in `exercise_3/data/shapenet.py`; your task is to resolve all TODOs there.

The data for SDFs and DFs in `.sdf`/`.df` files are stored in binary form as follows:
```
dimX    #uint64 
dimY    #uint64 
dimZ    #uint64 
data    #(dimX*dimY*dimZ) floats for sdf/df values
```
The SDF values stored per-voxel represent the distance to the closest surface *in voxels*.
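
As a rough sketch of how this layout could be read with `np.fromfile`: the function name `read_grid` is hypothetical, and the code assumes that the "floats" are 32-bit and that the flat array reshapes directly to `(dimX, dimY, dimZ)`. Please verify both assumptions against the actual files.

```python
import numpy as np


def read_grid(path):
    """Read a .sdf/.df file: three uint64 dimensions followed by the voxel values."""
    # Header: dimX, dimY, dimZ as uint64 (3 * 8 = 24 bytes).
    dims = np.fromfile(path, dtype=np.uint64, count=3)
    num_values = int(np.prod(dims))
    # Assumption: values are 32-bit floats stored directly after the header.
    values = np.fromfile(path, dtype=np.float32, count=num_values, offset=dims.nbytes)
    # Assumption: the flat array reshapes directly to (dimX, dimY, dimZ).
    return values.reshape(tuple(int(d) for d in dims))
```

Passing `offset=dims.nbytes` skips the header, so the file never has to be opened manually.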

You have to take care of three important steps before returning the SDF and DF for the corresponding `index` in `__getitem__`:
1. **Truncation**: 3D-EPN uses a truncated SDF which means that for each voxel, the distance to the closest surface will be clamped to a max absolute value. This is helpful since we do not care about longer distances (Marching Cubes only cares about distances close to the surface). It allows us to focus our predictions on the voxels near the surface. We use a `truncation_distance` of 3 (voxels) which means we expect to get an SDF with values between -3 and 3 as input to the model.
2. **Separation** of distances and sign: 3D-EPN uses as input a 2x32x32x32 SDF grid, with absolute distance values of the SDF in channel 0 and the signs (-1 or 1) in channel 1.
3. **Log** scaling: We scale targets and prediction with a log operation to further guide predictions to focus on the surface voxels. Therefore, you should return target DFs as `log(df + 1)`.
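
The three steps above can be sketched with NumPy as follows. The function names are hypothetical. The sketch also assumes that the target DF is truncated to the same distance as the input and that voxels with a value of exactly 0 get the sign +1; neither detail is fixed by the description above.

```python
import numpy as np


def prepare_input_sdf(sdf, truncation_distance=3.0):
    """Truncate an SDF grid and split it into |distance| and sign channels."""
    sdf = np.clip(sdf, -truncation_distance, truncation_distance)
    # Assumption: exact zeros get sign +1 so the sign channel only holds -1 or 1.
    sign = np.where(sdf < 0, -1.0, 1.0)
    # Channel 0: absolute distances, channel 1: signs.
    return np.stack([np.abs(sdf), sign], axis=0).astype(np.float32)


def prepare_target_df(df, truncation_distance=3.0):
    """Truncate a DF grid and apply the log scaling log(df + 1)."""
    # Assumption: the target is truncated to the same distance as the input.
    df = np.clip(df, 0.0, truncation_distance)
    return np.log(df + 1.0).astype(np.float32)
```

For a 32x32x32 input, `prepare_input_sdf` returns an array of shape `(2, 32, 32, 32)` and `prepare_target_df` one of shape `(32, 32, 32)`, which matches the shapes checked in the cells below.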

**Hint**: An easy way to load the data from `.sdf` and `.df` files is to use `np.fromfile`. First, load the dimensions, then the data, then reshape everything into the shape you loaded in the beginning. Make sure you get the datatypes and byte offsets right! If you are using the zip version of the dataset as explained above, you should use `np.frombuffer` instead of `np.fromfile` to load from the `data`-buffer. The syntax is identical.
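
For the zip variant, a sketch along these lines may help. It reads one archive member into memory and parses it with `np.frombuffer`. The function name and the member path are hypothetical, and the float32 assumption is the same as for the on-disk files.

```python
import zipfile

import numpy as np


def read_grid_from_zip(archive, member):
    """Read one .sdf/.df member of an open zipfile.ZipFile without extracting it to disk."""
    data = archive.read(member)  # raw bytes of the archive member
    dims = np.frombuffer(data, dtype=np.uint64, count=3)
    num_values = int(np.prod(dims))
    # Assumption: float32 values follow the 24-byte header.
    values = np.frombuffer(data, dtype=np.float32, count=num_values, offset=dims.nbytes)
    # Note: arrays created from a bytes object are read-only; copy before modifying.
    return values.reshape(tuple(int(d) for d in dims))
```

A call could look like `read_grid_from_zip(zipfile.ZipFile(zip_path), member_name)`, where both arguments depend on how you store the archives.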

In [None]:
from exercise_3.data.shapenet import ShapeNet

# Create a dataset with train split
train_dataset = ShapeNet('train')
val_dataset = ShapeNet('val')
overfit_dataset = ShapeNet('overfit')

# Get length, which is a call to __len__ function
print(f'Length of train set: {len(train_dataset)}')  # expected output: 153540
# Get length, which is a call to __len__ function
print(f'Length of val set: {len(val_dataset)}')  # expected output: 32304
# Get length, which is a call to __len__ function
print(f'Length of overfit set: {len(overfit_dataset)}')  # expected output: 64

Length of train set: 153540
Length of val set: 32304
Length of overfit set: 64


In [None]:
# Visualize some shapes
from exercise_3.util.visualization import visualize_mesh
from skimage.measure import marching_cubes

train_sample = train_dataset[1]
print(f'Name: {train_sample["name"]}')  # expected output: 03001627/798a46965d9e0edfcea003eff0268278__3__-03001627/798a46965d9e0edfcea003eff0268278__0__
print(f'Input SDF: {train_sample["input_sdf"].shape}')  # expected output: (2, 32, 32, 32)
print(f'Target DF: {train_sample["target_df"].shape}')  # expected output: (32, 32, 32)
input_mesh = marching_cubes(train_sample['input_sdf'][0], level=1)
visualize_mesh(input_mesh[0], input_mesh[1], flip_axes=True)

Name: 03001627/798a46965d9e0edfcea003eff0268278__3__-03001627/798a46965d9e0edfcea003eff0268278__0__
Input SDF: (2, 32, 32, 32)
Target DF: (32, 32, 32)


Output()

In [None]:
!ls
from google.colab import files
files.download("exercise_3/data/shapenet_dim32_sdf/03636649/3889631e42a84b0f51f77a6d7299806__2__.sdf")
files.download("exercise_3/data/shapenet_dim32_df/03636649/3889631e42a84b0f51f77a6d7299806__0__.df")

exercise_3  exercise_3.ipynb  pyproject.toml  requirements.txt



In [None]:
train_sample = train_dataset[223]
print(f'Name: {train_sample["name"]}')  # expected output: 04379243/a1be21c9a71d133dc5beea20858a99d5__5__-04379243/a1be21c9a71d133dc5beea20858a99d5__0__
print(f'Input SDF: {train_sample["input_sdf"].shape}')  # expected output: (2, 32, 32, 32)
print(f'Target DF: {train_sample["target_df"].shape}')  # expected output: (32, 32, 32)

input_mesh = marching_cubes(train_sample['input_sdf'][0], level=1)
visualize_mesh(input_mesh[0], input_mesh[1], flip_axes=True)

Name: 04379243/a1be21c9a71d133dc5beea20858a99d5__5__-04379243/a1be21c9a71d133dc5beea20858a99d5__0__
Input SDF: (2, 32, 32, 32)
Target DF: (32, 32, 32)


Output()

In [None]:
train_sample = train_dataset[95]
print(f'Name: {train_sample["name"]}')  # expected output: 03636649/3889631e42a84b0f51f77a6d7299806__2__-03636649/3889631e42a84b0f51f77a6d7299806__0__
print(f'Input SDF: {train_sample["input_sdf"].shape}')  # expected output: (2, 32, 32, 32)
print(f'Target DF: {train_sample["target_df"].shape}')  # expected output: (32, 32, 32)

input_mesh = marching_cubes(train_sample['input_sdf'][0], level=1)
visualize_mesh(input_mesh[0], input_mesh[1], flip_axes=True)

Name: 03636649/3889631e42a84b0f51f77a6d7299806__2__-03636649/3889631e42a84b0f51f77a6d7299806__0__
Input SDF: (2, 32, 32, 32)
Target DF: (32, 32, 32)


Output()

### (c) Model

The model architecture of 3D-EPN is visualized below:

<img src="exercise_3/images/3depn.png" alt="3D-EPN Architecture" style="width: 800px;"/>

For this exercise, we simplify the model by omitting the classification part. This will not have a big impact, since most of the shape completion performance comes from the 3D encoder-decoder U-Net.

The model consists of three parts: The encoder, the bottleneck, and the decoder. Encoder and decoder are constructed with the same architecture, just mirrored.

The details of each part are:
- **Encoder**: 4 layers, each one containing a 3D convolution (with kernel size 4, as seen in the visualization), a 3D batch norm (except the very first layer), and a leaky ReLU with a negative slope of 0.2. Our goal is to reduce the spatial dimension from 32x32x32 to 1x1x1 and to get the feature dimension from 2 (absolute values and sign) to `num_features * 8`. We do this by using a stride of 2 and padding of 1 for all convolutions except for the last one where we use a stride of 1 and no padding. The feature channels are increased from 2 to `num_features` in the first layer and then doubled with every subsequent layer.
- **Decoder**: Same architecture as the encoder, just mirrored: going from `num_features * 8 * 2` (the 2 will be explained later) to 1 (the DF values). The spatial dimensions go from 1x1x1 to 32x32x32. Each layer now uses a 3D transpose convolution, together with a 3D batch norm and a ReLU (no leaky ReLUs anymore). Note that the last layer uses neither a batch norm nor a ReLU, since we do not want to constrain the range of possible values for the prediction.
- **Bottleneck**: This is realized with 2 fully connected layers, each one going from a vector of size 640 (which is `num_features * 8`) to a vector of size 640. Each such layer is followed by a ReLU activation.
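
To see why these strides and paddings take the grid from 32x32x32 down to 1x1x1 and back, the standard output-size formulas for (transposed) convolutions can be evaluated per layer. This is only a sanity-check sketch; the decoder settings are assumed to mirror the encoder, as described above.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output size of a convolution along one spatial axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out_size(size, kernel, stride, padding):
    """Output size of a transposed convolution along one axis (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


# (stride, padding) per encoder layer; the last layer uses stride 1 and no padding.
encoder_layers = [(2, 1), (2, 1), (2, 1), (1, 0)]

size = 32
encoder_sizes = [size]
for stride, padding in encoder_layers:
    size = conv_out_size(size, kernel=4, stride=stride, padding=padding)
    encoder_sizes.append(size)
print(encoder_sizes)  # [32, 16, 8, 4, 1]

# Assumption: the decoder uses the same settings in reverse order.
decoder_sizes = [size]
for stride, padding in reversed(encoder_layers):
    size = conv_transpose_out_size(size, kernel=4, stride=stride, padding=padding)
    decoder_sizes.append(size)
print(decoder_sizes)  # [1, 4, 8, 16, 32]
```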

Some minor details:
- **Skip connections** allow the decoder to use information from the encoder and also improve gradient flow. We use them here to connect the output of encoder layer 1 to decoder layer 4, the output of encoder layer 2 to decoder layer 3, and so on. This means that the input to a decoder layer is the concatenation of the previous decoder output with the corresponding encoder output, along the feature dimension. Hence, the number of input features for each decoder layer is twice that of the corresponding encoder layer, as mentioned above.
- **Log scaling**: You also need to scale the final outputs of the network logarithmically: `out = log(abs(out) + 1)`. This is the same transformation you applied to the target shapes in the dataloader before and ensures that prediction and target volumes are comparable.
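
One way to check the channel bookkeeping before writing any PyTorch code is to count parameters by hand. The sketch below assumes `num_features = 80` (which follows from the bottleneck size of 640), convolutions with bias, and decoder channels of 1280 -> 320, 640 -> 160, 320 -> 80, and 160 -> 1. That channel layout is our reading of the description above rather than something stated explicitly.

```python
def conv3d_params(in_channels, out_channels, kernel=4):
    """Weights plus biases of a 3D (transposed) convolution with a cubic kernel."""
    return in_channels * out_channels * kernel ** 3 + out_channels


def batchnorm_params(channels):
    """Learnable scale and shift of a batch norm layer."""
    return 2 * channels


num_features = 80  # follows from the bottleneck size: num_features * 8 == 640

# Encoder: 2 -> 80 -> 160 -> 320 -> 640 channels, no batch norm in the first layer.
encoder_channels = [2, num_features, num_features * 2, num_features * 4, num_features * 8]
encoder = 0
for index in range(4):
    in_ch, out_ch = encoder_channels[index], encoder_channels[index + 1]
    encoder += conv3d_params(in_ch, out_ch)
    if index > 0:
        encoder += batchnorm_params(out_ch)

# Bottleneck: two 640 -> 640 fully connected layers with biases.
bottleneck = 2 * (640 * 640 + 640)

# Decoder (assumed mirror): skip connections double the input channels;
# no batch norm in the last layer.
decoder_channels = [(1280, 320), (640, 160), (320, 80), (160, 1)]
decoder = 0
for index, (in_ch, out_ch) in enumerate(decoder_channels):
    decoder += conv3d_params(in_ch, out_ch)
    if index < 3:
        decoder += batchnorm_params(out_ch)

total = encoder + bottleneck + decoder
print(encoder, bottleneck, decoder, total)  # 17216880 820480 34418321 52455681
```

Under these assumptions, the total agrees with the parameter count that the model summary is expected to report.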

With this in mind, implement the network architecture and `forward()` function in `exercise_3/model/threedepn.py`. You can check your architecture with the cell below.

In [None]:
from exercise_3.model.threedepn import ThreeDEPN
from exercise_3.util.model import summarize_model

threedepn = ThreeDEPN()
print(summarize_model(threedepn))  # Expected: Rows 0-34 and TOTAL = 52455681

sdf = torch.randn(4, 1, 32, 32, 32) * 2. - 1.

input_tensor = torch.cat([torch.abs(sdf), torch.sign(sdf)], dim=1)
predictions = threedepn(input_tensor)

print('Output tensor shape: ', predictions.shape)  # Expected: torch.Size([4, 32, 32, 32])

   | Name         | Type            | Params  
----------------------------------------------------
0  | encoder1     | Sequential      | 10320   
1  | encoder1.0   | Conv3d          | 10320   
2  | encoder1.1   | LeakyReLU       | 0       
3  | encoder2     | Sequential      | 819680  
4  | encoder2.0   | Conv3d          | 819360  
5  | encoder2.1   | BatchNorm3d     | 320     
6  | encoder2.2   | LeakyReLU       | 0       
7  | encoder3     | Sequential      | 3277760 
8  | encoder3.0   | Conv3d          | 3277120 
9  | encoder3.1   | BatchNorm3d     | 640     
10 | encoder3.2   | LeakyReLU       | 0       
11 | encoder4     | Sequential      | 13109120
12 | encoder4.0   | Conv3d          | 13107840
13 | encoder4.1   | BatchNorm3d     | 1280    
14 | encoder4.2   | LeakyReLU       | 0       
15 | bottleneck   | Sequential      | 820480  
16 | bottleneck.0 | Linear          | 410240  
17 | bottleneck.1 | ReLU            | 0       
18 | bottleneck.2 | Linear          | 410240  
19 | bo

### (d) Training script and overfitting to a single shape reconstruction

You can now go to the train script in `exercise_3/training/train_3depn.py` and fill in the missing pieces as you did for exercise 2. Then, verify that your training works by overfitting to a few samples below.

In [None]:
from exercise_3.training import train_3depn
config = {
    'experiment_name': '3_1_3depn_overfitting',
    'device': 'cuda:0',  # change this to cpu if you do not have a GPU
    'is_overfit': True,
    'batch_size': 32,
    'resume_ckpt': None,
    'learning_rate': 0.001,
    'max_epochs': 250,
    'print_every_n': 10,
    'validate_every_n': 25,
}
train_3depn.main(config)  # should be able to get <0.0025 train_loss and <0.13 val_loss

### (e) Training over the entire training set
If the overfitting works, we can go ahead with training on the entire dataset.

**Note**: As is the case with most reconstruction networks and considering the size of the model (> 50M parameters), this training will take a few hours on a GPU. *Please make sure to start training early enough before the submission deadline.*

In [None]:
config = {
    'experiment_name': '3_1_3depn_generalization',
    'device': 'cuda:0',  # change this to cpu if you do not have a GPU
    'is_overfit': False,
    'batch_size': 64,
    'resume_ckpt': None,
    'learning_rate': 0.001,
    'max_epochs': 5,
    'print_every_n': 50,
    'validate_every_n': 1000,
}
train_3depn.main(config)  # should be able to get best_loss_val < 0.1 after a few hours and 5 epochs

Using device: cuda:0


  0%|          | 0/5 [00:00<?, ?it/s]

[000/00049] train_loss: 0.002619
[000/00099] train_loss: 0.001339
[000/00149] train_loss: 0.001171
[000/00199] train_loss: 0.001094
[000/00249] train_loss: 0.001018
[000/00299] train_loss: 0.000992
[000/00349] train_loss: 0.000907
[000/00399] train_loss: 0.000919
[000/00449] train_loss: 0.000918
[000/00499] train_loss: 0.000895
[000/00549] train_loss: 0.000870
[000/00599] train_loss: 0.000840
[000/00649] train_loss: 0.000810
[000/00699] train_loss: 0.000752
[000/00749] train_loss: 0.000745
[000/00799] train_loss: 0.000785
[000/00849] train_loss: 0.000763
[000/00899] train_loss: 0.000731
[000/00949] train_loss: 0.000707
[000/00999] train_loss: 0.000734
[000/00999] val_loss: 0.145077 | best_loss_val: 0.145077
[000/01049] train_loss: 0.001787
[000/01099] train_loss: 0.000983
[000/01149] train_loss: 0.000848
[000/01199] train_loss: 0.000775
[000/01249] train_loss: 0.000770
[000/01299] train_loss: 0.000725
[000/01349] train_loss: 0.000716
[000/01399] train_loss: 0.000664
[000/01449] train_l

 20%|██        | 1/5 [20:13<1:20:55, 1213.87s/it]

[000/02399] train_loss: 0.000820
[001/00049] train_loss: 0.002859
[001/00099] train_loss: 0.001193
[001/00149] train_loss: 0.001059
[001/00199] train_loss: 0.000945
[001/00249] train_loss: 0.000929
[001/00299] train_loss: 0.000855
[001/00349] train_loss: 0.000790
[001/00399] train_loss: 0.000771
[001/00449] train_loss: 0.000725
[001/00499] train_loss: 0.000707
[001/00549] train_loss: 0.000712
[001/00599] train_loss: 0.000681
[001/00599] val_loss: 0.138179 | best_loss_val: 0.123951
[001/00649] train_loss: 0.000669
[001/00699] train_loss: 0.000675
[001/00749] train_loss: 0.000664
[001/00799] train_loss: 0.000641
[001/00849] train_loss: 0.000688
[001/00899] train_loss: 0.000696
[001/00949] train_loss: 0.000649
[001/00999] train_loss: 0.000600
[001/01049] train_loss: 0.000621
[001/01099] train_loss: 0.000612
[001/01149] train_loss: 0.000613
[001/01199] train_loss: 0.000621
[001/01249] train_loss: 0.000590
[001/01299] train_loss: 0.000584
[001/01349] train_loss: 0.000598
[001/01399] train_l

 40%|████      | 2/5 [40:12<1:00:15, 1205.16s/it]

[002/00049] train_loss: 0.001476
[002/00099] train_loss: 0.000735
[002/00149] train_loss: 0.000668
[002/00199] train_loss: 0.000683
[002/00199] val_loss: 0.115288 | best_loss_val: 0.111944
[002/00249] train_loss: 0.000594
[002/00299] train_loss: 0.000621
[002/00349] train_loss: 0.000617
[002/00399] train_loss: 0.000576
[002/00449] train_loss: 0.000531
[002/00499] train_loss: 0.000533
[002/00549] train_loss: 0.000524
[002/00599] train_loss: 0.000552
[002/00649] train_loss: 0.000529
[002/00699] train_loss: 0.000517
[002/00749] train_loss: 0.000538
[002/00799] train_loss: 0.000526
[002/00849] train_loss: 0.000533
[002/00899] train_loss: 0.000507
[002/00949] train_loss: 0.000524
[002/00999] train_loss: 0.000517
[002/01049] train_loss: 0.000542
[002/01099] train_loss: 0.000530
[002/01149] train_loss: 0.000520
[002/01199] train_loss: 0.000518
[002/01199] val_loss: 0.093435 | best_loss_val: 0.093435
[002/01249] train_loss: 0.000509
[002/01299] train_loss: 0.000513
[002/01349] train_loss: 0.00

 60%|██████    | 3/5 [1:01:54<41:38, 1249.05s/it]

[002/02399] train_loss: 0.000605
[003/00049] train_loss: 0.001645
[003/00099] train_loss: 0.000859
[003/00149] train_loss: 0.000597
[003/00199] train_loss: 0.000565
[003/00249] train_loss: 0.000539
[003/00299] train_loss: 0.000539
[003/00349] train_loss: 0.000521
[003/00399] train_loss: 0.000513
[003/00449] train_loss: 0.000500
[003/00499] train_loss: 0.000509
[003/00549] train_loss: 0.000486
[003/00599] train_loss: 0.000479
[003/00649] train_loss: 0.000480
[003/00699] train_loss: 0.000479
[003/00749] train_loss: 0.000473
[003/00799] train_loss: 0.000524
[003/00799] val_loss: 0.095523 | best_loss_val: 0.093435
[003/00849] train_loss: 0.000511
[003/00899] train_loss: 0.000518
[003/00949] train_loss: 0.000499
[003/00999] train_loss: 0.000490
[003/01049] train_loss: 0.000501
[003/01099] train_loss: 0.000500
[003/01149] train_loss: 0.000462
[003/01199] train_loss: 0.000446
[003/01249] train_loss: 0.000474
[003/01299] train_loss: 0.000449
[003/01349] train_loss: 0.000455
[003/01399] train_l

 80%|████████  | 4/5 [1:21:56<20:30, 1230.54s/it]

[003/02399] train_loss: 0.000495
[004/00049] train_loss: 0.004614
[004/00099] train_loss: 0.001600
[004/00149] train_loss: 0.001422
[004/00199] train_loss: 0.001321
[004/00249] train_loss: 0.001311
[004/00299] train_loss: 0.001273
[004/00349] train_loss: 0.001199
[004/00399] train_loss: 0.001151
[004/00399] val_loss: 0.305355 | best_loss_val: 0.088596
[004/00449] train_loss: 0.001203
[004/00499] train_loss: 0.001188
[004/00549] train_loss: 0.001210
[004/00599] train_loss: 0.001093
[004/00649] train_loss: 0.001084
[004/00699] train_loss: 0.001079
[004/00749] train_loss: 0.001083
[004/00799] train_loss: 0.001064
[004/00849] train_loss: 0.001035
[004/00899] train_loss: 0.001054
[004/00949] train_loss: 0.001129
[004/00999] train_loss: 0.001022
[004/01049] train_loss: 0.000984
[004/01099] train_loss: 0.000963
[004/01149] train_loss: 0.001018
[004/01199] train_loss: 0.001046
[004/01249] train_loss: 0.000954
[004/01299] train_loss: 0.000999
[004/01349] train_loss: 0.000962
[004/01399] train_l

100%|██████████| 5/5 [1:43:23<00:00, 1240.76s/it]


### (f) Inference

Implement the missing bits in `exercise_3/inference/infer_3depn.py`. You should then be able to see your reconstructions below.

The outputs of our provided visualization functions are, from left to right:
- Input, partial shape
- Predicted completion
- Target shape

In [None]:
from exercise_3.util.visualization import visualize_meshes
from exercise_3.inference.infer_3depn import InferenceHandler3DEPN

# create a handler for inference using a trained checkpoint
inferer = InferenceHandler3DEPN('exercise_3/runs/3_1_3depn_generalization/model_best.ckpt')

In [None]:
input_sdf = ShapeNet.get_shape_sdf('03636649/b286c9c136784db2af1744fdb1fbe7df__0__')
target_df = ShapeNet.get_shape_df('03636649/b286c9c136784db2af1744fdb1fbe7df__0__')

input_mesh, reconstructed_mesh, target_mesh = inferer.infer_single(input_sdf, target_df)
visualize_meshes([input_mesh, reconstructed_mesh, target_mesh], flip_axes=True)

In [None]:
input_sdf = ShapeNet.get_shape_sdf('03636649/23eaba9bdd51a5b0dfe9cab879fd37e8__1__')
target_df = ShapeNet.get_shape_df('03636649/23eaba9bdd51a5b0dfe9cab879fd37e8__0__')

input_mesh, reconstructed_mesh, target_mesh = inferer.infer_single(input_sdf, target_df)
visualize_meshes([input_mesh, reconstructed_mesh, target_mesh], flip_axes=True)

In [None]:
input_sdf = ShapeNet.get_shape_sdf('02691156/5de2cc606b65b960e0b6546e08902f28__0__')
target_df = ShapeNet.get_shape_df('02691156/5de2cc606b65b960e0b6546e08902f28__0__')

input_mesh, reconstructed_mesh, target_mesh = inferer.infer_single(input_sdf, target_df)
visualize_meshes([input_mesh, reconstructed_mesh, target_mesh], flip_axes=True)

In [None]:
#%cd 3DML/E3
from google.colab import files
files.download("exercise_3/data/shapenet_dim32_sdf/02691156/5de2cc606b65b960e0b6546e08902f28__0__.sdf")
files.download("exercise_3/data/shapenet_dim32_df/02691156/5de2cc606b65b960e0b6546e08902f28__0__.df")


## 3.2 DeepSDF


Here, we will take a look at 3D-reconstruction using [DeepSDF](https://arxiv.org/abs/1901.05103). We recommend reading the paper before attempting the exercise.

DeepSDF is an auto-decoder based approach that learns a continuous SDF representation for a class of shapes. Once trained, it can be used for shape representation, interpolation and shape completion. We'll look at each of these
applications.

<img src="exercise_3/images/deepsdf_teaser.png" alt="deepsdf_teaser" style="width: 800px;"/>

During training, the auto-decoder optimizes both the network parameters and the latent codes representing each of the training shapes. Once trained, a shape is reconstructed from its SDF observations by optimizing a latent code while keeping the network parameters fixed, so that the decoded SDF matches the observed SDF values as closely as possible.

An advantage that implicit representations have over voxel/grid based approaches is that they are not tied to a particular grid resolution, and can be evaluated at any resolution once trained.
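
To illustrate this resolution independence, here is a minimal sketch that uses an analytic sphere SDF as a stand-in for a trained network. The names `sphere_sdf` and `evaluate_on_grid` are hypothetical and not part of the exercise code. The same function is simply queried on grids of two different resolutions:

```python
import math

def sphere_sdf(x, y, z, radius=0.5):
    """Signed distance to an origin-centered sphere (negative inside)."""
    return math.sqrt(x * x + y * y + z * z) - radius

def evaluate_on_grid(sdf_fn, resolution, lo=-1.0, hi=1.0):
    """Query sdf_fn at the nodes of a resolution^3 grid spanning [lo, hi]^3."""
    step = (hi - lo) / (resolution - 1)
    coords = [lo + i * step for i in range(resolution)]
    return [[[sdf_fn(x, y, z) for z in coords] for y in coords] for x in coords]

# the implicit function stays the same; only the query grid changes
coarse = evaluate_on_grid(sphere_sdf, 9)
fine = evaluate_on_grid(sphere_sdf, 17)
print(len(coarse), len(fine))
print(coarse[4][4][4], fine[8][8][8])  # both entries query the origin
```

With a trained DeepSDF model, `sphere_sdf` would be replaced by a forward pass of the decoder on the query point and the latent code.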

Similar to the previous exercise, we'll first download the processed dataset, look at the implementation of the dataset, the model, and the trainer, try out overfitting and generalization over the entire dataset, and finally run inference on unseen samples.

### (a) Downloading the data

Whereas volumetric models output entire 3D shape representations, implicit models like DeepSDF work on a per-point basis. The network takes in a 3D coordinate (together with the latent vector) and outputs the SDF value at the queried point. To train such a model, we therefore need, for each training shape, a set of points with their corresponding SDF values for supervision. Points are sampled more densely near the surface of the object, since we want to capture a more detailed SDF there. For those curious, data preparation is described in more detail in section 5 of the paper.
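
The idea of sampling more densely near the surface can be sketched on a toy shape. The snippet below is only an illustration on an analytic sphere: the noise scale `sigma`, the 900/100 split, and all function names are assumptions and not the values or code used to prepare the provided dataset.

```python
import math
import random

RADIUS = 0.5  # analytic sphere used as a toy shape

def sphere_sdf(p):
    """Signed distance of point p to the sphere (negative inside)."""
    return math.sqrt(sum(c * c for c in p)) - RADIUS

def sample_on_sphere(rng):
    """Uniform point on the sphere surface via a normalized Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-9:
            return [RADIUS * c / norm for c in v]

def sample_training_points(n_near, n_uniform, sigma=0.02, seed=0):
    """Mix surface points perturbed by small noise with uniform samples."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_near):
        s = sample_on_sphere(rng)
        points.append([c + rng.gauss(0.0, sigma) for c in s])
    for _ in range(n_uniform):
        points.append([rng.uniform(-1.0, 1.0) for _ in range(3)])
    return [(p, sphere_sdf(p)) for p in points]

samples = sample_training_points(n_near=900, n_uniform=100)
near = sum(1 for _, d in samples if abs(d) < 0.1)
print(f"{near} of {len(samples)} samples lie within 0.1 of the surface")
```

Because most samples land in a thin band around the surface, the supervision is concentrated where the zero level set, and therefore the reconstructed mesh, is determined.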

We'll be using the ShapeNet Sofa class for the experiments in this exercise. We've already prepared this data, so you don't need to deal with the preprocessing. For each shape, the following files are provided:
- `mesh.obj`: the mesh of the shape
- `sdf.npz`: a large number of points sampled on and around the mesh together with their SDF values; it contains numpy arrays under the keys "pos" and "neg", holding the points with positive and negative SDF values respectively

```
# contents of exercise_3/data/sdf_sofas
1faa4c299b93a3e5593ebeeedbff73b/                    # shape 0
    ├── mesh.obj                                    # shape 0 mesh
    ├── sdf.npz                                     # shape 0 sdf
    ├── surface.obj                                 # shape 0 surface
1fde48d83065ef5877a929f61fea4d0/                    # shape 1
1fe1411b6c8097acf008d8a3590fb522/                   # shape 2
:
```
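
As a rough sketch of how such an `sdf.npz` file can be read, the snippet below builds a tiny synthetic stand-in in memory and loads it with `np.load`. It assumes that every row of the "pos" and "neg" arrays has the layout `(x, y, z, sdf)`; this layout is an assumption that is not stated here, so check it against the docstrings in `exercise_3/data/shape_implicit.py` and the real files.

```python
import io
import numpy as np

# build a tiny synthetic stand-in for sdf.npz in memory
# ASSUMPTION: each row is (x, y, z, sdf); verify against the real files
rng = np.random.default_rng(0)
pos = np.hstack([rng.uniform(-1, 1, (5, 3)), rng.uniform(0.01, 0.2, (5, 1))])
neg = np.hstack([rng.uniform(-1, 1, (4, 3)), rng.uniform(-0.2, -0.01, (4, 1))])
buffer = io.BytesIO()
np.savez(buffer, pos=pos, neg=neg)
buffer.seek(0)

# np.load accepts a file path in the same way as this in-memory buffer
data = np.load(buffer)
samples = np.concatenate([data["pos"], data["neg"]], axis=0)
points, sdf = samples[:, :3], samples[:, 3:]
print(points.shape, sdf.shape)  # (9, 3) (9, 1)
```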
Download and extract the data with the code cell below.

In [None]:
!ls

exercise_3  exercise_3.ipynb  pyproject.toml  requirements.txt


In [None]:
print('Downloading ...')
# File sizes: ~10GB
!wget https://www.dropbox.com/s/4k5pw126nzus8ef/sdf_sofas.zip\?dl\=0 -O exercise_3/data/sdf_sofas.zip -P exercise_3/data

print('Extracting ...')
!unzip -q exercise_3/data/sdf_sofas.zip -d exercise_3/data
!rm exercise_3/data/sdf_sofas.zip

print('Done.')

Downloading ...
--2023-06-08 12:39:06--  https://www.dropbox.com/s/4k5pw126nzus8ef/sdf_sofas.zip?dl=0
Resolving www.dropbox.com (www.dropbox.com)... 162.125.65.18, 2620:100:601d:18::a27d:512
Connecting to www.dropbox.com (www.dropbox.com)|162.125.65.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /s/raw/4k5pw126nzus8ef/sdf_sofas.zip [following]
--2023-06-08 12:39:06--  https://www.dropbox.com/s/raw/4k5pw126nzus8ef/sdf_sofas.zip
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uce651a248af14b8d94ba1c090af.dl.dropboxusercontent.com/cd/0/inline/B9kW0kU3RNEjimHrslWRuS6a6-rxGSqK4RnTJk3KOBYgjrNgu3rjCdUxd7vRyPhOqBtSri_Jks7XPU4GNj3qZJHnwRWoYXUzGV5yqQzlX5dclCnNSJHxv_auVG6Qvnxac8uxFtxnDxWzrGegoHip_ykk-4CD-C81LoziHTZEo5KKlQ/file# [following]
--2023-06-08 12:39:07--  https://uce651a248af14b8d94ba1c090af.dl.dropboxusercontent.com/cd/0/inline/B9kW0kU3RNEjimHrslWRuS6a6-rxGSqK4RnTJk3KOBYgjrNgu3rjCd

### (b) Dataset

We provide a partial implementation of the dataset in `exercise_3/data/shape_implicit.py`.
Your task is to complete the `#TODOs` so that the dataset works as specified by the docstrings.

Once done, you can try running the following code blocks as sanity checks.

In [None]:
from exercise_3.data.shape_implicit import ShapeImplicit

num_points_to_samples = 40000
train_dataset = ShapeImplicit(num_points_to_samples, "train")
val_dataset = ShapeImplicit(num_points_to_samples, "val")
overfit_dataset = ShapeImplicit(num_points_to_samples, "overfit")

# len() calls the dataset's __len__ function
print(f'Length of train set: {len(train_dataset)}')  # expected output: 1226
print(f'Length of val set: {len(val_dataset)}')  # expected output: 137
print(f'Length of overfit set: {len(overfit_dataset)}')  # expected output: 1

Length of train set: 1226
Length of val set: 137
Length of overfit set: 1


Let's take a look at the points sampled for a particular shape.

In [None]:
from exercise_3.util.visualization import visualize_mesh, visualize_pointcloud
train_dataset = ShapeImplicit(num_points_to_samples, "train")

# index the dataset once: each access may draw a fresh random subset of points,
# so points and SDF values should come from the same sample
sample = train_dataset[0]
shape_id = sample['name']
print(shape_id, "SHAPE_ID")
points = sample['points']
sdf = sample['sdf']

# sampled points inside the shape
inside_points = points[sdf[:, 0] < 0, :].numpy()

# sampled points outside the shape
outside_points = points[sdf[:, 0] > 0, :].numpy()


print(inside_points.shape, "INSIDE")
print(outside_points.shape, "OUTSIDE")

7ae657b39aa2be68ccd1bcd57588acf8 SHAPE_ID
(20000, 3) INSIDE
(20000, 3) OUTSIDE


In [None]:
mesh = ShapeImplicit.get_mesh(shape_id)
print('Mesh')
visualize_mesh(mesh.vertices, mesh.faces, flip_axes=True)

Mesh




Output()

In [None]:
print('Sampled points with negative SDF (inside)')
visualize_pointcloud(inside_points, 0.025, flip_axes=True)

In [None]:
print('Sampled points with positive SDF (outside)')
visualize_pointcloud(outside_points, 0.025, flip_axes=True)

You'll notice that more points are sampled close to the surface than far away from it.

### (c) Model

The DeepSDF auto-decoder architecture is visualized below:

<img src="exercise_3/images/deepsdf_architecture.png" alt="deepsdf_arch" style="width: 640px;"/>

Things to note:

- The network takes in the latent code for a shape concatenated with the 3D query coordinate, making up a vector of length 259 (assuming a latent code length of 256).
- The network consists of a sequence of weight-normed linear layers, each followed by a ReLU and a dropout. For weight-norming a layer, check out `torch.nn.utils.weight_norm`. Each of these linear layers outputs a 512-dimensional vector, except the 4th layer, which outputs a 253-dimensional vector.
- The output of the 4th layer is concatenated with the network input (253 + 259 = 512), making the input to the 5th layer a 512-dimensional vector.
- The final layer is a simple linear layer without any norm, dropout, or non-linearity; its one-dimensional output represents the SDF value.
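
As a quick cross-check of these dimensions, the sketch below counts the parameters of the first four layers. It assumes that `torch.nn.utils.weight_norm` with its default settings stores a direction tensor of the same shape as the weight plus one magnitude scalar per output unit, next to the usual bias. The helper name is hypothetical.

```python
LATENT_SIZE = 256
INPUT_SIZE = LATENT_SIZE + 3  # latent code + xyz query point
HIDDEN = 512

def weight_normed_linear_params(n_in, n_out):
    """Direction (n_out x n_in) + magnitude (n_out) + bias (n_out)."""
    return n_in * n_out + n_out + n_out

# the 4th layer outputs HIDDEN - INPUT_SIZE features so that concatenating
# the network input again yields a HIDDEN-sized vector for the 5th layer
skip_out = HIDDEN - INPUT_SIZE
first_four = [
    weight_normed_linear_params(INPUT_SIZE, HIDDEN),
    weight_normed_linear_params(HIDDEN, HIDDEN),
    weight_normed_linear_params(HIDDEN, HIDDEN),
    weight_normed_linear_params(HIDDEN, skip_out),
]
print(skip_out, skip_out + INPUT_SIZE)
print(first_four, sum(first_four))
```

Comparing such hand-computed counts with the per-layer numbers printed by `summarize_model` is a cheap way to catch a wrong layer size or a missing weight norm.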

Implement this architecture in file `exercise_3/model/deepsdf.py`.

Here are some basic sanity tests once you're done with your implementation.

In [None]:
from exercise_3.model.deepsdf import DeepSDFDecoder
from exercise_3.util.model import summarize_model

deepsdf = DeepSDFDecoder(latent_size=256)
print(summarize_model(deepsdf))

# input to the network is a concatenation of point coordinates (3) and the latent code (256 in this example);
# here we use a batch of 4096 points
input_tensor = torch.randn(4096, 3 + 256)
predictions = deepsdf(input_tensor)

print('\nOutput tensor shape: ', predictions.shape)  # expected output: 4096, 1

num_trainable_params = sum(p.numel() for p in deepsdf.parameters() if p.requires_grad) / 1e6
print(f'\nNumber of trainable params: {num_trainable_params:.2f}M')  # expected output: ~1.8M

   | Name           | Type           | Params 
----------------------------------------------------
0  | first_half     | Sequential     | 790010 
1  | first_half.0   | Linear         | 133632 
2  | first_half.1   | ReLU           | 0      
3  | first_half.2   | Dropout        | 0      
4  | first_half.3   | Linear         | 263168 
5  | first_half.4   | ReLU           | 0      
6  | first_half.5   | Dropout        | 0      
7  | first_half.6   | Linear         | 263168 
8  | first_half.7   | ReLU           | 0      
9  | first_half.8   | Dropout        | 0      
10 | first_half.9   | Linear         | 130042 
11 | first_half.10  | ReLU           | 0      
12 | first_half.11  | Dropout        | 0      
13 | second_half    | Sequential     | 1053186
14 | second_half.0  | Linear         | 263168 
15 | second_half.1  | ReLU           | 0      
16 | second_half.2  | Dropout        | 0      
17 | second_half.3  | Linear         | 263168 
18 | second_half.4  | ReLU           | 0      
19 | se

### (d) Training script and overfitting to a single shape

Fill in the training script in `exercise_3/training/train_deepsdf.py`, and verify that your training works by overfitting to a single shape below.
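
For orientation, the DeepSDF paper trains with a clamped L1 loss between predicted and ground-truth SDF values plus an L2 penalty on the latent codes. The sketch below writes this out in plain Python. The clamp value `delta=0.1` follows the paper, while the exact form and scaling of the regularizer and all function names are assumptions, so follow the docstrings in `exercise_3/training/train_deepsdf.py` for the actual exercise.

```python
def clamp(value, delta):
    """Restrict value to the interval [-delta, delta]."""
    return max(-delta, min(delta, value))

def clamped_l1(predicted, target, delta=0.1):
    """Mean absolute difference after clamping both sides to [-delta, delta]."""
    diffs = [abs(clamp(p, delta) - clamp(t, delta)) for p, t in zip(predicted, target)]
    return sum(diffs) / len(diffs)

def code_regularizer(latent_code, lambda_code=0.0001):
    """Assumed form: lambda times the squared L2 norm of the latent code."""
    return lambda_code * sum(c * c for c in latent_code)

predicted = [0.30, -0.02, 0.05]
target = [0.50, -0.04, 0.01]
loss = clamped_l1(predicted, target) + code_regularizer([0.1, -0.2, 0.05])
print(f"{loss:.6f}")
```

Note how the first pair contributes nothing: both values exceed `delta`, so after clamping they are equal. This focuses the network capacity on the region near the surface.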

In [None]:
from exercise_3.training import train_deepsdf

overfit_config = {
    'experiment_name': '3_2_deepsdf_overfit',
    'device': 'cuda:0',  # change this to cpu if you do not have a GPU
    'is_overfit': True,
    'num_sample_points': 4096,
    'latent_code_length': 256,
    'batch_size': 1,
    'resume_ckpt': None,
    'learning_rate_model': 0.0005,
    'learning_rate_code': 0.001,
    'lambda_code_regularization': 0.0001,
    'max_epochs': 2000,
    'print_every_n': 50,
    'visualize_every_n': 250,
}

train_deepsdf.main(overfit_config)  # expected loss around 0.0062

Using device: cuda:0

[049/00000] train_loss: 0.035976
[099/00000] train_loss: 0.024810
[149/00000] train_loss: 0.018519
[199/00000] train_loss: 0.014604
[249/00000] train_loss: 0.012692
[299/00000] train_loss: 0.011477
[349/00000] train_loss: 0.010535
[399/00000] train_loss: 0.010194
[449/00000] train_loss: 0.009521
[499/00000] train_loss: 0.008915
[549/00000] train_loss: 0.008322
[599/00000] train_loss: 0.008141
[649/00000] train_loss: 0.007995
[699/00000] train_loss: 0.007767
[749/00000] train_loss: 0.007667
[799/00000] train_loss: 0.007564
[849/00000] train_loss: 0.007429
[899/00000] train_loss: 0.007331
[949/00000] train_loss: 0.007161
[999/00000] train_loss: 0.007102
[1049/00000] train_loss: 0.006851
[1099/00000] train_loss: 0.006840
[1149/00000] train_loss: 0.006752
[1199/00000] train_loss: 0.006688
[1249/00000] train_loss: 0.006598
[1299/00000] train_loss: 0.006643
[1349/00000] train_loss: 0.006573
[1399/00000] train_loss: 0.006516
[1449/00000] train_loss: 0.006458
[1499/00000] train_loss: 0.006442
[1549/00000] train_loss: 0.006347
[1599/00000] train_loss: 0.006323
[1649/00000] train_loss: 0.006255
[1699/00000] train_loss: 0.006258
[1749/00000] train_loss: 0.006237
[1799/00000] train_loss: 0.006170
[1849/00000] train_loss: 0.006167
[1899/00000] train_loss: 0.006175
[1949/00000] train_loss: 0.006159
[1999/00000] train_loss: 0.006091

100%|██████████| 2000/2000 [02:05<00:00, 15.89it/s]

Let's visualize the overfitted shape reconstruction to check if it looks reasonable.

In [None]:
# !ls
# from google.colab import files
# files.download("exercise_3/runs/3_2_deepsdf_overfit/meshes/01999_000.obj")


In [None]:
# Load and visualize GT mesh of the overfit sample
gt_mesh = ShapeImplicit.get_mesh('7e728818848f191bee7d178666aae23d')
print('GT')
visualize_mesh(gt_mesh.vertices, gt_mesh.faces, flip_axes=True)

# Load and visualize reconstructed overfit sample; it's okay if they don't look visually exact, since we don't run 
# the training too long and have a learning rate decay while training 
mesh_path = "exercise_3/runs/3_2_deepsdf_overfit/meshes/01999_000.obj"
overfit_output = trimesh.load(mesh_path)
print('Overfit')
visualize_mesh(overfit_output.vertices, overfit_output.faces, flip_axes=True)

GT




Output()

Overfit


Output()

### (e) Training over entire train set

Once overfitting works, we can train on the entire train set.

Note: This training will take a few hours on a GPU (took ~3 hrs for 500 epochs on our 2080Ti, which already gave decent results). Please make sure to start training early enough before the submission deadline.
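
Training time depends strongly on the hardware, so it is worth estimating the total duration from the first few epochs. A small helper for this (hypothetical, not part of the exercise code) could look as follows; the value of about 110 s per epoch is read off the progress bar of the run shown in this notebook, whereas 3 hours for 500 epochs corresponds to about 21.6 s per epoch.

```python
def estimate_hours(seconds_per_epoch, epochs):
    """Total training time in hours for a constant time per epoch."""
    return seconds_per_epoch * epochs / 3600.0

# ~110 s/epoch is taken from the progress bar of the run in this notebook
for epochs in (500, 2000):
    print(f"{epochs} epochs: ~{estimate_hours(110, epochs):.0f} h")
```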

In [None]:
from exercise_3.training import train_deepsdf

generalization_config = {
    'experiment_name': '3_2_deepsdf_generalization',
    'device': 'cuda:0',  # run this on a gpu for a reasonable training time
    'is_overfit': False,
    'num_sample_points': 4096, # you can adjust this such that the model fits on your gpu
    'latent_code_length': 256,
    'batch_size': 1,
    'resume_ckpt': None,
    'learning_rate_model': 0.0005,
    'learning_rate_code': 0.001,
    'lambda_code_regularization': 0.0001,
    'max_epochs': 2000,  # not necessary to run for 2000 epochs if you're short on time, at 500 epochs you should start to see reasonable results
    'print_every_n': 50,
    'visualize_every_n': 5000,
}

train_deepsdf.main(generalization_config)

Using device: cuda:0


  0%|          | 0/2000 [00:00<?, ?it/s]

[000/00049] train_loss: 0.037484
[000/00099] train_loss: 0.034493
[000/00149] train_loss: 0.035888
[000/00199] train_loss: 0.033578
[000/00249] train_loss: 0.032462
[000/00299] train_loss: 0.032524
[000/00349] train_loss: 0.032328
[000/00399] train_loss: 0.031745
[000/00449] train_loss: 0.033002
[000/00499] train_loss: 0.032122
[000/00549] train_loss: 0.032328
[000/00599] train_loss: 0.032662
[000/00649] train_loss: 0.033754
[000/00699] train_loss: 0.032155
[000/00749] train_loss: 0.031840
[000/00799] train_loss: 0.030911
[000/00849] train_loss: 0.032301
[000/00899] train_loss: 0.032196
[000/00949] train_loss: 0.032887
[000/00999] train_loss: 0.032258
[000/01049] train_loss: 0.031721
[000/01099] train_loss: 0.031450
[000/01149] train_loss: 0.031144
[000/01199] train_loss: 0.032503


  0%|          | 1/2000 [01:59<66:27:19, 119.68s/it]

[001/00023] train_loss: 0.030132
[001/00073] train_loss: 0.031243
[001/00123] train_loss: 0.031281
[001/00173] train_loss: 0.030552
[001/00223] train_loss: 0.032110
[001/00273] train_loss: 0.031024
[001/00323] train_loss: 0.030956
[001/00373] train_loss: 0.030882
[001/00423] train_loss: 0.031029
[001/00473] train_loss: 0.030829
[001/00523] train_loss: 0.030333
[001/00573] train_loss: 0.030893
[001/00623] train_loss: 0.030512
[001/00673] train_loss: 0.031160
[001/00723] train_loss: 0.030036
[001/00773] train_loss: 0.030761
[001/00823] train_loss: 0.029917
[001/00873] train_loss: 0.030677
[001/00923] train_loss: 0.030183
[001/00973] train_loss: 0.030827
[001/01023] train_loss: 0.030405
[001/01073] train_loss: 0.031306
[001/01123] train_loss: 0.030688
[001/01173] train_loss: 0.031321


  0%|          | 2/2000 [03:55<65:11:12, 117.45s/it]

[001/01223] train_loss: 0.031086
[002/00047] train_loss: 0.030270
[002/00097] train_loss: 0.029375
[002/00147] train_loss: 0.029553
[002/00197] train_loss: 0.029401
[002/00247] train_loss: 0.030019
[002/00297] train_loss: 0.029439
[002/00347] train_loss: 0.028992
[002/00397] train_loss: 0.029690
[002/00447] train_loss: 0.029885
[002/00497] train_loss: 0.030108
[002/00547] train_loss: 0.030238
[002/00597] train_loss: 0.029723
[002/00647] train_loss: 0.029397
[002/00697] train_loss: 0.029333
[002/00747] train_loss: 0.029508
[002/00797] train_loss: 0.029775
[002/00847] train_loss: 0.028508
[002/00897] train_loss: 0.028748
[002/00947] train_loss: 0.028625
[002/00997] train_loss: 0.028851
[002/01047] train_loss: 0.029825
[002/01097] train_loss: 0.028900
[002/01147] train_loss: 0.029131
[002/01197] train_loss: 0.027758


  0%|          | 3/2000 [05:51<64:45:53, 116.75s/it]

[003/00021] train_loss: 0.030295
[003/00071] train_loss: 0.028859
[003/00121] train_loss: 0.028304
[003/00171] train_loss: 0.028929
[003/00221] train_loss: 0.027478
[003/00271] train_loss: 0.028787
[003/00321] train_loss: 0.028084
[003/00371] train_loss: 0.027945
[003/00421] train_loss: 0.027264
[003/00471] train_loss: 0.027606
[003/00521] train_loss: 0.027811
[003/00571] train_loss: 0.027942
[003/00621] train_loss: 0.027360
[003/00671] train_loss: 0.027440
[003/00721] train_loss: 0.027809
[003/00771] train_loss: 0.027143
[003/00821] train_loss: 0.027926
[003/00871] train_loss: 0.027451
[003/00921] train_loss: 0.026784
[003/00971] train_loss: 0.028832
[003/01021] train_loss: 0.026890
[003/01071] train_loss: 0.027666
[003/01121] train_loss: 0.027037
[003/01171] train_loss: 0.026920
[003/01221] train_loss: 0.027050


  0%|          | 4/2000 [07:46<64:26:14, 116.22s/it]

[004/00045] train_loss: 0.027342
[004/00095] train_loss: 0.027301
[004/00145] train_loss: 0.026904
[004/00195] train_loss: 0.026581
[004/00245] train_loss: 0.026110
[004/00295] train_loss: 0.026457
[004/00345] train_loss: 0.027206
[004/00395] train_loss: 0.026749
[004/00445] train_loss: 0.026964
[004/00495] train_loss: 0.025967
[004/00545] train_loss: 0.025626
[004/00595] train_loss: 0.026526
[004/00645] train_loss: 0.025653
[004/00695] train_loss: 0.026161
[004/00745] train_loss: 0.026212
[004/00795] train_loss: 0.026504
[004/00845] train_loss: 0.025559
[004/00895] train_loss: 0.026748
[004/00945] train_loss: 0.025528
[004/00995] train_loss: 0.025397
[004/01045] train_loss: 0.025965
[004/01095] train_loss: 0.025896
[004/01145] train_loss: 0.026041
[004/01195] train_loss: 0.025605


  0%|          | 5/2000 [09:40<63:53:27, 115.29s/it]

[005/00019] train_loss: 0.027294
[005/00069] train_loss: 0.027043
[005/00119] train_loss: 0.025658
[005/00169] train_loss: 0.025653
[005/00219] train_loss: 0.024800
[005/00269] train_loss: 0.025797
[005/00319] train_loss: 0.025436
[005/00369] train_loss: 0.025418
[005/00419] train_loss: 0.025680
[005/00469] train_loss: 0.025492
[005/00519] train_loss: 0.024780
[005/00569] train_loss: 0.025006
[005/00619] train_loss: 0.024872
[005/00669] train_loss: 0.025293
[005/00719] train_loss: 0.024946
[005/00769] train_loss: 0.025124
[005/00819] train_loss: 0.024413
[005/00869] train_loss: 0.025283
[005/00919] train_loss: 0.024675
[005/00969] train_loss: 0.024992
[005/01019] train_loss: 0.024980
[005/01069] train_loss: 0.025355
[005/01119] train_loss: 0.025457
[005/01169] train_loss: 0.024908
[005/01219] train_loss: 0.024816


  0%|          | 6/2000 [11:34<63:34:00, 114.76s/it]

[006/00043] train_loss: 0.026624
[006/00093] train_loss: 0.025646
[006/00143] train_loss: 0.026086
[006/00193] train_loss: 0.024646
[006/00243] train_loss: 0.024455
[006/00293] train_loss: 0.024132
[006/00343] train_loss: 0.024092
[006/00393] train_loss: 0.024434
[006/00443] train_loss: 0.023644
[006/00493] train_loss: 0.024098
[006/00543] train_loss: 0.023998
[006/00593] train_loss: 0.024135
[006/00643] train_loss: 0.023003
[006/00693] train_loss: 0.023707
[006/00743] train_loss: 0.023149
[006/00793] train_loss: 0.023194
[006/00843] train_loss: 0.024413
[006/00893] train_loss: 0.024978
[006/00943] train_loss: 0.024075
[006/00993] train_loss: 0.024668
[006/01043] train_loss: 0.023936
[006/01093] train_loss: 0.023991
[006/01143] train_loss: 0.024260
[006/01193] train_loss: 0.022916


  0%|          | 7/2000 [13:24<62:46:53, 113.40s/it]

[007/00017] train_loss: 0.025372
[007/00067] train_loss: 0.024782
[007/00117] train_loss: 0.024163
[007/00167] train_loss: 0.023453
[007/00217] train_loss: 0.024925
[007/00267] train_loss: 0.023835
[007/00317] train_loss: 0.023557
[007/00367] train_loss: 0.023111
[007/00417] train_loss: 0.024011
[007/00467] train_loss: 0.023365
[007/00517] train_loss: 0.023566
[007/00567] train_loss: 0.023794
[007/00617] train_loss: 0.022549
[007/00667] train_loss: 0.024316
[007/00717] train_loss: 0.022746
[007/00767] train_loss: 0.022980
[007/00817] train_loss: 0.023215
[007/00867] train_loss: 0.023114
[007/00917] train_loss: 0.022450
[007/00967] train_loss: 0.022690
[007/01017] train_loss: 0.023297
[007/01067] train_loss: 0.023945
[007/01117] train_loss: 0.023812
[007/01167] train_loss: 0.022380
[007/01217] train_loss: 0.023929


  0%|          | 8/2000 [15:16<62:27:32, 112.88s/it]

[008/00041] train_loss: 0.025023
[008/00091] train_loss: 0.025048
[008/00141] train_loss: 0.022638
[008/00191] train_loss: 0.023440
[008/00241] train_loss: 0.022883
[008/00291] train_loss: 0.023160
[008/00341] train_loss: 0.022506
[008/00391] train_loss: 0.022485
[008/00441] train_loss: 0.022636
[008/00491] train_loss: 0.022450
[008/00541] train_loss: 0.022745
[008/00591] train_loss: 0.022993
[008/00641] train_loss: 0.022053
[008/00691] train_loss: 0.023085
[008/00741] train_loss: 0.022576
[008/00791] train_loss: 0.022505
[008/00841] train_loss: 0.023074
[008/00891] train_loss: 0.022624
[008/00941] train_loss: 0.022430
[008/00991] train_loss: 0.022330
[008/01041] train_loss: 0.022315
[008/01091] train_loss: 0.022420
[008/01141] train_loss: 0.023264
[008/01191] train_loss: 0.021930


  0%|          | 9/2000 [17:08<62:19:08, 112.68s/it]

[009/00015] train_loss: 0.022439
[009/00065] train_loss: 0.024648
[009/00115] train_loss: 0.023466
[009/00165] train_loss: 0.022858
[009/00215] train_loss: 0.024145
[009/00265] train_loss: 0.022661
[009/00315] train_loss: 0.023529
[009/00365] train_loss: 0.021644
[009/00415] train_loss: 0.023106
[009/00465] train_loss: 0.022787
[009/00515] train_loss: 0.022209
[009/00565] train_loss: 0.022006
[009/00615] train_loss: 0.022154
[009/00665] train_loss: 0.022107
[009/00715] train_loss: 0.022930
[009/00765] train_loss: 0.021915
[009/00815] train_loss: 0.022003
[009/00865] train_loss: 0.021836
[009/00915] train_loss: 0.021740
[009/00965] train_loss: 0.021192
[009/01015] train_loss: 0.021298
[009/01065] train_loss: 0.022617
[009/01115] train_loss: 0.021826
[009/01165] train_loss: 0.021926
[009/01215] train_loss: 0.022143


  0%|          | 10/2000 [18:55<61:10:20, 110.66s/it]

[010/00039] train_loss: 0.023943
[010/00089] train_loss: 0.023332
[010/00139] train_loss: 0.023046
[010/00189] train_loss: 0.021799
[010/00239] train_loss: 0.022759
[010/00289] train_loss: 0.022085
[010/00339] train_loss: 0.021643
[010/00389] train_loss: 0.020932
[010/00439] train_loss: 0.021944
[010/00489] train_loss: 0.021637
[010/00539] train_loss: 0.021146
[010/00589] train_loss: 0.022071
[010/00639] train_loss: 0.021285
[010/00689] train_loss: 0.021401
[010/00739] train_loss: 0.020975
[010/00789] train_loss: 0.021653
[010/00839] train_loss: 0.021135
[010/00889] train_loss: 0.021683
[010/00939] train_loss: 0.021984
[010/00989] train_loss: 0.020891
[010/01039] train_loss: 0.021897
[010/01089] train_loss: 0.021518
[010/01139] train_loss: 0.021186
[010/01189] train_loss: 0.021046


  1%|          | 11/2000 [20:42<60:36:35, 109.70s/it]

[011/00013] train_loss: 0.022791
[011/00063] train_loss: 0.024311
[011/00113] train_loss: 0.023707
[011/00163] train_loss: 0.022369
[011/00213] train_loss: 0.021326
[011/00263] train_loss: 0.021712
[011/00313] train_loss: 0.022896
[011/00363] train_loss: 0.021291
[011/00413] train_loss: 0.020347
[011/00463] train_loss: 0.021255
[011/00513] train_loss: 0.020767
[011/00563] train_loss: 0.021741
[011/00613] train_loss: 0.020557
[011/00663] train_loss: 0.021645
[011/00713] train_loss: 0.021967
[011/00763] train_loss: 0.020958
[011/00813] train_loss: 0.020675
[011/00863] train_loss: 0.020903
[011/00913] train_loss: 0.020928
[011/00963] train_loss: 0.020835
[011/01013] train_loss: 0.020883
[011/01063] train_loss: 0.021866
[011/01113] train_loss: 0.021290
[011/01163] train_loss: 0.021268
[011/01213] train_loss: 0.021590


  1%|          | 12/2000 [22:31<60:24:41, 109.40s/it]

[012/00037] train_loss: 0.022774
[012/00087] train_loss: 0.023092
[012/00137] train_loss: 0.022603
[012/00187] train_loss: 0.021636
[012/00237] train_loss: 0.020683
[012/00287] train_loss: 0.021091
[012/00337] train_loss: 0.021291
[012/00387] train_loss: 0.020731
[012/00437] train_loss: 0.020642
[012/00487] train_loss: 0.020954
[012/00537] train_loss: 0.020522
[012/00587] train_loss: 0.021522
[012/00637] train_loss: 0.021102
[012/00687] train_loss: 0.021053
[012/00737] train_loss: 0.021268
[012/00787] train_loss: 0.020915
[012/00837] train_loss: 0.020604
[012/00887] train_loss: 0.020694
[012/00937] train_loss: 0.020650
[012/00987] train_loss: 0.020869
[012/01037] train_loss: 0.020309
[012/01087] train_loss: 0.020312
[012/01137] train_loss: 0.020670
[012/01187] train_loss: 0.020936


  1%|          | 13/2000 [24:21<60:27:16, 109.53s/it]

[013/00011] train_loss: 0.021602


### (f) Inference using the trained model on observed SDF values

Fill in the inference script `exercise_3/inference/infer_deepsdf.py`. Note that inference is not simply a forward pass, but an optimization of the latent code that minimizes the error on the observed SDF values.
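
The principle can be shown on a deliberately tiny example. In the sketch below, the "decoder" is a fixed analytic function (the SDF of a sphere whose radius plays the role of a one-dimensional latent code), and only the code is updated by gradient descent. This is a toy under stated assumptions and not the exercise implementation: the real decoder is the trained network, the code has 256 dimensions, and the loss used there may differ from the squared error chosen here for its simple gradient.

```python
import math
import random

def decoder(latent, point):
    """Frozen toy decoder: SDF of an origin-centered sphere of radius `latent`."""
    return math.sqrt(sum(c * c for c in point)) - latent

# observations: SDF samples of a shape whose true code (radius) is 0.7
rng = random.Random(0)
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(256)]
observed = [decoder(0.7, p) for p in points]

latent = 0.0          # initial code; the decoder itself is never updated
learning_rate = 0.1
for _ in range(200):
    # gradient of the mean squared error with respect to the latent code;
    # d decoder / d latent = -1 for this toy decoder
    residuals = [decoder(latent, p) - s for p, s in zip(points, observed)]
    grad = -2.0 * sum(residuals) / len(residuals)
    latent -= learning_rate * grad

print(f"recovered code: {latent:.4f}")
```

In the actual inference script, the same loop structure applies, with an optimizer updating the latent vector while the decoder weights stay frozen.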

In [None]:
from exercise_3.inference.infer_deepsdf import InferenceHandlerDeepSDF

device = torch.device('cuda:0')  # change this to cpu if you're not using a gpu

inference_handler = InferenceHandlerDeepSDF(256, "exercise_3/runs/3_2_deepsdf_generalization", device)

First, we try inference on a shape from the validation set for which we have a complete observation of SDF values. This is an easier problem than shape completion, since all the information is already in the input.

Let's visualize the observations.

In [None]:
# get observed data
points, sdf = ShapeImplicit.get_all_sdf_samples("b351e06f5826444c19fb4103277a6b93")

inside_points = points[sdf[:, 0] < 0, :].numpy()
outside_points = points[sdf[:, 0] > 0, :].numpy()

# visualize observed points; you'll observe that the observations are very complete
print('Observations with negative SDF (inside)')
visualize_pointcloud(inside_points, 0.025, flip_axes=True)
print('Observations with positive SDF (outside)')
visualize_pointcloud(outside_points, 0.025, flip_axes=True)

Reconstruction on these observations with the trained model:

In [None]:
# reconstruct
vertices, faces = inference_handler.reconstruct(points, sdf, 800)
# visualize
visualize_mesh(vertices, faces, flip_axes=True)

Next, we can try the shape completion task, i.e., inference on a shape from the validation set for which we do not have a complete observation of SDF values. The observed points are visualized below:

In [None]:
# get observed data
points, sdf = ShapeImplicit.get_all_sdf_samples("b351e06f5826444c19fb4103277a6b93_incomplete")

inside_points = points[sdf[:, 0] < 0, :].numpy()
outside_points = points[sdf[:, 0] > 0, :].numpy()

# visualize observed points; you'll observe that the observations are incomplete,
# making this a shape completion task
print('Observations with negative SDF (inside)')
visualize_pointcloud(inside_points, 0.025, flip_axes=True)
print('Observations with positive SDF (outside)')
visualize_pointcloud(outside_points, 0.025, flip_axes=True)

Shape completion using the trained model:

In [None]:
# reconstruct
vertices, faces = inference_handler.reconstruct(points, sdf, 800)
# visualize
visualize_mesh(vertices, faces, flip_axes=True)

### (g) Latent space interpolation

The latent space learned by DeepSDF is interpolatable, meaning that decoding latent codes from this space produces meaningful shapes. Given two latent codes, a linearly interpolatable latent space will decode each of the intermediate codes on the line between them to some valid shape. Let's see if this holds for our trained model.
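
Linear interpolation between two codes can be sketched as follows. The function name and the choice of returning `num_steps + 1` codes including both endpoints are assumptions made for this illustration; the interface expected in `infer_deepsdf.py` is defined by its docstrings.

```python
def interpolate_codes(code_a, code_b, num_steps):
    """Return num_steps + 1 codes moving linearly from code_a to code_b."""
    codes = []
    for i in range(num_steps + 1):
        t = i / num_steps  # interpolation weight in [0, 1]
        codes.append([(1.0 - t) * a + t * b for a, b in zip(code_a, code_b)])
    return codes

codes = interpolate_codes([0.0, 1.0, -2.0], [1.0, 3.0, 2.0], num_steps=4)
for code in codes:
    print(code)
```

Each interpolated code would then be decoded into a mesh in the same way as an optimized code during reconstruction.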

We'll pick two shapes from the train set as visualized below.

In [None]:
from exercise_3.data.shape_implicit import ShapeImplicit
from exercise_3.util.visualization import visualize_mesh

mesh = ShapeImplicit.get_mesh("494fe53da65650b8c358765b76c296")
print('GT Shape A')
visualize_mesh(mesh.vertices, mesh.faces, flip_axes=True)

mesh = ShapeImplicit.get_mesh("5ca1ef55ff5f68501921e7a85cf9da35")
print('GT Shape B')
visualize_mesh(mesh.vertices, mesh.faces, flip_axes=True)

Implement the missing parts in `exercise_3/inference/infer_deepsdf.py` such that it interpolates two given latent vectors, and run the code fragment below once done.

In [None]:
from exercise_3.inference.infer_deepsdf import InferenceHandlerDeepSDF

inference_handler = InferenceHandlerDeepSDF(256, "exercise_3/runs/3_2_deepsdf_generalization", torch.device('cuda:0'))
# interpolate; also exports interpolated meshes to disk
inference_handler.interpolate('494fe53da65650b8c358765b76c296', '5ca1ef55ff5f68501921e7a85cf9da35', 60)

Visualize the interpolation below. If everything works out correctly, you should see a smooth transformation between the shapes, with all intermediate shapes being valid sofas.

In [None]:
from pathlib import Path

from exercise_3.util.mesh_collection_to_gif import meshes_to_gif
from exercise_3.util.misc import show_gif

# create list of meshes (just exported) to be visualized:
# keep files whose second name field is 0 and sort them by their first name field
interpolation_dir = Path("exercise_3/runs/3_2_deepsdf_generalization/interpolation")
mesh_paths = sorted(
    [x for x in interpolation_dir.iterdir() if int(x.name.split('.')[0].split("_")[1]) == 0],
    key=lambda x: int(x.name.split('.')[0].split("_")[0]),
)
# play the sequence forwards and then backwards
mesh_paths = mesh_paths + mesh_paths[::-1]

# create a visualization of the interpolation process
meshes_to_gif(mesh_paths, "exercise_3/runs/3_2_deepsdf_generalization/latent_interp.gif", 20)
show_gif("exercise_3/runs/3_2_deepsdf_generalization/latent_interp.gif")

## Submission

This is the end of exercise 3 🙂. Please create a zip containing all files we provided, everything you modified, your visualization images/gif (no need to submit generated OBJs), and your checkpoints. Name it with your matriculation number(s) as described in exercise 1. Make sure this notebook can be run without problems. Then, submit via Moodle.

**Note**: The maximum submission file size limit for Moodle is 100M. You do not need to submit your overfitting checkpoints; however, the generalization checkpoint will be >200M. The easiest way to still be able to submit that one is to split it with zip like this: `zip -s 100M model_best.ckpt.zip model_best.ckpt` which creates a `.zip` and a `.z01`. You can then submit both files alongside another zip containing all your code and outputs.

**Submission Deadline**: 13.06.2023, 23:55

## References

[1] Dai, Angela, Charles Ruizhongtai Qi, and Matthias Nießner. "Shape Completion Using 3D-Encoder-Predictor CNNs and Shape Synthesis." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.

[2] Park, Jeong Joon, et al. "DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.