# Data preparation

You can download the prepared data [here](https://drive.google.com/drive/folders/1AE_mohNpxRg3JXoBX2oiN-8xaMcGYYuX?usp=sharing).

## Methodology
My key idea here was to create a dataset of `[index, (xyz)] -> sdf` mappings,
which would then be tokenized by index at runtime.

Runtime tokenization lets us train a torch `Embedding` layer that holds one representation per training shape. For the test set, we would just freeze the decoder and fine-tune the corresponding `Embedding` layer for the test shapes, which gives us their representations.
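
A minimal sketch of that setup, assuming a DeepSDF-style auto-decoder. The names `ShapeDecoder` and `prepare_for_test_shapes`, the latent size, the layer widths, and the learning rate are all hypothetical choices for illustration, not the ones used in this project.

```python
import torch
from torch import nn


class ShapeDecoder(nn.Module):
    """Hypothetical auto-decoder: one learned latent code per shape index."""

    def __init__(self, n_shapes: int, latent_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.codes = nn.Embedding(n_shapes, latent_dim)  # per-shape representation
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, idx: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
        # idx: (N,) long shape indices, xyz: (N, 3) query points -> (N,) SDF values
        z = self.codes(idx)
        return self.decoder(torch.cat([z, xyz], dim=-1)).squeeze(-1)


def prepare_for_test_shapes(model: ShapeDecoder, n_test_shapes: int) -> torch.optim.Optimizer:
    """Freeze the decoder and attach a fresh embedding table for unseen shapes."""
    for p in model.decoder.parameters():
        p.requires_grad_(False)
    latent_dim = model.codes.embedding_dim
    model.codes = nn.Embedding(n_test_shapes, latent_dim)
    # Only the new codes are optimized; the decoder weights stay fixed.
    return torch.optim.Adam(model.codes.parameters(), lr=1e-3)


model = ShapeDecoder(n_shapes=10)
optimizer = prepare_for_test_shapes(model, n_test_shapes=2)
idx = torch.tensor([0, 0, 1])   # which test shape each point belongs to
xyz = torch.rand(3, 3)          # query coordinates
pred = model(idx, xyz)          # predicted SDF, one value per point
```

Swapping the embedding table instead of retraining the whole model is what makes the test-time step cheap: only `n_test_shapes * latent_dim` parameters are updated.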

For a generalizable model, we need to sample points both inside the shape's surface and at arbitrary locations inside the unit sphere. That was the main challenge I faced in this module, besides calculating the correct SDF itself.
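
A rough sketch of the two kinds of samples, written with NumPy. It assumes surface points are already available as an `(N, 3)` array and uses Gaussian jitter around them, which is borrowed from DeepSDF-style sampling rather than taken from this project. The helper names and the noise scale `sigma` are hypothetical.

```python
import numpy as np


def sample_unit_ball(n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n points uniformly from the inside of the unit sphere."""
    directions = rng.normal(size=(n, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # The cube root makes the density uniform in volume rather than in radius.
    radii = rng.uniform(size=(n, 1)) ** (1.0 / 3.0)
    return directions * radii


def sample_near_surface(surface_points: np.ndarray, sigma: float,
                        rng: np.random.Generator) -> np.ndarray:
    """Jitter surface points with Gaussian noise so samples land on both sides."""
    return surface_points + rng.normal(scale=sigma, size=surface_points.shape)


rng = np.random.default_rng(0)
uniform_pts = sample_unit_ball(1000, rng)

# Stand-in surface: points on a sphere of radius 0.5 (a real mesh would go here).
surface = 0.5 * uniform_pts / np.linalg.norm(uniform_pts, axis=1, keepdims=True)
near_pts = sample_near_surface(surface, sigma=0.01, rng=rng)

# Final query set: a mix of uniform and near-surface points.
query = np.concatenate([uniform_pts, near_pts])
```

The ratio between the two groups is the knob that matters here: too few near-surface points blur fine detail, too few uniform points leave the decoder unconstrained far from the shape.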

## Explored methods and their challenges

### Self-written solver for sdf calculation
That's what I initially thought of when I received the test task, and it sounded exciting. But then I re-read it and found the word "representation", which led me to a DeepSDF-style approach. Anyways,
#### Rewards
- Should be pretty flexible and accurate
- Sounds like fun and I love digging into the math, projective geometry and stuff

#### Challenges
- Both research and implementation are time-consuming
- May be very slow in pure Python, and I don't feel like fighting JIT compilers this time around

### Taking the `DeepSDF` preprocessing
Could be easy, but for God's sake I hate C++ libraries and their dependencies. x2 pain if you're working on an M1 MacBook, like I am.
#### Rewards
- Very easy

#### Challenges
- Badly written documentation and preprocessing code, which slowed down my attempts
- A billion C++ dependencies, each with a billion dependencies of its own, which drove me nuts at some point

### `sample_sdf_near_surface` function from [`mesh-to-sdf`](https://github.com/marian42/mesh_to_sdf#sample-sdf-points-non-uniformly-near-the-surface) package
Initially, I was glad to find this package, as it could have made my life a ton easier, but... it never worked correctly for me. The SDF values were very noisy and sometimes just absurd. I tried it both on my M1 MacBook and on my Ubuntu laptops. I spent at least a couple of days trying to make it work, with no success.

#### Rewards
- Could be easy to use and fast to implement

#### Challenges
- The OpenGL engine doesn't work well on M1 MacBooks, crashing every time after 10 renderer calls
- The OpenGL engine doesn't work well on Ubuntu either, producing badly corrupted ray-traced SDF results
- It's fairly slow (150 models with 100k sampling points each took almost 2 hours for me)
- Zero flexibility in choosing the split between uniform and surface points
- Bad in-shape sampling, with less than 0.1% of the sampled points landing inside the shape
- Again, the results were just always wrong for me, resulting in very poor model performance

### Tricky `sample_sdf_near_surface` application
I tried sampling 500k points with the function, keeping all the negative-SDF points and subsampling the positive-SDF ones. But no luck :(

#### Rewards
- Could at least work

#### Challenges
- Scaling up the number of sampled points slows the pipeline down even more
- Incorrect SDF values, again
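
The keep-negatives, subsample-positives trick can be sketched as follows. This is an illustration on toy data with an analytic sphere SDF, not the original pipeline; the helper `rebalance_by_sign` and its parameters are hypothetical.

```python
import numpy as np


def rebalance_by_sign(points: np.ndarray, sdf: np.ndarray, n_positive: int,
                      rng: np.random.Generator):
    """Keep every inside (SDF < 0) point and a random subset of the other points."""
    inside = np.flatnonzero(sdf < 0)
    outside = np.flatnonzero(sdf >= 0)
    n_keep = min(n_positive, outside.size)
    keep_outside = rng.choice(outside, size=n_keep, replace=False)
    keep = np.concatenate([inside, keep_outside])
    return points[keep], sdf[keep]


rng = np.random.default_rng(0)

# Toy data: a small sphere (radius 0.3) so that only a few samples fall inside.
points = rng.uniform(-1.0, 1.0, size=(5000, 3))
sdf = np.linalg.norm(points, axis=1) - 0.3

balanced_points, balanced_sdf = rebalance_by_sign(points, sdf, n_positive=200, rng=rng)
```

This only changes the inside/outside ratio. It cannot repair SDF values that are already wrong, which is why the approach did not help here.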

### Aggressive surface sampling with [`pysdf`](https://github.com/sxyu/sdf) package
After days of struggling with `mesh-to-sdf`, I started looking for alternatives and found this library. Tired of fighting to make things work, I considered very tight and aggressive sampling of the shape. Not that easy, though:

#### Rewards
- Pretty accurate results
- Very fast and lightweight, since it's a pure C++ package with Python bindings

#### Challenges
- Only surface sampling
- Produces outliers on very sparse point clouds

### Points sampling with `mesh-to-sdf` and SDF calculation with `pysdf`

#### Rewards
- Was supposed to be accurate

#### Challenges
- Again, `mesh-to-sdf` didn't work for me, producing low-quality sampling inside the shape


## Final solution
The guy who published the `mesh-to-sdf` package also released a [71GB archive of the preprocessed data](https://github.com/marian42/shapegan#data-preparation), with surprisingly okay-ish sampling. After spending 3-4 days doing the preprocessing myself, I decided to just take his work.

#### Rewards
- Almost plug-and-play solution, with okay quality

#### Challenges
- I don't have that much spare disk space, so I had to download the archive on my second machine, split it, and send it to my main laptop
- Very slow download speed ._.

# Anyways, here's an example of the final preprocessed model (run the cells below)
Blue => point is inside the shape => SDF < 0

Red => point is outside the shape => SDF > 0

In [1]:
import torch
import numpy as np
import pyrender

In [2]:
idx = torch.load('./processed_data/train_idx.pt')    # shape index of every point
points = torch.load('./processed_data/train_X.pt')   # xyz coordinates of every point
sdf = torch.load('./processed_data/train_y.pt')      # SDF value of every point

In [3]:
N_sample = 0  # index of the training shape to visualize
points_vis = points[idx == N_sample]  # keep only the points of that shape
sdf_vis = sdf[idx == N_sample]

In [4]:
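colors = np.zeros(points_vis.shape)  # one RGB triple per point, black by default
colors[sdf_vis < 0, 2] = 1  # blue channel: inside the shape
colors[sdf_vis > 0, 0] = 1  # red channel: outside the shape
cloud = pyrender.Mesh.from_points(points_vis, colors=colors)
scene = pyrender.Scene()
scene.add(cloud)
viewer = pyrender.Viewer(scene, use_raymond_lighting=True, point_size=2)  # opens an interactive window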