Questions about the equations in the paper #2

Closed
sunshineatnoon opened this issue Nov 27, 2023 · 5 comments

@sunshineatnoon

Hi, thanks for uploading this awesome work. The results are very cool and interesting!

I have a few, maybe naive, questions about the equations in the paper and hope we can discuss them.

  • I do not fully understand why eq. (4) holds.
  • For eq. (6), my high-level understanding is that instead of opacity, you choose to regularize the SDF, so eq. (6) serves as a conversion from opacity to SDF? How did you come to this particular formulation?
  • For Fig. 5, why can we compute f(p) by computing the difference between the depth of p's projection and the true depth of p?

Looking forward to your reply!

@Anttwo
Owner

Anttwo commented Nov 28, 2023

Hi sunshineatnoon,

Thank you so much for your kind words, I really appreciate it!
No problem, I love questions!
Here are some answers that could help you:


Q: I do not fully understand why eq. (4) holds.

Equation (4) comes from the approximation we make in the ideal case where Gaussians would be 'flat'. Flat means that for each Gaussian $g$, one of the scaling factors would be much smaller than the others. Let's denote this scaling factor by $s_g$.

Then, if we write the full inverse covariance matrix $\Sigma_g^{-1} = R_g S_g^{-1} S_g^{-1} R_g^T$ using the rotation matrix $R_g$ and the inverse of the diagonal scaling matrix $S_g$ (as the authors of the original Gaussian Splatting paper did), we see that we can approximate $S_g^{-1}$ as a diagonal matrix with only one nonzero value on the diagonal (the other two being much smaller). This nonzero value would be $\frac{1}{s_g}$; let's say it is the $i$-th value on the diagonal, for some $i \in \{1, 2, 3\}$.

Therefore, multiplying $(p - \mu_g)$ by $S_g^{-1} R_g^T$ is approximately equivalent to projecting $(p - \mu_g)$ onto the $i$-th column of $R_g$ (seen as a vector) and scaling the result by $\frac{1}{s_g}$. Since $(p - \mu_g)^T \Sigma_g^{-1} (p - \mu_g) = \| S_g^{-1} R_g^T (p - \mu_g) \|^2$, and since the $i$-th column of $R_g$ is precisely the scaling axis $n_g$ associated with $s_g$, Equation (4) holds true.
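
The argument above can be checked numerically. The following is a minimal sketch, assuming Equation (4) reads $(p - \mu_g)^T \Sigma_g^{-1} (p - \mu_g) \approx \frac{1}{s_g^2} \langle p - \mu_g, n_g \rangle^2$; the scale values, the test point, and the helper `random_rotation` are illustrative choices, not taken from the paper or the code base.

```python
import numpy as np

def random_rotation(rng):
    """Hypothetical helper: build a proper rotation matrix from a QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    # Flip one column if needed so that det(q) = +1.
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
R = random_rotation(rng)

# Illustrative scales (assumption): the first axis is the "flat" one.
scales = np.array([1e-3, 0.1, 0.5])
i = int(np.argmin(scales))
s_g, n_g = scales[i], R[:, i]

mu = np.array([0.3, -1.2, 2.0])
# Offset of 0.005 along each local axis of the Gaussian, mapped to world space.
local_offset = np.array([0.005, 0.005, 0.005])
p = mu + R @ local_offset

# Exact quadratic form with the full inverse covariance R S^-1 S^-1 R^T.
S_inv = np.diag(1.0 / scales)
sigma_inv = R @ S_inv @ S_inv @ R.T
exact = (p - mu) @ sigma_inv @ (p - mu)

# Flat-Gaussian approximation: keep only the term along n_g.
approx = np.dot(p - mu, n_g) ** 2 / s_g ** 2

print("exact:", exact, "approx:", approx, "relative error:", abs(exact - approx) / exact)
```

With these values the term along $n_g$ contributes 25, while the two in-plane terms together add only about 0.0026. Note that the approximation is tight only when the offset along $n_g$, measured in units of $s_g$, dominates the in-plane offsets measured in units of the two larger scales; a point far from $\mu_g$ within the plane of the Gaussian would not satisfy it.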


Q: For eq. (6), my high-level understanding is that instead of opacity, you choose to regularize the SDF, so eq. (6) serves as a conversion from opacity to SDF? How did you come to this particular formulation?

Exactly, we noticed that converting the density function to a distance function better regularizes the reconstruction. Actually, directly using $|d - \bar{d}|$ to regularize already works pretty well for extracting good-looking meshes, especially for foreground objects. However, using the distance function gives better quantitative results as it also better regularizes the background.

The intuition is the following: in Equation 5 (which presents the density function in the ideal case of flat Gaussians well spread on the surface), the scalar product inside the exponential is actually equal to the (signed) distance between the 3D point $p$ and the plane passing through the center of the 3D Gaussian $g$ with normal $n_g$.

The vector $n_g$ is the scaling axis associated with the smallest scaling factor of the Gaussian $g$, so if the Gaussian is flat, its smallest scaling factor should be close to 0, and it is then very intuitive to consider $n_g$ as the normal of the surface.

Finally, Equation 6 is simply the inverse formula of Equation 5. If Equation 5 gives something like $d(p) = h(\langle p-\mu_g, n_g \rangle)$, then Equation 6 provides $\langle p-\mu_g, n_g \rangle = h^{-1}(d(p))$.

The equations hold true in the ideal case where $d = \bar{d}$. However, during optimization, we use the real, non-ideal density $d$ to compute $f$, following Equation 6. This encourages the density to converge toward the ideal density function in a non-destructive way.
I just submitted an update on arXiv that better clarifies this point.
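
The inversion described above can be sketched in a few lines. This is a minimal illustration, assuming the ideal density of Equation 5 has the form $\exp\left(-\frac{1}{2 s_g^2} \langle p - \mu_g, n_g \rangle^2\right)$, which is what Equation (4) as described in this thread suggests; the function names `ideal_density` and `density_to_distance` and the value of `s_g` are hypothetical.

```python
import numpy as np

def ideal_density(signed_dist, s_g):
    """Assumed form of Equation 5: density of a flat Gaussian at a given distance to its plane."""
    return np.exp(-signed_dist ** 2 / (2.0 * s_g ** 2))

def density_to_distance(density, s_g, eps=1e-12):
    """Inverse of the function above: recovers the unsigned distance from the density."""
    # Clip to (0, 1] so that the logarithm stays finite and non-positive.
    density = np.clip(density, eps, 1.0)
    return s_g * np.sqrt(-2.0 * np.log(density))

s_g = 0.01  # illustrative smallest scaling factor (assumption)
dist = np.array([-0.02, -0.005, 0.0, 0.005, 0.02])

d = ideal_density(dist, s_g)
recovered = density_to_distance(d, s_g)

print("density:  ", d)
print("recovered:", recovered)
```

Because the distance is squared inside the exponential, the inversion can only recover its absolute value; the sign has to come from elsewhere.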


Q: For Fig. 5, why can we compute f(p) by computing the difference between the depth of p's projection and the true depth of p?

The estimator $\hat{f}(p)$ is a rough approximation of what would be the real SDF associated with the current scene. But this approximation makes sense in the context of "splatted" depth maps. Let me explain.

For a camera $c$ during optimization, we compute a depth map using the Gaussian Splatting rasterizer. This depth map is not perfectly accurate, as Gaussians are converted to flat splats facing the camera during rasterization. But still, it is a good approximation: we suppose that the depth map describes well the surface of the scene, as seen by the current camera.

Let's consider a point $p$ sampled using the product of all Gaussian distributions (for Gaussians inside the field of view of $c$); since most Gaussians are very small, this point $p$ is likely to be located near the real surface of the scene. Consequently, the SDF value of $p$, i.e. its distance to the surface, should be equal to the distance between $p$ and the surface observed in the depth map, i.e. the distance between $p$ and the surface point that is the closest to $p$; let's call this point $q$. To approximate this distance $|p-q|$, we choose as $q$ the 3D point that corresponds to the projection of $p$ in the depth map.

Why do we do that?
Because the depth map is "splatted", the surface observed in the depth map is approximately composed of small surface elements facing the camera (in practice the Gaussian functions smooth things, but still).
Therefore, the point $q$, which is the surface point that is the closest to $p$, is likely to be the point located on the same ray/line of sight as $p$, i.e. the point that has the same projection as $p$ in the depth map.

This last point is a little tricky, but still, you should just see all this as a regularization tool on the density that allows for involving depth regularization (as we compute $\hat{f}$ using the depth). It also encourages the Gaussians not only to align with the surface, but also to face the camera poses on average, which is a useful prior for regularizing the background.
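
The estimator described above can be sketched as follows. This is a minimal illustration rather than the actual implementation: it assumes a pinhole camera with points already expressed in the camera frame ($z$ forward), a nearest-pixel lookup, and a sign convention where values are positive in front of the observed surface. The function name `depth_sdf_estimate` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
import numpy as np

def depth_sdf_estimate(points_cam, depth_map, fx, fy, cx, cy):
    """Estimate f_hat(p) as (depth map value at the projection of p) - (depth of p).

    points_cam: (N, 3) array of points in the camera frame, z pointing forward.
    depth_map:  (H, W) array of depths rendered for the same camera.
    Returns an (N,) array; points behind the camera or outside the image get NaN.
    """
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    h, w = depth_map.shape

    # Avoid dividing by non-positive depths; those points are masked out below.
    z_safe = np.where(z > 0, z, 1.0)
    u = np.round(fx * x / z_safe + cx).astype(int)  # pixel column
    v = np.round(fy * y / z_safe + cy).astype(int)  # pixel row

    valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    est = np.full(len(points_cam), np.nan)
    # Assumed sign convention: positive when p lies in front of the observed surface.
    est[valid] = depth_map[v[valid], u[valid]] - z[valid]
    return est

# Toy scene (assumption): a fronto-parallel plane at depth 2 seen by a 4x4 camera.
depth_map = np.full((4, 4), 2.0)
points = np.array([
    [0.0, 0.0, 1.9],   # slightly in front of the plane
    [0.0, 0.0, 2.3],   # slightly behind the plane
    [0.0, 0.0, -1.0],  # behind the camera
])
est = depth_sdf_estimate(points, depth_map, fx=1.0, fy=1.0, cx=1.5, cy=1.5)
print(est)
```

In this toy scene the surface faces the camera, so the difference of depths along the line of sight coincides with the true distance to the plane, which is the situation the "splatted" depth map argument relies on.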


I hope my message provides the answers you need!
If not, of course, feel free to ask additional questions.

Best!

@yuedajiong

yuedajiong commented Nov 29, 2023

Sigma_g = R_g S_g^{-1} S_g^{-1} R_g^T

Sigma_g = R_g S_g S_g^T R_g^T ?

@Anttwo
Owner

Anttwo commented Nov 29, 2023

Oops, sorry, the correct formula is indeed $\Sigma_g = R_g S_g S_g^T R_g^T$.
It's just that inside the Gaussian function, we use the inverse of the covariance matrix, which is $\Sigma_g^{-1} = R_g S_g^{-1} S_g^{-1} R_g^T$.
I replaced $\Sigma_g$ with $\Sigma_g^{-1}$ in my previous message.
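
As a quick sanity check of the two formulas, here is a minimal sketch with an arbitrary rotation about the $z$ axis and illustrative scale values (both are assumptions made for the example):

```python
import numpy as np

theta = 0.3  # arbitrary rotation angle about the z axis (assumption)
R = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
S = np.diag([0.01, 0.1, 0.5])  # illustrative scaling factors

# Covariance and the claimed inverse covariance.
sigma = R @ S @ S.T @ R.T
S_inv = np.diag(1.0 / np.diag(S))
sigma_inv = R @ S_inv @ S_inv @ R.T

# Their product should be the identity up to floating-point error.
print(np.allclose(sigma @ sigma_inv, np.eye(3)))
```

The check works because $R_g$ is orthogonal ($R_g^T R_g = I$) and $S_g$ is diagonal, so $S_g^T = S_g$.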

Anttwo closed this as completed Nov 30, 2023

@YingJiang96

Thanks for the awesome work!

Could you please provide more details about how to compute the depth map using the Gaussian Splatting rasterizer?

Thank you very much!

@zParquet

@Anttwo thanks for your explanation! I have another question about eq. 4.
I understand that "flat" means that for each Gaussian $g$, one of the scaling factors would be much smaller than the others. However, why can we regard the reciprocals of the other two scales as 0?
For example, if $S_g=[[1e-6,0,0],[0,0.1,0],[0,0,0.5]]$, then $S_g^{-1}=[[1e6,0,0],[0,10,0],[0,0,2]]$. Although one inverse scale is much larger than the other two, $S_g^{-1}$ cannot be regarded as a diagonal matrix with only one nonzero value. How should I understand this? Or are there any other scaling operations before using $S_g^{-1}$? Thanks!
