How are camera translations scaled? #5

Closed
NagabhushanSN95 opened this issue Apr 26, 2022 · 2 comments
Comments

@NagabhushanSN95

Hi,

I have a dataset of images with known camera intrinsics and extrinsics (poses), so I'm trying to generate the transforms_train.json file without running COLMAP and the rest of the pipeline. To do this, I'm trying to work out how transforms_train.json is created from the COLMAP sparse reconstruction for the ScanNet scenes. I figured out the relation between the rotation matrices; however, I can't work out how the translation is scaled. I found a different scaling factor for each of the 5 scenes. I tried to understand the C++ code that generates the transforms_train.json file, but I'm not familiar with C++ and couldn't figure it out.
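For anyone in the same situation, here is a minimal sketch of turning known extrinsics into a NeRF-style camera-to-world matrix. It assumes the given extrinsics are world-to-camera in the OpenCV convention (x right, y down, z forward) and that the target file expects OpenGL-style camera axes (x right, y up, z backward), as the original NeRF's transforms_*.json files do. Whether this repository's transforms_train.json uses exactly this convention is an assumption to check against the definition linked in the reply; `w2c_to_nerf_c2w` is a hypothetical helper name.

```python
import numpy as np


def w2c_to_nerf_c2w(R_w2c, t_w2c):
    """Convert an OpenCV-style world-to-camera pose (R, t) into a 4x4
    camera-to-world matrix with OpenGL-style camera axes.

    Assumption: the target transforms file uses the OpenGL convention
    (x right, y up, z backward), as in the original NeRF.
    """
    R_w2c = np.asarray(R_w2c, dtype=np.float64)
    t_w2c = np.asarray(t_w2c, dtype=np.float64).reshape(3)

    c2w = np.eye(4)
    # Inverse of a rigid transform: rotation R^T, camera center -R^T t.
    c2w[:3, :3] = R_w2c.T
    c2w[:3, 3] = -R_w2c.T @ t_w2c

    # Flip the camera's y and z axes (OpenCV -> OpenGL). Right-multiplying
    # negates rotation columns 1 and 2 and leaves the camera center unchanged.
    return c2w @ np.diag([1.0, -1.0, -1.0, 1.0])
```

Only the rotation part depends on the axis convention; the translation column is the camera center in world units, which is the quantity any scene scaling acts on.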

Can you please tell me how to compute the scaling factor for the camera translation?

PS: I also noticed that, unlike the original NeRF, which scales the translations so that the nearest depth becomes unity, you do not apply such scaling here.
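For comparison with the PS, here is a minimal sketch of the uniform rescaling described for the original NeRF: multiply all translations and depths by one factor so that the smallest near depth becomes 1. The original NeRF LLFF loader (`load_llff.py`) adds a margin on top of this through its `bd_factor` argument (0.75 by default), so its nearest bound lands at 1/0.75 rather than exactly 1; the `bd_factor` parameter below mimics that and defaults to 1.0. The function name `scale_to_unit_near` is hypothetical.

```python
import numpy as np


def scale_to_unit_near(c2w_poses, near_depths, bd_factor=1.0):
    """Uniformly rescale a scene so its smallest near depth becomes 1 / bd_factor.

    c2w_poses:   (N, 4, 4) camera-to-world matrices.
    near_depths: depths in the same units as the pose translations.
    Returns (scaled_poses, scaled_depths, scale); inputs are not modified.
    """
    poses = np.array(c2w_poses, dtype=np.float64)  # np.array copies the input
    depths = np.asarray(near_depths, dtype=np.float64)

    scale = 1.0 / (depths.min() * bd_factor)
    # A uniform scaling of the world leaves orientations unchanged,
    # so only the translation column (and the depths) are multiplied.
    poses[:, :3, 3] *= scale
    return poses, depths * scale, scale
```

Because the rotations are untouched, any such convention only differs from another by a single scalar per scene, which is why the translations are the only part that needs a scale factor.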

@barbararoessle
Owner

Here is an explanation of the content of transforms_train.json and the definition of the transformation matrix. Does this answer your question?

In the preprocessing of the ScanNet SfM reconstructions, we apply a scaling to the reconstruction (sparse depth and poses), because SfM reconstructions are only defined up to a scale factor. This scaling makes it possible to compute depth metrics against the ScanNet sensor depth.
Scaling of the sparse depth and poses is needed only when:

  • you have ground truth depth to evaluate against with different scaling than the camera poses and sparse depth, or
  • the depth prior network was trained on a different range of sparse depth.
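The reply does not say how the scale factor itself is computed. One common way to align an up-to-scale SfM reconstruction with sensor depth, sketched here purely as an assumption, is to take the median ratio between sensor depth and SfM sparse depth over pixels where both are valid, then multiply every camera translation and every sparse depth value by that same factor. The helper names `estimate_depth_scale` and `apply_scale` and the median estimator are illustrative, not necessarily what this repository's preprocessing does.

```python
import numpy as np


def estimate_depth_scale(sfm_depth, sensor_depth):
    """Estimate one factor s such that s * sfm_depth ~ sensor_depth.

    Assumption: a median-ratio estimator; the thread does not state which
    estimator the repository's preprocessing actually uses.
    Pixels with depth <= 0 are treated as missing in either input.
    """
    sfm = np.asarray(sfm_depth, dtype=np.float64).ravel()
    gt = np.asarray(sensor_depth, dtype=np.float64).ravel()
    valid = (sfm > 0) & (gt > 0)
    if not np.any(valid):
        raise ValueError("no pixels with both SfM and sensor depth")
    return float(np.median(gt[valid] / sfm[valid]))


def apply_scale(c2w_poses, sparse_depths, s):
    """Multiply camera translations and sparse depth by the same factor s,
    keeping poses and depth geometrically consistent. Rotations are unchanged."""
    poses = np.array(c2w_poses, dtype=np.float64)  # copy, input untouched
    poses[:, :3, 3] *= s
    return poses, np.asarray(sparse_depths, dtype=np.float64) * s
```

Since one SfM reconstruction has a single global scale, the ratios would be pooled over all frames of a scene rather than estimated per image. Each scene having its own reconstruction is consistent with seeing a different factor for each of the 5 scenes.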

@NagabhushanSN95
Author

Thanks, Barbara! This helps :)
