a question about your paper #12

Open
yuyu0927 opened this issue Nov 2, 2023 · 4 comments

Comments

@yuyu0927
yuyu0927 commented Nov 2, 2023

Hi,

Thanks for your brilliant work! I have a small question about your work.

In Figure 3 (Coordinate Systems for Estimating Camera Translation) of your paper, I am confused about the translation coordinates. Taking the right one, 'Look-at Centered', as an example, the world origin is set at the unique point closest to the optical axes of all cameras. I can understand why T1 = [0,0,1], but why is T2 = [0,0,2]? I would expect [0,1.5,1.5], based on the relative position between the left camera and the world origin. Am I wrong? Could you please explain this? Thanks in advance!

@yuyu0927 yuyu0927 closed this as completed Nov 2, 2023
@jasonyzhang
Collaborator

Hi,

The translation of a camera is defined as the position of the world origin relative to the camera center, expressed in that camera's coordinate system. Camera 2 is rotated 45 degrees so that the origin lies along its optical axis. Thus, its translation is [0, 0, 2], since the origin is 2 units away.
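A minimal sketch of this definition, assuming the world-to-camera convention `x_cam = R @ x_world + t` (the thread does not state the convention, so this is an assumption). The 45-degree rotation about the x-axis and the helper names (`matvec`, `rot_x`, `camera_center`) are hypothetical choices for illustration only. It shows that the translation is the origin as seen from the camera, which differs from the camera's position in world coordinates.

```python
import math

def matvec(M, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(M[r][k] * v[k] for k in range(3)) for r in range(3)]

def transpose(M):
    return [[M[c][r] for c in range(3)] for r in range(3)]

def rot_x(deg):
    """Rotation about the x-axis (hypothetical choice of axis)."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def world_to_camera(R, t, x_world):
    """Assumed convention: x_cam = R @ x_world + t."""
    rx = matvec(R, x_world)
    return [rx[k] + t[k] for k in range(3)]

def camera_center(R, t):
    """Camera position in world coordinates: C = -R^T @ t."""
    return [-v for v in matvec(transpose(R), t)]

R2 = rot_x(45.0)
t2 = [0.0, 0.0, 2.0]  # translation of camera 2 in the Look-at Centered frame

# The world origin, seen from camera 2, is exactly the translation.
origin_in_cam2 = world_to_camera(R2, t2, [0.0, 0.0, 0.0])

# The camera's own position in the world is a different vector of the same length.
center2 = camera_center(R2, t2)
distance = math.sqrt(sum(v * v for v in center2))

print("origin in camera-2 coordinates:", origin_in_cam2)
print("camera-2 center in world coordinates:", center2)
print("distance from camera 2 to origin:", distance)
```

Under these assumptions the camera center comes out at roughly (0, -1.41, -1.41), while the translation stays [0, 0, 2]: the off-axis components show up in the camera's world position, not in its translation.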

@yuyu0927
Author

yuyu0927 commented Nov 4, 2023

Hi,

Thanks for your explanation! But I still have two questions about some details in your paper.

  1. ti_hat = s(ti - Ri@c), which appears just above equation (3). Here c is the new world origin (the object center). I understand that the ground-truth T has to be converted to a new T in the Look-at Centered coordinate system, but I think the sign in this equation should be '+' rather than '-'. In Figure 3, T in the First Camera Frame represents the ground-truth T, so to convert it to T in Look-at Centered we need to add R2@[0,0,1]. I am not sure about this and hope you can explain it.

  2. Ideally, do the optical axes of all cameras intersect at the 'object center'? How do you account for errors here? Sometimes the axes meet at different points that are not the object center. When the target is very large and each camera captures only part of it, it is hard to aim all cameras at the same object center. In that case, is it better to use 'First Camera Frame' rather than 'Look-at Centered'?

Looking forward to your reply! Thanks a lot!

@yuyu0927 yuyu0927 reopened this Nov 4, 2023
@jasonyzhang
Collaborator

  1. It's a minus because we have defined it that way: the green camera is on the same side of the object as the orange camera. We could also have placed it on the other side of the object, in which case it would be [0, 0, 2] + R @ [0, 0, 1]. If you want to verify that the sign is correct, consider the case where R is the identity matrix (i.e., the orange camera is directly behind the green camera).

  2. You're right that the intersection point may not lie at the object center. In fact, the optical axes may not intersect at all. In practice, we treat the point that is closest to all the optical axes as a proxy for where the object is roughly located. We believe this is a reasonable assumption for center-facing object capture setups, but you are correct that it is only an assumption and probably does not hold for general scenes, or even for some CO3D sequences (e.g., sequences in which the camera mostly pans). Note that any choice of world origin is mathematically valid. It therefore does not matter whether the object is actually located at this center; the choice just serves as a useful inductive bias, in that translations take the form (≈0, ≈0, scale).
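One way to compute "the point that is closest to all the optical axes" is a least-squares fit, sketched below. This is a generic formulation and not necessarily what the repository does: each axis is a line with a point `o` and a direction `d`, and the returned point minimizes the sum of squared perpendicular distances to all lines. The function names are hypothetical, and the example lines are made up.

```python
import math

def det3(M):
    """Determinant of a 3x3 matrix."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def solve3(A, b):
    """Solve A @ p = b for a 3x3 system using Cramer's rule."""
    D = det3(A)
    if abs(D) < 1e-12:
        raise ValueError("lines are (nearly) parallel; no unique closest point")
    p = []
    for k in range(3):
        Ak = [[b[r] if c == k else A[r][c] for c in range(3)] for r in range(3)]
        p.append(det3(Ak) / D)
    return p

def closest_point_to_lines(origins, directions):
    """Point minimizing the sum of squared distances to lines o_i + s * d_i.

    With P_i = I - u_i u_i^T (u_i the unit direction), the normal equations
    are (sum_i P_i) @ p = sum_i P_i @ o_i.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        norm = math.sqrt(sum(v * v for v in d))
        u = [v / norm for v in d]
        for r in range(3):
            for c in range(3):
                P_rc = (1.0 if r == c else 0.0) - u[r] * u[c]
                A[r][c] += P_rc
                b[r] += P_rc * o[c]
    return solve3(A, b)

# Three made-up axes that all pass through (1, 2, 3).
meeting = closest_point_to_lines(
    origins=[[0.0, 2.0, 3.0], [1.0, 0.0, 3.0], [1.0, 2.0, 0.0]],
    directions=[[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]],
)

# Two skew axes that never intersect: the result lies halfway between them.
skew = closest_point_to_lines(
    origins=[[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    directions=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
)

print("axes meeting at one point:", meeting)
print("skew axes:", skew)
```

For camera poses, each line would start at a camera center and point along that camera's viewing direction. The fit stays well defined when the axes do not intersect, but fails when all axes are parallel, which is roughly the pure-panning case mentioned above.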

@yuyu0927
Author

yuyu0927 commented Nov 6, 2023

Hi,

For question 1, I understand the formulas in Figure 3; it should be minus there. What I don't understand is ti_hat = s(ti - Ri@c). Here ti is the ground truth from SfM, so ti is T2 in First Camera Frame, and ti_hat is the converted translation, which corresponds to T2 in Look-at Centered. I would expect (T2 in First Camera Frame) + R2@[0, 0, 1] = (T2 in Look-at Centered), that is, ti + Ri@c = ti_hat, ignoring 's'. That is why I think it should be '+' instead of '-' in ti_hat = s(ti - Ri@c). Please let me know where I am wrong. Thanks!
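The arithmetic in this comment can be checked numerically under one explicit convention. The sketch assumes `x_cam = R @ x_world + t`, takes `c` to be the new origin expressed in the old world frame, and uses the Figure 3 values as they are described in this thread (T1 = [0, 0, 0] and T2 = [0, 0, 2] - R2 @ [0, 0, 1] in the First Camera Frame). The 45-degree rotation about the x-axis and all helper names are hypothetical. It does not settle how the paper defines `c`.

```python
import math

def matvec(M, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(M[r][k] * v[k] for k in range(3)) for r in range(3)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def rot_x(deg):
    """Rotation about the x-axis (hypothetical choice of axis)."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def recenter(R, t, c):
    """Translation after moving the world origin to c (old-frame coordinates).

    With x_old = x_new + c:
        x_cam = R @ (x_new + c) + t = R @ x_new + (t + R @ c)
    """
    return add(t, matvec(R, c))

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R2 = rot_x(45.0)
c = [0.0, 0.0, 1.0]  # object center in the First Camera Frame (assumed)

# First Camera Frame translations, as described in the thread.
t1_first = [0.0, 0.0, 0.0]
t2_first = sub([0.0, 0.0, 2.0], matvec(R2, c))

# Re-centered translations under the stated convention.
t1_look = recenter(IDENTITY, t1_first, c)
t2_look = recenter(R2, t2_first, c)

# Cross-check: the new translation is where camera 2 already saw the point c.
c_in_cam2 = add(matvec(R2, c), t2_first)

print("T1 in Look-at Centered:", t1_look)
print("T2 in Look-at Centered:", t2_look)
print("c seen from camera 2:  ", c_in_cam2)
```

With these assumptions the update reproduces T1 = [0, 0, 1] and T2 = [0, 0, 2] up to floating-point rounding. The sign in front of `R @ c` depends on what `c` denotes: if `c` is instead the old origin expressed in the new frame (the negative of the `c` used here), the same update reads `t - R @ c`.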

BTW, I read the code of your project. When processing the GT, you compute the intersection point over all cameras and reset the translations accordingly. From this point of view, the formula above is not very important, because it only describes the transformation between two coordinate systems. Am I right?
