
refactor homogeneous transforms module #79

Merged
merged 9 commits into master from fix/relative_pose on Feb 28, 2019

Conversation

@edgarriba (Member) commented Feb 21, 2019

This PR tries to fulfill the feature request in #74

Implements:

  • relative_transformation
  • inverse_transformation
  • compose_transformations.

Moves transform_points from the conversions module to transformations.

@edgarriba edgarriba added the wip 🛠️ Work in progress label Feb 21, 2019
@edgarriba (Member Author)

@versatran01 this is a first proposal for #74.
Take a look at it and share your thoughts.

@versatran01 commented Feb 21, 2019

@edgarriba Thanks for your effort. I have a few suggestions.

  1. eps should default to 0. I don't know why the PyTorch people suggest it, but if you add eps to a rotation matrix, it is no longer a rotation matrix (det R != 1), so I'm strongly against that. In fact, eps should be removed; if there is a numerical issue, the user should take care of it instead of the function.

  2. The shape check in each function should allow both (Bx4x4) and (4x4) tensors.
    As a result, in inverse_transformation

rmat_21: torch.Tensor = torch.transpose(rmat_12, 1, 2)

should become

torch.transpose(rmat_12, -1, -2)
  3. In compose_transformations, the parameters should be named T_0_1 and T_1_2 to emphasize that they are composable, and the return value should be named T_0_2.

  4. In relative_transformation, the frames of each transformation should be clearly stated, as opposed to just trans1 and trans2. For example, one could require these two transformations to be expressed with respect to a common frame, so the function signature should be

def relative_transformation(T_0_1, T_0_2) -> T_1_2:
    return (T_0_1)^-1 * T_0_2 = T_1_0 * T_0_2 = T_1_2

Then this can be implemented using inverse and compose instead of having its own implementation (a minimal sketch follows below).
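A minimal sketch of what these two suggestions could look like (the batching details here are an assumption for illustration, not the code merged in this PR):

```python
import torch


def inverse_transformation(trans_12: torch.Tensor) -> torch.Tensor:
    # Accept both (4, 4) and (B, 4, 4) inputs by indexing from the end.
    if trans_12.shape[-2:] != (4, 4):
        raise ValueError("expected a (4, 4) or (B, 4, 4) tensor")
    rmat_12 = trans_12[..., :3, :3]              # rotation block
    tvec_12 = trans_12[..., :3, 3:4]             # translation column
    rmat_21 = torch.transpose(rmat_12, -1, -2)   # R^T, no eps added
    tvec_21 = -torch.matmul(rmat_21, tvec_12)    # -R^T t
    trans_21 = torch.zeros_like(trans_12)
    trans_21[..., :3, :3] = rmat_21
    trans_21[..., :3, 3:4] = tvec_21
    trans_21[..., 3, 3] = 1.0
    return trans_21


def compose_transformations(trans_01: torch.Tensor, trans_12: torch.Tensor) -> torch.Tensor:
    # T_0_2 = T_0_1 * T_1_2, read right to left: frame 2 -> frame 1 -> frame 0.
    return torch.matmul(trans_01, trans_12)


def relative_transformation(trans_01: torch.Tensor, trans_02: torch.Tensor) -> torch.Tensor:
    # T_1_2 = (T_0_1)^-1 * T_0_2, built from inverse and compose.
    return compose_transformations(inverse_transformation(trans_01), trans_02)
```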

@codecov bot commented Feb 25, 2019

Codecov Report

Merging #79 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #79      +/-   ##
==========================================
+ Coverage   99.28%   99.28%   +<.01%     
==========================================
  Files          21       21              
  Lines         976      985       +9     
==========================================
+ Hits          969      978       +9     
  Misses          7        7
Impacted Files Coverage Δ
torchgeometry/core/conversions.py 100% <100%> (ø) ⬆️
torchgeometry/core/homography_warper.py 100% <100%> (ø) ⬆️
torchgeometry/core/transformations.py 100% <100%> (ø) ⬆️
torchgeometry/core/depth_warper.py 100% <100%> (ø) ⬆️
torchgeometry/core/pinhole.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a1c25b1...f8f1949. Read the comment docs.

@edgarriba changed the title from "first commit to refactor homogeneous transforms module" to "refactor homogeneous transforms module" on Feb 26, 2019
@edgarriba (Member Author) commented Feb 26, 2019

@versatran01 @tiancheng-zhi could you do a quick review before I merge? These are the basics; we can fix/improve later.

@versatran01 left a comment

I don't think it's necessary to convert to homogeneous coordinates in transform_points. Just do R*p + t. Also, that eps in the conversion from homogeneous coordinates should go.
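A minimal sketch of that direct form (the shapes and the function name are assumptions for illustration; this is not the transform_points merged in this PR):

```python
import torch


def transform_points_rt(trans_01: torch.Tensor, points_1: torch.Tensor) -> torch.Tensor:
    # p_0 = R * p_1 + t, without building homogeneous coordinates.
    rmat_01 = trans_01[..., :3, :3]   # (B, 3, 3)
    tvec_01 = trans_01[..., :3, 3]    # (B, 3)
    # Row-vector points: (B, N, 3) @ R^T, then broadcast the translation.
    return points_1 @ rmat_01.transpose(-1, -2) + tvec_01.unsqueeze(-2)
```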

@tiancheng-zhi

@edgarriba The current implementation contains a bug. I tested DepthWarper and found it cannot warp the image correctly. I'm not sure where the bug is yet (maybe in DepthWarper, maybe in other functions). I'm investigating this and will come back later.

@tiancheng-zhi commented Feb 27, 2019

@edgarriba @versatran01 I found the bug: from my point of view, trans_02 = compose_transformation(trans_01, trans_12) should equal trans_02 = torch.matmul(trans_12, trans_01). Right? But the current code does not behave this way.

@versatran01

@tiancheng-zhi I don't think that's the bug. My guess (I haven't looked at the code) is that before, the code could have been using extrinsics, which are T_c_w; with the refactor, all transformations are poses, which means they are T_w_c, so some care has to be taken when using these new functions.

@tiancheng-zhi

I assume the extrinsic matrix in DepthWarper is "world-to-camera" rather than "camera-to-world". Looking at how DepthWarper calls relative_transformation, I guess the main issue is the definition of "trans_AB". Does it mean the matrix that transforms a point from the view of A to the view of B? If so, then what @versatran01 mentioned:

def relative_transformation(T_0_1, T_0_2) -> T_1_2:
return (T_0_1)^-1 * T_0_2 = T_1_0 * T_0_2 = T_1_2

is incorrect. The correct one should be:

def relative_transformation(T_0_1, T_0_2) -> T_1_2:
return T_0_2 * (T_0_1)^-1

@tiancheng-zhi

@edgarriba @versatran01
I guess there are several things that need to be clarified:

  1. Are camera extrinsics "world-to-camera"?
  2. What is meant by the transformation trans_AB? Is trans_AB a matrix that transforms a point in frame A to frame B, i.e., X_B = trans_AB * X_A?
  3. What is the definition of compose_transformation?
    Since X_B = trans_AB * X_A,
    and similarly X_C = trans_AC * X_A and X_C = trans_BC * X_B,
    should trans_AC = trans_BC * trans_AB?
  4. What is the definition of relative_transformation? Should the relative transformation between trans_AB and trans_AC be trans_BC = trans_AC * (trans_AB)^-1?

It looks like all of the problems come from unclear definitions.

@versatran01 commented Feb 27, 2019

@tiancheng-zhi
As for the definition of T_0_1:
p_0 = T_0_1 * p_1
This is a convention in robotics/CV.

  1. Extrinsics are world-to-camera, which is T_c_w.
  2. T_A_B is B to A, so X_A = T_A_B * X_B.
  3. You got the convention wrong; if you use the correct one, you will see that they are implemented correctly.
  4. relative_transformation is a very bad name, but a common definition is
    def relative_transformation(T_0_1, T_0_2) -> T_1_2:
    I would use the names boxplus and boxminus to replace compose and relative.

Read more here
http://paulfurgale.info/news/2014/6/9/representing-robot-pose-the-good-the-bad-and-the-ugly
Also this one
https://www.amazon.com/Robot-Modeling-Control-Mark-Spong/dp/0471649902
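To keep the convention straight, here is a tiny sanity-check sketch (illustrative only; the rt helper below is hypothetical, not part of the library):

```python
import torch


def rt(angle: float, tx: float) -> torch.Tensor:
    """Build a planar 4x4 transform: rotation about z plus translation along x."""
    c, s = torch.cos(torch.tensor(angle)), torch.sin(torch.tensor(angle))
    T = torch.eye(4)
    T[0, 0], T[0, 1], T[1, 0], T[1, 1] = c, -s, s, c
    T[0, 3] = tx
    return T


T_0_1 = rt(0.3, 1.0)   # pose of frame 1 expressed in frame 0
T_1_2 = rt(-0.2, 2.0)  # pose of frame 2 expressed in frame 1

# Composition reads right to left: a point in frame 2 goes to frame 1, then to frame 0.
T_0_2 = T_0_1 @ T_1_2
p_2 = torch.tensor([0.5, -1.0, 0.0, 1.0])
assert torch.allclose(T_0_2 @ p_2, T_0_1 @ (T_1_2 @ p_2))

# Relative transformation under this convention: T_1_2 = (T_0_1)^-1 * T_0_2.
assert torch.allclose(torch.inverse(T_0_1) @ T_0_2, T_1_2, atol=1e-6)
```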

@tiancheng-zhi commented Feb 27, 2019

@versatran01 Thanks very much for the clarification.
@edgarriba According to this convention, should line 84 in depth_warper.py be "relative_transformation(inverse_transformation(extrinsics_dst), inverse_transformation(extrinsics_src))" rather than "relative_transformation(extrinsics_src, extrinsics_dst)"?
(Note that there are two changes: 1. added inverse (converting world-to-camera to camera-to-world); 2. swapped src and dst.)

@edgarriba (Member Author)

Hey, thanks for the review. Just a few things:
@versatran01

  1. Regarding transform_points: it's a generalized function that converts points using different types of transformations, such as homographies and homogeneous transforms, so we cannot always assume there's an R and t. In the case of a homography, we need the whole transformation. I'd keep it as it is. Note that I fixed the epsilon thing in the homogeneous conversion.
  2. Regarding the signature names: it's the first time I've seen this convention (boxplus and boxminus), but if you think changing it would make the whole thing more readable, let's do it. I suggest doing it in a new PR before the end of this week, since I want to do the v0.1.2 release as soon as possible.

@tiancheng-zhi the convention I was using is the one that @versatran01 points out. Just to make it clear again: the transformation from B to A == T_A_B == trans_ab. We read it from right to left, and to make the code more readable I prefer to name the transformation variables trans_ rather than T_.

Regarding the issue you mention in DepthWarper: do the tests fail on your machine? I'm trying it, and it seems to produce the expected output. What's more, adding your fix seems to produce the same output.

@versatran01 @tiancheng-zhi Besides, I would like to hear your opinion about the DepthWarper API, since I'm still not convinced by the way it works. I adapted it from old code we had that accepts a bunch of pinhole cameras in the constructor; however, in practice I'm trying this pattern in a project and it doesn't scale at all. I'm thinking of going back to accepting a single PinholeCamera object instead of a list of them.

@tiancheng-zhi

@edgarriba Please see https://github.com/tiancheng-zhi/reproduce_bug to reproduce the bug. The images, depth map, and poses are taken from the ScanNet dataset.

About the DepthWarper API, I feel there's no need to input a list of cameras. If users want to do so, they can make it a batch or use a for-loop themselves.

@versatran01 commented Feb 27, 2019

@edgarriba

  1. I agree with you, so just leave it that way.
  2. As for the notation boxplus/boxminus, see this: https://arxiv.org/abs/1107.1119, which is from https://frc.ri.cmu.edu/~hpm/project.archive/reference.file/Smith,Self&Cheeseman.pdf

I assume this code is taken and adapted from sfmlearner-pytorch?
In my own implementation, I use inverse depth instead of depth, which handles points at infinity and is also more numerically stable (no divide by 0 anymore, hence no eps needed). My function signature looks like this:

warp(images: [B3HW], idepths: [B1HW], cameras: [B4], poses [B44]) -> warped [B3HW]

where camera is [fx,fy,cx,cy].
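For concreteness, a rough sketch of what such a signature could look like in PyTorch (the names, conventions, and internals here are assumptions for illustration; the actual private implementation may differ):

```python
import torch
import torch.nn.functional as F


def warp(images: torch.Tensor,   # (B, 3, H, W) source images
         idepths: torch.Tensor,  # (B, 1, H, W) inverse depth in the target frame
         cameras: torch.Tensor,  # (B, 4) pinhole intrinsics as [fx, fy, cx, cy]
         poses: torch.Tensor,    # (B, 4, 4) relative poses, target frame -> source frame
         ) -> torch.Tensor:      # (B, 3, H, W) source images warped into the target frame
    B, _, H, W = idepths.shape
    fx, fy, cx, cy = (cameras[:, i].view(B, 1, 1) for i in range(4))

    # Pixel grid and normalized rays K^-1 (u, v, 1) in the target frame.
    v, u = torch.meshgrid(
        torch.arange(H, dtype=idepths.dtype, device=idepths.device),
        torch.arange(W, dtype=idepths.dtype, device=idepths.device),
        indexing="ij")
    x = (u - cx) / fx
    y = (v - cy) / fy
    rays = torch.stack((x, y, torch.ones_like(x)), dim=-1)   # (B, H, W, 3)

    # Inverse-depth trick: R * ray + idepth * t (the projection is scale invariant).
    R, t = poses[:, :3, :3], poses[:, :3, 3]
    pts = rays @ R.transpose(-1, -2).unsqueeze(1) + idepths.permute(0, 2, 3, 1) * t.view(B, 1, 1, 3)

    # Project into the source image and normalize to [-1, 1] for grid_sample.
    z = pts[..., 2:].clamp(min=1e-6)
    u_src = fx * pts[..., 0] / z[..., 0] + cx
    v_src = fy * pts[..., 1] / z[..., 0] + cy
    grid = torch.stack((2 * u_src / (W - 1) - 1, 2 * v_src / (H - 1) - 1), dim=-1)
    return F.grid_sample(images, grid, align_corners=True)
```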

@edgarriba (Member Author) commented Feb 27, 2019

@versatran01 right, I took the code from there. Actually, @ClementPinard helped with that porting and he's willing to contribute some stuff as well. So, do you think it's worth reimplementing it using idepth instead? Do you have the code anywhere so I can take a look at it? I actually like the pattern of having the warpers as modules, since that's how we have been using them for a while at Arraiy, but it can be discussed.

BTW, @versatran01 @tiancheng-zhi I invite you to join the PyTorch Slack channel #geometry so that we can discuss trivial things there.

@versatran01

My code is currently private, but I'm willing to share it once I clean it up a bit.
I think it makes sense to warp using inverse depth because that's what the network directly predicts.
It can be both a module and a function.

@edgarriba (Member Author)

@versatran01 sounds good. I guess the inverse depth feature can be a future improvement. Let me read up on the signature naming, but I'd prefer to improve that later as well. If you don't mind, could you open tickets to keep track of this, so that we can organize ourselves and assign tasks?

@ClementPinard commented Feb 28, 2019

Hi, thanks for this contribution.
I just wrote down the inverse-warp operation with inverse depth on paper (note that it's not a forward warp operation); it checks out, and it's indeed more elegant regarding the stability argument: instead of multiplying the point cloud by depth so that (u,v,1) becomes (x,y,z), you multiply the translation by inverse depth and everything is back to normal. You still have to do an inverse operation after transforming the points with the rotation, though.

It's worth noting, though, that this implies at some point the existence of two [B,H,W,3] arrays: R*(u,v,1) and 1/z * (Tx, Ty, Tz), whereas before we could broadcast the point cloud with the [B,3] translation tensor and have only one. I don't know how that would turn out memory-wise on GPUs.
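To make the memory trade-off concrete, a tiny sketch of the two formulations (all names and shapes here are illustrative assumptions):

```python
import torch

B, H, W = 2, 120, 160
rays = torch.rand(B, H, W, 3)     # K^-1 (u, v, 1) for every pixel
idepth = torch.rand(B, H, W, 1)   # inverse depth per pixel
R = torch.rand(B, 3, 3)           # rotation of the relative pose
t = torch.rand(B, 3)              # translation of the relative pose

# Inverse-depth formulation: two full (B, H, W, 3) intermediates are materialized.
rotated = rays @ R.transpose(-1, -2).unsqueeze(1)   # R * (u, v, 1)
shifted = idepth * t.view(B, 1, 1, 3)               # (1/z) * (Tx, Ty, Tz)
pts_idepth = rotated + shifted

# Depth formulation: the [B, 3] translation is broadcast, so only one such intermediate.
depth = 1.0 / idepth
pts_depth = (depth * rays) @ R.transpose(-1, -2).unsqueeze(1) + t.view(B, 1, 1, 3)

# Both project to the same pixel after the perspective divide (they differ only by the scale 1/z).
```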

@ClementPinard

Also, regarding the eps parameter: I don't advocate applying it like this, but you still have to handle the case where the resulting depth goes negative, for example with large rotations; otherwise grid_sample might output garbage where it should just output 0. (Note that this does not imply adding an eps value, but rather clamping the depth values to a minimum of eps > 0.)

With the current implementation, how can we know from the homogeneous coordinates whether the points had a valid depth (or inverse depth)?
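A small sketch of the clamping being described (the eps value and the mask handling are assumptions):

```python
import torch

eps = 1e-6
pts = torch.randn(2, 120, 160, 3)   # points already transformed into the source camera frame

z = pts[..., 2:]
valid = z > eps              # points that end up behind the camera are not valid
z = z.clamp(min=eps)         # clamp instead of adding eps, so valid depths stay untouched
uv = pts[..., :2] / z        # perspective divide used to build the sampling grid
# `valid` can later be used to zero out samples that grid_sample would otherwise fill with garbage.
```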

@edgarriba (Member Author)

@ClementPinard thanks for the feedback. I will open a ticket to track this. I guess this feature might be included after the v0.1.2 release.

@edgarriba edgarriba added PR: Good to Merge 👌 and removed wip 🛠️ Work in progress labels Feb 28, 2019
@edgarriba edgarriba merged commit 77eccea into master Feb 28, 2019
@edgarriba edgarriba deleted the fix/relative_pose branch February 28, 2019 15:59
edgarriba added a commit that referenced this pull request May 30, 2021
refactor homogeneous transforms module