[Bug] convert_points_from_homogeneous - NaN gradients in backward pass #367
Comments
@poxyu thanks for reporting, I will investigate it.
With eps equal to 1e-5 it seems to work on my computer.
Even in this toy example?
I will try to give you a real example instead of this toy one.
@edgarriba this is just an example. After about 13-14 iterations the gradients turn into NaN.
Tell me if you need more information.
I've been trying a couple of things so far. First, I created this small unit test:

```python
def test_gradcheck_zeros(self, device):
    points_h = torch.tensor([[1., 2., 3., 4., 5.],
                             [4., 6., 0., -3., 1e-9]]).t().to(device)
    # evaluate function gradient
    points_h = tensor_to_gradcheck_var(points_h)  # to var
    assert gradcheck(kornia.convert_points_from_homogeneous, (points_h,),
                     raise_exception=True)
```

With your fix it doesn't pass. What I've tried is the following:

```python
scale: torch.Tensor = torch.where(
    torch.abs(z_vec) > eps,
    torch.tensor(1.) / z_vec,
    # tried: torch.tensor(1.) / (z_vec + eps)
    # tried: torch.tensor(1.) / z_vec.clamp(min=eps)
    torch.ones_like(z_vec))
```

plus reducing the epsilon to 1e-5, and that passes all the tests except those where the points have negative values (the clamp snaps them to a value close to positive zero). The other trick is to directly add an epsilon to z_vec, which also breaks other tests related to geometry transforms. Not really sure what the trade-off should be here, but it's an issue that has been around for a while and was discussed at an early stage of the library. Check this: https://github.com/tensorflow/graphics/issues/17
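To see why the clamp variant breaks negative points, a quick illustration (a hypothetical snippet, not from the thread):

```python
import torch

z = torch.tensor([-3.0, 4.0])
eps = 1e-8
print(1.0 / z.clamp(min=eps))  # tensor([1.0000e+08, 2.5000e-01]) -- the -3 is clamped to eps
print(1.0 / z)                 # tensor([-0.3333,  0.2500])       -- the correct scales
```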
Another thing to consider is a bug in torch.where itself: https://discuss.pytorch.org/t/gradients-of-torch-where/26835/2

My current workaround is:
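The snippet itself wasn't captured here; below is a sketch of the double-torch.where trick from that discuss thread, assuming it is applied to the scale computation above:

```python
import torch

def safe_scale(z_vec: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Autograd propagates gradients through *both* branches of torch.where,
    # so the denominator must be made safe before the division, not after.
    mask = torch.abs(z_vec) > eps
    safe_z = torch.where(mask, z_vec, torch.ones_like(z_vec))  # no zeros left to divide by
    return torch.where(mask, 1.0 / safe_z, torch.ones_like(z_vec))
```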
thank you, @edgarriba
@edgarriba I love everything about your PR except this big epsilon 😂
Well, it's the only way I found to pass the provided gradient-check test without breaking the other existing tests in the framework. I've also been playing a bit with the gradcheck tolerance, but it seems that eps = 1e-5 is the magic number that makes gradcheck happy :D
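For reference, these are the knobs being played with; a sketch assuming the test input from the earlier comment. Note that gradcheck's own eps (the finite-difference step, default 1e-6) is distinct from the function's eps:

```python
import torch
import kornia

points_h = torch.tensor([[1., 2., 3., 4., 5.],
                         [4., 6., 0., -3., 1e-9]], dtype=torch.float64).t().requires_grad_()
# eps is the finite-difference step; atol/rtol are the comparison tolerances.
torch.autograd.gradcheck(kornia.convert_points_from_homogeneous, (points_h,),
                         eps=1e-6, atol=1e-5, rtol=1e-3, raise_exception=True)
```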
@ducha-aiki the test I provided above still fails with your solution when the epsilon is smaller than 1e-5.
Well, actually, the gradient is not correct here, as we are putting 1 in place of the correct huge number/inf.
Sure. Then I believe we should include @poxyu's test (or a small version of it) to ensure the correctness of this function, since it can be critical for other functions that rely on it.
@poxyu @ducha-aiki I've updated #369 with a real-case test (using random data). Check it out.
A similar problem (zero division and NaN gradients) occurs in the backward pass here: kornia/kornia/geometry/conversions.py, line 154 (commit a0bafcd).
Super simple toy example:
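The original snippet isn't shown here, but a hypothetical stand-in exhibits the same failure mode; any op that divides by a tensor that can be exactly zero behaves this way:

```python
import torch

x = torch.tensor([2.0, 0.0], requires_grad=True)
out = torch.where(x != 0, 1.0 / x, torch.ones_like(x))
out.sum().backward()
print(x.grad)  # tensor([-0.2500, nan]) -- NaN leaks in from the unselected branch
```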
I guess it has to be a separate issue, doesn't it? 🙂 P.S. I found it several minutes ago and haven't fixed it myself yet.
Yes, please open a separate issue. Will close this one once we merge #369.
I just experienced a NaN-gradient problem while doing a backward pass here: kornia/kornia/geometry/conversions.py, line 99 (commit 4b0ae70), at the torch.where call. torch.where itself works absolutely fine, but if you have zero divisions you find yourself with NaN gradients for sure 💩 Here is a toy example:
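The snippet wasn't captured in this thread; a minimal reconstruction that reproduces the gradients below, reusing the values from the unit test above (eps is assumed to be 1e-8):

```python
import torch

eps = 1e-8  # assumed value; the original isn't shown
z_vec = torch.tensor([4., 6., 0., -3., 1e-9], requires_grad=True)
scale = torch.where(torch.abs(z_vec) > eps,
                    torch.tensor(1.) / z_vec,
                    torch.ones_like(z_vec))
scale.sum().backward()
print(z_vec.grad)
```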
And these are the z_vec gradients:

```
tensor([-0.0625, -0.0278, nan, -0.1111, -0.0000])
```
For now my little hack is:
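The hack itself wasn't captured; one plausible version (not necessarily the exact one meant here) indexes with a mask so the division never touches the zero elements:

```python
import torch

def hacky_scale(z_vec: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Divide only where it is safe; everything else keeps scale 1,
    # so no NaN can enter either the forward or the backward pass.
    mask = torch.abs(z_vec) > eps
    scale = torch.ones_like(z_vec)
    scale[mask] = 1.0 / z_vec[mask]
    return scale
```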
But not sure if it's good enough.