torch.linalg.inv #45

Open · wtishere opened this issue Oct 10, 2023 · 6 comments

@wtishere
Thanks for your excellent work! May I ask for a possible solution to the problem shown below? Thank you so much!

```
Traceback (most recent call last):
  File "experiments/dkm/train_DKMv3_outdoor.py", line 259, in <module>
    train(args)
  File "experiments/dkm/train_DKMv3_outdoor.py", line 250, in train
    wandb.log(megadense_benchmark.benchmark(model))
  File "/mnt/data-disk-1/home/cpii.local/wtwang/IM/codes/DKM/dkm/benchmarks/megadepth_dense_benchmark.py", line 72, in benchmark
    matches, certainty = model.match(im1, im2, batched=True)
  File "/mnt/data-disk-1/home/cpii.local/wtwang/IM/codes/DKM/dkm/models/dkm.py", line 695, in match
    dense_corresps = self.forward(batch, batched = True)
  File "/mnt/data-disk-1/home/cpii.local/wtwang/IM/codes/DKM/dkm/models/dkm.py", line 631, in forward
    dense_corresps = self.decoder(f_q_pyramid, f_s_pyramid)
  File "/mnt/data-disk-1/home/cpii.local/wtwang/miniconda3/envs/im/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data-disk-1/home/cpii.local/wtwang/IM/codes/DKM/dkm/models/dkm.py", line 494, in forward
    new_stuff = self.gps[new_scale](f1_s, f2_s, dense_flow=dense_flow)
  File "/mnt/data-disk-1/home/cpii.local/wtwang/miniconda3/envs/im/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data-disk-1/home/cpii.local/wtwang/IM/codes/DKM/dkm/models/dkm.py", line 360, in forward
    K_yy_inv = torch.linalg.inv(K_yy + sigma_noise)
torch._C._LinAlgError: linalg.inv: (Batch element 0): The diagonal element 512 is zero, the inversion could not be completed because the input matrix is singular.
```

@Parskatt (Owner)

I've never had this happen before. It should mean that the features from the encoder are extremely correlated. Is it a weird image pair?
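For context, a tiny standalone sketch of that failure mode (illustrative code, not from the DKM repo): perfectly correlated features make the Gram matrix rank-deficient, so `torch.linalg.inv` raises exactly this error, while a small diagonal term restores invertibility.

```python
import torch

f = torch.randn(1, 3, 8)        # a batch with 3 feature vectors of dim 8
f[:, 1] = f[:, 0]               # make two features perfectly correlated
K = f @ f.transpose(-1, -2)     # 3x3 Gram matrix, now rank-deficient

# torch.linalg.inv(K) raises torch._C._LinAlgError: ... input matrix is singular
sigma_noise = 1e-4 * torch.eye(3)[None]
K_inv = torch.linalg.inv(K + sigma_noise)  # succeeds: the jitter restores full rank
```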

@wtishere (Author)

Thanks for your reply. I'm not sure whether there is a weird image pair. I used the MegaDepth dataset and followed your steps to set up the data structure. Do you have any idea how to solve this problem?

@Parskatt (Owner)

This seems to happen during the benchmark. You should be able to see the names of the images being sent in. If so, I can check whether I'm able to reproduce the issue.

Otherwise I'm not sure how to help.

@MantangGuo commented Jan 31, 2024

I have encountered the same error, also using MegaDepth for training.
But I noticed something new: the loss value became 0 at some step, and then the error was raised.
I think the zero loss leads to the error, but I don't know why the loss suddenly became zero.
The weirdest thing is that I ran the code twice. The first run finished without any error, but on the second run (exactly the same code and devices), the loss became 0 and an error was raised: `torch._C._LinAlgError: torch.linalg.inv: (Batch element 0): The diagonal element 1 is zero, the inversion could not be completed because the input matrix is singular.`
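One hypothetical debugging guard (not from the DKM codebase; `step` and the function name are illustrative) would be to flag the step where the loss collapses, since that appears to precede the failed inversion:

```python
import torch

def check_loss(loss: torch.Tensor, step: int) -> torch.Tensor:
    """Flag the step where the loss collapses to 0 or NaN before linalg.inv fails."""
    if not torch.isfinite(loss) or loss.item() == 0.0:
        # A collapsed loss may indicate degenerated features, which can make
        # K_yy + sigma_noise singular inside the GP module later on.
        raise RuntimeError(f"loss degenerated to {loss.item()} at step {step}")
    return loss
```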

@Parskatt (Owner)

If it's the MegaDepth training set, we don't use seeds, so it might be different image pairs, etc. You might reduce the risk of this happening by increasing the diagonal term that we add here:

```python
sigma_noise = self.sigma_noise * torch.eye(h2 * w2, device=x.device)[None, :, :]
```
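For illustration, a minimal sketch of a more defensive version of that inversion (the `1e-2` scale, the function name, and the `pinv` fallback are illustrative choices, not the repo's actual code):

```python
import torch

def robust_inverse(K_yy: torch.Tensor, sigma_noise_scale: float = 1e-2) -> torch.Tensor:
    """Invert K_yy with diagonal jitter; fall back to a pseudo-inverse if still singular."""
    n = K_yy.shape[-1]
    jitter = sigma_noise_scale * torch.eye(n, device=K_yy.device)[None, :, :]
    try:
        return torch.linalg.inv(K_yy + jitter)
    except torch._C._LinAlgError:
        # The pseudo-inverse exists for any matrix, at some cost in speed and accuracy.
        return torch.linalg.pinv(K_yy + jitter)
```

A larger diagonal term trades a little regression sharpness for numerical stability.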

@MantangGuo commented Jan 31, 2024 via email
