Fix a potential random generated related nan issue caused by 0/0 #1

CielAl · 2023-08-18T16:31:36Z

In the update_dict method (https://github.com/cwlkr/torchvahadane/blob/main/torchvahadane/dict_learning.py#L100), an atom whose norm is too small is re-initialized with tensor.normal_.
However, if every value drawn is negative, the subsequent in-place clamp yields a zero vector. The in-place normalization
dictionary[:, k] /= dictionary[:, k].norm() then divides 0 by 0 and produces a nan vector, which invalidates the rest of the computation.
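
A minimal sketch of the failure, assuming a 1-D float tensor stands in for one dictionary column. The values are hand-picked to mimic an all-negative draw; this is not the code from dict_learning.py:

```python
import torch

# Hypothetical stand-in for a re-initialized atom whose draws all came out negative.
atom = torch.tensor([-0.3, -1.2, -0.7])

# The in-place clamp to the non-negative range zeroes every entry.
atom.clamp_(min=0)

# The norm is now 0, so each component becomes 0 / 0.
atom /= atom.norm()

print(atom)  # every entry is nan
```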

This PR adds two layers of protection:
(1) make the re-initialized atom non-negative by applying abs()
(2) add a small eps, torch.finfo(torch.float32).eps, to the norm so the denominator is always non-zero.
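
The two safeguards can be sketched roughly as follows. The helper names reinit_atom and safe_normalize_ are hypothetical and only illustrate the idea; the merged change edits update_dict in place and may differ in detail:

```python
import torch

EPS = torch.finfo(torch.float32).eps  # small constant added to the denominator


def safe_normalize_(atom: torch.Tensor, eps: float = EPS) -> torch.Tensor:
    """Normalize in place; eps keeps the denominator non-zero (hypothetical helper)."""
    atom /= atom.norm() + eps
    return atom


def reinit_atom(n_features: int, eps: float = EPS) -> torch.Tensor:
    """Re-draw an atom with both safeguards applied (hypothetical helper)."""
    # (1) abs() makes every draw non-negative, so the clamp cannot zero the vector.
    atom = torch.empty(n_features).normal_().abs_()
    atom.clamp_(min=0)
    # (2) eps in the denominator guards against a zero norm.
    return safe_normalize_(atom, eps)


torch.manual_seed(0)
new_atom = reinit_atom(8)

# Even an all-zero vector now stays finite (all zeros) instead of turning into nan.
zero_atom = safe_normalize_(torch.zeros(4))
```

With safeguard (1) alone the zero vector is already very unlikely; the eps term covers the remaining edge case at the cost of a norm that is marginally below 1.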

Fix a potential random generated related nan issue caused by 0/0

dfbfb71

CielAl mentioned this pull request Aug 18, 2023

Potential nan in dictionary update steps caused by 0/0 #2

Closed

cwlkr merged commit 8058800 into cwlkr:main Oct 16, 2023
