Fix a potential NaN issue in random re-initialization caused by 0/0 #1

Merged · 1 commit · Oct 16, 2023

Conversation

@CielAl (Contributor) commented Aug 18, 2023

In the update_dict method at https://github.com/cwlkr/torchvahadane/blob/main/torchvahadane/dict_learning.py#L100, when an atom's norm is too small, the atom is re-initialized with tensor.normal_. However, if every drawn value happens to be negative, the subsequent in-place clamp yields a zero vector, and the in-place normalization dictionary[:, k] /= dictionary[:, k].norm() then produces a NaN vector, which invalidates the whole computation. A minimal sketch of the failure mode follows.
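For illustration, here is a standalone reproduction of the failure mode, not the actual dict_learning.py code; the variable name `atom` is hypothetical, and it assumes the clamp is to non-negative values, as the description above implies:

```python
import torch

# Re-initialize an atom the way update_dict does, then force the
# unlucky case where every drawn value is negative.
atom = torch.empty(4).normal_()
atom = -atom.abs()       # all entries negative (the unlucky draw)
atom.clamp_(min=0)       # in-place clamp -> all-zero vector
atom /= atom.norm()      # norm is 0, so 0/0 -> a vector of NaNs
print(atom)              # tensor([nan, nan, nan, nan])
```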

This PR adds two layers of protection (see the sketch after this list):
(1) make the re-initialized atom non-negative by applying abs();
(2) add a small eps, torch.finfo(torch.float32).eps, to the norm so the denominator is always non-zero.
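A sketch of the patched re-initialization, again with a hypothetical variable name (the real code operates on dictionary[:, k]):

```python
import torch

eps = torch.finfo(torch.float32).eps

atom = torch.empty(4).normal_().abs_()  # (1) abs() makes the draw non-negative,
                                        #     so the clamp can no longer zero it out
atom.clamp_(min=0)                      # now a no-op, kept for safety
atom /= atom.norm() + eps               # (2) eps keeps the denominator non-zero
```

With both guards in place, the normalized atom stays finite even in the degenerate case.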
