About the alpha value #6

Open · hbin0701 opened this issue Nov 19, 2021 · 1 comment

hbin0701 commented Nov 19, 2021

Hello :)
Thank you for sharing this amazing code!

I'm running this code in Colab (Python 3), and I'm changing several things to accommodate my setup.
Everything has worked fine so far, except when running the training script:

# In /models/iq.py, the error occurs specifically at line 119:

eps = Variable((Normal(torch.zeros_like(mu).cuda(), self.alpha.data.pow(-1))).sample())

due to:

# line 82
self.alpha = nn.Parameter(torch.randn(z_size)) # may contain negative values.

I realized that the standard deviation of a Normal distribution must be positive; consistent with this, the gradient suddenly explodes and the loss becomes NaN.

Thus, I introduced the following lines of code.

# Replace line 119 with:
d = self.alpha.data.pow(-1)
d = torch.nan_to_num(d.clamp(min=1e-4, max=2), nan=1e-4)  # replacing NaN with 2 instead of 1e-4 causes the gradient to explode
eps = Variable((Normal(torch.zeros_like(mu).cuda(), d)).sample())

But the gradient still explodes.
Do you have any suggestions?
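One direction I am also considering (just a minimal sketch with stand-in names like raw_alpha, mu, and z_size, not the repo's actual code) is to make alpha positive by construction, e.g. through softplus, instead of clamping its inverse after the fact:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal

z_size = 16                                    # stand-in latent size
raw_alpha = nn.Parameter(torch.randn(z_size))  # unconstrained; may be negative
alpha = F.softplus(raw_alpha) + 1e-4           # strictly positive by construction
mu = torch.zeros(z_size)                       # stand-in for the model's mu
# std = alpha^-1 is now always a valid positive scale
eps = Normal(torch.zeros_like(mu), alpha.pow(-1)).sample()

That way alpha.pow(-1) is always a valid std for the Normal, without any NaN handling.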

Below is the training log.

Time: 1.2242, Epoch [0/15], Step [3470/5748], LR: 0.010000, Center-Loss: 2.8797, KL: 0.4320, I-recon: 0.5033, C-recon: 6.6174, C-cycle: 0.9655, Regularisation: 2.0211
Time: 1.2230, Epoch [0/15], Step [3480/5748], LR: 0.010000, Center-Loss: 20.7928, KL: 2370.2771, I-recon: 0.5432, C-recon: 186.4624, C-cycle: 1.1244, Regularisation: 2.0197
Time: 1.3085, Epoch [0/15], Step [3490/5748], LR: 0.010000, Center-Loss: 35342.1523, KL: 61860.1836, I-recon: 54.0946, C-recon: 491.9528, C-cycle: 1.8422, Regularisation: 2.7531
Time: 1.2436, Epoch [0/15], Step [3500/5748], LR: 0.010000, Center-Loss: 2750158.5000, KL: 20850.5977, I-recon: 4949.4941, C-recon: 1919.1410, C-cycle: 0.9973, Regularisation: 9.3367
Time: 1.3291, Epoch [0/15], Step [3510/5748], LR: 0.010000, Center-Loss: nan, KL: nan, I-recon: nan, C-recon: nan, C-cycle: 17.0083, Regularisation: nan
Time: 1.2298, Epoch [0/15], Step [3520/5748], LR: 0.010000, Center-Loss: nan, KL: nan, I-recon: nan, C-recon: nan, C-cycle: 2.0468, Regularisation: nan
Time: 1.2376, Epoch [0/15], Step [3530/5748], LR: 0.010000, Center-Loss: nan, KL: nan, I-recon: nan, C-recon: nan, C-cycle: 1.9318, Regularisation: nan
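For completeness, I am also considering clipping the global gradient norm before each optimizer step, which might at least keep the run from diverging into NaN. Below is only a minimal self-contained sketch with a dummy model, not the repo's training loop:

import torch
import torch.nn as nn

# Dummy model/optimizer as stand-ins for the repo's objects.
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x, y = torch.randn(4, 8), torch.randn(4, 1)
loss = nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
loss.backward()
# Clip the global gradient norm so one bad batch cannot blow up the update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
optimizer.step()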
mahmudhasankhan commented

@hbin0701 hello! Were you able to fix the gradient explosion part?
