About the alpha value #6

Open · hbin0701 opened this issue Nov 19, 2021 · 1 comment

hbin0701 commented Nov 19, 2021

Hello :)
Thank you for sharing this amazing code!

I'm running this code in Colab (Python 3), and I'm changing several things to accommodate my setup.
Everything has worked fine so far, except when running the training script:

# In /models/iq.py, the error occurs specifically at line 119:

eps = Variable((Normal(torch.zeros_like(mu).cuda(), self.alpha.data.pow(-1))).sample())

due to:

# line 82
self.alpha = nn.Parameter(torch.randn(z_size)) # may contain negative values.

I realized that the standard deviation of a Normal distribution must be positive; consistent with this, the gradient suddenly explodes and the loss becomes NaN.

Thus, I introduced the following lines of code.

# Replace line 119 with:
d = self.alpha.data.pow(-1)
d = torch.nan_to_num(d.clamp(min=1e-4, max=2), nan=1e-4)  # replacing NaN with 2 instead of 1e-4 causes the gradient to explode
eps = Variable((Normal(torch.zeros_like(mu).cuda(), d)).sample())

But the gradient still explodes.
Do you have any suggestions?
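One direction I am also considering (just a minimal sketch with stand-in names like raw_alpha, mu, and z_size, not the repo's actual code) is to make alpha positive by construction, e.g. through softplus, instead of clamping its inverse after the fact:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal

z_size = 16                                    # stand-in latent size
raw_alpha = nn.Parameter(torch.randn(z_size))  # unconstrained; may be negative
alpha = F.softplus(raw_alpha) + 1e-4           # strictly positive by construction
mu = torch.zeros(z_size)                       # stand-in for the model's mu
# std = alpha^-1 is now always a valid positive scale
eps = Normal(torch.zeros_like(mu), alpha.pow(-1)).sample()

That way alpha.pow(-1) is always a valid std for the Normal, without any NaN handling.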

Below is the training log.

Time: 1.2242, Epoch [0/15], Step [3470/5748], LR: 0.010000, Center-Loss: 2.8797, KL: 0.4320, I-recon: 0.5033, C-recon: 6.6174, C-cycle: 0.9655, Regularisation: 2.0211
Time: 1.2230, Epoch [0/15], Step [3480/5748], LR: 0.010000, Center-Loss: 20.7928, KL: 2370.2771, I-recon: 0.5432, C-recon: 186.4624, C-cycle: 1.1244, Regularisation: 2.0197
Time: 1.3085, Epoch [0/15], Step [3490/5748], LR: 0.010000, Center-Loss: 35342.1523, KL: 61860.1836, I-recon: 54.0946, C-recon: 491.9528, C-cycle: 1.8422, Regularisation: 2.7531
Time: 1.2436, Epoch [0/15], Step [3500/5748], LR: 0.010000, Center-Loss: 2750158.5000, KL: 20850.5977, I-recon: 4949.4941, C-recon: 1919.1410, C-cycle: 0.9973, Regularisation: 9.3367
Time: 1.3291, Epoch [0/15], Step [3510/5748], LR: 0.010000, Center-Loss: nan, KL: nan, I-recon: nan, C-recon: nan, C-cycle: 17.0083, Regularisation: nan
Time: 1.2298, Epoch [0/15], Step [3520/5748], LR: 0.010000, Center-Loss: nan, KL: nan, I-recon: nan, C-recon: nan, C-cycle: 2.0468, Regularisation: nan
Time: 1.2376, Epoch [0/15], Step [3530/5748], LR: 0.010000, Center-Loss: nan, KL: nan, I-recon: nan, C-recon: nan, C-cycle: 1.9318, Regularisation: nan
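For completeness, I am also considering clipping the global gradient norm before each optimizer step, which might at least keep the run from diverging into NaN. Below is only a minimal self-contained sketch with a dummy model, not the repo's training loop:

import torch
import torch.nn as nn

# Dummy model/optimizer as stand-ins for the repo's objects.
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x, y = torch.randn(4, 8), torch.randn(4, 1)
loss = nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
loss.backward()
# Clip the global gradient norm so one bad batch cannot blow up the update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
optimizer.step()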
mahmudhasankhan commented

@hbin0701 hello! Were you able to fix the gradient explosion part?
