Hello,

why is the embedded dropout mask weighted by `/ (1 - dropout)`?

hedwig/models/reg_lstm/embed_regularize.py, line 38 in commit 98634d3

I looked into the references in the corresponding paper, but Gal and Ghahramani (2016) do not use this scaling, and Merity et al. (2018) only state that they apply it. So I wonder what the idea behind this step is.

I found the answer. The scaling compensates for the change in magnitude of the activations when dropout is applied and is the standard way to implement dropout ("inverted dropout"): each element is kept with probability 1 - dropout, so dividing the kept elements by 1 - dropout leaves the expected value of every activation unchanged, and no rescaling is needed at test time.
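For context, here is a minimal sketch of what such an embedding-dropout step typically looks like in PyTorch. This is an illustration of the technique, not the verbatim hedwig code; the function name `embedded_dropout` and its signature are assumptions.

```python
import torch
import torch.nn.functional as F

def embedded_dropout(embed: torch.nn.Embedding, words: torch.Tensor,
                     dropout: float = 0.1) -> torch.Tensor:
    """Drop entire rows (word types) of the embedding matrix.

    Surviving rows are rescaled by 1 / (1 - dropout) (inverted dropout),
    so the expected magnitude of the looked-up embeddings matches the
    no-dropout case.
    """
    if dropout <= 0:
        return embed(words)
    keep = 1 - dropout
    # One Bernoulli(keep) draw per vocabulary entry, broadcast across the
    # embedding dimension, pre-divided by the keep probability so that
    # E[mask] = keep * (1 / keep) = 1.
    mask = embed.weight.new_empty((embed.weight.size(0), 1)).bernoulli_(keep) / keep
    return F.embedding(words, mask * embed.weight)
```

Note that the mask has one entry per word type rather than per element, which is what makes this "embedding dropout" in the sense of Gal and Ghahramani (2016): a word that is dropped is zeroed everywhere it occurs in the batch. The `/ keep` factor is exactly the `/ (1 - dropout)` term asked about above.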