Sorry for the very late reply. Dividing by 2 sounds like a good approach to me. Are you not seeing the distribution behavior you're expecting? I.e., do the activations not end up being a standard Gaussian?
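To make the variance arithmetic concrete, here is a quick numerical sketch (a toy check, assuming the two branches feeding the skip connection are independent and roughly standard Gaussian, as in the SELU setting): their sum has variance close to 2; dividing the sum by 2 scales the variance by 1/4, while dividing by sqrt(2) restores unit variance exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two independent, roughly standard-Gaussian activation streams.
a = rng.standard_normal(n)
b = rng.standard_normal(n)

s = a + b  # skip connection: variances add, so Var(s) ~ 2

print(s.var())               # ~2.0
print((s / 2).var())         # ~0.5  (dividing by 2 scales variance by 1/4)
print((s / np.sqrt(2)).var())  # ~1.0  (dividing by sqrt(2) restores unit variance)
```

Either way, the activations after the add are no longer standard Gaussian unless rescaled or unless alpha and lambda are re-derived for the new variance, which is what the comment below reports doing.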
Also sorry from my side for not seeing this earlier. We found that adjusting alpha and lambda to the new variance (mean=0, var=2) works really well!
@gklambauer The method sounds good, but it becomes hard to handle when we stack more layers, because the variance for the next layer would be 4, then 8 for the layer after that. It may still be a good approach for some cases, however.
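The doubling described above can be simulated with a toy model (an assumption for illustration: each residual branch is independent of the trunk and is tuned, via adjusted alpha/lambda, to match the trunk's current variance, so every skip add doubles it):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

x = rng.standard_normal(n)  # input to the first block, var ~1
for layer in range(3):
    # Hypothetical branch: independent, with the same variance as the trunk.
    branch = rng.standard_normal(n) * x.std()
    x = x + branch  # skip add: variance doubles each block
    print(layer + 1, x.var())  # ~2, ~4, ~8
```

This is why re-deriving alpha and lambda once (for var=2) does not suffice for deep residual stacks: the target variance changes at every block.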
In case you haven't dealt with skip connections yet: I am trying to solve this and write about it.
1a36696#diff-c7c21fc90a9f9340db7b45c70ef0e393R13
Could you please give me some hints?
Thank you