How should we handle skip connections properly? #12

Closed
qbx2 opened this issue Sep 26, 2018 · 3 comments

qbx2 commented Sep 26, 2018

1a36696#diff-c7c21fc90a9f9340db7b45c70ef0e393R13

Could you please give me some hints?

Thank you

qbx2 changed the title from "How do we should handle skip connections properly?" to "How should we handle skip connections properly?" Sep 26, 2018
qbx2 closed this as completed Oct 22, 2018

untom commented Oct 22, 2018

Sorry for the very late reply. Dividing by 2 sounds like a good approach to me. Are you not seeing the distribution behavior you're expecting? I.e., do the activations not end up being standard Gaussian?
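
As a quick sanity check on the scaling question, here is a minimal sketch in plain Python. It assumes (this is not taken from the repository's code) that the skip path and the branch output are independent standard Gaussian samples; in a real network the two are correlated, so the numbers are only indicative. The names `skip`, `branch`, `variance`, and `combine` are hypothetical.

```python
import math
import random

random.seed(0)
n = 100_000

# Assumption: skip path and branch output are independent N(0, 1) samples.
skip = [random.gauss(0.0, 1.0) for _ in range(n)]
branch = [random.gauss(0.0, 1.0) for _ in range(n)]


def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def combine(scale):
    """Merge the two paths as (skip + branch) * scale."""
    return [(s + b) * scale for s, b in zip(skip, branch)]


var_sum = variance(combine(1.0))                      # plain addition
var_half = variance(combine(0.5))                     # divide by 2
var_rsqrt2 = variance(combine(1.0 / math.sqrt(2.0)))  # divide by sqrt(2)

print(f"plain sum:         {var_sum:.3f}")
print(f"divided by 2:      {var_half:.3f}")
print(f"divided by sqrt 2: {var_rsqrt2:.3f}")
```

Under that independence assumption, the plain sum has variance near 2, dividing by 2 gives about 0.5, and dividing by sqrt(2) gives about 1. With correlated paths the picture changes, so measuring the actual activations is the safer test.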

gklambauer commented

Also sorry from my side for not seeing this earlier. We found that adjusting alpha and lambda to the new variance (mean=0, var=2) works really well!

qbx2 commented Oct 22, 2018

First, thanks for the reply.

> Also sorry from my side for not seeing this earlier. We found that adjusting alpha and lambda to the new variance (mean=0, var=2) works really well!

@gklambauer The method sounds good, but it becomes hard to handle as we stack more layers, because the variance would be 4 at the next layer and 8 at the one after that. It may still be a good approach in some cases.

If you haven't dealt with skip connections yet, I am trying to solve this and write about it.

Thank you again for the hints.
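
The variance growth described above (2, then 4, then 8) can be illustrated with a small simulation. This is a sketch under a loud assumption: each residual branch is modeled as independent Gaussian noise with the same variance as its input, which stands in for a variance-preserving layer and is not the actual network. The names `run_plain_residual` and `variance` are hypothetical.

```python
import math
import random

random.seed(0)
n = 100_000
depth = 3


def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def run_plain_residual(num_blocks):
    """Stack x <- x + branch(x) and record the variance after each block."""
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    history = []
    for _ in range(num_blocks):
        sigma = math.sqrt(variance(x))
        # Assumption: the branch output is independent noise whose variance
        # matches the variance of the block input.
        branch = [random.gauss(0.0, sigma) for _ in range(n)]
        x = [a + b for a, b in zip(x, branch)]
        history.append(variance(x))
    return history


plain = run_plain_residual(depth)
for block, var in enumerate(plain, start=1):
    print(f"after block {block}: variance = {var:.2f}")
```

In this toy model the variance roughly doubles at every block, so constants tuned for var=2 would only fit the first block; each deeper block would need its own values unless the sum is rescaled.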
