Layer Normalization #11

Answered by Sengxian
conceptofmind asked this question in Q&A
Aug 18, 2022 · 1 comment · 3 replies

Hello @conceptofmind, thank you for your attention!

  1. We use LayerNorm with bias.
  2. Our implementation of DeepNorm generally follows the original paper: Xavier normal initialization with a $(2N)^{-1/2}$ scaling factor is applied to the ffn, v_proj, and out_proj weights (see the sketch below).

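In case a concrete illustration helps, here is a minimal PyTorch sketch of that initialization scheme, not the repo's actual code: it assumes the transformer layer exposes its FFN and attention projections as `nn.Linear` submodules whose names contain `ffn`, `v_proj`, or `out_proj` (hypothetical names), and it realizes the $(2N)^{-1/2}$ scaling through the `gain` argument of `xavier_normal_`.

```python
import torch.nn as nn

def deepnorm_init(layer: nn.Module, num_layers: int) -> None:
    """Xavier normal init scaled by (2N)^{-1/2} on ffn / v_proj / out_proj weights."""
    beta = (2 * num_layers) ** -0.5  # the (2N)^{-1/2} factor described above
    for name, module in layer.named_modules():
        if isinstance(module, nn.Linear) and any(
            key in name for key in ("ffn", "v_proj", "out_proj")
        ):
            # `gain` multiplies the init standard deviation, which applies the scaling.
            nn.init.xavier_normal_(module.weight, gain=beta)
            if module.bias is not None:
                nn.init.zeros_(module.bias)

# PyTorch's LayerNorm keeps a learnable bias by default (elementwise_affine=True),
# which matches "LayerNorm with bias" in point 1.
norm = nn.LayerNorm(4096)
```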

Answer selected by conceptofmind