This repository was archived by the owner on Nov 21, 2025. It is now read-only.
Hello, I read the source code of this project and have two questions. First, are the a1 and a2 operations in the Transformer block omitted? I don't see them in the code. Second, why does the Pointwise Feedforward use three linear layers plus an activation function? Also, are the Layer Norm, Linear, and Reshape steps that follow the Transformer block omitted?