This repository was archived by the owner on Nov 21, 2025. It is now read-only.

About Transformer model details #143

@tanggang1997

Description

Hello, I read the source code of this project and have a few questions. First, are the a1 and a2 operations omitted from the Transformer block? I can't seem to find them in the code. Second, why does the Pointwise Feedforward use three linear layers plus an activation function? Finally, are the Layer Norm, Linear, and Reshape steps that follow the Transformer block also omitted?
