
SiLU FFN #1

Open

Ricardokevins opened this issue Mar 1, 2024 · 2 comments

Comments

@Ricardokevins

LLaMA2/model.py, line 216 in 5716de4:

    x_V = self.fc3(swish)

In the LLaMA 2 source code, 'x_V' is obtained from the original 'x', not from 'swish'.
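
For comparison, here is a minimal sketch of the feed-forward block with the reported fix applied. Everything beyond the single quoted line is an assumption: the fc1/fc2/fc3 names, the bias-free linear layers, and the overall SwiGLU layout are inferred from the snippet above and from the reference code quoted below.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeedForward(nn.Module):
        # Sketch of a SwiGLU feed-forward block; fc1/fc2/fc3 names are assumed.
        def __init__(self, dim: int, hidden_dim: int):
            super().__init__()
            self.fc1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection (w1 in the reference)
            self.fc2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection (w2 in the reference)
            self.fc3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection (w3 in the reference)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            swish = F.silu(self.fc1(x))   # SiLU gate
            x_V = self.fc3(x)             # reported fix: project the original x, not swish
            return self.fc2(swish * x_V)  # elementwise gating, then project back to dim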

@Ricardokevins (Author) commented Mar 1, 2024

https://github.com/facebookresearch/llama/blob/6796a91789335a31c8309003339fe44e2fd345c2/llama/model.py#L348

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

@viai957 commented Apr 20, 2024

As noted above, in the LLaMA 2 source code they obtain 'x_V' from the original 'x' instead of 'swish'. It is the same, though: the 'swish' function with beta = 1 is nothing but SiLU (Sigmoid Linear Unit); it is just split into separate components here for easier understanding.
This code is a replica of Umar Jamil's video on coding Llama 2 from scratch; here is the link: https://www.youtube.com/watch?v=oM4VmoabDAI&t=7212s&ab_channel=UmarJamil
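
As a quick check of the Swish/SiLU point above, the following sketch (not from this repo; it just uses PyTorch's F.silu) verifies that Swish with beta = 1 matches SiLU numerically:

    import torch
    import torch.nn.functional as F

    x = torch.randn(4, 8)

    swish_beta1 = x * torch.sigmoid(x)  # Swish with beta = 1: x * sigmoid(x)
    silu = F.silu(x)                    # SiLU (Sigmoid Linear Unit) in PyTorch

    print(torch.allclose(swish_beta1, silu))  # True, up to floating-point tolerance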
