
Another interesting paper related to this idea #3 (Open)

jukofyork opened this issue Apr 22, 2024 · 3 comments

Comments

@jukofyork

Not all Layers of LLMs are Necessary during Inference (v2)

@shamanez (Member)

Interesting!

@jukofyork (Author) commented Apr 23, 2024

I've also read another paper along these lines (but can't find it atm) that suggests the layers marked by the yellow areas at the bottom:

[image attachment]

are just doing some kind of "averaging", and IIRC the paper suggests replacing these penultimate layers with some other, much cheaper operation (it was a training-time modification rather than a post-training one, though).
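For illustration only, here is a minimal PyTorch sketch of that general idea: swap a couple of late decoder blocks for a cheap stand-in. The `CheapAveragingLayer`, the cumulative-mean operation, the layer indices, and the `model.model.layers` (LLaMA-style) layout are all assumptions made for the example, not what the still-unidentified paper actually proposes; that paper also applied the change at training time rather than as a post-hoc swap.

```python
import torch
import torch.nn as nn


class CheapAveragingLayer(nn.Module):
    """Hypothetical stand-in for a full transformer block: smooths the
    hidden states with a running average instead of attention + MLP."""

    def forward(self, hidden_states, *args, **kwargs):
        # Cumulative mean over the sequence dimension, used here purely as a
        # placeholder for whatever "averaging" the paper actually proposed.
        seq_len = hidden_states.size(1)
        counts = torch.arange(1, seq_len + 1, device=hidden_states.device)
        averaged = hidden_states.cumsum(dim=1) / counts.view(1, -1, 1)
        # Return a tuple so callers that expect (hidden_states, ...) still work.
        return (averaged,)


def replace_penultimate_layers(model, n_replaced=2):
    """Swap the last-but-one blocks of a decoder-only model for the cheap op.
    Assumes the blocks live in model.model.layers (LLaMA-style layout)."""
    layers = model.model.layers
    for idx in range(len(layers) - 1 - n_replaced, len(layers) - 1):
        layers[idx] = CheapAveragingLayer()
    return model
```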

@jukofyork (Author) commented Apr 23, 2024

ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

Can't find the paper about "averaging" though.
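If it's useful, here is a rough sketch of how a ShortGPT-style redundancy score could be computed: rate each layer by how much it actually changes the hidden states (one minus the cosine similarity between a layer's input and output), and treat low-change layers as pruning candidates. This assumes a HuggingFace-style model that exposes per-layer hidden states via `output_hidden_states=True`; it's my reading of the idea, not the authors' code.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def layer_redundancy_scores(model, input_ids):
    """Score each layer by how much it changes the hidden states; a layer
    whose output is nearly parallel to its input is doing little work."""
    outputs = model(input_ids, output_hidden_states=True)
    hs = outputs.hidden_states  # tuple: embeddings + one entry per layer
    scores = []
    for h_in, h_out in zip(hs[:-1], hs[1:]):
        # Cosine similarity between a layer's input and output, averaged
        # over all token positions; 1 - similarity ~ "block influence".
        sim = F.cosine_similarity(h_in, h_out, dim=-1).mean().item()
        scores.append(1.0 - sim)
    return scores  # low score => layer is a candidate for removal
```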
