
Reduced max tokens of llama 30b to 1792 because of OOMs at 2048 #2411

Merged: 1 commit into main, Apr 8, 2023

Conversation

@yk (Collaborator) commented on Apr 8, 2023:

No description provided.
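The PR itself carries no description, but the title states the motivation: the llama-30b worker hit out-of-memory errors at a 2048-token limit. As a rough back-of-envelope sketch (my own arithmetic, not code from this repo, assuming an fp16 KV cache and the published LLaMA-30B dimensions of 60 layers and hidden size 6656), the memory tied up by the cache scales linearly with the token limit:

```python
# Back-of-envelope KV-cache estimate; an illustration, not code from this PR.
# Assumes fp16 and the published LLaMA-30B shape (60 layers, hidden size 6656).
N_LAYERS = 60
HIDDEN = 6656
BYTES_FP16 = 2

def kv_cache_bytes(seq_len: int, batch: int = 1) -> int:
    # One K and one V vector of size HIDDEN per layer, per token, per sequence.
    return 2 * N_LAYERS * HIDDEN * BYTES_FP16 * seq_len * batch

for tokens in (2048, 1792):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens} tokens -> {gib:.2f} GiB KV cache per sequence")
```

By this estimate the cache alone differs by roughly 0.4 GiB per sequence between the two limits, and the gap multiplies with batch size. Actual OOM behavior also depends on activation memory and attention buffers, so this only illustrates the direction of the trade-off.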

@andreaskoepf (Collaborator) left a comment:

We need flash-attn for inference.
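The point of the comment is that memory-efficient attention (flash-attn) would shrink the attention footprint enough to restore the full 2048-token context, making the cap a stopgap. A minimal sketch of gating the advertised context length on flash-attn availability follows; the constants and function names are hypothetical illustrations, not Open-Assistant config keys:

```python
# A minimal sketch (not from this PR): fall back to the reduced context
# length when flash-attn is not installed. Names here are hypothetical.
import importlib.util

FULL_CONTEXT = 2048      # fits when memory-efficient attention is available
REDUCED_CONTEXT = 1792   # value chosen in this PR to avoid OOMs without it

def max_context_length() -> int:
    """Return the context length to advertise for llama-30b."""
    has_flash_attn = importlib.util.find_spec("flash_attn") is not None
    return FULL_CONTEXT if has_flash_attn else REDUCED_CONTEXT

if __name__ == "__main__":
    print(f"max tokens: {max_context_length()}")
```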

@yk yk merged commit 46521b4 into main Apr 8, 2023
@yk yk deleted the llama-30b-1792 branch April 8, 2023 21:28