
max input token length #146

Closed
barthfab opened this issue Dec 10, 2022 · 7 comments
Labels
1day A problem that can be solved in a single day's time

Comments

@barthfab

[in consultation with @mryab]

The max input token length is currently 2048. It would be nice to process more than 2048 tokens through the distributed BLOOM model; increasing the max input token length would help me a lot in my research.

@mryab @borzunov @justheuristic

@justheuristic
Collaborator

Sorry for taking so long to respond, we were a bit overwhelmed last week.

Can you please clarify: do you need >2048 tokens for both forward/backward and inference, or just inference? If inference only, we can move to 4096 or more tokens with a single-line code change. If forward/backward, I can look into it and figure out how hard it's going to be.

@barthfab
Author

Just inference is fine for in-context learning!
Thanks a lot

@justheuristic
Collaborator

justheuristic commented Dec 26, 2022

#include <stdsorryfortakingdaystorespond.h>

We will increase it in the next major release (ETA Jan 1-3) and post an update to this issue.

Update: this will take a bit longer; we need to get a few more things done in that release. We'll keep you updated in this issue.

@justheuristic mentioned this issue Dec 31, 2022 (32 tasks)
@justheuristic added the 1day label Jan 17, 2023
@Vectorrent

I believe that I'm running into the same problem with the chat app. After a certain length, every conversation ends with the session crashing.

It doesn't appear that I can truncate conversations to "the most recent X characters/tokens," because history is kept within the open session (if I'm understanding correctly), and that's handled by Petals. There's nowhere in the chat app for me to fix this.

I'd be perfectly fine with chopping off the beginning of the conversation history to keep the total length under some maximum. I know it isn't ideal - but the user experience of "every conversation ending with a crash" is pretty bad, too.

Just posting my thoughts here, for posterity.
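
As a client-side workaround, the history could be re-tokenized and trimmed to the most recent tokens before each request. A minimal sketch (the 2048 limit, the `truncate_history` helper, and the `bigscience/bloom` tokenizer are illustrative assumptions, not part of the chat app):

```python
from transformers import AutoTokenizer

MAX_PROMPT_TOKENS = 2048  # assumed server-side limit; adjust to the served model

def truncate_history(history: str, tokenizer, max_tokens: int = MAX_PROMPT_TOKENS) -> str:
    """Drop the oldest tokens so the encoded history fits within max_tokens."""
    token_ids = tokenizer.encode(history)
    if len(token_ids) <= max_tokens:
        return history
    # Keep only the most recent tokens and decode them back to text.
    return tokenizer.decode(token_ids[-max_tokens:], skip_special_tokens=True)

if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")  # model name is an assumption
    chat_history = "User: Hello!\nAssistant: Hi!\n" * 1000  # stand-in for a long conversation
    prompt = truncate_history(chat_history, tokenizer)
    print(len(tokenizer.encode(prompt)))  # stays <= MAX_PROMPT_TOKENS
```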

@borzunov
Collaborator

Hi @barthfab @LuciferianInk,

We extended the context length to 8192 for the latest models that use multi-query attention (Llama 2, StableBeluga 2, CodeLlama, etc.). Feel free to reopen this if it is not enough and the issue is still relevant for you.
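
For reference, a minimal sketch of requesting a longer context from the Petals client, as I understand the public API (the model name, `max_length` value, and prompt are assumptions):

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Any Petals-served model with multi-query/grouped-query attention; the name is an assumption.
model_name = "petals-team/StableBeluga2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

prompt = "A long prompt with many in-context examples..."
inputs = tokenizer(prompt, return_tensors="pt")["input_ids"]

# Reserve up to 8192 positions (prompt + generated tokens) for this inference session.
with model.inference_session(max_length=8192) as sess:
    outputs = model.generate(inputs, max_new_tokens=64, session=sess)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```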

@TomExMachina

TomExMachina commented Sep 4, 2023

> Hi @barthfab @LuciferianInk,
>
> We extended the context length to 8192 for the latest models that use multi-query attention (Llama 2, StableBeluga 2, CodeLlama, etc.). Feel free to reopen this if it is not enough and the issue is still relevant for you.

I think this is about to become an issue for me, as I'm planning to experiment with deploying some models with a 128k token context length to Petals. If the max length for inference is quick to modify, that's a start.

@TomExMachina

No commits or PRs were linked here, or I might attempt a new PR myself.
