max token input token length #146
Comments
Sorry for taking so long to respond, we were a bit overwhelmed last week. Can you please clarify: do you need >2048 tokens for forward/backward as well as inference, or just inference? If just inference, we can move to 4096 or more tokens with a single-line code change. If forward/backward, I can look into it and figure out how hard it's going to be.
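For readers skimming this thread: the "single line code change" mentioned above refers to a server-side limit. The sketch below is purely illustrative of that kind of guard; the constant name `MAX_SESSION_LENGTH` and the `validate_request` helper are hypothetical, not Petals source code.

```python
# Hypothetical sketch of a single-constant session-length guard, the kind
# of limit discussed in this issue. Not actual Petals code.
MAX_SESSION_LENGTH = 2048  # the limit reported in this issue


def validate_request(requested_length, limit=MAX_SESSION_LENGTH):
    """Reject inference sessions longer than the configured limit."""
    if requested_length > limit:
        raise ValueError(
            f"requested {requested_length} tokens, limit is {limit}"
        )
    return requested_length


validate_request(1024)  # fine under the 2048 limit
```

Raising the limit would then amount to changing one constant (or making it configurable per model).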
Just inference is fine for in-context learning!
#include stdsorryfortakingdaystorespond.h We will increase it in the next major release (ETA update: it will take a bit longer, since we need to get a few more things done in that release). We'll keep you updated in this issue.
I believe I'm running into the same problem with the chat app. Past a certain length, every conversation ends with the session crashing. It doesn't appear that I can truncate conversations to "the most recent X number of characters/tokens," because history is saved within the open session (if I'm understanding correctly), and that's a Petals thing; there's nowhere in the chat app for me to fix this. I'd be perfectly fine with chopping off the beginning of the conversation history to keep the total length under some maximum. I know that isn't ideal, but the user experience of "every conversation ending with a crash" is pretty bad, too. Just posting my thoughts here, for posterity.
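The workaround described above (keeping only the most recent tokens) can be sketched client-side. This assumes the chat client can re-tokenize its own transcript before opening a new session; the token-ID list and the 2048 budget here stand in for whatever a real tokenizer and server limit would produce.

```python
# Sketch of client-side history truncation: drop the oldest tokens so the
# transcript fits under a hypothetical token budget before starting a session.
def truncate_history(token_ids, max_tokens):
    """Keep only the most recent max_tokens tokens, dropping the oldest."""
    if len(token_ids) <= max_tokens:
        return token_ids
    return token_ids[-max_tokens:]


history = list(range(3000))           # stand-in for a 3000-token transcript
trimmed = truncate_history(history, 2048)
# trimmed holds the last 2048 token IDs of the transcript
```

In a real client you would truncate at message boundaries rather than mid-message, and re-prepend any system prompt after trimming, but the idea is the same.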
Hi @barthfab @LuciferianInk, we extended the context length to 8192 for the latest models that use multi-query attention (Llama 2, StableBeluga 2, CodeLlama, etc.). Feel free to reopen this if it is not enough and the issue is still relevant for you.
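Given a fixed session limit like the 8192 tokens mentioned above, a client can check up front whether a prompt plus its planned generation will fit. A minimal sketch, where `MAX_CONTEXT` and `fits_in_context` are illustrative names rather than part of the Petals API:

```python
# Pre-flight check against a fixed context budget. The 8192 figure is the
# limit quoted in this thread for multi-query-attention models; the helper
# itself is a hypothetical client-side convenience, not a Petals API.
MAX_CONTEXT = 8192


def fits_in_context(prompt_tokens, max_new_tokens, limit=MAX_CONTEXT):
    """Return True if the prompt plus planned generation fits the limit."""
    return prompt_tokens + max_new_tokens <= limit


fits_in_context(8000, 100)   # 8100 <= 8192, fits
fits_in_context(8000, 400)   # 8400 > 8192, would need truncation first
```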
I think it's about to be an issue for me, as I'm about to experiment with deploying some models with a 128k-token context length to Petals. If the inference length is quick to modify, that's a start.
No commits or PRs were linked here, or I might attempt a new PR myself.
[in consultation with @mryab]
The max input token length is 2048 right now. It would be nice to process more than 2048 tokens through the distributed BLOOM. Increasing the max input token length would help me a lot in my research.
@mryab @borzunov @justheuristic