
max input token length #146

Closed
barthfab opened this issue Dec 10, 2022 · 7 comments
Labels
1day A problem that can be solved in a single day's time

Comments

@barthfab

[in consultation with @mryab]

The max input token length is currently 2048. It would be nice to process more than 2048 tokens through the distributed BLOOM model; increasing the max input token length would help me a lot in my research.

@mryab @borzunov @justheuristic

@justheuristic
Collaborator

Sorry for taking so long to respond, we were a bit overwhelmed last week.

Can you please clarify: do you need >2048 tokens for both forward/backward and inference, or just inference? If inference only, we can move to 4096 or more tokens with a single-line code change. If forward/backward, I can look into it and figure out how hard it's going to be.

@barthfab
Author

Just inference is fine for in-context learning!
Thanks a lot

@justheuristic
Collaborator

justheuristic commented Dec 26, 2022

#include <stdsorryfortakingdaystorespond.h>

We will increase it in the next major release (ETA Jan 1-3) and post an update to this issue.

Update: this will take a bit longer; we need to get a few more things done in that release. We'll keep you updated in this issue.

@justheuristic mentioned this issue Dec 31, 2022 (32 tasks)
@justheuristic added the 1day label Jan 17, 2023
@Vectorrent

I believe that I'm running into the same problem with the chat app. After a certain length, every conversation ends with the session crashing.

It doesn't appear that I can truncate conversations to "the most recent X characters/tokens," because history is kept within the open session (if I'm understanding correctly), and that's handled by Petals. There's nowhere in the chat app for me to fix this.

I'd be perfectly fine with chopping off the beginning of the conversation history to keep the total length under some maximum. I know it isn't ideal - but the user experience of "every conversation ending with a crash" is pretty bad, too.

Just posting my thoughts here, for posterity.
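
As a client-side workaround, the history could be re-tokenized and trimmed to the most recent tokens before each request. A minimal sketch (the 2048 limit, the `truncate_history` helper, and the `bigscience/bloom` tokenizer are illustrative assumptions, not part of the chat app):

```python
from transformers import AutoTokenizer

MAX_PROMPT_TOKENS = 2048  # assumed server-side limit; adjust to the served model

def truncate_history(history: str, tokenizer, max_tokens: int = MAX_PROMPT_TOKENS) -> str:
    """Drop the oldest tokens so the encoded history fits within max_tokens."""
    token_ids = tokenizer.encode(history)
    if len(token_ids) <= max_tokens:
        return history
    # Keep only the most recent tokens and decode them back to text.
    return tokenizer.decode(token_ids[-max_tokens:], skip_special_tokens=True)

if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")  # model name is an assumption
    chat_history = "User: Hello!\nAssistant: Hi!\n" * 1000  # stand-in for a long conversation
    prompt = truncate_history(chat_history, tokenizer)
    print(len(tokenizer.encode(prompt)))  # stays <= MAX_PROMPT_TOKENS
```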

@borzunov
Collaborator

Hi @barthfab @LuciferianInk,

We extended the context length to 8192 for the latest models that use multi-query attention (Llama 2, StableBeluga 2, CodeLlama, etc.). Feel free to reopen this if it is not enough and the issue is still relevant for you.
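
For reference, a minimal sketch of requesting a longer context from the Petals client, as I understand the public API (the model name, `max_length` value, and prompt are assumptions):

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Any Petals-served model with multi-query/grouped-query attention; the name is an assumption.
model_name = "petals-team/StableBeluga2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

prompt = "A long prompt with many in-context examples..."
inputs = tokenizer(prompt, return_tensors="pt")["input_ids"]

# Reserve up to 8192 positions (prompt + generated tokens) for this inference session.
with model.inference_session(max_length=8192) as sess:
    outputs = model.generate(inputs, max_new_tokens=64, session=sess)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```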

@TomExMachina

TomExMachina commented Sep 4, 2023

> Hi @barthfab @LuciferianInk,
>
> We extended the context length to 8192 for the latest models that use multi-query attention (Llama 2, StableBeluga 2, CodeLlama, etc.). Feel free to reopen this if it is not enough and the issue is still relevant for you.

I think this is about to become an issue for me, as I'm planning to experiment with deploying some models with a 128k token context length to Petals. If the max length for inference is quick to modify, that's a start.

@TomExMachina

No commits or PRs were linked here, or I might attempt a new PR myself.
