Skip to content

Conversation

@pmeier
Copy link
Member

@pmeier pmeier commented Mar 11, 2024

This is a temporary fix for the Cohere assistants. Although in theory they support a 4k context window

_CONTEXT_SIZE: int = 4_000
# See https://docs.cohere.com/docs/models#command

they seem to be using a very different tokenizer than what we use. When trying the assistant in the web UI with a document that actually supply 4k tokens, I get the following error message in the response

too many tokens: size limit exceeded by 1502 tokens. Try using shorter or fewer inputs, or setting prompt_truncation='AUTO'.

So the number of tokens actually differs by almost 40%.

In this PR I simply reduce the default number of tokens that we pull from the source storage to feed into the assistant. Setting this automatically depending on the assistant is a larger task that I'll open an issue about. Plus, the tokenizer would actually need to be dependent on the assistant as well.

Functionally, this change should make too much of a difference. Unless the user has a very complex query, by default we are still going to pull in four sources. That should be sufficient.

@pmeier pmeier requested a review from nenb March 11, 2024 14:18
@pmeier pmeier merged commit 329a1a0 into main Mar 11, 2024
@pmeier pmeier deleted the default-num-tokens branch March 11, 2024 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants