
Question: How to keep longer conversations without breaking the code? #1139

@zah-tane

Description


I want to have longer conversations with the model, but as I understand it, the number of tokens in the prompt plus the number of tokens generated by the model must be less than the context size (n_ctx). As the conversation develops, more tokens accumulate in messages, which eventually results in the following error:

ValueError: Requested tokens (2083) exceed context window of 2048
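Here is a simplified version of what I'm doing (the model path and parameters are just examples):

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-7b.Q4_K_M.gguf", n_ctx=2048)

messages = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("> ")
    messages.append({"role": "user", "content": user_input})

    # Once the accumulated messages (plus the requested completion) no longer
    # fit in n_ctx, this call raises:
    # ValueError: Requested tokens (2083) exceed context window of 2048
    response = llm.create_chat_completion(messages=messages, max_tokens=256)
    reply = response["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    print(reply)
```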

I am new to all this, but I think llama.cpp handles this with the n_keep parameter (here is a helpful discussion). However, I can't find how to avoid this error using llama-cpp-python. I also think the chat context needs to be reset at some point, because even with n_keep (which I don't know how to set through this repo), the context window would still fill up eventually.

So, my question is: how can I handle longer chats with create_chat_completion and avoid this error, ideally with something similar to n_keep combined with resetting total_tokens and emptying the current context once it fills up? See the sketch below for the kind of workaround I have in mind.
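For reference, this is the sort of thing I've been considering, though I don't know if it's the intended way. It keeps the system prompt and drops the oldest messages until a rough token count fits. The counting via llm.tokenize() is my own guess and ignores the chat template overhead, hence the margin:

```python
def trim_messages(llm, messages, n_ctx=2048, max_tokens=256, margin=64):
    """Keep the system prompt plus as many recent messages as roughly fit.

    Token counting here is approximate: each message body is tokenized on its
    own and the chat template is ignored, so `margin` leaves some slack.
    """
    def count(msg):
        return len(llm.tokenize(msg["content"].encode("utf-8")))

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    budget = n_ctx - max_tokens - margin - sum(count(m) for m in system)

    kept = []
    # Walk backwards from the newest message and keep whatever still fits.
    for msg in reversed(rest):
        cost = count(msg)
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost

    return system + list(reversed(kept))


# Usage before each call:
# messages = trim_messages(llm, messages)
# response = llm.create_chat_completion(messages=messages, max_tokens=256)
```

Is something like this the recommended approach, or is there built-in functionality for it?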

Thank you!
