Chat is very slow when using with the llama.cpp server #2845

@lehuythangit

Validations

  • I believe this is a way to improve. I'll try to join the Continue Discord for questions
  • I'm not able to find an open issue that requests the same enhancement

Problem

Chat becomes very slow with the llama.cpp server as the message history grows. The request to llama.cpp's /completion API does not set cache_prompt = true, so llama.cpp reprocesses the entire previous message history on every prompt instead of reusing its cache.

Solution

Please set cache_prompt = true when calling the /completion API,
or add a configuration property for it in config.json.
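
For illustration, here is a minimal sketch of the kind of request I mean, assuming a llama.cpp server listening on its default local address (http://localhost:8080). The helper names build_completion_payload and complete are hypothetical and are not part of Continue; the sketch relies only on the prompt, n_predict, and cache_prompt request fields and the content response field of the llama.cpp /completion API.

```python
import json
import urllib.request


def build_completion_payload(prompt, n_predict=256, cache_prompt=True):
    """Build the JSON body for llama.cpp's /completion endpoint (hypothetical helper)."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,
        # Ask the server to reuse cached work for the part of the prompt
        # it has already processed, instead of re-evaluating the whole history.
        "cache_prompt": cache_prompt,
    }


def complete(prompt, base_url="http://localhost:8080", **options):
    """POST the payload to the server and return the generated text.

    base_url is an assumption; change it to match your server.
    """
    body = json.dumps(build_completion_payload(prompt, **options)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["content"]
```

With a server running, complete("Hello") would send the request with cache_prompt enabled, and complete("Hello", cache_prompt=False) would reproduce the current slow behaviour for comparison.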

Thanks

Metadata

Labels

  • area:chat: Relates to chat interface
  • area:configuration: Relates to configuration options
