Chat is very slow when used with the llama.cpp server

### Validations

- [ ] I believe this is a way to improve. I'll try to join the [Continue Discord](https://discord.gg/NWtdYexhMs) for questions
- [X] I'm not able to find an [open issue](https://github.com/continuedev/continue/issues?q=is%3Aopen+is%3Aissue+label%3Aenhancement) that requests the same enhancement

### Problem

Chat with the llama.cpp server becomes very slow as the conversation grows. The request to llama.cpp's `/completion` API does not set `cache_prompt: true`, so llama.cpp reprocesses the entire message history on every prompt instead of reusing its cache.

### Solution

Please add the property `cache_prompt: true` to the request when calling the `/completion` API,
or add a configuration property for it to config.json.
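
To illustrate the request, here is a minimal sketch of what the call could look like. The `LlamaCppOptions` interface, the `cachePrompt` option name, and both helper functions are hypothetical and are not taken from Continue's code base. Only the `/completion` endpoint and the `prompt`, `n_predict`, `stream`, and `cache_prompt` request fields come from the llama.cpp server API.

```typescript
// Hypothetical option shape for illustration; not Continue's actual config schema.
interface LlamaCppOptions {
  apiBase: string;       // e.g. "http://localhost:8080"
  cachePrompt?: boolean; // assumed option name; defaults to true in this sketch
  maxTokens?: number;
}

// Build the JSON body for llama.cpp's /completion endpoint.
// With cache_prompt set, the server can reuse the already-evaluated
// common prefix of the prompt instead of processing the whole history again.
function buildCompletionBody(
  prompt: string,
  opts: LlamaCppOptions,
): Record<string, unknown> {
  return {
    prompt,
    n_predict: opts.maxTokens ?? 256,
    stream: false,
    cache_prompt: opts.cachePrompt ?? true,
  };
}

// Send the request and return the generated text (non-streaming).
async function complete(prompt: string, opts: LlamaCppOptions): Promise<string> {
  const resp = await fetch(new URL("/completion", opts.apiBase), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildCompletionBody(prompt, opts)),
  });
  if (!resp.ok) {
    throw new Error(`llama.cpp server returned HTTP ${resp.status}`);
  }
  const data = (await resp.json()) as { content: string };
  return data.content;
}
```

Defaulting the flag to `true` while still allowing `cachePrompt: false` would cover both options proposed above: caching works out of the box, and users can turn it off from the configuration if needed.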

Thanks

Chat is very slow when used with the llama.cpp server #2845
