Validations
Problem
Chat with a llama.cpp server gets very slow as the conversation grows. The request to llama.cpp's /completion API does not set cache_prompt = true, so llama.cpp reprocesses the entire message history on every prompt instead of reusing its cache.
Solution
Please set cache_prompt = true when calling the /completion API,
or add a configuration property for it in config.json.
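For illustration, here is a minimal sketch of the kind of request I mean, written in Python rather than in this project's own code. The helper names `build_completion_payload` and `post_completion`, the `n_predict` value, and the server address are assumptions made for the example; only the /completion endpoint and the `cache_prompt` field come from llama.cpp itself.

```python
import json
import urllib.request


def build_completion_payload(prompt, cache_prompt=True, n_predict=128):
    """Build the JSON body for llama.cpp's /completion endpoint.

    cache_prompt=True asks the server to reuse the already-processed
    prefix of the prompt instead of re-evaluating the whole history.
    """
    return {
        "prompt": prompt,
        "n_predict": n_predict,        # hypothetical default for this sketch
        "cache_prompt": cache_prompt,  # the field this issue asks for
    }


def post_completion(base_url, payload, timeout=60):
    """POST the payload to <base_url>/completion and return the parsed JSON."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Assuming a llama.cpp server is listening on http://localhost:8080, calling `post_completion("http://localhost:8080", build_completion_payload(history_text))` would send the full chat history while letting the server skip the part it has already processed. Making `cache_prompt` a parameter rather than a constant mirrors the second option above, where the value would come from config.json.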
Thanks