Before submitting your bug report
Relevant environment info
- OS: macOS
- Continue: 0.8.1
Description
Rejecting the LLM's response while it is streaming doesn't stop generation. The LLM keeps generating the rest of the response, which drains the battery through high GPU usage.
To reproduce
- Ask a question to the LLM served by llama.cpp.
- When it starts streaming, reject the answer via CMD-BACKSPACE or your custom shortcut.
- The llama.cpp server keeps generating the rest of the response even though it was already rejected and the rest of the response is no longer visible.
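
For reference, here is a rough sketch of the behavior I'd expect, where rejecting the answer aborts the producer instead of only hiding its output. Everything in it is hypothetical: generateTokens is a stand-in for the streaming LLM call and streamWithReject simulates pressing the reject shortcut after a few tokens. Neither is Continue's or llama.cpp's actual API. In a real client the same AbortSignal would presumably be passed to fetch via its signal option so the HTTP request gets cancelled; whether the llama.cpp server then stops generating is an assumption on my part.

```typescript
// Hypothetical stand-in for a streaming LLM call (not a real Continue or llama.cpp API).
// It counts how many tokens were actually produced so the effect of aborting is visible.
async function* generateTokens(
  total: number,
  signal: AbortSignal,
  stats: { produced: number },
): AsyncGenerator<string> {
  for (let i = 0; i < total; i++) {
    // Stop producing as soon as the caller has rejected the response.
    if (signal.aborted) return;
    stats.produced++;
    yield `tok${i} `;
  }
}

// Hypothetical consumer: shows tokens as they arrive and rejects after `rejectAfter` tokens.
async function streamWithReject(
  total: number,
  rejectAfter: number,
): Promise<{ shown: string[]; produced: number }> {
  const controller = new AbortController();
  const stats = { produced: 0 };
  const shown: string[] = [];
  for await (const tok of generateTokens(total, controller.signal, stats)) {
    shown.push(tok);
    // Simulates the user pressing the reject shortcut mid-stream.
    if (shown.length === rejectAfter) controller.abort();
  }
  return { shown, produced: stats.produced };
}
```

With this shape, the number of tokens produced matches the number shown at the moment of rejection, rather than running on to the full response length.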
Log output
No response