
When rejecting the LLM's response before it's finished, the llama.cpp server keeps generating tokens #764

@ibehnam


Relevant environment info

- OS: macOS
- Continue: 0.8.1

Description

Rejecting a response while it is still streaming does not stop generation. The llama.cpp server keeps generating the rest of the response, which keeps GPU usage high and drains the battery.

To reproduce

  1. Ask a question of an LLM served by llama.cpp.
  2. When it starts streaming, reject the answer via CMD-BACKSPACE or your custom shortcut.
  3. The llama.cpp server keeps generating the rest of the response, even though it has been rejected and is no longer visible.
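
For illustration, here is a minimal sketch of what I would expect to happen on the client side when a response is rejected: the in-flight streaming request gets aborted so the connection is dropped. `GenerationSession` and `streamCompletion` are hypothetical names, not Continue's actual code. The sketch assumes the llama.cpp server's `POST /completion` endpoint with `stream: true`, and it assumes the server stops generating once it notices the client has disconnected, which this issue does not confirm.

```typescript
// Hypothetical helper (not part of Continue): owns the AbortController
// for the single in-flight streaming request.
class GenerationSession {
  private controller: AbortController | null = null;

  // Start tracking a new request, cancelling any previous one first.
  begin(): AbortSignal {
    this.reject();
    this.controller = new AbortController();
    return this.controller.signal;
  }

  // Call when the user rejects the response (e.g. CMD-BACKSPACE).
  // Returns true if a request was actually in flight.
  reject(): boolean {
    if (this.controller === null) return false;
    this.controller.abort(); // cancels the fetch and its body stream
    this.controller = null;
    return true;
  }
}

// Streams text from a llama.cpp server and stops reading once `signal` aborts.
// Assumption: the server exposes POST /completion and accepts `stream: true`.
async function streamCompletion(
  baseUrl: string,
  prompt: string,
  signal: AbortSignal,
  onChunk: (text: string) => void,
): Promise<void> {
  try {
    const res = await fetch(`${baseUrl}/completion`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt, stream: true }),
      signal, // ties the request lifetime to the session
    });
    if (!res.ok || res.body === null) {
      throw new Error(`HTTP ${res.status}`);
    }
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      onChunk(decoder.decode(value, { stream: true })); // raw event text, unparsed
    }
  } catch (err) {
    if (signal.aborted) return; // expected: the user rejected the response
    throw err;
  }
}
```

Usage would look like `streamCompletion(baseUrl, prompt, session.begin(), render)` when the question is sent, and `session.reject()` in the handler for the reject shortcut, where `baseUrl` and `render` are placeholders. Keeping the controller in one place means a second question also cancels the first request instead of leaving it running.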

Log output

No response


Labels

kind:bug (Indicates an unexpected problem or unintended behavior)
