When rejecting the LLM's response before it's finished, the `llama.cpp` server keeps generating tokens #764

### Before submitting your bug report

- [X] I believe this is a bug. I'll try to join the [Continue Discord](https://discord.gg/NWtdYexhMs) for questions
- [X] I'm not able to find an [open issue](https://github.com/continuedev/continue/issues?q=is%3Aopen+is%3Aissue) that reports the same bug
- [X] I've seen the [troubleshooting guide](https://continue.dev/docs/troubleshooting) on the Continue Docs

### Relevant environment info

```Markdown
- OS: macOS
- Continue: 0.8.1
```


### Description

Rejecting the LLM's response while it is streaming does not stop generation. The `llama.cpp` server keeps generating the rest of the response, which keeps GPU usage high and drains the battery.

### To reproduce

1. Ask a question to an LLM served by `llama.cpp`.
2. When the response starts streaming, reject it via `CMD-BACKSPACE` or your custom shortcut.
3. The `llama.cpp` server keeps generating the rest of the response, even though it was already rejected and is no longer visible.
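For illustration, here is a minimal sketch of the behavior I would expect: rejecting the response aborts the stream so that nothing more is produced. This is not Continue's actual code. `fakeTokenStream`, `streamUntilRejected`, and `stats` are hypothetical names, and the generator only stands in for the server; a real client would presumably pass the `AbortSignal` to `fetch()` so the HTTP connection is closed, and whether the server then stops generating depends on the server.

```typescript
// Hypothetical counter: how many tokens the stand-in "server" has generated.
const stats = { produced: 0 };

// Hypothetical stand-in for a token stream from a llama.cpp-style server.
// It stops producing as soon as the signal is aborted.
async function* fakeTokenStream(
  total: number,
  signal: AbortSignal,
): AsyncGenerator<string> {
  for (let i = 0; i < total; i++) {
    if (signal.aborted) return; // the request was cancelled, stop generating
    stats.produced++;
    yield `tok${i}`;
  }
}

// Reads tokens until the stream ends or `shouldReject` returns true.
// On rejection it aborts the controller and stops reading.
async function streamUntilRejected(
  total: number,
  shouldReject: (received: string[]) => boolean,
): Promise<string[]> {
  const controller = new AbortController();
  const received: string[] = [];
  for await (const token of fakeTokenStream(total, controller.signal)) {
    received.push(token);
    if (shouldReject(received)) {
      controller.abort(); // propagate the rejection to the producer
      break;
    }
  }
  return received;
}
```

With this shape, rejecting after a few tokens leaves `stats.produced` equal to the number of tokens actually received, instead of the full response length.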


### Log output

_No response_
