[Feature] Cancel generation mid-token-loop for faster re-prediction #207

## Summary

Add cooperative cancellation to the `generate()` token loop in `LlamaRuntimeCore` so that when the user types during active generation, the stale prediction is abandoned immediately instead of running to completion.

## Problem

When the user types a character while a suggestion is being generated, the old Swift `Task` is cancelled and a new prediction is queued. However, `generate()` (LlamaRuntimeCore.swift:192–217) never checks `Task.isCancelled`, so the loop continues sampling up to **30 tokens** (for the default 12–20 word preset) on a result that will be discarded.

Because `LlamaRuntimeCore` is an actor, the new generation request is **queued behind the old one** — it cannot start until the wasted loop finishes. On smaller machines this adds noticeable latency between keystrokes and the next suggestion appearing.
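The queuing behaviour can be illustrated with a minimal sketch. `DemoActor`, its `log`, and `demoSerialization()` are hypothetical names made up for this example, not code from `LlamaRuntimeCore`. The sketch assumes, as described above, that the token loop contains no suspension points, so the actor cannot interleave two calls.

```swift
// Hypothetical stand-in for an actor-isolated runtime (illustration only).
actor DemoActor {
    private(set) var log: [String] = []

    // Synchronous loop with no suspension points, like a token loop.
    func generate(label: String, steps: Int) {
        log.append("start \(label)")
        for _ in 0 ..< steps {
            // stand-in for sampling one token
        }
        log.append("end \(label)")
    }
}

// Issues two requests concurrently and returns the order the actor ran them in.
func demoSerialization() async -> [String] {
    let runtime = DemoActor()
    async let old: Void = runtime.generate(label: "old", steps: 30)
    async let new: Void = runtime.generate(label: "new", steps: 30)
    _ = await (old, new)
    return await runtime.log
}
```

Whichever request the actor picks up first, its `start` entry is always followed directly by its own `end` entry: the second request only begins once the first has returned.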

## Proposed direction

Add `if Task.isCancelled { break }` at the top of the token-sampling loop in `generate()`, matching the pattern already used in `summarize()` (LlamaRuntimeCore.swift:698–716). The existing `defer` cleanup block already handles partial generation correctly, so no additional teardown is needed.

```swift
for _ in 0 ..< options.maxPredictionTokens {
    if Task.isCancelled { break }          // ← add this
    let nextToken = llama_sampler_sample(sampler, context, -1)
    // ...
}
```
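For anyone who wants to see the effect in isolation, here is a self-contained sketch of the same check. `DemoRuntime`, the `onToken` hook, and `demoCancellation()` are hypothetical and exist only so the example can cancel its own task at a known point; the real loop samples via `llama_sampler_sample` and has no such hook.

```swift
// Hypothetical stand-in for the runtime (illustration only).
actor DemoRuntime {
    // `onToken` is an assumed hook, invoked with the number of tokens so far.
    func generate(maxTokens: Int,
                  onToken: @Sendable (Int) -> Void = { _ in }) -> [Int] {
        var tokens: [Int] = []
        for index in 0 ..< maxTokens {
            if Task.isCancelled { break }   // cooperative cancellation check
            tokens.append(index)            // stand-in for sampling one token
            onToken(tokens.count)
        }
        return tokens
    }
}

// Cancels the surrounding task after the third token and reports how many
// tokens were produced before the loop noticed.
func demoCancellation() async -> Int {
    let runtime = DemoRuntime()
    let task = Task {
        await runtime.generate(maxTokens: 30) { count in
            guard count == 3 else { return }
            withUnsafeCurrentTask { current in
                guard let current else { return }
                current.cancel()
            }
        }
    }
    return await task.value.count
}
```

With the check in place the loop stops at the next iteration after cancellation, so only 3 of the 30 requested tokens are produced. Without the `if Task.isCancelled` line the same call would run all 30 iterations.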

## Additional context

- `summarize()` in the same file already implements this exact pattern with a comment explaining cooperative cancellation.
- `SuggestionWorkController` already cancels the old task and bumps a monotonic `workID`, so stale results are rejected downstream — this change just stops burning GPU/CPU cycles on them.
- Highest ROI of the latency-related improvements: one line, zero architectural risk.
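As a rough illustration of the downstream stale-result check mentioned above, the sketch below shows one way a monotonic `workID` can reject results from superseded work. `DemoWorkController`, `userTyped(_:predict:)`, `waitForCurrentWork()`, and `demoStaleRejection()` are hypothetical; the actual `SuggestionWorkController` API is not shown in this issue and may differ.

```swift
// Hypothetical sketch; not the real SuggestionWorkController.
@MainActor
final class DemoWorkController {
    private var workID = 0
    private var currentTask: Task<Void, Never>?
    private(set) var shown: String?

    func userTyped(_ text: String,
                   predict: @escaping @Sendable (String) async -> String) {
        currentTask?.cancel()          // ask the old work to stop
        workID += 1                    // bump the monotonic ID
        let id = workID
        currentTask = Task {
            let suggestion = await predict(text)
            guard id == self.workID else { return }   // stale result, drop it
            self.shown = suggestion
        }
    }

    func waitForCurrentWork() async {
        await currentTask?.value
    }
}

// Two keystrokes in quick succession; only the newer result may be shown.
@MainActor
func demoStaleRejection() async -> String? {
    let controller = DemoWorkController()
    let predict: @Sendable (String) async -> String = { $0 + "..." }
    controller.userTyped("he", predict: predict)
    controller.userTyped("hel", predict: predict)
    await controller.waitForCurrentWork()
    return controller.shown
}
```

The ID check keeps the displayed suggestion correct, but the superseded `predict` call in this sketch still runs to completion. That leftover work is what the proposed `Task.isCancelled` check removes.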
