Explore streaming or chunk-based autocomplete generation #11

## Problem

Tabby currently waits for the model to finish generating before it shows a suggestion. This can make autocomplete feel slower than necessary, especially with larger local models or longer suggestion-length settings.

## Goal

Explore whether Tabby can stream or incrementally surface useful completion chunks before the full generation is finished.

## Proposed Scope

- Investigate whether the Open Source runtime can expose token streaming from `LlamaRuntimeCore`.
- Investigate what, if anything, Apple Intelligence exposes for incremental responses.
- Define a safe partial-output normalization strategy so incomplete text is not shown in awkward states.
- Decide when the overlay should first appear: after the first word, after the first stable phrase, or once a latency threshold is reached.
- Ensure cancellation and stale-result handling work while streaming.
- Preserve partial acceptance semantics if the user presses accept before generation fully completes.
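The following minimal sketch covers the normalization and first-display items and assumes a Swift code base. `stablePrefix(of:isFinal:)` trims an in-progress completion back to its last whitespace or punctuation boundary, so a half-generated identifier is never shown. `shouldShowOverlay` encodes the three candidate first-display policies. Both function names, the boundary character set, and the word-count heuristic are hypothetical illustrations, not existing Tabby code.

```swift
// Hypothetical first-display policies matching the options listed above.
enum FirstDisplayPolicy {
    case firstWord
    case firstPhrase(minWords: Int)
    case latencyThreshold(seconds: Double)
}

// Hypothetical helper: returns the part of a partial completion that ends on a
// boundary (whitespace or common punctuation), so an unfinished word is hidden.
func stablePrefix(of partial: String, isFinal: Bool) -> String {
    // Once generation has finished, the whole text is stable.
    if isFinal { return partial }
    let boundaries = ",.;:()[]{}"
    guard let lastBoundary = partial.lastIndex(where: { $0.isWhitespace || boundaries.contains($0) }) else {
        // No complete word yet, so show nothing.
        return ""
    }
    // Keep everything up to and including the last boundary character.
    return String(partial[...lastBoundary])
}

// Hypothetical gate: decides whether the overlay may appear for the current stable text.
func shouldShowOverlay(stableText: String, elapsedSeconds: Double, policy: FirstDisplayPolicy) -> Bool {
    let wordCount = stableText.split(whereSeparator: { $0.isWhitespace }).count
    switch policy {
    case .firstWord:
        return wordCount >= 1
    case .firstPhrase(let minWords):
        return wordCount >= minWords
    case .latencyThreshold(let seconds):
        // Show whatever is stable once the threshold has elapsed.
        return wordCount >= 1 && elapsedSeconds >= seconds
    }
}
```

Because the policy is a value, the three options can be compared against measured latency before one is chosen.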

## Acceptance Criteria

- A documented recommendation on the feasibility of streaming/chunked generation exists for each engine.
- If feasible for Open Source, the runtime can emit incremental chunks to the coordinator.
- The overlay can show stable partial suggestions without waiting for full completion.
- Stale generations and focus changes cancel active streams safely.
- Streaming does not regress output normalization or acceptance behavior.
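One way to meet the cancellation criteria is to tag every request with a monotonically increasing generation ID and drop any chunk whose ID is no longer current. The sketch below assumes a hypothetical `StreamingSuggestionState` owned by the coordinator and shows only this filtering step. A real implementation would also stop the underlying runtime work, for example by calling `cancel()` on the Swift `Task` that drives the stream.

```swift
// Hypothetical coordinator-side state: chunks from any older generation are ignored.
final class StreamingSuggestionState {
    private(set) var activeGeneration: Int = 0
    private(set) var accumulated: String = ""

    /// Starts a new generation (e.g. a new keystroke) and invalidates all older streams.
    func beginGeneration() -> Int {
        activeGeneration += 1
        accumulated = ""
        return activeGeneration
    }

    /// Invalidates the active stream without starting a new one (e.g. on focus change).
    func cancel() {
        activeGeneration += 1
        accumulated = ""
    }

    /// Appends a chunk only if it belongs to the active generation.
    /// Returns false when the chunk is stale and was dropped.
    @discardableResult
    func receive(chunk: String, generation: Int) -> Bool {
        guard generation == activeGeneration else { return false }
        accumulated += chunk
        return true
    }
}
```

Bumping the ID on cancellation, rather than setting a separate flag, means a late chunk from any earlier request is rejected by the same single comparison.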

## Open Questions

- Should streaming be engine-specific or hidden behind one shared `SuggestionGenerating` interface?
- What minimum chunk is stable enough to show?
- Should the app keep generating after the user accepts the first shown chunk?
- Does streaming meaningfully improve perceived latency with current model sizes?
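For the first open question, one option is a single shared streaming interface. Under that design, engines that cannot produce incremental output simply emit their finished completion as one chunk. The protocol and types below (`ChunkedSuggestionSource`, `OneShotEngine`, `OneShotAdapter`, `TokenStreamingEngine`) are hypothetical and do not reflect the existing `SuggestionGenerating` interface. The sketch uses a synchronous callback to stay small, whereas production code would more likely use `AsyncThrowingStream`.

```swift
/// Hypothetical shared interface: calls `onChunk` zero or more times, in order, then returns.
protocol ChunkedSuggestionSource {
    func generate(prompt: String, onChunk: (String) -> Void)
}

/// Stand-in for an engine that can only return a finished completion.
struct OneShotEngine {
    let complete: (String) -> String
}

/// Adapter: a non-streaming engine satisfies the streaming interface by emitting one chunk.
struct OneShotAdapter: ChunkedSuggestionSource {
    let engine: OneShotEngine
    func generate(prompt: String, onChunk: (String) -> Void) {
        onChunk(engine.complete(prompt))
    }
}

/// Stand-in for a token-streaming engine, e.g. if the Open Source runtime exposes tokens.
/// Here the tokens are fixed in advance purely for illustration.
struct TokenStreamingEngine: ChunkedSuggestionSource {
    let tokens: [String]
    func generate(prompt: String, onChunk: (String) -> Void) {
        for token in tokens {
            onChunk(token)
        }
    }
}
```

With this shape, the coordinator handles every engine the same way, and whether an engine truly streams becomes an implementation detail.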

