
Explore streaming or chunk-based autocomplete generation #11

@FuJacob


Problem

Tabby currently waits for the model to finish its full response before showing a suggestion. This can make autocomplete feel slower than necessary, especially with larger local models or longer suggestion-length settings.

Goal

Explore whether Tabby can stream or incrementally surface useful completion chunks before the full generation is finished.

Proposed Scope

  • Investigate whether the Open Source runtime can expose token streaming from LlamaRuntimeCore.
  • Investigate what, if anything, Apple Intelligence exposes for incremental responses.
  • Define a safe partial-output normalization strategy so that incomplete text (for example, a half-generated word) is never shown in an awkward intermediate state.
  • Decide when the overlay should first appear: after the first word, after the first stable phrase, or once a latency threshold is reached.
  • Ensure cancellation and stale-result handling work while streaming.
  • Preserve partial acceptance semantics if the user presses accept before generation fully completes.
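
A minimal sketch of one possible normalization and gating strategy follows. It is written in Swift on the assumption that the app is a Swift/macOS project. The names `stablePrefix`, `OverlayGate`, `minWords`, and `latencyThreshold` are hypothetical and do not come from the Tabby codebase. The sketch treats whitespace as a safe word boundary and uses arbitrary thresholds. A real version would need to reuse the app's existing output normalization rules.

```swift
import Foundation

/// Hypothetical helper: returns the longest prefix of `partial` that ends on a
/// word boundary, so a half-generated word is never displayed.
/// Assumption: whitespace is a safe boundary; real code would also need the
/// app's existing normalization rules (punctuation, code tokens, trimming).
func stablePrefix(of partial: String, isFinal: Bool) -> String {
    // Once generation has finished, the whole text is considered stable.
    if isFinal { return partial }
    // Anything after the last whitespace character may still be growing.
    guard let lastSpace = partial.lastIndex(where: { $0.isWhitespace }) else {
        return "" // No complete word has arrived yet.
    }
    return String(partial[..<lastSpace])
}

/// Hypothetical policy for when the overlay first appears.
/// Both thresholds are illustrative values, not measured ones.
struct OverlayGate {
    var minWords = 2                          // "first stable phrase" size
    var latencyThreshold: TimeInterval = 0.3  // seconds before one word is enough

    func shouldShow(stableText: String, elapsed: TimeInterval) -> Bool {
        let words = stableText.split(whereSeparator: { $0.isWhitespace }).count
        if words >= minWords { return true }
        // Past the latency threshold, a single complete word is shown.
        return words >= 1 && elapsed >= latencyThreshold
    }
}
```

For example, a partial stream of `let total = cou` yields `let total =`. This gate would show it immediately because it already contains three whitespace-separated tokens.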

Acceptance Criteria

  • There is a documented, per-engine recommendation on whether streaming/chunked generation is feasible.
  • If feasible for Open Source, the runtime can emit incremental chunks to the coordinator.
  • The overlay can show stable partial suggestions without waiting for full completion.
  • Stale generations and focus changes cancel active streams safely.
  • Streaming does not regress output normalization or acceptance behavior.
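
One way to meet the cancellation and stale-result criteria is to tag every request with a generation ID and drop any chunk whose ID is no longer current. The sketch below is hypothetical: `StreamSession`, `begin`, `cancel`, `receive`, and `accept` are illustrative names, not existing Tabby types. For the acceptance case, it also assumes that generation keeps running after a partial accept, which is still an open question below.

```swift
/// Hypothetical coordinator-side filter for streamed chunks.
/// Each request gets an increasing ID; starting a new request or cancelling
/// (e.g. on a focus change) invalidates older IDs, so their late chunks are dropped.
final class StreamSession {
    private(set) var activeID: UInt64 = 0
    private(set) var text = ""
    private var isActive = false

    /// Starts a new generation and returns its ID (assumed to be passed to the engine).
    func begin() -> UInt64 {
        activeID &+= 1
        text = ""
        isActive = true
        return activeID
    }

    /// Called on focus change or user edits; all later chunks are ignored.
    func cancel() {
        isActive = false
        text = ""
    }

    /// Appends a chunk only if it belongs to the active generation.
    /// Returns false for stale chunks so the caller can stop the producer.
    @discardableResult
    func receive(chunk: String, id: UInt64) -> Bool {
        guard isActive, id == activeID else { return false }
        text += chunk
        return true
    }

    /// Hypothetical partial acceptance: consumes the accepted prefix so any
    /// text that is still streaming continues after it.
    func accept(_ accepted: String) -> Bool {
        guard isActive, text.hasPrefix(accepted) else { return false }
        text.removeFirst(accepted.count)
        return true
    }
}
```

Because stale chunks are rejected at the coordinator, correctness does not depend on the engine stopping instantly. Cancellation of the underlying generation then becomes an efficiency concern rather than a correctness one.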

Open Questions

  • Should streaming be engine-specific or hidden behind one shared SuggestionGenerating interface?
  • What minimum chunk is stable enough to show?
  • Should the app keep generating after the user accepts the first shown chunk?
  • Does streaming meaningfully improve perceived latency with current model sizes?
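
If the shared-interface route is chosen, a streaming requirement could sit alongside `SuggestionGenerating`. Engines that lack incremental output would be wrapped by an adapter that emits the whole response as one chunk. This is a sketch only: `StreamingSuggestionGenerating`, `streamSuggestion(for:)`, and `WholeResponseAdapter` are hypothetical, and the existing `SuggestionGenerating` requirements are not shown in this issue. It relies only on Swift's standard `AsyncThrowingStream` and `Task`.

```swift
/// Hypothetical streaming counterpart to the existing SuggestionGenerating interface.
protocol StreamingSuggestionGenerating {
    /// Yields text chunks in order; the stream finishes when generation completes.
    func streamSuggestion(for prompt: String) -> AsyncThrowingStream<String, Error>
}

/// Hypothetical adapter for an engine that only returns whole responses:
/// it emits the full result as a single chunk.
struct WholeResponseAdapter: StreamingSuggestionGenerating {
    let generate: @Sendable (String) async throws -> String

    func streamSuggestion(for prompt: String) -> AsyncThrowingStream<String, Error> {
        let generate = self.generate
        return AsyncThrowingStream { continuation in
            let task = Task {
                do {
                    let text = try await generate(prompt)
                    continuation.yield(text)
                    continuation.finish()
                } catch {
                    continuation.finish(throwing: error)
                }
            }
            // Cancel the underlying work if the consumer stops listening.
            continuation.onTermination = { _ in task.cancel() }
        }
    }
}
```

With this shape, the coordinator would consume every engine through the same `for try await` loop. Only engines with real token streaming (potentially the Open Source runtime) would yield more than one chunk.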

Metadata

Assignees: No one assigned
Labels: area:runtime (llama.cpp wrapper, KV cache, sampling, downloads), enhancement (New feature or request)
Project status: Done
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests