Confidence-based suggestion suppression #81

@Jam-Cai

Problem

Tabby currently shows every non-empty suggestion the model generates, regardless of how confident the model is in its output. Low-confidence suggestions tend to be low-quality — vague, repetitive, or contextually wrong — and showing them erodes user trust.

Goal

Add a configurable probability threshold that suppresses suggestions when the model's first-token confidence is too low. This gives Tabby a lightweight quality gate that works across models without prompt changes.

Proposed Scope

  • After sampling the first token, inspect its probability (softmax over logits).
  • If the top-1 probability is below a configurable threshold, abort the generation and suppress the suggestion silently.
  • Expose a settings toggle (default on) and a threshold slider or preset.
  • Log suppression events in debug mode with the token, its probability, and the threshold.
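
The gate described in the list above could be sketched roughly as follows. This is an illustrative Python sketch, not code from LlamaRuntimeCore: the names `softmax`, `should_suppress`, and the logger name are hypothetical, and it assumes the runtime can hand over the raw logits for the first generated position along with the sampled token id. It checks the sampled token's probability, which coincides with the top-1 probability only under greedy decoding.

```python
import logging
import math
from typing import Sequence

logger = logging.getLogger("suggestion_gate")  # hypothetical logger name


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def should_suppress(
    logits: Sequence[float],
    sampled_token_id: int,
    threshold: float,
    enabled: bool = True,
) -> bool:
    """Return True when the first token's probability falls below the threshold."""
    if not enabled:
        return False
    prob = softmax(logits)[sampled_token_id]
    if prob < threshold:
        # Only emitted when the logger is configured at DEBUG level.
        logger.debug(
            "suppressed suggestion: token=%d prob=%.4f threshold=%.4f",
            sampled_token_id, prob, threshold,
        )
        return True
    return False
```

With logits `[2.0, 1.0, 0.1]` and token `0` sampled, the probability is roughly 0.66, so a threshold of 0.5 lets the suggestion through and a threshold of 0.7 suppresses it. The check is a single softmax over one position, so it adds no extra model evaluation when the gate does not fire.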

Acceptance Criteria

  • First-token probability is measurable after sampling.
  • Suggestions below the threshold are silently suppressed (no ghost text shown).
  • Threshold is configurable with a sensible default.
  • Debug mode logs which token was suppressed and its probability.
  • Settings toggle, default on.
  • No regression in suggestion latency when the gate does not fire.
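
One way the settings criteria above might be modeled is sketched below. The class name `GateSettings`, the preset names, and every numeric value are placeholders invented for illustration; the issue leaves the real default threshold open.

```python
from dataclasses import dataclass

# Hypothetical preset names and values; none of these are tuned.
PRESETS = {"lenient": 0.2, "balanced": 0.4, "strict": 0.6}


@dataclass(frozen=True)
class GateSettings:
    enabled: bool = True  # toggle defaults to on, per the acceptance criteria
    threshold: float = PRESETS["balanced"]  # placeholder default, not a recommendation

    def __post_init__(self) -> None:
        # A probability threshold outside [0, 1] can never be meaningful.
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError(f"threshold must be in [0, 1], got {self.threshold}")

    @classmethod
    def from_preset(cls, name: str, enabled: bool = True) -> "GateSettings":
        """Build settings from a named preset instead of a raw slider value."""
        return cls(enabled=enabled, threshold=PRESETS[name])
```

Validating the range at construction time means a bad slider or config value fails early rather than silently suppressing every suggestion (threshold above 1) or none (threshold below 0).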

Relationship to Other Work

  • Sibling to #24 (Logit gating on first generated token); both live under the Quality epic.
  • #24 is a hard blocklist ("never start with these tokens"). This issue is a soft quality threshold ("don't show suggestions the model is uncertain about").
  • Both features use adjacent sampling infrastructure in LlamaRuntimeCore.
  • Applies to the Open Source engine only — Apple Intelligence does not expose token probabilities.

Open Questions

  • What is a good default threshold? Needs experimentation across models.
  • Should the threshold adapt per model, or is one global value sufficient?
  • Should suppression count toward a "model is struggling" heuristic that backs off debounce timing?
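
The first two questions could be explored with something like the sketch below. It assumes first-token probabilities have been recorded from real completions (for example via the debug log) and measures how many suggestions each candidate threshold would suppress. The function names and the override table are hypothetical.

```python
from typing import Iterable, Mapping


def suppression_rate(first_token_probs: Iterable[float], threshold: float) -> float:
    """Fraction of recorded suggestions a given threshold would have suppressed."""
    probs = list(first_token_probs)
    if not probs:
        return 0.0
    return sum(p < threshold for p in probs) / len(probs)


def threshold_for(
    model_id: str,
    overrides: Mapping[str, float],
    global_default: float,
) -> float:
    """Per-model threshold when one is configured, otherwise the global value."""
    return overrides.get(model_id, global_default)
```

Running `suppression_rate` over the same recordings for several models would show whether one global value suppresses a similar share everywhere, or whether per-model overrides are needed.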

Parent: #13

Metadata

  • Assignees: none
  • Labels: area:runtime (llama.cpp wrapper, KV cache, sampling, downloads); enhancement (New feature or request)
  • Status: Done
  • Milestone: none
  • Relationships: none yet
  • Development: no branches or pull requests