Problem
Tabby currently shows every non-empty suggestion the model generates, regardless of how confident the model is in its output. Low-confidence suggestions tend to be low-quality (vague, repetitive, or contextually wrong), and showing them erodes user trust.
Goal
Add a configurable probability threshold that suppresses suggestions when the model's first-token confidence is too low. This gives Tabby a lightweight quality gate that works across models without prompt changes.
Proposed Scope
After sampling the first token, inspect its probability (softmax over logits).
If the top-1 probability is below a configurable threshold, abort the generation and suppress the suggestion silently.
Expose a settings toggle (default on) and a threshold slider or preset.
Log suppression events in debug mode with the token, its probability, and the threshold.
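The steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than the engine's own code: the function names, the logger name, and the way logits are passed in are all assumptions, not LlamaRuntimeCore APIs. It checks the probability of the token that was actually sampled, which coincides with the top-1 probability under greedy decoding.

```python
import logging
import math
from typing import List, Sequence

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("tabby.confidence_gate")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def first_token_passes(
    logits: Sequence[float],
    sampled_token_id: int,
    threshold: float,
    enabled: bool = True,
) -> bool:
    """Return True if generation should continue past the first token."""
    if not enabled:
        return True  # gate switched off in settings: never suppress
    probability = softmax(logits)[sampled_token_id]
    if probability < threshold:
        # Debug-only record of the token, its probability, and the threshold.
        logger.debug(
            "suppressed first token id=%d p=%.4f threshold=%.4f",
            sampled_token_id, probability, threshold,
        )
        return False  # caller aborts generation and shows no ghost text
    return True
```

A caller would run this once, right after the first token is sampled, and abort generation when it returns False. The softmax is computed only once per suggestion, so the extra cost when the gate does not fire should be small, though that would need measuring against the latency criterion.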
Acceptance Criteria
First-token probability is measurable after sampling.
Suggestions below the threshold are silently suppressed (no ghost text shown).
Threshold is configurable with a sensible default.
Debug mode logs which token was suppressed and its probability.
Settings toggle, default on.
No regression in suggestion latency when the gate does not fire.
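One way to model the toggle and threshold from the criteria above is a small immutable settings object. The class name, the preset names, and every numeric value here are placeholders chosen for illustration; the issue deliberately leaves the real default open.

```python
from dataclasses import dataclass

# Placeholder preset values, not recommendations.
PRESETS = {"lenient": 0.10, "balanced": 0.30, "strict": 0.50}


@dataclass(frozen=True)
class ConfidenceGateSettings:
    """Hypothetical settings for the first-token confidence gate."""

    enabled: bool = True                    # settings toggle, default on
    threshold: float = PRESETS["balanced"]  # placeholder default

    def __post_init__(self) -> None:
        # A threshold is a probability, so reject values outside [0, 1].
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be a probability in [0, 1]")

    @classmethod
    def from_preset(cls, name: str, enabled: bool = True) -> "ConfidenceGateSettings":
        """Build settings from a named preset instead of a raw slider value."""
        return cls(enabled=enabled, threshold=PRESETS[name])
```

Validating the range at construction keeps an out-of-range slider value from silently disabling the gate (threshold below 0) or suppressing everything (threshold above 1).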
Relationship to Other Work
#24 (Logit gating on first generated token) is a hard blocklist ("never start with these tokens"). This issue is a soft quality threshold ("don't show suggestions the model is uncertain about").
Both features use adjacent sampling infrastructure in LlamaRuntimeCore.
Applies to the Open Source engine only, because Apple Intelligence does not expose token probabilities.
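The difference between the two gates can be illustrated with a small sketch. It assumes the blocklist is applied by masking logits before sampling, which is a common approach but not confirmed as how #24 works; both function names are hypothetical.

```python
import math
from typing import List, Sequence, Set


def apply_blocklist(logits: Sequence[float], blocked_ids: Set[int]) -> List[float]:
    """Hard gate: blocked tokens get -inf so they can never be sampled."""
    return [-math.inf if i in blocked_ids else x for i, x in enumerate(logits)]


def top1_probability(logits: Sequence[float]) -> float:
    """Soft gate input: softmax probability of the most likely token.

    Assumes at least one token is not blocked.
    """
    finite = [x for x in logits if x != -math.inf]
    peak = max(finite)
    total = sum(math.exp(x - peak) for x in finite)
    return 1.0 / total  # exp(peak - peak) / total


logits = [3.0, 2.9, 0.0]
masked = apply_blocklist(logits, {0})
before = top1_probability(logits)  # two near-tied candidates: low confidence
after = top1_probability(masked)   # mass renormalized over the remaining tokens
```

If the two features are composed this way, masking changes the probabilities the threshold sees, so the order in which they run would need to be decided explicitly.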
Open Questions
What is a good default threshold? Needs experimentation across models.
Should the threshold adapt per model, or is one global value sufficient?
Should suppression count toward a "model is struggling" heuristic that backs off debounce timing?
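If suppressions did feed a back-off heuristic, one simple shape for it is an exponential delay keyed on consecutive suppressions. This is purely a sketch of the idea in the last question: the class name and the millisecond values are invented for illustration and do not reflect Tabby's actual debounce settings.

```python
from dataclasses import dataclass


@dataclass
class DebounceBackoff:
    """Hypothetical back-off: double the debounce delay per consecutive suppression."""

    base_ms: int = 150   # placeholder base debounce
    max_ms: int = 1200   # placeholder ceiling
    streak: int = 0      # consecutive suppressed suggestions

    def record(self, suppressed: bool) -> None:
        # Any suggestion that is shown resets the streak.
        self.streak = self.streak + 1 if suppressed else 0

    @property
    def delay_ms(self) -> int:
        return min(self.base_ms * 2 ** self.streak, self.max_ms)
```

For example, with these placeholder values two suppressions in a row would raise the delay from 150 ms to 600 ms, and the next shown suggestion would drop it back to 150 ms.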
Parent: #13