Log first-token top-1 probability distribution for confidence-gate threshold tuning #98

@Jam-Cai

Problem

#81 (PR #88) ships first-token confidence suppression with a default threshold of `0.10`. That number is a guess: we have no field data on how the first-token top-1 probability (softmax over the raw logits) is distributed on real prompts, so there is nothing to anchor it.

Proposal

Add a debug-only mode (settings toggle or env var, behind `#if DEBUG` or a hidden defaults key) that emits one log line per generation with the first-token top-1 probability and the token string, regardless of whether suppression fires.
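As a rough illustration of what the logged value could be, here is a minimal Python sketch (not the app's actual code). It assumes the raw logits for the first sampled position are available as a list of floats, that suppression fires when the top-1 probability falls below the threshold, and that the toggle is an environment variable. The names `CONF_GATE_DEBUG_LOG`, `top1_from_logits`, and `first_token_log_line`, as well as the log line format, are hypothetical.

```python
import math
import os


def top1_from_logits(logits):
    """Return (index, probability) of the most likely token under softmax."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    best = max(range(len(logits)), key=lambda i: logits[i])
    return best, exps[best] / total


def first_token_log_line(logits, vocab, threshold=0.10):
    """Build one log line; it is produced whether or not suppression fires."""
    idx, p = top1_from_logits(logits)
    # Assumption: the gate suppresses when top-1 probability is below threshold.
    suppressed = p < threshold
    return f"first_token top1_p={p:.4f} token={vocab[idx]!r} suppressed={suppressed}"


def maybe_log_first_token(logits, vocab, threshold=0.10):
    """Emit the line only when the (hypothetical) debug env var is set."""
    if os.environ.get("CONF_GATE_DEBUG_LOG") == "1":
        print(first_token_log_line(logits, vocab, threshold))
```

Logging the probability alongside the suppression decision keeps every sample in the data set, so the histogram is not biased toward the cases where the gate fired.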

Then dogfood for a session, plot the histogram, and pick the threshold where the long tail of "junk continuations" starts.
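The analysis step could look something like the following Python sketch. It assumes each debug log line contains a field of the form `top1_p=<float>` (a hypothetical format, not one defined in this issue), bins the probabilities into a fixed-width histogram, and reports what fraction of generations a candidate threshold would suppress. Where the "junk" tail starts is still a judgment call made by reading the histogram next to the logged tokens.

```python
import re

# Hypothetical log field; adjust to whatever the debug path actually emits.
TOP1_RE = re.compile(r"top1_p=([0-9]*\.?[0-9]+)")


def parse_probs(lines):
    """Extract top-1 probabilities from log lines, skipping unrelated lines."""
    probs = []
    for line in lines:
        match = TOP1_RE.search(line)
        if match:
            probs.append(float(match.group(1)))
    return probs


def histogram(probs, bins=10):
    """Count probabilities in equal-width bins over [0, 1]."""
    counts = [0] * bins
    for p in probs:
        # p == 1.0 falls into the last bin.
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


def suppressed_fraction(probs, threshold):
    """Fraction of samples that a given threshold would suppress."""
    if not probs:
        return 0.0
    return sum(p < threshold for p in probs) / len(probs)


def render(counts):
    """Return a plain-text bar chart, one row per bin."""
    width = 1.0 / len(counts)
    return [
        f"{i * width:.2f}-{(i + 1) * width:.2f} | {'#' * c} ({c})"
        for i, c in enumerate(counts)
    ]
```

Comparing `suppressed_fraction(probs, t)` for a few candidate values of `t` (for example 0.05, 0.10, 0.20) gives a quick sense of how aggressive each threshold would be on the dogfooded session.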

Why it matters

Right now the gate is opt-in with a guessed threshold; even users who turn it on can't tell whether 0.10 is too lenient or too strict. One short telemetry pass would give us a defensible default before we consider making the gate default-on.

Acceptance

  • Debug-only logging path that emits top-1 probability + token for every first-token sample.
  • A short writeup of the observed distribution in the PR description or a follow-up comment on #81 (Confidence-based suggestion suppression).
  • Updated default threshold (or a justified "keep at 0.10") committed as a follow-up.

Metadata

Assignees: No one assigned
Labels: area:runtime (llama.cpp wrapper, KV cache, sampling, downloads), enhancement (New feature or request)
Status: Done
Milestone: No milestone