Log first-token top-1 probability distribution for confidence-gate threshold tuning #98

## Problem

#81 (PR #88) ships first-token confidence suppression with a default threshold of `0.10`. That number is a guess: we have no field data on what the first-token top-1 probability (softmax over the raw logits) looks like on real prompts, so there is nothing to anchor it.

## Proposal

Add a debug-only mode (settings toggle or env var, behind `#if DEBUG` or a hidden defaults key) that emits one log line per generation with the first-token top-1 probability and the token string, regardless of whether suppression fires.
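A minimal sketch of what the per-generation log line could contain, assuming the sampler exposes the raw first-token logits as a list of floats along with a vocabulary lookup. The function names, the `FIRST_TOKEN_DEBUG` environment variable, the JSON format, and the "suppress when below threshold" direction are all assumptions for illustration. Python is used only to show the shape of the computation; the app would do the equivalent in its own language.

```python
import json
import math
import os


def top1_probability(logits):
    """Return (index, probability) of the most likely token under softmax."""
    if not logits:
        raise ValueError("logits must be non-empty")
    peak = max(logits)
    # Subtract the max before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    best = max(range(len(logits)), key=lambda i: logits[i])
    return best, exps[best] / total


def first_token_log_line(logits, vocab, threshold=0.10):
    """Format one log line; built whether or not suppression would fire."""
    index, prob = top1_probability(logits)
    record = {
        "event": "first_token_top1",
        "token": vocab[index],
        "p": round(prob, 6),
        # Assumption: the gate suppresses when p falls below the threshold.
        "would_suppress": prob < threshold,
    }
    return json.dumps(record)


def debug_logging_enabled(env=os.environ):
    # Hypothetical toggle name; the real gate could be a build flag
    # or a hidden defaults key instead.
    return env.get("FIRST_TOKEN_DEBUG") == "1"
```

Logging the token string next to the probability makes it possible to tell afterwards whether a low-confidence first token was actually junk or just one of several reasonable openings.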

Then dogfood for a session, plot a histogram of the logged probabilities, and pick the threshold where the long tail of "junk continuations" starts.
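The analysis pass could be as small as the sketch below. It assumes each generation produced one JSON log line with an `event` field of `first_token_top1` and a probability field `p`; that format is hypothetical, as are the function names. It bins the probabilities and reports what share of generations a candidate threshold would have suppressed. Deciding which low-probability tokens count as junk still has to be done by reading the logged tokens.

```python
import json


def read_probabilities(lines):
    """Extract the top-1 probability from each matching JSON log line."""
    probs = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not one of our log lines
        if isinstance(record, dict) and record.get("event") == "first_token_top1":
            probs.append(float(record["p"]))
    return probs


def histogram(probs, bins=20):
    """Count probabilities into equal-width bins over [0, 1]."""
    counts = [0] * bins
    for p in probs:
        # p == 1.0 falls into the last bin rather than overflowing.
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


def fraction_below(probs, threshold):
    """Share of generations with top-1 probability under the threshold."""
    if not probs:
        return 0.0
    return sum(p < threshold for p in probs) / len(probs)
```

Calling `fraction_below` for a few candidates (for example 0.05, 0.10, 0.20) alongside the histogram gives a quick view of how aggressive each threshold would have been over the session.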

## Why it matters

Right now the gate is opt-in with a guessed threshold; even users who turn it on won't know whether `0.10` is too lenient or too strict. We need one short telemetry pass to set a defensible default before considering making the gate default-on.

## Acceptance

- Debug-only logging path that emits top-1 probability + token for every first-token sample.
- A short writeup of the observed distribution in the PR description or a follow-up comment on #81.
- Updated default threshold (or a justified "keep at 0.10") committed as a follow-up.
