feat(dictionary): graduated trust for non-dictionary learned words (#39) #64
Merged
A learned word in no real dictionary (main/contacts/apps/personal) could out-rank a real dictionary word with better geometry after a single misfire. Now, as the LAST ranking step in getSuggestionResults (after session boost, so it can't be undone), an uncurated USER_HISTORY candidate that still outscores the best real-dictionary candidate is CAPPED just below it until its user-history frequency crosses a confirmation threshold (~3 repetitions). This guarantees a one-off junk word can't hijack a real word regardless of native score magnitude, while a deliberately repeated new word still learns and keeps full score. When no real candidate exists, new words are left untouched (still offerable).

- Capping (not score-scaling) avoids any dependence on native score calibration or sign; applied post-session-boost so the boost can't re-promote junk.
- Decision is a pure companion helper (shouldPenalizeUnconfirmedWord), unit-tested; uncurated check reuses isInNonHistoryDictionary; threshold is a tunable constant.
- Gated by new pref PREF_GRADUATED_TRUST (default on).

The actual ranking effect needs the native scorer, so the threshold and the 'real candidate' set want on-device playtesting.
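The capping step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Candidate` and `GraduatedTrustCap` types, their fields, and the constant name are hypothetical stand-ins, and a "real" candidate is assumed to be any candidate that is not an uncurated history word. A frequency at or above the threshold is assumed to count as confirmed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a suggestion candidate; not the project's real type. */
final class Candidate {
    final String word;
    final int score;
    final boolean uncuratedHistory; // from USER_HISTORY and in no real dictionary
    final int historyFrequency;     // user-history frequency of the word

    Candidate(String word, int score, boolean uncuratedHistory, int historyFrequency) {
        this.word = word;
        this.score = score;
        this.uncuratedHistory = uncuratedHistory;
        this.historyFrequency = historyFrequency;
    }
}

final class GraduatedTrustCap {
    /** The PR quotes 120 (about 3 uses); the constant name is made up here. */
    static final int CONFIRMATION_THRESHOLD = 120;

    /** Returns a re-ranked copy; meant to run after every other score adjustment. */
    static List<Candidate> capUnconfirmed(List<Candidate> ranked) {
        // Find the best score among candidates backed by a real dictionary.
        boolean hasReal = false;
        int bestReal = Integer.MIN_VALUE;
        for (Candidate c : ranked) {
            if (!c.uncuratedHistory) {
                hasReal = true;
                bestReal = Math.max(bestReal, c.score);
            }
        }

        List<Candidate> out = new ArrayList<>(ranked.size());
        for (Candidate c : ranked) {
            boolean unconfirmed = c.uncuratedHistory
                    && c.historyFrequency < CONFIRMATION_THRESHOLD;
            if (hasReal && unconfirmed && c.score > bestReal) {
                // Cap rather than scale: the outcome does not depend on score
                // magnitude or sign (assumes scores stay above Integer.MIN_VALUE).
                out.add(new Candidate(c.word, bestReal - 1, true, c.historyFrequency));
            } else {
                out.add(c); // confirmed, real, or no real candidate to protect
            }
        }
        // List.sort is stable, so equal scores keep their original order.
        out.sort(Comparator.comparingInt((Candidate c) -> c.score).reversed());
        return out;
    }
}
```

With a one-off junk word scored 900 and a real word scored 800, the junk word ends up at 799 and the real word ranks first. The same junk word with a frequency of 120 or more keeps its 900.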
Implements #39 (C4-smart) — completes the C4 dictionary epic (#18) on top of the #59 flag/Add/Block UI.
Problem
A learned word in no real dictionary (main/contacts/apps/personal) could out-rank a real dictionary word with better geometry after a single misfire — the "junk hijacks a real word" bug, worst in glide mode.
Fix (graduated trust)
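As the last ranking step in getSuggestionResults (after session boost, so it can't be undone): an uncurated USER_HISTORY candidate that still outscores the best real-dictionary candidate is capped just below it, until its user-history frequency crosses a confirmation threshold (~3 repetitions). So:
- a one-off junk word can't hijack a real word, regardless of native score magnitude;
- a deliberately repeated new word still learns and keeps full score;
- when no real candidate exists, new words are left untouched (still offerable).
Details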
- Decision is a pure helper, shouldPenalizeUnconfirmedWord(uncurated, freq), unit-tested; uncurated reuses isInNonHistoryDictionary (from fix(dictionary): stop deleted/junk words from resurrecting (incl. via swipe) #43); threshold (120 ≈ 3 uses) is a documented tunable constant.
- The lookups (isInNonHistoryDictionary/getFrequency) run only for history candidates that actually outscore a real word: a small subset, gated behind cheap checks.
- Gated by PREF_GRADUATED_TRUST (default on, Settings → Dictionaries).
Verification
:app:testOfflineRunTestsUnitTest = 229 tests, 4 failed (all the Windows-only ParserTest set; pass on Linux CI): zero new failures; new graduatedTrust test passes.
Base
dev, leaving for review + merge.
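For reviewers, the decision helper and the lookup gating listed under Details might look something like the sketch below. Only the name shouldPenalizeUnconfirmedWord(uncurated, freq) and the value 120 come from this PR. The class name, the constant name, the `shouldCap` wrapper, and its supplier parameters are hypothetical, and treating a frequency of exactly 120 as confirmed is an assumption.

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

final class GraduatedTrustDecision {
    /** 120 is the value quoted in the PR (about 3 uses); the name is hypothetical. */
    static final int CONFIRMATION_THRESHOLD = 120;

    /** Pure decision: penalize only uncurated words that are not yet confirmed. */
    static boolean shouldPenalizeUnconfirmedWord(boolean uncurated, int freq) {
        return uncurated && freq < CONFIRMATION_THRESHOLD;
    }

    /**
     * Hypothetical wrapper showing the gating order: cheap checks first, then
     * the dictionary lookups, which are passed as suppliers so they run lazily.
     */
    static boolean shouldCap(boolean prefEnabled, boolean isHistoryCandidate,
                             boolean outscoresBestReal,
                             BooleanSupplier uncuratedLookup,
                             IntSupplier frequencyLookup) {
        if (!prefEnabled || !isHistoryCandidate || !outscoresBestReal) {
            return false; // decided without touching any dictionary
        }
        if (!uncuratedLookup.getAsBoolean()) {
            return false; // word is in a real dictionary; skip the frequency lookup
        }
        return shouldPenalizeUnconfirmedWord(true, frequencyLookup.getAsInt());
    }
}
```

Keeping the decision free of dictionary access is what makes it unit-testable without the native scorer. The suppliers stand in for isInNonHistoryDictionary and getFrequency, whose real signatures are not shown in this PR text.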