fix(vocab-quiz): tighten rotation curve for fresh words (#191) #198
Merged
davidortinau merged 5 commits into main on May 3, 2026
Stream B Step 1 (Jayne). Adds 4 integration tests that pin down the expected post-state of VocabularyProgress after well-defined quiz interactions, run against a real EF Core + in-memory SQLite stack via PlanGenerationTestFixture (same pattern as MasteryAlgorithmIntegrationTests).

#189 (attempt counting / accuracy):
- Repro189_SingleCorrectRecognitionAttempt_ProducesExpectedPanelState: PASS
- Repro189_SingleCorrectRecognition_LegacyProductionFieldsRemainZero: PASS

Both pass on main, which proves the ProgressService math is correct. Captain's '2 production attempts / 50% accuracy' panel reading therefore points at the UI panel reading legacy/wrong fields or a duplicate-call path. The fix belongs in Stream A (Kaylee), not the service. The tests stay as regression guards for the service contract.

#191 (latter rounds rapidly empty):
- Repro191_NewWord_AllCorrect_DoesNotRotateOutBeforeFifthTurn: FAIL on main
- Repro191_CharacterizeCurrentBehavior_FreshWordRotatesAtTurnN: PASS (snapshot)

Captured failure: a brand-new word receiving 4 all-correct answers (3 MC followed by 1 Text, which is the mode the quiz auto-selects once CurrentStreak >= 3) flips ReadyToRotateOut=True at turn 4. VocabularyQuizItem Tier 2 (mastery>=0.50 OR streak>=3, plus only SessionCorrectCount>=2 and SessionTextCorrect>=1) is the trigger. This is the over-aggressive rotation #191 describes. The test will pass after Wash tightens the Tier 2 gates.

No production code changes.
Closes #191.

Fresh words were rotating out of quiz rounds at turn 4 with all-correct answers, yielding only ~3 effective practice repetitions before the word disappeared. Two knobs are tuned to push the earliest legal rotation to turn 5 without regressing already-known words.

Production changes (2 lines):

1. VocabularyProgressService.cs: EFFECTIVE_STREAK_DIVISOR 7.0f -> 12.0f
   Slows the mastery climb so MasteryScore reaches Tier 1 (>= 0.80) on turn 8+ rather than turn 6, and crosses the 0.50 promotion floor on turn 6 rather than turn 4.
2. VocabularyQuizItem.cs: Tier 2 trigger OR -> AND, floor (2,1) -> (4,2)
   - Trigger: mastery >= 0.50 && streak >= 3 (was OR). Closes a corner case where a single Text correct on a fresh word could drop the word into Tier 2 via streak alone.
   - Floor: SessionCorrectCount >= 4 && SessionTextCorrect >= 2 (was >= 2 && >= 1). Requires demonstrably more session evidence before a mid-mastery word is allowed to rotate out.

Simulator: tools/quiz-rotation-sim/sim.py reproduces production math exactly. Headline (fresh, all-correct):

| Turn | Current (/7, OR/2,1) | Proposed (/12, AND/4,2) |
|------|----------------------|--------------------------|
| 4 | mastery 0.714 -> ROTATES (bug) | mastery 0.417, no |
| 5 | mastery 1.000 | mastery 0.583 -> ROTATES |

Already-known words (mastery >= 0.80, streak >= 8) still rotate at the first qualifying turn (Tier 1 unchanged).

Existing user MasteryScore data cannot regress: mastery is monotonic on correct (`max(streakScore, mastery)` in RecordAttemptAsync line 154).

Tests:
- Jayne's Repro191_NewWord_AllCorrect_DoesNotRotateOutBeforeFifthTurn flips FAIL -> PASS (PR #195 verification harness).
- ~10 mastery-math fixtures bumped to track the new divisor (5 MC + 2 Text -> 8 MC + 2 Text for IsKnown demonstrations; divisor literals /7.0f -> /12.0f).
- VocabQuizFilteringTests: Tier 2 floor test renamed and a new test Tier2_TriggerRequiresBothMasteryAndStreak added for the AND change.
- All 520 unit tests pass.
Language-tutor SLA review approved the turn-5 floor (vs turn-6) as the right balance between learner spaced-repetition load and within-session retention demonstration.

Follow-up (separate issue, not in this PR): decouple MasteryScore from SessionRotationReady so session pacing and long-term mastery tracking are independent levers.

Branched off PR #195 (Jayne's repro) so the fix lands together with its verification harness.
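The headline table in the commit message can be reproduced with a small stand-alone sketch. This is not the repository's tools/quiz-rotation-sim/sim.py, and every name in it is hypothetical. It assumes that mastery on a correct answer is `max(mastery, min(1, effective_streak / divisor))`, that an MC correct adds 1 and a Text correct adds 2 to the effective streak (weights inferred from the table values, not confirmed by the source), and that the quiz switches to Text once the streak reaches 3. Tier 1 is left out because it is unchanged and, under these assumptions, does not fire before Tier 2 for a fresh word.

```python
from dataclasses import dataclass

# Assumed per-mode weights, inferred from the table (5/7 = 0.714, 7/12 = 0.583).
MC_WEIGHT = 1.0
TEXT_WEIGHT = 2.0


@dataclass(frozen=True)
class RotationConfig:
    """Hypothetical bundle of the two knobs described in the PR."""
    divisor: float              # EFFECTIVE_STREAK_DIVISOR
    trigger_needs_both: bool    # True = AND trigger, False = OR trigger
    min_session_correct: int    # Tier 2 floor on SessionCorrectCount
    min_session_text: int       # Tier 2 floor on SessionTextCorrect


CURRENT = RotationConfig(7.0, False, 2, 1)
PROPOSED = RotationConfig(12.0, True, 4, 2)


def simulate_fresh_word(cfg, turns=6):
    """Return (turn, mastery, tier2_rotates) rows for an all-correct fresh word."""
    effective = mastery = 0.0
    streak = session_correct = session_text = 0
    rows = []
    for turn in range(1, turns + 1):
        is_text = streak >= 3  # quiz auto-selects Text once streak >= 3
        effective += TEXT_WEIGHT if is_text else MC_WEIGHT
        streak += 1
        session_correct += 1
        session_text += 1 if is_text else 0
        # Monotonic on correct: mastery never drops below its previous value.
        mastery = max(mastery, min(1.0, effective / cfg.divisor))
        mid_mastery, on_streak = mastery >= 0.50, streak >= 3
        if cfg.trigger_needs_both:
            trigger = mid_mastery and on_streak
        else:
            trigger = mid_mastery or on_streak
        floor = (session_correct >= cfg.min_session_correct
                 and session_text >= cfg.min_session_text)
        rows.append((turn, round(mastery, 3), trigger and floor))
    return rows


def first_rotation_turn(cfg, turns=6):
    """First turn at which the Tier 2 gate opens, or None within `turns`."""
    for turn, _, rotates in simulate_fresh_word(cfg, turns):
        if rotates:
            return turn
    return None
```

Under these assumptions, `first_rotation_turn(CURRENT)` gives 4 and `first_rotation_turn(PROPOSED)` gives 5, matching the table; the real simulator remains the authority on production math.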
davidortinau added a commit that referenced this pull request on May 3, 2026
- PR #196 (Stream A UI fixes): closes #189/#190/#192/#193/#194
- PR #198 (Stream B scoring fix): closes #191
- PR #195 (test-only draft): superseded, closed
- Follow-ups filed: #197 (decouple Mastery from SessionRotation), #199 (test helper DifficultyWeight bug)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes #191.
Summary
Fresh words were rotating out of vocab quiz rounds at turn 4 with all-correct answers — too fast for practice intent. This PR pushes the earliest legal rotation for a fresh word to turn 5 without regressing already-known words. Two knobs are tuned.
Production changes (2 lines)
- `VocabularyProgressService.cs:21`: `EFFECTIVE_STREAK_DIVISOR` `7.0f` → `12.0f`
- `VocabularyQuizItem.cs:33-55`: Tier 2 trigger `OR` → `AND`, floor `(2,1)` → `(4,2)`

Why two knobs. The divisor change alone slows mastery growth, but Tier 2's `OR` trigger plus its weak floor still let a fresh word slip out via a single Text correct plus streak alone. The Tier 2 tightening closes that escape hatch.
Simulator (fresh word, all-correct turns)
`tools/quiz-rotation-sim/sim.py` reproduces production math exactly. Run: `python3 tools/quiz-rotation-sim/sim.py`.
Already-known words unchanged. Mastery ≥ 0.80 + streak ≥ 8 still hits Tier 1 with a single text correct (Tier 1 logic unchanged). No spaced-repetition penalty for words already mastered.
No user-data regression. Stored `MasteryScore` cannot decrease from the divisor change because mastery is monotonic on correct: `max(streakScore, mastery)` at `VocabularyProgressService.cs:154`. Only words mid-climb grow more slowly going forward, which is the intent.
Test impact
- Jayne's repro (`Repro191_NewWord_AllCorrect_DoesNotRotateOutBeforeFifthTurn`, from PR #195, "test: failing repro for vocab quiz scoring bugs (#189, #191)"): FAIL → PASS ✅. This PR is branched off `test/vocab-quiz-scoring-repro-189-191` so the fix lands together with its verification harness.
- ~10 mastery-math fixtures bumped to track the new divisor (5 MC + 2 Text → 8 MC + 2 Text for `IsKnown` demonstrations; divisor literals `/7.0f` → `/12.0f`).
- New test `Tier2_TriggerRequiresBothMasteryAndStreak` covers the OR → AND change.
- Renamed `Tier2_MidMastery_Rotates_2CorrectWith1Text` → `Tier2_MidMastery_Rotates_4CorrectWith2Text`, plus `Tier2_MidMastery_BlockedByLowSessionCorrect` for the new floor.
Captain & SLA review
Captain approved the two-knob proposal (full markdown lives in `.squad/decisions/inbox/wash-vocab-quiz-scoring-proposal-191.md`, gitignored). Language-tutor SLA review chose the turn-5 floor over a more aggressive turn-6 floor as the right balance between within-session retention demonstration and learner spaced-repetition load.
Out of scope (separate issue)
- Decouple `MasteryScore` from `SessionRotationReady` so session pacing and long-term mastery tracking become independent levers. This is the higher-leverage architectural fix the tutor flagged, tracked in #197 ("Decouple MasteryScore from SessionRotationReady (cross-session evidence requirement)").
- … `ProgressService` are not touched (sync compat).
Manual verification before merge
Recommend a Mac Catalyst smoke per `.claude/skills/e2e-testing/references/quiz-activities.md`: load a fresh word, answer correctly several turns in a row, and confirm that rotation timing matches the simulator (rotates at turn 5, not turn 4) and that the Learning Details panel reflects the slower mastery climb.
CI note
Pre-existing Linux MAUI/wasm-tools workload install failure on `main` is unrelated; per Captain's standing order, `gh pr merge --admin` is authorized if all unit tests are green and only that workload step fails.
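Returning to the no-user-data-regression point in this description: the argument rests on the `max(streakScore, mastery)` update, and a tiny sketch makes the property concrete. The function name and the capped `effective_streak / divisor` form of the streak score are assumptions for illustration, not the service's actual code, and only the correct-answer path is modeled.

```python
def updated_mastery(stored_mastery, effective_streak, divisor):
    """Hypothetical correct-answer update: the stored value is a lower bound."""
    # Assumed form of the streak score, capped at 1.0.
    streak_score = min(1.0, effective_streak / divisor)
    # Monotonic on correct: never return less than what was already stored.
    return max(streak_score, stored_mastery)


# A word that earned 5/7 (about 0.714) under the old divisor keeps that score
# on its next correct answer, even though 7/12 (about 0.583) is lower.
stored = 5 / 7
after = updated_mastery(stored, effective_streak=7.0, divisor=12.0)
```

Here `after` equals `stored`: the larger divisor can only slow future growth, because the `max` keeps any previously stored score as a floor.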