Fix fair-use enforcement for free-tier users with exhausted credits #6084
Free users with exhausted credits now bypass the LLM classifier score requirement and escalate on violation count alone. Adds is_free_credits_exhausted() helper and synthetic classifier path that skips the expensive LLM call. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a free-tier user has exhausted monthly credits, skip Deepgram transcription entirely in the sync endpoint. Audio stays on device and can be re-synced after upgrade. Saves cloud STT cost on locked conversations that would have been inaccessible anyway. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add free_credits_dg_blocked flag to the live transcription WebSocket that gates all cloud STT sends (Deepgram, Soniox, Speechmatics, multi-channel) when a free-tier user's monthly credits are exhausted. Re-checked periodically so mid-session upgrades resume cloud STT. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
14 tests covering: escalation bypass for free exhausted users, paid user score gate preservation, is_free_credits_exhausted helper, LLM classifier skip, shorter cooldown, and synthetic event recording. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Greptile Summary
This PR fixes a gap in the fair-use enforcement system: free-tier users who exhaust their 1200 min/month credits could never be escalated, because the LLM classifier scores personal conversations at 0.10–0.35, far below the 0.7 threshold. The fix introduces a
Confidence Score: 4/5
Safe to merge after fixing the lock-ordering issue; the P1 is a cost/efficiency regression, not a data-loss or security risk. The feature logic is correct and well-tested (14 new unit tests, 121 existing passing). The P1 issue defeats the Redis deduplication optimisation and introduces unbounded extra Firestore reads per active session, which will increase costs at scale and warrants a fix before merging. The P2 notification messaging issue is not blocking but should be tracked.
backend/utils/fair_use.py: specifically the ordering of the
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Cap triggered] --> B{FAIR_USE_ENABLED?}
B -- No --> Z[No-op]
B -- Yes --> C[trigger_classifier_if_needed]
C --> D[is_free_credits_exhausted ⚠️ Firestore read BEFORE lock]
D --> E{Redis lock acquired?}
E -- No --> F[Return — cooldown active]
E -- Yes --> G{free_exhausted?}
G -- Yes --> H[Synthetic result\nusage_type=free_exhausted\ncooldown=1h]
G -- No --> I[LLM classifier\ncooldown=12h]
H --> J[escalate_enforcement]
I --> J
J --> K{passes_score_gate?}
K -- free_exhausted=True --> L[Bypass score — escalate on violation count]
K -- misuse_score >= 0.7 --> L
K -- Neither --> M[action=none]
L --> N{stage transition}
N --> O[Update Firestore + notify user]
style D fill:#ffcccc,stroke:#cc0000
@@ -513,7 +564,17 @@ async def trigger_classifier_if_needed(uid: str, triggered_caps: list, session_i
return
is_free_credits_exhausted called before Redis lock — defeats deduplication
is_free_credits_exhausted(uid), which performs a Firestore read via users_db.get_user_valid_subscription plus a call to has_transcription_credits, runs unconditionally before the Redis lock is checked. The point of the Redis lock is to rate-limit expensive operations: pay the DB cost at most once per cooldown window.
With this ordering, every call to trigger_classifier_if_needed (fired every FAIR_USE_CHECK_INTERVAL_SECONDS = 5 min when caps are triggered) performs the DB read regardless of whether the lock is held. For a free-exhausted user in an active session under the 1-hour cooldown, that is 12 reads per hour (one every 5 minutes) instead of one, so about 11 Firestore reads per hour are wasted, which directly reverses the cost-saving intent.
The fix is to move the free_exhausted check to after the lock is acquired. If the lock is not acquired, exit immediately at no cost as before. Once the lock is held, call is_free_credits_exhausted to choose the synthetic vs. LLM path, and optionally use redis_client.expire(lock_key, 3600) to shorten the TTL for the free-exhausted case.
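The cost difference between the two orderings can be checked with a small simulation. This is a sketch, not the PR's code: FakeRedis is a hypothetical in-memory stand-in for SET NX EX and EXPIRE, FakeSubscriptionDb stands in for the Firestore-backed is_free_credits_exhausted, and the lock key name is invented. The timings (5-minute cap triggers, 12h default cooldown, 1h free-exhausted cooldown) are the ones described above.

```python
from dataclasses import dataclass, field

DEFAULT_COOLDOWN_S = 12 * 3600        # LLM classifier path (12h)
FREE_EXHAUSTED_COOLDOWN_S = 3600      # synthetic free-exhausted path (1h)
CHECK_INTERVAL_S = 5 * 60             # FAIR_USE_CHECK_INTERVAL_SECONDS


@dataclass
class FakeRedis:
    """Hypothetical stand-in for SET NX EX and EXPIRE, with explicit time."""
    expiry: dict = field(default_factory=dict)  # key -> absolute expiry (s)

    def set_nx_ex(self, key: str, ttl: int, now: int) -> bool:
        exp = self.expiry.get(key)
        if exp is not None and exp > now:
            return False  # lock still held: cooldown active
        self.expiry[key] = now + ttl
        return True

    def expire(self, key: str, ttl: int, now: int) -> None:
        if key in self.expiry:
            self.expiry[key] = now + ttl


@dataclass
class FakeSubscriptionDb:
    """Hypothetical stand-in for the Firestore-backed credit check."""
    free_exhausted: bool
    reads: int = 0

    def is_free_credits_exhausted(self, uid: str) -> bool:
        self.reads += 1  # each call models one Firestore read
        return self.free_exhausted


def trigger(uid, redis, db, now, *, lock_first):
    lock_key = f"fair_use:classifier_lock:{uid}"  # hypothetical key name
    if not lock_first:
        # Original ordering: the read is paid even when the lock rejects the run
        free_exhausted = db.is_free_credits_exhausted(uid)
    if not redis.set_nx_ex(lock_key, DEFAULT_COOLDOWN_S, now):
        return "cooldown"
    if lock_first:
        # Fixed ordering: read only once the lock is held
        free_exhausted = db.is_free_credits_exhausted(uid)
    if free_exhausted:
        redis.expire(lock_key, FREE_EXHAUSTED_COOLDOWN_S, now)  # shorten 12h -> 1h
        return "synthetic"
    return "llm"


def reads_per_hour(lock_first: bool) -> int:
    redis, db = FakeRedis(), FakeSubscriptionDb(free_exhausted=True)
    for now in range(0, 3600, CHECK_INTERVAL_S):  # 12 cap triggers in one hour
        trigger("u1", redis, db, now, lock_first=lock_first)
    return db.reads


print(reads_per_hour(lock_first=False), reads_per_hour(lock_first=True))  # 12 1
```

Against a real redis-py client the two lock operations would map to redis_client.set(lock_key, "1", nx=True, ex=ttl) and redis_client.expire(lock_key, ttl); keeping the read behind the lock is what makes a rejected call free.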
…orcement Reviewer feedback: free_credits_exhausted in classifier_result dict is privilege-bearing. Now derived from is_free_credits_exhausted(uid) call instead of trusting the dict payload. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reviewer feedback: deepgram_profile_socket.send(), soniox_profile_socket.send(), and send_initial_file_path() were not gated by free_credits_dg_blocked. Free users with exhausted credits could still send audio during speech profile phase. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reviewer feedback: tests now mock is_free_credits_exhausted() directly since escalate_enforcement no longer reads free_credits_exhausted from the classifier_result dict. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review fixes applied
Addressed all 3 reviewer items:
by AI for @beastoin
…DB reads The Firestore subscription check was running before the cooldown lock, causing unnecessary reads on every 5-minute cap trigger even when the function would return immediately due to an active lock. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review iteration 2 fix
Firestore read optimization. Fix: moved the call after the lock is acquired. The lock is first set with the default 12h TTL, then, if the user is free-exhausted, shortened to 1h via
Commit:
by AI for @beastoin
The free_credits_dg_blocked recheck was gated on user_has_credits, which is refreshed later in the loop. This created a 1-tick delay (up to 5min) where audio would still be sent to DG after credits crossed zero. Now calls is_free_credits_exhausted() directly on every fair-use check interval, independent of the local credits cache timing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review iteration 3 fix
Mid-session credit exhaustion timing gap. Fix: removed the
Commit:
by AI for @beastoin
…on tests Addresses tester feedback: - Add classifier spoof negative test (classifier dict cannot bypass score gate) - Add sync credits-exhausted response structure test - Add is_free_credits_exhausted/should_lock equivalence test - Strengthen non-free classifier test with proper assertions - Fix mid-session DG gate to not depend on user_has_credits timing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR ready for merge
All checkpoints passed:
Awaiting explicit merge approval.
by AI for @beastoin
…throttle record_dg_usage_ms now accepts both restrict-stage and free-tier budgets. is_dg_budget_exhausted accepts limit_ms override for free-tier daily cap. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ribe.py Remove separate free_credits_dg_blocked flag. Free-exhausted users now use the same fair_use_dg_budget_exhausted mechanism with FAIR_USE_FREE_DAILY_DG_MS (30 min/day default), consistent with restrict-stage daily budget approach. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ted users Instead of skipping all Deepgram processing when credits exhausted, check daily DG budget (FAIR_USE_FREE_DAILY_DG_MS). Allows up to 30 min/day of cloud STT for free users, consistent with live transcription throttle. Records DG usage after segment processing for budget tracking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace TestSyncCreditsExhaustedEarlyReturn with TestSyncDgBudgetGate. Add TestDgBudgetFreeTier (limit_ms parameter, budget exhaustion checks). Add TestRecordDgUsageMsFreeTier (recording with only free budget configured). 23 tests total. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…exhausted_restrict (#6083) Free-exhausted users now go directly to restrict stage (skipping graduated escalation and LLM classifier). Auto-clears when credits return. Removes FAIR_USE_FREE_DAILY_DG_MS — reuses FAIR_USE_RESTRICT_DAILY_DG_MS. Simplifies is_dg_budget_exhausted (no limit_ms parameter). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace dual-path DG budget logic with ensure_free_exhausted_restrict(). Remove all FAIR_USE_FREE_DAILY_DG_MS and is_free_credits_exhausted refs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use ensure_free_exhausted_restrict() instead of dual-path DG budget logic. Remove is_free_credits_exhausted and FAIR_USE_FREE_DAILY_DG_MS imports. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests cover: ensure_free_exhausted_restrict (set/auto-clear/noop), classifier skip for free-exhausted, escalation score gate always required, simplified DG budget (no limit_ms), record_dg_usage_ms guards. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
200 could confuse the mobile app into thinking the sync succeeded. 429 (Too Many Requests) correctly signals a rate/budget limit hit, and the app handles non-200/207 gracefully by stopping the batch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…usted (#6083) - ensure_free_exhausted_restrict: respect FAIR_USE_KILL_SWITCH and FAIR_USE_EXEMPT_UIDS - is_hard_restricted: skip restrict_reason='free_exhausted' (DG budget only, not hard block) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…6083) - Only record DG usage when user is in restrict stage (not unrestricted users) - Derive credits_exhausted from real subscription state (should_lock) not hardcoded True Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…6083) Tests: kill-switch returns none, exempt UID returns none, free-exhausted restrict not hard-blocked, abuse-restrict still hard-blocked. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…#6083) Add fair_use_track_dg_usage flag set only when user is in restrict stage. Prevents unrestricted users from pre-consuming the daily DG budget before becoming restricted later the same UTC day. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-charge the request's speech duration to the DG budget before processing segments. If the pre-charge busts the budget, return 429 immediately. Prevents arbitrarily long uploads from overshooting the daily cap. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…eeded (#6083) Check lock first (cheap Redis op), then do Firestore read for free-exhausted. Prevents expensive Firestore/subscription reads when lock rejects the run. Release lock after early return for free-exhausted path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…6083) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cycle (#6083) Save previous_stage when setting restrict for free-exhausted. Restore to previous_stage (not 'none') when credits return, so abuse escalation (warning/throttle) is not wiped by credit exhaustion cycle. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…6083) Fix inlined import rule violation: import subscription helpers at module top level instead of lazy getter pattern. Keep classify_user_purpose and send_notification lazy due to import-time side effects. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Patch subscription helpers on fair_use_mod directly instead of _subscription_mod, matching the new top-level import binding. Remove stale _has_transcription_credits/_is_paid_plan resets. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ct (#6083) Remove ensure_free_exhausted_restrict() and the separate free-exhausted path. Free users who hit monthly limits now get a synthetic classifier score of 1.0 in trigger_classifier_if_needed, following the same graduated escalation (none → warning → throttle → restrict) as abuse detection. Remove restrict_reason/previous_stage fields and is_hard_restricted free_exhausted bypass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
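The combination of a synthetic score and the unchanged score gate can be sketched as follows. This is a deliberately simplified model, not escalate_enforcement itself: it assumes every result that clears the 0.7 gate advances the user exactly one stage, whereas the real function may also weigh violation counts and cooldowns. The free_exhausted flag is passed in from a server-side check rather than read from the classifier payload, in line with the spoofing fix earlier in this PR. The 0.35 fake LLM score mirrors the top of the range the summary reports for personal conversations.

```python
STAGES = ("none", "warning", "throttle", "restrict")
MISUSE_SCORE_THRESHOLD = 0.7  # score gate described in this PR


def classifier_result_for(free_exhausted: bool, run_llm) -> dict:
    """Pick the synthetic path or the LLM path; free_exhausted comes from the server."""
    if free_exhausted:
        # Synthetic result: skips the LLM call entirely
        return {"misuse_score": 1.0, "usage_type": "free_exhausted"}
    return run_llm()


def next_stage(current: str, result: dict) -> str:
    """Simplified escalation: one stage per result that clears the gate."""
    if result["misuse_score"] < MISUSE_SCORE_THRESHOLD:
        return current
    return STAGES[min(STAGES.index(current) + 1, len(STAGES) - 1)]


def run_events(free_exhausted: bool, n: int, llm_score: float = 0.35):
    """Feed n cap triggers through the pipeline; return stage history and LLM call count."""
    llm_calls = 0

    def fake_llm() -> dict:
        nonlocal llm_calls
        llm_calls += 1
        return {"misuse_score": llm_score, "usage_type": "personal"}  # hypothetical payload

    stage, history = "none", []
    for _ in range(n):
        stage = next_stage(stage, classifier_result_for(free_exhausted, fake_llm))
        history.append(stage)
    return history, llm_calls


print(run_events(free_exhausted=True, n=4))
# (['warning', 'throttle', 'restrict', 'restrict'], 0)
print(run_events(free_exhausted=False, n=3))
# (['none', 'none', 'none'], 3)
```

Because the synthetic score flows through the same gate as a real one, no bypass flag is needed in the escalation step, and a paid user with a low LLM score is never escalated.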
Replace ensure_free_exhausted_restrict with get_enforcement_stage since free-exhausted users now go through normal graduated escalation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace ensure_free_exhausted_restrict with get_enforcement_stage since free-exhausted users now go through normal graduated escalation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace ensure_free_exhausted_restrict tests with tests verifying synthetic score 1.0 flows through normal escalation pipeline. Test full graduation: none→warning, warning→throttle, throttle→restrict. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…#6083) Eliminate all in-function imports per CLAUDE.md rule. Tests stub these modules via sys.modules.setdefault before importing fair_use. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move record_dg_usage_ms after successful processing to prevent charging budget on rejected/retried requests. Check exhaustion first, process, then record usage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
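The check, process, record ordering can be sketched with an in-memory budget keyed by user and UTC day. Assumptions: DgBudget and handle_sync are hypothetical names, the dict stands in for whatever Redis keys the backend actually uses, and only restrict-stage users are charged, per the earlier commits. The 1800000 ms default is the value given in the deployment notes of this PR.

```python
from datetime import datetime, timezone

FAIR_USE_RESTRICT_DAILY_DG_MS = 1_800_000  # 30 min/day, per the deployment notes


class DgBudget:
    """Hypothetical in-memory stand-in for a per-user, per-UTC-day usage counter."""

    def __init__(self, limit_ms: int = FAIR_USE_RESTRICT_DAILY_DG_MS):
        self.limit_ms = limit_ms
        self.used_ms: dict[tuple[str, str], int] = {}

    def _key(self, uid: str, now: datetime) -> tuple[str, str]:
        return uid, now.astimezone(timezone.utc).date().isoformat()

    def is_exhausted(self, uid: str, now: datetime) -> bool:
        return self.used_ms.get(self._key(uid, now), 0) >= self.limit_ms

    def record(self, uid: str, ms: int, now: datetime) -> None:
        key = self._key(uid, now)
        self.used_ms[key] = self.used_ms.get(key, 0) + ms


def handle_sync(uid, stage, speech_ms, budget, now, process) -> int:
    restricted = stage == "restrict"
    # 1. Check first: an exhausted restrict-stage user gets 429, not a misleading 200
    if restricted and budget.is_exhausted(uid, now):
        return 429
    # 2. Process; if this raises, nothing has been charged yet
    process()
    # 3. Record only after success, and only for restrict-stage users
    if restricted:
        budget.record(uid, speech_ms, now)
    return 200


budget = DgBudget()
noon = datetime(2025, 1, 1, 12, tzinfo=timezone.utc)
codes = [handle_sync("u1", "restrict", 1_200_000, budget, noon, lambda: None) for _ in range(3)]
print(codes)  # [200, 200, 429]
```

The trade-off is visible in the usage example: because usage is recorded after processing, the request that crosses the cap still completes (2,400,000 ms used against a 1,800,000 ms budget). The earlier pre-charge commit avoided that overshoot, at the cost of charging rejected or retried requests.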
When classifier fires (soft cap triggered), immediately enable DG tracking so usage during the escalation window is captured. Fixes budget-accounting gap where mid-session restrict escalation wouldn't start tracking until next 5-min poll. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add stubs for utils.llm.fair_use_classifier and utils.notifications. Patch classify_user_purpose and send_notification directly on fair_use module instead of through getter functions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tern (#6083) fair_use_classifier.py constructs ChatOpenAI at import time, which raises openai.OpenAIError without OPENAI_API_KEY. Restore deferred getters (_get_classify_user_purpose, _get_send_notification) so the import only happens when the function is actually called in async runtime paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Patch _get_classify_user_purpose and _get_send_notification instead of the direct function names, matching the deferred import pattern. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Patch _get_classify_user_purpose instead of setting classify_user_purpose directly, matching the deferred import pattern in fair_use.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Test is_free_credits_exhausted fail-open on DB exception - Test synthetic score payload stored in fair-use event - Reset create_fair_use_event mock in setup_method for test isolation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
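Fail-open here means that if the subscription lookup raises, the user is treated as not exhausted, so a Firestore outage never triggers enforcement. A minimal sketch, assuming the lookups are injected as callables: get_subscription, is_paid_plan and has_transcription_credits are hypothetical parameter names loosely modelled on the helpers mentioned in this PR, and the real signatures may differ. The log message matches the line the deployment notes ask operators to look for.

```python
import logging

logger = logging.getLogger("fair_use")


def is_free_credits_exhausted(uid, get_subscription, is_paid_plan, has_transcription_credits) -> bool:
    """True only for a free-tier user with no transcription credits left; fails open."""
    try:
        subscription = get_subscription(uid)
        if is_paid_plan(subscription):
            return False  # paid users keep the normal classifier path
        return not has_transcription_credits(uid)
    except Exception:
        # Fail open: a DB error must never escalate enforcement
        logger.warning("fair_use: error checking free credits uid=%s", uid, exc_info=True)
        return False
```

Failing open trades a missed enforcement cycle for safety: the next cap trigger simply retries the check.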
CP9 Changed-Path Coverage Checklist
L1 Synthesis
Changed paths P1-P3 proven via direct function execution with mocked dependencies (import + call + assert). P4-P6 proven via source structure verification and unit test coverage (147 tests across 7 files). Non-happy-path behavior proven for P1 (exception fail-open), P2 (LLM not called), P4 (budget exhaustion). P4-P6 cannot be integration-tested at L1 because
by AI for @beastoin
CP9B: L2 Integration Test Evidence
Real Redis integration tests (24 + 46 tests)
Updated checklist: L2 results
L2 Synthesis
P1-P3 proven end-to-end with real Redis (70 integration tests pass). P4-P6 cannot be integration-tested at L2 because
by AI for @beastoin
lgtm
Summary
Fixes #6083: Free-tier users who exhaust their 1200 min/month credits now follow the same graduated enforcement pipeline as abuse-detected users.
Architecture: In trigger_classifier_if_needed, free-exhausted users get a synthetic score of 1.0 (instead of the LLM classifier) and proceed through the normal none → warning → throttle → restrict escalation path. There is no separate enforcement path: everything flows through the existing escalate_enforcement function.
Changes:
- utils/fair_use.py: Add is_free_credits_exhausted() check → synthetic score 1.0 for free-exhausted users, skipping the LLM classifier. Uses the deferred import pattern for classify_user_purpose (ChatOpenAI is constructed at import time) and send_notification (Firebase init required).
- routers/sync.py: Check the DG budget before processing and record usage after. Return 429 when the budget is exhausted.
- routers/transcribe.py: Start DG tracking proactively when soft caps trigger (before the classifier result returns).
Deployment Steps
Deploy backend to production:
Verify deployment:
kubectl rollout status deployment/backend -n prod
kubectl logs -l app=backend -n prod --tail=50
Enable fair-use for free-tier (if not already enabled):
- FAIR_USE_ENABLED=true must be set on the backend deployment
- FAIR_USE_RESTRICT_DAILY_DG_MS=1800000 (30 min daily DG budget for restrict-stage users)
Post-deploy verification:
- fair_use: free-exhausted entries confirming the synthetic score path is active
- fair_use: error checking free credits log lines
Rollback (if needed):
- FAIR_USE_ENABLED=false to disable all fair-use enforcement
- FAIR_USE_KILL_SWITCH=true as an emergency kill switch
Test plan
Closes #6083
🤖 Generated with Claude Code
by AI for @beastoin