Skip to content

[codex] Add client failure reporting and visible surfacing#27

Merged
anantham merged 4 commits intomainfrom
fix/codex-telemetry-ux
Apr 4, 2026
Merged

[codex] Add client failure reporting and visible surfacing#27
anantham merged 4 commits intomainfrom
fix/codex-telemetry-ux

Conversation

@anantham
Copy link
Copy Markdown
Owner

@anantham anantham commented Apr 2, 2026

Summary

This PR closes the immediate silent-failure gaps in the reader and lays down the first client telemetry path for production debugging. It makes known and unexpected translation failures visible to users, records the failure type at the store boundary, and adds a Vercel callback proof for best-effort client telemetry.

Root Cause

Translation failures could disappear in production because three paths were incomplete:

  • the shared trial banner visibility check was inverted, so trial users could miss the banner entirely
  • store-backed notifications were written but never rendered
  • auto-translate fired a promise without a catch path, so unexpected failures could look like a hang instead of a surfaced error

A post-review regression surfaced at the async boundary:

  • MainApp assumed handleTranslate() always returned a thenable and chained .catch() directly
  • the integration test exposed that sync or mock implementations returning undefined broke that assumption

Changes

  • MainApp.tsx: mount NotificationToast once at the app root, catch unexpected auto-translate failures, and normalize the handleTranslate() boundary through Promise.resolve().then(...)
  • components/DefaultKeyBanner.tsx: fix shared-trial banner visibility and emit visible render telemetry
  • components/NotificationToast.tsx: add the missing notification renderer
  • components/ChapterView.tsx and components/chapter/ChapterContent.tsx: carry error telemetry into the reader surface and emit ui_error_rendered
  • services/clientTelemetry.ts and types/telemetry.ts: add the v1 client telemetry contract, redacting transport, and callback/analytics routing
  • services/telemetryService.ts: forward early boot errors, uncaught exceptions, and unhandled rejections into the client telemetry path
  • services/ai/apiKeyValidation.ts, services/translationService.ts, store/slices/translationsSlice.ts, store/slices/uiSlice.ts, store/slices/chaptersSlice.ts: classify translation failures and propagate structured telemetry metadata through the store
  • api/client-telemetry.js: add the Vercel callback proof handler
  • tests/store/appScreen.integration.test.tsx: align the mocked handleTranslate() contract with the async store API
  • Rebased branch onto current main to pick up ff5d821 and 54d4279
  • tests + docs: add regression coverage and document deployed callback verification plus post-review verification notes

Testing

  • export PATH="$HOME/.nvm/versions/node/v22.17.0/bin:$PATH" && npx vitest run tests/store/appScreen.integration.test.tsx
  • export PATH="$HOME/.nvm/versions/node/v22.17.0/bin:$PATH" && npx vitest run tests/db/migrations/fresh-install.test.ts
  • export PATH="$HOME/.nvm/versions/node/v22.17.0/bin:$PATH" && npx vitest run tests/services/navigationService.test.ts
  • export PATH="$HOME/.nvm/versions/node/v22.17.0/bin:$PATH" && npx vitest run tests/components/NotificationToast.test.tsx tests/components/DefaultKeyBanner.test.tsx tests/components/chapter/ChapterContent.test.tsx tests/current-system/translation.test.ts tests/services/clientTelemetry.test.ts tests/services/api-key-validation.test.ts tests/api/client-telemetry.test.ts
  • export PATH="$HOME/.nvm/versions/node/v22.17.0/bin:$PATH" && npx vitest run tests/smoke/critical-components.smoke.test.tsx
  • export PATH="$HOME/.nvm/versions/node/v22.17.0/bin:$PATH" && npm run build
  • export PATH="$HOME/.nvm/versions/node/v22.17.0/bin:$PATH" && npx vercel build
  • export PATH="$HOME/.nvm/versions/node/v22.17.0/bin:$PATH" && npx vercel curl /api/client-telemetry --deployment https://codex-telemetry-fa9kmjyhm-adityas-projects-9c03351d.vercel.app
  • export PATH="$HOME/.nvm/versions/node/v22.17.0/bin:$PATH" && npx vercel curl /api/client-telemetry --deployment https://codex-telemetry-fa9kmjyhm-adityas-projects-9c03351d.vercel.app -- --request POST --header 'content-type: application/json' --data '{"event_type":"translation_failed"}'
  • Mixed-suite note: the large combined Vitest command still produced timeout-shaped failures in appScreen.integration and the smoke import, but both files passed immediately in isolated reruns after rebase. fresh-install and navigationService were green in the combined run, confirming the original branch-skew failures are resolved.

Review Checklist

  • Non-trivial change stays on feature branch fix/codex-telemetry-ux
  • Diff is scoped to the telemetry/visibility slice plus the post-review async-boundary fix
  • No vercel.json rewrite change was made because the deployed proof showed /api/client-telemetry is not shadowed by the SPA rewrite
  • Branch is rebased onto current main
  • Ready for Codex review

@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
lexicon-forge Ready Ready Preview, Comment Apr 4, 2026 8:25pm

@anantham
Copy link
Copy Markdown
Owner Author

anantham commented Apr 3, 2026

Code review

Note: This PR is still a draft. Review based on current state.

Found 3 issues:

  1. translationsSlice.ts is now 1,168 LOC after adding the emitTranslationFailure telemetry helper (lines 85-157). This telemetry emission concern has a different change cadence from translation orchestration and meets the split criteria in CLAUDE.md: "The file has more than one reason to change" and all four friction signals are present (high surprise radius, bad locality, poor findability, weak testability). Consider extracting to services/translationTelemetry.ts. (CLAUDE.md says ">1000 LOC: Friction is likely -- investigate and propose a split if confirmed")

const emitTranslationFailure = ({
chapterId,
origin,
provider,
model,
failureType,
errorMessage,
expected,
}: {
chapterId: string;
origin: TranslationOrigin;
provider: AppSettings['provider'];
model: string;
failureType: Extract<
TelemetryFailureType,
'trial_limit' | 'missing_api_key' | 'timeout' | 'provider_malformed_response' | 'unknown'
>;
errorMessage: string;
expected: boolean;
}): TelemetryErrorContext => {
const sourceEventType = failureType === 'trial_limit' ? 'known_limit_reached' : 'translation_failed';
clientTelemetry.emit({
eventType: sourceEventType,
failureType,
surface: origin,
severity: sourceEventType === 'known_limit_reached' || expected ? 'warning' : 'error',
expected,
userVisible: null,
provider,
model,
chapterId,
errorMessage,
dedupeCallback: !expected,
});
return {
sourceEventType,
failureType,
surface: origin,
expected,
provider,
model,
chapterId,
};
};
export const createTranslationsSlice: StateCreator<
any,
[],
[],
TranslationsSlice
> = (set, get) => ({
// Initial state
activeTranslations: {},
pendingTranslations: new Set(),
feedbackHistory: {},
amendmentProposals: [],
translationProgress: {},
// Translation operations
handleTranslate: async (chapterId, origin = 'manual_translate') => {
const state = get();
if (state.pendingTranslations.has(chapterId)) {
debugLog('translation', 'summary', '[Retranslate] Already translating, ignoring click', { chapterId });
return;
}
const context: TranslationContext = {
chapters: (state as any).chapters || new Map(),
settings: (state as any).settings,
activePromptTemplate: (state as any).activePromptTemplate
};

  1. getBuildId() falls back to VERCEL_URL (a deployment URL like my-app-abc123.vercel.app) as a build identifier. This produces misleading telemetry -- every deployment gets a different "build_id" that looks like a URL, not a commit SHA. The fallback should be null, not a deployment URL.

const getBuildId = (): string | null => {
const env = getRuntimeEnv();
return env.VERCEL_GIT_COMMIT_SHA || env.VITE_APP_BUILD_ID || env.VERCEL_URL || null;
};

  1. In the auto-translate .catch() block, requestedRef.current.delete(currentChapterId) removes the deduplication guard that prevents re-triggering. On persistent failures (bad API key, network down), the next render cycle will re-trigger auto-translate since the guard was deleted, creating a retry loop. The pendingTranslations check in handleTranslate provides a secondary guard, but the ref deletion defeats the primary intent. Consider keeping the ref intact on error so the chapter is not re-requested with the same settings.

LexiconForge/MainApp.tsx

Lines 219 to 232 in 7900adb

.catch((error) => {
console.error('[AutoTranslate] Unexpected translation failure:', error);
requestedRef.current.delete(currentChapterId);
const store = useAppStore.getState();
const message = error instanceof Error ? error.message : 'Unexpected translation failure';
clientTelemetry.emit({
eventType: 'translation_failed',
failureType: 'unknown',
surface: 'auto_translate',
severity: 'error',
expected: false,
userVisible: null,
provider: settings.provider,
model: settings.model,

Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

Copy link
Copy Markdown
Owner Author

anantham commented Apr 3, 2026

Follow-up pushed in 44de755 to address the two concrete review items only:

  • services/clientTelemetry.ts: build_id no longer falls back to VERCEL_URL; it now resolves to VERCEL_GIT_COMMIT_SHA, VITE_APP_BUILD_ID, or null
  • MainApp.tsx: the auto-translate catch path no longer clears requestedRef, so the same chapter/settings pair is not auto-requested again after an unexpected failure
  • Added focused regression coverage in tests/services/clientTelemetry.test.ts and tests/store/appScreen.integration.test.tsx

The translationsSlice.ts size/splitting concern is valid, but I left that out of this PR to keep the fix scoped to the concrete behavioral bugs.

anantham added 4 commits April 4, 2026 16:23
Context:
- translation failures could disappear in production because banner visibility, notification rendering, and fire-and-forget auto-translate handling were incomplete
- the Vercel callback path needed proof before expanding client telemetry coverage

Changes:
- fix shared trial banner visibility and add a notification toast renderer with a single app-level mount
- add the v1 client telemetry contract, redacting transport, global error forwarding, and classified translation failure emission
- add a Vercel /api/client-telemetry proof handler with deployment notes and route verification
- add regression and telemetry coverage for banner visibility, inline error rendering, callback validation, and translation classification

Impact:
- known and unexpected client failures are visible and measurable instead of failing silently
- unexpected auto-translate promise failures now surface in UI and telemetry
- the current Vite catch-all rewrite is proven not to shadow /api/client-telemetry

Tests:
- export PATH="/Users/aditya/.nvm/versions/node/v22.17.0/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/opt/homebrew/bin:/Users/aditya/.codex/tmp/arg0/codex-arg0pDiON6:/Users/aditya/.antigravity/antigravity/bin:/Users/aditya/bin:/Users/aditya/.claude/bin:/Applications/Sublime Text.app/Contents/SharedSupport/bin:/Users/aditya/.local/bin:/Users/aditya/.nvm/versions/node/v22.17.0/bin:/Applications/Codex.app/Contents/Resources" && npx vitest run tests/components/NotificationToast.test.tsx tests/components/DefaultKeyBanner.test.tsx tests/components/chapter/ChapterContent.test.tsx tests/current-system/translation.test.ts tests/services/clientTelemetry.test.ts tests/services/api-key-validation.test.ts
- export PATH="/Users/aditya/.nvm/versions/node/v22.17.0/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/opt/homebrew/bin:/Users/aditya/.codex/tmp/arg0/codex-arg0pDiON6:/Users/aditya/.antigravity/antigravity/bin:/Users/aditya/bin:/Users/aditya/.claude/bin:/Applications/Sublime Text.app/Contents/SharedSupport/bin:/Users/aditya/.local/bin:/Users/aditya/.nvm/versions/node/v22.17.0/bin:/Applications/Codex.app/Contents/Resources" && npm run build
- export PATH="/Users/aditya/.nvm/versions/node/v22.17.0/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/opt/homebrew/bin:/Users/aditya/.codex/tmp/arg0/codex-arg0pDiON6:/Users/aditya/.antigravity/antigravity/bin:/Users/aditya/bin:/Users/aditya/.claude/bin:/Applications/Sublime Text.app/Contents/SharedSupport/bin:/Users/aditya/.local/bin:/Users/aditya/.nvm/versions/node/v22.17.0/bin:/Applications/Codex.app/Contents/Resources" && npx vercel build
- export PATH="/Users/aditya/.nvm/versions/node/v22.17.0/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/opt/homebrew/bin:/Users/aditya/.codex/tmp/arg0/codex-arg0pDiON6:/Users/aditya/.antigravity/antigravity/bin:/Users/aditya/bin:/Users/aditya/.claude/bin:/Applications/Sublime Text.app/Contents/SharedSupport/bin:/Users/aditya/.local/bin:/Users/aditya/.nvm/versions/node/v22.17.0/bin:/Applications/Codex.app/Contents/Resources" && npx vercel curl /api/client-telemetry --deployment https://codex-telemetry-fa9kmjyhm-adityas-projects-9c03351d.vercel.app
- export PATH="/Users/aditya/.nvm/versions/node/v22.17.0/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/opt/homebrew/bin:/Users/aditya/.codex/tmp/arg0/codex-arg0pDiON6:/Users/aditya/.antigravity/antigravity/bin:/Users/aditya/bin:/Users/aditya/.claude/bin:/Applications/Sublime Text.app/Contents/SharedSupport/bin:/Users/aditya/.local/bin:/Users/aditya/.nvm/versions/node/v22.17.0/bin:/Applications/Codex.app/Contents/Resources" && npx vercel curl /api/client-telemetry --deployment https://codex-telemetry-fa9kmjyhm-adityas-projects-9c03351d.vercel.app -- --request POST --header 'content-type: application/json' --data '{"event_type":"translation_failed"}'

Docs:
- docs/WORKLOG.md
- docs/guides/DEPLOYMENT.md
Context:
- MainApp assumed handleTranslate returned a thenable and chained .catch() directly
- appScreen.integration.test.tsx exposed that the boundary throws when a mock or sync implementation returns undefined

Changes:
- normalize the auto-translate call through Promise.resolve().then(...) so sync throws, undefined returns, and real promises share the same catch path
- update the integration test mock to reflect the intended async handleTranslate contract

Impact:
- auto-translate error handling is robust at the async boundary
- appScreen integration tests no longer fail on the same regression pattern

Tests:
- export PATH="/Users/aditya/.nvm/versions/node/v22.17.0/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/opt/homebrew/bin:/Users/aditya/.codex/tmp/arg0/codex-arg0pDiON6:/Users/aditya/.antigravity/antigravity/bin:/Users/aditya/bin:/Users/aditya/.claude/bin:/Applications/Sublime Text.app/Contents/SharedSupport/bin:/Users/aditya/.local/bin:/Users/aditya/.nvm/versions/node/v22.17.0/bin:/Applications/Codex.app/Contents/Resources" && npx vitest run tests/store/appScreen.integration.test.tsx
Context:
- the post-review checklist required a real async-boundary fix, a rebase onto current main, and updated verification notes
- WORKLOG needed the rebase result and the distinction between real regressions and mixed-suite timeout noise

Changes:
- record the MainApp promise-normalization fix and the aligned appScreen integration mock
- document the successful rebase onto origin/main and the single worklog conflict resolution
- log the post-rebase targeted test outcomes, including isolated reruns for the timeout-shaped mixed-suite failures

Impact:
- future review can distinguish resolved regressions from test-runner contention artifacts without rediscovering them

Tests:
- export PATH="/Users/aditya/.nvm/versions/node/v22.17.0/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/opt/homebrew/bin:/Users/aditya/.codex/tmp/arg0/codex-arg0pDiON6:/Users/aditya/.antigravity/antigravity/bin:/Users/aditya/bin:/Users/aditya/.claude/bin:/Applications/Sublime Text.app/Contents/SharedSupport/bin:/Users/aditya/.local/bin:/Users/aditya/.nvm/versions/node/v22.17.0/bin:/Applications/Codex.app/Contents/Resources" && npx vitest run tests/store/appScreen.integration.test.tsx
- export PATH="/Users/aditya/.nvm/versions/node/v22.17.0/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/opt/homebrew/bin:/Users/aditya/.codex/tmp/arg0/codex-arg0pDiON6:/Users/aditya/.antigravity/antigravity/bin:/Users/aditya/bin:/Users/aditya/.claude/bin:/Applications/Sublime Text.app/Contents/SharedSupport/bin:/Users/aditya/.local/bin:/Users/aditya/.nvm/versions/node/v22.17.0/bin:/Applications/Codex.app/Contents/Resources" && npx vitest run tests/smoke/critical-components.smoke.test.tsx
- export PATH="/Users/aditya/.nvm/versions/node/v22.17.0/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/opt/homebrew/bin:/Users/aditya/.codex/tmp/arg0/codex-arg0pDiON6:/Users/aditya/.antigravity/antigravity/bin:/Users/aditya/bin:/Users/aditya/.claude/bin:/Applications/Sublime Text.app/Contents/SharedSupport/bin:/Users/aditya/.local/bin:/Users/aditya/.nvm/versions/node/v22.17.0/bin:/Applications/Codex.app/Contents/Resources" && npm run build
Context:
- PR review correctly flagged two concrete telemetry bugs: build_id was falling back to VERCEL_URL, and the auto-translate catch path cleared the requested fingerprint after unexpected failures
- both issues affected telemetry correctness and could cause misleading or repeated client behavior in production

Changes:
- remove VERCEL_URL from getBuildId() so callback payloads only carry a real build identifier or null
- keep the auto-translate requestedRef fingerprint intact on unexpected failures so the same chapter/settings pair is not auto-requested again
- add focused regression tests for the build_id fallback and the no-auto-retry behavior
- record the follow-up in docs/WORKLOG.md

Impact:
- build_id semantics are now consistent with the field name
- unexpected auto-translate failures no longer weaken the primary dedupe guard

Tests:
- export PATH="/Users/aditya/.nvm/versions/node/v22.17.0/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/opt/homebrew/bin:/Users/aditya/.codex/tmp/arg0/codex-arg0pDiON6:/Users/aditya/.antigravity/antigravity/bin:/Users/aditya/bin:/Users/aditya/.claude/bin:/Applications/Sublime Text.app/Contents/SharedSupport/bin:/Users/aditya/.local/bin:/Users/aditya/.nvm/versions/node/v22.17.0/bin:/Applications/Codex.app/Contents/Resources" && npx vitest run tests/services/clientTelemetry.test.ts tests/store/appScreen.integration.test.tsx
- export PATH="/Users/aditya/.nvm/versions/node/v22.17.0/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/opt/homebrew/bin:/Users/aditya/.codex/tmp/arg0/codex-arg0pDiON6:/Users/aditya/.antigravity/antigravity/bin:/Users/aditya/bin:/Users/aditya/.claude/bin:/Applications/Sublime Text.app/Contents/SharedSupport/bin:/Users/aditya/.local/bin:/Users/aditya/.nvm/versions/node/v22.17.0/bin:/Applications/Codex.app/Contents/Resources" && npm run build
@anantham anantham force-pushed the fix/codex-telemetry-ux branch from cab937e to 99419ea Compare April 4, 2026 20:25
@anantham anantham marked this pull request as ready for review April 4, 2026 20:25
@anantham anantham merged commit f28752f into main Apr 4, 2026
5 checks passed
@anantham anantham deleted the fix/codex-telemetry-ux branch April 5, 2026 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant