feat: observability — performance tab, country capture, recovery email#2270

Merged
aalemayhu merged 3 commits into main from feat/observability on May 15, 2026

@aalemayhu
Contributor

Summary

Bundles items 1–7 from today's action-item list plus the Performance tab under /ops/performance. Single PR per Al's instruction.

  • Country capture at signup from CloudFront-Viewer-Country (with cf-ipcountry / x-vercel-ip-country fallbacks). Stored once on users.signup_country; lazy backfill on getLocals for existing users.
  • Job timing & card count instrumentation. jobs.card_count persisted by CompleteJobUseCase; duration is computed from the existing created_at and last_edited_time columns, so no new timing columns are needed.
  • Performance tab on /ops/performance. Job duration percentiles (24h + 7d, p50 / p95 / p99), terminal-status breakdown, top 20 slowest jobs in last 24h, and signup-country breakdown over 7d. 30s auto-refresh.
  • Abandoned-checkout recovery email. New template (mascot + dark-mode + responsive per email-templates.md), EmailService.sendAbandonedCheckoutRecoveryEmail, SendAbandonedCheckoutRecoveryUseCase (dedupes, validates, dry-run-by-default), and POST /api/ops/send-abandoned-checkout-recovery so Al can fire it once the 234 candidates are extracted from the Stripe export.
  • US-localized pricing copy. When signupCountry === 'US' the /pricing intro switches to MCAT / USMLE / bar-exam framing. Identical 100-card cap everywhere — no silent geo variance.
  • VOICE.md amendment. Protected string updated: "100 cards per month" (marketing on /pricing) + "Your monthly limit: N cards" (in-product display).
  • Oct 2024 post-mortem. Documentation/post-mortems/2024-10-subscriber-data-deletion.md retroactively captures the 472 → 206 paid-signup collapse caused by the subscriber-data deletion regression (PR #1591, "feat: Implement worker threads for upload processing", lifted conversion on Sep 1; a missing await on Sep 28–Oct 14 ate the gains).

Why one PR

Per Al: bundle all of items 1–7 plus observability instrumentation. The pieces are independent in code but share one schema migration and one round of Kanel regen — splitting would have made the migration order brittle.

Out of scope (deferred follow-ups)

  • Deletion-volume alerts on the new Performance tab — needs a week of baseline data first.
  • End-to-end test for cleanup-vs-subscriber path (called out in the post-mortem) — separate PR.
  • The actual send of the 234 recovery emails — operational, Al fires the endpoint with the email list.
  • Distribution / channel work toward 786 → 12K — this PR is instrumentation, not acquisition.

Test plan

  • pnpm tsc --noEmit (server) ✓
  • pnpm typecheck (web) ✓
  • pnpm jest (server) — 167 suites passed, 1161 tests passed
  • pnpm --filter 2anki-web test (vitest) — 67 files passed, 469 tests passed
  • pnpm --filter 2anki-web lint (Biome) ✓
  • Open /ops/performance in dev and confirm the four panels render (empty state acceptable until traffic hits new code)
  • Hit /api/ops/send-abandoned-checkout-recovery with a dry-run body and confirm candidate count comes back
  • Hit /pricing while signed in from a US IP (or with a US row in users.signup_country) and confirm the MCAT/USMLE hero renders

Notes

  • Sonar scanner not installed locally; expect a possible Sonar bounce post-push.
  • 786 → 12K is not solved by this PR. This is the instrumentation layer that lets the next reliability win be measured the way Sep 2024's worker-threads spike could only be measured retroactively.

🤖 Generated with Claude Code

aalemayhu and others added 2 commits May 15, 2026 16:17
Bundled per Al: items 1-7 + perf tab on /ops/performance.

Schema
- migration adds users.signup_country (varchar 2) and jobs.card_count (int)
- two indexes for the new query paths

Country capture
- extractCountryFromRequest reads CloudFront-Viewer-Country (with cf-ipcountry
  and x-vercel-ip-country fallbacks); guarded against missing headers
- captured on register/loginWithGoogle/loginWithNotion for new users only
- lazy backfill on getLocals when the user has no stored country
- all writes wrapped in try/catch — country is additive context
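
A minimal sketch of the header fallback described above. The three header names are the ones listed in this commit; the helper name `extractCountry`, the `HeaderBag` type, and the two-letter validation are assumptions for illustration, not the actual `extractCountryFromRequest` code. It also assumes header names arrive lower-cased, as Node's HTTP server delivers them.

```typescript
// Hypothetical types: a plain header map with lower-cased names.
type HeaderValue = string | string[] | undefined;
type HeaderBag = { [name: string]: HeaderValue };

// Checked in priority order: CloudFront first, then the two fallbacks.
const COUNTRY_HEADERS: string[] = [
  "cloudfront-viewer-country",
  "cf-ipcountry",
  "x-vercel-ip-country",
];

// Returns an upper-case two-letter code, or null when no header is present
// or the value is not exactly two ASCII letters.
function extractCountry(headers: HeaderBag | undefined): string | null {
  if (!headers) return null; // guard against a missing header object
  for (const name of COUNTRY_HEADERS) {
    const raw = headers[name];
    const value = Array.isArray(raw) ? raw[0] : raw;
    if (typeof value !== "string") continue;
    const code = value.trim().toUpperCase();
    if (/^[A-Z]{2}$/.test(code)) return code;
  }
  return null;
}
```

Returning `null` instead of throwing matches the "additive context" stance: a caller can skip the write when no country is available.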

Job timing
- CompleteJobUseCase persists cardCount through updateJobStatus
- duration is computed from existing created_at and last_edited_time
  in PerformanceMetricsService — no new timing columns
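
One way the duration could be derived in SQL and shared between queries, sketched under assumptions: Postgres syntax, a `jobs` table with an `id` column, and a `status = 'done'` filter are all guesses. Only `created_at`, `last_edited_time`, and `jobs.card_count` are named in this PR. The helper names are hypothetical.

```typescript
// Assumed Postgres expression: milliseconds between the two existing columns.
const DURATION_MS_SQL =
  "EXTRACT(EPOCH FROM (last_edited_time - created_at)) * 1000";

// Assumed filter for finished jobs inside a look-back window.
// The hours value is validated because it is interpolated into the SQL text.
function doneJobsSince(hours: number): string {
  if (!Number.isInteger(hours) || hours <= 0) {
    throw new Error("hours must be a positive integer");
  }
  return `status = 'done' AND created_at >= NOW() - INTERVAL '${hours} hours'`;
}

// p50 / p95 / p99 over the window, using Postgres percentile_cont.
function percentilesSql(hours: number): string {
  const pct = (p: number, alias: string): string =>
    `percentile_cont(${p}) WITHIN GROUP (ORDER BY ${DURATION_MS_SQL}) AS ${alias}`;
  return [
    `SELECT ${pct(0.5, "p50")}, ${pct(0.95, "p95")}, ${pct(0.99, "p99")}`,
    "FROM jobs",
    `WHERE ${doneJobsSince(hours)}`,
  ].join("\n");
}

// Slowest finished jobs in the window, longest first.
function slowestJobsSql(hours: number, limit: number): string {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error("limit must be a positive integer");
  }
  return [
    `SELECT id, card_count, ${DURATION_MS_SQL} AS duration_ms`,
    "FROM jobs",
    `WHERE ${doneJobsSince(hours)}`,
    "ORDER BY duration_ms DESC",
    `LIMIT ${limit}`,
  ].join("\n");
}
```

Keeping the duration expression in one constant means both queries measure the same thing, which is the point of not adding separate timing columns.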

Performance tab — /ops/performance
- PerformanceMetricsService runs four parallel queries: 24h+7d duration
  percentiles (p50/p95/p99), terminal-status breakdown last 24h, slowest 20
  done jobs last 24h, signup country breakdown last 7d
- new GetPerformanceMetricsUseCase wires it through to OpsController
- /api/ops/performance/metrics behind RequireOpsAccess (404, not 403)
- React tab renders tables + simple bar visualisations, 30s refetch
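
To make the p50 / p95 / p99 figures concrete, here is an in-memory sketch using the nearest-rank definition. This is an illustration only: the service computes percentiles in its queries, and results can differ slightly if those queries interpolate between samples. The `JobTimestamps` type and function names are hypothetical.

```typescript
// Hypothetical row shape: only the two timestamp columns matter here.
interface JobTimestamps {
  created_at: Date;
  last_edited_time: Date;
}

// Duration in milliseconds, derived from the two existing columns.
function durationMs(job: JobTimestamps): number {
  return job.last_edited_time.getTime() - job.created_at.getTime();
}

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Returns null for no samples.
function percentile(values: number[], p: number): number | null {
  if (values.length === 0) return null;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function summarizeDurations(jobs: JobTimestamps[]) {
  const durations = jobs.map(durationMs);
  return {
    p50: percentile(durations, 50),
    p95: percentile(durations, 95),
    p99: percentile(durations, 99),
  };
}
```

With ten jobs lasting 1 s through 10 s, nearest-rank gives a p50 of 5000 ms, while p95 and p99 both land on the slowest job (10000 ms). Tail percentiles on a small sample are dominated by a single outlier, which is worth remembering when the tab is read shortly after deploy.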

Abandoned-checkout recovery
- new EmailService.sendAbandonedCheckoutRecoveryEmail + template (mascot,
  dark-mode, responsive, footer — per email-templates rules)
- SendAbandonedCheckoutRecoveryUseCase dedupes + validates emails,
  dry-run by default, records per-email failures
- /api/ops/send-abandoned-checkout-recovery for one-shot triggering with
  { emails: string[], dryRun?: boolean }
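
A rough sketch of the dedupe, validate, and dry-run-by-default flow. The request shape `{ emails, dryRun }` is from this commit; every type and function name below, the lower-casing of addresses, and the simple validity check are assumptions and not the code in `SendAbandonedCheckoutRecoveryUseCase`.

```typescript
interface RecoveryRequest {
  emails: string[];
  dryRun?: boolean;
}

interface RecoveryPlan {
  dryRun: boolean;
  candidates: string[]; // unique, normalised, passed validation
  invalid: string[]; // rejected inputs, as received
}

interface SendFailure {
  email: string;
  error: string;
}

// Placeholder validity check for this sketch only.
function looksLikeEmail(value: string): boolean {
  const at = value.indexOf("@");
  const lastDot = value.lastIndexOf(".");
  return (
    !/\s/.test(value) &&
    at > 0 &&
    at === value.lastIndexOf("@") &&
    lastDot > at + 1 &&
    lastDot < value.length - 1
  );
}

// Normalises (trim + lower-case, an assumption), drops duplicates, and
// splits the input into sendable candidates and invalid entries.
function planRecovery(request: RecoveryRequest): RecoveryPlan {
  const seen = new Set<string>();
  const candidates: string[] = [];
  const invalid: string[] = [];
  for (const raw of request.emails) {
    const email = raw.trim().toLowerCase();
    if (seen.has(email)) continue;
    seen.add(email);
    if (looksLikeEmail(email)) candidates.push(email);
    else invalid.push(raw);
  }
  // Dry-run unless the caller explicitly passes dryRun: false.
  return { dryRun: request.dryRun !== false, candidates, invalid };
}

// Sends only when not in dry-run; one failing address does not stop the rest.
async function runRecovery(
  request: RecoveryRequest,
  send: (email: string) => Promise<void>,
): Promise<{ plan: RecoveryPlan; sent: string[]; failures: SendFailure[] }> {
  const plan = planRecovery(request);
  const sent: string[] = [];
  const failures: SendFailure[] = [];
  if (!plan.dryRun) {
    for (const email of plan.candidates) {
      try {
        await send(email);
        sent.push(email);
      } catch (err) {
        const error = err instanceof Error ? err.message : String(err);
        failures.push({ email, error });
      }
    }
  }
  return { plan, sent, failures };
}
```

Treating anything other than an explicit `dryRun: false` as a dry run means a malformed or partial request body reports candidates without sending mail.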

US-localized pricing
- PricingPage receives signupCountry from /api/users/debug/locals
- 'US' renders an MCAT/USMLE/bar-exam framed hero; identical 100-card cap
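
The rule that only the framing varies by country, never the cap, can be sketched as below. The strings are placeholders rather than the shipped copy, and `pricingIntroFor` and `PricingIntro` are hypothetical names.

```typescript
// The cap is a single constant so no country branch can change it.
const MONTHLY_CARD_CAP = 100;

interface PricingIntro {
  framing: "exam-prep" | "default";
  headline: string; // placeholder text, not the real /pricing copy
  monthlyCardCap: number;
}

// Only an exact 'US' match switches the framing; null, undefined, or any
// other value falls back to the default intro.
function pricingIntroFor(signupCountry: string | null | undefined): PricingIntro {
  const isUs = signupCountry === "US";
  return {
    framing: isUs ? "exam-prep" : "default",
    headline: isUs
      ? "[placeholder] MCAT / USMLE / bar-exam framed intro"
      : "[placeholder] default intro",
    monthlyCardCap: MONTHLY_CARD_CAP,
  };
}
```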

Voice + docs
- VOICE.md protected string moves to "Your monthly limit: 100 cards" for
  in-product display; marketing copy on /pricing keeps "100 cards per month"
- Documentation/post-mortems/2024-10-subscriber-data-deletion.md captures
  the worker-threads → conversion → deletion regression lesson

Notes
- Sonar scanner not installed locally; expect a possible bounce on push
- 786 → 12K is a distribution/channel goal that this PR doesn't solve;
  this is the instrumentation layer that lets the next reliability wins
  be measured the way Sep 2024's worker-threads spike could only be
  measured retroactively

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Extract resolveSignupCountry helper from getLocals to drop cognitive
  complexity from 20 to under 15 (S3776 CRITICAL)
- Flip != null ternaries in PerformanceMetricsService to == null first
  branch — lead-with-positive (S7735, 5 minor)
- replace(/,/g, ' ') → replaceAll(',', ' ') in PerformanceTab formatter
  (S7781 minor)
- Add multicriteria waivers for email-template HTML rules (Web:S1827,
  Web:S5257, Web:S6819, css:S7924) scoped to
  src/services/EmailService/templates/** — table-based layout and
  deprecated presentational attrs are mandatory for email-client
  compatibility (Gmail/Outlook/Apple Mail)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@aalemayhu aalemayhu marked this pull request as ready for review May 15, 2026 14:37
- EMAIL_RE in SendAbandonedCheckoutRecoveryUseCase: cap each segment with
  explicit upper bounds (64/255/63) and length-check the full string
  against RFC 5321's 320-char limit. The original pattern was already
  linear-time (disjoint negated classes + literal anchors), but Sonar's
  S5852 conservatively flags unbounded `+` quantifiers; explicit bounds
  remove the hotspot.

- Lift the duration-ms SQL fragment and the "done jobs since" WHERE
  clause to module constants in PerformanceMetricsService — both the
  percentiles query and the slowest-jobs query were ~30 lines of
  near-identical SQL (Sonar flagged the 20% duplication on new code).

- Sonar cpd.exclusions: add src/data_layer/public/** (Kanel-generated,
  Initializer/Mutator interfaces repeat the same column list 3x) and
  web/src/pages/OpsPage/performanceTypes.ts (intentional duplication
  of the server response shape because web cannot import from server).
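
For the first item above, a bounded email pattern might look like the sketch below. The commit gives only the numbers 64/255/63 and the 320-character limit; mapping them to local part, domain, and final label is an assumption, and this is not the actual `EMAIL_RE`.

```typescript
// Overall cap applied before the regex runs.
const MAX_EMAIL_LENGTH = 320;

// Every quantifier has an explicit upper bound:
//   local part    1-64 chars, no whitespace or '@'
//   domain        1-255 chars, no whitespace or '@' (dots allowed)
//   final label   1-63 chars, no whitespace, '@', or '.'
const BOUNDED_EMAIL_RE = /^[^\s@]{1,64}@[^\s@]{1,255}\.[^\s@.]{1,63}$/;

function isPlausibleEmail(value: string): boolean {
  return value.length <= MAX_EMAIL_LENGTH && BOUNDED_EMAIL_RE.test(value);
}
```

The separate length check still matters: the bounded segments alone would accept strings up to 384 characters (64 + 1 + 255 + 1 + 63).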

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@aalemayhu aalemayhu merged commit 0befbe1 into main May 15, 2026
3 checks passed
@aalemayhu aalemayhu deleted the feat/observability branch May 15, 2026 14:42
@sonarqubecloud

Quality Gate failed

Failed conditions
6.0% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud
