Skip to content

fix(offline-queue): watchdog reclaim + idempotent payment INSERTs (#52)#59

Merged
TortoiseWolfe merged 8 commits intomainfrom
052/offline-queue-watchdog-idempotency
Apr 27, 2026
Merged

fix(offline-queue): watchdog reclaim + idempotent payment INSERTs (#52)#59
TortoiseWolfe merged 8 commits intomainfrom
052/offline-queue-watchdog-idempotency

Conversation

@TortoiseWolfe
Copy link
Copy Markdown
Owner

Closes #52.

Summary

Two coupled changes that close the offline-queue retry-safety hole:

  1. Watchdog reclaim in base-queue.ts — stuck processing rows older than processingTimeoutMs (default 60s) reset to pending, so a tab-crash mid-process doesn't leave the user in limbo.
  2. Idempotency on the INSERT side — partial unique index on payment_intents.idempotency_key + upsert(..., { onConflict, ignoreDuplicates: true }). A retry whose prior attempt actually succeeded is now a server-side no-op, not a double-charge.

Shipping reclaim alone would create double-charges; idempotency alone wouldn't fix the limbo case. They land together.

Why this matters (failure mode being closed)

A user clicks "Pay" while offline → IndexedDB queues a payment_intent row. Network restores; sync() starts; tab atomically claims the item (status: pending → processing); tab calls INSERT INTO payment_intents; before the response comes back, the laptop closes.

Today: the row stays processing forever. The claim guard (status === 'pending') blocks any other tab from re-claiming. The user's payment is in limbo. They re-click "Pay" because they don't see it succeeded; now they get charged twice when the original eventually flushes through some other path.

After this PR: the watchdog flips processing back to pending on next sync; ON CONFLICT DO NOTHING makes the retry a no-op against the row the prior attempt actually wrote. The user is charged exactly once, regardless of which side of the network call the crash happened on.

What changed

File Change
supabase/migrations/20251006_complete_monolithic_setup.sql payment_intents.idempotency_key TEXT + partial unique index (only enforced when set)
src/lib/offline-queue/types.ts processingTimeoutMs (default 60_000), SyncResult.conflicted, ProcessItemResult discriminator
src/lib/offline-queue/base-queue.ts Watchdog reclaim sweep at top of sync(); conflict-aware accounting in per-item branch; abstract signature accepts ProcessItemResult | void
src/lib/offline-queue/payment-adapter.ts Generate idempotency_key at queue-time; switch INSERT to upsert(..., ignoreDuplicates: true); return {status: completed | conflicted}
src/lib/offline-queue/__tests__/base-queue.test.ts +3 cases: watchdog reclaim, fresh-processing-not-reclaimed, conflict accounting
src/lib/offline-queue/__tests__/payment-adapter.test.ts New file: +2 cases for idempotency_key flow + conflict-on-zero-row
docs/superpowers/specs/2026-04-27-offline-queue-watchdog-and-idempotency-design.md Design rationale (new)
docs/superpowers/plans/2026-04-27-offline-queue-watchdog-and-idempotency.md Implementation plan (new)

Out of scope

  • Messaging offline queue (src/services/messaging/offline-queue-service.ts) — separate service, separate concerns. If the same shape bites there, file separately.
  • subscription_update operation — UPDATE is implicitly idempotent by primary key.
  • Backfill of pre-existing payment_intents rows — they expire at 24h.

Test plan

  • pnpm vitest run src/lib/offline-queue/ — 21/21 pass (19 base-queue + 2 payment-adapter)
  • pnpm vitest run tests/unit/ — 379/379 pass (no regressions in shared types)
  • pnpm test:rls against cloud — 55/55 pass (RLS policies unaffected by the column addition)
  • pnpm run type-check — clean
  • pnpm exec eslint src/lib/offline-queue/ — clean
  • Cloud schema migrated via Supabase Management API per CLAUDE.md (column + partial unique index verified via information_schema.columns + pg_indexes)

Commit narrative (8 commits)

  1. c5c7083 schema: idempotency_key + partial unique index
  2. 53c3024 types: processingTimeoutMs, conflicted, ProcessItemResult
  3. f310bf6 watchdog reclaim impl + 2 regression tests
  4. 89fa6da conflict-aware accounting + 1 test
  5. 3aec59d adapter: generate idempotency_key at queue-time
  6. 19630de adapter: upsert with ignoreDuplicates
  7. ad83012 adapter dedupe regression tests
  8. b30ef90 docs: spec + implementation plan

🤖 Generated with Claude Code

TurtleWolfe and others added 8 commits April 27, 2026 13:35
…ntents

Supports the offline-queue retry safety work tracked in #52: a queued
INSERT that's retried after a tab crash must not produce a duplicate
payment_intents row. The partial unique index (WHERE idempotency_key IS
NOT NULL) keeps existing rows and direct-server INSERTs valid; only
client-queued INSERTs participate in dedupe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…g types

Prepares the type surface for the watchdog-reclaim and conflict-aware
accounting that follow in subsequent commits. The three existing
SyncResult return sites in base-queue.ts also gain conflicted: 0 so
type-check stays green per commit; actual conflict counting lands when
the per-item branching is added.

void return from processItem remains valid for backward compatibility.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A tab that crashes after atomically claiming a queue item but before
writing the completion update leaves the row in 'processing' forever.
The claim guard requires status === 'pending', so no other tab will
ever re-claim it. The watchdog runs at the top of sync() and resets
'processing' items older than processingTimeoutMs (default 60s) back
to 'pending'. processItem MUST be idempotent for safe reclaim — that
side of the contract is enforced for payment_intents in a follow-up
commit (idempotency_key partial unique index).

Two regression cases pin the contract:
- Reclaim: 70s-old processing row is recovered and processed.
- Negative: 30s-old processing row is left alone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
processItem may now return { status: 'conflicted' } to signal that the
work was already done server-side (typical case: an idempotency-key
INSERT that hit ON CONFLICT DO NOTHING). The queue row still marks
completed, but SyncResult tracks the dedupe count separately for
observability. void return remains valid and is treated as
{ status: 'completed' } so existing subclasses don't break.

The abstract processItem signature now accepts ProcessItemResult | void,
so payment-adapter (next commit) can return the discriminator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stable key per logical payment intent. Retries of a queued payment
INSERT now share the same key, so server-side ON CONFLICT DO NOTHING
makes them no-ops instead of duplicates. The actual upsert switch lands
in the next commit; this one just establishes the data flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ine retry safety

Replaces the prior insert() with upsert(..., { onConflict:
'idempotency_key', ignoreDuplicates: true }), which maps to INSERT ...
ON CONFLICT DO NOTHING server-side. A queued retry whose prior attempt
already created the row now returns zero rows; the adapter signals
{ status: 'conflicted' } and the queue row marks completed without
double-charging.

Older queued items (queued before the idempotency_key column shipped)
fall back to generating a fresh key inside executePaymentIntent and
log a warning. That retry chain will dedupe with itself; prior attempts
that ran without a key cannot dedupe against it. Acceptable for a
24-hour-expiry table.

subscription_update keeps void return — UPDATE is implicitly idempotent
by primary key, no dedupe distinction needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…acts

Two regression cases for the #52 retry-safety surface:
1. Queued idempotency_key flows into the upsert payload with the
   correct onConflict + ignoreDuplicates options.
2. A zero-row upsert response is treated as conflicted (the prior
   attempt's row is already there); the queue row still completes,
   but result.conflicted increments instead of result.success.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the design rationale (why watchdog + idempotency must land
together) and the bite-sized task plan that produced the preceding
runtime commits. Future readers can reconstruct the decision boundary
without re-deriving it from PR descriptions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@TortoiseWolfe TortoiseWolfe merged commit 220ade5 into main Apr 27, 2026
28 checks passed
@TortoiseWolfe TortoiseWolfe deleted the 052/offline-queue-watchdog-idempotency branch April 27, 2026 15:17
TortoiseWolfe added a commit that referenced this pull request Apr 27, 2026
…andoff (#61)

Captures end-of-session state after 6 PRs landed (#54, #55, #56, #58,
#59, #60). Family A is empty (both stability hotspots resolved).
Family D1 done. Recommended next pickup: B1 (#43 /payment/result page).
The handoff doc is the load-bearing artifact for the next operator —
it lists open issues by family, sharp edges, and a copy-pasteable
quick-start.

Co-authored-by: TurtleWolfe <TurtleWolfe@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TortoiseWolfe added a commit that referenced this pull request Apr 28, 2026
…r + cooling, error categorization, offline banner, audit log (#43) (#63)

Closes 5 of 7 gaps in #43. Route /payment-result itself shipped 2026-04-16 (commit ffb33a1); this PR closes the retry-UX gaps obscured by the original 'route missing' framing (corrected in #62).

Closes (gaps 1-5 of #43):
1. Retry button regenerates idempotency_key. retryFailedPayment() now reuses parent's queued key; doubled clicks become server-side ON CONFLICT no-ops (matches PR #59's offline-queue dedupe).
2. No retry attempt counter or cooling. payment_intents gains retry_count + parent_intent_id; service refuses past RETRY_LIMIT=3 (FR-009); enforces COOLING_PERIOD_MS=30s (FR-010); custom error classes carry waitMs for UI countdown (FR-008).
3. No error categorization. New error-categorization.ts maps Stripe + PayPal error_code to one of FR-002's 8 categories with non-technical user message + resolution hint + recoverable flag (FR-001/003/006/NFR-001).
4. No offline error banner. New OfflineRetryBanner 5-file component reads useOfflineStatus + paymentQueue.getCount(); silent in steady state.
5. No audit log on retry. auth_audit_logs CHECK extended with payment_retry; every retry recorded with { original_intent_id, new_intent_id, retry_count, deduped }.

Bonus: PaymentRetryExpiredError refuses retry past parent's 24h provider session.

Deferred (gaps 6-7 to follow-up PR):
- US3 (Update Payment Method, FR-011-FR-015) — has stripe.js + PCI surface
- US4 (Guided Recovery Wizard, FR-016-FR-019) — largest piece

E2E test fixes during CI:
- 1ea0833 — Navigate online before flipping offline (static-export needs network for first load).
- a350833 — Synthesize navigator.onLine + 'offline' event manually. Playwright's setOffline blocks request traffic but doesn't fire the offline browser event in Firefox/WebKit; only Chromium does. Real browsers DO fire the event on real network drops, so synthesizing it is closer to production behavior and works cross-browser.

Verification:
- pnpm test: 3249/3249 across 286 files (60 new for B1)
- pnpm test:rls: 55/55 (new columns inherit existing policies)
- pnpm run type-check, lint: clean
- CI: 29/29 green incl. all three regression-target gen 3/6 shards (chromium, firefox, webkit)

Schema applied locally; cloud apply is reviewer's call (idempotent, safe to re-apply).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Gap-Audit] base-queue.ts: cross-tab process+complete not in single transaction

2 participants