25 changes: 25 additions & 0 deletions developer/developer-journal.md
@@ -4,6 +4,31 @@ Chronological record of implementation decisions, changes made, and why. Most re

---

## May 2, 2026 — M8 Path A demo proven live; two ship-blockers fixed in flight; one customer-facing 404 deferred

**What was done:**

- Closed the M8 demo end-to-end on the live cloud stack: cloud Supabase project `klzznfagrtormretqsgb` provisioned, migrations + seed applied, both apps deployed at `admin.dialtone.menu` + `kitchen.dialtone.menu` with TLS, Stripe Connect platform active with Sui's Sushi as connected account `<stripe-connected-account-id>`, Telnyx 10DLC fully wired (brand `<telnyx-tcr-brand-id>` verified, campaign `<telnyx-tcr-campaign-id>` Active, messaging profile created, DID `+16296001047` Active), Stripe webhook registered against the cloud Edge Function URL. Path A demo path executed live: admin manual card order → Telnyx SMS → Stripe Checkout (test mode) → `checkout.session.completed` webhook → order flipped to `paid` → Supabase Realtime push → kitchen tablet shows order in **New** column → tap-advance through Preparing → Ready → Completed all worked.
- Fixed two ship-blockers that surfaced only under live conditions and weren't caught by integration tests:
- **CORS preflight** — `_shared/http.ts` was authored in M6 for Vapi-only server-to-server traffic and explicitly omitted CORS handling. M7 added `admin_create_manual_order` which is called from the admin SPA in a browser; the missing OPTIONS preflight response blocked the actual POST silently (no Edge Function invocation, no useful error in the UI). Added a `corsHeaders` constant + `handlePreflight()` helper in `_shared/http.ts`; `admin_create_manual_order` now calls `handlePreflight(req)` before any other logic. Future browser-callable Edge Functions need the same call. Shipped on `main` directly because the admin app was unusable without it.
- **Auth init bootstrap (PR #1):** both `apps/admin/src/lib/auth-context.tsx` and `apps/kitchen/src/app.tsx` relied solely on `supabase.auth.onAuthStateChange` firing `INITIAL_SESSION` on mount to flip the `loading` flag to false. In some browser/SDK combinations a restored session never produces that event, leaving the spinner stuck forever after a refresh. Bootstrap now calls `supabase.auth.getSession()` explicitly on mount (the call is asynchronous and returns a promise) and runs the same handler on its result; the listener early-returns on `INITIAL_SESSION` to avoid duplicate DB queries. Greptile flagged a P1 (loading flash) and a P2 (duplicate queries) on the initial PR; both were addressed in the same branch before merge.
- Documented the live-deploy lessons learned in `developer/m8-live-demo-checklist.md` and refreshed `docs/project-status.md` to mark M8 Path A as proven and call out the two M8 fast-follows. Cross-noted the customer-facing redirect work in the sibling `dialtone_menu/` repo's `AGENTS.md` so the team there can pick it up.

**Decisions and rationale:**

- **Three parallel Stripe environments are real and silently break things.** Modern Stripe accounts have live mode, legacy `/test/...` test mode, and Workbench Sandboxes — three independent envs each with their own API keys, webhooks, and event streams. We burned over an hour debugging "why doesn't `checkout.session.completed` deliver" — the answer was that the webhook was registered in one env, the API key was from another, so events fired in env A and looked for a webhook in env B. Resolution: register webhooks via **Stripe Workbench Shell** (`stripe webhook_endpoints create ...`) which runs in whatever env you're currently viewing, guaranteeing key + webhook live in the same env. Captured in the M8 checklist's "Lessons learned" section.
- **Demo path bifurcated into A (admin manual order) and B (Vapi voice).** During Vapi assistant setup we discovered that `vapi_call_start` returns its own JSON shape — `{ status, prompt, tools, first_message, context }` — that does not match Vapi's expected `assistant-request` response contract `{ assistant: {...} }`. Additionally, Vapi sends multiple lifecycle event types (`call-start`, `end-of-call-report`, `function-call`, `status-update`) to a single server URL; our function only handles `call-start`-shaped requests. The 60 integration tests in `packages/shared/test/db/voice.test.ts` pass against our own request/response contract, not Vapi's. Rather than rewrite under demo time pressure, deferred the voice path to a focused fast-follow ("Path B") and proved the same end-to-end plumbing via the admin manual order flow ("Path A") which is fully wired through M7 and uses the exact same Stripe + Telnyx + kitchen-Realtime path.
- **Skip `INITIAL_SESSION` in the listener after `getSession()` bootstrap.** Initial reflex was to let both paths run and rely on the handler being idempotent. Greptile correctly flagged that this doubles DB queries on every mount (admin: 6 instead of 3; kitchen: 4 instead of 2) and creates a possible "board → spinner → board" flash if timing aligns badly. Cleaner pattern: bootstrap from `getSession()`, listener handles only subsequent events (`SIGNED_IN`, `SIGNED_OUT`, `TOKEN_REFRESHED`, `USER_UPDATED`). The original bug we set out to fix still resolves because `getSession()` always runs.

**Follow-ups / known issues:**

- **Stripe `success_url` 404 — customer-facing.** `admin_create_manual_order` builds `success_url = ${DIALTONE_PUBLIC_BASE_URL}/orders/${orderId}/paid` (and `cancel_url` similarly). Default `DIALTONE_PUBLIC_BASE_URL` is `https://dialtone.menu` — the marketing site, which has no `/orders/...` route. A paying customer hits the marketing 404 page after Stripe redirects them. **Webhook fires regardless** so the order itself flips to `paid` correctly; only the customer-facing landing is broken. Architectural decision: customer-facing pages belong in the `dialtone_menu/` repo (proper branding, lightweight bundle, customer-facing domain) — not in the admin app. Two static routes need to be added there: `/orders/:id/paid` (success) and `/orders/:id/cancel` (cancel). Both render static thank-you / cancellation copy; they do **not** hit any database (no cross-repo Supabase access, opaque UUID is just a confirmation token). Cross-noted in `dialtone_menu/AGENTS.md`. **Must ship before any real paying customer.**
- **Path B — Vapi voice integration.** Rewrite `vapi_call_start` to return Vapi's `assistant-request` response shape `{ assistant: { model: { messages, tools }, firstMessage } }`; add `message.type` dispatch so the same function handles `call-start`, `end-of-call-report`, etc. Update integration tests to assert on Vapi's actual webhook contract instead of our internal one. Estimated 4–6 hours of focused work. Until done, the voice path is not live — only the admin manual order path proves the demo end-to-end.
- **The two stuck `pending_payment` orders** from today's debugging (`941926c4-...` and `c3b3d220-...`) live in legacy test mode environments where no webhook is registered to handle their eventual `checkout.session.expired` events. They'll stay `pending_payment` indefinitely. Manual cleanup or wait for them to age out; either way orphan test data, not a correctness concern.
- **Telnyx number provisioning resolved on its own.** The DID `+16296001047` flipped from "Pending" to "Active" within ~2h of being attached to the campaign. SMS delivery worked even during the Pending window — Telnyx allows sends to verified numbers (e.g. the account holder's own phone) before full provisioning completes.

---

## April 19, 2026 — PR #18 review fixes for M7 Stripe + SMS wiring

**What was done:**
80 changes: 70 additions & 10 deletions developer/m8-live-demo-checklist.md
@@ -257,14 +257,74 @@ With everything wired, run the full path:

All items below must be checked before M8 is marked fully complete:

- [ ] Cloud Supabase project provisioned; all 6 migrations applied; seed data present
- [ ] All Edge Function secrets set (`STRIPE_SECRET_KEY`, `STRIPE_WEBHOOK_SECRET`, `TELNYX_API_KEY`, `TELNYX_FROM_NUMBER` and/or `TELNYX_MESSAGING_PROFILE_ID`)
- [ ] GitHub Actions secrets set (6 values)
- [ ] `deploy.yml` workflow ran green; admin + kitchen + Edge Functions all deployed
- [ ] `admin.dialtone.menu` and `kitchen.dialtone.menu` load over HTTPS with valid TLS
- [ ] Stripe webhook registered; `STRIPE_WEBHOOK_SECRET` updated and functions re-deployed
- [ ] Stripe Connect platform account active; Sui's Sushi connected account wired
- [ ] Telnyx DID provisioned (`+16296001047`); 10DLC brand + campaign submitted; messaging profile created and attached
- [ ] Vapi assistant created; all tool webhooks pointing to cloud Edge Function URLs; phone number assigned
- [ ] Full demo path executed end-to-end (voice → order → SMS → Stripe payment → kitchen board)
- [x] Cloud Supabase project provisioned; all 6 migrations applied; seed data present (project `klzznfagrtormretqsgb`)
- [x] All Edge Function secrets set (`STRIPE_SECRET_KEY`, `STRIPE_WEBHOOK_SECRET`, `TELNYX_API_KEY`, `TELNYX_FROM_NUMBER`)
- [x] GitHub Actions secrets set (6 values)
- [x] `deploy.yml` workflow ran green; admin + kitchen + Edge Functions all deployed
- [x] `admin.dialtone.menu` and `kitchen.dialtone.menu` load over HTTPS with valid TLS
- [x] Stripe webhook registered; `STRIPE_WEBHOOK_SECRET` set and functions re-deployed
- [x] Stripe Connect platform account active; Sui's Sushi connected account wired (`<stripe-connected-account-id>`)
- [x] Telnyx DID `+16296001047` provisioned and Active; 10DLC brand verified (`<telnyx-tcr-brand-id>`); campaign Active (`<telnyx-tcr-campaign-id>`); messaging profile created and attached
- [ ] Vapi assistant created; all tool webhooks pointing to cloud Edge Function URLs; phone number assigned (deferred — see Path B note below)
- [x] **Path A demo** executed end-to-end: admin order → Telnyx SMS → Stripe Checkout → webhook → kitchen Realtime board update → tap-advance flow
- [ ] Full **voice** demo path (Vapi inbound call → order → SMS → kitchen) — deferred to Path B fast-follow
- [ ] All three failure modes validated (abandoned, declined, replay)

---

## Lessons learned (May 2, 2026)

Things that bit during the live deploy and aren't obvious from M7's local-stack experience. **Read this before doing the next live deploy** (e.g. when onboarding a second restaurant or spinning up a staging environment).

### Stripe environment proliferation

Modern Stripe has **three** parallel environments per account, each with its own API keys, webhooks, and event streams that do **not** cross-pollinate:

1. **Live mode** — real money, accessed via `dashboard.stripe.com/dashboard`
2. **Legacy test mode** — accessed via `dashboard.stripe.com/test/...` URLs
3. **Sandboxes** — accessed via Workbench, isolated workspaces (each with its own keys)

If your `STRIPE_SECRET_KEY` is from one env and your webhook is in another, events fire in the key's env and never reach the webhook — **silently**, with `pending_webhooks: 0` on the events. Symptom: payments succeed in Stripe Checkout but the order stays at `pending_payment` forever.

**Resolution pattern that worked:** register the webhook **via Stripe Workbench Shell** instead of the dashboard UI. The Shell's CLI runs in whatever env you're currently viewing — guarantees env match. Run:

```bash
stripe webhook_endpoints create \
--url "https://<project-ref>.supabase.co/functions/v1/stripe_webhook" \
--enabled-events "checkout.session.completed" \
--enabled-events "checkout.session.expired" \
--enabled-events "payment_intent.payment_failed"
```

Stripe returns the new webhook with its signing secret in the response. That signing secret pairs with the keys in the same Workbench env.
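
As a rough diagnostic for the mismatch described above, the check below is a minimal sketch and not code from this repo. It assumes the event JSON was fetched with the same `STRIPE_SECRET_KEY` the Edge Functions use; `StripeEventLike`, `OrderStatus`, and `suspectEnvMismatch` are hypothetical names. `pending_webhooks` and `livemode` are fields on Stripe event objects. Because `pending_webhooks: 0` is also what a fully delivered event shows, the sketch only raises a flag when the order is still `pending_payment`.

```typescript
// Hypothetical helper: the names below are illustrative, not part of the repo.
type OrderStatus = "pending_payment" | "paid";

// Subset of the Stripe event object that this check reads.
interface StripeEventLike {
  type: string;
  livemode: boolean;
  pending_webhooks: number;
}

// Returns a warning string when the symptom matches the
// "key in one env, webhook in another" failure, otherwise null.
function suspectEnvMismatch(
  event: StripeEventLike,
  orderStatus: OrderStatus,
): string | null {
  const paymentCompleted = event.type === "checkout.session.completed";
  const nothingQueued = event.pending_webhooks === 0;
  if (paymentCompleted && nothingQueued && orderStatus === "pending_payment") {
    const mode = event.livemode ? "live mode" : "a test mode or sandbox env";
    return (
      `Event fired in ${mode} with no pending webhooks, but the order is ` +
      "still pending_payment: check that the webhook is registered in the " +
      "same env as the API key."
    );
  }
  return null;
}
```

A non-null result is only a hint to go and compare the env of the key with the env of the webhook; it does not prove the mismatch.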

### CORS preflight on browser-called Edge Functions

M6 designed `_shared/http.ts` for Vapi-only server-to-server traffic and explicitly omitted CORS handling. M7 added `admin_create_manual_order` which is called from the admin SPA — browsers send a CORS preflight OPTIONS that the function didn't answer, blocking the actual POST. Symptom: admin "Create & send SMS link" button never returns; Edge Function logs show no invocation.

**Resolution:** added `corsHeaders` + `handlePreflight()` in `_shared/http.ts`; `admin_create_manual_order` calls `handlePreflight(req)` before any other logic. Future browser-callable Edge Functions need the same call.
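
For orientation, here is a minimal sketch of what such a helper can look like. Only the names `corsHeaders` and `handlePreflight(req)` come from the notes above; the header values, the `204` status, and the `handler` wrapper are assumptions for illustration, not a copy of `_shared/http.ts`.

```typescript
// Assumed header set; the real values in _shared/http.ts may differ
// (for example, a specific origin instead of "*").
const corsHeaders: Record<string, string> = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Methods": "POST, OPTIONS",
  "Access-Control-Allow-Headers":
    "authorization, x-client-info, apikey, content-type",
};

// Answers the browser's OPTIONS preflight; returns null for any other
// method so the caller continues with its normal logic.
function handlePreflight(req: Request): Response | null {
  if (req.method !== "OPTIONS") return null;
  return new Response(null, { status: 204, headers: corsHeaders });
}

// Hypothetical browser-callable function showing the call order:
// preflight check first, everything else after.
async function handler(req: Request): Promise<Response> {
  const preflight = handlePreflight(req);
  if (preflight) return preflight;

  // ... auth, validation, and business logic would go here ...
  return new Response(JSON.stringify({ ok: true }), {
    status: 200,
    // The actual response needs the CORS headers too, or the browser
    // discards it even though the preflight passed.
    headers: { ...corsHeaders, "Content-Type": "application/json" },
  });
}
```

Returning `null` for non-OPTIONS requests keeps the call site to two lines, which makes the "call it before any other logic" rule easy to apply in each new function.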

### Auth init bootstrap

Both apps relied on `onAuthStateChange` firing `INITIAL_SESSION` on mount to flip the loading state. In some browser/SDK combinations a restored session never fires the event, leaving the spinner stuck forever after a refresh. Symptom: `admin.dialtone.menu` and `kitchen.dialtone.menu` showed "Loading..." indefinitely after a page refresh, despite all REST + WebSocket requests returning 200.

**Resolution (PR #1):** bootstrap the session from `supabase.auth.getSession()` on mount (the call is asynchronous, so the handler runs when its promise resolves); subscribe to `onAuthStateChange` for subsequent events but early-return on `INITIAL_SESSION` so we don't double-fire DB queries. Pattern lives in `apps/admin/src/lib/auth-context.tsx` and `apps/kitchen/src/app.tsx`.
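
The pattern can be sketched framework-free as below. This is an illustration under stated assumptions, not the code in either app: `AuthLike` is a hand-written structural subset of the supabase-js v2 auth client (`getSession()` returning a promise, `onAuthStateChange()` returning a subscription), and `bootstrapAuth` and `applySession` are hypothetical names standing in for the effect body and the existing handler.

```typescript
// Structural subset of the supabase-js v2 auth surface used here.
type AuthChangeEvent =
  | "INITIAL_SESSION"
  | "SIGNED_IN"
  | "SIGNED_OUT"
  | "TOKEN_REFRESHED"
  | "USER_UPDATED";

interface AuthSession {
  user: { id: string };
}

type AuthListener = (event: AuthChangeEvent, session: AuthSession | null) => void;

interface AuthLike {
  getSession(): Promise<{ data: { session: AuthSession | null } }>;
  onAuthStateChange(listener: AuthListener): {
    data: { subscription: { unsubscribe(): void } };
  };
}

// Hypothetical helper: applySession stands in for the handler that loads
// user data and flips the loading flag to false.
function bootstrapAuth(
  auth: AuthLike,
  applySession: (session: AuthSession | null) => void,
): () => void {
  let active = true;

  // Always runs, so loading clears even if INITIAL_SESSION never fires.
  auth
    .getSession()
    .then(({ data }) => {
      if (active) applySession(data.session);
    })
    .catch(() => {
      // On failure, treat the user as signed out so the spinner still clears.
      if (active) applySession(null);
    });

  const { data } = auth.onAuthStateChange((event, session) => {
    // Already covered by getSession(); skipping avoids duplicate queries.
    if (event === "INITIAL_SESSION") return;
    applySession(session);
  });

  // Cleanup for unmount: ignore a late getSession() result and unsubscribe.
  return () => {
    active = false;
    data.subscription.unsubscribe();
  };
}
```

In a React component the returned function would be the effect's cleanup, which also keeps a late `getSession()` result from updating state after unmount.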

### Stripe Checkout success_url 404

`admin_create_manual_order` builds `success_url = ${DIALTONE_PUBLIC_BASE_URL}/orders/${orderId}/paid`. Default `DIALTONE_PUBLIC_BASE_URL` is `https://dialtone.menu` — the **marketing site**, which has no `/orders/...` route. Symptom: customer pays successfully, then lands on the marketing site's 404 page. The order still flips to `paid` (webhook is independent), but the customer-facing experience is broken.

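**Open issue.** Two options were considered for the fix (the May 2, 2026 developer-journal entry records the decision to go with the second, keeping customer-facing pages in the `dialtone_menu/` repo):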

- Set `DIALTONE_PUBLIC_BASE_URL=https://admin.dialtone.menu` and add a `/orders/:id/paid` route to the admin app showing a "thank you" view. Simplest and uses an existing app.
- Add a `/orders/:id/paid` route to the marketing site (sibling `dialtone_menu/` repo). Better customer UX but needs cross-repo coordination.

Either way, set `DIALTONE_PUBLIC_BASE_URL` in cloud Supabase secrets and ship the matching route. **Do not ship to a real customer with the 404 redirect in place.**
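
To make the dependency on `DIALTONE_PUBLIC_BASE_URL` concrete, here is a minimal sketch of the URL construction. It is not the code in `admin_create_manual_order`: `buildCheckoutUrls` is a hypothetical name, and the `/orders/:id/cancel` path is assumed to mirror the `/paid` path, as the journal entry describes.

```typescript
// Hypothetical helper showing how the redirect URLs depend on the base URL.
function buildCheckoutUrls(
  publicBaseUrl: string,
  orderId: string,
): { successUrl: string; cancelUrl: string } {
  // Drop trailing slashes so the result never contains "//orders".
  const base = publicBaseUrl.replace(/\/+$/, "");
  // Order IDs are UUIDs, so encoding is a no-op for them; it only guards
  // against unexpected input.
  const id = encodeURIComponent(orderId);
  return {
    successUrl: `${base}/orders/${id}/paid`,
    cancelUrl: `${base}/orders/${id}/cancel`,
  };
}
```

Whichever host ends up in `DIALTONE_PUBLIC_BASE_URL` must serve both paths; otherwise the customer lands on that host's 404 page after Stripe redirects.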

### Vapi voice integration deferred (Path B)

`vapi_call_start` returns its own JSON shape `{ status, prompt, tools, first_message, context }` that does **not** match Vapi's expected `assistant-request` response format `{ assistant: {...} }`. Additionally, Vapi sends multiple lifecycle event types (`call-start`, `end-of-call-report`, `function-call`, `status-update`) to a single server URL — our function only handles `call-start`-shaped requests.

The 60 integration tests pass against our own request/response shape, not Vapi's. **The Vapi/Telnyx voice path was never live-tested.**

For the M8 demo, voice was deferred ("Path A": admin manual order flow proves the same end-to-end plumbing minus the voice front-end). Path B is real engineering work — rewrite `vapi_call_start` to return Vapi-compliant assistant-request responses, add `message.type` dispatch for end-of-call, update integration tests against Vapi's actual webhook contract. Estimated 4–6 hours of focused work.
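
A possible shape for the `message.type` dispatch is sketched below. It is a starting point under assumptions, not the planned implementation: it assumes the request that expects an assistant config arrives with `message.type` equal to `assistant-request`, that other event types only need an acknowledgement, and that the prompt goes into a `system` message. `dispatchVapiMessage`, `CallSetup`, and `DispatchResult` are hypothetical names, and every type string and field should be checked against Vapi's webhook documentation before Path B work starts.

```typescript
// Illustrative only: verify type strings and field names against Vapi's docs.
interface VapiServerMessage {
  message: { type: string };
}

// Response shape named in the notes above: { assistant: { model, firstMessage } }.
interface AssistantConfig {
  assistant: {
    model: { messages: { role: string; content: string }[]; tools: unknown[] };
    firstMessage: string;
  };
}

// Hypothetical input: what vapi_call_start already computes today
// (prompt, tools, first message), regrouped for the new response shape.
interface CallSetup {
  prompt: string;
  tools: unknown[];
  firstMessage: string;
}

// body === null means "acknowledge with an empty 200 response".
interface DispatchResult {
  handled: boolean;
  body: AssistantConfig | null;
}

function dispatchVapiMessage(
  payload: VapiServerMessage,
  setup: CallSetup,
): DispatchResult {
  switch (payload.message.type) {
    case "assistant-request":
      return {
        handled: true,
        body: {
          assistant: {
            model: {
              // Assumption: the prompt is delivered as a system message.
              messages: [{ role: "system", content: setup.prompt }],
              tools: setup.tools,
            },
            firstMessage: setup.firstMessage,
          },
        },
      };
    case "end-of-call-report":
    case "status-update":
      // Lifecycle events: acknowledged here; persistence is out of scope.
      return { handled: true, body: null };
    default:
      // Unknown types are flagged so they can be logged instead of
      // silently treated as call starts.
      return { handled: false, body: null };
  }
}
```

Keeping the dispatch as a pure function makes it straightforward to assert against recorded Vapi payloads in the integration tests, which is the gap the 60 existing tests leave open.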