Goal
Single tracking surface for everything we need before, and after, the first paying customer touches the system. Each item links out to its own implementation Issue. This Issue stays open until all Tier 1 and Tier 2 sub-items are complete.
Tier 1 — must-have, cheap, do this week
Documented rollback runbook — exact commands per environment. Update developer/m8-runbook.md. File as Issue.
API contract fixtures + tests — Vapi, Stripe, Telnyx response schemas validated by Vitest. Catches "vendor changed validator" the moment it happens. File as Issue.
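The contract-fixture idea above could be sketched roughly as follows. This is a minimal sketch in plain TypeScript, not the project's test code: the `validateShape` helper, the `FieldSpec` type, and the fixture fields (`id`, `status`, `amount`) are hypothetical and do not reflect the real Vapi, Stripe, or Telnyx schemas. In a Vitest suite, the final check would presumably sit inside a `test(...)` body and assert that the error list is empty.

```typescript
// Hypothetical field spec: field name -> expected `typeof` result.
type FieldSpec = Record<string, "string" | "number" | "boolean" | "object">;

// Returns a list of human-readable mismatches; an empty list means the
// payload has every expected field with the expected primitive type.
function validateShape(payload: unknown, spec: FieldSpec): string[] {
  if (typeof payload !== "object" || payload === null) {
    return ["payload is not an object"];
  }
  const record = payload as Record<string, unknown>;
  const errors: string[] = [];
  for (const [field, expected] of Object.entries(spec)) {
    if (!(field in record)) {
      errors.push(`missing field: ${field}`);
    } else if (typeof record[field] !== expected) {
      errors.push(
        `field ${field}: expected ${expected}, got ${typeof record[field]}`
      );
    }
  }
  return errors;
}

// Illustrative fixture standing in for a recorded vendor response.
const paymentFixture = { id: "pay_123", status: "succeeded", amount: 1250 };
const paymentSpec: FieldSpec = {
  id: "string",
  status: "string",
  amount: "number",
};

// An empty list here means the fixture still matches the expected contract.
const contractErrors = validateShape(paymentFixture, paymentSpec);
```

Returning a list of mismatches rather than a boolean means a failing test names the field that drifted, which is the useful part when a vendor changes a response.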
Tier 2 — high-value, ~half-day each
Synthetic prod smoke test cron — Vapi outbound test call every 15 min against prod, verifies SMS+payment loop. Page on 3 consecutive failures. File as Issue.
Error alerting — Edge Function 5xx → Slack/email webhook. Today errors are invisible until a customer complains. File as Issue.
Real-time SLO dashboard — call success rate, SMS delivery rate, payment completion rate. Could start with a SQL query + cron, evolve into Grafana later. File as Issue.
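The "page on 3 consecutive failures" rule for the smoke test cron could look something like the sketch below. It assumes each cron run stores a pass/fail result somewhere and that the most recent results can be read back oldest to newest; the `shouldPage` name and the storage are hypothetical. With a 15-minute interval, three consecutive failures means a page fires roughly 30 to 45 minutes after an outage begins.

```typescript
// Decides whether to page, given recent smoke-test results ordered from
// oldest to newest (true = run passed, false = run failed).
// Pages only when the last `threshold` runs all failed.
function shouldPage(results: boolean[], threshold: number = 3): boolean {
  if (results.length < threshold) {
    return false; // not enough history to call it an outage
  }
  return results.slice(-threshold).every((passed) => !passed);
}

// Example history: one pass followed by three failures.
const recentRuns = [true, false, false, false];
const pageNow = shouldPage(recentRuns);
```

As written, this keeps returning true on every run while the outage lasts. If repeat pages are unwanted, one option is to page only when the run before the failing streak passed.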
Tier 3 — medium-value, do after first paying customer
Feature flags per-restaurant — `experimental_*` columns on `restaurants`. Roll out new behavior on Sui's first, watch a week, then enable for others. File as Issue.
E2E test suite against staging — Playwright/Vapi outbound API. Codifies the smoke checklist into automation. File as Issue.
Postmortem template — every prod incident gets a 1-pager. Add to developer/. File as Issue.
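For the per-restaurant feature flags, a flag lookup against a `restaurants` row might be sketched as below. Only the `experimental_` column prefix comes from the item above; the `RestaurantRow` shape, the `isExperimentEnabled` helper, and the `new_greeting` flag name are hypothetical.

```typescript
// Hypothetical row shape: a known `id` plus arbitrary columns, some of
// which are `experimental_*` booleans.
interface RestaurantRow {
  id: string;
  [column: string]: unknown;
}

// A flag counts as enabled only when its column is exactly `true`, so a
// missing column, NULL, or any other value defaults to "off".
function isExperimentEnabled(row: RestaurantRow, flag: string): boolean {
  return row[`experimental_${flag}`] === true;
}

// Example rows: one restaurant opted in, one not.
const pilotRestaurant: RestaurantRow = {
  id: "r1",
  experimental_new_greeting: true,
};
const otherRestaurant: RestaurantRow = { id: "r2" };

const pilotEnabled = isExperimentEnabled(pilotRestaurant, "new_greeting");
const otherEnabled = isExperimentEnabled(otherRestaurant, "new_greeting");
```

Defaulting to "off" for anything other than an explicit `true` keeps new behavior confined to the restaurants that were deliberately opted in.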
Tier 4 — aspirational
Hotfix flow documented — `hotfix/*` branches off `prod`, fixed, merged, cherry-picked back to `main`. File as Issue.
Canary / progressive rollout — for high-risk features, route subset of restaurants first. Probably overkill until 10+ tenants.
Multi-region failover — out of scope for v1.
Anti-patterns we already learned the hard way (don't repeat)
Same person reviewing their own changes — Greptile is the substitute reviewer. Don't merge PRs Greptile flagged P1 without addressing or explicitly accepting risk in the reply.
Deploys triggered by green CI alone — CI catches code regressions, not config or contract regressions. Both need a real call against staging to surface.
Mixing infra changes with feature changes in one PR — `serverMessages` snuck into a feature PR's scope and broke calls. Infra deserves its own PRs.
How to use this Issue
Each unchecked sub-item should have its own Issue when ready to work on.
Update this Issue's checkboxes as sub-Issues land.
When all Tier 1 and Tier 2 items are checked and the definition of done below is met, this Issue gets closed.
Definition of done for this umbrella
All Tier 1 and Tier 2 items closed AND verified on staging AND a production smoke checklist has been run end-to-end with no regressions for two consecutive releases.