feat: email outbox#60858
Conversation
|
Size Change: 0 B Total Size: 80.9 MB ℹ️ View Unchanged
|
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
products/conversations/frontend/components/Chat/Message.tsx:77
The tooltip text says "It will be retried automatically" but this badge is only shown when `emailDeliveryStatus === 'failed'`, which is set exclusively by `_mark_outbox_failed` — a code path that writes `status = FAILED_PERMANENT`. Rows in that terminal state are filtered out of every sweep query (`status=PENDING`), so no automatic retry ever happens. An agent reading this tooltip would wait indefinitely for a retry that never comes instead of fixing the email config.
```suggestion
<Tooltip title="We couldn't deliver this email reply. Please check the email channel settings and contact support if the issue persists.">
```
### Issue 2 of 2
products/conversations/backend/tasks.py:436-441
This is a blind read-modify-write on a JSONB field. The `item_context` argument is the in-memory value from `select_related` — snapshotted at query time before the Mailgun call. If `item_context` is concurrently changed in the DB (e.g., an agent toggles `is_private` on the comment while the sweeper is sending), this update silently overwrites the intermediate state because it replaces the whole dict instead of merging. Losing an `is_private=True` flip would expose a private note to the customer. A JSONB merge (`||` in SQL) sets only the target key without touching other keys.
```suggestion
def _set_comment_delivery_status(team_id: int, comment_id: str, status_value: str) -> None:
"""Denormalize delivery status onto the comment's item_context so the agent UI can
show a sending/failed badge. Uses a queryset update to avoid re-firing Comment signals.
"""
import json
from django.db.models.expressions import RawSQL
CommentModel.objects.filter(id=comment_id, team_id=team_id).update(
item_context=RawSQL(
"COALESCE(item_context, '{}') || %s::jsonb",
[json.dumps({"email_delivery_status": status_value})],
)
)
```
Reviews (1): Last reviewed commit: "feat: email outbox" | Re-trigger Greptile |
Migration SQL ChangesHey 👋, we've detected some migrations on this PR. Here's the SQL output for each migration, make sure they make sense:
|
🔍 Migration Risk AnalysisWe've analyzed your migrations for potential risks. Summary: 1 Safe | 0 Needs Review | 0 Blocked ✅ SafeBrief or no lock, backwards compatible Last updated: 2026-06-01 11:48 UTC (8b9e42d) |
|
🎭 Playwright report · View test results →
These issues are not necessarily caused by your changes. |
There was a problem hiding this comment.
The PR introduces a new durable outbox data model with a migration, a periodic sweeper task, and refactored email delivery logic — solid work, but it's over the auto-review size ceiling and needs a human to sanity-check the full flow. The two bot P1 concerns (misleading tooltip and JSONB blind write) both appear addressed in the current diff. Please request a review from a team member on the conversations product team before re-requesting auto-approval.
Add delivery status bullet point and new subsection to the email channel docs, reflecting the Postgres-backed email outbox introduced in PostHog/posthog#60858. Covers Sending…, Sent, and Failed to send states, retry behavior (up to 5 days), and common failure causes.
Problem
Outbound support email replies were only as durable as the Celery broker: a reply
was sent via a task with a handful of short-spaced retries. If Mailgun was
unavailable for an extended period — e.g. an account block spanning a weekend —
those retries exhausted within minutes and the agent's reply was silently dropped.
The agent had no signal that delivery failed.
Changes
Introduce a Postgres-backed outbox so outbound delivery survives a multi-day
provider outage:
EmailOutboxMessagemodel — one row per reply comment, created in the sametransaction as the comment, so the reply is recoverable even if the broker never
picks up the immediate send task.
send_email_replyis now an idempotent worker keyed on the outbox row (claims ashort lease, sends, records the outcome). It reuses a stable
Message-IDacrossattempts so threading/dedup stay consistent.
flush_pending_email_repliesperiodic task (every minute) re-drives pendingrows with exponential backoff capped at ~1h, giving up after 5 days
(
failed_permanent). Bounded to one batch per run to keep each invocation short.item_contextand surfaced inthe ticket UI as a "Sending…" / "Failed to send" badge.
Transient errors no longer raise — they leave the row pending with a backed-off
next_attempt_at. Permanent errors (bad recipient, domain not registered, noconfig) mark the row terminally failed. Tradeoff: at-least-once delivery (a crash
between Mailgun accepting and us marking sent can resend), which the stable
Message-ID lets clients collapse.
How did you test this code?
Added/updated automated tests in
products/conversations/backend/api/tests/test_email_endpoints.py(parametrized forthe terminal-error and sweeper variations) covering: happy-path send, transient →
retry scheduling, terminal failures, Message-ID stability across attempts,
idempotency when already sent, outbox row created in the signal, and the sweeper's
send / skip-locked / give-up-after-max-age paths. Ran the
TestSendEmailReplyMultiConfigsuite (12 passing) plus ruff.