Skip to content

fix(lark): durable DB dedup for due-date reminder (stop repeat DMs)#3

Merged
JOBYINC merged 1 commit into
previewfrom
fix/lark-due-reminder-durable-dedup
May 16, 2026
Merged

fix(lark): durable DB dedup for due-date reminder (stop repeat DMs)#3
JOBYINC merged 1 commit into
previewfrom
fix/lark-due-reminder-durable-dedup

Conversation

@JOBYINC
Copy link
Copy Markdown
Owner

@JOBYINC JOBYINC commented May 16, 2026

Problem

lark_due_reminder_task (hourly Celery beat) was DM'ing the same
assignee about the same issue dozens of times. The reported cause —
"missing 已提醒 mark" — is wrong: a mark existed, but it was a 25h
Redis key (cache.get/set). When the deployment's cache is
unavailable / non-shared across workers / evicting, the key never
sticks, so every hourly run re-detects the issue and re-sends.

Fix

Replace the cache-only dedup with a durable LarkDueReminderLog
row, claimed atomically via get_or_create on a unique
(issue, receiver, stage, reminder_date) constraint:

  • Idempotent across the hourly cadence and across concurrent beat
    workers, regardless of cache health — fixed by construction, no DO
    access needed.
  • Mirrors the existing EmailNotificationLog sibling +
    WorkspaceMember's soft-delete-aware unique idiom
    (unique_together incl. deleted_at + partial
    UniqueConstraint(condition=deleted_at__isnull=True)).
  • A claim is released via hard delete on any failure, so a later
    run still retries — the once/stage/day slot is consumed only on a
    confirmed send (same intent as the old Redis check, now durable).
    Hard delete (not soft) because a failed claim has no audit value and
    soft-delete would leave a ghost row + queue a pointless cascade task
    per failure.

Changes

  • LarkDueReminderLog model + migration 0124
  • lark_due_reminder_task: Redis dedup → get_or_create claim; docstring updated
  • 3 pytest cases

Test plan

  • pytest plane/tests/unit/bg_tasks/test_lark_due_reminder_task.py3/3 pass locally:
    • sends once across 3 task runs (not once-per-run) ✅
    • failed send releases the claim, later run retries ✅
    • no-op when LARK_NOTIFICATIONS_ENABLED unset ✅
    • ⚠️ repo CI runs no Python suite — this was verified locally only; reviewer should re-run.
  • manage.py check — 0 issues; makemigrations --check — model fully captured by 0124
  • Deploy note: takes effect once deployed and migration 0124 applied on the environment running the beat.

Reviewer notes

  • Pre-existing automation RenameIndex migration drift (would be 0125) is deliberately excluded — unrelated to this fix, predates it, belongs to the automation-merge owner (same pre-existing-debt family as the documented check:types/check:format items on preview).
  • Pure Python change; no JS/TS touched.

lark_due_reminder_task deduped only via a 25h Redis key. When the
deployment cache is unavailable / non-shared / evicting the key is lost
and the hourly beat re-DMs the same assignee for the same issue every
hour (observed: dozens of repeats). The original "missing mark"
diagnosis was wrong -- a mark existed but was not durable.

Replace the Redis dedup with a durable LarkDueReminderLog row claimed
atomically via get_or_create on a unique (issue, receiver, stage,
reminder_date) constraint -- idempotent across the hourly cadence and
across concurrent beat workers, surviving cache loss/restart by
construction. Mirrors EmailNotificationLog + WorkspaceMember's
soft-delete-aware unique idiom. A claim is released via hard delete on
any failure so a later run still retries (slot consumed only on a
confirmed send -- same intent as before, now durable).

- model LarkDueReminderLog + migration 0124
- task: Redis cache dedup -> get_or_create claim; docstring updated
- 3 pytest cases (dedup across runs, retry-on-failure, disabled),
  verified locally 3/3 (repo CI runs no python suite)

Pre-existing automation RenameIndex migration drift (would be 0125) is
deliberately NOT included -- unrelated, belongs to the automation-merge
owner.
@JOBYINC
Copy link
Copy Markdown
Owner Author

JOBYINC commented May 16, 2026

Admin-merging: CI fully green (this PR touches no JS/TS, so the pre-existing frontend check:types/check:format debt on preview is not in its --affected set; Python side clean).

Reviewer caveat carried from the description: the repo CI runs no Python suite, so the dedup logic was verified locally — pytest 3/3 (dedup-across-runs, retry-on-failure, disabled-noop), not by CI. The spam only actually stops once this is deployed and migration 0124 is applied on the host running the Celery beat.

@JOBYINC JOBYINC merged commit 16f90a1 into preview May 16, 2026
9 checks passed
JOBYINC pushed a commit that referenced this pull request May 17, 2026
Re-based onto the lark-stable line (this branch builds the production
ghcr.io/jobyinc/plane-*:lark-stable images). Same fix as preview PR #3
(commit 1cb3908): replace the non-durable 25h Redis dedup in
lark_due_reminder_task with a durable LarkDueReminderLog row claimed
atomically via get_or_create on a unique
(issue,receiver,stage,reminder_date) constraint; claim released via
hard delete on failure so retry still works. Mirrors
EmailNotificationLog + WorkspaceMember soft-delete-unique idiom.

Migration = 0123_larkduereminderlog depending on
0122_automationrule_automationrulerun (this branch's actual head; it
has no 0123_workitemfield — that is custom-fields/preview only).

Verified locally pytest 3/3 on the source branch. Repo CI runs no
python test suite. Effect requires this image rebuilt + droplet pull +
the 0123 migration applied.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant