fix(staged_tasks): dedup promote_staged_task against active action_items#7092
builtbycnob wants to merge 3 commits into BasedHardware:main
Conversation
Adds a normalized lookup that returns the first active (non-completed, non-deleted) action_item whose description matches a given string. Normalization is case-insensitive and strips legacy `[screen]` markers, mirroring the dedup rule used by `database.staged_tasks.create_staged_task` so the staged → action_item promotion path can avoid creating semantic duplicates. Streams active items (typically a small bounded set per user) rather than relying on a Firestore equality filter — Firestore cannot do case-insensitive matching without a normalized companion field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
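A minimal sketch of the lookup described above, with the Firestore stream replaced by a plain iterable of dicts (the field names `description`, `completed`, and `deleted` follow the PR text; the function and variable names here are illustrative, not the real module API):

```python
from typing import Iterable, Optional


def _normalize(desc: Optional[str]) -> str:
    # Case-insensitive match; strip the legacy [screen] prefix/suffix marker.
    if not desc:
        return ''
    s = desc.strip()
    if s.startswith('[screen] '):
        s = s[len('[screen] '):]
    if s.endswith(' [screen]'):
        s = s[:-len(' [screen]')]
    return s.strip().lower()


def find_active_by_description(items: Iterable[dict], description: str) -> Optional[dict]:
    """Return the first non-completed, non-deleted item whose normalized
    description matches. The comparison happens in Python because Firestore
    has no case-insensitive equality filter."""
    target = _normalize(description)
    if not target:
        return None
    for item in items:
        if item.get('completed') or item.get('deleted'):
            continue
        if _normalize(item.get('description')) == target:
            return item
    return None
```

Streaming the small active set and comparing in Python trades a few extra reads for not having to backfill a normalized companion field on every document.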
Greptile Summary

This PR closes the active duplicate action_item creation path by adding a case-insensitive description lookup (`get_active_action_item_by_description`).

Confidence Score: 4/5. Safe to merge — the dedup guard is read-only before any write, the fallback is the existing creation path, and the change is well-tested. No P0/P1 issues found. Two P2 findings:

- staged-task metadata (due_at, priority) is silently dropped on a dedup match;
- the normalization helper is slightly more aggressive than create_staged_task's own dedup.

Neither affects correctness of the primary fix.

Important Files Changed
- backend/database/staged_tasks.py — dedup skip path drops staged-task metadata
- backend/database/action_items.py — normalization divergence from create_staged_task
Sequence Diagram

sequenceDiagram
participant Client as Desktop Client (TaskPromotionService)
participant Router as POST /v1/staged-tasks/promote
participant StagedDB as database.staged_tasks
participant ActionDB as database.action_items
participant Firestore
Client->>Router: POST /promote
Router->>StagedDB: promote_staged_task(uid)
StagedDB->>Firestore: query top staged_task (completed=false, asc relevance)
Firestore-->>StagedDB: staged doc
StagedDB->>ActionDB: get_active_action_item_by_description(uid, desc)
ActionDB->>Firestore: query action_items (completed=false)
Firestore-->>ActionDB: stream docs
alt Duplicate found (normalized desc matches)
ActionDB-->>StagedDB: existing action_item
StagedDB->>Firestore: update staged_task {completed, promotion_skipped, promoted_to}
StagedDB-->>Router: existing action_item
Router-->>Client: {promoted: true, promoted_task: existing}
else No duplicate
ActionDB-->>StagedDB: None
StagedDB->>ActionDB: create_action_item(uid, data)
ActionDB->>Firestore: add() new doc id
StagedDB->>Firestore: update staged_task {completed, promoted_at}
StagedDB-->>Router: new action_item
Router-->>Client: {promoted: true, promoted_task: new}
end
```python
existing = action_items_db.get_active_action_item_by_description(uid, staged['description'])
if existing is not None:
    col.document(staged['id']).update(
        {
            'completed': True,
            'promoted_at': datetime.now(timezone.utc),
            'promotion_skipped': 'duplicate',
            'promoted_to': existing['id'],
        }
    )
    logger.info(
        "Skipped promotion of staged task %s for user %s — duplicate of action_item %s",
        staged['id'],
        uid,
        existing['id'],
    )
    return existing
```
Staged task metadata silently dropped on dedup
When a duplicate is detected, the staged task's richer fields (due_at, priority, category, relevance_score) are discarded — the existing action_item is returned as-is without merging any updated context. This is observable when a subsequent conversation extracts the same task but with a due date or priority the original action_item lacks; the user never sees that scheduling information applied.
Consider a lightweight merge step before returning existing:
```python
update_fields = {}
for field in ('due_at', 'priority', 'category'):
    if staged.get(field) is not None and not existing.get(field):
        update_fields[field] = staged[field]
if update_fields:
    update_fields['updated_at'] = datetime.now(timezone.utc)
    action_items_db.update_action_item(uid, existing['id'], update_fields)
```

Or at minimum document that these fields are intentionally not merged.
```python
def _normalize_description(desc: Optional[str]) -> str:
    """Normalize a task description for case-insensitive duplicate matching.

    Strips whitespace + the legacy ``[screen]`` prefix/suffix marker that the
    AI promotion pipeline used to add to AI-generated tasks (still appears in
    historical data and on tasks that round-tripped through ``migrate_ai_tasks``).
    """
    if not desc:
        return ''
    s = desc.strip()
    if s.startswith('[screen] '):
        s = s[len('[screen] ') :]
    if s.endswith(' [screen]'):
        s = s[: -len(' [screen]')]
    return s.strip().lower()
```
Normalization diverges from `create_staged_task`
`create_staged_task` deduplicates with `description.strip().lower()` — no `[screen]` stripping. `_normalize_description` goes further by also stripping the `[screen]` prefix/suffix. This means two staged tasks with descriptions "Email John" and "[screen] Email John" can coexist (different strings in `create_staged_task`'s check), but at promotion time both normalize to "email john" and match the same action_item. The second staged task is silently closed as a duplicate rather than promoting its own doc.

In practice the result is correct (both descriptions refer to the same task), but the PR description's claim that this "mirrors create_staged_task's dedup logic" is slightly inaccurate, and the order of promotion determines which description ends up on the live action_item — a `[screen]`-tagged description could be promoted first, leaving a marker-polluted description in the user's action list.
Greptile review on BasedHardware#7092:

P2 — Staged metadata silently dropped on dedup hit. When a later conversation extracts the same task with richer context (a due_at the user mentioned later, a priority assignment), the existing action_item never sees that information because we just return it as-is. Now we gather the staged-task fields the existing item is missing (due_at, priority, category) and call update_action_item before returning. The merge is best-effort — a write failure logs a warning but does not block the dedup itself.

P2 — Normalization divergence with create_staged_task. The promote-time helper (_normalize_description) strips [screen] markers, but create_staged_task was using a plain strip().lower(). That meant "[screen] Email John" and "Email John" could coexist as two staged candidates and only one would actually promote — leaving a marker-polluted description on whichever one won the race. Use the same helper in both places so the dedup boundary is uniform. Cross-module use of action_items_db._normalize_description is fine within the database/ package; the underscore is a soft convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new cases:

- test_promote_merges_missing_fields_on_dedup: a staged task carrying due_at/priority/category populates the existing action_item only for fields that are MISSING (must not overwrite a priority the user already set).
- test_promote_dedup_no_merge_when_existing_already_has_fields: no update_action_item call when the existing item is already populated.
- test_create_staged_task_uses_normalized_dedup: regression for the divergence note — "[screen] Email John" matches an existing "Email John" staged task instead of producing a duplicate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
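The merge rule the first test exercises can be sketched in isolation (field names are from the review thread; `merge_missing_fields` is a hypothetical standalone version of the inline loop, and the dicts stand in for Firestore documents):

```python
def merge_missing_fields(existing: dict, staged: dict) -> dict:
    """Collect staged-task fields the existing action_item lacks.

    Only fills gaps: a priority or due date the user already set on the
    existing item is never overwritten.
    """
    update_fields = {}
    for field in ('due_at', 'priority', 'category'):
        if staged.get(field) is not None and not existing.get(field):
            update_fields[field] = staged[field]
    return update_fields


staged = {'due_at': '2025-07-01T09:00:00Z', 'priority': 'high', 'category': None}
existing = {'due_at': None, 'priority': 'low'}  # user already set a priority

# Only due_at is missing on the existing item, so only due_at is merged:
assert merge_missing_fields(existing, staged) == {'due_at': '2025-07-01T09:00:00Z'}
```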
Symptom
Users accumulate 5–6 duplicate action_items per task description within a few hours of active use. Confirmed on a production account: 8 distinct task descriptions × 5–6 backend duplicates each = ~50 phantom rows, all with distinct backendIds (the local `UNIQUE` constraint is honoured — these are genuinely separate Firestore documents). The desktop client's `TaskPromotionService` runs a 5-minute safety-net timer (and fires on task complete/delete + app startup), each tick calling `POST /v1/staged-tasks/promote`.

Root cause

`database.staged_tasks.promote_staged_task` calls `database.action_items.create_action_item` unconditionally: `create_action_item` does `action_items_ref.add(action_item_data)` → Firestore allocates a fresh document id on every call. There is no idempotency, no content-hash key, no description-based lookup. `create_staged_task` already dedups its own collection (case-insensitive description match), so a single description never spawns more than one staged task at a time. But the promotion handoff loses that property: every conversation that re-mentions the same task → new staged task → eventual promote → fresh action_item, even if an active action_item with the same description already exists.

This is the active duplicating path. The reproducible loop:
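In miniature, with a dict-backed stand-in for Firestore (all names here are illustrative, not the real modules), the unconditional `add()` path versus the guarded one:

```python
import itertools


class FakeCollection:
    """Dict-backed stand-in for a Firestore collection: add() always mints a fresh id."""

    def __init__(self):
        self.docs = {}
        self._ids = itertools.count()

    def add(self, data):
        doc_id = f'doc{next(self._ids)}'
        self.docs[doc_id] = {**data, 'id': doc_id}
        return doc_id


def promote(col, description, dedup=False):
    """Pre-fix behaviour with dedup=False; the PR's guard with dedup=True."""
    if dedup:
        for doc in col.docs.values():
            if not doc['completed'] and doc['description'].strip().lower() == description.strip().lower():
                return doc['id']  # reuse the existing active item
    return col.add({'description': description, 'completed': False})


col = FakeCollection()
for _ in range(5):
    promote(col, 'Email John')          # five safety-net ticks, pre-fix
assert len(col.docs) == 5               # five phantom duplicates

guarded = FakeCollection()
for _ in range(5):
    promote(guarded, 'Email John', dedup=True)
assert len(guarded.docs) == 1           # the guard keeps a single active item
```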
Fix
Two pieces in `database/`:

1. `database/action_items.py` — new helper. Streams active items (Firestore `completed == false`) and applies a normalized comparison in Python. Bounded set per user; the streaming approach matches what `create_staged_task` already does for its own collection.
2. `database/staged_tasks.py` — `promote_staged_task` guard. The staged task is closed (so it doesn't re-attempt promotion on the next tick) and tagged with breadcrumbs (`promotion_skipped`, `promoted_to`) for forensics.

Tests
12 new unit tests in `backend/tests/unit/test_staged_tasks_dedup.py`, wired into `backend/test.sh`. Covers:

- `_normalize_description`: whitespace, case, `[screen]` prefix/suffix, `None`/empty
- `get_active_action_item_by_description`: empty set, case-insensitive match, deleted skip, unrelated skip, `[screen]`-marker normalization on both sides of comparison
- `promote_staged_task`: skip on duplicate (returns existing, tags staged with `promotion_skipped` + `promoted_to`), happy-path creation, empty staged returns `None`

Heavy deps (`firebase_admin`, `google.cloud.firestore*`) are stubbed at import-time, matching the pattern in `tests/unit/test_action_item_dedup.py`.

What this fixes (and what it does not)
- The active duplicating path (the desktop `TaskPromotionService` 5-min timer) will stop accumulating new phantom action_items.
- Existing dedup behaviours (`create_staged_task`) are not regressed — the helper is consistent with their normalization.
- Does not clean up already-existing duplicates (that needs a one-off script setting `deleted=true` for affected users).
- Does not add idempotency to `create_action_item` itself. Other callers (`routers/action_items.py:create_action_item`, `routers/developer.py:create_action_item`, `tools/action_item_tools.py:create_action_item_tool`) can still produce content-equivalent duplicates if the client retries a request. A follow-up adding optional content-hash idempotency is described in fix(desktop/sync): adopt orphan action_items by description only #7091's body.

Companion PR (desktop client): #7091 — fixes the orphan-adoption mismatch that produces the `1 manual + N` shape on the local SQLite.

Risk
Low. The dedup is read-only against the same collection that `promote_staged_task` already reads (it streams active items via `get_action_items` upstream too). The fallback is the existing path. `promotion_skipped`/`promoted_to` are net-new fields, opt-in to read.

If two staged tasks with the same description race to promote concurrently, both will see "no existing match" before either commits — at most one extra duplicate slips through compared to the pre-fix state, and it'll be caught on the next tick (one of them now has an active action_item).
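That race window can be forced by hand in a pure-Python simulation (no real concurrency needed; `FakeCollection` is an illustrative stand-in, not the real Firestore client):

```python
class FakeCollection:
    """Minimal in-memory stand-in; add() mints a fresh id per call."""

    def __init__(self):
        self.docs = {}
        self._n = 0

    def add(self, data):
        self._n += 1
        doc_id = f'doc{self._n}'
        self.docs[doc_id] = {**data, 'id': doc_id}
        return doc_id

    def find_active(self, desc):
        key = desc.strip().lower()
        for d in self.docs.values():
            if not d['completed'] and d['description'].strip().lower() == key:
                return d
        return None


col = FakeCollection()
desc = 'Email John'

# Force the interleaving: both promoters finish their read phase
# before either commits a write.
seen_a = col.find_active(desc)
seen_b = col.find_active(desc)
if seen_a is None:
    col.add({'description': desc, 'completed': False})
if seen_b is None:
    col.add({'description': desc, 'completed': False})

assert len(col.docs) == 2  # one extra duplicate, matching the pre-fix worst case

# A later tick re-runs the guard and now sees an active match, so no third copy:
assert col.find_active(desc) is not None
```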
Test plan
- `pytest tests/unit/test_staged_tasks_dedup.py -v` (12 passed)
- `black --line-length 120 --skip-string-normalization` clean
- `/v1/staged-tasks/promote` with an existing active duplicate: confirm response returns existing item, no new Firestore document, staged doc has `promotion_skipped='duplicate'` and `promoted_to=<id>`.
- Happy path: `promotion_skipped` is not set.

🤖 Generated with Claude Code