feat: add topic dedup check to ingest cron #619

Merged
codercatdev merged 1 commit into dev from feat/ingest-dedup on Mar 5, 2026

@codercatdev
Contributor

Summary

Before creating a new contentIdea, the ingest cron now walks the ranked trends list and skips any topic that's already been covered. This prevents duplicate content from being generated day after day for the same hot topic.

Spec: Content Idea Dedup (RD0ib2Hx)
Task: 00nBkb6e
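
The walk described in the summary could look roughly like the sketch below. It is not the code from this PR: the `TrendResult` shape, the `selectFirstUncovered` name, and the injected `isCovered` callback (standing in for `isTopicAlreadyCovered`) are assumptions made for illustration.

```typescript
// Hypothetical shape; the real TrendResult in route.ts may carry more fields.
interface TrendResult {
  topic: string;
  score: number;
}

interface Selection {
  selectedTrend: TrendResult | null; // null when every topic is already covered
  skippedCount: number;
}

// Walks trends highest-score-first and returns the first uncovered one.
// `isCovered` stands in for isTopicAlreadyCovered and is injected so the
// loop can be exercised without a Sanity client.
async function selectFirstUncovered(
  trends: TrendResult[],
  isCovered: (trend: TrendResult) => Promise<boolean>,
): Promise<Selection> {
  const ranked = [...trends].sort((a, b) => b.score - a.score);
  let skippedCount = 0;
  for (const trend of ranked) {
    if (await isCovered(trend)) {
      console.log(`[ingest] skipping already-covered topic: ${trend.topic}`);
      skippedCount++;
      continue;
    }
    return { selectedTrend: trend, skippedCount };
  }
  return { selectedTrend: null, skippedCount };
}
```

A caller would return `{ success: true, skipped: true }` when `selectedTrend` is `null`, and otherwise pass the single selected trend on to prompt building.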

Changes

app/api/cron/ingest/route.ts

  • extractSearchTerms(title) — Extracts 2-3 meaningful keywords from a topic title, stripping ~100 common stop words
  • isTopicAlreadyCovered(topic, topics) — Queries Sanity for existing contentIdea + automatedVideo docs within the configurable dedup window:
    • GROQ match operator for title prefix matching with wildcard patterns
    • Topic tag overlap check (≥2 matching tags = duplicate)
    • Configurable via dedupWindowDays from contentConfig (default 90 days, 0 to disable)
    • Graceful degradation: query failures log a warning and allow the topic through
  • isSlugTaken(slug) — Checks for existing automatedVideo with the same slug to prevent URL collisions
  • Step 1.5 dedup loop — After trend discovery, walks the ranked list highest-score-first:
    • Skips covered topics with logging
    • Returns early with { success: true, skipped: true } if ALL topics are already covered
    • Tracks skippedCount in response JSON
  • buildPrompt([selectedTrend]) — Now passes only the single selected trend (not the full array)
  • createSanityDocuments — Signature changed from trends: TrendResult[] to selectedTrend: TrendResult
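
As a rough illustration of the keyword extraction and the tag-overlap rule listed above, here is a minimal sketch. The tokenization, the abbreviated stop-word list, the `maxTerms` parameter, and the `hasTagOverlap` helper name are assumptions; the PR text only states that 2-3 keywords are extracted and that two or more matching tags count as a duplicate.

```typescript
// Abbreviated stop-word list for illustration; the PR describes roughly 100 entries.
const STOP_WORDS = new Set([
  "a", "an", "and", "are", "for", "how", "in", "is", "of", "on",
  "the", "to", "what", "why", "with", "your",
]);

// Returns up to `maxTerms` lowercase keywords from a title, in original order.
function extractSearchTerms(title: string, maxTerms = 3): string[] {
  return title
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 1 && !STOP_WORDS.has(word))
    .slice(0, maxTerms);
}

// Treats two tag lists as duplicates when at least `threshold` distinct tags
// match, ignoring case.
function hasTagOverlap(a: string[], b: string[], threshold = 2): boolean {
  const seen = new Set(a.map((tag) => tag.toLowerCase()));
  const shared = new Set(
    b.map((tag) => tag.toLowerCase()).filter((tag) => seen.has(tag)),
  );
  return shared.size >= threshold;
}

console.log(extractSearchTerms("How to Build an AI Agent with TypeScript"));
// → [ 'build', 'ai', 'agent' ]
```

Counting distinct shared tags (via a `Set`) keeps a repeated tag in one list from satisfying the threshold on its own.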

sanity/schemas/singletons/contentConfig.ts

  • Added dedupWindowDays field (number, default 90, validation 0-365)
  • Description explains behavior and how to disable (set to 0)
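
For orientation, the new field might be declared along these lines. This is a plain-object sketch rather than the actual schema code: the `title` and `description` wording are assumed, and `NumberRule` is a local stand-in that models only the `min` and `max` methods of the rule object Sanity passes to `validation`.

```typescript
// Minimal local stand-in for the rule builder passed to `validation`;
// only the two methods this sketch uses are modeled.
interface NumberRule {
  min(value: number): NumberRule;
  max(value: number): NumberRule;
}

// Plain-object sketch of the field; in the schema file it would sit in the
// contentConfig `fields` array (typically wrapped in Sanity's defineField).
const dedupWindowDaysField = {
  name: "dedupWindowDays",
  title: "Dedup window (days)", // assumed title
  type: "number",
  description:
    "How far back to look for existing coverage of a topic. Set to 0 to disable dedup.", // assumed wording
  initialValue: 90,
  validation: (rule: NumberRule) => rule.min(0).max(365),
};
```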

Files changed (2 only)

  • app/api/cron/ingest/route.ts
  • sanity/schemas/singletons/contentConfig.ts

- Add isTopicAlreadyCovered() with GROQ title match + topic overlap
- Add isSlugTaken() for slug collision detection
- Walk ranked trends list, skip already-covered topics
- Add dedupWindowDays field to contentConfig (default 90, 0 to disable)
- Pass single selectedTrend to buildPrompt and createSanityDocuments
- Graceful degradation: dedup failures don't block pipeline
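
The graceful-degradation and dedup-window behavior could be sketched as follows. Everything here is illustrative: the `Fetcher` type, the `isCoveredSafely` and `windowCutoff` names, the early return for an empty term list, and the GROQ query text are assumptions, since the actual query in route.ts is not shown in this PR description.

```typescript
// Hypothetical fetcher, shaped like a Sanity client's fetch(query, params).
type Fetcher = (query: string, params: Record<string, unknown>) => Promise<boolean>;

// Illustrative query only; the title field name is an assumption.
const COVERED_QUERY = `count(*[
  _type in ["contentIdea", "automatedVideo"]
  && _createdAt > $cutoff
  && title match $patterns
]) > 0`;

// ISO timestamp marking the start of the dedup window.
function windowCutoff(dedupWindowDays: number, now: Date = new Date()): string {
  const msPerDay = 24 * 60 * 60 * 1000;
  return new Date(now.getTime() - dedupWindowDays * msPerDay).toISOString();
}

async function isCoveredSafely(
  terms: string[],
  dedupWindowDays: number,
  fetcher: Fetcher,
  now: Date = new Date(),
): Promise<boolean> {
  // A window of 0 disables dedup; with no terms there is nothing to match on
  // (the second condition is an assumption of this sketch).
  if (dedupWindowDays === 0 || terms.length === 0) return false;
  try {
    return await fetcher(COVERED_QUERY, {
      cutoff: windowCutoff(dedupWindowDays, now),
      patterns: terms.map((term) => `${term}*`), // wildcard prefix patterns
    });
  } catch (error) {
    // Graceful degradation: warn and let the topic through instead of failing the cron.
    console.warn("[ingest] dedup query failed, allowing topic through", error);
    return false;
  }
}
```

Failing open means a Sanity outage can occasionally let a duplicate through, which the PR accepts in exchange for never blocking the pipeline.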

vercel bot commented Mar 5, 2026

1 Skipped Deployment: project codingcat-dev was ignored (updated Mar 5, 2026 2:17pm UTC).

@codercatdev codercatdev merged commit 35fb02c into dev Mar 5, 2026
2 of 3 checks passed
codercatdev added a commit that referenced this pull request Mar 5, 2026
Release: ingest dedup + env var cleanup
@codercatdev codercatdev deleted the feat/ingest-dedup branch March 5, 2026 14:35