feat: one-time historical backfill for campaigns that pre-date system launch #18

@Jing-yilin

Description

Problem

The nightly cron only captures campaigns sorted by newest, so any campaign that was already live before the system was first deployed is never ingested (it will have scrolled far past page 10 in the newest listing). Users who set keyword alerts on day 1 will get no matches for campaigns that have been running for weeks.

Expected Behaviour

A one-time (or periodic catch-up) backfill run that fetches campaigns across all sort orders and deeper page depths to seed the database with historically active campaigns.

Proposed Fix

  • Add a /admin/backfill endpoint (or a CLI flag) that triggers a deep crawl across every category and sort order, down to deeper page depths
  • Run it once after deploy; subsequent nightly crons maintain freshness
  • Rate-limit requests to avoid hammering ScrapingBee (the existing RateLimiter can be reused)
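A minimal sketch of what the backfill loop could look like. The category list, sort-order names, `fetch_page` helper, and `SimpleRateLimiter` are all illustrative stand-ins; the real implementation would reuse the project's existing RateLimiter and ScrapingBee client:

```python
import time
from itertools import product

# Dimensions taken from the cost estimate in this issue; actual
# category and sort names are assumptions for illustration.
CATEGORIES = [f"category-{i}" for i in range(15)]
SORT_ORDERS = ["newest", "most-funded", "ending-soon"]
MAX_PAGES = 25

class SimpleRateLimiter:
    """Minimal limiter: at most one request per `interval` seconds."""
    def __init__(self, interval: float):
        self.interval = interval
        self._last = 0.0

    def wait(self):
        sleep_for = self._last + self.interval - time.monotonic()
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()

def backfill(fetch_page, limiter=None):
    """Crawl every (category, sort, page) combination once.

    `fetch_page(category, sort, page)` is assumed to return a list of
    campaign dicts with an "id" key. Campaigns are deduplicated here,
    since the same campaign will appear under several sort orders.
    """
    seen = set()
    campaigns = []
    for category, sort, page in product(CATEGORIES, SORT_ORDERS,
                                        range(1, MAX_PAGES + 1)):
        if limiter is not None:
            limiter.wait()
        for campaign in fetch_page(category, sort, page):
            if campaign["id"] not in seen:
                seen.add(campaign["id"])
                campaigns.append(campaign)
    return campaigns
```

Deduplicating inside the crawl keeps the subsequent upsert into the database idempotent, so re-running the backfill (or overlapping with a nightly cron) is harmless.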

Notes

  • One-time cost estimate: 15 categories × 3 sorts × 25 pages = 1,125 ScrapingBee requests × 5 credits = 5,625 credits (< 3% of monthly allowance)
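The estimate above can be reproduced directly (the 5-credits-per-request figure is taken from the issue, not verified against current ScrapingBee pricing):

```python
# Reproduce the one-time backfill cost estimate from this issue.
categories, sorts, pages = 15, 3, 25
credits_per_request = 5  # assumed JS-rendered request cost per the issue

requests = categories * sorts * pages
credits = requests * credits_per_request
print(requests, credits)  # 1125 5625
```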
