feat: add niche + keyword_niche tables for feed diversification #3911
Introduces the data structure for niche-based feed diversification:
- `niche`: catalog of niches (ecosystem vs theme groups) used by the diversifier as the mental categories users group posts by.
- `keyword_niche`: maps each keyword (tag) to a primary niche and an optional secondary niche, with a per-tag `weightMultiplier` (for dampening generic tags), a `confidence` score, and a `labelerVersion` for tracking re-labeling passes.
The diversifier derives a post's niches from its tags via weighted vote; the taxonomy and derivation rules live in the brain doc `docs/feed-niche-taxonomy.md`. The migration creates the schema only; `niche` rows and `keyword_niche` mappings will be populated separately.
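A minimal sketch of the weighted vote, assuming the weighting stated in the summary (each tag contributes IDF × `weightMultiplier`, plus an ecosystem boost). The `derivePostNiches` name, the `SECONDARY_SHARE` and `ECOSYSTEM_BOOST` values, and the rule of taking the two highest scores as primary and secondary are hypothetical; the real derivation rules live in `docs/feed-niche-taxonomy.md`.

```typescript
// Hypothetical shapes: fields mirror keyword_niche columns; everything else is illustrative.
type BucketGroup = "ecosystem" | "theme";

interface KeywordNiche {
  keyword: string;
  primaryNicheId: string;
  secondaryNicheId: string | null;
  weightMultiplier: number;
}

interface PostNiches {
  primary: string | null;
  secondary: string | null;
}

// Assumed tuning knobs, not values from the taxonomy doc.
const SECONDARY_SHARE = 0.5; // fraction of a tag's weight given to its secondary niche
const ECOSYSTEM_BOOST = 1.5; // multiplier applied to votes for ecosystem niches

function derivePostNiches(
  tags: string[],
  mappings: Map<string, KeywordNiche>,
  idf: Map<string, number>,
  bucketOf: Map<string, BucketGroup>,
): PostNiches {
  const scores = new Map<string, number>();
  const vote = (nicheId: string, weight: number): void => {
    const boost = bucketOf.get(nicheId) === "ecosystem" ? ECOSYSTEM_BOOST : 1;
    scores.set(nicheId, (scores.get(nicheId) ?? 0) + weight * boost);
  };

  for (const tag of tags) {
    const mapping = mappings.get(tag);
    if (!mapping) continue; // unlabeled tags do not vote
    // Each tag votes with IDF × weightMultiplier (IDF defaults to 1 if unknown).
    const weight = (idf.get(tag) ?? 1) * mapping.weightMultiplier;
    vote(mapping.primaryNicheId, weight);
    if (mapping.secondaryNicheId) {
      vote(mapping.secondaryNicheId, weight * SECONDARY_SHARE);
    }
  }

  // Highest-scoring niche becomes primary, the runner-up becomes secondary.
  const ranked = [...scores.entries()].sort((a, b) => b[1] - a[1]);
  return {
    primary: ranked[0]?.[0] ?? null,
    secondary: ranked[1]?.[0] ?? null,
  };
}
```

Generic tags are dampened twice in this scheme: they get a low IDF because they appear on many posts, and the labeler can additionally give them a low `weightMultiplier`.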
🍹 The Update (preview) for dailydotdev/api/prod (at 259e7b2) was successful.
✨ Neo Explanation
Routine deployment rolling all workloads to commit `883ffa71`, with an additive database migration that creates the `niche` and `keyword_niche` tables for feed diversification.
✅ Low Risk
This PR introduces the … The deployment follows the standard pattern: new DB and ClickHouse migration Jobs are created for the new commit hash (…)
🔵 Info: The …
Resource Changes
Name Type Operation
~ vpc-native-rotate-weekly-quests-cron kubernetes:batch/v1:CronJob update
- vpc-native-api-db-migration-e0909e03 kubernetes:batch/v1:Job delete
~ vpc-native-personalized-digest-cron kubernetes:batch/v1:CronJob update
~ vpc-native-channel-digests-cron kubernetes:batch/v1:CronJob update
~ vpc-native-sync-subscription-with-cio-cron kubernetes:batch/v1:CronJob update
~ vpc-native-generate-search-invites-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-stale-user-transactions-cron kubernetes:batch/v1:CronJob update
~ vpc-native-check-analytics-report-cron kubernetes:batch/v1:CronJob update
~ vpc-native-temporal-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-validate-active-users-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-source-public-threshold-cron kubernetes:batch/v1:CronJob update
~ vpc-native-private-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-user-profile-updated-sync-cron kubernetes:batch/v1:CronJob update
~ vpc-native-calculate-top-readers-cron kubernetes:batch/v1:CronJob update
~ vpc-native-generic-referral-reminder-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-old-notifications-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-expired-better-auth-sessions-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-tags-str-cron kubernetes:batch/v1:CronJob update
- vpc-native-api-clickhouse-migration-e0909e03 kubernetes:batch/v1:Job delete
~ vpc-native-expire-super-agent-trial-cron kubernetes:batch/v1:CronJob update
~ vpc-native-user-profile-analytics-clickhouse-cron kubernetes:batch/v1:CronJob update
+ vpc-native-api-db-migration-883ffa71 kubernetes:batch/v1:Job create
+ vpc-native-api-clickhouse-migration-883ffa71 kubernetes:batch/v1:Job create
~ vpc-native-post-analytics-clickhouse-cron kubernetes:batch/v1:CronJob update
~ vpc-native-rotate-daily-quests-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-achievement-rarity-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-channel-highlights-cron kubernetes:batch/v1:CronJob update
~ vpc-native-hourly-notification-cron kubernetes:batch/v1:CronJob update
~ vpc-native-channel-highlights-cron kubernetes:batch/v1:CronJob update
~ vpc-native-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-ws-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-post-analytics-history-day-clickhouse-cron kubernetes:batch/v1:CronJob update
~ vpc-native-materialize-monthly-best-post-archives-cron kubernetes:batch/v1:CronJob update
~ vpc-native-bg-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-user-profile-analytics-history-clickhouse-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-zombie-user-companies-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-zombie-users-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-zombie-opportunities-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-zombie-images-cron kubernetes:batch/v1:CronJob update
~ vpc-native-user-posts-analytics-refresh-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-tag-materialized-views-cron kubernetes:batch/v1:CronJob update
... and 12 other changes
- `niche.id` becomes `uuid` (generated via `uuid_generate_v4`), matching the convention used by other generated-id tables in daily-api.
- Adds `niche.slug`, a stable human-readable identifier (e.g. `js_ts`) with a unique index. The labeling pipeline references niches by slug; `keyword_niche` FKs reference `id`.
- `keyword_niche.primaryNicheId` / `secondaryNicheId` become `uuid`.
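Because the labeling pipeline speaks in slugs while `keyword_niche` stores uuid FKs, labeler output has to be translated before insert. A minimal sketch, assuming a hypothetical `LabelerRow` shape and a hypothetical `resolveNicheIds` helper: it fails fast on unknown slugs and re-checks the primary ≠ secondary invariant that the table's check constraint also enforces.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical labeler output: niches are referenced by slug (e.g. "js_ts").
interface LabelerRow {
  keyword: string;
  primarySlug: string;
  secondarySlug?: string;
  weightMultiplier: number;
  confidence: 1 | 2 | 3;
  labelerVersion: string;
}

// Row shape for keyword_niche, where niche references are uuids.
interface KeywordNicheRow {
  keyword: string;
  primaryNicheId: string;
  secondaryNicheId: string | null;
  weightMultiplier: number;
  confidence: 1 | 2 | 3;
  labelerVersion: string;
}

function resolveNicheIds(rows: LabelerRow[], idBySlug: Map<string, string>): KeywordNicheRow[] {
  const lookup = (slug: string): string => {
    const id = idBySlug.get(slug);
    if (!id) throw new Error(`unknown niche slug: ${slug}`);
    return id;
  };

  return rows.map((row) => {
    const primaryNicheId = lookup(row.primarySlug);
    const secondaryNicheId = row.secondarySlug ? lookup(row.secondarySlug) : null;
    // Mirror the primary != secondary check constraint before the row reaches Postgres.
    if (secondaryNicheId === primaryNicheId) {
      throw new Error(`${row.keyword}: secondary niche must differ from primary`);
    }
    return {
      keyword: row.keyword,
      primaryNicheId,
      secondaryNicheId,
      weightMultiplier: row.weightMultiplier,
      confidence: row.confidence,
      labelerVersion: row.labelerVersion,
    };
  });
}

// In production the ids would come from the niche table (uuid_generate_v4); randomUUID stands in here.
const idBySlug = new Map<string, string>([
  ["js_ts", randomUUID()],
  ["ai_llm", randomUUID()],
]);
```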
Summary
Introduces the data structure (schema only) for niche-based feed diversification. Niches are the mental categories users have in mind when they say "stop showing me X": the unit at which the upcoming diversifier penalizes feed repetition.
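One way a niche-level repetition penalty could look, as a greedy MMR-style pass in the spirit of the re-ranker described under "How the diversifier will use it". All of this is an assumption for illustration, not the diversifier's code: the `Candidate` shape, the binary overlap measure (any shared primary or secondary niche counts as full overlap), and `LAMBDA = 0.7` as the relevance/diversity trade-off.

```typescript
// Hypothetical candidate: relevance from the existing ranker, niches from the tag vote.
interface Candidate {
  postId: string;
  relevance: number;
  primaryNiche: string | null;
  secondaryNiche: string | null;
}

// Assumed relevance/diversity trade-off (the lambda in MMR).
const LAMBDA = 0.7;

// 1 if the two posts share any primary or secondary niche, else 0.
function nicheOverlap(a: Candidate, b: Candidate): number {
  const niches = new Set([a.primaryNiche, a.secondaryNiche].filter((n): n is string => n !== null));
  return [b.primaryNiche, b.secondaryNiche].some((n) => n !== null && niches.has(n)) ? 1 : 0;
}

function rerankByNiche(candidates: Candidate[], size: number): Candidate[] {
  const remaining = [...candidates];
  const selected: Candidate[] = [];
  while (selected.length < size && remaining.length > 0) {
    let bestIdx = 0;
    let bestScore = -Infinity;
    remaining.forEach((c, i) => {
      // Penalty: maximum overlap with any post already placed higher in this response.
      const penalty = selected.reduce((max, s) => Math.max(max, nicheOverlap(c, s)), 0);
      const score = LAMBDA * c.relevance - (1 - LAMBDA) * penalty;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = i;
      }
    });
    selected.push(remaining.splice(bestIdx, 1)[0]);
  }
  return selected;
}
```

With three candidates where the top two share `js_ts` and the third is `ai_llm`, this pass promotes the `ai_llm` post to second place. A larger `LAMBDA` weights relevance more heavily, so more repetition is tolerated.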
New tables
- `niche`: catalog of niches, grouped into:
  - `ecosystem`: stack identity (`js_ts`, `rust`, `python`, …)
  - `theme`: cross-stack topics (`ai_llm`, `sec_threats`, `cloud`, …)
- `keyword_niche`: maps each keyword (tag) to:
  - `primaryNicheId` (NOT NULL)
  - `secondaryNicheId` (nullable; must differ from primary)
  - `weightMultiplier`: per-tag dampening (e.g. 0.3 for generic tags like `programming`)
  - `confidence`: 1 = guess, 2 = likely, 3 = clear (for prioritizing audits)
  - `labelerVersion`: identifier of the labeler/taxonomy version that produced the row

How the diversifier will use it
A post's niches are derived from the niches of its tags via weighted vote (IDF × `weightMultiplier`, plus an ecosystem boost). The MMR-style re-ranker will then penalize candidate posts whose primary or secondary niche overlaps with a higher-ranked post in the same response.
Taxonomy spec and derivation rules live in the brain doc `docs/feed-niche-taxonomy.md`.
Scope of this PR
Schema only: no `niche` rows or `keyword_niche` mappings are inserted; data will be populated separately.
Migration
`1779878138923-PostNiches.ts` creates both tables with FKs to `keyword.value` and `niche.id`, plus check constraints on `bucketGroup`, `confidence`, and the primary ≠ secondary invariant. It also adds indexes on `primaryNicheId` and `secondaryNicheId` for join-from-niche queries (analytics and future per-niche listings).
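For orientation, the DDL this migration describes might look roughly like the sketch below. To stay dependency-free, `QueryRunnerLike` stands in for TypeORM's `QueryRunner`; a TypeORM migration would typically implement `MigrationInterface` instead. The `keyword` column name, column types, constraint and index names, defaults, nullability beyond what the PR states, and the `ON DELETE` behavior are all assumptions. `uuid_generate_v4()` is provided by Postgres's `uuid-ossp` extension.

```typescript
// Stand-in for TypeORM's QueryRunner so the sketch runs without dependencies.
interface QueryRunnerLike {
  query(sql: string): Promise<unknown>;
}

// Assumed DDL; only the tables, FK targets, checks, and indexes come from the PR text.
const upStatements: string[] = [
  `CREATE TABLE "niche" (
    "id" uuid NOT NULL DEFAULT uuid_generate_v4(),
    "slug" text NOT NULL,
    "bucketGroup" text NOT NULL,
    CONSTRAINT "PK_niche_id" PRIMARY KEY ("id"),
    CONSTRAINT "CHK_niche_bucketGroup" CHECK ("bucketGroup" IN ('ecosystem', 'theme'))
  )`,
  `CREATE UNIQUE INDEX "IDX_niche_slug" ON "niche" ("slug")`,
  `CREATE TABLE "keyword_niche" (
    "keyword" text NOT NULL,
    "primaryNicheId" uuid NOT NULL,
    "secondaryNicheId" uuid,
    "weightMultiplier" real NOT NULL DEFAULT 1,
    "confidence" smallint NOT NULL,
    "labelerVersion" text NOT NULL,
    CONSTRAINT "PK_keyword_niche_keyword" PRIMARY KEY ("keyword"),
    CONSTRAINT "CHK_keyword_niche_confidence" CHECK ("confidence" BETWEEN 1 AND 3),
    CONSTRAINT "CHK_keyword_niche_distinct" CHECK ("secondaryNicheId" IS NULL OR "secondaryNicheId" <> "primaryNicheId"),
    CONSTRAINT "FK_keyword_niche_keyword" FOREIGN KEY ("keyword") REFERENCES "keyword" ("value") ON DELETE CASCADE,
    CONSTRAINT "FK_keyword_niche_primary" FOREIGN KEY ("primaryNicheId") REFERENCES "niche" ("id"),
    CONSTRAINT "FK_keyword_niche_secondary" FOREIGN KEY ("secondaryNicheId") REFERENCES "niche" ("id")
  )`,
  `CREATE INDEX "IDX_keyword_niche_primaryNicheId" ON "keyword_niche" ("primaryNicheId")`,
  `CREATE INDEX "IDX_keyword_niche_secondaryNicheId" ON "keyword_niche" ("secondaryNicheId")`,
];

// Drop the referencing table first; dropping a table also drops its indexes.
const downStatements: string[] = [`DROP TABLE "keyword_niche"`, `DROP TABLE "niche"`];

class PostNichesSketch {
  async up(queryRunner: QueryRunnerLike): Promise<void> {
    for (const sql of upStatements) await queryRunner.query(sql);
  }

  async down(queryRunner: QueryRunnerLike): Promise<void> {
    for (const sql of downStatements) await queryRunner.query(sql);
  }
}
```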