feat: add niche + keyword_niche tables for feed diversification #3911

Merged
idoshamun merged 2 commits into main from feat/post-niches
May 27, 2026
@idoshamun
Member

Summary

Introduces the data structure (schema only) for niche-based feed diversification. Niches are the mental categories a user uses when saying "stop showing me X" — the unit at which the upcoming diversifier penalizes feed repetition.

New tables

  • niche — catalog of niches, grouped into:
    • ecosystem — stack identity (js_ts, rust, python, …)
    • theme — cross-stack topics (ai_llm, sec_threats, cloud, …)
  • keyword_niche — maps each keyword (tag) to:
    • primaryNicheId (NOT NULL)
    • secondaryNicheId (nullable, must differ from primary)
    • weightMultiplier — per-tag dampening (e.g. 0.3 for generic tags like programming)
    • confidence — 1=guess, 2=likely, 3=clear (for prioritizing audits)
    • labelerVersion — identifier of the labeler/taxonomy version that produced the row

How the diversifier will use it

A post's niches are derived from the niches of its tags via weighted vote (IDF × weightMultiplier, plus ecosystem boost). The MMR-style re-ranker will then penalize candidate posts whose primary or secondary niche overlaps with a higher-ranked post in the same response.
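The weighted vote described above could be sketched roughly as follows. This is a minimal illustration, not the diversifier's actual code: `deriveNiches`, `VoteInputs`, and `secondaryShare` are hypothetical names, the IDF scores are assumed to be precomputed, and the ecosystem boost is assumed to be multiplicative (the description does not say how it is applied).

```typescript
// Hypothetical shapes: field names follow the PR description, everything else is assumed.
type BucketGroup = "ecosystem" | "theme";

interface TagLabel {
  primaryNicheId: string;
  secondaryNicheId: string | null;
  weightMultiplier: number;
}

interface VoteInputs {
  labels: Map<string, TagLabel>; // keyword -> keyword_niche row
  idf: Map<string, number>; // keyword -> IDF score (assumed precomputed)
  groups: Map<string, BucketGroup>; // niche id -> bucket group
  ecosystemBoost: number; // assumed multiplicative boost for ecosystem niches
  secondaryShare: number; // assumed fraction of a vote given to the secondary niche
}

function deriveNiches(
  tags: string[],
  inputs: VoteInputs,
): { primary: string | null; secondary: string | null } {
  const scores = new Map<string, number>();

  // Accumulate a vote for one niche, boosting ecosystem niches.
  const add = (nicheId: string, amount: number): void => {
    const boost =
      inputs.groups.get(nicheId) === "ecosystem" ? inputs.ecosystemBoost : 1;
    scores.set(nicheId, (scores.get(nicheId) ?? 0) + amount * boost);
  };

  for (const tag of tags) {
    const label = inputs.labels.get(tag);
    if (!label) continue; // unlabeled tags do not vote
    const vote = (inputs.idf.get(tag) ?? 0) * label.weightMultiplier;
    add(label.primaryNicheId, vote);
    if (label.secondaryNicheId !== null) {
      add(label.secondaryNicheId, vote * inputs.secondaryShare);
    }
  }

  // The two highest-scoring niches become the post's primary and secondary.
  const ranked = [...scores.entries()].sort((a, b) => b[1] - a[1]);
  return {
    primary: ranked[0]?.[0] ?? null,
    secondary: ranked[1]?.[0] ?? null,
  };
}
```

With a dampened generic tag (weightMultiplier 0.3) next to a specific one, the specific tag's niche wins the vote even when both tags appear on the same post.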

Taxonomy spec and derivation rules live in the brain doc docs/feed-niche-taxonomy.md.
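For the re-ranking step mentioned in this section, a simplified greedy sketch is shown below. It is an assumption-heavy illustration: `Candidate`, `diversify`, and the flat `penalty` parameter are hypothetical, and the penalty here counts niche overlaps with already-selected posts rather than using the max-similarity term of textbook MMR.

```typescript
interface Candidate {
  id: string;
  score: number; // relevance from the upstream ranker (assumed input)
  primaryNicheId: string | null;
  secondaryNicheId: string | null;
}

// Collect the non-null niches of a candidate.
function nichesOf(c: Candidate): string[] {
  return [c.primaryNicheId, c.secondaryNicheId].filter(
    (n): n is string => n !== null,
  );
}

// Greedy selection: at each step, pick the candidate with the highest score
// after subtracting `penalty` for every earlier pick that shares a niche.
function diversify(candidates: Candidate[], penalty: number): Candidate[] {
  const remaining = [...candidates];
  const selected: Candidate[] = [];
  const seen = new Map<string, number>(); // niche id -> times already selected

  while (remaining.length > 0) {
    let bestIndex = 0;
    let bestAdjusted = -Infinity;
    remaining.forEach((c, i) => {
      const overlap = nichesOf(c).reduce(
        (sum, n) => sum + (seen.get(n) ?? 0),
        0,
      );
      const adjusted = c.score - penalty * overlap;
      if (adjusted > bestAdjusted) {
        bestAdjusted = adjusted;
        bestIndex = i;
      }
    });

    const picked = remaining.splice(bestIndex, 1)[0];
    if (!picked) break; // defensive: cannot happen while remaining is non-empty
    selected.push(picked);
    for (const n of nichesOf(picked)) {
      seen.set(n, (seen.get(n) ?? 0) + 1);
    }
  }
  return selected;
}
```

A second post from an already-shown niche is pushed below a slightly lower-scored post from a fresh niche, which is the repetition penalty the description refers to.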

Scope of this PR

  • Schema only. No niche rows or keyword_niche mappings are inserted; data will be populated separately.
  • No API/GraphQL changes yet — those land once we wire the diversifier.

Migration

1779878138923-PostNiches.ts creates both tables with FKs to keyword.value and niche.id, plus check constraints on bucketGroup, confidence, and the primary ≠ secondary invariant. It also adds indexes on primaryNicheId and secondaryNicheId to support join-from-niche queries (analytics and future per-niche listings).
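The check constraints listed above can be mirrored in application code, for example to reject bad labeler output before it reaches the database. The sketch below is illustrative only: the function names are hypothetical, and the allowed values (`ecosystem`/`theme`, confidence 1 to 3) are inferred from the PR description rather than copied from the migration's SQL.

```typescript
// Allowed values inferred from the PR description, not from the migration itself.
const BUCKET_GROUPS = new Set<string>(["ecosystem", "theme"]);
const CONFIDENCE_LEVELS = new Set<number>([1, 2, 3]);

interface NicheRow {
  id: string;
  bucketGroup: string;
}

interface KeywordNicheRow {
  keyword: string;
  primaryNicheId: string;
  secondaryNicheId: string | null;
  confidence: number;
}

// Returns a list of violated invariants; an empty list means the row passes.
function nicheViolations(row: NicheRow): string[] {
  const errors: string[] = [];
  if (!BUCKET_GROUPS.has(row.bucketGroup)) {
    errors.push(`bucketGroup must be one of: ${[...BUCKET_GROUPS].join(", ")}`);
  }
  return errors;
}

function keywordNicheViolations(row: KeywordNicheRow): string[] {
  const errors: string[] = [];
  if (!CONFIDENCE_LEVELS.has(row.confidence)) {
    errors.push("confidence must be 1, 2, or 3");
  }
  // A null secondary is allowed; only an equal, non-null secondary is rejected.
  if (
    row.secondaryNicheId !== null &&
    row.secondaryNicheId === row.primaryNicheId
  ) {
    errors.push("secondaryNicheId must differ from primaryNicheId");
  }
  return errors;
}
```

The database constraints remain the source of truth; a pre-check like this would only surface the same failures earlier and with friendlier messages.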

Introduces the data structure for niche-based feed diversification:

- `niche`: catalog of niches (ecosystem vs theme groups) used by the
  diversifier as the mental categories users group posts by.
- `keyword_niche`: maps each keyword (tag) to a primary niche and an
  optional secondary niche, with a per-tag weightMultiplier (for
  dampening generic tags), confidence score, and labelerVersion for
  tracking re-labeling passes.

The diversifier derives a post's niches from its tags via weighted vote;
the taxonomy and derivation rules live in the brain doc
`docs/feed-niche-taxonomy.md`.

Migration creates the schema only — niche rows and keyword_niche
mappings will be populated separately.
@pulumi

pulumi Bot commented May 27, 2026

🍹 The Update (preview) for dailydotdev/api/prod (at 259e7b2) was successful.

✨ Neo Explanation

Routine deployment rolling all workloads to commit `883ffa71`, with an additive database migration that creates the `niche` and `keyword_niche` tables for feed diversification. ✅ Low Risk

This PR introduces the niche and keyword_niche tables to support feed diversification — mapping keywords/tags to niche categories (ecosystem or theme) so the feed diversifier can penalize repetitive content by niche.

The deployment follows the standard pattern: new DB and Clickhouse migration Jobs are created for the new commit hash (883ffa71), the previous migration Jobs (e0909e03) are cleaned up, and all deployments/crons are rolled to the new image tag.

🔵 Info — The PostNiches migration adds two new tables with foreign keys: keyword_niche.keyword references keyword.value (cascade delete), and both niche ID columns reference niche.id (restrict delete). These are additive-only schema changes with no modifications to existing tables, so there is no risk to existing data.

Resource Changes

    Name                                                       Type                           Operation
~   vpc-native-rotate-weekly-quests-cron                       kubernetes:batch/v1:CronJob    update
-   vpc-native-api-db-migration-e0909e03                       kubernetes:batch/v1:Job        delete
~   vpc-native-personalized-digest-cron                        kubernetes:batch/v1:CronJob    update
~   vpc-native-channel-digests-cron                            kubernetes:batch/v1:CronJob    update
~   vpc-native-sync-subscription-with-cio-cron                 kubernetes:batch/v1:CronJob    update
~   vpc-native-generate-search-invites-cron                    kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-stale-user-transactions-cron              kubernetes:batch/v1:CronJob    update
~   vpc-native-check-analytics-report-cron                     kubernetes:batch/v1:CronJob    update
~   vpc-native-temporal-deployment                             kubernetes:apps/v1:Deployment  update
~   vpc-native-validate-active-users-cron                      kubernetes:batch/v1:CronJob    update
~   vpc-native-update-source-public-threshold-cron             kubernetes:batch/v1:CronJob    update
~   vpc-native-private-deployment                              kubernetes:apps/v1:Deployment  update
~   vpc-native-user-profile-updated-sync-cron                  kubernetes:batch/v1:CronJob    update
~   vpc-native-calculate-top-readers-cron                      kubernetes:batch/v1:CronJob    update
~   vpc-native-generic-referral-reminder-cron                  kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-old-notifications-cron                    kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-expired-better-auth-sessions-cron         kubernetes:batch/v1:CronJob    update
~   vpc-native-update-tags-str-cron                            kubernetes:batch/v1:CronJob    update
-   vpc-native-api-clickhouse-migration-e0909e03               kubernetes:batch/v1:Job        delete
~   vpc-native-expire-super-agent-trial-cron                   kubernetes:batch/v1:CronJob    update
~   vpc-native-user-profile-analytics-clickhouse-cron          kubernetes:batch/v1:CronJob    update
+   vpc-native-api-db-migration-883ffa71                       kubernetes:batch/v1:Job        create
+   vpc-native-api-clickhouse-migration-883ffa71               kubernetes:batch/v1:Job        create
~   vpc-native-post-analytics-clickhouse-cron                  kubernetes:batch/v1:CronJob    update
~   vpc-native-rotate-daily-quests-cron                        kubernetes:batch/v1:CronJob    update
~   vpc-native-update-achievement-rarity-cron                  kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-channel-highlights-cron                   kubernetes:batch/v1:CronJob    update
~   vpc-native-hourly-notification-cron                        kubernetes:batch/v1:CronJob    update
~   vpc-native-channel-highlights-cron                         kubernetes:batch/v1:CronJob    update
~   vpc-native-deployment                                      kubernetes:apps/v1:Deployment  update
~   vpc-native-ws-deployment                                   kubernetes:apps/v1:Deployment  update
~   vpc-native-post-analytics-history-day-clickhouse-cron      kubernetes:batch/v1:CronJob    update
~   vpc-native-materialize-monthly-best-post-archives-cron     kubernetes:batch/v1:CronJob    update
~   vpc-native-bg-deployment                                   kubernetes:apps/v1:Deployment  update
~   vpc-native-user-profile-analytics-history-clickhouse-cron  kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-zombie-user-companies-cron                kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-zombie-users-cron                         kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-zombie-opportunities-cron                 kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-zombie-images-cron                        kubernetes:batch/v1:CronJob    update
~   vpc-native-user-posts-analytics-refresh-cron               kubernetes:batch/v1:CronJob    update
~   vpc-native-update-tag-materialized-views-cron              kubernetes:batch/v1:CronJob    update
... and 12 other changes

- niche.id becomes uuid (generated via uuid_generate_v4), matching the
  convention used by other generated-id tables in daily-api.
- adds niche.slug — stable human-readable identifier (e.g. "js_ts")
  with a unique index. The labeling pipeline references niches by slug;
  keyword_niche FKs reference id.
- keyword_niche.primaryNicheId / secondaryNicheId become uuid.
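Because the labeling pipeline speaks in slugs while keyword_niche stores uuid foreign keys, some step has to translate between the two. A minimal sketch of that translation follows; `LabelerOutput`, `resolveSlugs`, and the in-memory lookup are hypothetical and stand in for whatever the pipeline actually does.

```typescript
interface Niche {
  id: string; // uuid primary key
  slug: string; // stable human-readable identifier, e.g. "js_ts"
}

// Hypothetical shape of one labeler result, keyed by slug.
interface LabelerOutput {
  keyword: string;
  primarySlug: string;
  secondarySlug?: string;
}

interface KeywordNicheIds {
  keyword: string;
  primaryNicheId: string;
  secondaryNicheId: string | null;
}

function resolveSlugs(
  niches: Niche[],
  labels: LabelerOutput[],
): KeywordNicheIds[] {
  const idBySlug = new Map<string, string>();
  for (const niche of niches) idBySlug.set(niche.slug, niche.id);

  // Fail loudly on unknown slugs instead of writing a dangling reference.
  const lookup = (slug: string): string => {
    const id = idBySlug.get(slug);
    if (id === undefined) throw new Error(`unknown niche slug: ${slug}`);
    return id;
  };

  return labels.map((label) => ({
    keyword: label.keyword,
    primaryNicheId: lookup(label.primarySlug),
    secondaryNicheId:
      label.secondarySlug === undefined ? null : lookup(label.secondarySlug),
  }));
}
```

Keeping slugs out of the foreign keys means a slug can be referenced by humans and tooling while the uuid remains the join key.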
@idoshamun idoshamun merged commit 326ae56 into main May 27, 2026
8 of 9 checks passed
@idoshamun idoshamun deleted the feat/post-niches branch May 27, 2026 10:48