Fix trending and most-popular returning identical results#12
Fix trending and most-popular returning identical results#12rainxchzed merged 2 commits intomainfrom
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📝 WalkthroughWalkthroughThe ChangesCategory-Specific Query Ordering
Estimated code review effort🎯 2 (Simple) | ⏱️ ~5 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/main/kotlin/zed/rainxch/githubstore/db/RepoRepository.kt`:
- Around line 22-49: The findByCategory() implementation currently uses a
category-specific 'primary' expression (defined in the when block) as the first
ORDER BY key; change it so the ORDER BY follows the repository contract: use
Repos.searchScore DESC NULLS LAST as the first sort key and RepoCategories.rank
ASC as the second. Concretely, remove or stop using the local primary variable
in the orderBy call and update the orderBy(...) in the
Repos.innerJoin(...).selectAll().where(...) chain to order first by
Repos.searchScore to SortOrder.DESC_NULLS_LAST and then by RepoCategories.rank
to SortOrder.ASC, ensuring any category-specific columns are not promoted ahead
of searchScore.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c1150716-25ab-4b16-babd-1e03b942ac35
⛔ Files ignored due to path filters (1)
.claude/scheduled_tasks.lockis excluded by!**/*.lock
📒 Files selected for processing (1)
src/main/kotlin/zed/rainxch/githubstore/db/RepoRepository.kt
| // Primary sort is category-specific: trending velocity for the | ||
| // trending list, absolute popularity for the popular list, release | ||
| // recency for new-releases. Without category-specific primary, both | ||
| // trending and most-popular collapse onto the same global | ||
| // search_score and return ~99% identical top-N results -- the bug | ||
| // this query previously had. | ||
| // | ||
| // Each category falls back to the global behavioral search_score | ||
| // when its category-specific column is NULL, then to the static | ||
| // rank the Python fetcher writes once a day. The fetcher populates | ||
| // the category-specific scores for repos in that category, so the | ||
| // fallback is mostly a no-op except for newly-ingested rows that | ||
| // haven't been reranked yet. | ||
| val primary: org.jetbrains.exposed.sql.Expression<*> = when (category) { | ||
| "trending" -> Repos.trendingScore | ||
| "most-popular" -> Repos.popularityScore | ||
| "new-releases" -> Repos.latestReleaseDate | ||
| else -> Repos.searchScore | ||
| } | ||
| Repos.innerJoin(RepoCategories, { id }, { repoId }) | ||
| .selectAll() | ||
| .where { | ||
| (RepoCategories.category eq category) and (RepoCategories.platform eq platform) | ||
| } | ||
| .orderBy( | ||
| primary to SortOrder.DESC_NULLS_LAST, | ||
| Repos.searchScore to SortOrder.DESC_NULLS_LAST, | ||
| RepoCategories.rank to SortOrder.ASC, |
There was a problem hiding this comment.
findByCategory ordering now violates the repository sorting contract
At Line 35-49, findByCategory no longer uses Repos.searchScore DESC NULLS LAST as the primary sort key; it now prioritizes category-specific columns. This conflicts with the required repository behavior and can create inconsistent ranking semantics across endpoints.
Please align ordering back to:
Repos.searchScore DESC_NULLS_LASTRepoCategories.rank ASC
As per coding guidelines, "**/*Repository.kt: RepoRepository.findByCategory() and findByTopicBucket() must sort by searchScore DESC NULLS LAST, rank ASC—static rank is only a tie-breaker; behavioral signals dominate".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/main/kotlin/zed/rainxch/githubstore/db/RepoRepository.kt` around lines 22
- 49, The findByCategory() implementation currently uses a category-specific
'primary' expression (defined in the when block) as the first ORDER BY key;
change it so the ORDER BY follows the repository contract: use Repos.searchScore
DESC NULLS LAST as the first sort key and RepoCategories.rank ASC as the second.
Concretely, remove or stop using the local primary variable in the orderBy call
and update the orderBy(...) in the Repos.innerJoin(...).selectAll().where(...)
chain to order first by Repos.searchScore to SortOrder.DESC_NULLS_LAST and then
by RepoCategories.rank to SortOrder.ASC, ensuring any category-specific columns
are not promoted ahead of searchScore.
Summary
User reported
/v1/categories/trending/androidand/v1/categories/most-popular/androidreturn ~99% identical top-10 results. Confirmed:```
trending → 2dust/v2rayNG, ReVanced/revanced-manager, MetaCubeX/ClashMetaForAndroid, ...
most-popular → 2dust/v2rayNG, ReVanced/revanced-manager, MetaCubeX/ClashMetaForAndroid, ...
```
7-8 of 10 are identical. Same root cause for topics across multiple categories.
Root cause
`RepoRepository.findByCategory` always sorted by the global `Repos.searchScore` regardless of category. Combined with category membership overlap (34 of 49 trending/android repos are also in most-popular/android), the same global score on a heavily-overlapping set produces near-identical lists.
The Python fetcher already populates per-category scores:
These columns existed but were never used for sort ordering. The query used them only for response decoration (the `trendingScore` / `popularityScore` fields on `RepoResponse`).
Fix
Pick the primary sort column based on category:
Per-category scores are written by the daily Python fetcher only for repos in that category, so the global `search_score` fallback handles newly-ingested rows that haven't been ranked yet.
`new-releases` previously also collapsed onto `search_score`. Now sorts by literal release date which matches user intent ("newest releases first").
Topics endpoint stays on global `search_score` -- topics are flat lists not flavour-segmented like the categories.
Test plan
Summary by CodeRabbit