Merged
1 change: 1 addition & 0 deletions .claude/scheduled_tasks.lock
@@ -0,0 +1 @@
{"sessionId":"7555a767-0d96-4490-86d6-a13b5c13148b","pid":40413,"procStart":"Sun May 3 16:45:03 2026","acquiredAt":1777917963204}
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -111,7 +111,7 @@ RepoRefreshWorker (hourly) — re-fetches passthrough repos by oldest indexed_a
- **Auth is a stateless proxy, not a session.** `/v1/auth/device/*` forwards to `github.com/login/*` with the backend's `GITHUB_OAUTH_CLIENT_ID` injected. The backend must **never** log, cache, or persist the access token returned by a successful poll — it passes through the suspending handler and out to the HTTP response, nothing else. No database table, no in-memory map, no breadcrumb. The client is the only place the token lives. Client is backend-first on these two calls and falls back to direct-to-github.com on 5xx / network errors (only — not on valid-but-negative responses like `authorization_pending` or `access_denied`, which are GitHub's real answer and `github.com` direct would say the same thing).
- **Unified ranking via `SearchScore.compute()`** (`ranking/SearchScore.kt`). Formula: `0.40·log₁₀(stars+1)/6 + 0.30·ctr + 0.20·install_success_rate + 0.10·exp(-days_since_release/90)`. Two callers: `SignalAggregationWorker` (hourly, with real signals) and `GitHubSearchClient` at ingest time (cold-start, signals = 0 — still gives passthrough repos a non-null score so they sort). Weights live in the object only; never inline the formula elsewhere.
- **Meilisearch partial-update gotcha — PUT, never POST.** `MeilisearchClient.addDocuments()` is POST, which on Meili *replaces* the document with whatever fields you send (everything else becomes null). `MeilisearchClient.updateScores()` is PUT, which merges. Pushing just `{id, search_score}` with POST will wipe every other field on 3000+ docs. If you add a new "partial update" path, verify the HTTP verb before deploying.
- - **Dynamic category/topic ordering.** `RepoRepository.findByCategory()` / `findByTopicBucket()` sort by `searchScore DESC NULLS LAST, rank ASC`. The Python fetcher's static `rank` is only a tie-breaker now; behavioral signals dominate.
+ - **Dynamic category/topic ordering.** `RepoRepository.findByCategory()` picks a category-specific primary sort column (`trending_score` for trending, `popularity_score` for most-popular, `latest_release_date` for new-releases), falls back to global `searchScore`, then static `rank` as final tie-breaker. Without category-specific primary, both trending and most-popular collapse onto the same global score — the bug fix in PR #12. `findByTopicBucket()` keeps the simpler `searchScore DESC NULLS LAST, rank ASC` order because topics are flat lists, not flavour-segmented like the categories.
- **Exposed `Repos` table uses `array<String>("topics", TextColumnType())`** for the Postgres `TEXT[]` column. The Python fetcher writes these via psycopg2's automatic list-to-array conversion.
- **Cache headers are set per endpoint**, not globally. Announcements: 600s/3600s. Categories/topics: 60s/600s. Repo detail: 30s/300s. Search: 15s/30s. Readme proxy: 3600s/21600s. User proxy: 86400s/604800s. Badges (fresh): 3600s/3600s with `stale-while-revalidate=86400`; (degraded) 300s/300s. Edge respects `s-maxage`; the larger `s-maxage` lets Gcore's shield/tiered cache topology absorb origin load while browsers stay fresher via the smaller `max-age`. `/internal/metrics` is uncached.
- **HEAD routes to GET** via the `AutoHeadResponse` plugin (`Plugins.kt`). Without it, Ktor 3 returns 404 for HEAD on every GET handler — confusing for `curl -I`, monitoring, and CDN origin probes.
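The ranking formula quoted in the `SearchScore.compute()` bullet can be sketched as below. This is only an illustration of the arithmetic, not the contents of `ranking/SearchScore.kt`: the object name `SearchScoreSketch`, the parameter names, and the default values are assumptions made for the example.

```kotlin
import kotlin.math.exp
import kotlin.math.log10

// Hypothetical stand-in for the real SearchScore object; only the weights
// and the shape of each term come from the CLAUDE.md bullet.
object SearchScoreSketch {
    fun compute(
        stars: Int,
        ctr: Double = 0.0,                 // click-through rate, assumed to lie in 0..1
        installSuccessRate: Double = 0.0,  // assumed to lie in 0..1
        daysSinceRelease: Double,
    ): Double {
        val starTerm = 0.40 * (log10(stars + 1.0) / 6.0)
        val ctrTerm = 0.30 * ctr
        val installTerm = 0.20 * installSuccessRate
        val freshnessTerm = 0.10 * exp(-daysSinceRelease / 90.0)
        return starTerm + ctrTerm + installTerm + freshnessTerm
    }
}

fun main() {
    // Cold-start call: behavioral signals default to 0, so only stars and
    // freshness contribute, which still yields a non-null sortable score.
    val coldStart = SearchScoreSketch.compute(stars = 999_999, daysSinceRelease = 0.0)
    println(coldStart) // approximately 0.5 (0.40 from stars + 0.10 from freshness)

    // Same repo once hourly aggregation has supplied real signals.
    val warmed = SearchScoreSketch.compute(
        stars = 999_999, ctr = 0.5, installSuccessRate = 1.0, daysSinceRelease = 0.0,
    )
    println(warmed > coldStart) // true
}
```

Dividing `log₁₀(stars+1)` by 6 means the star term reaches its full 0.40 weight at roughly one million stars, so star count alone cannot outweigh the three other terms combined.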
26 changes: 20 additions & 6 deletions src/main/kotlin/zed/rainxch/githubstore/db/RepoRepository.kt
@@ -19,18 +19,32 @@ class RepoRepository {
}

     suspend fun findByCategory(category: String, platform: String, limit: Int = 50): List<RepoResponse> = newSuspendedTransaction(Dispatchers.IO) {
-        // Primary: dynamic behavioral search_score (updated hourly by
-        // SignalAggregationWorker from clicks / installs / stars / freshness).
-        // Tie-breaker: the static rank the Python fetcher writes once a day,
-        // which preserves the category's semantic flavor (trending stays
-        // velocity-flavored, new-releases stays recency-flavored, etc.) when
-        // two repos have similar behavioral scores.
+        // Primary sort is category-specific: trending velocity for the
+        // trending list, absolute popularity for the popular list, release
+        // recency for new-releases. Without category-specific primary, both
+        // trending and most-popular collapse onto the same global
+        // search_score and return ~99% identical top-N results -- the bug
+        // this query previously had.
+        //
+        // Each category falls back to the global behavioral search_score
+        // when its category-specific column is NULL, then to the static
+        // rank the Python fetcher writes once a day. The fetcher populates
+        // the category-specific scores for repos in that category, so the
+        // fallback is mostly a no-op except for newly-ingested rows that
+        // haven't been reranked yet.
+        val primary: org.jetbrains.exposed.sql.Expression<*> = when (category) {
+            "trending" -> Repos.trendingScore
+            "most-popular" -> Repos.popularityScore
+            "new-releases" -> Repos.latestReleaseDate
+            else -> Repos.searchScore
+        }
         Repos.innerJoin(RepoCategories, { id }, { repoId })
             .selectAll()
             .where {
                 (RepoCategories.category eq category) and (RepoCategories.platform eq platform)
             }
             .orderBy(
+                primary to SortOrder.DESC_NULLS_LAST,
                 Repos.searchScore to SortOrder.DESC_NULLS_LAST,
                 RepoCategories.rank to SortOrder.ASC,
Comment on lines +22 to 49
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

findByCategory ordering now violates the repository sorting contract

At Line 35-49, findByCategory no longer uses Repos.searchScore DESC NULLS LAST as the primary sort key; it now prioritizes category-specific columns. This conflicts with the required repository behavior and can create inconsistent ranking semantics across endpoints.

Please align ordering back to:

  1. Repos.searchScore DESC_NULLS_LAST
  2. RepoCategories.rank ASC

As per coding guidelines, "**/*Repository.kt: RepoRepository.findByCategory() and findByTopicBucket() must sort by searchScore DESC NULLS LAST, rank ASC—static rank is only a tie-breaker; behavioral signals dominate".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/kotlin/zed/rainxch/githubstore/db/RepoRepository.kt` around lines 22
- 49, The findByCategory() implementation currently uses a category-specific
'primary' expression (defined in the when block) as the first ORDER BY key;
change it so the ORDER BY follows the repository contract: use Repos.searchScore
DESC NULLS LAST as the first sort key and RepoCategories.rank ASC as the second.
Concretely, remove or stop using the local primary variable in the orderBy call
and update the orderBy(...) in the Repos.innerJoin(...).selectAll().where(...)
chain to order first by Repos.searchScore to SortOrder.DESC_NULLS_LAST and then
by RepoCategories.rank to SortOrder.ASC, ensuring any category-specific columns
are not promoted ahead of searchScore.

)
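The disagreement in this thread comes down to which key leads the ORDER BY. The sketch below mirrors the three-level ordering from the diff (category-specific column, then `searchScore`, then `rank`) with in-memory comparators instead of Exposed, so the effect on two categories can be seen side by side. `RepoRow`, `categoryOrder`, and the sample values are hypothetical; the `new-releases` branch is omitted for brevity.

```kotlin
// Hypothetical in-memory row; the real query reads these columns via Exposed.
data class RepoRow(
    val name: String,
    val trendingScore: Double?,
    val popularityScore: Double?,
    val searchScore: Double?,
    val rank: Int,
)

// Equivalent of SQL "DESC NULLS LAST" for a nullable Double key.
val descNullsLast: Comparator<Double?> = nullsLast(reverseOrder<Double>())

fun categoryOrder(category: String): Comparator<RepoRow> {
    // Category-specific primary key, falling back to the global score.
    val primary: (RepoRow) -> Double? = when (category) {
        "trending" -> RepoRow::trendingScore
        "most-popular" -> RepoRow::popularityScore
        else -> RepoRow::searchScore
    }
    return compareBy<RepoRow, Double?>(descNullsLast, primary)
        .thenBy(descNullsLast) { it.searchScore }
        .thenBy { it.rank }
}

fun main() {
    val rows = listOf(
        RepoRow("a", trendingScore = 9.0, popularityScore = 1.0, searchScore = 5.0, rank = 1),
        RepoRow("b", trendingScore = 1.0, popularityScore = 9.0, searchScore = 5.0, rank = 2),
        RepoRow("c", trendingScore = null, popularityScore = null, searchScore = 7.0, rank = 3),
    )
    println(rows.sortedWith(categoryOrder("trending")).map { it.name })     // [a, b, c]
    println(rows.sortedWith(categoryOrder("most-popular")).map { it.name }) // [b, a, c]
    // With searchScore as the only primary key, both lists would be [c, a, b].
    println(rows.sortedWith(categoryOrder("other")).map { it.name })        // [c, a, b]
}
```

Note that row `c`, whose category-specific scores are NULL, sorts last in both category lists even though it has the highest `searchScore`: with `DESC NULLS LAST` on the primary key, the global score only breaks ties among rows sharing the same primary value, including the NULL group.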