Ranking of records returned by Search and List #1475
I like the direction, especially exposing component signals instead of returning a single opaque score. One design point I would be careful with is cross-type comparability. A score for a skill, a domain, and a module may not mean the same thing unless the features are normalized per record type. I would keep ranking namespace-aware first, then only merge across namespaces if the response also exposes
A useful scoring split could be:
I would also make tie-breaking deterministic: score desc, specificity desc, updated_at desc, stable CID or record id asc. That makes CLI output and tests much easier to reason about. The metadata should probably include both
Ranking of records returned by Search and List
dirctl search and dirctl routing search currently return matching records in an order that has no relation to relevance. This RFC proposes adding a ranking layer on top of the existing match logic, exposing the resulting score and its component signals on the wire so the ordering is explainable.
Background
How search works today
The repository exposes two search-shaped APIs that share concepts but live in different layers:
- agntcy.dir.search.v1.SearchService: local, SQL-backed. Implemented in server/controller/search.go, indexed in server/database/gorm/. Used by dirctl search.
- agntcy.dir.routing.v1.RoutingService.Search and .List: distributed, KV + DHT. Implemented in server/routing/. Used by dirctl routing search. Already returns a match_score per result (number of RecordQuery items that matched), but does not use it to sort.
The only ordering anywhere in the codebase is one line in server/database/gorm/record.go. Concretely:
- SearchService.SearchCIDs: created_at DESC (when the record was added to the local index)
- SearchService.SearchRecords: created_at DESC
- RoutingService.List (local-only): /records/* keys
- RoutingService.Search (remote): /skills/*, /domains/*, /modules/* keys
match_score is computed for every routing result but only attached as metadata; results are streamed in iteration order.
Available ranking signals
The following are persisted today and can drive a ranking score without any new storage or schema changes:
In the SQL index (server/database/gorm/). name, version, schema_version, oasf_created_at, authors, signed boolean, plus joined tables for skills, locators, modules, and domains. Verification state lives in signature_verifications and name_verifications, exposed today as the RECORD_QUERY_TYPE_TRUSTED and RECORD_QUERY_TYPE_VERIFIED filters.
Outside SQL. DHT provider count via server/routing/handler.go::GetProviders, the locally cached /skills/.../<CID>/<peerID> keys, and the per-query match_score from server/routing/query_matching.go.
Proposal
Add a per-result ranking score, computed at query time from a small linear combination of normalized signals, and stream results in descending score order. Expose the score and its component sub-scores on the wire so the ordering is explainable in the CLI and UI.
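A minimal Go sketch of such a linear combination, assuming each signal has already been normalized. The type and function names (Signal, Result, Score) and the weights used in main are illustrative assumptions, not the proposed API or the proposed defaults.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Signal is one normalized input to the score. Value is expected in [0, 1].
type Signal struct {
	Name   string
	Value  float64
	Weight float64 // illustrative; real weights would come from configuration
}

// Result carries the final score and each signal's contribution,
// so the ordering can be explained to the caller.
type Result struct {
	CID         string
	Score       float64
	Explanation map[string]float64 // signal name -> weight * value
}

// Score returns the weighted sum of the signals for one record.
func Score(cid string, signals []Signal) Result {
	res := Result{CID: cid, Explanation: make(map[string]float64, len(signals))}
	for _, s := range signals {
		v := math.Min(1, math.Max(0, s.Value)) // clamp defensively to [0, 1]
		contribution := s.Weight * v
		res.Explanation[s.Name] = contribution
		res.Score += contribution
	}
	return res
}

func main() {
	results := []Result{
		Score("cid-a", []Signal{
			{Name: "query_relevance", Value: 0.5, Weight: 0.5},
			{Name: "signed", Value: 1, Weight: 0.25},
		}),
		Score("cid-b", []Signal{
			{Name: "query_relevance", Value: 1, Weight: 0.5},
			{Name: "signed", Value: 1, Weight: 0.25},
		}),
	}
	// Stream order: highest score first.
	sort.SliceStable(results, func(i, j int) bool {
		return results[i].Score > results[j].Score
	})
	for _, r := range results {
		fmt.Printf("%s %.3f %v\n", r.CID, r.Score, r.Explanation)
	}
}
```

Keeping the per-signal contributions next to the total is what makes the score explainable: the explanation is a by-product of the same loop that computes the sum.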
Signals
Each signal is normalized to [0, 1]. The final score is Σ (w_i · s_i) with weights configurable in daemon.config.yaml. Defaults below are starting points for discussion.
- query_relevance: match_score / num_queries. Source: server/routing/query_matching.go
- signed: 1 if records.signed, else 0. Source: server/database/gorm/record.go
- trusted: 1 if signature_verifications has status='verified'. Source: server/database/gorm/signature.go
- verified: 1 if name_verifications.status='verified'. Source: server/database/gorm/naming.go
- popularity: min(1, providers / K), K configurable (default 10). Source: server/routing/handler.go::GetProviders / KV cache
- completeness: min(1, (#skills + #domains + #modules + #locators) / N), N default 8
- freshness: exp(-Δdays / τ) on oasf_created_at (fallback created_at), τ default 365 days. Source: records.oasf_created_at
- schema: Source: records.schema_version
Notes:
- The existing filters (verified, trusted) remain hard filters; the ranking uses them as soft signals when not filtered on.
Ranking options
Option A: static signals only
Per-record signals only, no query_relevance term.
Option B: static signals with query relevance ("hybrid ranking", recommended)
All signals above, including query_relevance.
- query_relevance is derived from match_score, which already exists.
- Zero new persisted state, zero schema migrations.
- SearchService switches from ORDER BY ... LIMIT in SQL to "fetch candidates → score → sort → page", with a candidate cap to bound work.
- Pagination becomes more involved (see below).
Implementation
API changes
Additive proto changes — no breaking changes for existing clients.
agntcy.dir.routing.v1.SearchResponse gains rank_score and rank_explanation fields. The same rank_score and rank_explanation are added to agntcy.dir.search.v1.SearchRecordsResponse and SearchCIDsResponse.
Both endpoints stream results in rank_score DESC order with a deterministic tie-break: created_at DESC, then cid in lexicographic order.
Server changes
A new package server/ranking/ with:
- ranking.go: pure Score(record, query, signals) Result.
- signals.go: adapters that pull each signal from existing types (types.Record, types.SignatureVerification, etc.).
- config.go: weights and bounds.
Both controllers import this package; scoring logic is not duplicated.
SearchService (SQL-backed)
server/controller/search.go currently delegates ordering and paging to SQL (ORDER BY ... LIMIT). The new flow is GetRecords(filterOptions) (with associations) → score in server/ranking → sort → apply Limit/Offset → stream. Bound the candidate set by N = max(1000, limit·10) before scoring to keep work finite.
Note: even Option A cannot precompute a static rank_score SQL column, because popularity is a runtime DHT signal. Materialising a partial score (everything except popularity) and combining it with the live signal at query time is possible, but adds complexity for little benefit until profiling shows query-time scoring is a bottleneck.
RoutingService.Search (KV / DHT)
server/routing/routing_remote.go already collects all (cid, peer, matchScore) tuples in memory before streaming. Add the ranking pass there:
- For popularity, call handler.GetProviders (or count cached /skills/.../<cid>/<peer> entries, which is cheaper and already local).
- For the record-level signals (signed, trusted, verified, completeness, freshness, schema), look up the local SQL index by CID. If the CID isn't locally indexed (purely remote), fall back to documented defaults and mark the explanation accordingly.
Configuration
If ranking.enabled = false, the server returns results in today's order with rank_score = 0 and an empty explanation. This is the safe-rollout switch.
Pagination
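For orientation, the two paging strategies weighed in this section can be sketched in Go as follows. The sketch assumes results are already sorted by score descending and then cid ascending; created_at is left out of the sort key for brevity. Item, Cursor, CandidateCap, PageByOffset and PageByCursor are hypothetical names, and Cursor stands in for whatever a next_page_token would encode.

```go
package main

import (
	"fmt"
	"sort"
)

// Item is one scored result.
type Item struct {
	CID   string
	Score float64
}

// CandidateCap returns N = max(1000, limit*10), the bound on rows scored.
func CandidateCap(limit int) int {
	if n := limit * 10; n > 1000 {
		return n
	}
	return 1000
}

// PageByOffset applies offset and limit to an already sorted slice.
func PageByOffset(sorted []Item, offset, limit int) []Item {
	if offset < 0 {
		offset = 0
	}
	if limit <= 0 || offset >= len(sorted) {
		return nil
	}
	end := offset + limit
	if end > len(sorted) {
		end = len(sorted)
	}
	return sorted[offset:end]
}

// Cursor identifies the last item streamed on the previous page.
type Cursor struct {
	Score float64
	CID   string
}

// after reports whether it sorts strictly after c: a lower score,
// or the same score and a larger cid.
func after(it Item, c Cursor) bool {
	if it.Score != c.Score {
		return it.Score < c.Score
	}
	return it.CID > c.CID
}

// PageByCursor returns the page following c (nil means the first page)
// and the cursor for the next call, or nil when the results are exhausted.
func PageByCursor(sorted []Item, c *Cursor, limit int) ([]Item, *Cursor) {
	if limit <= 0 {
		return nil, nil
	}
	start := 0
	if c != nil {
		// Binary search is valid because after() is monotone over sorted input.
		start = sort.Search(len(sorted), func(i int) bool { return after(sorted[i], *c) })
	}
	end := start + limit
	if end > len(sorted) {
		end = len(sorted)
	}
	page := sorted[start:end]
	if len(page) == 0 || end == len(sorted) {
		return page, nil
	}
	last := page[len(page)-1]
	return page, &Cursor{Score: last.Score, CID: last.CID}
}

func main() {
	sorted := []Item{{CID: "a", Score: 0.9}, {CID: "b", Score: 0.9}, {CID: "c", Score: 0.5}, {CID: "d", Score: 0.1}}
	fmt.Println(CandidateCap(20), PageByOffset(sorted, 1, 2))
	page, cur := PageByCursor(sorted, nil, 2)
	fmt.Println(page, cur != nil)
	page, cur = PageByCursor(sorted, cur, 2)
	fmt.Println(page, cur != nil)
}
```

The cursor variant does not depend on row positions, so a page stays consistent even if records are added between calls; the offset variant is simpler but can skip or repeat items when the underlying set changes.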
Once we sort by computed score, naive LIMIT/OFFSET over the SQL table doesn't work because the score isn't in the DB. Two options:
- Fetch up to N = max(1000, limit·10) filtered candidates from SQL, score in Go, sort, then apply offset and limit on the sorted slice. Simple, fine while the underlying filtered set is bounded.
- Replace offset with a next_page_token encoding (rank_score, cid) of the last item streamed. The next call says "give me results with (rank_score, cid) < (last_score, last_cid)". More work, but matches gRPC streaming idioms.
UX
CLI
dirctl search and dirctl routing search keep their current --format cid default, so existing scripts (dirctl search ... | xargs dirctl pull ...) keep working. The ordering changes, which is the point, and the changelog calls this out.
A new --format ranked gives richer output, and --explain adds a per-row breakdown.
--format record keeps printing record JSON/YAML; rank_score and rank_explanation are added as optional fields.
UI
The Directory currently has a separate web GUI. This RFC does not propose UI changes, but rank_score and rank_explanation are designed to be UI-friendly: a "Why this result?" tooltip can be rendered from rank_explanation without server-side help. Coordinate with frontend before implementation lands.