feat(engine): demote off-topic and wrong-language results in ranking by ErikChevalier · Pull Request #56 · FlintWave/SearchMob

ErikChevalier · 2026-06-03T22:13:32Z

Ports the desktop relevance signal (shipped in SearchMob-Desktop 26.06.04) to Android, closing the parity gap. Releases as standalone Android GA 26.06.02.

Problem

Ranking was RRF (reciprocal rank fusion, i.e. engine consensus) over de-duplicated results, followed by the sort and the user's domain rules. Nothing asked "does this result actually match the query?", so with mostly single-engine results the fused scores were near-tied and off-topic or wrong-language results slipped into the top positions (users reported results "very far from relevant" and "in different languages than the request").
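
To make the near-tie concrete, here is a minimal sketch of reciprocal rank fusion. It is not the SearchMob implementation: the names `rrfScores` and `nearTieExample`, the example URLs, and the default `k = 60.0` (the value commonly used for RRF) are assumptions for illustration only.

```kotlin
/**
 * Fuses per-engine rankings (best result first) into one score per URL.
 * Hypothetical sketch: k = 60.0 is the conventional RRF constant, not a
 * value confirmed by this PR.
 */
fun rrfScores(rankings: Map<String, List<String>>, k: Double = 60.0): Map<String, Double> {
    val scores = mutableMapOf<String, Double>()
    for (urls in rankings.values) {
        urls.forEachIndexed { index, url ->
            // Ranks are 1-based, so the top result contributes 1 / (k + 1).
            scores[url] = (scores[url] ?: 0.0) + 1.0 / (k + index + 1)
        }
    }
    return scores
}

/** Two engines with no overlapping results: every URL is single-engine. */
fun nearTieExample(): Map<String, Double> = rrfScores(
    mapOf(
        "engineA" to listOf("https://a.example/on-topic", "https://a.example/second"),
        "engineB" to listOf("https://b.example/off-topic"),
    )
)
```

In `nearTieExample()`, each engine's top result receives the same fused score (1/61) whether or not it matches the query, and the second result from the first engine trails by under 2% (1/62). RRF alone has nothing to separate them.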

Change

New engine/aggregate/Relevance.kt — a 1:1 port of desktop engines/relevance.py:
- Lexical query-match: stopword-filtered, lightly (ASCII-gated) stemmed content-term coverage over title + snippet, title-weighted, with a head-term (subject) penalty and a small exact-phrase bonus.
- Script-relative language affinity: demote a result whose dominant alphabet differs from the query's (works in any language; same-script never penalized).
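- Demotion-only blend `rrf * minOf(1.0, BASE + GAIN*lexical) * affinity` (BASE=0.5, GAIN=1.0): the lexical factor is capped at 1.0, so a result with no lexical match keeps only half its RRF score (the BASE floor) and a wrong-script result is scaled down further by affinity; a strong match never outranks engine consensus, so keyword stuffing is not promoted.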
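
The blend and the script-relative affinity from the list above can be sketched as follows. Only the formula and the `BASE`/`GAIN` values come from this PR; the object name `RelevanceSketch`, the helper names, the use of `Character.UnicodeScript` for script detection, and the `MISMATCH_AFFINITY` value are assumptions, and `Relevance.kt` may do this differently.

```kotlin
import java.lang.Character.UnicodeScript

object RelevanceSketch {
    const val BASE = 0.5 // from the PR description
    const val GAIN = 1.0 // from the PR description
    const val MISMATCH_AFFINITY = 0.5 // hypothetical: the PR does not state this value

    /** Most frequent script among the letters of [text], or null if it has no letters. */
    fun dominantScript(text: String): UnicodeScript? =
        text.codePoints().toArray()
            .filter { Character.isLetter(it) }
            .groupingBy { UnicodeScript.of(it) }
            .eachCount()
            .maxByOrNull { it.value }
            ?.key

    /** 1.0 when scripts match or cannot be determined; a penalty otherwise. */
    fun affinity(query: String, resultText: String): Double {
        val queryScript = dominantScript(query) ?: return 1.0
        val resultScript = dominantScript(resultText) ?: return 1.0
        return if (queryScript == resultScript) 1.0 else MISMATCH_AFFINITY
    }

    /** Demotion-only: the lexical factor is capped at 1.0, so it can never raise [rrf]. */
    fun blend(rrf: Double, lexical: Double, affinity: Double): Double =
        rrf * minOf(1.0, BASE + GAIN * lexical) * affinity
}
```

For example, `RelevanceSketch.blend(1.0, 0.0, 1.0)` gives 0.5 (the floor), while any `lexical` of 0.5 or more leaves the RRF score untouched. Because the comparison is relative to the query's own script, a Cyrillic query penalizes a Latin result and a Latin query penalizes a Cyrillic one, with no language list involved.
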
engine/aggregate/Aggregator.kt — folds the blend into the final ordering, keeping the existing deterministic tie-breakers. ResultSorter/Personalizer/DomainRanker consume the order positionally, so they inherit the improvement unchanged; pin/raise/lower/block still win.
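
As a rough illustration of folding the blend into a deterministic ordering, the sketch below sorts by blended score and then falls back to tie-breakers. The `Scored` type, its fields, and the specific tie-breakers (first-seen index, then URL) are hypothetical; the PR only says the existing tie-breakers are kept, not what they are.

```kotlin
/**
 * Hypothetical result record. [factor] stands for
 * minOf(1.0, BASE + GAIN * lexical) * affinity, so it is at most 1.0.
 */
data class Scored(val url: String, val rrf: Double, val factor: Double, val firstSeen: Int)

/** Orders by blended score (highest first), then by assumed deterministic tie-breakers. */
fun order(results: List<Scored>): List<Scored> =
    results.sortedWith(
        compareByDescending<Scored> { it.rrf * it.factor }
            .thenBy { it.firstSeen } // assumed tie-breaker: earlier arrival wins
            .thenBy { it.url }       // assumed final tie-breaker: stable by URL
    )
```

Since the factor never exceeds 1.0, a result can only move down relative to its RRF position. Downstream consumers that read the list positionally need no changes.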

Multilingual (ahead of the localization pass)

Tokenization scans Unicode code points via Character.isLetterOrDigit rather than regex \w (which is ASCII-only by default on the JVM and would silently drop all non-Latin text). English stemming is gated to ASCII so non-Latin words are never corrupted; the stopword list degrades harmlessly for other languages.
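
A minimal sketch of the three pieces described above: code-point tokenization, an ASCII-gated stemmer, and stopword-filtered coverage. Everything here is illustrative. The function names, the stemming rules, the stopword subset, and the choice to return 1.0 for a query with no content terms are assumptions, not the contents of `Relevance.kt`, and the title weighting, head-term penalty, and phrase bonus are left out.

```kotlin
/** Splits [text] into lowercase tokens by scanning Unicode code points. */
fun tokenize(text: String): List<String> {
    val tokens = mutableListOf<String>()
    val current = StringBuilder()
    var i = 0
    while (i < text.length) {
        val cp = text.codePointAt(i)
        if (Character.isLetterOrDigit(cp)) {
            current.appendCodePoint(Character.toLowerCase(cp))
        } else if (current.isNotEmpty()) {
            tokens += current.toString()
            current.setLength(0)
        }
        i += Character.charCount(cp) // advance past surrogate pairs correctly
    }
    if (current.isNotEmpty()) tokens += current.toString()
    return tokens
}

/** Very light English stemmer (hypothetical rules), applied only to pure-ASCII tokens. */
fun stem(token: String): String {
    if (token.any { it.code > 127 }) return token // gate: leave non-ASCII words untouched
    return when {
        token.length > 4 && token.endsWith("ies") -> token.dropLast(3) + "y"
        token.length > 3 && token.endsWith("s") && !token.endsWith("ss") -> token.dropLast(1)
        else -> token
    }
}

// Hypothetical English-only subset; tokens in other languages simply never match it.
val STOPWORDS = setOf("the", "a", "an", "of", "and", "in", "to", "for")

fun contentTerms(text: String): Set<String> =
    tokenize(text).filterNot { it in STOPWORDS }.map { stem(it) }.toSet()

/** Fraction of the query's content terms found in [resultText]. */
fun coverage(query: String, resultText: String): Double {
    val wanted = contentTerms(query)
    if (wanted.isEmpty()) return 1.0 // assumption: nothing to match, so do not demote
    val present = contentTerms(resultText)
    return wanted.count { it in present }.toDouble() / wanted.size
}
```

With this sketch, `tokenize("Новости, Москва!")` yields `["новости", "москва"]`, and `stem` leaves both tokens unchanged because they fail the ASCII gate. The English stopword set has no effect on them either, which is the sense in which it degrades harmlessly.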

Verification

ktlintCheck, lintDebug, testDebugUnitTest, assembleDebug all green.
New RelevanceTest (14 cases, mirrors the desktop suite); AggregatorTest unchanged (7/7).
On the searchmob emulator: the query `the cure disintegration album` returns only on-topic results; a Cyrillic query (`новости москва`) keeps same-script results on top and demotes a stray Latin result to position 11.

Includes the add-relevance-ranking OpenSpec change (validates --strict).

🤖 Generated with Claude Code

Port the desktop relevance signal (shipped in SearchMob-Desktop 26.06.04) to Android. RRF alone trusts each engine's order, so with mostly single-engine results the fused scores are near-tied and an off-topic result one engine ranked highly slips into the top. This adds the missing query-match signal.

`engine/aggregate/Relevance.kt` is a 1:1 port of `engines/relevance.py`: a stopword-filtered, lightly (ASCII-gated) stemmed lexical coverage score over title and snippet with a head-term penalty and phrase bonus, plus a script-relative language affinity. The aggregator folds them into its final ordering as a demotion-only blend (factor capped at 1.0), so weak or wrong-language matches sink toward a floor while strong matches keep full engine consensus weight and the existing deterministic tie-breakers are preserved. ResultSorter, Personalizer, and DomainRanker consume the order positionally, so they inherit the improvement unchanged.

It is language-agnostic ahead of the localization pass: tokenization scans Unicode code points via Character.isLetterOrDigit rather than regex `\w` (which is ASCII-only in Java/Kotlin and would silently drop all non-Latin text), English stemming is gated to ASCII so non-Latin words are never corrupted, and the stopword list degrades harmlessly for other languages.

Verified on the searchmob emulator: an album query returns only on-topic results, and a Cyrillic query keeps same-script results on top while demoting a stray Latin result. RelevanceTest mirrors the desktop suite (14 cases); AggregatorTest is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ErikChevalier merged commit b50f047 into main Jun 3, 2026
2 checks passed

ErikChevalier deleted the feat/relevance-ranking-android branch June 3, 2026 22:17

ErikChevalier merged 1 commit into main from feat/relevance-ranking-android
