Boost results with exact order and LCS by default
Our current way of ranking results is to sort them by the total number
of words in the sentence. A problem with this approach is that the
order of the words is ignored. The top result of searching for "you
go there" is "There you go!" because it’s a shorter sentence than
"You may go there."

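A minimal PHP sketch of that old behavior, assuming PHP's built-in `str_word_count()` is an acceptable stand-in for the index's own word count (`text_len`): sorting by fewest words alone, with the query ignored, puts "There you go!" ahead of "You may go there."

```php
<?php
// Old "words" ranking, approximated: fewest words first, query ignored.
// str_word_count() stands in for the index's own word count (assumption).
$results = ['You may go there.', 'There you go!'];

usort($results, function ($a, $b) {
    return str_word_count($a) <=> str_word_count($b);
});

print_r($results); // "There you go!" (3 words) comes first
```
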
Ignoring word order is especially catastrophic in languages without
word boundaries, like Chinese, because the searched characters can
match in any order, forming something totally unrelated. For example,
the results for "可不可" in Chinese are cluttered with irrelevant
"不可something" sentences. The same goes for kana words in Japanese.

This commit addresses this problem by changing the default ranking
algorithm to one that prioritizes, in the following order:

1. sentences that contain an exact match (as if searching for
   ="you go there")
2. sentences having the longest common subsequence with the query
   (LCS, [1])
3. sentences having the fewest words
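
The three tiers above can be sketched as a lexicographic sort. This is a hypothetical PHP illustration, not Tatoeba's or Manticore's code: `tokenize()`, `exactOrder()`, `lcs()` and `rankByRelevance()` are made-up helpers, tokenization is simplified (lowercase ASCII, split on non-alphanumerics), "exact match" is approximated as "all query words appear in query order", and the textbook word-level LCS used here may differ in detail from how Manticore computes its own `lcs` factor.

```php
<?php
// Hypothetical sketch of the three-tier ordering; not the engine's implementation.

function tokenize(string $text): array
{
    // Simplified: lowercase (ASCII) and split on non-letters/non-digits.
    return preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// 1 if every query word appears in the sentence in query order (gaps allowed).
function exactOrder(array $query, array $words): int
{
    $i = 0;
    foreach ($words as $w) {
        if ($i < count($query) && $w === $query[$i]) {
            $i++;
        }
    }
    return $i === count($query) ? 1 : 0;
}

// Textbook word-level longest common subsequence length (dynamic programming).
function lcs(array $a, array $b): int
{
    $n = count($a);
    $m = count($b);
    $dp = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = 1; $i <= $n; $i++) {
        for ($j = 1; $j <= $m; $j++) {
            $dp[$i][$j] = $a[$i - 1] === $b[$j - 1]
                ? $dp[$i - 1][$j - 1] + 1
                : max($dp[$i - 1][$j], $dp[$i][$j - 1]);
        }
    }
    return $dp[$n][$m];
}

// Sort by exact order (desc), then LCS (desc), then word count (asc).
function rankByRelevance(string $query, array $sentences): array
{
    $q = tokenize($query);
    usort($sentences, function ($x, $y) use ($q) {
        $wx = tokenize($x);
        $wy = tokenize($y);
        // PHP compares same-shaped arrays element by element. The first two
        // keys compare y against x (descending); word count compares x
        // against y (ascending).
        return [exactOrder($q, $wy), lcs($q, $wy), count($wx)]
            <=> [exactOrder($q, $wx), lcs($q, $wx), count($wy)];
    });
    return $sentences;
}

print_r(rankByRelevance('you go there', ['There you go!', 'You may go there.']));
// "You may go there." now comes first
```

Comparing `[key1, key2, key3]` arrays keeps the priority order explicit: a later key only matters when all earlier keys tie.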

[1] https://docs.manticoresearch.com/latest/html/searching/search_results_ranking.html?highlight=lcs#field-level-ranking-factors

Note that there is still room for improvement: sorting by number of
words is not ideal, since very short sentences are usually not the
most relevant.
jiru committed May 15, 2019
1 parent 70cff7d commit 47b56f1
Showing 3 changed files with 4 additions and 1 deletion.
2 changes: 1 addition & 1 deletion src/Controller/SentencesController.php
@@ -100,7 +100,7 @@ class SentencesController extends AppController
     'trans_unapproved' => '',
     'trans_has_audio' => '',
     'trans_filter' => 'limit',
-    'sort' => 'words',
+    'sort' => 'relevance',
     'sort_reverse' => '',
 );

2 changes: 2 additions & 0 deletions src/Model/Table/SentencesTable.php
@@ -1339,6 +1339,8 @@ public function sphinxOptions($query, $from, $sort, $sort_reverse)
 $sortOrder = $this->orderby('@rank', $sort_reverse);
 if ($sort == 'words') {
     $rankingExpr = '-text_len';
+} elseif ($sort == 'relevance') {
+    $rankingExpr = '-text_len+top(lcs+exact_order*100)*100';
 } elseif ($sort == 'created' || $sort == 'modified') {
     $rankingExpr = $sort;
 }
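
The new ranking expression `-text_len+top(lcs+exact_order*100)*100` packs the three priorities into one number. A sketch of its arithmetic for a single field, where `top()` (the best value across fields) reduces to the value itself. `relevanceScore()` is a hypothetical helper, and the factor values passed to it are illustrative assumptions rather than values read from Manticore. Higher is better. As long as `lcs` stays below 100 and word counts differ by less than 100, each term outweighs the one after it, so the sum orders results the same way as the three-tier comparison.

```php
<?php
// Mirrors '-text_len+top(lcs+exact_order*100)*100' for a single field.
// relevanceScore() is hypothetical; the inputs below are assumed values.
function relevanceScore(int $textLen, int $lcs, int $exactOrder): int
{
    return -$textLen + ($lcs + $exactOrder * 100) * 100;
}

// Query "you go there", with assumed factor values:
$a = relevanceScore(4, 2, 1); // "You may go there.": words in query order
$b = relevanceScore(3, 2, 0); // "There you go!": order broken

echo $a, ' ', $b, PHP_EOL; // 10196 197

// Same exact_order and lcs: the shorter sentence wins by a small margin.
echo relevanceScore(3, 2, 0), ' ', relevanceScore(5, 2, 0), PHP_EOL; // 197 195
```

The multipliers of 100 act as tier separators: `exact_order` contributes 10000, each LCS word 100, and each extra word costs only 1.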
1 change: 1 addition & 0 deletions src/Template/Element/advanced_search_form.ctp
@@ -209,6 +209,7 @@ echo $this->Form->create(
 echo $this->Form->input('sort', array(
     'label' => __('Order:'),
     'options' => array(
+        'relevance' => __('Relevance'),
         'words' => __('Fewest words first'),
         'created' => __('Last created first'),
         'modified' => __('Last modified first'),