Boost results with exact order and LCS by default
Our current way of ranking results is to sort them by the total number of words in the sentence. A problem with this approach is that word order is ignored: the top result when searching for "you go there" is "There you go!" because it’s a shorter sentence than "You may go there."

Ignoring word order is especially catastrophic for languages without word boundaries, like Chinese, because the searched characters can match in any order, producing something totally unrelated. For example, the results for "可不可" in Chinese are cluttered with irrelevant "不可something". The same applies to kana words in Japanese.

This commit addresses this problem by changing the default ranking algorithm to one that prioritizes, in the following order:

1. sentences that contain an exact match (as if searching for ="you go there")
2. sentences having the "longest common subsequence" (LCS, [1])
3. sentences having the fewest words

Note that there is still room for improvement: sorting by number of words is not ideal, as very short sentences are usually not the most relevant.

[1] https://docs.manticoresearch.com/latest/html/searching/search_results_ranking.html?highlight=lcs#field-level-ranking-factors
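
The three-tier ordering described above can be sketched as a sort key. This is only an illustration in Python, not the code in this commit and not how the search engine computes its ranking factors. The names `tokenize`, `longest_common_run` and `rank` are hypothetical, the tokenizer is a naive word splitter that would not handle Chinese or Japanese, and the LCS factor is approximated here as the longest run of consecutive query words found in the same order in the sentence.

```python
import re


def tokenize(text):
    # Hypothetical, naive tokenizer: lowercase runs of word characters.
    # A real search engine tokenizes (and stems) very differently.
    return re.findall(r"\w+", text.lower())


def longest_common_run(query_words, sentence_words):
    # Length of the longest run of consecutive words that appears in both
    # lists in the same order. Used as a rough stand-in for the LCS factor.
    best = 0
    prev = [0] * (len(sentence_words) + 1)
    for q in query_words:
        cur = [0] * (len(sentence_words) + 1)
        for j, s in enumerate(sentence_words, start=1):
            if q == s:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def rank(query, sentences):
    query_words = tokenize(query)

    def key(sentence):
        words = tokenize(sentence)
        run = longest_common_run(query_words, words)
        exact = run == len(query_words)
        # Tier 1: exact matches first. Tier 2: longer common run first.
        # Tier 3: fewer words first.
        return (not exact, -run, len(words))

    return sorted(sentences, key=key)


results = rank(
    "you go there",
    ["There you go!", "You may go there.", "Do you go there often?"],
)
```

In this sketch the exact-match tier is implied by the run length, and it is kept as a separate key only to mirror the three tiers listed above. Note also that "There you go!" still ranks above "You may go there." because both share a two-word run with the query and the tie is broken by sentence length, which is the kind of remaining weakness the note above mentions.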