Prefix search returns suboptimal ranking #500

Open
mak-dunkelziffer opened this issue Jan 16, 2023 · 0 comments

@mak-dunkelziffer

In one of my `pg_search_scope`s, turning on prefix search yields very odd ranks. Apart from slightly adjusting the rank (e.g. by a factor of 0.5 for old items), I don't do anything tricky:

```ruby
pg_search_scope :search,
  against: :text,
  using: {
    tsearch: {
      tsvector_column: 'search_tsvector',
      prefix: true,
      negation: true,
      dictionary: 'simple',
      normalization: 0,
    }
  },
  ranked_by: <<-SQL
    trunc(
      :tsearch * 1000000
      -- * (boost): slight boosting of results according to certain flags or item age,
      -- but never more than by a combined factor of 8.
    )
  SQL
```
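To make the scaling in `ranked_by` concrete, here is a small Ruby emulation of the arithmetic. It is only a sketch: `final_rank`, `MAX_BOOST` and the single `boost` argument are hypothetical stand-ins for the elided flag/age factors, and the cap of 8 simply encodes the bound stated in the scope's comment. SQL `trunc` and Ruby `Float#truncate` both truncate toward zero.

```ruby
# Hypothetical emulation of: trunc(:tsearch * 1000000 * boost)
# `boost` stands in for the elided flag/age factors (assumption).
MAX_BOOST = 8.0 # upper bound on the combined boost, as stated in the issue

def final_rank(tsearch_rank, boost = 1.0)
  # Cap the boost at MAX_BOOST, scale the raw rank, then drop the fraction.
  (tsearch_rank * 1_000_000 * [boost, MAX_BOOST].min).truncate
end

p final_rank(0.25)       # => 250000
p final_rank(0.25, 0.5)  # => 125000 (e.g. an old item)
p final_rank(0.25, 20)   # => 2000000 (boost capped at 8)
```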

The query gives low ranks to obviously important items (20+ occurrences of the search term) and produces an odd distribution of ranks. In general, I would expect the ranks of neighboring results to differ by maybe 10% on average, but instead I get ranks like [1'000'000, 30'000, 500, 10, ...].
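For comparison with the expected ~10% step between neighbors, the relative drop between consecutive ranks in the reported list can be computed directly. This is a quick sketch; `neighbor_drops` is a hypothetical helper and the list is the one quoted above.

```ruby
# Ranks reported in the issue (thousands separators dropped).
RANKS = [1_000_000, 30_000, 500, 10].freeze

# Relative drop from each result to the next one, in percent.
def neighbor_drops(ranks)
  ranks.each_cons(2).map { |higher, lower| ((higher - lower) * 100.0 / higher).round(1) }
end

p neighbor_drops(RANKS) # => [97.0, 98.3, 98.0]
```

Each step loses roughly 97 to 98% of the rank, about ten times more than the expected ~10%.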

With gaps this large, no custom rank boosting can change the order of the results. More importantly, I could understand such a clear-cut result if the best match were at the top, but it isn't.
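The point about boosting can be checked with a few lines, assuming the combined boost can change the relative rank of two results by at most a factor of 8: a lower neighbor can only overtake a higher one if multiplying it by that maximum boost exceeds the higher rank. `reorderable_pairs` is a hypothetical helper.

```ruby
# For each pair of neighbors, could boosting the lower one by at most
# max_boost lift it above the higher one? (Assumes a max relative boost of 8.)
def reorderable_pairs(ranks, max_boost = 8)
  ranks.each_cons(2).map { |higher, lower| lower * max_boost > higher }
end

p reorderable_pairs([1_000_000, 30_000, 500, 10]) # => [false, false, false]
p reorderable_pairs([100, 90, 81])                # => [true, true]
```

With the reported ranks, no pair can be swapped; with ~10% steps, every pair could be.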

This huge spread of ranks only happens with `prefix: true, any_word: false`. For the other three combinations of these flags, the ranks have a saner distribution, are much closer to each other, and the obvious best result is on top.
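To see how the four flag combinations differ, the following deliberately simplified sketch shows the shape of the tsquery text each one corresponds to. `sketch_tsquery` is hypothetical and is not pg_search's actual query builder (which also handles quoting, negation and the dictionary); it only illustrates that `prefix: true` turns each term into a PostgreSQL prefix match (`term:*`) and that `any_word` chooses between AND (`&`) and OR (`|`).

```ruby
# Simplified, hypothetical illustration of the tsquery shape per flag combination.
def sketch_tsquery(terms, prefix:, any_word:)
  parts = terms.map { |t| prefix ? "#{t}:*" : t } # ':*' marks a prefix match
  parts.join(any_word ? " | " : " & ")           # OR for any_word, AND otherwise
end

terms = %w[foo bar]
[false, true].product([false, true]).each do |prefix, any_word|
  query = sketch_tsquery(terms, prefix: prefix, any_word: any_word)
  puts format("prefix: %-5s any_word: %-5s => %s", prefix, any_word, query)
end
```

Under this simplification, `prefix: true, any_word: false` is the only combination in which every term is a prefix match and all of them must match.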

Is there a known problem with this combination? Is this possibly a bug, or is there a logical reason why this combination behaves differently from the others? Also, are there more advanced ways of debugging this than simply displaying the rank in the output?
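One debugging option beyond printing the final rank is to compare raw `ts_rank` values for several tsquery variants side by side in psql, without the `ranked_by` scaling. The sketch below only generates that SQL; `ts_rank` and `to_tsquery` are PostgreSQL functions, while the helper name, the `items` table, its `id` column and the variant labels are hypothetical. Normalization `0` mirrors the scope's `normalization: 0`.

```ruby
# Hypothetical helper: builds a query listing raw ts_rank values per variant.
# Table and column names are interpolated unquoted, so pass only trusted names.
def rank_comparison_sql(table:, column:, dictionary:, variants:, limit: 20)
  # Escape a value as a single-quoted SQL string literal.
  quote = ->(s) { "'" + s.gsub("'", "''") + "'" }
  columns = variants.map do |name, tsquery|
    "ts_rank(#{column}, to_tsquery(#{quote.(dictionary)}, #{quote.(tsquery)}), 0) AS #{name}"
  end
  <<~SQL
    SELECT id,
           #{columns.join(",\n       ")}
    FROM #{table}
    ORDER BY #{variants.keys.first} DESC
    LIMIT #{limit};
  SQL
end

puts rank_comparison_sql(
  table: "items", column: "search_tsvector", dictionary: "simple",
  variants: { prefix_and: "foo:* & bar:*", prefix_or: "foo:* | bar:*", plain_and: "foo & bar" }
)
```

Pasting the output into psql shows, row by row, how the prefix-AND rank compares with the other variants for the same documents.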

I would really like to keep the prefix search without messing up all of the ranks.
