Strange behaviour when searching string with hyphen / dash #349

Open
alpracka opened this issue Jan 26, 2017 · 4 comments

@alpracka

Hi, I've found a very similar issue, #19, and this is possibly related to #117.

I have a simple setup:

class Product < ActiveRecord::Base
  include PgSearch
  pg_search_scope :search_by_name, against: :name, using: {
    tsearch: {
      prefix: true
    }
  }
end

And these troubles:

Product.create(name: 'Hikvision DS-7604NI-E1/4P/A')
Product.search_by_name("Hikvision DS-7604NI-E1/4P/A").any?
# => true
Product.search_by_name("Hik DS-7604NI").any?
# => true
Product.search_by_name("Hik DS-7604").any?
# => false # true expected
Product.search_by_name("DS-7604").any?
# => false # true expected

I've tried several options, but nothing works as expected. I also tried replacing the hyphen/dash with a space character in the query. Then Product.search_by_name("DS 7604").any? # => true, but Product.search_by_name("Hikvision DS 7604NI E1/4P/A").any? # => false

Tested with pg 9.6. Sorry if this isn't related to the gem, but I don't know how pg tsearch works yet, so I'm trying here first. Thanks for the help.
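
The experiment above normalized only the query string, while the stored name kept its hyphens and slashes. One possible workaround, which is an assumption and not something confirmed in this thread, is to apply the same normalization to both the indexed text (for example in a separate column) and the query. The helper name `normalize_for_search` is hypothetical; a minimal sketch:

```ruby
# Hypothetical helper: collapse every run of non-alphanumeric characters
# (hyphens, slashes, and so on) into a single space.
def normalize_for_search(text)
  text.to_s.gsub(/[^[:alnum:]]+/, " ").strip
end

puts normalize_for_search("Hikvision DS-7604NI-E1/4P/A")
puts normalize_for_search("DS-7604")
```

Whether this gives the expected matches still depends on how PostgreSQL tokenizes the normalized text, so it is worth checking against real data.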

@kluzny

kluzny commented Apr 8, 2017

A few possible things, though I'm just an amateur myself:

The hyphen behaviour is based on how PG splits up the words. PG uses dictionaries to define how to break up the words into tokens or lexemes: https://www.postgresql.org/docs/9.1/static/textsearch-dictionaries.html

By default, pg ships with simple and, possibly depending on your locale, english. Slashes and hyphens aren't tokenized, and I think in the simple dictionary they are used as word boundaries. Using a custom dictionary looks like kind of a hassle, but it will let you tokenize differently.

You might want to switch to trigram; it might give you better results by matching on similarity instead of on prefixes.
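
To see how a given string is actually split up, PostgreSQL's built-in `ts_debug` function can be queried directly. The sketch below only builds the SQL text; the helper names `sql_literal` and `ts_debug_sql` are made up for illustration, and the manual quoting is a simplification.

```ruby
# Doubles single quotes so the text can sit inside a SQL string literal.
# (Simplified quoting for illustration only.)
def sql_literal(text)
  "'#{text.gsub("'", "''")}'"
end

# Builds a query against ts_debug, which reports the token type (alias),
# the raw token, and the resulting lexemes for each piece of the input.
def ts_debug_sql(text, config: "simple")
  "SELECT alias, token, lexemes FROM ts_debug(#{sql_literal(config)}, #{sql_literal(text)})"
end

puts ts_debug_sql("DS-7604")
puts ts_debug_sql("Hikvision DS-7604NI-E1/4P/A")
```

In a Rails console the generated SQL could be run with something like `ActiveRecord::Base.connection.exec_query(...)`. Comparing the rows for the stored name and for the failing query should show where the tokens diverge.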

@vpereira

vpereira commented Jan 12, 2023

Ping, any idea how to solve this?

I'm using pg_search to search through paths like /var/foo and /var/foo-bar, and it doesn't return anything.

My code looks like this:

class Repository < ApplicationRecord
  include PgSearch::Model
  belongs_to :category
  pg_search_scope :search_name, against: %i[name path], using: :trigram
end

Calling Repository.search_name "bci" generates the following query:

  Repository Load (2.4ms)  SELECT "repositories".* FROM "repositories" INNER JOIN (SELECT "repositories"."id" AS pg_search_id, (ts_rank((to_tsvector('simple', coalesce("repositories"."name"::text, '')) || to_tsvector('simple', coalesce("repositories"."path"::text, ''))), (to_tsquery('simple', ''' ' || 'bci' || ' ''')), 0)) AS rank FROM "repositories" WHERE ('bci' % (coalesce("repositories"."name"::text, '') || ' ' || coalesce("repositories"."path"::text, '')))) AS pg_search_a8038884f8ec5d3389ecea ON "repositories"."id" = pg_search_a8038884f8ec5d3389ecea.pg_search_id ORDER BY pg_search_a8038884f8ec5d3389ecea.rank DESC, "repositories"."id" ASC
(Object doesn't support #inspect)                                                                     
=>                                              

I definitely have objects in my db that should be found:

irb(main):018:0> Repository.all.map { |c| c.name.match?(/^bci\//) }.count
  Repository Load (3.9ms)  SELECT "repositories".* FROM "repositories"
=> 48  

In theory, my db has the necessary extensions:

root@4fb9d81eb9c7:/# apt-get install postgresql-common
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
postgresql-common is already the newest version (246.pgdg110+1).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
root@4fb9d81eb9c7:/# exit
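
The generated query above applies the `%` operator to the whole concatenated name and path, so a short term like "bci" is compared against a much longer string. The pure-Ruby sketch below approximates pg_trgm's documented behaviour (lowercasing, splitting on non-alphanumeric characters, padding each word with two leading spaces and one trailing space) to illustrate why the score can fall under the default 0.3 threshold. It is not the extension's actual implementation, and the sample name and path are invented.

```ruby
require "set"

# Approximates pg_trgm trigram extraction: lowercase, split on
# non-alphanumeric characters, pad each word, take 3-character windows.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).flat_map do |word|
    padded = "  #{word} "
    (0..padded.length - 3).map { |i| padded[i, 3] }
  end.to_set
end

# Shared trigrams divided by the number of distinct trigrams in either string.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = ta | tb
  return 0.0 if union.empty?

  (ta & tb).size.to_f / union.size
end

query = "bci"
document = "bci/bci-base /var/lib/repos/bci/bci-base" # invented name + path

puts trigram_similarity(query, query)
puts trigram_similarity(query, document)
```

If this is what is happening, the `threshold` and `word_similarity` options that pg_search documents for its trigram feature may be worth trying.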

@9mm

9mm commented Apr 9, 2023

I also find this highly annoying. I didn't expect to get this far and be stopped by a hyphen... my database is mostly model numbers.

@guar47

guar47 commented Jun 9, 2023

Has anyone managed to solve this? I haven't tried trigram yet, but using different dictionaries doesn't help.
