How to add indexes #15

Closed
tcocca opened this Issue Oct 13, 2011 · 16 comments

Comments

9 participants

tcocca commented Oct 13, 2011

So I have a model, Property, and I want to use a simple search_scope:

class Property < ActiveRecord::Base
  include PgSearch
  pg_search_scope :search_by_keywords, :against => :description
end

I opened script/console and did a simple: Property.search_by_keywords('test') and it generated the following query:

SELECT "properties".*, (ts_rank((to_tsvector(E'simple', coalesce("properties"."description", ''))), (to_tsquery(E'simple', E''' ' || E'test' || E' ''')))) AS pg_search_rank FROM "properties"   WHERE (((to_tsvector(E'simple', coalesce("properties"."description", ''))) @@ (to_tsquery(E'simple', E''' ' || E'test' || E' '''))))  ORDER BY pg_search_rank DESC, "properties"."id" ASC

How would I go about adding a fulltext index for this search scope so that the query is fast.

Keep in mind eventually I will want to do something like:

Property.search_by_keywords('town house').paginate(:conditions => {:state => 'MA', :price => (0..250000)}, :order => "price asc", :page => 1, :per_page => 25)

Or you get the point, we are going to be combining the search_scope for keywords with some other complex filtering. Any ideas on what the best index syntax is to use to add the index on the fulltext for description?

~ Tom

Collaborator

nertzy commented Oct 13, 2011

You should index the expression to the left of @@

(((to_tsvector(E'simple', coalesce("properties"."description", ''))

You should also use gist as the index strategy. So it will look something like this:

CREATE INDEX properties_search_by_keywords ON properties USING gist (((to_tsvector(E'simple', coalesce("properties"."description", ''));

Let me know if that works well for you! I hope to get PgSearch to generate index migrations at some point in the future. Until then, I should really get something into the Github wiki.

Collaborator

nertzy commented Oct 13, 2011

To address your final question:

PostgreSQL will intelligently use multiple indexes in the query planner, so it's generally good enough to make separate indexes for the various expressions that are expensive to calculate and sort on the fly.

The great thing about the query planner is that if it ever has to do a full table scan on a query that has more conditions on it, then it will start by using the conditions first to cut down the table size before doing the more expensive per-row calculation.

nertzy closed this Oct 13, 2011

tcocca commented Oct 14, 2011

@nertzy Thanks for reply. I had found this on stackoverflow as well http://stackoverflow.com/questions/7656864/full-text-search-on-heroku-using-pg-search-gem

Not sure what the difference between a gist and gin index is. I should probably look into that.

I had added my index as:

CREATE INDEX CONCURRENTLY properties_keywords_idx ON properties USING gin(to_tsvector('english', coalesce("properties"."description", '')));

so the only difference was I was specifying the english dictionary and a gin vs. gist index.

Thanks again, nice gem!

darkleaf commented Jul 9, 2012

It's work:

CREATE INDEX object_models_search_idx ON object_models USING gin( (to_tsvector('simple', coalesce("object_models"."name"::text, '')) || to_tsvector('simple', coalesce("object_models"."short_info"::text, '')) ) );

gin( ( ~~~ || ~~~ ) )

need double parentheses

Collaborator

nertzy commented Sep 23, 2012

Yes, actually I was wrong before when I said you have to use gist. Both gist and gin work well. And you're right, you definitely need to have an additional set of parentheses.

Something like this:

USING gin( ([expression goes here]) )

or if you want to use a special GIN index type (like for trigrams)

USING gin( ([expression goes here]) gin_trgm_ops)
Collaborator

stevenharman commented Dec 11, 2012

Any chance of getting some docs up (README or wiki) on adding and using tsvector indexes? They would be especially useful for multisearch, as it is a bit more complicated.

+1!

+1 to that guide or an index migration generator, this can be tricky.

Just in case it helps some people, here's how I set it up using tsv columns, triggers and indexes. It's super fast, something like 40ms to search on 200K documents.

# app/models/post.rb
class Post < ActiveRecord::Base
  include PgSearch
  pg_search_scope :search_full_body,
                  :against => :body,
                  :using => { :tsearch => { :dictionary => 'english',
                                            :tsvector_column => 'tsv_body' }}
end

# DB migration
def up
  # Adds a tsvector column for the body
  add_column :posts, :tsv_body, :tsvector

  # Adds an index for this new column
  execute <<-SQL
    CREATE INDEX index_posts_tsv_body ON posts USING gin(tsv_body);
  SQL

  # Updates existing rows so this new column gets calculated
  execute <<-SQL
    UPDATE posts SET tsv_body = (to_tsvector('english', coalesce(body, '')));
  SQL

  # Sets up a trigger to update this new column on inserts and updates
  execute <<-SQL
    CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
    ON posts FOR EACH ROW EXECUTE PROCEDURE
    tsvector_update_trigger(tsv_body, 'pg_catalog.english', body);
  SQL
end
Collaborator

stevenharman commented Jul 31, 2013

@jaimeiniesta Curious, can you do this and still use the schema.rb, or did you have to switch to a SQL dump?

@stevenharman in order to apply the migration you don't need to switch to a SQL dump, but you should if you want to recreate the DB schema on another system. So yes I recommend switching to SQL.

Collaborator

stevenharman commented Jul 31, 2013

@jaimeiniesta Makes sense for Rails, by default. However, as of ActiveRecord 4.0, you can add indexes and specify the "type" of index:

add_index(:posts, :tsv_body, using: 'gin')

And I've been looking at hair_trigger to add database triggers to ActiveRecord:

create_trigger(compatibility: 1).on(:posts).before(:insert, :update) do
  "tsvector_update_trigger(tsv_body, 'pg_catalog.english', body);"
end

With that I think you could stick with the schema.rb, maybe? :)

Collaborator

nertzy commented Aug 3, 2013

A few months ago I added a page to the wiki called Building Indexes. I'd encourage everyone in this thread to look it over. Please feel free to make any edits or additions that could help people out in the future. Thanks!

@jaimeiniesta I wish I could vote up your comment or something. Thanks for the code example for creating the index and trigger for the separate tsv columns; it was very helpful!

Has this been working well for you? I just wondered if not having the coalesce function for body during the tsvector_update_trigger call has caused any unexpected behavior? If so, I was thinking of perhaps creating the function for the trigger manually to include the coalesce. Something like this:

CREATE FUNCTION tsv_body_update() RETURNS TRIGGER AS $$
BEGIN
    new.tsv_body = to_tsvector('pg_catalog.english', COALESCE(new.body, ' '));
    RETURN new;
END
$$ LANGUAGE 'plpgsql';

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON posts FOR EACH ROW EXECUTE PROCEDURE tsv_body_update();

@ewherrmann thanks! yes, it worked fine and no unexpected behavior was found during the time the related feature was active in our project.

@jaimeiniesta this is great - thanks for sharing! how would the syntax on your migration look for multiple columns? would you need one trigger for each column?

kelso commented Oct 1, 2015

@jaimeiniesta and the solution can be complete if you know how to setup tsvector speedup and indexes while using multisearchable approach. Everything above seems to be working with pg_search_scope approach.

While using multisearch the problem is it's looking for tsv_body in table pg_search_documents instead of just posts: 😫

ActiveRecord::StatementInvalid in SearchResults#index
PG::UndefinedColumn: ERROR: column pg_search_documents.tsv_body does not exist

voodoorai2000 referenced this issue in consul/consul Nov 13, 2015

Closed

Mejorar rendimiento de las búsquedas #694

2 of 5 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment