Skip to content

Avoid casting a tsvector column to text and back again#247

Closed
savef wants to merge 1 commit into
Casecommons:masterfrom
savef:master
Closed

Avoid casting a tsvector column to text and back again#247
savef wants to merge 1 commit into
Casecommons:masterfrom
savef:master

Conversation

@savef
Copy link
Copy Markdown

@savef savef commented May 19, 2015

When you have a tsvector column but want to search against both it and other text columns (without having to deal with triggers) the :tsvector_column option is no use as it causes everything in :against to be ignored. Also since a column's type is available I don't see why the need to specify something as a tsvector column. This change allows you to simply pass column names to :against, and if any of those are tsvectors then they will be used directly, otherwise they will be converted to tsvectors as before.

My data set is over 900 records and as well as the document I search 4 additional small text fields, also there is an postgres gin index being used. On a query that matched just 4 results the time improved from 40ms to 3ms. And on a query that matched 397 results the time improved from over 3000ms to 60ms.

I think this essentially solves the problem that the :tsvector_column option tries to solve, but automatically. It takes the columns in :against and if they are a tsvector it does not cast them (to text, and back to tsvector). I think the only slight difference is that with this commit the column will still be coalesced with "".

Here's what the relevant part of an example query looks like:
setweight(to_tsvector('english', coalesce("documents"."title"::text, '')), 'A') || setweight(to_tsvector('english', coalesce("documents"."author"::text, '')), 'B') || setweight(coalesce("documents"."content_tsv", ''), 'C')


Thanks for reading, if this is something you'll consider merging here are a couple of followup questions.

If you agree that this renders the :tsvector_column option obsolete is it worth removing that feature as part of this PR?

Do the changes here apply to either trigram or dmetaphone searching? I don't use those but would look at improving them too if the changes are relevant.

When you have a tsvector column but want to search against both it and other text columns (without having to deal with triggers) the :tsvector_column option is no use as it causes everything in :against to be ignored. Also since a column's type is available I don't see why the need to specify something as a tsvector column. This change allows you to simply pass column names to :against, and if any of those are tsvectors then they will be used directly, otherwise they will be converted to tsvectors as before.

My data set is over 900 records and as well as the document I search 4 additional small text fields, also there is an postgres gin index being used. On a query that matched just 4 results the speed improved from 40ms to 3ms. And on a query that matched 397 results the speed improved from over 3000ms to 60ms.
@savef
Copy link
Copy Markdown
Author

savef commented Jun 16, 2015

@nertzy Hey, would it be possible for me to get any feedback on this? Thanks!

@savef
Copy link
Copy Markdown
Author

savef commented Aug 24, 2015

I want this to point to a different branch now, I'll open a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant