Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[#1792] Optimize FTS queries on non-indexed fields
As per the commit aa51630, we can't create full-text search indexes on all field types. Specifically, we're only indexing textual and numeric fields. Even though we don't index other fields, the users are still able to do FTS queries on them, and we also use them on the autocomplete, so we'd like them to be relatively fast. When creating a datastore table, we add a "_full_text" column with the content of all the other columns. This is used when the user wants to do a FTS on the entire resource, regardless of columns. The idea behind the optimization in this patch is to use that "_full_text" column to limit the number of rows Postgres will search. While testing, I've used a sample table with 1 million rows and 540 MB. Among others, it has a column named "timestamp" with (obviously) a timestamp. There're no indexes on this column. I've done 2 tests: (a) do a FTS only on the timestamp column, and; (b) do a FTS on both the timestamp and _full_text columns. The results below are the average of running the tests 5 times. (a) Querying for '06' just on the timestamp field took 5682ms; (b) Querying for '06' both on the _full_text and timestamp fields took 90ms For this test, querying for both _full_text and timestamp made the query ~63 times faster. Unfortunately, the result depends a lot on the query terms and the data. On our test data, every timestamp is from 2014. If instead of searching for '06' we search for '2014', this is what we get (again, repeated 5 times and averaged out): (a) Querying for '2014' just on the timestamp field took 12455ms; (a) Querying for '2014' just on the _full_text and timestamp fields took 14534ms; Using the _full_text field made this query ~17% slower. The problem here is that this optimization only makes sense if quering the _full_text reduces the number of rows that we have to query on the timestamp field. When that's not the case, it makes the query slower. In my opinion, this patch is still worthy.
- Loading branch information
1 parent
aa51630
commit c033efb
Showing
7 changed files
with
178 additions
and
27 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters