Use COLLATE NOCASE instead of LOWER() for Bio::DB::SeqFeature::Store::DBI::SQLite #66
After upgrading from BioPerl 1.6.901 to 1.6.923, I observed a significant performance regression that caused GBrowse timeouts during keyword searches. I traced it to the change to Bio::DB::SeqFeature::Store::DBI::SQLite made in commit d3af015.
When executing Bio::DB::SeqFeature::Store->get_features_by_name(), the SQL generated by the 1.6.923 version applies the LOWER() function to the "name" column, which results in a full table scan. The 1.6.901 query omits LOWER() and can therefore use the index on the "name" column.
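The difference can be reproduced outside BioPerl. The following is a minimal sketch using Python's built-in sqlite3 module; the two-column "name" table, the index name, and the query_plan helper are simplified stand-ins of my own, not the actual SeqFeature schema or the exact SQL that BioPerl generates.

```python
import sqlite3

# Hypothetical, simplified stand-in for the SeqFeature "name" table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE name (id INTEGER, name TEXT NOT NULL);
    CREATE INDEX index_name ON name (name);
""")

def query_plan(conn, sql, params=()):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column in every SQLite version.
    return " | ".join(row[-1] for row in rows)

# Wrapping the column in LOWER() hides it from the index on name.
plan_lower = query_plan(
    conn, "SELECT id FROM name WHERE LOWER(name) = LOWER('BRCA1')")

# Comparing the bare column lets SQLite search the index.
plan_plain = query_plan(conn, "SELECT id FROM name WHERE name = 'BRCA1'")

print("with LOWER():   ", plan_lower)
print("without LOWER():", plan_plain)
```

The exact wording of the plan differs between SQLite versions, but the LOWER() query should be reported as a SCAN and the plain comparison as a SEARCH using index_name.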
A more efficient way to get a case-insensitive search that can still use indexes is to declare the relevant columns with COLLATE NOCASE. This also enables the "LIKE" optimization ( http://www.sqlite.org/optoverview.html ), which in my benchmark database containing 692300 gene/mRNA/CDS features resulted in an almost 7x speedup of Bio::DB::SeqFeature::Store->search_attributes().
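As a rough illustration of the LIKE optimization, the sketch below compares two tables that differ only in the collation of the indexed column. The table and index names (attr_binary, attr_nocase, idx_binary, idx_nocase) and the sample rows are hypothetical and are not the BioPerl schema; the pattern is a literal with no leading wildcard, which the optimization requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical simplified tables; only the column collation differs.
    CREATE TABLE attr_binary (id INTEGER, attribute_value TEXT);
    CREATE TABLE attr_nocase (id INTEGER, attribute_value TEXT COLLATE NOCASE);
    -- Each index inherits the collation of the column it covers.
    CREATE INDEX idx_binary ON attr_binary (attribute_value);
    CREATE INDEX idx_nocase ON attr_nocase (attribute_value);
    INSERT INTO attr_binary VALUES (1, 'Kinase domain');
    INSERT INTO attr_binary VALUES (2, 'transporter');
    INSERT INTO attr_nocase VALUES (1, 'Kinase domain');
    INSERT INTO attr_nocase VALUES (2, 'transporter');
""")

def query_plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

like_binary = "SELECT id FROM attr_binary WHERE attribute_value LIKE 'kinase%'"
like_nocase = "SELECT id FROM attr_nocase WHERE attribute_value LIKE 'kinase%'"

# Both queries return the same row, because LIKE ignores ASCII case by default.
rows_binary = conn.execute(like_binary).fetchall()
rows_nocase = conn.execute(like_nocase).fetchall()

# Only the NOCASE index can be used to turn the prefix into a range search.
plan_binary = query_plan(like_binary)
plan_nocase = query_plan(like_nocase)

print(rows_binary, plan_binary)
print(rows_nocase, plan_nocase)
```

With the default case-insensitive LIKE, SQLite only applies the optimization when the index uses the NOCASE collating sequence, which is why the column declaration matters here.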
The changes in this pull request add the COLLATE NOCASE option to the name.name, attribute.attribute_value, and typelist.tag columns. For backwards compatibility with existing SeqFeature databases whose columns lack COLLATE NOCASE, they also replace the LOWER() function with the COLLATE NOCASE operator in the generated queries.
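To show why the operator form stays compatible, here is a small sketch that runs one query against an "old" and a "new" database. The build helper, the simplified two-column table, and the sample rows are assumptions made for illustration; the real schema and generated SQL are in the pull request's diff.

```python
import sqlite3

def build(column_decl):
    """Create a hypothetical, simplified name table with the given column type."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE name (id INTEGER, name {column_decl})")
    conn.execute("CREATE INDEX index_name ON name (name)")
    conn.executemany("INSERT INTO name VALUES (?, ?)",
                     [(1, "BRCA1"), (2, "tp53")])
    return conn

def query_plan(conn, sql, params):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

old_db = build("TEXT NOT NULL")                 # created before this change
new_db = build("TEXT NOT NULL COLLATE NOCASE")  # created after this change

# The COLLATE NOCASE operator makes the comparison case-insensitive
# regardless of how the column was declared.
sql = "SELECT id FROM name WHERE name = ? COLLATE NOCASE"
params = ("brca1",)

old_rows = old_db.execute(sql, params).fetchall()
new_rows = new_db.execute(sql, params).fetchall()
old_plan = query_plan(old_db, sql, params)
new_plan = query_plan(new_db, sql, params)

print("old schema:", old_rows, old_plan)
print("new schema:", new_rows, new_plan)
```

Both databases return the same row. In this sketch only the database whose column (and therefore index) uses NOCASE can satisfy the query from the index; the older one still has to scan, so it gives correct results but does not get the speedup.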
These changes pass the t/LocalDB/SeqFeature_SQLite.t regression test.