Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Taxon Search] Search text on any part of the word and sanitize tax_level #2372

Merged
merged 2 commits into from
Jul 1, 2019

Conversation

tfrcarvalho
Copy link
Contributor

@tfrcarvalho tfrcarvalho commented Jun 30, 2019

Description

  • Search any part of the word (as opposed to prefix search for any word)
  • Sanitize tax_levels

Notes

  • Naturally, this change makes the ElasticSearch query slower.
  • Below are a few raw results: with the code run to extract them. However, true impact depends on several factors, namely, number of results return, popularity of search due to ES cache, etc. Despite the below results I saw some instances where search could be ~10x slower than prefix search.
foo = ActionView::Base.new
foo.extend ElasticsearchHelper
foo.taxon_search("ruptri", ["species", "genus"])
foo.taxon_search("enteri", ["species", "genus"])
foo.taxon_search("salmo", ["species", "genus"])

Results for ElasticSearch queries only (species query + genus query)

Prefix search:
(middle of the word)   ruptri: 0.028 + 0.122 = 0.150
(start of second word) enteri: 0.019 + 0.052 = 0.071
(start of first word)  salmo:  0.028 + 0.091 = 0.119 

Free search: 
(middle of the word)   ruptri: 0.126 + 0.212 = 0.328 (2.2x slower)
(start of second word) enteri: 0.123 + 0.253 = 0.376 (5.3x slower)
(start of first word)  salmo:  0.138 + 0.229 = 0.367 (3.1x slower)
  • Since our current bottleneck for most use cases is not ElasticSearch but the filter by samples and projects that happens afterwords, I propose deploying this change and evaluate the performance impact on 3/4 weeks.

Tiago Carvalho added 2 commits June 29, 2019 21:13
Sanitize tax_levels to avoid SQL injection.
Copy link
Contributor

@jshoe jshoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jshoe jshoe merged commit 5e0901a into master Jul 1, 2019
@cdebourcy cdebourcy changed the title [Taxon Search] Search text on any part of the word and avoid SQL injection. [Taxon Search] Search text on any part of the word and refactor tax_level Oct 22, 2019
@cdebourcy cdebourcy changed the title [Taxon Search] Search text on any part of the word and refactor tax_level [Taxon Search] Search text on any part of the word and sanitize tax_level Oct 22, 2019
@jshoe jshoe deleted the tiago/discovery/09-extend-free-text-search branch January 11, 2020 02:06
kislyuk pushed a commit that referenced this pull request Sep 28, 2022
* migrate dropdowns

* clean up migration

* fix es-lint errors

* add props to PortalDropdown

* remove unused variables

* remove more unused variables
kislyuk pushed a commit that referenced this pull request Sep 29, 2022
kislyuk pushed a commit that referenced this pull request Sep 29, 2022
…ew copy (#2389)

* add sample_name to amr wdl

* create bulk download for amr combined results

* Revert "TS Migrate Dropdowns Folder (part of Phase 1) (#2372)"

This reverts commit 973440f.

* fix arguments

* change output to output_path

* remove .headers

* tweak

* tweak .tsv to .csv
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants