[search] Improve clnsig analysis (#486)
* Improve `clinvarVcf.CLNSIG` analysis so that we can autocomplete `pathogen` without matching `conflicting_classifications_of_pathogenicity`. This is accomplished by blocking terms that start with "conflicting" and applying an edge n-gram transform to all other values, which gives a good balance between accuracy and sensitivity.

With this, `clinvarVcf.CLNSIG:pathogenic` matches `likely_pathogenic` and `pathogenic`, even when the terms are comma separated. However, `conflicting_classifications_of_pathogenicity` will not match if it is the only term (`conflicting_classifications_of_pathogenicity,pathogenic` would still match).

Live on bystro-dev.

Test queries:

1. `other` matches anything with `other` as a complete term (meaning `conflicting_classifications_of_pathogenicity,other` matches)
2. `pathogenic` https://bystro-dev.emory.edu/results?_id=6637a7caa0e17a1660ba743b&search&q=clinvarvcf.clnsig:pathogenic&size=10&from=0
3. `likely_pathogenic` https://bystro-dev.emory.edu/results?_id=6637a7caa0e17a1660ba743b&search&q=clinvarvcf.clnsig:likely_pathogenic&size=10&from=0
4. `(likely pathogenic)` https://bystro-dev.emory.edu/results?_id=6637a7caa0e17a1660ba743b&search&q=clinvarvcf.clnsig:(likely%20pathogenic)&size=10&from=0
5. `pa`, `pat`, `path`, `pathogen`, etc. all match the same set, because only `pathogenic` starts with these characters: https://bystro-dev.emory.edu/results?_id=6637a7caa0e17a1660ba743b&search&q=clinvarvcf.clnsig:pathog&size=10&from=0
6. `like`, `likel`, `likely`, etc. all match terms starting with `likely`, such as `likely_pathogenic` and `likely_benign`: https://bystro-dev.emory.edu/results?_id=6637a7caa0e17a1660ba743b&search&q=clinvarvcf.clnsig:likely&size=10&from=0
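
The matching behavior described in this commit message can be approximated in a few lines of Python. This is only an illustrative sketch, not the analyzer configuration changed by this commit: the function names (`edge_ngrams`, `index_tokens`, `matches`), the minimum n-gram length of 1, and the splitting of terms on underscores are all assumptions made for the example. Multi-word queries such as `likely_pathogenic` or `(likely pathogenic)` are not modeled.

```python
def edge_ngrams(word, min_gram=1):
    """Return every prefix of `word` from min_gram characters up to the full word.

    min_gram=1 is an assumption for this sketch; the real analyzer may differ.
    """
    return {word[:n] for n in range(min_gram, len(word) + 1)}


def index_tokens(clnsig, blocked_prefix="conflicting"):
    """Simulate index-time analysis of one CLNSIG value (hypothetical helper).

    Comma-separated terms that start with `blocked_prefix` are dropped whole;
    each remaining term is split on underscores (an assumption) and every
    word is expanded into its edge n-grams.
    """
    tokens = set()
    for term in clnsig.lower().split(","):
        term = term.strip()
        if not term or term.startswith(blocked_prefix):
            continue  # blocked terms contribute no tokens at all
        for word in term.split("_"):
            if word:
                tokens |= edge_ngrams(word)
    return tokens


def matches(query, clnsig):
    """True if a single-word query equals any indexed token."""
    return query.lower() in index_tokens(clnsig)


if __name__ == "__main__":
    blocked = "conflicting_classifications_of_pathogenicity"
    print(matches("pathogenic", "likely_pathogenic"))
    print(matches("pathog", "pathogenic"))
    print(matches("pathogenic", blocked))
    print(matches("pathogenic", blocked + ",pathogenic"))
    print(matches("other", blocked + ",other"))
```

Dropping the blocked term before n-gram expansion is what keeps `pathogenicity` from ever producing the `pathogen` prefix, while a sibling term in the same comma-separated value is still indexed normally.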
Showing 3 changed files with 77 additions and 45 deletions.