The bad news is that both querying Solr directly and querying it through the NameRes frontend give significantly worse results than the old system. For example, querying https://name-resolution-sri.renci.org/docs for blood gives us UBERON:0000178, NCIT:C12434 and UMLS:C0851353 (all meaning "blood") followed by UMLS:C0851353 ("bloody"). But running the same query on http://name-resolution-sri-dev.apps.renci.org/docs gives us UMLS:C5169928 ("JWH-073 3-hydroxybutyl (synthetic cannabinoid metabolite) | Blood | Drug toxicology"), UMLS:C5171063 ("Lindane | Blood | Drug toxicology"), UMLS:C0312901 ("Blood group antigen IBH") and a bunch of others.
One possible reason for this is that I've indexed the names field as a multiValued field (since it contains multiple values). Changing it to a non-multiValued field definitely helps with the results in Solr, but it causes NameRes to no longer work. I'll try fixing that and see if that solves this bug. If not, I'll probably need some help with the Solr querying and indexing aspect of all this.
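For reference, here is a minimal sketch of what switching the field to non-multiValued could look like through Solr's Schema API, which accepts a `replace-field` command POSTed to the `/solr/<core>/schema` endpoint. The field type `text_general` and the `indexed`/`stored` settings are assumptions, not taken from the NameRes configuration, and the helper name is hypothetical. The sketch only builds the payload; it does not contact Solr.

```python
import json


def replace_field_payload(name: str, field_type: str, multi_valued: bool) -> dict:
    """Build a Schema API `replace-field` command for one field.

    Hypothetical helper: the field type and the indexed/stored flags are
    assumptions and should be checked against the real schema.
    """
    return {
        "replace-field": {
            "name": name,
            "type": field_type,
            "multiValued": multi_valued,
            "indexed": True,
            "stored": True,
        }
    }


# Payload that would redefine `names` as a single-valued field.
payload = replace_field_payload("names", "text_general", False)
print(json.dumps(payload, indent=2))
```

Changing a field definition like this generally requires reindexing the data, and every document would then need to carry its names as a single value.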
This seems to be caused by the query being names:{fragment}*. Removing the asterisk fixes this problem, and the query (preferred_name:{fragment}^10 OR names:{fragment} OR names:{fragment}*) works pretty well:
# Using names:{fragment}* causes Solr to prioritize some odd results;
# using names:{fragment} OR names:{fragment}* should cause it to still
# include those results while prioritizing complete fragments.
f"(preferred_name:{fragment}^10 OR names:{fragment} OR names:{fragment}*)"
for fragment in fragments if len(fragment) > 0
]
(The query (preferred_name:{fragment}^10 OR names:{fragment}*) still prioritizes odd results over anything that isn't a preferred-name match, and (preferred_name:{fragment}^10 OR names:{fragment}) fails to match when the fragment is incomplete, e.g. Alzheimer disease matches but Alzheimer's disease fails.)
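To make the comparison concrete, here is a small self-contained sketch that builds the three-clause query for each fragment of a search string. Splitting the input on spaces and joining the per-fragment clauses with AND are assumptions for illustration; the snippet above does not show how NameRes produces `fragments` or combines the clauses. The function name `build_query` is hypothetical.

```python
def build_query(text: str) -> str:
    """Build a Solr query string with one three-part clause per fragment."""
    # Assumption: fragments are the space-separated pieces of the input.
    fragments = text.split(" ")
    clauses = [
        # Boost preferred-name matches, then exact matches on names, then
        # prefix matches on names so incomplete fragments still match.
        f"(preferred_name:{fragment}^10 OR names:{fragment} OR names:{fragment}*)"
        for fragment in fragments
        if len(fragment) > 0  # skip empty pieces from repeated spaces
    ]
    # Assumption: every fragment must match, so clauses are ANDed together.
    return " AND ".join(clauses)


print(build_query("blood"))
```

This sketch does not escape characters that are special in Solr query syntax, so it is only suitable for plain alphanumeric fragments.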
I've set up a NameRes instance on Sterling (accessible in the RENCI VPN only) at http://name-resolution-sri-dev.apps.renci.org/docs using the new synonym format we've built for NameRes (#46, helxplatform/translator-devops#634, TranslatorSRI/Babel#113).
You can also directly access the underlying Solr database by running:
and then accessing http://localhost:8983/ on your computer.
Searching with Solr gives slightly more relevant results, but not the really good results that https://name-resolution-sri.renci.org/docs gives.
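To compare rankings directly against Solr, a query can be sent to the standard `/select` handler on the forwarded port. The sketch below only builds the request URL. The core name `name_lookup` is a placeholder (the actual core name is not given here), and the helper name is hypothetical. Requesting `fl=*,score` asks Solr to return each document's relevance score, which helps show why an odd result outranks an exact match.

```python
from urllib.parse import urlencode


def solr_select_url(query: str, core: str = "name_lookup",
                    base: str = "http://localhost:8983", rows: int = 10) -> str:
    """Build a URL for Solr's /select handler.

    The default core name is a placeholder, not the real NameRes core.
    """
    params = urlencode({
        "q": query,        # the Lucene-syntax query string
        "rows": rows,      # number of results to return
        "fl": "*,score",   # all stored fields plus the relevance score
    })
    return f"{base}/solr/{core}/select?{params}"


url = solr_select_url("names:blood", rows=5)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, returns the ranked results as JSON.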