autocomplete return inconsistent results #42

sierra-moxon · 2022-11-18T17:36:56Z

see issue #40

cbizon · 2022-12-01T20:23:51Z

I'm not sure if this is totally a name-resolver issue. Basically, the issue is that in the UI, "x li" with a space doesn't return very many results. When I search in name-resolver, many results come back, including many diseases. One thing I did notice is that there is a higher proportion of non-disease results for this string than for "x-li". (Note that name-resolver doesn't know about types.)

So maybe there is some interaction where the UI asks for N results, but only 1 of those N is a disease with this little info. If we want to pursue this, I think that there are two ways forward:

Talk to the UI team and see if there's any way to improve the querying of name-resolver
Make name-resolver aware of types so that when you're querying for a specific slot in a question, you'll only get back results of the correct type.
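
To make the suspected interaction concrete, here is a minimal Python sketch. It is not the UI or name-resolver code: the ranked matches, the `type` field, and both function names are invented for illustration. It only shows why filtering by type after applying a limit can leave very few results, while a type-aware lookup would fill the same limit.

```python
from typing import Dict, List

# Hypothetical ranked matches for a short query; labels and types are invented.
RANKED_MATCHES: List[Dict[str, str]] = [
    {"label": "gene A", "type": "gene"},
    {"label": "protein B", "type": "protein"},
    {"label": "disease C", "type": "disease"},
    {"label": "gene D", "type": "gene"},
    {"label": "disease E", "type": "disease"},
    {"label": "disease F", "type": "disease"},
]


def limit_then_filter(matches: List[Dict[str, str]], wanted_type: str, limit: int) -> List[str]:
    """Client-side filtering: take the top `limit` matches, then drop wrong types."""
    top = matches[:limit]
    return [m["label"] for m in top if m["type"] == wanted_type]


def filter_then_limit(matches: List[Dict[str, str]], wanted_type: str, limit: int) -> List[str]:
    """Type-aware lookup: drop wrong types first, then take the top `limit`."""
    typed = [m for m in matches if m["type"] == wanted_type]
    return [m["label"] for m in typed[:limit]]


client_side = limit_then_filter(RANKED_MATCHES, "disease", limit=3)
type_aware = filter_then_limit(RANKED_MATCHES, "disease", limit=3)
print(client_side)  # one disease survives the filter
print(type_aware)   # three diseases fill the limit
```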

cbizon · 2022-12-01T20:24:36Z

@gprice1129 do you have insight into how the UI talks to name-resolver?

dnsmith124 · 2022-12-12T15:50:56Z

@cbizon You can check out the code for the autocomplete bar here: https://github.com/NCATSTranslator/ui-fe/blob/develop/src/Utilities/autocompleteFunctions.js

The main function is getAutocompleteTerms() on line 4. Essentially we send the input text to the name resolver, then format the returned object into an array of curies to send to the Node Normalizer. Then we throw out any non-diseases that are returned, and that's what the user sees.
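
As a rough illustration of that flow, here is a Python sketch (the real code is JavaScript and lives at the link above). The `lookup` and `normalize` callables, the shapes of their return values, and the `biolink:Disease` category string are assumptions made for this sketch, not the actual service responses.

```python
from typing import Callable, Dict, List

# Hypothetical shapes; the real services return richer JSON.
Lookup = Callable[[str, int], Dict[str, List[str]]]      # (text, limit) -> {curie: [names]}
Normalize = Callable[[List[str]], Dict[str, List[str]]]  # curies -> {curie: [types]}


def get_autocomplete_terms(text: str, lookup: Lookup, normalize: Normalize,
                           limit: int = 20,
                           wanted_type: str = "biolink:Disease") -> List[str]:
    """Resolve names, normalize the CURIEs, and keep only the wanted type."""
    names_by_curie = lookup(text, limit)      # step 1: ask the name resolver
    curies = list(names_by_curie)             # step 2: build the array of CURIEs
    types_by_curie = normalize(curies)        # step 3: ask the node normalizer
    return [                                  # step 4: throw out non-diseases
        names_by_curie[c][0]
        for c in curies
        if wanted_type in types_by_curie.get(c, []) and names_by_curie[c]
    ]


# Stand-in services with made-up data, so the sketch runs without a network.
def fake_lookup(text: str, limit: int) -> Dict[str, List[str]]:
    data = {"EX:1": ["example disease"], "EX:2": ["example gene"], "EX:3": ["another disease"]}
    return dict(list(data.items())[:limit])


def fake_normalize(curies: List[str]) -> Dict[str, List[str]]:
    types = {"EX:1": ["biolink:Disease"], "EX:2": ["biolink:Gene"], "EX:3": ["biolink:Disease"]}
    return {c: types[c] for c in curies}


print(get_autocomplete_terms("ex", fake_lookup, fake_normalize))
```

Because the type filter runs after the limit is applied, a lower `limit` can hide diseases that sit further down the resolver's ranking.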

cbizon · 2022-12-12T16:04:39Z

Thanks @dnsmith124. How many results does the autocomplete pull back? Does it go back for more if a bunch get filtered out by the non-disease filter?

dnsmith124 · 2022-12-12T16:31:52Z

@cbizon Right now it only pulls back 20 results, and has no functionality built in to go back for more. The implementation is about as simple as could be due to the time constraints around the initial launch of the MVP.

cbizon · 2022-12-12T17:21:55Z

Sure, that makes sense. So one option would be to ask for a larger number of results, say 100. I'm not too sure what that would do to the time of the call though.

dnsmith124 · 2022-12-12T17:25:13Z

I'm going to do some testing to figure out how effective that change would be. In some cases I think a larger set of results from the name resolver would help, as the problem sometimes lies with too many of the returned results not being diseases. In other cases though we'll have to figure out another solution, as sometimes the name resolver only returns a few results, in which case asking for more won't really help much.
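
A small Python sketch of the "go back for more" idea raised earlier, under the assumption that the lookup can be paged with an offset (not confirmed anywhere in this thread). `fetch_page`, `is_disease`, and the fake data are hypothetical stand-ins. It also shows the second case: when the resolver only has a few results, paging cannot help.

```python
from typing import Callable, List


def collect_diseases(fetch_page: Callable[[int, int], List[str]],
                     is_disease: Callable[[str], bool],
                     wanted: int = 5, page_size: int = 20,
                     max_pages: int = 5) -> List[str]:
    """Keep requesting pages until `wanted` diseases are found or results run out."""
    found: List[str] = []
    for page in range(max_pages):
        batch = fetch_page(page * page_size, page_size)  # (offset, limit)
        found.extend(item for item in batch if is_disease(item))
        if len(found) >= wanted or len(batch) < page_size:
            break  # enough diseases, or the resolver has nothing more to give
    return found[:wanted]


# Fake data: 50 results where every tenth one is a disease.
RESULTS = [f"disease-{i}" if i % 10 == 0 else f"other-{i}" for i in range(50)]


def fake_fetch(offset: int, limit: int) -> List[str]:
    return RESULTS[offset:offset + limit]


def fake_fetch_short(offset: int, limit: int) -> List[str]:
    # Simulates a query for which the resolver only knows four results in total.
    return RESULTS[:4][offset:offset + limit]


def looks_like_disease(item: str) -> bool:
    return item.startswith("disease-")


many = collect_diseases(fake_fetch, looks_like_disease, wanted=3)
few = collect_diseases(fake_fetch_short, looks_like_disease, wanted=3)
print(many)  # found by reading a second page
print(few)   # only one disease exists, so asking for more does not help
```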

cbizon · 2022-12-12T17:42:08Z

Yep - I wonder if you have some examples of the few-results case? It kind of sounds like maybe in those cases there just isn't a good match?

dnsmith124 · 2022-12-12T17:46:06Z

Yep, I'm working on compiling a series of tests in a spreadsheet now, I'll share it here and in slack when I'm done!

dnsmith124 · 2022-12-12T18:58:32Z

Here's a link to the first several tests: https://docs.google.com/spreadsheets/d/1Xnh9RwSXOZp6rPs1gNXx5Sb_-aWVS72BJBF609a8Ctw/edit?usp=sharing

I set the limit to 100 for these tests, rather than 20. Initially it seems like for very short terms (2-4 characters) the full 100 results often come back, but very few, if any, are diseases, so most are thrown out. This sometimes leads to the odd behavior of gradually getting more results as you increase the length of your search term, which feels a bit counterintuitive. For example, 'gene' places no results in the autocomplete, whereas 'genetic' returns 13 and 'genetic a' returns 23.

In those cases the ability to request specific types would certainly be helpful, but it would have to happen at the Name Resolver level otherwise I don't think it would work. I'll continue to update that doc as I perform more tests.

cbizon · 2022-12-13T14:48:40Z

I looked into how name-resolver handles punctuation. Basically, it removes any non-alphanumeric characters, but doesn't tokenize on them. So searching for "x-linked" is equivalent to searching for "xlinked", not "x linked". That explains the results that David shows above, I think, but leaves open the question of whether or not this is the right thing to do. I think in this case it probably isn't, but there may be other cases (like chemical names) where it might be the right thing...
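
The difference between the two behaviors can be sketched in a few lines of Python. This is only an illustration, not the name-resolver implementation; both function names are made up, and keeping spaces and lowercasing the query are assumptions of the sketch.

```python
import re


def strip_punctuation(query: str) -> str:
    """Delete non-alphanumeric characters without splitting on them (spaces kept)."""
    return re.sub(r"[^0-9A-Za-z ]", "", query).lower()


def tokenize_on_punctuation(query: str) -> str:
    """Alternative: treat runs of non-alphanumeric characters as token boundaries."""
    return " ".join(re.split(r"[^0-9A-Za-z]+", query.lower())).strip()


print(strip_punctuation("x-linked"))        # xlinked
print(tokenize_on_punctuation("x-linked"))  # x linked
```

With the first function, "x-linked" and "x linked" normalize to different strings, which is consistent with the two queries returning different result sets.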

@gaurav do you have any insights?

sierra-moxon · 2023-01-04T21:20:37Z

friends and family testing revealed a similar/same issue as the one found here, documented in #85

sierra-moxon · 2023-01-26T21:26:10Z

Hi @gaurav - was this issue handled in the latest release of name-resolver? :) Maybe we could close if so?

gaurav · 2023-01-28T00:01:41Z

I think this is fixed on NameRes RENCI and ITRB-CI by TranslatorSRI/NameResolution#33. That should be pushed to ITRB-Test by mid next week and to ITRB-Prod soon thereafter. I'm basing this on trying "beta-sito" on ITRB-Prod (returns no results) and on RENCI/ITRB-CI (returns a bunch of results, including PUBCHEM.COMPOUND:222284 (beta-Sitosterol)).

If you know of additional synonyms I can test (and, more importantly, add to my test suite!), please let me know! Otherwise, I think we can close this until someone finds another synonym that's broken.

sierra-moxon · 2023-05-15T19:00:02Z

It looks like "Cerebral palsy" isn't returned in the autocomplete until I get to the sixth letter. I don't see the very nice "More" link either. Fewer than six letters also returns inconsistent results ("cere" returns just two results, with no "More" link for me to click).

sierra-moxon · 2023-05-26T16:53:03Z

from TAQA:
there are too many "cereb"s in the set (cerebral)
e.g. "meth" would return all the chemicals.

Chris: newest version of NN will let you search by type! :D -- rolling out soon.
We may want to sort the autocomplete results by the number of results in Translator. This would require a shared index of counts/nodes/etc across all our resources.

gaurav · 2023-07-28T06:18:04Z

I think this should be fixed now:

We've replaced the search function we use so that result sorting now boosts exact matches (e.g. "Cerebral palsy") and entire matches ("Cerebral palsy, inherited" or "Inherited cerebral palsy") over token matches out of order (e.g. "Palsy of the cerebral"). Matches are still case-insensitive.
We previously split the search query into tokens (e.g. "Cerebral palsy" -> "cerebral" OR "palsy"), but now we provide the search query to Solr verbatim so it can try to find the best match including special characters (which are now escaped).
"Show more"/"Show less" appears to be fully working on UI CI (https://ui.ci.transltr.io/).

Okay to close @sierra-moxon?
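
For readers following along, the ordering described in this comment (exact matches, then whole-phrase matches, then out-of-order token matches) can be imitated in plain Python. This is a sketch only: the real service relies on Solr boosting, whose configuration is not shown in this thread, and `match_tier`, `rank`, and the sample labels are made up for illustration.

```python
import re
from typing import List


def match_tier(query: str, label: str) -> int:
    """Lower is better: 0 exact, 1 whole phrase, 2 all tokens in any order, 3 no match."""
    q_norm, label_norm = query.lower(), label.lower()  # matching is case-insensitive
    if label_norm == q_norm:
        return 0
    if q_norm in label_norm:
        return 1
    label_tokens = set(re.findall(r"\w+", label_norm))
    if all(tok in label_tokens for tok in re.findall(r"\w+", q_norm)):
        return 2
    return 3


def rank(query: str, labels: List[str]) -> List[str]:
    """Drop labels that do not match at all, then sort the rest by tier (stable)."""
    matching = [lab for lab in labels if match_tier(query, lab) < 3]
    return sorted(matching, key=lambda lab: match_tier(query, lab))


labels = ["Palsy of the cerebral", "Inherited cerebral palsy",
          "Cerebral palsy", "Cerebellar ataxia"]
ranked = rank("cerebral palsy", labels)
print(ranked)
```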

sierra-moxon assigned cbizon Nov 18, 2022

sierra-moxon mentioned this issue Nov 18, 2022

Autocomplete functionality on the home page limits autocomplete results without an indication that there are more matches with more typing. #40

Closed

sierra-moxon added the SRI Tooling label Nov 18, 2022

sierra-moxon mentioned this issue Nov 18, 2022

When I search for "Biermer anemia" (a synonym of "pernicious anemia" in MONDO:0008228), I get the official name of the MONDO term back in the autocomplete. But, as a user, it would be helpful to see the synonym somehow in the autocomplete. #10

Closed

cbizon assigned gprice1129 Dec 1, 2022

gprice1129 assigned dnsmith124 Dec 1, 2022

cbizon mentioned this issue Dec 1, 2022

Make NR type-aware TranslatorSRI/NameResolution#32

Closed

sierra-moxon assigned gaurav Jan 26, 2023

sierra-moxon mentioned this issue Feb 7, 2023

autocomplete not returning for "PASC/Long COVID-19", "multiple sclerosis" returns results with evidence(0) #96

Closed

sstemann added the UI - term selection identification of the specific node and context to be selected for a query label Mar 10, 2023

sierra-moxon closed this as completed Mar 10, 2023

sierra-moxon reopened this May 15, 2023

sierra-moxon changed the title ~~dashes in autocomplete return inconsistent results~~ autocomplete return inconsistent results May 15, 2023

sierra-moxon unassigned cbizon May 26, 2023

sierra-moxon added this to the July 31 milestone Jun 1, 2023

sierra-moxon added the autocomplete label Jun 1, 2023

sierra-moxon modified the milestones: B: July 31, A: June 30 Jun 6, 2023

sierra-moxon closed this as completed Aug 14, 2023
