autocomplete return inconsistent results #42

Closed
sierra-moxon opened this issue Nov 18, 2022 · 17 comments
Labels: autocomplete SRI Tooling UI - term selection identification of the specific node and context to be selected for a query


@sierra-moxon
Member

sierra-moxon commented Nov 18, 2022

see issue #40

@cbizon
Collaborator

cbizon commented Dec 1, 2022

I'm not sure if this is totally a name-resolver issue. Basically the issue is that in the UI, "x li" with a space doesn't return very many results. When I search in name-resolver, many results come back, including many diseases. One thing I did notice is that there is a higher proportion of non-disease results for this string than for "x-li". (Note that name-resolver doesn't know about types.)

So maybe there is some interaction where the UI asks for N results, but only 1 of those N is a disease when the query carries this little information. If we want to pursue this, I think that there are two ways forward:

  1. Talk to the UI team and see if there's any way to improve the querying of name-resolver
  2. Make name-resolver aware of types so that when you're querying for a specific slot in a question, you'll only get back results of the correct type.

@cbizon
Collaborator

cbizon commented Dec 1, 2022

@gprice1129 do you have insight into how the UI talks to name-resolver?

@dnsmith124
Collaborator

@cbizon You can check out the code for the autocomplete bar here: https://github.com/NCATSTranslator/ui-fe/blob/develop/src/Utilities/autocompleteFunctions.js

The main function is getAutocompleteTerms() on line 4. Essentially we send the input text to the name resolver, then format the returned object into an array of curies to send to the Node Normalizer. Then we throw out any non-diseases that are returned, and that's what the user sees.
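A minimal sketch of the flow described above, written in Python rather than the UI's JavaScript so it can run without network access. The function name mirrors getAutocompleteTerms(), but the `lookup` and `normalize` callables, the response shapes, the `EX:` identifiers, and the `biolink:Disease` type label are all assumptions for illustration, not the real service interfaces.

```python
from typing import Callable, Dict, List

DISEASE_TYPE = "biolink:Disease"  # assumed type label used for the filter


def get_autocomplete_terms(
    text: str,
    lookup: Callable[[str, int], Dict[str, List[str]]],
    normalize: Callable[[List[str]], Dict[str, dict]],
    limit: int = 20,
) -> List[dict]:
    """Name lookup -> CURIE list -> normalization -> keep diseases only."""
    hits = lookup(text, limit)      # hypothetical shape: {curie: [names]}
    curies = list(hits.keys())
    if not curies:
        return []
    nodes = normalize(curies)       # hypothetical shape: {curie: {"label", "types"}}
    results = []
    for curie in curies:
        node = nodes.get(curie)
        # Anything that fails to normalize or is not a disease is thrown out.
        if node and DISEASE_TYPE in node.get("types", []):
            results.append({"id": curie, "label": node["label"]})
    return results


# Stand-ins for the two services, with made-up data.
def fake_lookup(text: str, limit: int) -> Dict[str, List[str]]:
    hits = {
        "EX:1": ["x li gene"],
        "EX:2": ["x-linked example disease"],
        "EX:3": ["x li chemical"],
    }
    return dict(list(hits.items())[:limit])


def fake_normalize(curies: List[str]) -> Dict[str, dict]:
    table = {
        "EX:1": {"label": "x li gene", "types": ["biolink:Gene"]},
        "EX:2": {"label": "x-linked example disease", "types": ["biolink:Disease"]},
    }
    return {c: table[c] for c in curies if c in table}


terms = get_autocomplete_terms("x li", fake_lookup, fake_normalize)
```

Because the type filter runs after the limit is applied, a small limit can leave the user with nothing even though a disease exists further down the list: with `limit=1` the stubbed call above returns an empty list.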

@cbizon
Collaborator

cbizon commented Dec 12, 2022

Thanks @dnsmith124 . How many results does the autocomplete pull back? Does it go back for more if a bunch get filtered out with the non-disease filter?

@dnsmith124
Collaborator

@cbizon Right now it only pulls back 20 results, and has no functionality built in to go back for more. The implementation is about as simple as could be due to the time constraints around the initial launch of the MVP.

@cbizon
Collaborator

cbizon commented Dec 12, 2022

Sure, that makes sense. So one option would be to ask for a larger number of results, say 100. I'm not too sure what that would do to the time of the call though.

@dnsmith124
Collaborator

I'm going to do some testing to figure out how effective that change would be. In some cases I think a larger set of results from the name resolver would help, as the problem sometimes lies with too many of the returned results not being diseases. In other cases though we'll have to figure out another solution, as sometimes the name resolver only returns a few results, in which case asking for more won't really help much.
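As a rough illustration of the two cases, here is a sketch of a "go back for more" loop. It assumes the lookup can be paged with an offset and a limit, which is an assumption about the service rather than something confirmed in this thread; `collect_diseases`, `make_fetcher`, and the `is_disease` flag are hypothetical names.

```python
from typing import Callable, List


def collect_diseases(
    fetch_page: Callable[[int, int], List[dict]],
    wanted: int = 10,
    page_size: int = 20,
    max_pages: int = 5,
) -> List[dict]:
    """Request pages until `wanted` diseases are found or results run out."""
    diseases: List[dict] = []
    for page in range(max_pages):
        batch = fetch_page(page * page_size, page_size)  # (offset, limit)
        diseases.extend(r for r in batch if r["is_disease"])
        if len(diseases) >= wanted or len(batch) < page_size:
            break  # enough diseases, or the resolver has nothing more to give
    return diseases[:wanted]


def make_fetcher(results: List[dict]) -> Callable[[int, int], List[dict]]:
    """Wrap an in-memory list so it behaves like a paged lookup."""
    def fetch_page(offset: int, limit: int) -> List[dict]:
        return results[offset:offset + limit]
    return fetch_page


# Case 1: plenty of hits, but only every fifth one is a disease.
many = [{"name": f"hit {i}", "is_disease": i % 5 == 0} for i in range(45)]
# Case 2: the resolver only has two hits in total.
few = [{"name": "a", "is_disease": True}, {"name": "b", "is_disease": False}]

from_many = collect_diseases(make_fetcher(many), wanted=9)
from_few = collect_diseases(make_fetcher(few), wanted=9)
```

In the first case, paging (or simply a larger limit) recovers all nine diseases. In the second, the loop stops after one short page, and no limit would help.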

@cbizon
Collaborator

cbizon commented Dec 12, 2022

Yep - I wonder if you have some examples of the few-results case? It kind of sounds like maybe in those cases there just isn't a good match?

@dnsmith124
Collaborator

Yep, I'm working on compiling a series of tests in a spreadsheet now, I'll share it here and in slack when I'm done!

@dnsmith124
Collaborator

Here's a link to the first several tests: https://docs.google.com/spreadsheets/d/1Xnh9RwSXOZp6rPs1gNXx5Sb_-aWVS72BJBF609a8Ctw/edit?usp=sharing

I set the limit to 100 for these tests, rather than 20. Initially it seems that for very short terms (2-4 characters) the full 100 results often come back, but very few, if any, are diseases, so they are thrown out. This sometimes leads to the odd behavior of gradually getting more results as you increase the length of your search term, which feels a bit counterintuitive. For example, 'gene' places no results in the autocomplete, whereas 'genetic' returns 13 and 'genetic a' returns 23.

In those cases the ability to request specific types would certainly be helpful, but it would have to happen at the Name Resolver level otherwise I don't think it would work. I'll continue to update that doc as I perform more tests.
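A toy comparison of why the type filter has to run before the limit is applied. The hit list, the type strings, and both function names are made up for illustration.

```python
from typing import List, Tuple

Hit = Tuple[str, str]  # (name, type), in the order the resolver ranked them


def filter_after_limit(hits: List[Hit], wanted_type: str, limit: int) -> List[Hit]:
    """Client-side filtering: truncate first, then drop the wrong types."""
    return [h for h in hits[:limit] if h[1] == wanted_type]


def filter_before_limit(hits: List[Hit], wanted_type: str, limit: int) -> List[Hit]:
    """Resolver-side filtering: drop the wrong types, then truncate."""
    return [h for h in hits if h[1] == wanted_type][:limit]


# Five non-disease hits outrank three disease hits for a short query.
ranked: List[Hit] = [(f"gene {i}", "gene") for i in range(5)] + [
    (f"genetic disease {i}", "disease") for i in range(3)
]

client_side = filter_after_limit(ranked, "disease", limit=4)
resolver_side = filter_before_limit(ranked, "disease", limit=4)
```

With a limit of 4, client-side filtering leaves nothing to show, while resolver-side filtering returns all three diseases.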

@cbizon
Collaborator

cbizon commented Dec 13, 2022

I looked into how name-resolver handles punctuation. Basically, it removes any non-alphanumeric characters, but doesn't tokenize on them. So searching for "x-linked" is equivalent to searching for "xlinked", not "x linked". That explains the results that David shows above, I think, but leaves open the question of whether or not this is the right thing to do. I think in this case that it probably isn't, but that there may be other cases (like chemical names) where it might be the right thing...
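The difference between the two behaviors can be sketched as follows. This is not the name-resolver code. Both function names are hypothetical, and the sketch assumes that spaces are preserved and that input is lowercased.

```python
import re


def strip_punctuation(text: str) -> str:
    """Drop non-alphanumeric characters without splitting on them."""
    return re.sub(r"[^0-9a-z ]", "", text.lower())


def tokenize_on_punctuation(text: str) -> str:
    """Treat runs of non-alphanumeric characters as token separators."""
    return " ".join(re.split(r"[^0-9a-z]+", text.lower())).strip()


stripped = strip_punctuation("x-linked")         # "xlinked"
tokenized = tokenize_on_punctuation("x-linked")  # "x linked"
```

Under the first behavior, "x-li" becomes "xli" and can no longer match names that the query "x li" would match. Under the second, the two queries are equivalent.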

@gaurav do you have any insights?

@sierra-moxon
Member Author

friends and family testing revealed a similar/same issue as the one found here, documented in #85

@sierra-moxon
Member Author

Hi @gaurav - was this issue handled in the latest release of name-resolver? :) Maybe we could close it if so?

@gaurav

gaurav commented Jan 28, 2023

I think this is fixed on NameRes RENCI and ITRB-CI by TranslatorSRI/NameResolution#33. That should be pushed to ITRB-Test by mid next week and to ITRB-Prod soon thereafter. I'm basing this on trying "beta-sito" on ITRB-Prod (returns no results) and on RENCI/ITRB-CI (returns a bunch of results, including PUBCHEM.COMPOUND:222284 (beta-Sitosterol)).

If you know of additional synonyms I can test (and, more importantly, add to my test suite!), please let me know! Otherwise, I think we can close this until someone finds another synonym that's broken.

@sierra-moxon
Member Author

It looks like "Cerebral palsy" isn't returned in the autocomplete until I get to the sixth letter. I don't see the very nice "More" link either. Fewer than six letters also returns inconsistent results ("cere" returns just two results, with no "More" link for me to click).

[Screenshot: Screen Shot 2023-05-15 at 11 56 17 AM]

@sierra-moxon changed the title from "dashes in autocomplete return inconsistent results" to "autocomplete return inconsistent results" on May 15, 2023
@sierra-moxon
Member Author

from TAQA:
there are too many "cereb"s in the set (cerebral)
e.g. "meth" would return all the chemicals.

Chris: newest version of NN will let you search by type! :D -- rolling out soon.
We may want to sort the autocomplete results by the number of results in Translator. This would require a shared index of counts/nodes/etc across all our resources.

@gaurav

gaurav commented Jul 28, 2023

I think this should be fixed now:

  1. We've replaced the search function we use so that result sorting now boosts exact matches (e.g. "Cerebral palsy") and entire matches ("Cerebral palsy, inherited" or "Inherited cerebral palsy") over token matches out of order (e.g. "Palsy of the cerebral"). Matches are still case-insensitive.
  2. We previously split the search query into tokens (e.g. "Cerebral palsy" -> "cerebral" OR "palsy"), but now we provide the search query to Solr verbatim so it can try to find the best match including special characters (which are now escaped).
  3. "Show more"/"Show less" appears to be fully working on UI CI (https://ui.ci.transltr.io/).
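Points 1 and 2 can be illustrated with a small sketch. It is not the NameRes implementation: the real boosting happens inside Solr, and `escape_solr`, `match_tier`, and `rank` are hypothetical helpers. The escaped characters are the ones the Solr standard query parser treats as special.

```python
import re
from typing import List

# Characters (and the two-character operators && and ||) that the Solr
# standard query parser treats as special and that need a backslash.
_SOLR_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_solr(query: str) -> str:
    """Backslash-escape special characters so the query is taken literally."""
    return _SOLR_SPECIAL.sub(r"\\\1", query)


def match_tier(query: str, name: str) -> int:
    """Lower is better: exact match, whole phrase, then tokens in any order."""
    q, n = query.lower(), name.lower()
    if n == q:
        return 0
    if q in n:
        return 1
    if set(q.split()) <= set(n.split()):
        return 2
    return 3


def rank(query: str, names: List[str]) -> List[str]:
    """Stable sort of candidate names by match tier (case-insensitive)."""
    return sorted(names, key=lambda name: match_tier(query, name))


escaped = escape_solr("x-linked (recessive)")
ordered = rank(
    "cerebral palsy",
    ["Palsy of the cerebral", "Inherited cerebral palsy", "Cerebral palsy"],
)
```

Here `ordered` puts "Cerebral palsy" first, then "Inherited cerebral palsy", then "Palsy of the cerebral", matching the ordering described in point 1.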

Okay to close @sierra-moxon?


6 participants