Incorrect words are highlighted in complete word quotation search (Hebrew script) #1570
This is a challenging Solr highlighting issue. We have an exact match (unstemmed) field, and exact searches are matched against that. The exact search does bring up the correct results, and it attempts to highlight them. But in the highlighting portion of that query, the highlighter uses edismax to search all query fields: geniza/geniza/corpus/solr_queryset.py Lines 161 to 167 in 782fdd9
Also, the Lines 176 to 195 in 782fdd9
This is a problem, because while the results are correct, it's highlighting partial matches within them when it should only be highlighting the exact matches. Simply adding @rlskoeser Any ideas off the top of your head as to how I might approach this? Is there a way we can get the exact match highlighted when partial matches are also present in the result?
@blms I suspect that you are not getting highlighting back on It does make me feel like we're approaching the problem wrong. It seems to go against how Solr is meant to be used, and it seems increasingly complicated; although I do remember that it took us a while to arrive at the nostem field as a solution for this problem. Because of this complexity there may be some trade-offs with the exact searching and highlighting that will be difficult to solve.
That's super helpful, thank you @rlskoeser! I'd totally forgotten about the indexed vs stored concept in Solr. Great point also about the two separate fields for exact matches.
I agree with that, and I do think there are going to be additional problems; advanced users are really looking for exactly what was on PGPv3, a search that uses only exact character sequence matching, so anything we try to do with Solr to match that is inevitably going to come up short. I do think the exact matching with double quotes, at least, is something that everyday users might expect to work, but some of the more complex use cases of this might not really be feasible with Solr. I'll be curious to hear your thoughts on all this in our meeting Thursday!
…highlight Highlight nostem matches on exact searches (#1570)
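For readers following along, here is a minimal sketch of the general idea discussed above: restricting highlighting to the unstemmed field for quoted searches. It is not the code from the referenced commit. The function name and the field name `transcription_nostem` are hypothetical; `hl`, `hl.fl`, `hl.q`, and `hl.requireFieldMatch` are standard Solr highlighting parameters. It also assumes the unstemmed field is stored as well as indexed, since Solr can only return highlight snippets for stored fields.

```python
def exact_highlight_params(query, nostem_field="transcription_nostem"):
    """Build Solr highlighting parameters limited to an unstemmed field.

    The default field name is a hypothetical placeholder; substitute the
    project's own unstemmed field. Only the outer double quotes are
    stripped; quotes inside the phrase are not escaped in this sketch.
    """
    phrase = query.strip().strip('"')
    return {
        "hl": "true",
        # Only request highlight snippets from the unstemmed field.
        "hl.fl": nostem_field,
        # Highlight with a phrase query against that field alone,
        # rather than reusing the main query across all query fields.
        "hl.q": f'{nostem_field}:"{phrase}"',
        # Only highlight terms that matched in the highlighted field.
        "hl.requireFieldMatch": "true",
    }


params = exact_highlight_params('"אלממ"')
```

These parameters would be merged into the request only when the user's search is wrapped in double quotes; unquoted searches would keep the existing highlighting behaviour.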
@blms works as it should! Also tested a few other words like "אלקמח" (which I know for a fact appear in the corpus)! Closing!
testing notes
In the QA site:
"אלממ"
should highlight as expected in search results, when using double quotes.
Describe the bug
Quotation search of words I know exist in the corpus brings up inexact matches as well.
To reproduce
Steps to reproduce the behavior:
"אלממ"
in double quotes
Expected behavior
Quotation search in any script should bring up exact matches only.
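One way to check this expectation during testing is to inspect the highlighted snippets that come back for a quoted search. The helper below is a rough sketch, not part of the project: the function name is hypothetical, and it assumes highlights are wrapped in `<em>` tags (Solr's default) and that each highlighted fragment is a single token.

```python
import re

# Matches the text inside each <em>...</em> highlight in a snippet.
EM_PATTERN = re.compile(r"<em>(.*?)</em>")


def inexact_highlights(snippet, term):
    """Return highlighted fragments that differ from the exact search term."""
    return [frag for frag in EM_PATTERN.findall(snippet) if frag != term]


term = "אלממ"
# Synthetic snippet: one exact highlight and one longer, inexact token.
snippet = f"<em>{term}</em> ... <em>{term}x</em>"
problems = inexact_highlights(snippet, term)
```

An empty result means every highlighted fragment in the snippet is the exact term; anything returned is a partial or stemmed match that should not have been highlighted.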