
As a frontend user, I want smart quotes to be converted to normal quotation marks in order to get proper results in the search field. #789

Closed
richmanrachel opened this issue Apr 21, 2022 · 11 comments

@richmanrachel

Is your feature request related to a problem? Please describe.
Keyboards automatically convert straight quotation marks to smart quotes, which prevents users from getting proper exact-phrase matches in front-end searches.

Describe the solution you'd like
Code to automatically convert smart quotation marks to normal (straight) quotation marks in the search field.
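
As a minimal sketch of the requested conversion (not the project's actual implementation; `SMART_QUOTES` and `normalize_quotes` are hypothetical names, and the list of characters is an assumption), a translation table can map curly double quotes to the straight quote before the query is run:

```python
# Hypothetical helper for illustration; not taken from the project's codebase.
# Map common "smart" double quotes to the ASCII straight quote (U+0022).
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
    "\u201f": '"',  # double high-reversed-9 quotation mark
}
QUOTE_TABLE = str.maketrans(SMART_QUOTES)


def normalize_quotes(query: str) -> str:
    """Return the query with smart double quotes replaced by straight ones."""
    return query.translate(QUOTE_TABLE)


# A phrase typed with curly quotes becomes a straight-quoted phrase.
print(normalize_quotes("\u201cabu al-faraj\u201d"))
```

Single curly quotes are left alone in this sketch, since they often stand in for apostrophes inside words rather than phrase delimiters.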

Additional context
Both MR and RR are having trouble with this on both desktop and mobile.

@rlskoeser rlskoeser self-assigned this Aug 18, 2022
@rlskoeser rlskoeser added the 🗜️ awaiting testing Implemented and ready to be tested label Aug 18, 2022
@richmanrachel
Author

It is working in English! Not sure it's working properly with Arabic script though... the results I'm getting for this two word phrase ("تعطل شغله") are returning responses with just one of the words - is this expected?
https://test-geniza.cdh.princeton.edu/en/documents/?q=%22%D8%AA%D8%B9%D8%B7%D9%84+%D8%B4%D8%BA%D9%84%D9%87%22&docdate_0=&docdate_1=&sort=relevance

@rlskoeser
Contributor

@richmanrachel I suspect that is a flaw in how I implemented the Arabic to Judaeo-Arabic search! The OR syntax I'm using must not be playing nicely with the phrase. It looks like we get the same behavior in production. Do you want to open a bug issue to track it? While I revisit it, maybe I can also tweak the boosting so Arabic matches get higher priority.

Side note: when I was testing just now, I clicked into one of the records with a Judaeo-Arabic match and then tried to ctrl-f for the search term, which of course doesn't work (even if the whole phrase was there). Kind of a weird experience.

@richmanrachel
Author

@rlskoeser - good idea. Sorry we're having to introduce you to the inequalities of the RTL internet!

@richmanrachel
Author

@rlskoeser - On the test site, I tried a two-word phrase in Hebrew script (״שגל בית״) and it's also returning phrases with only one of the two words. It looks like the quotes are reverting to smart quotes, but I can't tell...
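
One way to tell which quote characters a query actually contains is to list the Unicode name of each punctuation character. The following is a rough diagnostic sketch using Python's standard `unicodedata` module; `describe_punctuation` is a hypothetical helper, not something from the project:

```python
import unicodedata


def describe_punctuation(query: str) -> list[tuple[str, str]]:
    """List (code point, Unicode name) for each punctuation character in the query."""
    found = []
    for char in query:
        # Unicode general categories starting with "P" are punctuation.
        if unicodedata.category(char).startswith("P"):
            found.append((f"U+{ord(char):04X}", unicodedata.name(char, "UNKNOWN")))
    return found


# Compare a straight-quoted, a curly-quoted, and a gershayim-quoted string.
# Escapes are used so the quote characters are unambiguous in the source.
for sample in ('"ab"', "\u201cab\u201d", "\u05f4\u05d0\u05d1\u05f4"):
    print(describe_punctuation(sample))
```

Pasting the text from the search box into a check like this shows whether the delimiters are straight quotes (U+0022), curly quotes (U+201C/U+201D), or the Hebrew gershayim (U+05F4), which look similar on screen.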

@rlskoeser
Contributor

@richmanrachel thanks for testing more, looks like this is more complicated than I thought! I need to look again at the queries being generated in those cases and probably add some automated tests with Hebrew and Arabic phrases (thanks for the examples). Can you tell if phrase searching is working in production with straight quotes and RTL text? That is, is this a new problem or is it likely related to the other RTL search woes?

@rlskoeser
Contributor

wow! two different things happening here:

  • the quote you're using in the Hebrew example above is different from the ones I was converting; I'll add it to the list of converted characters
  • the Arabic or Judaeo-Arabic regex I wrote breaks the exact phrase (it treats the quotes as part of the alternate forms of the word); this is a bug, and I'll see if I can fix it
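
Both points can be illustrated with a rough sketch. It is not the project's code: `to_judaeo_arabic` is a placeholder that only tags the word, the `(a|b)` alternation is illustrative rather than real search-engine syntax, and the whitespace-based pattern is just one assumed way such a bug can arise. The idea is to convert the extra quote character (here U+05F4, Hebrew gershayim) and to match only Arabic letters so the quotes stay outside the alternate forms:

```python
import re

# Assumed conversion list: curly double quotes plus Hebrew gershayim.
QUOTE_TABLE = str.maketrans({"\u201c": '"', "\u201d": '"', "\u05f4": '"'})

# Buggy variant: any non-whitespace run, so an adjacent quote joins the word.
NAIVE_WORD = re.compile(r"\S+")
# Safer variant: runs of Arabic letters only (U+0621 to U+064A).
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")


def to_judaeo_arabic(word: str) -> str:
    """Placeholder for a real transliteration; it only tags the word."""
    return f"JA({word})"


def expand(query: str, pattern: re.Pattern) -> str:
    """Replace each match with an (original|alternate) group."""
    return pattern.sub(
        lambda m: f"({m.group(0)}|{to_judaeo_arabic(m.group(0))})", query
    )


# The two-word Arabic phrase from this thread, written with curly quotes.
phrase = "\u201c\u062a\u0639\u0637\u0644 \u0634\u063a\u0644\u0647\u201d"
query = phrase.translate(QUOTE_TABLE)

print(expand(query, NAIVE_WORD))   # quotes end up inside the groups
print(expand(query, ARABIC_WORD))  # quotes still wrap the whole phrase
```

With the letters-only pattern, the expanded query still begins and ends with a straight quote, so whatever parses it downstream can still recognize an exact phrase.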

@richmanrachel
Author

@rlskoeser - how strange!

I tested the phrase "כדמת אלדאר" with unicode quotes and it worked on the production and test sites properly.

@rlskoeser
Contributor

@richmanrachel I pushed some changes to the test site, but now I'm not sure it's working at all for Hebrew/Arabic... the two test cases you shared before aren't returning any results for me. Want to test a bit and see if you can figure anything out about what's going on?

@richmanrachel
Author

I'm still getting 1 result for "כדמת אלדאר" (which I guess is all there is, unfortunately for my dissertation, haha).

The Arabic one still isn't working with quotes (you can see here that it does pull one exact phrase without them):
image

@rlskoeser
Contributor

@richmanrachel I can't get exact phrase searches to work at all with Arabic content with straight quotes, either on the test site or in production. Does it work at all for you? If it doesn't, then that is a separate problem from this one and we should open a new bug issue.

@richmanrachel
Author

@rlskoeser - no, the Arabic content still is not working with quotations at all. I'll open a new bug report.
