Transform typographic quotes in searches #282

Closed
3 tasks done
thatbudakguy opened this issue Apr 2, 2019 · 6 comments

@thatbudakguy
Contributor

thatbudakguy commented Apr 2, 2019

as noted in this twitter thread, iOS inserts typographic quotes by default when typing. compare:

“common meter” vs "common meter"

solr strips these characters out, so the query does not behave as an exact phrase search (it is treated as common OR meter rather than "common meter"). we need to find a way to transform the typographic quotes to regular quotes, likely with a regex applied by solr.

testing notes

searching with a phrase in typographic quotes should now run as an exact phrase search. should work in:

  • keyword search field
  • title search field
  • author search field
@rlskoeser
Contributor

Might be as simple as adding solr.PatternReplaceCharFilterFactory to our query filter with a pattern like “([^”]+)” and a replacement of "$1"

We don't currently have separate analyzers for query and index; this is probably something that should be applied to queries only, so we'll have to break them out.
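For illustration only, here is the same kind of substitution sketched in Python rather than in a Solr char filter. The function name `straighten_phrases` is hypothetical, and this variant negates the closing quote (U+201D) inside the character class; it is a sketch of the idea, not the configuration that was actually deployed.

```python
import re

# Paired typographic double quotes: opening U+201C, closing U+201D.
# Assumption: only double quotes are handled here.
TYPOGRAPHIC_PHRASE = re.compile("\u201c([^\u201d]+)\u201d")


def straighten_phrases(query: str) -> str:
    """Rewrite each typographic-quoted phrase with straight double quotes."""
    return TYPOGRAPHIC_PHRASE.sub(r'"\1"', query)


# Prints the query with straight quotes around the phrase.
print(straighten_phrases("\u201ccommon meter\u201d hymn"))
```

Note that a paired pattern like this leaves an unmatched typographic quote untouched, which may or may not be what you want for search input.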

rlskoeser self-assigned this Aug 18, 2022
rlskoeser added a commit that referenced this issue Aug 18, 2022
rlskoeser added this to the v3.8 milestone Aug 18, 2022
@rlskoeser
Contributor

downgrading from 2pts to 1: it's simpler to do the transform in python as part of form cleaning; adapted from similar code that I just added to the geniza codebase
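A minimal sketch of what a Python-side transform could look like, assuming a plain character mapping applied to the search string before it reaches solr. The names `QUOTE_TABLE` and `clean_search_term` are hypothetical, and including single quotes in the mapping is an assumption; this is not the code from the referenced commit or from geniza.

```python
# Map typographic quote characters to their ASCII equivalents.
# Assumption: single quotes/apostrophes are normalized as well.
QUOTE_TABLE = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
})


def clean_search_term(value: str) -> str:
    """Normalize typographic quotes and trim surrounding whitespace."""
    return value.translate(QUOTE_TABLE).strip()


# Prints the term with straight double quotes.
print(clean_search_term("\u201cdelicacy of taste\u201d"))
```

In a form, a helper like this would presumably be called from the cleaning step of each search field (keyword, title, author), so every field gets the same normalization. Unlike a paired regex, a character mapping also converts unmatched quotes.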

@mnaydan
Contributor

mnaydan commented Aug 19, 2022

@rlskoeser I'm having trouble testing this because the keyword search is broken - it's returning empty results, I'm guessing because of the clustering logic? I'm not sure if this is expected because you haven't figured out the search problem yet or if this is an unexpected issue...

FWIW on iPad and iPhone the quotation marks LOOK like the straight ones above, not the italicized ones, so if you decide that is sufficient to close the issue that's fine with me.

[Image: Screen Shot 2022-08-19 at 2 36 49 PM]

@mnaydan
Contributor

mnaydan commented Aug 19, 2022

Oh, but I realized I can test title and author search, and the quotations are working there! I searched "art of English" and I searched "Bysshe, Edward" and got expected results (happy that Percy Bysshe Shelley didn't come up, which means quotes worked).

@rlskoeser
Contributor

@mnaydan you're right, keyword searching in the main archive search is broken now! forgot about how that would impact testing here. However, it does work when you are searching within a cluster. I think if you could test phrase searching with typographic quotes there, it would be good enough to close this.

@mnaydan
Contributor

mnaydan commented Aug 23, 2022

Didn't realize keyword search was working within clusters, thanks for letting me know! I tested two phrases ("delicacy of taste" and "Pope's Dunciad") and results match exactly so I'm confident it's working.
