LUCENE-9750: Hunspell: improve suggestions for mixed-case misspelled words #2332

Merged
merged 2 commits into from Feb 10, 2021


donnerpeter
Contributor

Description

Fix a failing Hunspell repo test

Solution

Replicate Hunspell's logic around suggestion casing, especially mixed-case ones

Tests

i58202 from Hunspell repo, whatever that means

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the master branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Ref Guide (for Solr changes only).

@dweiss
Contributor

dweiss commented Feb 10, 2021

i58202 from Hunspell repo, whatever that means

Bug report, maybe?

@@ -70,7 +70,7 @@

 /** In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary. */
 public class Dictionary {
-  // Derived from woorm/ openoffice dictionaries.
+  // Derived from woorm/LibreOffice dictionaries.
Contributor

Thanks!

-      return spell(title) ? title : candidate;
+    if (Character.isUpperCase(original.charAt(0))) {
+      String title = Character.toUpperCase(candidate.charAt(0)) + candidate.substring(1);
+      if (title.contains(" ") || spell(title)) {
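
For readers following the review, the fragment above can be expanded into a minimal, self-contained sketch. It assumes that the title-cased form is returned when the check passes and that the unchanged candidate is returned otherwise; the rest of the method is not shown in this diff, so that part is a guess. The class name `SuggestionCaseSketch`, the method `adjustCase`, and the `spell` predicate are hypothetical stand-ins, not Lucene's actual API.

```java
import java.util.Set;
import java.util.function.Predicate;

public class SuggestionCaseSketch {

  /**
   * Title-cases the candidate when the misspelled original starts with an
   * upper-case letter. The title-cased form is kept only if it contains a
   * plain space or the (hypothetical) speller accepts it.
   */
  static String adjustCase(String candidate, String original, Predicate<String> spell) {
    if (candidate.isEmpty() || original.isEmpty()) {
      return candidate;
    }
    if (Character.isUpperCase(original.charAt(0))) {
      String title = Character.toUpperCase(candidate.charAt(0)) + candidate.substring(1);
      if (title.contains(" ") || spell.test(title)) {
        return title;
      }
    }
    // Assumption: fall back to the candidate as suggested.
    return candidate;
  }

  public static void main(String[] args) {
    // A toy word list standing in for a real dictionary lookup.
    Set<String> words = Set.of("hello", "Hello", "iPhone");
    System.out.println(adjustCase("hello", "Helo", words::contains));
    System.out.println(adjustCase("iPhone", "Iphone", words::contains));
    System.out.println(adjustCase("ice cream", "Icecream", words::contains));
  }
}
```

With this toy word list, "hello" becomes "Hello", "iPhone" stays as it is because "IPhone" is not accepted, and "ice cream" becomes "Ice cream" because multi-word suggestions skip the dictionary check.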
Contributor

Just out of curiosity: Hunspell doesn't take odd whitespace characters (like the non-breaking space) into account, does it? I've spent a number of hours of my life debugging things that, judging by the input, should have worked, only to find that the whitespace characters were not actually " "...

Contributor Author

In this place, the plain, normal space is hardcoded.
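
To illustrate the difference discussed here, the sketch below contrasts a hardcoded `contains(" ")` check with a broader one built on `Character.isWhitespace` and `Character.isSpaceChar`. The broader check is only an illustration of what a space-like test could look like; it is not what the patch does, and nothing here is a claim about Hunspell's behavior. The class and method names are hypothetical.

```java
public class SpaceCheckSketch {

  /** The hardcoded check: matches only U+0020, the plain ASCII space. */
  static boolean hasPlainSpace(String s) {
    return s.contains(" ");
  }

  /**
   * A broader check for comparison. Character.isWhitespace covers tabs and
   * line breaks but excludes non-breaking spaces; Character.isSpaceChar
   * covers Unicode space separators such as U+00A0.
   */
  static boolean hasAnySpaceLike(String s) {
    return s.codePoints()
        .anyMatch(cp -> Character.isWhitespace(cp) || Character.isSpaceChar(cp));
  }

  public static void main(String[] args) {
    String nbsp = "ice\u00A0cream"; // non-breaking space between the words
    System.out.println(hasPlainSpace(nbsp));   // false
    System.out.println(hasAnySpaceLike(nbsp)); // true
  }
}
```

A suggestion containing only a non-breaking space would therefore not be treated as multi-word by the hardcoded check and would go through the `spell(title)` lookup instead.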

@dweiss dweiss merged commit c3166e1 into apache:master Feb 10, 2021
@donnerpeter
Contributor Author

i58202 from Hunspell repo, whatever that means

Bug report, maybe?

Most likely, yes
