Finding per-dictionary character classes #17

Closed
rrthomas opened this issue Dec 13, 2016 · 5 comments

@rrthomas
Contributor

Enchant seems to provide no way to find a given dictionary's definition of the code points that can appear in a word (loosely speaking, its word character classes).

Emacs's spelling code requires two essential pieces of information:

  1. A class of "case characters".
  2. A class of "otherchars", which are characters not in "case characters" that are allowed in words, e.g. dash, apostrophe.

For most of the underlying spelling engines, such as aspell and hunspell, this information can be found in the underlying dictionaries (and Emacs does so when using them directly).

However, Enchant does not seem to expose this information. This raises the question: what do other Enchant clients do?

@rrthomas
Contributor Author

@JMLas, any comments?

@JMLas

JMLas commented Dec 19, 2016

As far as our use in LyX is concerned, we have a bug open somewhere saying that we should use this kind of information ;) But it never occurred to me that the information was available in the dictionaries. I always thought that Emacs used its own per-language list of special characters.

@rrthomas
Contributor Author

rrthomas commented Dec 19, 2016

Thanks, that's interesting. Emacs assumes [:alpha:] for ispell and aspell dictionaries, but for hunspell it parses the dictionary files to get the information. However, hunspell does not make this information available via its API, as far as I can tell.

[Comment edited to fix a couple of errors.]

@rrthomas
Contributor Author

rrthomas commented Dec 20, 2016

Hunspell has get_wordchars, which returns the value of the WORDCHARS keyword in the current affix file (effectively Emacs's "otherchars"). The full class of word characters is that plus those characters considered to be letters.
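
As a rough illustration of "WORDCHARS plus letters", here is a minimal sketch in C. It does not call Hunspell: the `wordchars` argument stands in for whatever string a back-end reports, and the helper names `is_word_char` and `word_prefix_length` are hypothetical. It also assumes single-byte characters in the current C locale, which a real implementation working on Unicode code points would not.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if byte c is in the full word-character class,
   i.e. it is a letter or it appears in wordchars (a WORDCHARS-style string).
   Assumption: single-byte characters in the current C locale only. */
static bool is_word_char(unsigned char c, const char *wordchars)
{
    assert(wordchars != NULL);
    if (c == '\0')
        return false; /* strchr would otherwise match the terminator */
    return isalpha(c) || strchr(wordchars, c) != NULL;
}

/* Length of the longest prefix of text consisting only of word characters. */
static size_t word_prefix_length(const char *text, const char *wordchars)
{
    assert(text != NULL);
    size_t n = 0;
    while (is_word_char((unsigned char)text[n], wordchars))
        n++;
    return n;
}
```

With `wordchars` set to `"-'"`, the prefix of `"don't stop"` that counts as a word is `don't`; with an empty `wordchars` it is only `don`.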

@rrthomas rrthomas modified the milestone: 2.0 Feb 1, 2017
rrthomas added a commit to rrthomas/enchant that referenced this issue May 4, 2017
Use C99-style declarations.

Remove check of whether text to be checked is in Hebrew, as hspell already
does this (and in fact it’s not what we want: words in non-Hebrew are
treated as “empty” and therefore correct; this will have to be dealt with by
having the Enchant back-end reject words not in Hebrew, but probably it’s
better to have generic code to do this which detects words that contain
non-word characters for the given dictionary; however, that will require the
implementation of issue AbiWord#17).
rrthomas added a commit to rrthomas/enchant that referenced this issue Jul 25, 2017
Add enchant_dict_get_extra_word_characters, which returns a string of
non-letter characters that may occur in words, and
enchant_dict_is_word_character, which checks whether the given character is
valid as the first, last, or internal character in a word.
rrthomas added a commit that referenced this issue Jul 25, 2017
Fix issue #17: add new APIs for per-dictionary character classes
@rrthomas
Contributor Author

Fixed by PR #139.
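
For readers wondering how a client might use a check of the kind the commit message describes (is this character valid as the first, last, or internal character of a word?), here is a minimal sketch in C. It deliberately does not call Enchant. The callback type `word_char_fn`, the position encoding (0 = first, 1 = internal, 2 = last), and the stand-in predicate `demo_is_word_character` are all assumptions made for illustration; consult `enchant.h` for the real signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: nonzero if code point uc is valid at position pos.
   Assumed encoding: 0 = first character, 1 = internal, 2 = last. */
typedef int (*word_char_fn)(uint32_t uc, size_t pos);

/* Stand-in predicate for the demo: ASCII letters are valid anywhere;
   apostrophe and hyphen are valid only inside a word. */
static int demo_is_word_character(uint32_t uc, size_t pos)
{
    int is_letter = (uc >= 'a' && uc <= 'z') || (uc >= 'A' && uc <= 'Z');
    if (is_letter)
        return 1;
    return pos == 1 && (uc == '\'' || uc == '-');
}

/* Return the length (in code points) of the word starting at text[0],
   or 0 if no word starts there. */
static size_t match_word(const uint32_t *text, size_t len, word_char_fn ok)
{
    assert(text != NULL && ok != NULL);
    if (len == 0 || !ok(text[0], 0))
        return 0;
    /* Extend while characters are acceptable as internal characters. */
    size_t end = 1;
    while (end < len && ok(text[end], 1))
        end++;
    /* Back up until the final character is acceptable as a last character. */
    while (end > 0 && !ok(text[end - 1], 2))
        end--;
    return end;
}
```

The backing-up step is what keeps a trailing hyphen or apostrophe out of the word: scanning `rock- ` accepts the hyphen as an internal character, then drops it because it cannot end a word.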
