Finding per-dictionary character classes #17

rrthomas · 2016-12-13T21:11:26Z

Enchant seems to provide no way to find a given dictionary's definition of the code points that can appear in a word (loosely, its word character classes).

Emacs's spelling code requires two essential pieces of information:

A class of "case characters".
A class of "otherchars", which are characters not in "case characters" that are allowed in words, e.g. dash, apostrophe.
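To make the two classes concrete, here is a minimal sketch of how a client might split text into words with them. It is not Emacs's actual code: the helper names are hypothetical, "case characters" is assumed to be plain ASCII letters, and an "otherchar" is assumed to count only when it is directly followed by a case character.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: is c in the "case characters" class?
   Assumption: ASCII letters only; a real dictionary defines its own class. */
static int is_casechar(unsigned char c)
{
    return isalpha(c) != 0;
}

/* Hypothetical helper: is c one of the dictionary's "otherchars"? */
static int is_otherchar(unsigned char c, const char *otherchars)
{
    return c != '\0' && strchr(otherchars, c) != NULL;
}

/* Find the next word in text at or after *pos.
   Stores the word's start offset in *start, advances *pos past the word,
   and returns the word's length (0 when no word is left).
   Assumption: an otherchar is accepted only when a case character follows it. */
static size_t next_word(const char *text, size_t *pos,
                        const char *otherchars, size_t *start)
{
    assert(text != NULL && pos != NULL && otherchars != NULL && start != NULL);

    size_t i = *pos;
    /* Skip everything that cannot start a word. */
    while (text[i] != '\0' && !is_casechar((unsigned char)text[i]))
        i++;
    if (text[i] == '\0') {
        *pos = i;
        return 0;
    }

    *start = i;
    size_t end = i; /* one past the last case character seen */
    while (text[i] != '\0') {
        unsigned char c = (unsigned char)text[i];
        if (is_casechar(c)) {
            i++;
            end = i;
        } else if (is_otherchar(c, otherchars) &&
                   is_casechar((unsigned char)text[i + 1])) {
            i++; /* otherchar between two case characters */
        } else {
            break;
        }
    }
    *pos = end;
    return end - *start;
}
```

With otherchars set to apostrophe and dash, "don't" and "well-known" come out as single words, while a free-standing dash is skipped. Without the per-dictionary classes, a client has to guess both sets.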

For most of the underlying spelling engines, such as aspell and hunspell, this information can be found in the underlying dictionaries (and Emacs does so when using them directly).

However, Enchant does not seem to expose this information. This raises the question: what do other Enchant clients do?

rrthomas · 2016-12-17T18:29:57Z

@JMLas, any comments?

JMLas · 2016-12-19T09:55:01Z

As far as our use in LyX is concerned, we have a bug open somewhere saying that we should use this kind of information ;) But it never occurred to me that the information was available in the dictionaries. I always thought that Emacs used its own per-language list of special characters.

rrthomas · 2016-12-19T11:06:40Z

Thanks, that's interesting. Emacs assumes [:alpha:] for ispell and aspell dictionaries, but for hunspell it parses the dictionary files to get the information. However, hunspell does not make this information available via its APIs, as far as I can tell.

[Comment edited to fix a couple of errors.]

rrthomas · 2016-12-20T22:08:24Z

Hunspell has get_wordchars, which returns the value of the WORDCHARS keyword in the current affix file (effectively Emacs's "otherchars"). The full class of word characters is that plus those characters considered to be letters.
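For illustration, a rough sketch of what "WORDCHARS plus letters" could look like when read straight from affix file text, roughly the kind of parsing a client does without an API. The function names are hypothetical, the affix text is assumed to be already in memory, and letters are assumed to be ASCII only; it does not call Hunspell.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of the first WORDCHARS line found in
   the affix text `aff` into `out` (truncated to fit).
   Returns 1 if a WORDCHARS line was found, 0 otherwise (out is then ""). */
static int find_wordchars(const char *aff, char *out, size_t out_size)
{
    static const char key[] = "WORDCHARS";
    const size_t key_len = sizeof key - 1;

    assert(aff != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    for (const char *line = aff; line != NULL && *line != '\0'; ) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len > key_len && strncmp(line, key, key_len) == 0 &&
            (line[key_len] == ' ' || line[key_len] == '\t')) {
            const char *val = line + key_len;
            const char *end = line + len;
            /* Trim blanks around the value (and a trailing CR, if any). */
            while (val < end && (*val == ' ' || *val == '\t'))
                val++;
            while (end > val &&
                   (end[-1] == ' ' || end[-1] == '\t' || end[-1] == '\r'))
                end--;
            size_t n = (size_t)(end - val);
            if (n >= out_size)
                n = out_size - 1;
            memcpy(out, val, n);
            out[n] = '\0';
            return 1;
        }
        line = eol ? eol + 1 : NULL;
    }
    return 0;
}

/* Full word-character class: letters plus the WORDCHARS value.
   Assumption: "letters" means ASCII letters here. */
static int is_word_char(unsigned char c, const char *wordchars)
{
    return isalpha(c) || (c != '\0' && strchr(wordchars, c) != NULL);
}
```

Given an affix file containing the line `WORDCHARS 0123456789'`, the class then covers letters, digits and the apostrophe, but not the dash.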

Use C99-style declarations. Remove check of whether text to be checked is in Hebrew, as hspell already does this (and in fact it’s not what we want: words in non-Hebrew are treated as “empty” and therefore correct; this will have to be dealt with by having the Enchant back-end reject words not in Hebrew, but probably it’s better to have generic code to do this which detects words that contain non-word characters for the given dictionary; however, that will require the implementation of issue AbiWord#17).

Add enchant_dict_get_extra_word_characters, which returns a string of non-letter characters that may occur in words, and enchant_dict_is_word_character, which checks whether the given character is valid as the first, last, or internal character in a word.

Fix issue #17: add new APIs for per-dictionary character classes
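As a sketch of the generic check mentioned in the hspell commit message (rejecting words that contain non-word characters), the following models a position-aware test in standalone C. It does not call Enchant; the names, the position enum, and the rule that extra characters are valid only inside a word are all assumptions made for illustration, and letters are ASCII only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical position tag for a character within a word. */
typedef enum { WORD_START, WORD_MIDDLE, WORD_END } word_pos;

/* Model of a per-dictionary, position-aware word-character test.
   `extra` stands in for the dictionary's non-letter word characters.
   Assumption: extra characters are valid only in the middle of a word. */
static int model_is_word_character(const char *extra, unsigned char c,
                                   word_pos pos)
{
    if (isalpha(c))
        return 1;
    if (c == '\0' || strchr(extra, c) == NULL)
        return 0;
    return pos == WORD_MIDDLE;
}

/* Return 1 if every character of `word` is valid at its position,
   0 otherwise (including for the empty string). */
static int all_word_characters(const char *extra, const char *word)
{
    assert(extra != NULL && word != NULL);

    size_t len = strlen(word);
    if (len == 0)
        return 0;
    for (size_t i = 0; i < len; i++) {
        word_pos pos = (i == 0) ? WORD_START
                     : (i == len - 1) ? WORD_END
                     : WORD_MIDDLE;
        if (!model_is_word_character(extra, (unsigned char)word[i], pos))
            return 0;
    }
    return 1;
}
```

A back-end or client could run such a check before looking a word up, so that input outside the dictionary's alphabet is rejected rather than silently treated as correct.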

rrthomas · 2017-07-25T10:19:40Z

Fixed by PR #139.

rrthomas modified the milestone: 2.0 Feb 1, 2017

rrthomas mentioned this issue Feb 18, 2017

Ensure that Enchant can be used by portable/relocatable applications #86

Closed

rrthomas added a commit that referenced this issue Jul 25, 2017

Merge pull request #139 from rrthomas/master

b870e05

Fix issue #17: add new APIs for per-dictionary character classes

rrthomas closed this as completed Jul 25, 2017
