EN DASH (U+2013) is not ignored by speller #1
Comments
Note that the set of characters considered part of legal words varies a bit from language to language. For example, the colon ":" is not part of words in English, Danish and Norwegian (and presumably Greenlandic), whereas it may or may not be part of a legal word in Swedish, Finnish and the Sámi languages, where it is used as a separator between a stem and inflectional endings for acronyms, digits etc.: CD:s (from SME), TV:n (swe). For these languages it is not part of the word if it is the last character in the word; in that case it could be an indication of direct speech coming next, just as in e.g. Danish.
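To make the colon rule concrete, here is a minimal sketch in Python. It is not taken from the speller; the function name `split_on_colon`, the language codes and the set `COLON_INTERNAL_LANGS` are assumptions made for illustration only.

```python
from __future__ import annotations

# Assumption: illustrative language codes for Swedish, Finnish and North Sámi,
# where a word-internal colon can join a stem and an inflectional ending.
COLON_INTERNAL_LANGS = {"swe", "fin", "sme"}


def split_on_colon(token: str, lang: str) -> list[str]:
    """Return the word candidates contained in `token` under the colon rule."""
    if lang in COLON_INTERNAL_LANGS:
        # An internal colon stays in the word; a final colon does not
        # (it may introduce direct speech instead).
        core = token[:-1] if token.endswith(":") else token
        return [core] if core else []
    # In the other languages the colon is never part of a word,
    # so it acts as a separator.
    return [part for part in token.split(":") if part]
```

Under these assumptions, `split_on_colon("CD:s", "sme")` keeps `CD:s` as one word, while the same token with a language code outside the set is split into `CD` and `s`.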
Fixed in the latest versions: http://apertium.projectjj.com/spellers/ The concern about which characters are legal where is already part of the algorithm. The verbatim input is always tested first, before any manipulation to find a valid form is attempted. Whether MS Word cares about it is another matter. I have no control whatsoever over what MS Word decides to send to the speller as a token, nor can I inspect the context of a given token. I get what I get, and I had better be happy with it.
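The verbatim-first order described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the actual hfst-ospell code: `is_valid` is a hypothetical stand-in for the lexicon lookup, and stripping leading and trailing Unicode punctuation is only one possible "manipulation".

```python
import unicodedata


def check_token(token, is_valid):
    """Accept `token` if it is valid verbatim or after edge punctuation is dropped.

    `is_valid` is a hypothetical callable standing in for the lexicon lookup.
    """
    # Step 1: always test the verbatim input first.
    if is_valid(token):
        return True

    # Step 2 (assumed manipulation): drop leading and trailing characters whose
    # Unicode general category is punctuation (P*), e.g. EN DASH U+2013 (Pd).
    start, end = 0, len(token)
    while start < end and unicodedata.category(token[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(token[end - 1]).startswith("P"):
        end -= 1
    stripped = token[start:end]

    # Assumption: a token that is punctuation only is not flagged.
    if not stripped:
        return True
    return stripped != token and is_valid(stripped)
```

Because the verbatim test comes first, a form such as `CD:s` that the lexicon accepts with its colon is never touched by the stripping step; only tokens that fail verbatim, such as a word with an EN DASH attached, reach it.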
It seems that MS Word is still confused: at least with the latest nightly build, it still puts red underlines under these characters. Tested with MS Office 2010, 2013 and 2016 on Windows 7, 8 and 10.
…pellers#1 git-svn-id: http://svn.code.sf.net/p/hfst/code/trunk/hfst-ospell@4494 941e2c2b-deac-454f-805a-451daa25f33c
…pellers#1 git-svn-id: svn://svn.code.sf.net/p/hfst/code/trunk@4494 941e2c2b-deac-454f-805a-451daa25f33c
The following text will trigger a red underline in MS Word using the SME speller (version: Divvun-sme-2015.292.177.msi, 2015-10-19, 02:57):
– Fertejit čielga njuolggadusat
The words are accepted, but not the initial EN DASH.