User Dictionary Encoding #183

xylographe · 2019-06-25T12:05:47Z

When trying to add an unknown word to the user dictionary, I get,

“Word cannot be added — Sadly, this word contains symbols out of current dictionary encoding, thus it cannot be added to user dictionary. You can convert this dictionary to UTF8 (don't forget to change the line SET {encoding} in .aff file) or choose the different one with appropriate encoding.”

No real surprise, the language is Dutch, and there is no standard (ISO/Windows) fixed-width 8-bit encoding that can represent all Dutch characters. Hence, UTF-8 (or some other variable-width Unicode Transformation Format) is mandatory.

What I don't understand is, how to convert this user dictionary to UTF-8. There is no .aff file, only a nl_NL.usr file. Inserting a UTF-8 BOM at the start of nl_NL.usr doesn't help. What shall I do?

Predelnik · 2019-06-25T12:17:58Z

First of all, did you download the Dutch dictionary from the default location, i.e. https://github.com/LibreOffice/dictionaries?
That one already seems to be in UTF-8, which is why the error looks a bit strange, but maybe it's reported incorrectly. Secondly, if you don't mind, can you tell me which word you are trying to add, since it may depend on the letters in it?

Also, it's strange that you don't see the .dic and .aff files; they should be either in %appdata%\Notepad++\plugins\config\Hunspell\ or directly beside the plugin in c:\Program Files\Notepad++\plugins\ or similar.

If I remember correctly, this error appears when the dictionary uses some non-UTF-8 encoding and you are adding a word containing symbols not available in that encoding, for example if you try to add a word with Chinese characters to an English dictionary which currently uses the ISO8859-1 encoding.
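
To illustrate the kind of check described above, here is a minimal Python sketch. It is not the plugin's actual code; the helper name `fits_encoding` and the sample words are made up for illustration. It only tests whether every character of a word can be stored in a given encoding.

```python
def fits_encoding(word: str, encoding: str) -> bool:
    """Return True if every character of `word` can be stored in `encoding`."""
    try:
        word.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample words, checked against two encodings.
print(fits_encoding("colour", "iso8859-1"))  # plain ASCII letters fit
print(fits_encoding("词典", "iso8859-1"))     # Chinese characters do not
print(fits_encoding("词典", "utf-8"))         # UTF-8 can store any Unicode text
```

A word that fails this check for the dictionary's declared encoding is the situation in which the error would be expected.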

xylographe · 2019-06-25T13:11:31Z

Thank you for the fast reply.

I'm beginning to understand now. The user dictionary must have the same encoding as the main dictionary. I was looking in %APPDATA%\Notepad++\plugins\Hunspell (with fr-FR.usr, nl-NL.usr, etc.), when I should have been looking in <NPP>\plugins\DSpellCheck\Hunspell. The nl-NL.aff file in the latter directory did have SET ISO8859-1. I could have sworn it was downloaded by DSpellCheck, but apparently not. After removing the existing nl-NL.aff and nl-NL.dic, I had DSpellCheck download new ones from https://github.com/LibreOffice/dictionaries, and this new nl-NL.aff has SET UTF-8. Everything is working fine again, like it has been for quite some time (on previous computers).
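
For anyone wanting to check which encoding an .aff file declares without opening it by hand, a rough Python sketch follows. The function names are hypothetical, and the parsing is simplified: it just looks for the first line whose first token is `SET`.

```python
from typing import Optional


def declared_encoding(aff_text: str) -> Optional[str]:
    """Return the value of the first `SET` line in .aff content, or None."""
    for line in aff_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "SET":
            return parts[1]
    return None


def read_declared_encoding(aff_path: str) -> Optional[str]:
    """Read an .aff file from disk and return its declared encoding."""
    with open(aff_path, "rb") as handle:
        raw = handle.read()
    # Skip a UTF-8 byte order mark if the file starts with one.
    if raw.startswith(b"\xef\xbb\xbf"):
        raw = raw[3:]
    # latin-1 maps every byte to a character, so decoding never fails and
    # the ASCII "SET ..." line stays readable whatever the real encoding is.
    return declared_encoding(raw.decode("latin-1"))


print(declared_encoding("SET ISO8859-1\nTRY abc\n"))
print(declared_encoding("# comment\nSET UTF-8\n"))
```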

BTW the Dutch word was stĳl with U+0133 – latin small ligature ij, which obviously cannot work in an ISO8859-1 encoding. :) The new dictionary recognises it out of the box.

Thank you very much for your support, and, last but not least, for providing and maintaining this great plug-in!

Predelnik · 2019-06-25T14:06:52Z

You're welcome, and thank you for putting in the work to figure all this out! Maybe adding the absolute path to e.g. the .aff file to the error message would actually be helpful for people with issues like this one in the future.

xylographe · 2019-06-26T01:40:28Z

Yes, agreed, the absolute path to the .aff would have helped.

With Get-ChildItem (PowerShell) I managed to track down all .aff and .dic files. I found no fewer than 48 dictionaries for six different languages, and a dozen others for languages I don't even understand. :-)
After examining the .aff files I copied the most recent ones to %PROGRAMFILES%\Common Files\Hunspell, and replaced two of them with updates I found via Google. Finally, I removed all the other dictionaries, replacing them with hard links to the .aff and .dic files in the new Hunspell directory. So now all applications use the same set of good dictionaries, and when a dictionary is updated, the update will be immediately available to all applications.
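
The same clean-up could be sketched in Python roughly as below. This is not the PowerShell that was actually used; the function names, the `shared` and `app` directories, and the file contents are made up for the demo, which runs entirely inside a throwaway temporary directory. Note that hard links only work when both paths are on the same volume.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def find_dictionary_files(root: Path) -> List[Path]:
    """Recursively list every .aff and .dic file below `root`."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in {".aff", ".dic"}
    )


def replace_with_hard_link(master: Path, duplicate: Path) -> None:
    """Replace `duplicate` with a hard link to `master` (same volume only)."""
    temp = duplicate.with_name(duplicate.name + ".tmp-link")
    os.link(master, temp)        # create the link under a temporary name first
    os.replace(temp, duplicate)  # then swap it in over the old copy


# Demo with made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "shared").mkdir()
    (base / "app").mkdir()
    master = base / "shared" / "nl-NL.aff"
    copy = base / "app" / "nl-NL.aff"
    master.write_text("SET UTF-8\n", encoding="utf-8")
    copy.write_text("SET ISO8859-1\n", encoding="utf-8")

    found = find_dictionary_files(base)
    replace_with_hard_link(master, copy)
    linked = os.path.samefile(master, copy)
    print([str(p.relative_to(base)) for p in found], linked)
```

Creating the link under a temporary name and then swapping it in means the old copy is only removed once the link exists.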

Though it took nearly three hours to get there, I'm very happy and contented with the result. 😄

endolith · 2020-06-17T16:20:52Z

Shouldn't everything just be UTF8 by default? I am getting the same error for trying to add μs to the dictionary.

Predelnik · 2020-06-18T08:13:24Z

@endolith
That's a question you should address to the dictionary owners, e.g. LibreOffice (here's an issue in their repo: LibreOffice/dictionaries#7).

What I can do, however, is support some repositories containing UTF-8 dictionaries, like
https://github.com/titoBouzout/Dictionaries
or
https://github.com/wooorm/dictionaries
Currently they do not share the same directory structure and unfortunately end up not being parsed correctly by the plugin.

endolith · 2020-06-18T13:40:12Z

I mean the .aff file and user dictionary in DSpellCheck / Notepad++ should be UTF-8 by default, so we don't have to modify them to add words with special characters.
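
For reference, the manual modification in question (the error message's suggestion to convert the dictionary to UTF-8 and change the SET line) might look roughly like this in Python. This is only a sketch under assumptions: the helper names and the sample dictionary content are invented, the old encoding is assumed to be ISO8859-1, and a real .aff file may contain other encoding-dependent directives that this does not handle.

```python
import re


def convert_aff_text(aff_text: str) -> str:
    """Rewrite the first SET line of decoded .aff content to declare UTF-8."""
    return re.sub(r"(?m)^SET\s+\S+", "SET UTF-8", aff_text, count=1)


def reencode(data: bytes, old_encoding: str) -> bytes:
    """Decode bytes from the old encoding and encode them again as UTF-8."""
    return data.decode(old_encoding).encode("utf-8")


# Made-up sample content standing in for real .aff and .dic files.
old_aff = "SET ISO8859-1\nTRY aeiou\n".encode("iso8859-1")
old_dic = "2\ncafé\nnaïve\n".encode("iso8859-1")

new_aff = convert_aff_text(old_aff.decode("iso8859-1")).encode("utf-8")
new_dic = reencode(old_dic, "iso8859-1")

print(new_aff.decode("utf-8").splitlines()[0])
print(new_dic.decode("utf-8").splitlines())
```

Both files have to be converted together, since the SET line in the .aff describes the encoding of the .dic as well.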

Predelnik · 2020-06-19T06:40:40Z

@endolith I do not provide any dictionaries with the plugin; all dictionaries are downloaded from some other source.

xylographe closed this as completed Jun 26, 2019
