Hunspell backend provided by node-spellchecker does not respect dictionary encodings other than UTF-8 #50
Comments
I am far away from any Ubuntu machine. Could you maybe please:
I had this problem in Portuguese when not using a UTF-8 dictionary. Try to use the dictionaries from
Why is this flag in the file ignored? You should use the right encoding when reading the dictionary. I don't want to have a copy of the dictionary for each app. This dictionary comes from the Ubuntu distribution and works without problems in other apps.
Spell Right does not use a preexisting Hunspell. I am using a module which is UTF-8 dependent and cannot use these flags. It may work the other way around: maybe you can switch this flag and replace the dictionary file in Ubuntu.
This is not true, neither for the OS native service nor for Hunspell on Linux. And loading and using incompatible files without checking or warning is an error in this extension.
I understand. I will look into whether this can be resolved in a better way.
node-spellchecker uses It's used to convert to lower/uppercase only. Converting text to the correct dictionary encoding is managed at the app level.
node-spellchecker always sends text (expected to be a wide string) to the Hunspell library as UTF-8, ignoring the dictionary encoding.
Yes, I saw the part which apparently reads the SET line some time ago. And from some hints (e.g. from your CLI example) I deduce that it CAN be run with UTF-8 on the front and the native dictionary encoding at the back. The problem is not trivial, however, as it has not been solved for Atom so far. Still, I consider it the best back-end module for spelling in VSCode, on which I elaborated in #20266 a while ago. If you could dig into this a bit, it would be a great help! Thank you!
(to previous comment) I think this is exactly how it should be: Hunspell is asked in UTF-8 on the front, does the conversion internally and responds in UTF-8 with acknowledgement and suggestions.
Oh, and I am sorry my comment above was misleading because I simplified things. I knew about this UTF-8 requirement (see my first comment, which somewhat solves the issue); that's why I stated that the module 'does not use the flags', which is partially untrue, as you have discovered on your own.
If you want to compile patched node-spellchecker code, it should be easy: surround each hunspell call in spellchecker_hunspell.cc with

    if (strcmp(hunspell->get_dict_encoding().c_str(), vscode->current_file_encoding) != 0) {
        toDict=iconv_open(hunspell->get_dict_encoding().c_str(), vscode->current_file_encoding);
        iconv(toDict,word,size_t,tmp_word,size_t);
        hunspell->spell(tmp_word.c_str());
        // or
        hunspell->add(tmp_word.c_str());
        // or
        hunspell->suggest(&slist_tmp, tmp_word.c_str());
        fromDict=iconv_open(vscode->current_file_encoding, hunspell->get_dict_encoding().c_str());
        iconv(fromDict,slist_tmp[i],size_t,slist[i],size_t);
    } else {
        hunspell->spell(word.c_str());
        // or
        hunspell->add(word.c_str());
        // or
        hunspell->suggest(&slist, word.c_str());
    }

You can borrow chenc from the hunspell tool.
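For readers who want to try the conversion step in isolation, here is a minimal sketch of an iconv wrapper, assuming a POSIX system where `iconv` is available. The helper name `convert_encoding` is hypothetical and is not part of node-spellchecker or Hunspell. It handles only stateless encodings such as ISO-8859-2 (no final flush call for stateful ones), and it throws rather than substituting characters the target encoding cannot represent.

```cpp
#include <iconv.h>
#include <cerrno>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert `input` from encoding `from` to encoding `to`.
// Throws std::runtime_error if the pair of encodings is unsupported or the
// input cannot be converted (invalid or unrepresentable sequence).
std::string convert_encoding(const std::string& input,
                             const char* from, const char* to) {
    // Note the argument order of iconv_open: target encoding first.
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed");
    }

    std::string output;
    char buffer[256];
    char* in_ptr = const_cast<char*>(input.data());
    size_t in_left = input.size();

    while (in_left > 0) {
        char* out_ptr = buffer;
        size_t out_left = sizeof(buffer);
        size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
        int err = errno;  // save before any other call can change it
        output.append(buffer, sizeof(buffer) - out_left);
        // E2BIG only means the output buffer filled up; loop to continue.
        if (rc == (size_t)-1 && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv conversion failed");
        }
    }

    iconv_close(cd);
    return output;
}
```

A caller would convert the word to the dictionary encoding before `spell` or `suggest`, and convert each suggestion back afterwards. For example, `convert_encoding(word, "UTF-8", "ISO-8859-2")` turns the two-byte UTF-8 form of ż into the single ISO-8859-2 byte.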
Right now it would be a bit of a mystery to me how to get
I had this problem too when I used "system dictionaries" (on Ubuntu 16.04) by symlinking them to /usr/share/hunspell (as you described in the readme), as those dictionaries were not in UTF-8. Maybe just add a short warning to the readme that the dictionary files have to be encoded in UTF-8...
@Karuso33: That's exactly what I did in the very last release (1.1.16) a few hours ago. Thanks.
@bartosz-antosik Oh, my bad.
@Karuso33: On the contrary! Thank you for supporting this idea!
@Karuso33: I think I will keep the thread open to try to verify whether it is possible to remedy the situation. P.S. As it seems you are using Spell Right on Linux, could you maybe comment on #51? Sorry for this, but Linux support is pretty new and I have been away from an Ubuntu machine for some time, plus I do not use it on a regular basis, so I would like to know whether it has this issue and on what scale.
I have examined the solution suggested by @slodki a few posts above, and in the shape proposed it has serious drawbacks: it can only work on Linux (plus it is just a suggestion and does not compile as is), whereas Hunspell is also used on Windows 7, and there is no iconv in the typical node-gyp toolset. More code would have to be written to support this conversion on Windows as well. I would rather stay with the requirement for UTF-8 dictionaries for now, as I cannot spend that much time developing this solution. I would of course welcome any solution or help that could resolve this inconvenience.
OK. But what about reading the dictionary encoding from node-spellchecker and displaying a warning to the user when it is not UTF-8?
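One way such a check could look, as a rough sketch only: read the `SET` line from the dictionary's `.aff` file and compare it against UTF-8 before loading. The function names `read_aff_encoding` and `is_utf8_encoding` are hypothetical and do not exist in node-spellchecker. The sketch also assumes the `.aff` file has no byte order mark in front of the `SET` line.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: return the encoding named on the SET line of a
// Hunspell .aff stream, or an empty string if no SET line is present.
std::string read_aff_encoding(std::istream& aff) {
    std::string line;
    while (std::getline(aff, line)) {
        std::istringstream fields(line);
        std::string keyword, encoding;
        // A SET line has the form "SET <encoding>"; extraction skips
        // whitespace, so a trailing carriage return is ignored.
        if ((fields >> keyword >> encoding) && keyword == "SET") {
            return encoding;
        }
    }
    return "";
}

// Hypothetical helper: true for "UTF-8" or "UTF8" in any letter case.
bool is_utf8_encoding(const std::string& encoding) {
    std::string normalized;
    for (unsigned char c : encoding) {
        if (c != '-') {
            normalized += static_cast<char>(std::toupper(c));
        }
    }
    return normalized == "UTF8";
}
```

The caller would open the `.aff` file with `std::ifstream`, pass it to `read_aff_encoding`, and show a warning when `is_utf8_encoding` returns false. How to treat a missing `SET` line is left to the caller here.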
Please compare:
with Hunspell output:
Why is ż in jeż an unknown character? Encoding problem maybe?