Words with accents show as misspelled #241

ventolinmx · 2018-03-29T02:53:28Z

Prerequisites

[X] Put an X between the brackets on this line if you have done all of the following:
- Reproduced the problem in Safe Mode: http://flight-manual.atom.io/hacking-atom/sections/debugging/#using-safe-mode
- Followed all applicable steps in the debugging guide: http://flight-manual.atom.io/hacking-atom/sections/debugging/
- Checked the FAQs on the message board for common solutions: https://discuss.atom.io/c/faq
- Checked that your issue isn't already filed: https://github.com/issues?utf8=✓&q=is%3Aissue+user%3Aatom
- Checked that there is not already an Atom package that provides the described functionality: https://atom.io/packages

Description

In .md and .txt files, Spanish words with accents are shown as misspelled even though they are correct. Using aspell with the es-ES locale.

Steps to Reproduce

Type Spanish words with accents.
Save file as .md or .txt.
Activate Spanish locale.
Restart.
Open same file.

Expected behavior: Spell-check should recognize correct words with accents.

Actual behavior: Atom underlines all words with accents, although they are correct.

Reproduces how often: Always.

Versions

Atom : 1.23.3
Electron: 1.6.15
Chrome : 56.0.2924.87
Node : 7.4.0

apm 1.18.12
npm 3.10.10
node 6.9.5 x64
atom 1.23.3
python 2.7.13
git 2.11.0

Debian 9.

Additional Information

Tried checking the same file with aspell on command line and works fine. It recognizes words with accents as correct. Also tried different encodings.

lierdakil · 2018-04-21T12:09:01Z

More likely than not, it's a problem with your dictionary. Atom doesn't deal well with dictionaries that are not UTF-8 encoded.

dvictori · 2018-07-12T13:44:08Z

Probably related to #212 ?

@lierdakil : How to obtain UTF-8 encoded dictionaries?

dmoonfire · 2018-07-12T13:48:33Z

@dvictori: I think #212 is definitely going to cause you problems even with a UTF-8 dictionary. There is a defect filed against node-spellchecker to fix that. Until that is resolved, I don't know if we can do much more.

lierdakil · 2018-07-12T13:52:45Z

@dvictori Just find some? e.g. https://github.com/wooorm/dictionaries

lierdakil · 2018-07-12T14:05:52Z

@dmoonfire, I don't have any issues described in #212. Gentoo Linux, Atom 1.28.0, LANG=ru_RU.UTF-8

I probably would if my de-DE dictionary was, say, cp1252-encoded.

dmoonfire · 2018-07-12T14:11:23Z

@lierdakil: I stand corrected. Does it show it spelled correctly if you have Löwen?

lierdakil · 2018-07-12T14:19:35Z

Yes, it does:

Additionally, I've tried converting my dictionary from UTF8 to ISO8859-1 (as is common with extended latin hunspell dictionaries), and here's what I've got:

Looks suspiciously similar to #212 I believe.
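
To make the suspected mismatch concrete, here is a minimal Python sketch. It is illustrative only and assumes the dictionary bytes end up being interpreted as UTF-8 somewhere along the way; it does not reproduce Atom's or node-spellchecker's actual code path, and the helper names are made up for the example.

```python
# Illustrative only: models an encoding mismatch, not Atom's actual code path.
WORD = "Löwen"

def dictionary_bytes(word: str, encoding: str) -> bytes:
    """Encode a word the way a dictionary file in `encoding` would store it."""
    return word.encode(encoding)

def read_as_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing undecodable bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")

utf8_entry = read_as_utf8(dictionary_bytes(WORD, "utf-8"))
latin1_entry = read_as_utf8(dictionary_bytes(WORD, "iso-8859-1"))

print(utf8_entry == WORD)    # True: the UTF-8 entry matches the typed word
print(latin1_entry == WORD)  # False: the ISO-8859-1 entry no longer matches
```

Under that assumption, every word containing a non-ASCII letter would fail to match its dictionary entry, while plain ASCII words would be unaffected.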

dmoonfire · 2018-07-12T14:22:40Z

Oh, I know why yours is behaving. I found that the .UTF-8 suffix fixes the problem. However, most people don't have that in their language settings, so it doesn't get picked up correctly. So my LANG=en_US couldn't handle a UTF-8 dictionary either, because node-spellchecker didn't switch the locale() to UTF-8.

I suspect if you just had LANG=ru_RU it may misbehave.

lierdakil · 2018-07-12T14:31:37Z

If I just had LANG=ru_RU, IIRC my default system encoding would be KOI8-R, which is a chthonic abomination from the dawn of the computer era that must be killed with fire :) So thanks but no thanks, I quite like my UTF-8 terminals that can handle more than two languages.

I was under the impression that modern Linux distributions prefer UTF-8 locales. Pretty sure at least Arch and Gentoo do.

dvictori · 2018-07-12T14:34:56Z

@lierdakil Bingo! I used the dictionaries from wooorm and now Atom spell check is working. I just hope it won't break any other program. So far, LibreOffice and Firefox spell check look fine.

It would be nice though, for users less technically inclined, to be able to use their native dictionary, that comes with the operating system, without having to change the file.

lierdakil · 2018-07-12T14:42:23Z

@dmoonfire, FWIW, running Atom with env LANG=en_US atom doesn't seem to change the behaviour any. That is, UTF-8 dictionaries are still working.
EDIT: LANG=en_US.ISO-8859-1 doesn't seem to have any effect either.

ventolinmx · 2018-07-12T21:25:19Z

So I installed wooorm's Spanish UTF-8 dictionary with npm install dictionary-es and it behaves the same. Do I need to configure this in Atom somewhere to activate the UTF-8 dictionary? I have a special locale mix in Debian, using en_US for LANG, but changing this to Spanish gives the same problem.

lierdakil · 2018-07-13T11:33:05Z

@ventolinmono, you can point Atom to the directory where you installed the dictionary. Check spell-check settings.

dvictori · 2018-07-13T19:56:03Z

I just copied the files from wooorm repository to /usr/share/hunspell and renamed to the correct locale. So dictionaries/pt-BR/index.dic from wooorm became /usr/share/hunspell/pt_BR.dic. A very ugly hack, I might say.

dmoonfire · 2018-07-13T20:24:05Z

I never knew about wooorm's dictionaries. They have an MIT license, so that is reasonable. If UTF-8 is the only thing needed, I'll try creating a couple of Atom packages to install specific language dictionaries and see if that behaves; the plugin system for spell-check is designed for that.

wooorm · 2018-07-19T21:56:46Z

@dmoonfire They do not have an MIT license. Every dictionary comes with a different license!

edusantana · 2018-07-28T21:13:38Z

Here's the problem I have with this. $LANG = pt_BR.UTF-8
Ubuntu 16.04.

elissonmichael · 2018-10-24T16:50:56Z

> I just copied the files from wooorm repository to /usr/share/hunspell and renamed to the correct locale. So dictionaries/pt-BR/index.dic from wooorm became /usr/share/hunspell/pt_BR.dic. A very ugly hack, I might say.

@edusantana this worked for me!

ghost · 2018-10-28T12:32:34Z

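On Arch Linux, I solved it by converting the dictionary to UTF-8 through a temporary file (redirecting iconv's output straight back onto its own input file would truncate it before it is read): iconv -f ISO-8859-1 -t UTF-8 /usr/share/hunspell/YOURDIC.dic > /tmp/YOURDIC.dic && mv /tmp/YOURDIC.dic /usr/share/hunspell/YOURDIC.dic. It's simply an issue of encoding.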
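
A minimal Python sketch of the same conversion, written to a temporary file first so the source is never truncated before it has been read. `convert_to_utf8` is a hypothetical helper, and the ISO-8859-1 default is an assumption that should be checked against the dictionary at hand. The SET line in the matching .aff file would presumably need updating as well.

```python
import os
import shutil
import tempfile

def convert_to_utf8(path: str, source_encoding: str = "iso-8859-1") -> None:
    """Re-encode the file at `path` to UTF-8 without truncating it first."""
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as dst:
            dst.write(text)
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)       # swap in the converted file
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Running it twice on the same file would double-encode the accented characters, so it should be applied only to a file known to be in the source encoding.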

ferenczy · 2018-11-08T22:27:47Z

I would really like to avoid converting my dictionaries into UTF-8 encoding. I'm using original dictionaries from LibreOffice, sharing them between multiple applications and I'm not sure they'll be still working after the conversion. Sure, I can try it but I would like to avoid the conversion every time I update the dictionaries anyway.

The .aff file contains the encoding the dictionary is using at the very first line (in my case it's SET ISO8859-2) so it should be easy to read it and use it without any user intervention.

ghost · 2018-11-09T13:13:20Z

@ferenczy Definitely. I found these issues: LibreOffice/dictionaries#7 in the libreoffice repo. And atom/node-spellchecker#89 in atom itself

dmoonfire · 2018-11-09T14:36:04Z

Ideally, a conversion shouldn't be needed because most dictionary files tell you their encoding. I'm trying to get back to this and look at it. I think the underlying problem is at the C++ layer, which is no longer my strength, and I have a few obligations that are getting in the way. I want to fix this, mainly because it is driving me nuts too. :)

edusantana · 2018-12-26T20:39:22Z

@dmoonfire any luck with that? Any work around?
I have converted those files to UTF-8, replaced the SET line with SET UTF-8, and added FLAG UTF-8, but I still have this problem.

dmoonfire · 2018-12-26T21:37:22Z

@edusantana: Over the last week, I worked on a PR for node-spellchecker which should fix the encoding errors that were happening between Hunspell and JavaScript. If all goes well, I can get that verified and rolled into Atom. It should handle most of the accented word problems. It also doesn't require dictionaries to be in UTF-8 format, so dropping them in should hopefully Just Work™.

atom/node-spellchecker#95

It just took me a while to figure out text encoding on C++ on four different platforms.
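
For readers wondering what "not requiring UTF-8" can look like in general, here is a minimal Python sketch of the idea: decode the dictionary with its own declared encoding so that lookups compare Unicode text with Unicode text. It is not taken from atom/node-spellchecker#95, which works at the C++ layer. The helper names are hypothetical, and the sketch ignores affix rules entirely.

```python
def load_words(dic_bytes: bytes, encoding: str) -> set:
    """Decode a .dic payload with its declared encoding and collect the bare words."""
    lines = dic_bytes.decode(encoding).splitlines()
    # The first line of a Hunspell .dic file is an approximate word count; skip it.
    # Anything after "/" on an entry is a set of affix flags, ignored here.
    return {line.split("/", 1)[0] for line in lines[1:] if line}

def is_known(word: str, words: set) -> bool:
    """Check a Unicode word against the decoded dictionary entries."""
    return word in words

# A tiny made-up dictionary stored as ISO-8859-2 bytes.
sample = "2\nLöwen/S\nżółw\n".encode("iso-8859-2")
words = load_words(sample, "iso-8859-2")
print(is_known("Löwen", words))  # True
print(is_known("żółw", words))   # True
```

The point of the sketch is only that the conversion happens once, at load time, using the encoding the dictionary itself declares, so the user never has to re-encode the files.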

rbertoche · 2019-01-28T19:43:24Z

Converting latin1 files to UTF-8 and changing the format tag did not work for me: it somehow picks up only a subset of the dictionary, so it still shows correct words as misspelled.

Is there any way for me to configure a path for the dictionary so that this extension will pick it up? I don't want to risk breaking other spell-check tools, as they are working properly.

dmoonfire · 2019-04-12T21:03:17Z

Atom 1.37 has a fix for passing accented characters for spell-checking. It handles dictionary files that aren't UTF-8 encoded. Could you please check with the beta and see if it solves the problem? Thank you.

edusantana · 2019-04-13T19:12:34Z

@dmoonfire I will try it... Thanks!!! It works now!!! Look!

dmoonfire · 2020-09-04T01:07:21Z

It sounds like this is resolved, so I'm going to close this issue. Feel free to open a new one.

palant mentioned this issue Nov 30, 2018

Words with Polish characters show as misspelled #266

Closed

rsese mentioned this issue May 13, 2019

Problem with czech special symbols #306

Closed

dmoonfire closed this as completed Sep 4, 2020
