This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Words with accents show as misspelled #241

Closed
ventolinmx opened this issue Mar 29, 2018 · 28 comments

Comments

@ventolinmx

ventolinmx commented Mar 29, 2018

Prerequisites

Description

In .md and .txt files, Spanish words with accents are shown as misspelled even though they are correct. Using the aspell es-ES locale.

Steps to Reproduce

  1. Type Spanish words with accents.
  2. Save file as .md or .txt.
  3. Activate Spanish locale.
  4. Restart.
  5. Open same file.

Expected behavior: Spell-check should recognize correct words with accents.

Actual behavior: Atom underlines all words with accents, although they are correct.

Reproduces how often: Always.

Versions

Atom : 1.23.3
Electron: 1.6.15
Chrome : 56.0.2924.87
Node : 7.4.0

apm 1.18.12
npm 3.10.10
node 6.9.5 x64
atom 1.23.3
python 2.7.13
git 2.11.0

Debian 9.

Additional Information

Tried checking the same file with aspell on the command line and it works fine: it recognizes words with accents as correct. Also tried different encodings.

@lierdakil
Contributor

More likely than not, it's a problem with your dictionary. Atom doesn't deal well with dictionaries that are not UTF-8 encoded.

@dvictori

Probably related to #212 ?

@lierdakil : How to obtain UTF-8 encoded dictionaries?

@dmoonfire
Contributor

@dvictori: I think #212 is definitely going to cause you problems even with a UTF-8 dictionary. There is a defect filed on node-spellchecker that aims to fix that. Until that is resolved, I don't know if we can do much more.

@lierdakil
Contributor

@dvictori Just find some? e.g. https://github.com/wooorm/dictionaries

@lierdakil
Contributor

@dmoonfire, I don't have any issues described in #212. Gentoo Linux, Atom 1.28.0, LANG=ru_RU.UTF-8
image

I probably would if my de-DE dictionary was, say, cp1252-encoded.

@dmoonfire
Contributor

@lierdakil: I stand corrected. Does it show it spelled correctly if you have Löwen?

@lierdakil
Contributor

Yes, it does:
image

Additionally, I've tried converting my dictionary from UTF8 to ISO8859-1 (as is common with extended latin hunspell dictionaries), and here's what I've got:
image
Looks suspiciously similar to #212 I believe.
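A minimal Python sketch of why a mismatch between the dictionary's encoding and the encoding the checker assumes would make accented words fail to match. It only illustrates the byte-level difference; it does not reproduce what Atom or Hunspell actually do internally, and the `misread` helper is a hypothetical name introduced here.

```python
# Illustration only: the same word is a different byte sequence in each encoding.
word = "Löwen"

utf8_bytes = word.encode("utf-8")          # 'ö' becomes two bytes: C3 B6
latin1_bytes = word.encode("iso-8859-1")   # 'ö' becomes one byte: F6


def misread(data: bytes, assumed: str) -> str:
    """Decode bytes with an assumed (possibly wrong) encoding (hypothetical helper)."""
    return data.decode(assumed, errors="replace")


# UTF-8 bytes read as ISO-8859-1: one letter turns into two characters.
garbled = misread(utf8_bytes, "iso-8859-1")

# ISO-8859-1 bytes read as UTF-8: the byte F6 is not valid UTF-8 on its own.
broken = misread(latin1_bytes, "utf-8")

print(repr(garbled), repr(broken))
```

Neither misread string equals the original word, so a lookup against the dictionary would report a correctly spelled word as misspelled.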

@dmoonfire
Contributor

Oh, I know why yours is behaving. I found that the .UTF-8 suffix fixes the problem. However, most people don't have that in their language settings, so it wasn't picked up correctly. My LANG=en_US couldn't handle a UTF-8 dictionary either, because node-spellchecker didn't switch the locale() to UTF-8.

I suspect if you just had LANG=ru_RU it may misbehave.

@lierdakil
Contributor

If I just had LANG=ru_RU, IIRC my default system encoding would be KOI8-R, which is a chthonic abomination from the dawn of the computer era that must be killed with fire :) So thanks but no thanks, I quite like my UTF-8 terminals that can handle more than two languages.

I was under the impression that modern Linux distributions prefer UTF-8 locales. Pretty sure at least Arch and Gentoo do.

@dvictori

@lierdakil Bingo! I used the dictionaries from wooorm and now Atom's spell check is working. I just hope it won't break any other program. So far, the LibreOffice and Firefox spell checks look fine.

It would be nice, though, for less technically inclined users to be able to use the native dictionary that comes with the operating system without having to change the file.

@lierdakil
Contributor

lierdakil commented Jul 12, 2018

@dmoonfire, FWIW, running Atom with env LANG=en_US atom doesn't seem to change the behaviour any. That is, UTF-8 dictionaries are still working.
EDIT: LANG=en_US.ISO-8859-1 doesn't seem to have any effect either.

@ventolinmx
Author

So I installed wooorm's Spanish UTF-8 dictionary with npm install dictionary-es and it behaves the same. Do I need to configure this somewhere in Atom to activate the UTF-8 dictionary? I have a special locale mix in Debian, using en_US for LANG, but changing this to Spanish gives the same problem.

@lierdakil
Contributor

@ventolinmono, you can point Atom to the directory where you installed the dictionary. Check spell-check settings.

@dvictori

I just copied the files from wooorm repository to /usr/share/hunspell and renamed to the correct locale. So dictionaries/pt-BR/index.dic from wooorm became /usr/share/hunspell/pt_BR.dic. A very ugly hack, I might say.
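A rough Python sketch of the copy-and-rename step described above, for anyone who would rather script it. The function name `install_dictionary` and its parameters are hypothetical. Copying `index.aff` alongside `index.dic` is an assumption (the comment above only mentions the `.dic` file), and the target directory is left to the caller so that system dictionaries in /usr/share/hunspell need not be overwritten.

```python
import shutil
from pathlib import Path
from typing import List


def install_dictionary(source_root: Path, tag: str, target_dir: Path) -> List[Path]:
    """Copy <source_root>/<tag>/index.{dic,aff} to <target_dir>/<locale>.{dic,aff}.

    Hypothetical helper: e.g. dictionaries/pt-BR/index.dic becomes pt_BR.dic.
    """
    locale_name = tag.replace("-", "_")  # pt-BR -> pt_BR
    target_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for ext in ("dic", "aff"):  # the .aff copy is an assumption, see above
        src = source_root / tag / f"index.{ext}"
        dst = target_dir / f"{locale_name}.{ext}"
        shutil.copyfile(src, dst)
        copied.append(dst)
    return copied
```

Writing into a separate directory and pointing the spell-check settings at it, as suggested earlier in the thread, would avoid touching the files that other programs share.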

@dmoonfire
Contributor

I never knew about wooorm's dictionaries. They have an MIT license, so that is reasonable. If UTF-8 is the only thing needed, I'll try creating a couple of Atom packages to install specific language dictionaries and see if that behaves; the plugin system for spell-check is designed for that.

@wooorm

wooorm commented Jul 19, 2018

@dmoonfire They do not have an MIT license. Every dictionary comes with a different license!

@edusantana

problem-with-accent

Here's the problem I have with this. $LANG = pt_BR.UTF-8
Ubuntu 16.04.

@elissonmichael

> I just copied the files from wooorm repository to /usr/share/hunspell and renamed to the correct locale. So dictionaries/pt-BR/index.dic from wooorm became /usr/share/hunspell/pt_BR.dic. A very ugly hack, I might say.

@edusantana this worked for me!

@ghost

ghost commented Oct 28, 2018

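On archlinux, I solved it by converting the dictionary to UTF-8: iconv -f ISO-8859-1 -t UTF-8 /usr/share/hunspell/YOURDIC.dic > /tmp/YOURDIC.dic && mv /tmp/YOURDIC.dic /usr/share/hunspell/YOURDIC.dic. (Write to a temporary file first: redirecting the output straight onto the input file truncates it before iconv can read it.) It's simply an issue of encoding.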
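The same conversion can be sketched in Python, writing to a temporary file first so the source is never truncated while it is being read. This is a sketch under assumptions: `convert_to_utf8` is a hypothetical helper, the source encoding defaults to ISO-8859-1 as in the command above, and it does not touch the SET line in the matching .aff file.

```python
import os
import shutil
import tempfile
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str = "iso-8859-1") -> None:
    """Re-encode a text file to UTF-8 in place (hypothetical helper)."""
    # Read everything before writing anything, so a failure leaves the file intact.
    text = path.read_bytes().decode(source_encoding)

    # Write next to the original so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(text.encode("utf-8"))

    shutil.copymode(path, tmp_name)  # keep the original permission bits
    os.replace(tmp_name, path)       # swap the converted file into place
```

Keeping a backup copy of the original dictionary before running something like this seems prudent, since other applications may expect the old encoding.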

@ferenczy

ferenczy commented Nov 8, 2018

I would really like to avoid converting my dictionaries to UTF-8. I'm using the original dictionaries from LibreOffice and sharing them between multiple applications, and I'm not sure they'll still work after the conversion. Sure, I can try it, but I would like to avoid repeating the conversion every time I update the dictionaries anyway.

The .aff file contains the encoding the dictionary is using at the very first line (in my case it's SET ISO8859-2) so it should be easy to read it and use it without any user intervention.
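To make that suggestion concrete, here is a minimal Python sketch of reading the declared encoding from an .aff file. The name `read_aff_encoding` is hypothetical, and this is not how node-spellchecker does it. The sketch scans for the first `SET` line rather than assuming it is literally line one, which is a defensive choice on my part.

```python
from pathlib import Path
from typing import Optional


def read_aff_encoding(aff_path: Path) -> Optional[str]:
    """Return the encoding named on the SET line of a .aff file, or None.

    Hypothetical helper for illustration only.
    """
    with aff_path.open("rb") as handle:
        for raw in handle:
            # Drop a possible UTF-8 byte order mark; the keyword and the
            # encoding name are ASCII, so latin-1 decoding cannot fail here.
            parts = raw.lstrip(b"\xef\xbb\xbf").decode("latin-1").split()
            if len(parts) >= 2 and parts[0] == "SET":
                return parts[1]
    return None
```

A caller could then use the returned name (for example `ISO8859-2`) to decode the .dic file, instead of requiring the user to convert it.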

@ghost

ghost commented Nov 9, 2018

@ferenczy Definitely. I found these issues: LibreOffice/dictionaries#7 in the LibreOffice repo, and atom/node-spellchecker#89 in Atom itself.

@dmoonfire
Contributor

Ideally, a conversion shouldn't be needed because most dictionary files tell you their encoding. I'm trying to get back to this and look at it. I think the underlying problem is at the C++ layer, which is no longer my strength, and I have a few obligations that are getting in the way. I want to fix this, mainly because it is driving me nuts too. :)

@edusantana

edusantana commented Dec 26, 2018

@dmoonfire any luck with that? Any workaround?
I have converted those files to UTF-8, changed the SET line to SET UTF-8, and added FLAG UTF-8, but I still have this problem.

@dmoonfire
Contributor

@edusantana: Over the last week, I worked on a PR for node-spellchecker which should fix the encoding errors that were happening between Hunspell and JavaScript. If all goes well, I can get that verified and rolled into Atom. It should handle most of the accented word problems. It also doesn't require dictionaries to be in UTF-8 format, so dropping them in should hopefully Just Work™.

atom/node-spellchecker#95

It just took me a while to figure out text encoding on C++ on four different platforms.

@rbertoche

rbertoche commented Jan 28, 2019

Converting latin1 files to UTF-8 and changing the format tag did not work for me: it somehow picks up only a subset of the dictionary, so it still shows correct words as misspelled.

Is there any way for me to configure a path for the dictionary so that this extension will pick it up? I don't want to risk breaking other spell-check tools, as they are working properly.

@dmoonfire
Contributor

Atom 1.37 has a fix for passing accented characters for spell-checking. It handles dictionary files that aren't UTF-8 encoded. Could you please check with the beta and see if it solves the problem? Thank you.

@edusantana

edusantana commented Apr 13, 2019

@dmoonfire I will try it... Thanks!!! It works now!!! Look!

atom-37-beta-fix-spell-check

@dmoonfire
Contributor

It sounds like this is resolved, so I'm going to close this issue. Feel free to open a new one.


9 participants