
Make suggestions better for multiple languages spell-checking. #21

Open
georgthegreat opened this issue Jun 19, 2013 · 22 comments

@georgthegreat

I have the french_1990 and standard English (Great Britain) dictionaries installed.
I typed the single word l'été:
http://slovari.yandex.ru/l%27%C3%A9t%C3%A9/%D0%BF%D0%B5%D1%80%D0%B5%D0%B2%D0%BE%D0%B4/

Here is what I see.
The English dictionary fails to validate it (that's OK).
The French dictionary validates it (that's OK).
The joined English-French dictionary fails to validate it (that's NOT OK).


@Predelnik
Owner

Thank you very much, it was an important, though very funny, bug :D

Basically, the first problem was that the name of this dictionary wasn't resolved correctly, mostly because the zip file containing it is named "fr_FR-1990_1-3-2" while the dictionary itself is named fr_FR-1990.

It also turned out that the items in this list box were sorted automatically, though they shouldn't have been, and that sorting was case-insensitive while mine was case-sensitive, so the correspondence between dictionaries and check boxes was wrong.
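A minimal Python sketch of how the two sort orders can drift apart, assuming the list box sorts its rows case-insensitively while the code keeps its own case-sensitively sorted list and matches the two by index. The dictionary names other than fr_FR-1990 are hypothetical, and `checkbox_mapping`/`mismatched_rows` are illustrative helpers, not the plugin's code.

```python
def checkbox_mapping(names):
    """Pair each visible list-box row with the dictionary the code thinks is at that index.

    The list box sorts its rows case-insensitively on its own, while the code keeps
    a case-sensitively sorted list, so row i and dictionary i can disagree.
    """
    shown = sorted(names, key=str.casefold)  # order the list box displays
    ours = sorted(names)                     # case-sensitive: uppercase sorts before lowercase
    return list(zip(shown, ours))            # (label the user sees, dictionary actually toggled)


def mismatched_rows(names):
    """Rows where the visible label and the toggled dictionary differ."""
    return [(label, dic) for label, dic in checkbox_mapping(names) if label != dic]


# Hypothetical dictionary names; only fr_FR-1990 comes from this thread.
print(mismatched_rows(["en_GB", "fr_FR-1990", "Russian"]))
# [('en_GB', 'Russian'), ('fr_FR-1990', 'en_GB'), ('Russian', 'fr_FR-1990')]
```

Storing the dictionary name with each row, instead of relying on matching indices, would avoid depending on either sort order.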

It makes me wonder whether I should switch to case-insensitive sorting too, though...

@georgthegreat
Author

Are you going to close this?
Or are you waiting for me to test it?

It works well (I downloaded the new version linked in one of the issues above).

@georgthegreat
Author

Not fully fixed yet.
If I turn on all three dictionaries (via the multiple languages option), some words get no suggestions at all:
ômbre, räie, and maybe some more.

If only French is turned on, everything works fine (the words are underlined and alternatives like ombré and raie are offered).

@georgthegreat georgthegreat reopened this Jun 23, 2013
@georgthegreat
Author

Sorry, the word räie seems to work fine.

@Predelnik
Owner

It was actually caused by a totally different matter: at the very beginning, I wrote the code so that Hunspell would have a 100% hit rate on the Russian/English dictionary combination. Hunspell does much better with language guessing done my way, so that part could safely be removed. You can check it out at the usual link:
http://goo.gl/OYqRO

@georgthegreat
Author

It still isn't working for me.

For the word "developpé", which is of French origin, only English alternatives are suggested, even though accents aren't used in English words.

Why do you need this language detection at all?

@Predelnik
Owner

Well, the current way of guessing the language is to choose the one that has the most suggestions. For this word we get 2 suggestions for English and 2 for French, and since English comes first, it is selected as the current language.
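The selection rule described above (most suggestions wins, ties go to the first language) can be sketched like this; `guess_language` is a hypothetical name and the suggestion lists are placeholders, not real Hunspell output.

```python
def guess_language(suggestions_by_lang):
    """Return the language with the most suggestions.

    Dicts keep insertion order, and max() returns the first of several equal
    maxima, so a tie goes to whichever language was listed first.
    """
    return max(suggestions_by_lang, key=lambda lang: len(suggestions_by_lang[lang]))


# Placeholder suggestion lists with a 2-2 tie, as described for "developpé".
tie = {"en_GB": ["developed", "developer"], "fr_FR-1990": ["développé", "développe"]}
print(guess_language(tie))  # en_GB
```

The tie-breaking depends entirely on list order, which is why English wins here.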

A good way to solve this might be, when multiple languages are selected, to show another menu item where you can select the language for this word, so that all suggestions and adding to the dictionary would apply to that language. It's probably better to do this for the current session only, because saving a lot of stuff like that is a pain; at least it would make it possible to add such words to the dictionary and forget about them for the time being.

@georgthegreat
Author

Isn't it possible to simply join the suggestions into one list?

@Predelnik
Owner

It's possible, but when there are a lot of suggestions there is the problem of which order to feed them into the list. That, of course, has a solution: take one from the first language, one from the second, and so on (if they have any at all) until the maximum is reached.
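A sketch of the round-robin merge described above, assuming each language's suggestions arrive already ordered by that language's engine. `interleave` is a hypothetical helper, and skipping words that appear in more than one language's list is an added assumption.

```python
from itertools import zip_longest

_MISSING = object()  # filler for languages that have run out of suggestions


def interleave(lists, limit):
    """Take one suggestion from each language in turn until `limit` words are collected."""
    merged, seen = [], set()
    for row in zip_longest(*lists, fillvalue=_MISSING):
        for word in row:
            if word is _MISSING or word in seen:
                continue  # this language is exhausted, or the word is already listed
            seen.add(word)
            merged.append(word)
            if len(merged) == limit:
                return merged
    return merged


# Placeholder English and French suggestion lists.
print(interleave([["reunion", "reunions"], ["réunis", "réunit", "réuni"]], limit=4))
# ['reunion', 'réunis', 'reunions', 'réunit']
```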

But there's still the problem of determining which language's user dictionary I should put the word into. Maybe it could be solved by making the "Add to Dictionary..." item a submenu with the selected languages as items, probably showing how many suggestions there are from each language (in parentheses).

OK, that seems like a good idea; most likely I'll do it))

@Predelnik Predelnik reopened this Jun 24, 2013
@georgthegreat
Author

Doesn't Hunspell have any kind of difference (distance) function between words?

Are you sure that a non-unified user dictionary is required? LibreOffice/OpenOffice don't have such a feature, do they?

@Predelnik
Owner

What do you mean by difference? If you mean a distance function between words, it's definitely not public, though I could try to look for it.

I don't know if it's 100% required, but it seems logical, since there are users who switch between languages to check the text rather than using multiple languages at once.

@georgthegreat
Author

Yep, the distance function is what I was talking about.

@georgthegreat
Author

Here is one more example of bad usability:
When both the English and French dictionaries are turned on, the word reunis gets the English suggestion reunion, but not the French réunis, which should be much closer to the original.

This might also be caused by incorrect UTF-8 handling (in UTF-8, the é in réunis is encoded as two bytes, so réunis looks something like r'eunis).
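To illustrate the UTF-8 point: é takes two bytes in UTF-8, so an edit distance computed over raw bytes rather than characters counts reunis to réunis as two edits instead of one, which ties it with reunion. The sketch assumes plain, unweighted Levenshtein distance purely for illustration; it is not the plugin's metric.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance over any two sequences (str or bytes)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution (0 if equal)
        prev = cur
    return prev[-1]


accented = "r\u00e9unis"  # "réunis" with precomposed é (U+00E9)
print(accented.encode("utf-8"))                                  # b'r\xc3\xa9unis'
print(levenshtein("reunis", accented))                           # 1 (characters)
print(levenshtein("reunis".encode("utf-8"), accented.encode("utf-8")))  # 2 (bytes)
print(levenshtein("reunis", "reunion"))                          # 2
```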

@Predelnik
Owner

Btw, if it wouldn't bother you, you can check this preview of the next major version: http://goo.gl/OYqRO. I used Damerau–Levenshtein distance (case-insensitive) to compare the words; it isn't perfect but actually seems quite OK, though maybe I'll change some things later. At least all the example problems from this thread seem to be resolved)
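A minimal sketch of a case-insensitive Damerau–Levenshtein distance, using the common optimal-string-alignment (OSA) variant, where insertion, deletion, substitution and adjacent transposition each cost 1, with ties sorted alphabetically. This is an assumption about what such an implementation might look like, not the code in the preview build; `osa_distance` and `rank` are hypothetical names.

```python
def osa_distance(a, b):
    """Case-insensitive optimal-string-alignment distance (unit costs)."""
    a, b = a.casefold(), b.casefold()
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


def rank(word, candidates):
    """Sort candidates by distance, then alphabetically (by code point)."""
    return sorted(candidates, key=lambda c: (osa_distance(word, c), c))


print(rank("entree", ["entre", "en tree", "entere", "en-tree", "entr\u00e9e"]))
# ['en tree', 'en-tree', 'entere', 'entre', 'entrée']
```

With unit costs, every one of these candidates is at distance 1 from entree, so the alphabetical tiebreak alone decides the order.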

Separate user dictionaries for different languages are kept for now, but the default is now separate dictionaries in single-dictionary mode and one big dictionary in multiple-dictionary mode (I haven't tested it thoroughly yet, though).

Also, this version adds Firefox-like behavior of not checking the word currently being typed (as an option, turned on by default).
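One way to sketch the Firefox-like behavior, assuming the editor supplies the text and the caret offset: skip any word the caret is inside or at either edge of. `words_to_check` and the word pattern are hypothetical; the pattern treats letters joined by an apostrophe (as in l'été) as one word.

```python
import re

# Letters only (no digits or underscore), optionally joined by ' or U+2019.
WORD = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*")


def words_to_check(text, cursor):
    """Yield (start, end, word) for every word except the one under the caret."""
    for m in WORD.finditer(text):
        if m.start() <= cursor <= m.end():
            continue  # presumably still being typed: leave it unmarked for now
        yield m.start(), m.end(), m.group()


print(list(words_to_check("hello wor", 9)))  # [(0, 5, 'hello')]
```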

@georgthegreat
Author

No problem. I'll look at it, but not right now. I think I'll post an answer in a couple of days.

@georgthegreat
Author

It seems that this update isn't working at all.

I entered the French word entree (the correct spelling is entrée).
The list suggests:

  1. Entree
  2. en-tree
  3. en tree
  4. entere
  5. entre

@Predelnik
Owner

It's working, but sadly all these words have the same distance from entree, which is 1.

@georgthegreat
Author

Hm... Then this metric (Damerau–Levenshtein) doesn't fit, does it?
As far as I can see, substituting, inserting or deleting a single letter all have the same weight. That seems incorrect.
Is it your own implementation or some library function? Is it possible to edit the weights?

@Predelnik
Owner

Most likely it's possible, but I need to look deeper; for now I just copy-pasted an algorithm for my needs))

@Predelnik
Owner

Well, sadly, even if I change the operation costs to make substitution the cheapest, there are 3 words with the same distance as entrée (entrer, entres, entrez), and since I sort them alphabetically, entrée ends up last among them, while Hunspell successfully places it first.
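The tie described above can be reproduced with a weighted, case-insensitive Levenshtein distance (no transpositions) in which substitution costs 1 and insertion/deletion cost 2; these weights are assumptions chosen only to make substitution cheapest. Alphabetical tiebreaking puts entrée after entrer, entres and entrez because 'é' (U+00E9) sorts after 'z' by code point. Keeping the engine's own order for ties via a stable sort is shown as one possible alternative, also an assumption; all function names are hypothetical.

```python
def weighted_levenshtein(a, b, ins=2, delete=2, sub=1):
    """Case-insensitive Levenshtein distance with configurable integer costs."""
    a, b = a.casefold(), b.casefold()
    prev = [j * ins for j in range(len(b) + 1)]  # "" -> b[:j] needs j insertions
    for i in range(1, len(a) + 1):
        cur = [i * delete]                        # a[:i] -> "" needs i deletions
        for j in range(1, len(b) + 1):
            cur.append(min(
                prev[j] + delete,                 # delete a[i-1]
                cur[j - 1] + ins,                 # insert b[j-1]
                prev[j - 1] + (0 if a[i - 1] == b[j - 1] else sub),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def rank_alphabetical(word, candidates):
    """Sort by distance, breaking ties alphabetically (by code point)."""
    return sorted(candidates, key=lambda c: (weighted_levenshtein(word, c), c))


def rank_keep_engine_order(word, candidates):
    """Sort by distance only; sorted() is stable, so ties keep the engine's order."""
    return sorted(candidates, key=lambda c: weighted_levenshtein(word, c))


# Hypothetical engine output in which entrée comes first.
engine_order = ["entr\u00e9e", "entrer", "entres", "entrez", "entre"]
print(rank_alphabetical("entree", engine_order))
# ['entrer', 'entres', 'entrez', 'entrée', 'entre']
print(rank_keep_engine_order("entree", engine_order))
# ['entrée', 'entrer', 'entres', 'entrez', 'entre']
```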

Well, I think it would be better to have proper weights for each letter (e.g. making the substitution of similar letters, or of letters close together on the keyboard, the cheapest operation), but I'm not sure that's very easy to do.

Actually, I've had some ideas about slightly modifying the Hunspell source so that I can merge its lists of suggestions; maybe I'll try that too.
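A sketch of per-letter substitution weights along those lines, under assumptions: letters that differ only by an accent (e vs é, detected by stripping combining marks after NFD normalization) cost 1, same-row QWERTY neighbours cost 2, any other substitution costs 4, and insertion/deletion cost 4. The keyboard table, the weights and all function names are hypothetical and would need tuning.

```python
import unicodedata

# Hypothetical QWERTY table: only same-row neighbours are considered.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def base_letter(ch):
    """Lower-cased letter with accents removed (NFD, then drop combining marks)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def keyboard_neighbours(ch):
    if not ch:
        return set()
    for row in QWERTY_ROWS:
        i = row.find(ch)
        if i != -1:
            return set(row[max(i - 1, 0):i]) | set(row[i + 1:i + 2])
    return set()


def substitution_cost(x, y):
    if x == y:
        return 0
    bx, by = base_letter(x), base_letter(y)
    if bx == by:
        return 1                          # accent or case difference only
    if by in keyboard_neighbours(bx):
        return 2                          # adjacent key on the same row
    return 4


def similarity_distance(a, b, indel=4):
    prev = [j * indel for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * indel]
        for j in range(1, len(b) + 1):
            cur.append(min(prev[j] + indel, cur[j - 1] + indel,
                           prev[j - 1] + substitution_cost(a[i - 1], b[j - 1])))
        prev = cur
    return prev[-1]


cands = ["entrer", "entres", "entrez", "entr\u00e9e"]
print(sorted(cands, key=lambda c: (similarity_distance("entree", c), c)))
# ['entrée', 'entrer', 'entres', 'entrez']
```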

I'm not sure how to test it all, though. I only have some tests of common misspellings from the Aspell site, but they are not 100% reliable))
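A small harness for the kind of test mentioned above: given (misspelling, intended word) pairs, for example loaded from a misspellings list, it reports how often the intended word appears among the top k suggestions. `top_k_accuracy` and the toy suggester are hypothetical; the two pairs come from this thread, not from the Aspell data.

```python
from typing import Callable, Iterable, Sequence, Tuple


def top_k_accuracy(
    pairs: Iterable[Tuple[str, str]],
    suggest: Callable[[str], Sequence[str]],
    k: int = 1,
) -> float:
    """Fraction of (misspelled, intended) pairs whose intended word is in the top k suggestions."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(1 for wrong, right in pairs if right in list(suggest(wrong))[:k])
    return hits / len(pairs)


# Toy suggester standing in for the real spell-checker (hypothetical output).
FAKE = {"entree": ["entrer", "entr\u00e9e"], "reunis": ["r\u00e9unis", "reunion"]}
pairs = [("entree", "entr\u00e9e"), ("reunis", "r\u00e9unis")]
print(top_k_accuracy(pairs, lambda w: FAKE.get(w, []), k=1))  # 0.5
print(top_k_accuracy(pairs, lambda w: FAKE.get(w, []), k=2))  # 1.0
```

Reporting top-1 and top-k together separates "the right word is missing" from "the right word is there but ranked too low", which is the entrée case.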

@georgthegreat
Author

I think there is no "correct" method; any algorithm would have exceptions.

@Predelnik
Owner

Yeah, you're right, but with good statistics about common misspellings all this stuff could be optimized further and further until it's nearly perfect)) Well, at least it all deserves a bit more attention on my side; thanks for an example where everything goes wrong)

@Predelnik Predelnik changed the title from "Multiple language spellcheck issue" to "Make suggestions better for multiple languages spell-checking." Mar 12, 2018