french dictionnary (apostrophe case) #109
Hey Menny, I just started digging through the code to see how the application is structured and where I could add a check for the French language so that ' is treated as a separator. I decided to see whether it had already been listed as an issue or not :) 👍 Regards, Bryan |
Good thinking :-) Actually, the fix is in the French Language Pack, and not in ASK. I'll handle that soon. |
What kind of medium in the French Language Pack are you going to use to fix
|
Hey Menny, I have been thinking about this issue quite a bit with word Examples: Currently ASK treats the hyphen as a word separator as the state https://github.com/AnySoftKeyboard/AnySoftKeyboard/blob/master/src/com/anysoftkeyboard/AnySoftKeyboard.java#L1748-L1757 Here are some examples of the usage that complicates the current Examples: Veux-tu (two words question inversion used often ex: veux-tu manger? D'acheter (two words forming a contraction in this case of the So as you can see that with both sets of examples that hyphens and Regards, Bryan Paradis |
OMG! Are you kidding me! Ok, what if: |
I'll have to write down more examples and go through and see what the logic For example: Peut-il m'aider? vs Peut-être Have a check here as well for different compounds in French. Any of them http://french.about.com/od/grammar/a/compoundnouns_2.htm
|
I deleted my earlier comment, which was submitted through email and lost its formatting, and have reposted it here in a formatted way.

Current functionality

Examples:
As separator:
Simulated as inner letter:

Almost all the time, compounds with hyphens are going to be a verb followed by a pronoun, as in the French imperative, meaning the word is going to be very simple, 1-4 letters: le, la, les, y, en, moi, toi, il, lui, elle, nous, vous, etc. Allez-y! (Go there!) It is probably more important to guess a compound noun than the second word, as the length of the compound makes it more beneficial to autocorrect. I think there is a way to do both.

More examples:
(Expression compound noun vs verb inversion)
(triple compound nouns)
(triple compound verb inversion due to vowel phonetic rules: va-il = va-t-il)
(triple compound expression made of demonstrative pronoun ce, verb est, preposition à, infinitive verb dire)
(I am going to buy that, which is a contraction of la or le and the vowel a in acheter)

Correction scenario:
(Correctable typing error by key proximity before hyphen in compound noun)
(Correctable typing error by key proximity before hyphen in non compound noun but verb inversion)
(Autocorrection with a contraction, if we were to treat apostrophes as separators like hyphens. French contractions)

Conclusions:
a) Autocorrection when you press hyphen in French works OK and may result in a new autocorrect suggestion of a compound noun, as seen in the Peut-être correction scenario: Prut +- = Peut- & Peut- = Peut-être. Works.
b) Autocorrection when you press apostrophe in French would be a problem unless you have solved this in your dictionary. It will autocorrect to the full form plus the apostrophe if the word is not listed in your dictionary in the alternate form: Jusque vs Jusqu', de and d', que and qu', etc. Jusque = Jusqu'à, jusqu'alors, jusqu'ici, jusqu'où, jusqu'au, etc.
Problem: Could potentially be fixed with a dictionary that lists the contracted form of the first word. EX: Que and Qu'
c) Separators in their current form can be problematic if there are too many other higher-priority autocorrect suggestions before you get to the hyphen in the word, especially if you make a typo.
Problem: Could this be remedied by creating a different state when encountering a separator, instead of just a new state? That is, a state that would continue to make suggestions past the separator.

Final musings:
Separators work better than inner letters as far as autocorrect goes, because at least you could autocorrect two or more words in a row and still end up with the compound, although rather inefficiently. This is because almost all compound nouns should already be in the dictionary in single forms, except for vowel contractions (qu', puisqu', etc.). Having both separators and inner letters causes some problems. Wouldn't it be better to allow the suggestion engine to continue checking at and after the separator against the dictionary for the whole string? Let it decide whether there are no suggestions or possible compound nouns left. When it comes to that conclusion, it should start searching for only the section of the string after the separator. Can you think of any reasons that this would cause problems? You could make the autocorrect optionally more intelligent by adding rules depending on which language is loaded. Maybe for another time. I could see a lot of things you could look for and check depending on vowels and all sorts of things!

Other improvements
Keyboard keys
Acute accent (é)
Anyway! Hope it helps. Maybe I can take a poke at the layout changes or whatever for french usage. Tried to make the email as clear as possible with formatting. Cheers, Bryan Paradis |
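The "keep checking past the separator, then fall back" idea from the comment above could be sketched roughly as follows. This is only an illustration in Python, not ASK code: the toy word list, the `SEPARATOR_PATTERN` and the `candidates` function are all made up for the example, and simple prefix matching stands in for the real suggestion engine.

```python
import re

# Hypothetical toy word list; a real language pack holds far more entries.
DICTIONARY = {"peut", "peut-être", "il", "veux", "tu", "allez", "y", "être"}

# Assumption for this sketch: hyphen and apostrophe are the only separators.
SEPARATOR_PATTERN = r"[-']"


def candidates(typed, dictionary=DICTIONARY):
    """Return dictionary words that start with what has been typed so far.

    First try the whole string, separators included, so compounds such as
    "peut-être" stay reachable. Only when nothing matches, fall back to the
    part after the last separator, as if a new word had started there.
    """
    typed = typed.lower()
    whole = sorted(w for w in dictionary if w.startswith(typed))
    if whole:
        return whole
    tail = re.split(SEPARATOR_PATTERN, typed)[-1]
    if not tail:
        # The separator was just typed and no compound continues the string.
        return []
    return sorted(w for w in dictionary if w.startswith(tail))
```

With this toy list, `candidates("peut-")` still offers the compound, while `candidates("veux-t")` finds no compound and falls back to matching "t" on its own.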
Any news on this? I didn't find the French language pack sources, but in the store version the issue is still there. |
On a related note, I'd love to build a newer/better version of the French dictionary. I find the current one lacking in many ways; for instance, several common conjugations are missing. Any howto would be helpful! :-) |
@Brevera If you do, count me in. |
I did a bit of research and here are my findings so far :
So I guess all that's left to do is build a french language pack based on dicollecte's lexique. If only I could get my hands on that howto I found some day… |
That's very interesting! What is your programming and Android Development experience? |
Though I have a developer's degree (mainly Java, C and PHP), it dates What did you have in mind?
|
… and here's the building guide I was thinking of : https://code.google.com/p/softkeyboard/wiki/BinaryDictionaries :-) |
This is SO outdated! Basically, https://github.com/AnySoftKeyboard/LanguagePack is the base of language packs (see branches for concrete implementations). You can have a go at that 😃 |
Yeah, that's what I feared and was about to ask. I guess the least I can do is provide a french words.xml file with dicollecte's data. Would that be enough for you (or anyone else) to update the French language pack ? And BTW, do the elements in words.xml need to be ordered according to their frequency (= "f" value) or not ? |
Yes, words XML is good enough. Attach it here. |
I'm still a beginner regarding GitHub, so… how do you attach files to an issue comment ? |
If you are looking for an extensive dictionary, the Lefff is the most complete there is. It doesn't have frequency data, though. |
Nevermind, I think I found a solution : https://my.owndrive.com/public.php?service=files&t=8e5c75acbf7e8766b2eb6efb09d24fa7 And here is the small awk script I wrote to generate the above file : https://my.owndrive.com/public.php?service=files&t=3486b43d01e665b80f25c9d62f7f1007 |
Thanks Evpok. I've had a (lightning-)quick look at the LEFFF. However, I don't think an extensive dictionary is the best thing for a phone dictionary: it's the most common words that are needed, not necessarily the most exotic ones. YMMV, though, if you have specific needs. |
Just saying :) However it still doesn't help with our tokenisation problem. |
Oops, yeah, sorry about hijacking the initial issue… :-/ |
Hi Menny, I just wanted to know if the fr.xml file I provided 3 posts above was enough for you to generate a new French dictionary? |
Hello. Is someone currently looking into that issue? |
I'm afraid no one is... 😞 |
Thanks @xavihernandez. Very unfortunately my device isn't supported by CyanogenMod. I use a stock version of android that I have degooglized myself. Google Keyboard was the available keyboard. |
@djibux Oh yeah. Well, if you use Google Keyboard you will have an idea about AOSP keyboard because they have the same base. I don't really know what is closed-source in Google Keyboard tho... |
@menny thanks, hope it gets integrated soon =) |
Hi @menny, thanks for looking into this. :-) Shall I provide an updated version of the fr.xml file? |
It's been a while since I checked back here. Cool stuff! I've been too busy working for a long time now (: Good luck with the enhancement! |
It would be amazing if this issue were solved! I don't know if this is the right place, but speaking of what the French keyboard lacks, guillemets (the French quotation marks: « and ») are also missing. |
@homlett The azerty keyboard has them with a long press on the "/' key, but having the French guillemets by default would probably be best (i.e. a «/'/» key). And maybe I'm a dreamer, but ideally it would automatically insert a non-breaking space after "«" and before "»" 😉 |
Ok I figured out what the difference is: we have the exact same software versions, but I think that you are using the regular "common bottom generic row" whereas I am using the "new generation - testing" (go to User interface > Even more... > Common bottom generic row). And to clarify my previous comment: I was not talking about the / key, but rather using the character as a separator :) |
You're right! Thanks a lot, it's much better now! Btw, the guillemets are also available with the qwerty keyboard.
Have a good day!
|
I’m working on an updated French keyboards + dictionary and I’m running into this issue.
|
See AnySoftKeyboard/LanguagePack#12 |
I have designed French/Belgian and Canadian keyboards (with accents versions), though I'm still waiting for a fix in another issue for long-pressed characters. Edit: To answer your question, no, neither the AOSP dictionary nor the Dicollecte dictionary correctly supports the apostrophe case. If all apostrophe cases were to be included in a dictionary, the dictionary would be insanely huge, so the apostrophe case needs to be fixed first. Meanwhile, I have stopped working on the pack, since I'm using the AOSP keyboard, which handles it correctly; ASK is unusable without this fix. |
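To make the point about dictionary size concrete: instead of listing every "l'…", "d'…", "qu'…" combination, a tokenizer could peel off the elided word and look up the remainder. The following is a minimal Python sketch of that idea under stated assumptions; `ELIDED_PREFIXES`, `split_elision` and `is_known` are hypothetical names, the prefix list is illustrative and incomplete, and none of it is taken from ASK or the AOSP keyboard.

```python
# Hypothetical, incomplete list of words that elide before a vowel.
ELIDED_PREFIXES = (
    "jusqu", "puisqu", "lorsqu", "qu",
    "d", "l", "j", "m", "t", "s", "n", "c",
)

# Straight and typographic apostrophes are both accepted.
APOSTROPHES = ("'", "\u2019")


def split_elision(token):
    """Split "l'avertir" into ("l'", "avertir").

    Returns (None, token) when the token has no recognised elided prefix,
    so words such as "aujourd'hui" are left whole.
    """
    for apostrophe in APOSTROPHES:
        head, sep, tail = token.partition(apostrophe)
        if sep and tail and head.lower() in ELIDED_PREFIXES:
            return head + sep, tail
    return None, token


def is_known(token, dictionary):
    """Look up only the part after a recognised elided prefix."""
    _prefix, rest = split_elision(token)
    return rest.lower() in dictionary
```

With this approach the dictionary only needs "acheter" for "d'acheter" to be accepted, while "aujourd'hui" still has to be listed as a single entry.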
Sorry, I wanted to update the fr.xml file yesterday night but got busy.
I guess the frequency can be converted from Dicollecte's 9-1 range, to ASK
255-1 using a scaling formula (not a math guy myself, but I'll figure
something out). Like multiplying by 28.333 (= 255/9) for frequencies > 1,
or something. ^^
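For what it's worth, a plain linear rescale would look something like the Python sketch below. It assumes the source value really is a linear 1 to 9 score and the target is 1 to 255, which is only the guess made above and may not match how Dicollecte's index is actually defined; `rescale_frequency` is a made-up helper name.

```python
def rescale_frequency(value, src_min=1, src_max=9, dst_min=1, dst_max=255):
    """Linearly map value from [src_min, src_max] onto [dst_min, dst_max].

    Assumption: both scales are linear. Unlike multiplying by 255/9, this
    keeps the lowest source value on the lowest target value.
    """
    if not src_min <= value <= src_max:
        raise ValueError("value outside the source range")
    ratio = (value - src_min) / (src_max - src_min)
    return round(dst_min + ratio * (dst_max - dst_min))
```

Here 1 stays 1, 9 becomes 255, and the midpoint 5 lands on 128.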
2017-02-17 11:37 GMT+01:00 Julien Papasian <notifications@github.com>:
… I have designed French/Belgian and Canadian keyboards (with accents
versions), though I'm still waiting for a fix in another issue for long
pressed characters.
I have a working updated Dicollecte dictionary. However, I tried with the
AOSP dictionary and it is buggy: words with accented letters are not
shown, we should not use it, but Dicollecte dictionary is better anyway so
I don't really care for the moment. I also tried to merge AOSP english+AOSP
french dictionaries for people using both languages like me, but since AOSP
french doesn't work well, we can't use it. I can't merge AOSP english with
Dicollecte french because frequency is not on the same scale.
I will release sources when I'm home but it is still very experimental.
|
That's not how the frequency works in Dicollecte. |
Let's concentrate the discussion about a new French Dictionary at AnySoftKeyboard/LanguagePack#12 |
This issue is stale because it has been open 400 days with no activity. Remove stale label or this will be closed in 8 days |
Hey there, did anybody make any progress on this issue? This is still a very annoying issue when writing French (together with #1332 and "ca" being corrected to "cA" instead of "ça": #540 ). To give you a better idea of the scope of the issue, here is a screenshot of (a small part of) my user dictionary after using ASK for about a month on a new phone: This is especially annoying when swiping: as one of the apostrophed words is usually short, I tend to swipe both in one go, but I am unsure if the apostrophe is recognized as a character, even when the swiped word exists in the user dictionary. As a side note, maybe this could be somewhat sidestepped by making use of the "typographic apostrophe |
Yep, also having issue with this... could somebody tell us how we could contribute to improving this? Would sharing user dict be helpful or is this a matter of adding some kind of grammar rule in a parser? |
I experience the same kind of issues, it is very annoying that the apostrophes are not correctly predicted. I would like that when I type a word by omitting the apostrophe, it recognises that there should be one, for instance jai → j'ai, aujourdhui → aujourd'hui, and so on. For now it does not. Is a fix possible ? |
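One way the "jai → j'ai" behaviour requested above might be approached is to index dictionary words by their apostrophe-less spelling. The Python sketch below only illustrates that lookup; `build_apostrophe_index` and `suggest_with_apostrophe` are hypothetical names, and this is not how ASK currently works.

```python
def build_apostrophe_index(words):
    """Map each apostrophe-less spelling to the dictionary words it may stand for."""
    index = {}
    for word in words:
        if "'" in word:
            key = word.replace("'", "").lower()
            index.setdefault(key, []).append(word)
    return index


def suggest_with_apostrophe(typed, index):
    """Return the apostrophe forms matching what was typed without one."""
    return index.get(typed.lower(), [])
```

For example, with an index built from "j'ai", "aujourd'hui" and "jamais", typing "jai" would suggest "j'ai" and typing "aujourdhui" would suggest "aujourd'hui", while "jamais" produces no extra suggestion.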
Hello @PrSunflower. No, it hasn't, unfortunately. You have probably typed "l'avertir" often enough that it is in your user dictionary, so that ASK suggests it. |
Hi @MayeulC , Oh too bad. Thanks for the suggestion, but I have checked my AZERTY dictionary and |
https://code.google.com/p/softkeyboard/issues/detail?id=573
Comment #7 explains everything: the apostrophe and hyphen should be considered normal word separators.