Add support for Chinese - BOUNTY OF $100 #515

Closed
JuliusSweetland opened this issue Oct 25, 2018 · 87 comments

Comments

@JuliusSweetland
Member

Add new keyboard(s) with the constituent symbols for Chinese, plus the logic to combine symbols (as required), and solve any other Chinese-specific challenges (e.g. new logic to handle how symbols combine or break apart when backspacing).

I am unfamiliar with the writing styles, so someone who knows how Chinese is more commonly captured and spoken (via text to speech) would be required.

@JuliusSweetland changed the title from "Add support for Chinese" to "Add support for Chinese - BOUNTY OF $100" on Oct 25, 2018
@gitcoinbot


This issue now has a funding of 100.0 DAI (100.0 USD @ $1.0/DAI) attached to it.

@seichris

seichris commented Nov 20, 2018

@JuliusSweetland great project you are building!

I think I can help a bit with the logic here. I have been studying Chinese for some years and have been to China a couple of times. I am not sure whether I can help with the implementation, though.

Most Chinese speakers type in Latin letters, spelling out what they hear. This romanized form of Chinese is called Pinyin, so they simply use an English keyboard.

Using the google pinyin input keyboard, it looks like this:

[Image: screenshot of the Google Pinyin input keyboard]

As the user types, the keyboard recommends words. This is essential, because one romanized word, e.g. shuo, can represent lots of different words.

So the user has to choose the characters they want from a list.

Keyboards usually recommend characters based on context and on patterns the user often uses. I guess that's the same with western keyboards, too.

We could take this Pinyin-to-character mapping from open libraries. I think MDBG and the Pleco app use an open library.

Does this make sense?

Summarizing:

  • No need for a special keyboard
  • Need for a library to recommend Chinese characters. Can do more research here.
  • Do you want to enable speech input, too? Can research approaches on this, too.
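
To make the lookup idea concrete, here is a minimal C# sketch of a Pinyin-to-candidates table. The class name `PinyinCandidates`, the method `Lookup`, and the two hand-written entries are hypothetical; a real table would be generated from an open dictionary and ordered by frequency and context, which this sketch does not attempt.

```csharp
using System;
using System.Collections.Generic;

public static class PinyinCandidates
{
    // Hypothetical sample data: each toneless pinyin syllable maps to a few
    // characters that share that pronunciation.
    private static readonly Dictionary<string, string[]> Table =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            ["shuo"] = new[] { "说", "硕", "烁", "朔" },
            ["ni"] = new[] { "你", "尼", "泥", "拟" },
        };

    // Returns the candidate characters for one pinyin syllable,
    // or an empty list when the syllable is not in the table.
    public static IReadOnlyList<string> Lookup(string pinyin)
    {
        return Table.TryGetValue(pinyin.Trim(), out var hanzi)
            ? hanzi
            : Array.Empty<string>();
    }
}
```

The user would then pick one entry from the returned list, as described above.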

@seichris

seichris commented Nov 20, 2018

> Need for a library to recommend Chinese characters. Can do more research here.

The library mdbg.net uses is https://cc-cedict.org/wiki/. There is a PHP parser for this dictionary: https://github.com/mdsills/cccedict

This should be the most comprehensive English-Chinese library. There are also sister projects for German, French, and Hungarian to Chinese.

@JuliusSweetland
Member Author

JuliusSweetland commented Nov 20, 2018

Hi @seichris - glad to have some help on Chinese! So am I correct in understanding that the latin alphabet approximation of the Chinese symbols is called "Pinyin"? A conversion from Pinyin to Chinese ideograms/logograms would then be performed and a number of Chinese logograms would then be presented to the user. When they select one it would replace the current word. Sound correct?

If so this article has some useful discussion using C# libraries (and NuGet packages): https://stackoverflow.com/questions/9535408/how-to-convert-a-pinyin-string-to-chinese-in-c-sharp

I would like to avoid using PHP, or other languages as OptiKey is written exclusively in C#.
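
The replacement step described above (the selected logogram replaces the current word) could look roughly like this in C#. This is only a sketch: `CandidateSelection` and `ReplaceTrailingPinyin` are hypothetical names, and it assumes the Pinyin being composed always sits at the very end of the text.

```csharp
using System;

public static class CandidateSelection
{
    // Swaps the pinyin the user just typed (assumed to be at the end of the
    // text) for the hanzi they selected. Returns the text unchanged when it
    // does not end with that pinyin.
    public static string ReplaceTrailingPinyin(string text, string pinyin, string hanzi)
    {
        if (pinyin.Length == 0 || !text.EndsWith(pinyin, StringComparison.Ordinal))
        {
            return text;
        }

        return text.Substring(0, text.Length - pinyin.Length) + hanzi;
    }
}
```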

@seichris

seichris commented Nov 20, 2018

Correct!

> I would like to avoid using PHP, or other languages as OptiKey is written exclusively in C#.

In that case, using the Microsoft library sounds the most straightforward. Or would that affect the OptiKey GPL license?

Anyway, I guess from here on, I cannot help much. Hope I could clarify the basics.

@JuliusSweetland
Member Author

@seichris License should be ok, but I'll look into that.

From here I would mainly need support making sure what I write actually behaves as it should. Can I follow up with questions as I make progress?

@seichris

Sure. Glad to help. Just ping me in here.

@JuliusSweetland
Member Author

@seichris Thanks Chris.

@gitcoinbot

@seichris Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

  • reminder (3 days)
  • escalation to mods (6 days)


@gitcoinbot


@seichris due to inactivity, we have escalated this issue to Gitcoin's moderation team. Let us know if you believe this has been done in error!

  • reminder (3 days)
  • escalation to mods (6 days)


@rmshea

rmshea commented Dec 10, 2018

Hey @seichris, what's the latest here? Ryan from Gitcoin here.

@seichris

Hi @ryan-shea. I gave some input on Chinese keyboard logic, but @JuliusSweetland will implement it himself.

Do you want to keep the bounty running, Julius?

@JuliusSweetland
Member Author

JuliusSweetland commented Dec 10, 2018 via email

@rmshea

rmshea commented Dec 16, 2018

@JuliusSweetland, reopened the bounty to the community

@enginnerFrankLiu

@JuliusSweetland Have you found a solution for this issue yet?

@JuliusSweetland
Member Author

JuliusSweetland commented Dec 26, 2018 via email

@pacamara

pacamara commented Jan 3, 2019

@JuliusSweetland Hi. my application to work on this project got approved today, but then the approval was almost immediately revoked! Was that intentional?

@JuliusSweetland
Member Author

JuliusSweetland commented Jan 4, 2019 via email

@enginnerFrankLiu

enginnerFrankLiu commented Jan 4, 2019

@JuliusSweetland

First, I really appreciate that you created this project and plan to add a Chinese language package for people with ALS. It will really help Chinese ALS patients express themselves, so please keep coding and improving it. I am a C# developer (mainly web) from China, and I will support you in the following ways:

1. I will keep searching for a library, tool, or API that converts Pinyin to characters. If I find one, I will let you know.

2. If I cannot find a library, I will try to write a small but helpful one that maps Pinyin to characters, covering about 25,000 frequently used Chinese words, which is enough for daily conversation.

3. I will help you test.

Here are some suggestions for this project.
There is no need to expect people with ALS to communicate like everyone else; expressing basic emotions and ideas is enough. You could prepare some simple sentences for them (so they do not need to type every word by eye), which I call "shortcut expressions", or use pictures. For example:
1. "I need water"
2. "I need food"
3. "I need sleep"
4. "I need to go to the restroom"
5. "I need to go outside"
6. "I love you"

By the way, I am not a freelancer; I have to code for work and for a living, so I estimate this will take one or two months.

@JuliusSweetland
Member Author

@pacamara could you please try and apply again? I didn't get a notification last time. Thanks. https://gitcoin.co/issue/OptiKey/OptiKey/515/1702

@JuliusSweetland
Member Author

@enginnerFrankLiu hi, that all sounds great. Please can you apply for the bounty and I will approve when I am notified. If you don't hear anything please let me know, as I don't seem to be receiving all notifications.

@enginnerFrankLiu

@JuliusSweetland Thanks, there is no need to pay me; I will work free of charge. Some of my friends have this terrible disease (damn it), and I hope I can do something to help OptiKey, since OptiKey helps my friends.

@JuliusSweetland
Member Author

@enginnerFrankLiu very kind of you. Do you know what you need to do?

@JuliusSweetland
Member Author

JuliusSweetland commented Jan 30, 2019 via email

@pacamara

@JuliusSweetland @enginnerFrankLiu @gitcoinbot Here it is: https://github.com/pacamara/ImmGetCandidateListDemo

Note one significant workaround! I couldn't get ImmSetCompositionString to work, so I am using a hacky SendKeys call to set the composing string. I have included my attempt at using ImmSetCompositionString; perhaps with more eyes we can figure it out. Anyway, the program works: you can get the candidate list from pinyin programmatically. Let me know if anything's not clear in the README, or if you have more general questions about invoking it, general conceptual questions, etc. 🍻

@JuliusSweetland
Member Author

@pacamara Thanks! I've taken a look at the main class here: https://github.com/pacamara/ImmGetCandidateListDemo/blob/master/ImmGetCandidateListDemo/ImmForm.cs

Is there any chance you could describe the steps that are occurring, i.e. I start with the string "nihao", then exactly what is called and why? It's difficult to follow the Win32 calls and special handling for different versions of Windows.

Apologies - I know this sounds like I'm being too lazy to look up each call, but I want to understand an overview of why you are doing things and what you are trying to achieve at each step.

Thanks again!

@pacamara

pacamara commented Jan 31, 2019

@JuliusSweetland You're more than welcome! No problem, flow is:

  • Swap the text edit's WndProc for a custom one, to receive IME events.
  • On initialization, the text edit gets a WM_IME_SETCONTEXT message: trap this in ImmEnabledTextEditWndProc and clear the ISC_SHOWUICANDIDATEWINDOW flag, otherwise ImmGetCandidateList won't work on Vista and later.
  • Wait for the button press.
  • Use SendKeys to send the pinyin to the text edit.
  • The IME tries to generate hanzi candidates.
  • If it generates some, the IME sends WM_IME_NOTIFY with subtype IMN_CHANGECANDIDATE.
  • This is received in ImmEnabledTextEditWndProc...
  • ...and ImmForm.getCandidateList then calls ImmGetCandidateListW twice:
      • First time: a sizing call to get the size of the buffer needed.
      • Second time: get the CANDIDATELIST struct and the candidate hanzi strings in the buffer.
      • CANDIDATELIST is a variable-sized struct whose last member is an array of offsets to the candidate strings. The strings come immediately after those offsets in the buffer.
  • Add the candidate hanzi strings to the output text box.
  • Done!
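
For readers following along, the two-call pattern and the buffer layout from the list above can be sketched in C# as below. This is not the code from ImmGetCandidateListDemo: the class name `CandidateListReader`, the methods `GetCandidates` and `Parse`, and the P/Invoke declarations are written here for illustration. It assumes the documented CANDIDATELIST layout (six DWORD fields followed by the offset array), little-endian byte order, and UTF-16 strings from the W variant of the call. `GetCandidates` only works on Windows with an IME composing in the given window; `Parse` is plain byte handling.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;

public static class CandidateListReader
{
    [DllImport("imm32.dll")]
    private static extern IntPtr ImmGetContext(IntPtr hWnd);

    [DllImport("imm32.dll")]
    private static extern bool ImmReleaseContext(IntPtr hWnd, IntPtr hImc);

    [DllImport("imm32.dll", CharSet = CharSet.Unicode)]
    private static extern uint ImmGetCandidateListW(
        IntPtr hImc, uint index, byte[] buffer, uint bufferLength);

    // Windows only: asks the IME for the current candidates of the window.
    public static List<string> GetCandidates(IntPtr hWnd)
    {
        IntPtr hImc = ImmGetContext(hWnd);
        if (hImc == IntPtr.Zero) return new List<string>();
        try
        {
            // First call: no buffer, returns the size in bytes that is needed.
            uint size = ImmGetCandidateListW(hImc, 0, null, 0);
            if (size == 0) return new List<string>();

            // Second call: fills the buffer with CANDIDATELIST plus strings.
            var buffer = new byte[size];
            uint copied = ImmGetCandidateListW(hImc, 0, buffer, size);
            return copied == 0 ? new List<string>() : Parse(buffer);
        }
        finally
        {
            ImmReleaseContext(hWnd, hImc);
        }
    }

    // Extracts the candidate strings from a raw CANDIDATELIST buffer.
    public static List<string> Parse(byte[] buffer)
    {
        const int CountOffset = 8;   // dwCount follows dwSize and dwStyle
        const int OffsetsStart = 24; // dwOffset[] follows six DWORD fields
        var result = new List<string>();
        uint count = BitConverter.ToUInt32(buffer, CountOffset);
        for (int i = 0; i < count; i++)
        {
            // Each offset is measured from the start of the struct.
            int start = (int)BitConverter.ToUInt32(buffer, OffsetsStart + 4 * i);
            int end = start;
            // Walk two bytes at a time until the UTF-16 null terminator.
            while (end + 1 < buffer.Length && (buffer[end] != 0 || buffer[end + 1] != 0))
            {
                end += 2;
            }
            result.Add(Encoding.Unicode.GetString(buffer, start, end - start));
        }
        return result;
    }
}
```

Keeping `Parse` separate from the Win32 calls means the buffer handling can be exercised with a hand-built byte array, without an IME installed.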

> special handling for different versions of Windows.

To clarify, there's no handling for different versions of Windows. It's only known for certain to work on Windows 10. I guess it should be fine on Vista and above. For versions below Vista, a version check needs to be put around the code that clears the ISC_SHOWUICANDIDATEWINDOW flag. Possibly there are other version-specific issues too.
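
A version-guarded form of the flag clearing might look like the C# sketch below. The class `ImeContextFlags` and the method `AdjustSetContextLParam` are hypothetical names, and treating a major version of 6 or higher as "Vista and later" is an assumption of this sketch; the constant values are the ones defined in the Windows SDK headers.

```csharp
using System;

public static class ImeContextFlags
{
    public const int WM_IME_SETCONTEXT = 0x0281;
    public const long ISC_SHOWUICANDIDATEWINDOW = 0x00000001;

    // Returns the lParam to pass on when handling WM_IME_SETCONTEXT.
    // On Vista and later (major version 6+) the candidate-window bit is
    // cleared so the IME does not draw its own popup; on older versions
    // the value is left untouched.
    public static long AdjustSetContextLParam(long lParam, Version osVersion)
    {
        bool vistaOrLater = osVersion.Major >= 6;
        return vistaOrLater ? lParam & ~ISC_SHOWUICANDIDATEWINDOW : lParam;
    }
}
```

In a WndProc override this would be applied to the message's lParam when the message is WM_IME_SETCONTEXT, before handing the message to the default procedure, with `Environment.OSVersion.Version` as the version argument.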

@JuliusSweetland
Member Author

@pacamara Thanks. I'll have to play with this code and see if I can get it working in my WPF context.

These two links look interesting:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/f49b2738-d0dc-433e-8139-df5e331bca50/control-ime-for-wpf-textbox?forum=wpf
https://www.codeproject.com/Questions/69500/Using-IME-mode-in-WPF

@pacamara

pacamara commented Feb 1, 2019

@JuliusSweetland Wasn't aware of that extra complication. The codeproject.com solution seems a non-starter since you won't be able to access the control's window handle? So hosting the Forms control per the 2nd answer on msdn link is better? I also see https://docs.microsoft.com/en-us/dotnet/framework/wpf/advanced/walkthrough-hosting-a-windows-forms-control-in-wpf

@JuliusSweetland
Member Author

@pacamara Solution 3 in the codeproject link you mean? Not sure yet, possibly!

I'll have to do some experimenting. I'll try to find time as soon as possible.

@JuliusSweetland
Member Author

Is this line (https://github.com/pacamara/ImmGetCandidateListDemo/blob/4616c90acd654d28b92972bf30008c2296da5129/ImmGetCandidateListDemo/ImmForm.cs#L49) preventing the IME popup window from showing?

@JuliusSweetland
Member Author

@pacamara And this line (https://github.com/pacamara/ImmGetCandidateListDemo/blob/4616c90acd654d28b92972bf30008c2296da5129/ImmGetCandidateListDemo/ImmForm.cs#L108): is this workaround just attempting to simulate typing the pinyin string into the textbox, which then triggers Windows to generate candidates that can be detected in the callback function (and would be shown in the IME popup, if it were not suppressed by https://github.com/pacamara/ImmGetCandidateListDemo/blob/4616c90acd654d28b92972bf30008c2296da5129/ImmGetCandidateListDemo/ImmForm.cs#L49)?

Is that correct?

@JuliusSweetland
Member Author

This WPF specific approach sounds interesting: https://stackoverflow.com/questions/9535408/how-to-convert-a-pinyin-string-to-chinese-in-c-sharp

@JuliusSweetland
Member Author

@pacamara Last question from me for tonight; I've looked through these NuGet packages (https://www.nuget.org/packages?q=pinyin) but they all seem to be converting Chinese symbols to pinyin, not the other way around. Or am I wrong?

Why would you even want pinyin from Chinese symbols?

@JuliusSweetland
Member Author

Traditional Chinese <-> Simplified Chinese conversion library (might be useful): https://www.nuget.org/packages/ChineseConverter/

@pacamara

pacamara commented Feb 1, 2019

@JuliusSweetland:

> Solution 3 in the codeproject link you mean?

Yes.

> Is this line (https://github.com/pacamara/ImmGetCandidateListDemo/blob/4616c90acd654d28b92972bf30008c2296da5129/ImmGetCandidateListDemo/ImmForm.cs#L49) preventing the IME popup window from showing?

Correct. And on Vista and later that's required to make ImmGetCandidateList work.

> And this line (https://github.com/pacamara/ImmGetCandidateListDemo/blob/4616c90acd654d28b92972bf30008c2296da5129/ImmGetCandidateListDemo/ImmForm.cs#L108) is this workaround just attempting to simulate typing the pinyin string into the textbox

Also correct. As noted, it would be better to use the ImmSetCompositionString API. But SendKeys works.

> This WPF specific approach sounds interesting: https://stackoverflow.com/questions/9535408/how-to-convert-a-pinyin-string-to-chinese-in-c-sharp

I tested and rejected ChineseChar.GetChars(string pinyin) in CHSPinyinConv.msi early on, because it only knows how to convert individual syllables to individual hanzi. Will attach the code. Same discussion @enginnerFrankLiu and I had earlier: without a frequency table that knows about compound words and short phrases, a Chinese IME is very tedious to use. For example, try changing PINYIN in ImmGetCandidateListDemo to something much longer, e.g. wozhunishengdankuaile (I wish you a merry Xmas). The IME figures out the correct hanzi for the whole phrase, by looking at which decomposition into snippets has the greatest frequency product, and puts that top of the candidates list. That would not be possible with CHSPinyinConv.
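
To illustrate the "greatest frequency product" idea, here is a toy C# sketch. It is not how any particular IME is implemented: the class `PinyinSegmenter`, the method `BestHanzi`, and the tiny table with made-up frequencies are all hypothetical. It uses dynamic programming over prefixes of the pinyin string and sums log frequencies, which is equivalent to maximizing the product.

```csharp
using System;
using System.Collections.Generic;

public static class PinyinSegmenter
{
    // Hypothetical toy table: pinyin snippet -> (hanzi, made-up frequency).
    private static readonly Dictionary<string, (string Hanzi, double Freq)> Table =
        new Dictionary<string, (string Hanzi, double Freq)>
        {
            ["wo"] = ("我", 0.05),
            ["zhu"] = ("祝", 0.01),
            ["ni"] = ("你", 0.05),
            ["sheng"] = ("生", 0.01),
            ["dan"] = ("但", 0.01),
            ["shengdan"] = ("圣诞", 0.002),
            ["kuai"] = ("快", 0.01),
            ["le"] = ("了", 0.05),
            ["kuaile"] = ("快乐", 0.004),
        };

    // Returns the hanzi for the decomposition with the greatest frequency
    // product, or null when the input cannot be split into known snippets.
    public static string BestHanzi(string pinyin)
    {
        int n = pinyin.Length;
        var best = new double[n + 1]; // best log-score for each prefix length
        var prev = new int[n + 1];    // start index of the last snippet used
        for (int i = 1; i <= n; i++)
        {
            best[i] = double.NegativeInfinity;
            prev[i] = -1;
        }

        for (int end = 1; end <= n; end++)
        {
            for (int start = 0; start < end; start++)
            {
                if (double.IsNegativeInfinity(best[start])) continue;
                string snippet = pinyin.Substring(start, end - start);
                if (!Table.TryGetValue(snippet, out var entry)) continue;

                double score = best[start] + Math.Log(entry.Freq);
                if (score > best[end])
                {
                    best[end] = score;
                    prev[end] = start;
                }
            }
        }

        if (double.IsNegativeInfinity(best[n])) return null;

        // Walk the back-pointers from the end to rebuild the hanzi.
        var parts = new List<string>();
        for (int end = n; end > 0; end = prev[end])
        {
            parts.Add(Table[pinyin.Substring(prev[end], end - prev[end])].Hanzi);
        }
        parts.Reverse();
        return string.Concat(parts);
    }
}
```

With this table, the compound entries for shengdan and kuaile outscore the products of their single-syllable pieces, so the phrase-level reading wins. A syllable-only converter has no such compound entries to prefer.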

> Why would you even want pinyin from Chinese symbols?

It's definitely a less common use case! But there are a bunch of reasons: preparing teaching material for beginners, or even building a frequency table for an IME (a bit meta that one!)

> Traditional Chinese <-> Simplified Chinese conversion library (might be useful): https://www.nuget.org/packages/ChineseConverter/

I don't think a separate library is needed. Chinese users typically have the language support for the variant they need installed, and the InputMethodManager code will use that. In case they need to work in both, there is an InputMethodManager api for switching between the two, which you could invoke.

@pacamara

pacamara commented Feb 1, 2019

Here's the test code I wrote a few weeks back for CHSPinyinConv.msi. It's in Visual Studio International Pack 1.0.

using System.Windows.Forms;
using Microsoft.International.Converters.PinYinConverter;

namespace TestCHSPinYinConv2
{
    static class Program
    {
        static void Main()
        {
            // A single pinyin syllable; the trailing digit is the tone number.
            string pinyin = "wo3";
            // GetChars returns the individual hanzi for this one syllable only.
            string hanziList = string.Join(", ", ChineseChar.GetChars(pinyin));
            MessageBox.Show(hanziList);
        }
    }
}

@JuliusSweetland
Member Author

@pacamara Thanks for your detailed answer. I'm still amazed that there is no pinyin -> symbols library. I guess, as you mention, that would require a large frequency table, which is not commonly available. Who knows?!

I'll do some experimentation around your code soon. Thanks again.

@pacamara

pacamara commented Feb 1, 2019

@JuliusSweetland It's a pleasure :)

@gitcoinbot

@pacamara Hello from Gitcoin Core - are you still working on this issue? Please submit a WIP PR or comment back within the next 3 days or you will be removed from this ticket and it will be returned to an ‘Open’ status. Please let us know if you have questions!

  • reminder (3 days)
  • escalation to mods (6 days)


@gitcoinbot


@pacamara due to inactivity, we have escalated this issue to Gitcoin's moderation team. Let us know if you believe this has been done in error!

  • reminder (3 days)
  • escalation to mods (6 days)


@pacamara

pacamara commented Feb 9, 2019

@gitcoinbot I'm done working on this. I have supplied @JuliusSweetland, gratis, with a demo C# implementation of programmatically converting pinyin->hanzi. Not looking to be paid.

@Trung0246

Trung0246 commented Apr 15, 2019

I wonder if this can be used?

https://github.com/LingDong-/rrpl

@pacamara

@Trung0246 The problem with a radical-based input method is there are 214 radicals. So you'd need 214 keys in your onscreen keyboard. Not practical. That looks like a cool library though.

@enginnerFrankLiu

Hi @JuliusSweetland,
does OptiKey currently support Chinese Pinyin input? I am in an ALS group and can recommend OptiKey to them.

@JuliusSweetland
Member Author

JuliusSweetland commented Oct 14, 2019 via email

@enginnerFrankLiu

Hi @JuliusSweetland, I wish I could.
https://github.com/sogou belongs to Sogou, a giant Chinese Pinyin input company.
You could look for some help there, for example from:
@litao-buptsse
@qzhangsogou
Maybe they could offer some help.

If OptiKey supports Chinese, I will donate and recommend it to other patients.

@JuliusSweetland
Member Author

@enginnerFrankLiu Thank you. I will definitely need help to try to solve this problem. One solution may just be for OptiKey to present a slightly adapted QWERTY keyboard layout and expect the user to type from OptiKey via an external IME (e.g. Sogou, Google, or Microsoft's pinyin IME) and have that output the Chinese characters?

https://en.wikipedia.org/wiki/Sogou_Pinyin
https://en.wikipedia.org/wiki/Google_Pinyin
https://en.wikipedia.org/wiki/Microsoft_Pinyin_IME

@enginnerFrankLiu

Do you mean helping you unit test the Chinese input function?

@JuliusSweetland
Member Author

JuliusSweetland commented Oct 16, 2019 via email
