correct bstr conversion #6

Closed
wants to merge 1 commit into from


windtail

@windtail windtail commented Sep 6, 2012

Hi davidm,

When I use luacom, I find that I cannot use it with Chinese characters: neither filenames nor string parameters containing Chinese characters work.

I tried saving the Lua source file as UTF-8, but that did not solve the problem for filenames with Chinese characters, because I cannot change the system's encoding.

Later I found that it works under Cygwin, and then I realized that the newest Cygwin internally converts filenames to UTF-8.

I downloaded the source code today, and I think that in bstr2string()/string2bstr(), CP_UTF8 should be changed to CP_ACP when not building for Cygwin (see the commit changes). The changes work for me, but I do not know whether they will cause other errors.

luojiejun

- in a non-English environment (I am Chinese) we do NOT use CP_UTF8; CP_ACP should be used
- while Cygwin by default converts filenames internally to UTF-8
- we have to save our Lua source code in "ASCII" format, meaning the system ANSI code page (GBK or other); if you prefer UTF-8, you need luaiconv to convert UTF-8 to that code page
@ignacio
Collaborator

ignacio commented Sep 6, 2012

Can you provide an example of non-working code? It seems this change will fix LuaCom for you but will break it for every other user running it with Cygwin.

@windtail
Author

windtail commented Sep 6, 2012

I did add an "#ifdef _CYGWIN " to keep the existing behavior under Cygwin. Here is an example that does not work:

require "luacom"

wordApp = luacom.CreateObject("Word.Application")
wordApp.Visible = true

wordDoc = wordApp.Documents:Add()
wordApp.Selection:TypeText("中文真的可以吗,我也不知道啊!")
wordDoc:SaveAs2("F:\\中文的文件名哦还挺长的.docx")

wordDoc:Close(0)
wordApp:Quit(0)

Save the above code in ASCII format (that is, GBK encoding for me). It will produce a file "F:\ĵļŶͦ.docx" whose content is "ĿҲ֪". The code was tested on Windows XP (Simplified Chinese Edition).
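
The garbled filename is consistent with GBK bytes being decoded as UTF-8. A minimal Python sketch of that mismatch follows. Python is used only for illustration and is not part of luacom; the sketch assumes that bytes which are not valid UTF-8 are silently dropped, which is a guess about the behavior on Windows XP.

```python
# Filename from the example script, as stored in a GBK (code page 936) source file.
name = "中文的文件名哦还挺长的"
gbk_bytes = name.encode("gbk")

# What a UTF-8 based conversion sees: most of these bytes are not valid UTF-8.
# errors="ignore" drops them (an assumption, see the note above).
as_utf8 = gbk_bytes.decode("utf-8", errors="ignore")

# What a conversion based on the ANSI code page sees on a Simplified Chinese system.
as_acp = gbk_bytes.decode("cp936")

print(repr(as_utf8))   # the few characters that survive the wrong decoding
print(as_acp == name)  # the original filename is recovered
```

Under that assumption only four characters survive the UTF-8 decoding, and they are the same four that appear in the reported filename.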

@ignacio
Collaborator

ignacio commented Sep 7, 2012

Well, the thing is LuaCom expects its input strings to be encoded as utf-8. So you need to change the encoding of your script and not change the codepage used by LuaCom in its conversion routines.
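
One way to change a script's encoding, sketched in Python under the assumption that the script is currently saved in GBK (code page 936). The function name and the default source encoding are hypothetical and not part of LuaCom:

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str = "cp936") -> None:
    """Rewrite a text file saved in `source_encoding` as UTF-8 without a BOM."""
    # Work on raw bytes so that line endings are preserved exactly.
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
```

Calling `reencode_to_utf8(Path("script.lua"), Path("script_utf8.lua"))` would produce a copy whose string literals are UTF-8; the file names here are placeholders.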

With LuaCom as it is, I can work with strings in different languages (Spanish and Portuguese, with accented characters and so on) regardless of what language I have configured in Windows. If LuaCom used CP_ACP, my Spanish scripts would only work if I ran them while having Spanish as the current language.
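
The difference can be illustrated with a small Python sketch (not LuaCom code): bytes saved as UTF-8 decode the same way on any system, while bytes saved in an ANSI code page only decode correctly under that same code page. Code pages 1252 and 936 are picked here as example system settings.

```python
text = "año"

# Script saved as UTF-8: decoding does not depend on the Windows language.
utf8_bytes = text.encode("utf-8")
utf8_ok = utf8_bytes.decode("utf-8") == text

# Script saved in the Western European ANSI code page (1252).
ansi_bytes = text.encode("cp1252")
same_codepage_ok = ansi_bytes.decode("cp1252") == text

# The same ANSI bytes on a system whose ANSI code page is 936 (Simplified Chinese).
other_codepage_ok = ansi_bytes.decode("cp936", errors="replace") == text

print(utf8_ok, same_codepage_ok, other_codepage_ok)
```

Only the last comparison fails, which is the locale dependence described above.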

@windtail
Author

windtail commented Sep 7, 2012

I do not quite understand Windows code pages, but let me explain my situation. I am in China, and most of us use Windows XP Simplified Chinese edition, where the code page is 936 and our filenames are encoded in GB2312. How should we deal with these files?

If the filename encoding is changed to UTF-8, it turns into a mess: such filenames can no longer be managed by Windows Explorer.

As I mentioned, if I save the Lua source file as UTF-8, luacom can understand the filenames and strings and successfully converts them to wide characters, but other programs cannot. For example, MS Word complains "file not exists", because a file with that UTF-8 encoded name does NOT exist (the names are encoded in GB2312).

What's worse in my situation, the content of the MS Word files is encoded in GB2312. I have to deal with these files using luacom, and I find that I need to make luacom use CP_ACP.

Any suggestions for using the existing version of luacom? And I wonder what your situation is on a Spanish version of Windows?

@ignacio
Collaborator

ignacio commented Sep 7, 2012

Ah, I see. I think your main problem has to do with filenames. I heard that's a tricky thing on Windows.
I had to do something like you did (use ACP instead of utf8) but to cope with a COM component that was improperly converting from utf-16 strings.

What I ended up doing was adding a couple of functions to allow changing the codepage on the fly.
I never sent those changes upstream because I wasn't happy with that hack.
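
Those functions were never published, so their actual shape is unknown. Purely as a hypothetical illustration of the idea of switching the code page at run time, here is a Python sketch (not the C++ of LuaCom); every name in it is invented:

```python
import codecs


class StringConverter:
    """Toy model of a converter whose code page can be changed at run time."""

    def __init__(self, codepage: str = "utf-8") -> None:
        self.codepage = codepage

    def set_codepage(self, codepage: str) -> str:
        """Switch the code page and return the previous one so it can be restored."""
        codecs.lookup(codepage)  # raises LookupError for an unknown code page name
        previous, self.codepage = self.codepage, codepage
        return previous

    def to_wide(self, data: bytes) -> str:
        """Bytes from the script to a wide string, using the current code page."""
        return data.decode(self.codepage)

    def from_wide(self, text: str) -> bytes:
        """Wide string back to script bytes, using the current code page."""
        return text.encode(self.codepage)
```

A caller would switch to `"cp936"` around the calls that need it and then restore the value returned by `set_codepage`. One drawback of such a switch is that it changes every later conversion until it is restored.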

I see your situation is completely different from mine. I didn't have to deal with filenames and I don't fully understand all the tiny little details of codepages on Windows.

@windtail
Author

windtail commented Sep 7, 2012

Thank you for your reply! Maybe I should close the pull request; it does not seem to be a common problem.

@windtail windtail closed this Sep 7, 2012