Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ansi codepage for internal multibyte strings on windows (fixes #976, fixes #974, fixes #444) #979

Merged
merged 1 commit into from
Dec 30, 2020

Conversation

albertony
Copy link

@albertony albertony commented Dec 9, 2020

Fixes #976
Fixes #974
Fixes #444

Probably conflicts with #972, but if you accept both PRs then just "ping" me and I can quickly fix it (either if I can merge the PR branches together, or you can merge one of them to master first and I rebase the other one, or something else..?)

@albertony albertony force-pushed the cp_acp branch 2 times, most recently from 5302efd to 78d30bc Compare December 9, 2020 08:43
Copy link
Member

@p12tic p12tic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I had just one comment, looks good otherwise.

src/lib/common/win32/KnownFolderPaths.h Outdated Show resolved Hide resolved
@albertony albertony changed the title Use ansi codepage for multibyte strings (fixes #976) Use ansi codepage for internal multibyte strings on windows (fixes #976, fixes #974, fixes #444) Dec 10, 2020
@albertony
Copy link
Author

Better than ever? :)

@albertony albertony changed the title Use ansi codepage for internal multibyte strings on windows (fixes #976, fixes #974, fixes #444) WIP: Use ansi codepage for internal multibyte strings on windows (fixes #976, fixes #974, fixes #444) Dec 11, 2020
@albertony
Copy link
Author

albertony commented Dec 11, 2020

Tracked down a couple of more issues.. There were multiple similar problems, in client, server and deamon, so it took several attempts to track them down and fix them. Not entirely sure all are properly handled yet, I hope so, and think so, but let's wait a little bit and see if @cheese, reporter of #974, will agree...

In general I guess the main solution is to be explicit about use of UTF-8 for strings sent over IPC, and to be sure to use current locale encoding (normally "ANSI" on Windows) for std::string's. E.g. QString::toStdString uses UTF-8, which is somewhat annoying (although UTF-8 Everywhere is a thing, its not a short term solution here).

Edit: Wait a minute.. For the QString conversions I can probably use Qt's to/fromLocal8Bit. I considered those before, but wasn't sure they were matching the MultiByte/ANSI/codepage/whatever-you-call-it kind of strings on Windows, but then found this:

The local 8-bit codec can convert from unicode to the character set specified in the locale and vice versa.
This codec is called the "System" codec. This codec can be obtained using QTextCodec::codecForLocale()
or QTextCodec::codecForName("System").

On Windows, the "System" QTextCodec uses MultiByteToWideChar and WideCharToMultiByte (with
CP_ACP - ANSI code page) to convert to and from unicode.

On Unix, coversion is done using iconv. However, Qt can be compiled without iconv support, in which
case Qt inspects the LANG, LC_CTYPE, LC_ALL environment variables to determine the codec. The codec
detected has to be a part of the Qt built-in codec list (Otherwise, it defaults to latin-1).

Will rewrite (again) and post back...

@albertony albertony changed the title WIP: Use ansi codepage for internal multibyte strings on windows (fixes #976, fixes #974, fixes #444) Use ansi codepage for internal multibyte strings on windows (fixes #976, fixes #974, fixes #444) Dec 11, 2020
@albertony albertony changed the title Use ansi codepage for internal multibyte strings on windows (fixes #976, fixes #974, fixes #444) WIP: Use ansi codepage for internal multibyte strings on windows (fixes #976, fixes #974, fixes #444) Dec 11, 2020
@albertony albertony changed the title WIP: Use ansi codepage for internal multibyte strings on windows (fixes #976, fixes #974, fixes #444) Use ansi codepage for internal multibyte strings on windows (fixes #976, fixes #974, fixes #444) Dec 12, 2020
@albertony
Copy link
Author

albertony commented Dec 12, 2020

Done.

  • Consistent use of std::string with standard locale-specific encoding.
  • IPC messages always encode strings with UTF-8, which are converted back and forth at each end.
  • In GUI layer, using Qt's QString::fromLocal8Bit/QString::toLocal8Bit to convert between (unicode) QString and (locale-specific) std::string, and QString::fromUtf8/QString:toUtf8 for reading/writing UTF-8 encoded strings for IPC communication. So encoding is quite explicit. Not using toStdString/fromStdString, since it returns std::string that are UTF-8 encoded and leads to confusion and problems.
  • In kernel: Locale encoded std::string, using the existing Unicode::UTF8ToText/Unicode::textToUTF8 for UTF-8 specific IPC strings (except in this one place, in DataDirectories, since it is in common lib and do not have access to Unicode class in base lib).

PS: Would still be nice to see #974 being tested OK, since there were so many iterations with different issues here..

@p12tic
Copy link
Member

p12tic commented Dec 30, 2020

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants