Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tool: support UTF-16 command line on Windows #3784

Conversation

@MarcelRaad
Copy link
Member

MarcelRaad commented Apr 15, 2019

This makes it possible to pass parameters with characters outside of
the current locale on Windows, which is required for some tests, e.g.
the IDN tests.

Out of the box, this currently only works with the Visual Studio
project files, which default to Unicode. I'll look into building
Unicode versions with the other build tools later.

Ref: #3747

@jay
Copy link
Member

jay commented Apr 16, 2019

Refer to #637 and #345 particularly this post. The problem is basically the Microsoft C runtime does not support UTF-8 locale. Lets say you have an outfile name extracted from the UTF-8 encoded URL or a separate UTF-8 encoded argument then you can't pass that to fopen. We're left with a problem where we have to somehow keep track of the encoding of strings because we can't set the locale to UTF-8.

Anyway I think this is possible to support I just feel like it's going down a rabbit hole.

@MarcelRaad
Copy link
Member Author

MarcelRaad commented Apr 16, 2019

Thanks @jay , I hadn't thought about that! Reading these issues now. I wish there were char8_t in C.

@bagder
Copy link
Member

bagder commented Oct 12, 2019

hi @MarcelRaad, is this something you still intend to finalize?

@MarcelRaad
Copy link
Member Author

MarcelRaad commented Oct 13, 2019

Sorry for the long delay - yes, I plan to resume work on this when I have a couple of days, hopefully soon.

MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 2, 2020
This will also be needed in the tool and tests.

Ref: curl#3758 (comment)
Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 2, 2020
- use wmain instead of main when _UNICODE is defined [0]
- define argv_item_t as wchar_t * in this case
- use the curl_multibyte gear to convert the command-line arguments to
  UTF-8

This makes it possible to pass parameters with characters outside of
the current locale on Windows, which is required for some tests, e.g.
the IDN tests. Out of the box, this currently only works with the
Visual Studio project files, which default to Unicode.

[0] https://devblogs.microsoft.com/oldnewthing/?p=40643

Ref: curl#3747
Closes curl#3784
@MarcelRaad MarcelRaad force-pushed the MarcelRaad:windows_unicode_command_line_arguments branch from fea657a to 3b458c9 Jan 2, 2020
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 2, 2020
Use them only if `_UNICODE` is defined, in which case command-line
arguments have been converted to UTF-8.

Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 2, 2020
It only applies to non-Unicode builds now.
Also merge 5.10 into it as it's effectively a duplicate.

Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 2, 2020
Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 3, 2020
Use them only if `_UNICODE` is defined, in which case command-line
arguments have been converted to UTF-8.

Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 3, 2020
It only applies to non-Unicode builds now.
Also merge 5.10 into it as it's effectively a duplicate.

Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 3, 2020
Closes curl#3784
@MarcelRaad MarcelRaad force-pushed the MarcelRaad:windows_unicode_command_line_arguments branch from 8577e97 to cd29484 Jan 3, 2020
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 3, 2020
This will also be needed in the tool and tests.

Ref: curl#3758 (comment)
Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 3, 2020
- use wmain instead of main when _UNICODE is defined [0]
- define argv_item_t as wchar_t * in this case
- use the curl_multibyte gear to convert the command-line arguments to
  UTF-8

This makes it possible to pass parameters with characters outside of
the current locale on Windows, which is required for some tests, e.g.
the IDN tests. Out of the box, this currently only works with the
Visual Studio project files, which default to Unicode.

[0] https://devblogs.microsoft.com/oldnewthing/?p=40643

Ref: curl#3747
Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 3, 2020
Use them only if `_UNICODE` is defined, in which case command-line
arguments have been converted to UTF-8.

Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 3, 2020
It only applies to non-Unicode builds now.
Also merge 5.10 into it as it's effectively a duplicate.

Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 3, 2020
Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 3, 2020
These are cheap as they don't build tests.

Closes curl#3784
@MarcelRaad MarcelRaad force-pushed the MarcelRaad:windows_unicode_command_line_arguments branch from cd29484 to 26651df Jan 3, 2020
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 3, 2020
This will also be needed in the tool and tests.

Ref: curl#3758 (comment)
Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 3, 2020
- use wmain instead of main when _UNICODE is defined [0]
- define argv_item_t as wchar_t * in this case
- use the curl_multibyte gear to convert the command-line arguments to
  UTF-8

This makes it possible to pass parameters with characters outside of
the current locale on Windows, which is required for some tests, e.g.
the IDN tests. Out of the box, this currently only works with the
Visual Studio project files, which default to Unicode.

[0] https://devblogs.microsoft.com/oldnewthing/?p=40643

Ref: curl#3747
Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 3, 2020
Use them only if `_UNICODE` is defined, in which case command-line
arguments have been converted to UTF-8.

Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Jan 3, 2020
It only applies to non-Unicode builds now.
Also merge 5.10 into it as it's effectively a duplicate.

Closes curl#3784
@MarcelRaad
Copy link
Member Author

MarcelRaad commented Apr 25, 2020

If there are no objections, I'd like to merge this at the beginning of the upcoming feature window. It could still be improved to handle non-Unicode builds later as done by @jay and @captain-caveman2k in their branches. But I don't see any reason for non-Unicode builds on Windows (pre-Windows 95 compatibility and compatibility for Windows 9x without UnicoWS isn't an issue, is it?), and I'm going to look into adding options for supporting Unicode Windows builds to the CMake and autotools builds after this has landed.

@mback2k
Copy link
Member

mback2k commented Apr 25, 2020

I would like to see the CI build failures fixed before a merge, e.g.:
https://curlbuild.uxnr.de/builders/curl_winssl_cross_x64/builds/16906/steps/compile/logs/stdio

vtls/schannel.c: In function 'schannel_connect_step1':
vtls/schannel.c:636:9: error: implicit declaration of function 'Curl_unicodefree' [-Werror=implicit-function-declaration]
         Curl_unicodefree(cert_path);
         ^~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
#define Curl_unicodefree(ptr) \
#define curlx_convert_UTF8_to_tchar(ptr) curlx_convert_UTF8_to_wchar((ptr))
#define curlx_convert_tchar_to_UTF8(ptr) curlx_convert_wchar_to_UTF8((ptr))
#define curlx_unicodefree(ptr) \
do { \
if(ptr) { \
free(ptr); \

This comment has been minimized.

Copy link
@jay

jay Apr 25, 2020

Member

curl_free now that it's a curlx function. libcurl may be using a CRT different from the application. If curlx.h is available publicly for any application I don't know if including this file will work like this, because it is an internal lib file.

This comment has been minimized.

Copy link
@MarcelRaad

MarcelRaad Apr 26, 2020

Author Member

Good point! I think it's the other way around though: curlx functions should never use curl's memory functions as they're not included in curlx. That's what the previous curlx functions dealing with memory did, like curlx_strdup before commit b387560.

Note that the curlx functions are not public, but instead the tool and tests build their own copies of them, so using the curl memory functions would result in linker errors.

Therefore, I changed the macro to always use the non-macro version of free as that's what's used for malloc in curl_multibyte.c.

This comment has been minimized.

Copy link
@jay

jay Apr 26, 2020

Member

Note that the curlx functions are not public, but instead the tool and tests build their own copies of them, so using the curl memory functions would result in linker errors.

Fair enough, I did not know that. I think @bagder could review the curlx handling since I'm not entirely grokking it.

This comment has been minimized.

Copy link
@bagder

bagder Apr 26, 2020

Member

I'm with @MarcelRaad here. curlx functions are used in the tool's domain so they would not use libcurl's CRT.

#if defined(USE_WIN32_LARGE_FILES) || defined(USE_WIN32_SMALL_FILES)

FILE *curlx_win32_fopen(const char *filename, const char *mode)
{

This comment has been minimized.

Copy link
@jay

jay Apr 25, 2020

Member

this is another instance where we are passing something opened from libcurl possibly to an application using a different CRT

@jay
Copy link
Member

jay commented Apr 25, 2020

Aside from the curlx_ issues I think should be carefully examined (I just posted some comments; admittedly I don't understand curlx that well), I don't have an objection if you are going to follow up with maintenance on this. I just don't think it's going to be this easy, but let's find out.

@MarcelRaad
Copy link
Member Author

MarcelRaad commented Apr 25, 2020

@mback2k

I would like to see the CI build failures fixed before a merge

Sure! They're new after today's rebase. Admittedly, I haven't compiled the branch locally between rebasing and pushing.

MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Apr 26, 2020
This will also be needed in the tool and tests.

Ref: curl#3758 (comment)
Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Apr 26, 2020
- use `wmain` instead of `main` when `_UNICODE` is defined [0]
- define `argv_item_t` as `wchar_t *` in this case
- use the curl_multibyte gear to convert the command-line arguments to
  UTF-8

This makes it possible to pass parameters with characters outside of
the current locale on Windows, which is required for some tests, e.g.
the IDN tests. Out of the box, this currently only works with the
Visual Studio project files, which default to Unicode, and winbuild
with the `ENABLE_UNICODE` option.

[0] https://devblogs.microsoft.com/oldnewthing/?p=40643

Ref: curl#3747
Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Apr 26, 2020
Use them only if `_UNICODE` is defined, in which case command-line
arguments have been converted to UTF-8.

Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request Apr 26, 2020
It only applies to non-Unicode builds now.
Also merge 5.10 into it as it's effectively a duplicate.

Closes curl#3784
@MarcelRaad MarcelRaad force-pushed the MarcelRaad:windows_unicode_command_line_arguments branch from d337aeb to 1577c33 Apr 26, 2020
@MarcelRaad MarcelRaad force-pushed the MarcelRaad:windows_unicode_command_line_arguments branch from 1577c33 to cacc3c4 May 9, 2020
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request May 9, 2020
This will also be needed in the tool and tests.

Ref: curl#3758 (comment)
Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request May 9, 2020
- use `wmain` instead of `main` when `_UNICODE` is defined [0]
- define `argv_item_t` as `wchar_t *` in this case
- use the curl_multibyte gear to convert the command-line arguments to
  UTF-8

This makes it possible to pass parameters with characters outside of
the current locale on Windows, which is required for some tests, e.g.
the IDN tests. Out of the box, this currently only works with the
Visual Studio project files, which default to Unicode, and winbuild
with the `ENABLE_UNICODE` option.

[0] https://devblogs.microsoft.com/oldnewthing/?p=40643

Ref: curl#3747
Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request May 9, 2020
Use them only if `_UNICODE` is defined, in which case command-line
arguments have been converted to UTF-8.

Closes curl#3784
MarcelRaad added a commit to MarcelRaad/curl that referenced this pull request May 9, 2020
It only applies to non-Unicode builds now.
Also merge 5.10 into it as it's effectively a duplicate.

Closes curl#3784
@MarcelRaad
Copy link
Member Author

MarcelRaad commented May 9, 2020

Any idea how I can see what went wrong with the "CI / macos normal (pull_request)" job? All I see there is a big white cross and "This check failed" without any visible link or button.

Also, it seems like 148534d broke the AppVeyor tests when both OpenSSL and WinSSL are off. 36 tests are failing with "curl: (48) An unknown option was passed in to libcurl".

@mback2k
Copy link
Member

mback2k commented May 10, 2020

Any idea how I can see what went wrong with the "CI / macos normal (pull_request)" job? All I see there is a big white cross and "This check failed" without any visible link or button.

This happens randomly with macOS builds on GitHub Actions (similar to the random failures on travis-ci before), because all CIs use MacStadium for hosting their macOS runners. If there is an issue with the runner it will show up as a failed CI run, but GitHub Action will retry running it. This is why you can see a 2nd result of "CI / macos normal (pull_request)" showing up as successful.

MarcelRaad added 4 commits Apr 13, 2019
This will also be needed in the tool and tests.

Ref: #3758 (comment)
Closes #3784
- use `wmain` instead of `main` when `_UNICODE` is defined [0]
- define `argv_item_t` as `wchar_t *` in this case
- use the curl_multibyte gear to convert the command-line arguments to
  UTF-8

This makes it possible to pass parameters with characters outside of
the current locale on Windows, which is required for some tests, e.g.
the IDN tests. Out of the box, this currently only works with the
Visual Studio project files, which default to Unicode, and winbuild
with the `ENABLE_UNICODE` option.

[0] https://devblogs.microsoft.com/oldnewthing/?p=40643

Ref: #3747
Closes #3784
Use them only if `_UNICODE` is defined, in which case command-line
arguments have been converted to UTF-8.

Closes #3784
It only applies to non-Unicode builds now.
Also merge 5.10 into it as it's effectively a duplicate.

Closes #3784
@MarcelRaad MarcelRaad force-pushed the MarcelRaad:windows_unicode_command_line_arguments branch from cacc3c4 to 045d10c May 12, 2020
MarcelRaad added a commit that referenced this pull request May 14, 2020
- use `wmain` instead of `main` when `_UNICODE` is defined [0]
- define `argv_item_t` as `wchar_t *` in this case
- use the curl_multibyte gear to convert the command-line arguments to
  UTF-8

This makes it possible to pass parameters with characters outside of
the current locale on Windows, which is required for some tests, e.g.
the IDN tests. Out of the box, this currently only works with the
Visual Studio project files, which default to Unicode, and winbuild
with the `ENABLE_UNICODE` option.

[0] https://devblogs.microsoft.com/oldnewthing/?p=40643

Ref: #3747
Closes #3784
MarcelRaad added a commit that referenced this pull request May 14, 2020
Use them only if `_UNICODE` is defined, in which case command-line
arguments have been converted to UTF-8.

Closes #3784
MarcelRaad added a commit that referenced this pull request May 14, 2020
It only applies to non-Unicode builds now.
Also merge 5.10 into it as it's effectively a duplicate.

Closes #3784
@MarcelRaad MarcelRaad deleted the MarcelRaad:windows_unicode_command_line_arguments branch May 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.