Better error checking for Win32 IDN functions #637
By analyzing the blame information on this pull request, we identified @pierrejoye, @yangtse and @bagder to be potential reviewers |
Ref 5d7c937#commitcomment-15913422 There's no path for it to be NULL afaict. Personally I'd prefer an early-exit pattern in a function like this but your changes will be fine. Also, unrelated to your changes, that |
That cast in |
It might be useful in the future I'd rather fix it and then leave it be. Weird, I don't recall seeing an unused function warning when I built for WinIDN, maybe that warning is disabled somewhere. |
- Fix a conversion bug in the unused function curl_win32_ascii_to_idn()
I have updated the pull request with a bugfix for |
.. also fix a conversion bug in the unused function curl_win32_ascii_to_idn(). And remove wprintfs on error (Jay). Bug: #637
Modified in 3 ways, otherwise it's good:
Landed in 9e7fcd4. Thanks Michael, and thanks Gisle for getting the ball rolling. |
Jay> The responsibility is on the caller to show the error from a function like that via warnf or something. The warnings specific to WinIDN are non-existent; no help in case the Active CodePage is wrong. Also it seems UTF8 input-encoding is assumed throughout libcurl/WinIDN (?). But where is the call to |
You are right, WinIDN does not work properly with URLs provided by the curl tool; that is probably related to #345. From libcurl, UTF-8 is expected. In Visual Studio, as far as I know, this can be expressed in one of two ways. If you have saved your source as UTF-8 with BOM, Visual Studio will translate literal char strings to the local codepage, but if you have saved your source as UTF-8 without BOM it will not do any translation. Therefore either of these is acceptable:
and WinIDN will convert that UTF-8 string to xn--h1alffa9f.net and make the connection. So one issue is whether a URL passed to CURLOPT_URL is supposed to be in UTF-8 format or the local codepage. It doesn't appear to be documented either way, actually. Another issue is being able to access Unicode from the command prompt using the curl tool. Frankly I've been hoping someone would come along and work on that, but it's been open so long that we'll probably have to put it in the TODO. |
In which Viktor Szakats proposed to build curl.exe with
(which are simple to fix). But I still have problems with e.g.
|
Hi, On Feb 8, 2016 7:30 PM, "Michael Kaufmann" notifications@github.com wrote:
I do not think there is a bug per se in this specific part. Eventually in Please do not replace UTF8 with ACP there as any valid code using curl via To get a UTF8 console on windows one can use: CHCP 65001 in a batch file and create a new console with cmd.exe /k
|
@pierrejoye The issue though is that this is undocumented and the behavior appears to differ depending on the IDN library. For libidn we're calling idna_to_ascii_lz, which converts from the current locale. One would expect that WinIDN (aka normaliz.lib) would do the same, but instead, as Gisle pointed out, we're using UTF-8 regardless of locale. Currently UTF-8 on the command line isn't compatible with the curl tool in Windows, see my comments in #345. Personally I'd rather WinIDN behave the same as libidn. At this point, in order to preserve backwards compatibility, I don't know what we can do, maybe UTF-8 detection or something. Also: There is a mothballed function I wrote which may be useful, utf8_strict_codepoint_count |
Yes, I saw it :) Although they seem related, there are different issues. One is how the curl command line manages the input argument encoding. The 2nd issue is the usage of the various windows APIs, as far as I The 3rd issue or question is how curl API should work. To me it should keep In any case, I am in the process of making php on windows Unicode friendly
|
Sure. But what do you think of the idea I mentioned where for Windows we do an autodetect like
IIRC libidn when not built with libiconv will internally default to UTF-8 encoding even when idna_to_ascii_lz is used (I last read it about 8 months ago, that may have changed). So how many libidn's are built with libiconv, how many are actually getting locale specific translations and not UTF-8? I really don't know. |
@pierrejoye Using PS. I use 4NT 5 (Unicode) as my shell. From its help-file:
|
@gvanem Sorry if I was not clear. In any case, what should be used by curl should be detected during the arguments processing. The underlying curl APIs can or should remain using UTF-8 for portability and ease of use reasons. That's at least what I can see in many other libraries and it works quite well in a portable manner. It leaves the responsibility to the caller to provide UTF-8 or ASCII. For example, in our case, the caller is the curl.exe command. @jay I think it is nice to do, yes, checking if it is ASCII or not. I am not sure about actually validating UTF8 (or other), as ideally that should be done at a later stage and fail accordingly. |
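For illustration, the kind of check being discussed here (is the input plain ASCII, and if not, is it at least well-formed UTF-8) could look roughly like the sketch below. This is not code from curl or from this pull request; the function names `is_pure_ascii` and `looks_like_utf8` are hypothetical, and the sketch assumes NUL-terminated input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of curl.
   Returns 1 if every byte of s is 7-bit ASCII, otherwise 0. */
int is_pure_ascii(const char *s)
{
  const unsigned char *p = (const unsigned char *)s;
  assert(s != NULL);
  for(; *p; p++) {
    if(*p & 0x80)
      return 0;
  }
  return 1;
}

/* Hypothetical helper, not part of curl.
   Returns 1 if s is well-formed UTF-8: no stray or missing continuation
   bytes, no overlong forms, no surrogates, nothing above U+10FFFF. */
int looks_like_utf8(const char *s)
{
  const unsigned char *p = (const unsigned char *)s;
  assert(s != NULL);
  while(*p) {
    unsigned long cp;   /* decoded code point */
    unsigned long min;  /* smallest code point allowed for this length */
    int extra;          /* number of continuation bytes expected */

    if(*p < 0x80) {
      p++;
      continue;
    }
    else if((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; min = 0x80; }
    else if((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; min = 0x800; }
    else if((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; min = 0x10000; }
    else
      return 0; /* continuation byte or invalid lead byte */
    p++;

    while(extra--) {
      /* a NUL terminator fails this test too, so we never read past it */
      if((*p & 0xC0) != 0x80)
        return 0;
      cp = (cp << 6) | (*p & 0x3F);
      p++;
    }
    if(cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return 0;
  }
  return 1;
}
```

A caller such as the command line tool could try `is_pure_ascii()` first and only fall back to `looks_like_utf8()` for non-ASCII input. Note that a string in a legacy codepage can occasionally also be valid UTF-8, so a check like this is a heuristic rather than a guarantee.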
@gvanem by the way, about building with -DUNICODE -D_UNICODE: I would rather avoid it and use the *W APIs accordingly. It is by far easier and cleaner to maintain and less magic happens. It also makes crystal clear what is being called and why. |
Yes, built using
On that I agree. |
@gvanem We agree then :) I will try to work on a proposal/PR for the command line tool(s) and underlying APIs. I need it for PHP anyway (the APIs part). |
I envision this spiraling out of control with a whole lot of supporting wide character functions that would be more work to maintain. We have to pass |
@jay For example, the idea of using native windows IDN functions was to reduce the number of dependencies (which require quite some work to keep up with :). I think we agree or talk about the same thing. On windows there is no other solution than working with the wide char APIs when it comes to Unicode. Even using the _UNICODE build mode is about that. It is only a trick to redefine native functions to use the A or W version of an API. By default on linux, UTF8 is considered as available, with some checks if a string is pure ASCII or not. It can be exactly the same for Windows. The only annoyance, inevitable, is the conversion between wide char and UTF8 before calling a windows function, but that's easy to do and not very intrusive, only utf8towidechar and widechartoutf8 being used. Is that what you have in mind too? |
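To make the UTF-8 to wide char step concrete, here is a small portable sketch of what such a conversion does. On Windows this job would normally be handed to `MultiByteToWideChar` with `CP_UTF8`; the hand-rolled decoder below is only an illustration so that it can be compiled anywhere. The name `utf8_to_utf16` and the use of `unsigned short` for UTF-16 code units are assumptions made for this example, not anything from curl.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of curl.
   Decodes NUL-terminated UTF-8 into UTF-16 code units and appends a 0
   terminator. Returns the number of code units written (terminator not
   counted), or -1 if the input is malformed or 'out' is too small. */
int utf8_to_utf16(const char *in, unsigned short *out, size_t outlen)
{
  const unsigned char *p = (const unsigned char *)in;
  size_t n = 0;
  assert(in != NULL && out != NULL);

  while(*p) {
    unsigned long cp;   /* decoded code point */
    unsigned long min;  /* smallest code point allowed for this length */
    int extra;          /* number of continuation bytes expected */

    if(*p < 0x80)                { cp = *p;        extra = 0; min = 0; }
    else if((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; min = 0x80; }
    else if((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; min = 0x800; }
    else if((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; min = 0x10000; }
    else
      return -1;
    p++;

    while(extra--) {
      if((*p & 0xC0) != 0x80)
        return -1; /* missing continuation byte (or premature end) */
      cp = (cp << 6) | (*p & 0x3F);
      p++;
    }
    if(cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return -1;

    if(cp >= 0x10000) {
      /* outside the BMP: emit a surrogate pair, keep room for the 0 */
      if(n + 2 >= outlen)
        return -1;
      cp -= 0x10000;
      out[n++] = (unsigned short)(0xD800 + (cp >> 10));
      out[n++] = (unsigned short)(0xDC00 + (cp & 0x3FF));
    }
    else {
      if(n + 1 >= outlen)
        return -1;
      out[n++] = (unsigned short)cp;
    }
  }
  if(n >= outlen)
    return -1;
  out[n] = 0;
  return (int)n;
}
```

The resulting buffer is what a W-suffixed Windows function would expect as input; the reverse direction (wide char back to UTF-8) follows the same pattern.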
Right. |
@pierrejoye I made a draft but frankly I don't really feel like taking it on for assignment. Refer to my latest comment in 345. |
Curl_convert_UTF8_to_wchar()
fails