
idn: fix libidn2 with windows unicode builds #7246

Closed
wants to merge 1 commit

Conversation

vszakats
Member

@vszakats vszakats commented Jun 11, 2021

Unicode Windows builds use UTF-8 strings internally in libcurl, so make sure to call the UTF-8 flavour of the libidn2 API.

Reported-by: dEajL3kA on github
Fixes #7228

@vszakats vszakats added the Windows label Jun 11, 2021
@vszakats
Member Author

@dEajL3kA: Can you give this patch a test?

@dEajL3kA

@dEajL3kA: Can you give this patch a test?

Thank you. Works fine for me.

But I would like to suggest a small change to avoid a compiler warning about pointer types:

diff --git a/lib/url.c b/lib/url.c
index edcdf54b1a07..2bfe7631c1d7 100644
--- a/lib/url.c
+++ b/lib/url.c
@@ -62,6 +62,12 @@
 #ifdef USE_LIBIDN2
 #include <idn2.h>
 
+#if defined(WIN32) && defined(UNICODE)
+#define IDN2_LOOKUP(name, host, flags) idn2_lookup_u8((const uint8_t*)name, (uint8_t**)host, flags)
+#else
+#define IDN2_LOOKUP(name, host, flags) idn2_lookup_ul((const char*)name, (char**)host, flags)
+#endif
+
 #elif defined(USE_WIN32_IDN)
 /* prototype for curl_win32_idn_to_ascii() */
 bool curl_win32_idn_to_ascii(const char *in, char **out);
@@ -1576,12 +1582,12 @@ CURLcode Curl_idnconvert_hostname(struct Curl_easy *data,
 #else
       int flags = IDN2_NFC_INPUT;
 #endif
-      int rc = idn2_lookup_ul((const char *)host->name, &ace_hostname, flags);
+      int rc = IDN2_LOOKUP(host->name, &ace_hostname, flags);
       if(rc != IDN2_OK)
         /* fallback to TR46 Transitional mode for better IDNA2003
            compatibility */
-        rc = idn2_lookup_ul((const char *)host->name, &ace_hostname,
-                            IDN2_TRANSITIONAL);
+        rc = IDN2_LOOKUP(host->name, &ace_hostname,
+                         IDN2_TRANSITIONAL);
       if(rc == IDN2_OK) {
         host->encalloc = (char *)ace_hostname;
         /* change the name pointer to point to the encoded hostname */

Regards.

vszakats added a commit to vszakats/curl that referenced this pull request Jun 12, 2021
@vszakats
Member Author

@dEajL3kA: Nice catch, I've updated the patch with your fix.

@vszakats vszakats changed the title from "idn: fix libidn2 with unicode windows" to "idn: fix libidn2 with windows unicode builds" Jun 12, 2021
Member

@jay jay left a comment


While it is true that curl tool Unicode builds use UTF-8, it is not necessarily true of libcurl, since it takes whatever the application passes. There is already a fallback for failed translation, and I wonder if that could also be done for UTF-8, for example:

diff --git a/lib/url.c b/lib/url.c
index 27ba7d6..0eb364f 100644
--- a/lib/url.c
+++ b/lib/url.c
@@ -1585,7 +1585,12 @@ CURLcode Curl_idnconvert_hostname(struct Curl_easy *data,
 #else
       int flags = IDN2_NFC_INPUT;
 #endif
-      int rc = idn2_lookup_ul((const char *)host->name, &ace_hostname, flags);
+      int rc;
+#if defined(WIN32) && defined(UNICODE)
+      rc = idn2_lookup_u8((const uint8_t *)host->name, (uint8_t **)&ace_hostname, flags);
+      if(rc != IDN2_OK)
+#endif
+        rc = idn2_lookup_ul((const char *)host->name, &ace_hostname, flags);
       if(rc != IDN2_OK)
         /* fallback to TR46 Transitional mode for better IDNA2003
            compatibility */

@dEajL3kA

dEajL3kA commented Jun 14, 2021

While it is true that curl tool Unicode builds use UTF-8, it is not necessarily true of libcurl, since it takes whatever the application passes. There is already a fallback for failed translation, and I wonder if that could also be done for UTF-8, for example

Shouldn't the API define which encoding is expected for char* string inputs, rather than leaving this "undefined" and having to make guesses afterwards? And, in (Windows) Unicode builds, anything but UTF-8 doesn't really make sense to me.

Also, a problem I see is that it is quite possible that the exact same char* string decodes "correctly" (seemingly) both as UTF-8 and as ANSI (e.g. Windows-1252), yet it won't necessarily decode to what the caller intended. In other words: if we always try idn2_lookup_u8() first and use idn2_lookup_ul() as a fallback, then idn2_lookup_u8() may succeed even if the input is actually an ANSI-encoded string. But, of course, in that case the result wouldn't be what the user intended. Trying idn2_lookup_ul() first and using idn2_lookup_u8() as a fallback would probably be even worse, though. IMO, it would be better not to support ANSI inputs at all (in the Unicode build) than to implement "unpredictable" behavior.

Either that, or make the API offer a function that allows the application to set the character encoding (UTF-8 being the default).
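The ambiguity described above can be shown concretely. The two bytes C3 A9 form a valid two-character Windows-1252 string ("Ã©") and, at the same time, the single-character UTF-8 string "é"; nothing in the bytes themselves says which was intended. A minimal sketch (the decoding helpers here are illustrative, hard-coded to these byte patterns, not libcurl code):

```c
#include <assert.h>

/* Windows-1252 (ANSI): every byte is one character; in the 0xA0-0xFF
   range the byte value equals the Unicode code point. */
unsigned cp1252_char(unsigned char b)
{
  return b;
}

/* UTF-8: a 0xC2-0xDF lead byte plus one continuation byte form a
   single code point: ((lead & 0x1F) << 6) | (cont & 0x3F). */
unsigned utf8_pair(unsigned char lead, unsigned char cont)
{
  return ((unsigned)(lead & 0x1F) << 6) | (unsigned)(cont & 0x3F);
}
```

For the input bytes 0xC3 0xA9, the ANSI reading yields U+00C3 U+00A9 ("Ã©", two characters) while the UTF-8 reading yields U+00E9 ("é", one character); both are structurally valid, so no fallback scheme can recover the caller's intent from the bytes alone.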

@jay
Member

jay commented Jun 14, 2021

There's probably a slight risk of the wrong conversion. I'm interested in supporting the user's existing code while making it possible to fix this issue. We've had a lot of discussions around Unicode in Windows and what we have now mostly works without breaking anything.

@dEajL3kA

dEajL3kA commented Jun 14, 2021

IMHO, there are only two "clean" ways to do this: Either define which encoding your API expects for char* input strings – which can only be UTF-8 for the Unicode version of the library – and strictly assume that encoding in your implementation; or offer two separate functions, one for UTF-8 strings and one for strings encoded in the "local" ANSI codepage – just like libidn2 does.

It's just not possible to know from the char* string alone which encoding is used. The exact same sequence of bytes can be a "valid" UTF-8 string and a "valid" ANSI string at the same time, but we don't know what the user intended. So I think the chance of a "wrong conversion" in a function that tries to accept both UTF-8 and ANSI strings is very real. You can try to make an educated guess. In some cases you will be lucky and things work as the user expected, but in other cases the user will see "weird" results. This makes debugging very hard, because it first appears to work but then suddenly breaks. So I think it is much preferable to consistently fail when the string is delivered in the "wrong" encoding, instead of implementing a "fix" that cannot be relied on.

Just my two cents. As long as UTF-8 is always tried first (in the Unicode version), I will probably be fine with that 😏

@jay
Member

jay commented Jun 14, 2021

Sure, you've made that point, and I just don't think it's as big a deal. We already do this for the file API and probably other stuff: we try UTF-8 and, if that fails, fall back on local encoding. In other words, we prefer UTF-8 in Unicode builds, but in order not to disrupt other clients we will try the local encoding. This seems like a pretty good compromise, and so far we don't have any reports of a wrong encoding.

@dEajL3kA

dEajL3kA commented Jun 14, 2021

Just one more thing I would like to add: this could even have security implications. I cannot give you a concrete exploit, but suppose some user relies on the ANSI fallback behavior to open a certain file (or URL). This user probably isn't even aware that the given string will be interpreted as UTF-8 first, before falling back to ANSI. Now, if an ANSI string is interpreted as UTF-8, you probably get some "garbage" file name that is unlikely to exist, normally. Still, an attacker who knows which ANSI file name the user is trying to open can easily figure out the corresponding "garbage" UTF-8 name and place a malicious file there...

(Loading the .curlrc from your "home" directory could be a possible attack vector, e.g. if the user account name contains some "foreign" chars. This may offer the possibility of redirecting the file open operation to another user's home directory.)

@jay
Member

jay commented Jun 14, 2021

Still, an attacker who knows which ANSI file name the user is trying to open can easily figure out the corresponding "garbage" UTF-8 name and then place a "malicious" file there...

IMO this is an acceptable security risk to continue supporting legacy clients. If the user has strings that are encoded locally then we try to support them, but yes, you would run that risk if such an example existed.

@vszakats
Member Author

vszakats commented Jun 14, 2021

So, if I understand correctly, in libcurl the internal representation of strings is undefined (and incidentally it is UTF-8 inside Unicode curl binary builds). If it is undefined, it is impossible to properly handle it at API boundaries that make any codepage assumptions about the strings passed around.

As a possible solution, libcurl could get a new API where the string representation can be specified by the caller environment (e.g. in case of a Windows Unicode curl.exe this would be UTF-8) and decide which libidn2 API to call, based on this information (as opposed to the UNICODE build-time flag, which this patch is using for this purpose).

This leads very far, so I suggest closing this PR before the above issue is cleared up.

@jay
Member

jay commented Jun 14, 2021

That's a lot of work, and we have a best-effort method that seems to work well enough, by overriding fopen, stat and access to try Unicode first. We could do the same here.
https://github.com/curl/curl/blob/curl-7_77_0/lib/curl_multibyte.c#L115-L130

@dEajL3kA

dEajL3kA commented Jun 14, 2021

Just for the record, curl_win32_idn_to_ascii() and curl_win32_ascii_to_idn() already assume UTF-8 encoding:
https://github.com/curl/curl/blob/master/lib/idn_win32.c#L71-L88

@vszakats
Member Author

vszakats commented Jun 14, 2021

@jay: Hm, this is interesting, though it also seems quite risky and opens the door to security problems.

Regarding this PR: is there a way to tell whether the UTF-8 conversion failed when calling the libidn2 API, and if so, retry in raw mode? That seems like quite a risky solution as well. Another approach would be to test whether the string is UTF-8 and call the libidn2 API accordingly, though UTF-8 detection is never 100% reliable, especially when someone deliberately wants to trick the detection method into misdetecting.

@vszakats
Member Author

vszakats commented Jun 14, 2021

I tend to think it'd be best to document that Windows UNICODE builds expect UTF-8 strings and require callers to provide strings in this encoding. This was an implicit (and not at all obvious) breaking change that has already been made: mostly for libcurl calls, but even "curl.exe" calls don't always behave as they did before. And "best effort" fallback to raw 8-bit mode isn't (always) feasible or secure, and any kind of auto-detection even less so.

[ Passing/retrieving UTF-8 strings isn't the most natural thing for Windows apps to do, but once implemented, it's a well defined, two-way conversion and there is a long list of other open-source projects which expect UTF-8 strings, so chances are good that this is already a familiar path. ]

Other alternatives require extreme effort, such as introducing new API variants (with well-defined encoding requirements) wherever strings are involved; adding the ability to tell curl the encoding/codepage of strings passed to (and received from) it, converting them internally to the formats requested; or adding a way to query the encoding/codepage curl expects and let the caller convert to it before passing strings to curl. None of these is trivial, and all of them require effort on both the caller and the curl side, not to mention the possible errors in implementing any of these solutions.

For the first alternative the only bit required (from libcurl) is a way for a caller to detect a UNICODE libcurl build. This can already be done with (curl_version_info(CURLVERSION_NOW)->features & CURL_VERSION_UNICODE) != 0, though it could probably be made simpler.

@MarcelRaad
Member

Note that the only supported way to get a Unicode libcurl build that expected ANSI strings in some places before version 7.71.0, when #3784 landed, was by using the Visual Studio project files or, starting with version 7.67.0, winbuild. Both support only MSVC and not e.g. MinGW. All SSPI and Schannel code in libcurl has always expected UTF-8 in Unicode builds, and e.g. using ANSI-encoded passwords never worked with them. Given all this, I don't think breaking backwards compatibility in this case and just documenting UTF-8 for Windows Unicode builds would be dramatic.

I'm not strongly opposed to a best-effort fallback like @jay proposed in #7246 (review) though.

jay added a commit to jay/curl that referenced this pull request Jun 14, 2021
- For Windows Unicode builds of libcurl that use libidn, attempt to
  convert the hostname from UTF-8 and if that fails use local encoding
  as a fallback.

Prior to this change Windows Unicode builds that use libidn attempted to
convert the hostname from the local encoding only. The curl tool passes
the command-line URL (and therefore the hostname) in Unicode's UTF-8
encoding, so the conversion would fail. Since other applications may
pass locally encoded URLs to libcurl we'll keep local encoding as a
fallback.

Reported-by: dEajL3kA@users.noreply.github.com
Assisted-by: Viktor Szakats

Fixes curl#7228
Closes curl#7246
Closes #xxxx
@jay
Member

jay commented Jun 14, 2021

I'm not strongly opposed to a best-effort fallback like @jay proposed in #7246 (review) though.

Now #7256, as an alternative to this.

@dEajL3kA

dEajL3kA commented Jun 14, 2021

If not breaking "legacy" applications which use ANSI encoding is a priority, then why not make all public API functions consistently assume ANSI (local codepage) encoding by default and add a new API function to enable UTF-8 support? Unicode-aware applications, including the curl CLI program, would call that new function early on. And, once that function has been called, all public API functions consistently assume UTF-8 encoding (instead of ANSI). This would ensure consistent behavior for both legacy and Unicode-aware applications and, at the same time, eliminate the need for any "best effort" workarounds.

In the library implementation, a pair of functions curlx_multibyte_to_wchar() and curlx_wchar_to_multibyte(), which check the global "UTF-8 mode enabled" flag and perform the proper conversions, could be used to keep things simple.
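The proposed global flag could look like the following sketch. All names here (curl_global_utf8_mode, the enum, the stub decoders) are hypothetical illustrations, not real libcurl API; on Windows the two branches would dispatch to MultiByteToWideChar(CP_UTF8, ...) versus MultiByteToWideChar(CP_ACP, ...), with stubs used so the dispatch pattern stands alone:

```c
#include <assert.h>

/* Hypothetical process-global flag, set once by Unicode-aware apps. */
static int utf8_mode = 0;            /* default: legacy ANSI behaviour */

void curl_global_utf8_mode(int enable)
{
  utf8_mode = enable;                /* called early, before other APIs */
}

enum encoding { ENC_ANSI, ENC_UTF8 };

/* Stubs standing in for MultiByteToWideChar(CP_UTF8/CP_ACP, ...);
   they return a tag so the chosen branch is observable. */
static enum encoding decode_as_utf8(const char *in) { (void)in; return ENC_UTF8; }
static enum encoding decode_as_ansi(const char *in) { (void)in; return ENC_ANSI; }

/* Every public entry point funnels through helpers like this one, so
   behaviour is consistent for the whole process: no per-string
   guessing, and no silent fallback between encodings. */
enum encoding curlx_multibyte_to_wchar(const char *in)
{
  return utf8_mode ? decode_as_utf8(in) : decode_as_ansi(in);
}
```

The design point is that the choice is made once, by the application, rather than re-guessed per string, which is exactly what eliminates the "best effort" ambiguity discussed above.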

@vszakats
Member Author

vszakats commented Jun 14, 2021

@MarcelRaad: Thanks for pointing out @jay's 'best effort' patch, I did miss that.

I'm however not convinced about such a solution and have a gut feeling that it leads to no good at all. First, it will lead to potential exploits; and when fixing/changing it, or possibly being forced to remove the workaround at a future point, it will cause a second round of breaking changes, where callers will need to finally do the right thing and send proper UTF-8. I simply do not believe that you can correctly guess the codepage/encoding by trial and error. (I'm coming from a place with accents.)

Either way, I'd prefer to commit this as-is now and leave adding compatibility band-aids to a separate commit at your own discretion.

@vszakats
Member Author

vszakats commented Jun 14, 2021

FWIW, I'm now tempted to turn off UNICODE in curl-for-win. I wasn't aware of the "best effort" measures and I see them as having massive potential to open a can of problems. Also, if UNICODE builds are still supposed to accept 8-bit ("ANSI") strings and libcurl randomly/arbitrarily decides how to interpret them, what is the purpose of UNICODE mode at all?

What if an 8-bit ("ANSI") string is a valid UTF-8 string? What if a file-operation with a valid UTF-8 filename fails for an unrelated reason then libcurl continues to try the same string as an 8-bit one and corrupts/reads/overwrites an existing but unrelated file as a result? What if this behaviour gets actively exploited by a malicious party?

[ Just to clarify "ANSI" (and "OEM") is not a codepage. It's Microsoft/Windows terminology to denote a list of 8-bit codepages. The exact interpretation will depend on the OS environment. ]

@dEajL3kA

dEajL3kA commented Jun 14, 2021

FWIW, I'm now tempted to turn off UNICODE in curl-for-win.

UNICODE support is an absolute must for Windows! Windows has supported Unicode for ~20 years now and all modern Windows applications need to support it. The old ANSI APIs only still exist in Windows for legacy compatibility. Using ANSI in a Windows application results in tons of problems; for example, command-line arguments passed to your application get converted into a string containing "?" characters for all characters that cannot be represented in the local ANSI codepage. That way you will not be able to open files that contain such characters. And such files almost always do exist, because Windows itself and its file system are fully Unicode (UTF-16) under the hood. So I think getting proper Unicode support into libcurl is essential.

Just to clarify "ANSI" (and "OEM") is not a codepage. It's Microsoft/Windows terminology to denote a list of 8-bit codepages. The exact interpretation will depend on the OS environment.

In the Win32 API all functions exist twice: as a "Unicode" (wide string) version that uses UTF-16 strings, and as an "ANSI" version that uses multi-byte strings encoded in whatever local codepage happens to be selected in the system (e.g. Windows-1252). So, when I say "ANSI", I mean the system's local codepage, as used by the Win32 API in its "ANSI" functions.

@jay
Member

jay commented Jun 14, 2021

What if a file-operation with a valid UTF-8 filename fails for an unrelated reason then libcurl continues to try the same string as an 8-bit one and corrupts/reads/overwrites an existing but unrelated file as a result?

Yes, as it's written, what could actually happen is that opening a file using the Unicode character set could fail for some reason unrelated to a failed conversion, and it would then fall back on treating the string as local encoding. In my opinion this could be improved to fall back only if the string is not valid UTF-8. The counter-argument is clearly to get rid of the fallbacks, and I just don't think we should do that because it could break existing user code. I see problems either way and I don't see a significant security issue here. I am also not convinced, but not strongly opposed either.

@vszakats
Member Author

vszakats commented Jun 14, 2021

@dEajL3kA: Agreed completely. I'm an absolute proponent of UNICODE. But if support for UNICODE in libcurl doesn't have clearly defined expectations and has dynamically changing behaviour when interpreting input, it's a slippery slope.

Back to IDN specifically: it seems that WinIDN builds don't do the "best effort" interpretation and expect UTF-8 or fail. Meaning that an alternative for curl-for-win is switching back to WinIDN.

It also raises the question: Why is it desired to implement a "best effort" strategy for libidn2 and not doing so for WinIDN?

@vszakats
Member Author

@jay: The problem is that there is no exact way of detecting UTF-8 encoding. There are several methods, but none of them 100%.
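That limitation can be sketched in code. The structural checker below is hypothetical, not libcurl code: it validates lead/continuation byte patterns (rejecting stray continuation bytes, overlong two-byte forms and lead bytes >= 0xF5, but skipping surrogate and longer overlong checks), yet it happily accepts byte strings a user produced under an 8-bit ANSI codepage, which is why validation-based detection can never be 100%:

```c
#include <assert.h>
#include <stddef.h>

/* Structural UTF-8 check: a sketch for illustration only. */
int looks_like_utf8(const unsigned char *s, size_t n)
{
  size_t i = 0;
  while(i < n) {
    unsigned char c = s[i];
    size_t extra, k;
    if(c < 0x80)      { i++; continue; }  /* plain ASCII byte */
    else if(c < 0xC2) return 0;           /* stray continuation / overlong lead */
    else if(c < 0xE0) extra = 1;          /* two-byte sequence */
    else if(c < 0xF0) extra = 2;          /* three-byte sequence */
    else if(c < 0xF5) extra = 3;          /* four-byte sequence */
    else              return 0;           /* beyond U+10FFFF */
    if(i + extra >= n)
      return 0;                           /* truncated sequence */
    for(k = 1; k <= extra; k++)
      if((s[i + k] & 0xC0) != 0x80)
        return 0;                         /* bad continuation byte */
    i += extra + 1;
  }
  return 1;
}
```

The Windows-1252 bytes C3 A9, intended as the two-character string "Ã©", pass this check and would be misread as UTF-8 "é", while a lone ANSI 0xE9 ("é") is rejected; acceptance therefore proves nothing about the caller's intent.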

jay added a commit to jay/curl that referenced this pull request Jun 15, 2021
- If the UTF-8 to UTF-16 conversion fails in Windows Unicode builds then
  no longer fall back to assuming the string is in a local encoding.

Background:

Some functions in Windows Unicode builds must convert UTF-8 to UTF-16 to
pass to the Windows CRT API wide-character functions since in Windows
UTF-8 is not a valid locale (or at least 99% of the time right now).

Prior to this change, if the Unicode encoding conversion failed then
libcurl would, for backwards compatibility with applications that may
have written their code for non-Unicode builds, attempt to convert the
string from local encoding to UTF-16.

That type of "best effort" could theoretically cause some type of
security or other problem if a string that was locally encoded was also
valid UTF-8, and therefore an unexpected UTF-8 to UTF-16 conversion
could occur.

Ref: curl#7246

Closes #xxxx
@jay
Member

jay commented Jun 15, 2021

Back to IDN specifically: it seems that WinIDN builds don't do the "best effort" interpretation and expect UTF-8 or fail. Meaning that an alternative for curl-for-win is switching back to WinIDN.

It also raises the question: Why is it desired to implement a "best effort" strategy for libidn2 and not doing so for WinIDN?

It is for anyone using libidn2 and passing in the local encoding. As I said I don't want to disrupt their application; while at the same time I'd like to fix this for the curl tool. This is not the case with WinIDN which always used UTF-8.

Somewhat related to Marcel's point, I think we have to consider how likely it is right now that anyone using a Unicode build of libcurl for their application is also using libidn2 and not WinIDN. Probably very unlikely, considering we don't support it outright in autobuild (see #7229). curl-for-win has a libcurl that uses libidn2, but is anyone using that version of libcurl for their application, or are they just using the curl tool? I don't know.

@jay: The problem is that there is no exact way of detecting UTF-8 encoding. There are several methods, but none of them 100%.

Yes, I was aware of this; it is what I meant by the risk. I'm not disputing that there's some chance a locally encoded string may mistakenly translate to Unicode as valid UTF-8.


On further consideration, reading through all the comments, I think it's sensible that we retire the fallback behavior... so how about #7257 for the next feature window? I would rather hold off on this PR until then as well. The documentation could also be modified at that time by adding this:

diff --git a/docs/libcurl/opts/CURLOPT_URL.3 b/docs/libcurl/opts/CURLOPT_URL.3
index 17e5c8b..8d705fc 100644
--- a/docs/libcurl/opts/CURLOPT_URL.3
+++ b/docs/libcurl/opts/CURLOPT_URL.3
@@ -72,7 +72,7 @@ expected to be a sequence of characters using an ASCII compatible encoding.
 
 If libcurl is built with IDN support, the server name part of the URL can use
 an "international name" by using the current encoding (according to locale) or
-UTF-8 (when winidn is used).
+UTF-8 (when winidn is used; or a Windows Unicode build using libidn).
 
 If libcurl is built without IDN support, the server name is used exactly as
 specified when passed to the name resolver functions.

@vszakats
Member Author

@jay: Agreed on the removal of fallbacks and the addition to the documentation. I've updated the PR with your patch.

Minor question: Shall we use the term 'libidn' in the docs, or rather 'libidn2'? It seems curl had fully switched to 'libidn2' a while back (9c91ec7).

@bagder
Member

bagder commented Jun 15, 2021

Shall we use the term 'libidn' in the docs, or rather 'libidn2'?

Since they are two different libraries implementing different IDNA standards I think it's a point to say libidn2.

Unicode Windows builds use UTF-8 strings internally in libcurl,
so make sure to call the UTF-8 flavour of the libidn2 API.

Reported-by: dEajL3kA on github
Assisted-by: Jay Satiro
Reviewed-by: Marcel Raad
Fixes curl#7228
@vszakats vszakats closed this in 2026124 Jun 15, 2021
jay added a commit to jay/curl that referenced this pull request Jun 16, 2021
jay added a commit that referenced this pull request Jun 21, 2021
Ref: #7246

Closes #7257
vszakats added a commit to curl/curl-for-win that referenced this pull request Jul 20, 2021
On closer inspection, the state of Unicode support in libcurl does not
seem to be ready for production. Existing support extended certain Windows
interfaces to use the Unicode flavour of the Windows API, but that also
meant that the expected encoding/codepage of strings (e.g. local filenames,
URLs) exchanged via the libcurl API became ambiguous and undefined.
Previously all strings had to be passed in the active Windows locale, using
an 8-bit codepage. In Unicode libcurl builds, the expected string encoding
became an undocumented mixture of UTF-8 and 8-bit locale, depending on the
actual API/option, certain dynamic and static "fallback" logic inside
libcurl and even in OpenSSL, while some parts of libcurl kept using 8-bit
strings internally. From the user's perspective this poses an unreasonably
difficult task in finding out how to pass a certain non-ASCII string to a
specific API without unwanted or accidental (possibly lossy) conversions or
other side-effects. Missing the correct encoding may result in unexpected
behaviour, e.g. in some cases not finding files, finding different files,
accessing the wrong URL or passing a corrupt username or password.

Note that these issues may _only_ affect strings with _non-ASCII_ content.

For now the best solution seems to be to revert back to how libcurl/curl
worked for most of its existence and only re-enable Unicode once the
remaining parts of Windows Unicode support are well-understood, ironed out
and documented.

Unicode was enabled in curl-for-win about a year ago with 7.71.0. Hopefully
this period has had the benefit of surfacing some of these issues.

Ref: curl/curl#6089
Ref: curl/curl#7246
Ref: curl/curl#7251
Ref: curl/curl#7252
Ref: curl/curl#7257
Ref: curl/curl#7281
Ref: curl/curl#7421
Ref: https://github.com/curl/curl/wiki/libcurl-and-expected-string-encodings
Ref: 8023ee5
vszakats added a commit to curl/curl-for-win that referenced this pull request Jul 20, 2021
On closer inspection, the state of Windows Unicode support in libcurl does
not seem to be ready for production. Existing support extended certain
Windows interfaces to use the Unicode flavour of the Windows API, but that
also meant that the expected encoding/codepage of strings (e.g. local
filenames, URLs) exchanged via the libcurl API became ambiguous and
undefined.

Previously all strings had to be passed in the active Windows locale, using
an 8-bit codepage. In Unicode libcurl builds, the expected string encoding
became an undocumented mixture of UTF-8 and 8-bit locale, depending on the
actual API, build options/dependencies, internal fallback logic based on
runtime auto-detection of the passed string, and the result of file
operations (scheduled for removal in 7.78.0). Meanwhile, some parts of
libcurl kept using 8-bit strings internally, e.g. when reading the
environment.

From the user's perspective this makes it unreasonably complex to find
out how to pass (or read) a given non-ASCII string to (from) a
specific API without unwanted or accidental conversions or other
side-effects. Missing the correct encoding may result in unexpected
behaviour, e.g. in some cases not finding files, reading/writing a
different file, accessing the wrong URL or passing a corrupt username or
password.

Note that these issues may only affect strings with _non-7-bit-ASCII_
content.

For now the least bad solution seems to be to revert to how
libcurl/curl worked for most of its existence and only re-enable Unicode
once the remaining parts of Windows Unicode support are well-understood,
ironed out and documented.

Unicode was enabled in curl-for-win about a year ago with 7.71.0. Hopefully
this period has had the benefit of surfacing some of these issues.

Ref: curl/curl#6089
Ref: curl/curl#7246
Ref: curl/curl#7251
Ref: curl/curl#7252
Ref: curl/curl#7257
Ref: curl/curl#7281
Ref: curl/curl#7421
Ref: https://github.com/curl/curl/wiki/libcurl-and-expected-string-encodings
Ref: 8023ee5
@vszakats vszakats deleted the libidn2uni branch February 2, 2023 11:51
@vszakats vszakats added the unicode Unicode, code page, character encoding label Feb 16, 2023
Successfully merging this pull request may close these issues.

libidn2 bug in Windows Unicode build
5 participants