curl on Windows incorrectly handle IDN-urls #731

Closed
Karlson2k opened this Issue Mar 26, 2016 · 15 comments

Projects

None yet

4 participants

@Karlson2k
Contributor

I did this

curl -v http://яндекс.рф
Results:
MinGW64-compiled:

* STATE: INIT => CONNECT handle 0x712f30; line 1108 (connection #-5000)
* Rebuilt URL to: http://яндекс.рф/
* Input domain encoded as `cp1251'
Segmentation fault

Cygwin-compiled:

* STATE: INIT => CONNECT handle 0x6000108b0; line 1108 (connection #-5000)
* Rebuilt URL to: http://яндекс.рф/
* Input domain encoded as `ANSI_X3.4-1968'
* Failed to convert яндекс.рф to ACE; System iconv failed
* Added connection 0. The cache now contains 1 members
* getaddrinfo(3) failed for яндекс.рф:80
* Couldn't resolve host 'яндекс.рф'
* Closing connection 0
* The cache now contains 0 members
* Expire cleared
curl: (6) Couldn't resolve host 'яндекс.рф'

VS14-compiled:

* Could not resolve host: http
* Closing connection 0
curl: (6) Could not resolve host: http

I expected the following

Normal http connection to site.

curl/libcurl version

MinGW64-compiled:

curl 7.46.0 (x86_64-w64-mingw32) libcurl/7.47.2-DEV OpenSSL/1.0.2e zlib/1.2.8 libidn/1.32 libssh2/1.6.0 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: Debug IDN IPv6 Largefile NTLM SSL libz TLS-SRP

Cygwin-compiled:

curl 7.48.1-DEV (x86_64-unknown-cygwin) libcurl/7.48.1-DEV zlib/1.2.8 libidn/1.29
Protocols: dict file ftp gopher http imap pop3 rtsp smtp telnet tftp
Features: Debug IDN IPv6 Largefile libz UnixSockets

VS14-compiled:

curl 7.48.1-DEV (x86_64-pc-win32) libcurl/7.48.1-DEV WinSSL WinIDN
Protocols: dict file ftp ftps gopher http https imap imaps ldap pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN Largefile SSPI Kerberos SPNEGO NTLM SSL

operating system

Windows 7 SP1 x64
MinGW, Cygwin, VS and components - latest version.


Checked - libidn support for current locale encoding is broken. It detects some incorrect encoding from environment variable CHARSET and from nl_langinfo (CODESET) (which need to be fixed for Win32). Instead of fixing libidn I suggest to revamp encoding handling in libcurl/curl tool.
Using "locale encoding" may means something only for curl tool, but for libcurl it's useless and unsafe as other threads may change it on-fly.
I suggest to document that libcurl always assumes UTF-8 encoding in urls (which change nothing for most POSIX systems), convert encoding in curl tool and use idna_to_ascii_8z() in libcurl.
WinIDN-conversion curl_win32_idn_to_ascii() already assume UTF-8 input, but currently it's incorrect as curl tool don't convert anything.
For curl tool I'd use highly portable mbstowcs() with fast custom conversion wchar_t -> UTF-8.

@gvanem
Contributor
gvanem commented Mar 26, 2016

Seem to be related to #637. Which is closed, so some issues remains then?

I've never seen a Segmentation fault related to this. Can you figure out where?

@Karlson2k
Contributor

WinIDN conversion in libcurl is still not perfect, but it's not directly related.
libcurl expect urls in some "locale encoding" if libidn is used and urls in UTF-8 encoding if WinIDN is used.

@jay
Member
jay commented Mar 26, 2016

libcurl expect urls in some "locale encoding" if libidn is used and urls in UTF-8 encoding if WinIDN is used.

See #654 (comment).

I have a MinGW64 built curl and it does not crash

curl 7.48.0-DEV (i686-w64-mingw32) libcurl/7.48.0-DEV mbedTLS/2.2.1 zlib/1.2.8 libidn/1.32 libssh2/1.7.0 nghttp2/1.8.0
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS Debug TrackMemory IDN IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL libz HTTP2
* STATE: INIT => CONNECT handle 0xb58f78; line 1108 (connection #-5000)
* Added connection 0. The cache now contains 1 members
* STATE: CONNECT => WAITRESOLVE handle 0xb58f78; line 1145 (connection #0)
* Could not resolve host: http
* Closing connection 0
* The cache now contains 0 members
curl: (6) Could not resolve host: http
@Karlson2k
Contributor

Segmentation fault caused by call to tld_check_lz() in tld_check_name().

@Karlson2k
Contributor

@jay "Doctor, I have a pain in my leg!"
"Very strange! I have same leg as yours and don't have a pain!"
Set Russian locale and retry.

@Karlson2k
Contributor

The problem is in bad support of charset conversion and charset detection in libidn.
Windows with Russian locale use CP1251 code page for GUI programs and CP866 for console programs (for historical reason - for better compatibility with DOS programs). See GetOEMCP() for console codepage and GetACP() for GUI codepage.
Moreover, msys2 use UTF-8 as codepage (65001). And codepage can be changed at any moment by chcp command in console. But libidn always use CP1251 on my machine.

@jay
Member
jay commented Mar 26, 2016

I still can't get it to crash, also cygwin built curl works for me

curl 7.47.1 (i686-pc-cygwin) libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8 libidn/1.29 libpsl/0.12.0 (+libidn/1.29) libssh2/1.7.0 nghttp2/1.7.1
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: Debug IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets Metalink PSL
$ curl -v http://яндекс.рф/
* STATE: INIT => CONNECT handle 0x80048530; line 1103 (connection #-5000)
* Input domain encoded as `UTF-8'
* Added connection 0. The cache now contains 1 members
*   Trying 77.88.55.66...
* STATE: CONNECT => WAITCONNECT handle 0x80048530; line 1156 (connection #0)
* Connected to яндекс.рф (77.88.55.66) port 80 (#0)
* STATE: WAITCONNECT => SENDPROTOCONNECT handle 0x80048530; line 1253 (connection #0)
* STATE: SENDPROTOCONNECT => DO handle 0x80048530; line 1271 (connection #0)
> GET / HTTP/1.1
> Host: xn--d1acpjx3f.xn--p1ai
> User-Agent: curl/7.47.1
> Accept: */*
@Karlson2k
Contributor

I'd suggest to use UTF-8 in tld_check_name() (as libidn convert "locale encoding" through UTF-8) and don't use offset err_pos as array index - it's index of error in UTF-32, this index is not converted when UTF-32 converted to UTF-8 and then - to "locale encoding". Can cause buffer overrun.

@Karlson2k
Contributor

@jay Try export LC_ALL=C && curl -v http://яндекс.рф/

@jay
Member
jay commented Mar 26, 2016

I tried LC_ALL set to C but it can't convert the domain name, as expected. If there's a buffer overrun I'm not seeing it.

@jay jay added a commit that referenced this issue Mar 26, 2016
@jay jay url: don't use bad offset in tld_check_name to show error
libidn's tld_check_lz returns an error offset of the first character
that it failed to process, however that offset is not a byte offset and
may not even be in the locale encoding therefore we can't use it to show
the user the character that failed to process.

Bug: #731
Reported-by: Karlson2k
3d144ab
@jay
Member
jay commented Mar 26, 2016

You are correct about the bad err_pos. Fixed in 3d144ab.

@bagder
Member
bagder commented Mar 27, 2016

So the crash is removed by this?

@Karlson2k
Contributor

@bagder No, crash is caused by libidn internals.
Will try to build libidn from sources to catch exact point.

@Karlson2k Karlson2k referenced this issue in Alexpux/MINGW-packages Apr 2, 2016
Merged

Fix crash in libidn on Win64 #1276

@Karlson2k
Contributor

Crash fixed by Msys2 local patch. Alexpux/MINGW-packages#1276
Upstream pending.

@jay
Member
jay commented Apr 6, 2016

Glad you were able to find and fix the crash. The remaining issue that curl in Windows incorrectly handles IDN URLs is caused by two problems.

First is curl only works with arguments that are encoded in the current ANSI codepage. Except in the case of Cygwin you can't set the locale to UTF-8 (or you can, but it won't work properly). See #345.

Second is related to the first and that is if you are using WinIDN (which expects IDN URLs to be UTF-8 encoded) curl cannot retrieve UTF-8 encoded arguments. Or it can, but it can cause other problems down the line in libcurl since there's no way to set the locale to UTF-8.

Since we don't have time to work on this right now it's been added to KNOWN_BUGS. 9f740d3

@jay jay closed this Apr 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment