windows:curl -F filename error when i use chinese #10261
Please give us the version information.
@jay Version with the error: curl 7.87.0 (x86_64-w64-mingw32) libcurl/7.87.0 OpenSSL/1.1.1s (Schannel) zlib/1.2.13 brotli/1.0.9 zstd/1.5.2 libidn2/2.3.3 libpsl/0.21.1 (+libidn2/2.3.1) libssh2/1.10.0 nghttp2/1.51.0
Another version: curl 7.83.1 (Windows) libcurl/7.83.1 Schannel
I can reproduce this in a Windows 7 VM when I change the region and locale (including the non-Unicode locale) to traditional Chinese and use the official mingw-w64 curl builds after 7.83.1_2. With the same locale settings, my Visual Studio builds of 7.87.0 (both the "no character set" and the "multibyte character set" variants) do not reproduce it in that VM.

I notice the Chinese characters are multibyte in the ANSI codepage, so maybe that has something to do with it, even though I can't reproduce with VS builds.

This is the last good version:
This is the first bad version:
The @vszakats any idea?
I cannot spot anything between 7.83.1_1 and 7.83.1_2 that might affect character handling (skimming through the 100 commits, which are mainly HTTP/3 / LibreSSL prep work, switching libssh2 builds to autotools, "unified" packaging, and some other unrelated things):

7.83.1_2 changed packaging to the single-zip style (curl/curl-for-win@3182733). It also enabled

So I cannot see anything in this revision that might affect Chinese text.

7.83.1_3 replaced OpenSSL with openssl-quic, adding HTTP/3 support. That seems even less related:

Though rereading the thread makes it unclear if this became an issue between 7.84.0 and 7.83.1, or between two distinct curl-for-win builds of 7.83.1. If the former, we need to turn to the curl source code.

It's also not very likely that this is curl-for-win-specific, though if we have the regression between two revisions, I can look it up again.
My generic guess would be this: curl on Windows doesn't have Unicode support by default. In these builds all strings are considered raw bytes and passed around as-is (*). This makes things appear to work in certain practical cases. Some may perceive this as curl having special support for certain codepages/encodings. The reality, though, is that it is just a happy coincidence.

(*) But curl has certain places (and/or dependencies) that may stray from the optimistic "as-is" string handling and do string conversion/manipulation, which instantly breaks the happy coincidences by assuming certain input formats and emitting certain output formats. It's enough to call any Win32 API function with a

This is made even more complex in curl because it uses both CRT functions and the Win32 API directly, each with potentially different encoding requirements or Unicode support. (And even more complexity comes when interfacing with the dependencies curl supports.)

Speaking of curl with Unicode support enabled: this is the correct path, but as of today, the level of support is just not enough to cover all cases. (And before we could cover all cases, we'd need to clarify the codepage requirements for each string curl accepts or returns.) The downside is that this mode breaks all the use cases that work "correctly" by happy coincidence in non-Unicode mode, because of the added conversions and the heavier use of Unicode-enabled functions that need them.

Until these fundamental issues are solved, iterations of these non-ASCII issues will keep popping up. The solution is complex. Even if fully solved, with Unicode mode finished and enabled by default, it will inherently cause fallout, because the old "happy coincidence" cases will start to break and will need correction to work as expected in a Unicode-enabled environment (even more so for libcurl API users).
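To make the "raw bytes" point above concrete, here is a minimal sketch (not taken from curl) showing that the reported file name has a different length in bytes than in characters when it is encoded as UTF-8. The helper name `utf8_codepoints` and the constant `sample_utf8` are hypothetical, and the code assumes well-formed UTF-8 input. In an ANSI codepage the same name would have yet another byte length, which is why any code that converts or measures the string has to agree with the caller on the encoding.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count UTF-8 code points by skipping continuation
   bytes (bit pattern 10xxxxxx). Assumes valid UTF-8; no validation. */
static size_t utf8_codepoints(const char *s)
{
  size_t n = 0;
  for(; *s; s++) {
    if(((unsigned char)*s & 0xC0) != 0x80)
      n++;
  }
  return n;
}

/* The file name from the report, with the four Chinese characters written
   as explicit UTF-8 bytes so the example does not depend on the encoding
   of the source file. */
static const char sample_utf8[] =
  "\xE6\xB5\x8B\xE8\xAF\x95\xE4\xB8\xAD\xE6\x96\x87" "-0.1.3-win.exe";
```

Comparing `strlen(sample_utf8)` with `utf8_codepoints(sample_utf8)` shows the gap between the byte count and the character count that an "as-is" code path never has to care about, but a converting code path does.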
@vszakats I believe it is caused by commit 68fa9bf in which Lines 1437 to 1445 in 38262c9
With mingw-w64's
#include <libgen.h>
#include <stdio.h>

int main(void)
{
  /* basename() may modify its argument, so pass a writable array */
  char s[] = "测试中文-0.1.3-win.exe";
  puts(basename(s));
  return 0;
}
@Cherish98: Nice catch! That commit syncs the If we can confirm this as the root cause, the fix would be to ignore Can you make a curl build without Correction: Native MSVC builds were never affected by that patch.
Untested patch:
diff --git a/lib/curl_setup.h b/lib/curl_setup.h
index 2eb9697fd..fcdfe3ca2 100644
--- a/lib/curl_setup.h
+++ b/lib/curl_setup.h
@@ -838,6 +838,10 @@ int getpwuid_r(uid_t uid, struct passwd *pwd, char *buf,
 #define USE_HTTP3
 #endif
 
+#if defined(HAVE_BASENAME) && defined(WIN32)
+#undef HAVE_BASENAME
+#endif
+
 #if defined(USE_UNIX_SOCKETS) && defined(WIN32)
 # if defined(__MINGW32__) && !defined(LUP_SECURE)
 typedef u_short ADDRESS_FAMILY; /* Classic mingw, 11y+ old mingw-w64 */
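For readers unfamiliar with the pattern, the following is a rough sketch of what un-defining `HAVE_BASENAME` achieves: the code falls back to a byte-wise implementation that does no locale-dependent conversion. This is not curl's actual code. The name `path_basename` is hypothetical, and the sketch tests `_WIN32` (the compiler-defined macro) where the patch tests `WIN32`.

```c
#include <string.h>

/* Same idea as the patch: on Windows, behave as if the C library had no
   basename(). Assumption: _WIN32 is used here instead of WIN32. */
#if defined(HAVE_BASENAME) && defined(_WIN32)
#undef HAVE_BASENAME
#endif

#ifdef HAVE_BASENAME
#include <libgen.h>
#define path_basename(p) basename(p)
#else
/* Hypothetical fallback: return a pointer just past the last '/' or '\\'.
   It only scans bytes, so the file name is never re-encoded or cut. */
static char *path_basename(char *path)
{
  char *fwd = strrchr(path, '/');
  char *back = strrchr(path, '\\');
  char *sep = fwd;
  if(back && (!sep || back > sep))
    sep = back;
  return sep ? sep + 1 : path;
}
#endif
```

Two limits of this sketch: unlike POSIX `basename()`, it returns an empty string for a path that ends in a separator, and the byte scan is only reliable for encodings such as UTF-8 where `/` and `\` never appear inside a multibyte character. Some legacy double-byte codepages can use the byte 0x5C as a trail byte.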
Test binaries with disabled
I tested the binaries with HAVE_BASENAME disabled, and that did indeed fix the issue.
Thanks for testing, @Cherish98! I'll make a PR of that patch soon.
The mingw-w64
The `basename()` [1][2] implementation provided by mingw-w64 [3] makes assumptions about input encoding and thus may break with non-ASCII strings. `basename()` was auto-detected with CMake, autotools and since 68fa9bf (2022-10-13), also in `Makefile.mk` after syncing its behaviour with the mainline build methods. A similar patch for curl-for-win broke official Windows builds earlier, in release 7.83.1_4 (2022-06-15). This patch forces all Windows builds to use curl's internal `basename()` implementation to avoid such problems. [1]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/basename.html [2]: https://www.man7.org/linux/man-pages/man3/basename.3.html [3]: https://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/mingw-w64-crt/misc/basename.c Reported-by: UnicornZhang on Github Assisted-by: Cherish98 on Github Fixes curl#10261 Closes curl#10475
The `basename()` [1][2] implementation provided by mingw-w64 [3] makes assumptions about input encoding and may break with non-ASCII strings. `basename()` was auto-detected with CMake, autotools and since 68fa9bf (2022-10-13), also in `Makefile.mk` after syncing its behaviour with the mainline build methods. A similar patch for curl-for-win broke official Windows builds earlier, in release 7.83.1_4 (2022-06-15). This patch forces all Windows builds to use curl's internal `basename()` implementation to avoid such problems. [1]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/basename.html [2]: https://www.man7.org/linux/man-pages/man3/basename.3.html [3]: https://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/mingw-w64-crt/misc/basename.c Reported-by: UnicornZhang on Github Assisted-by: Cherish98 on Github Reviewed-by: Daniel Stenberg Fixes curl#10261 Closes curl#10475
I want to upload a file with curl, but the file name I get is wrong and the file suffix is missing.
curl/libcurl version 7.87.0
I get the file name '测试中文-0.1.3-w'
In version 7.83.1
I get the file name '测试中文-0.1.3-win.exe'
System: Windows 10