Content-Disposition: Add support for UTF-8 encoded filename* field. #1995
This is still red because of this:
af2a0ff
to
750e9bc
Compare
Okay, I added a cast to int, since it is safe to assume an int will be large enough. I also added a Curl_safefree on the unencoded filename buffer for the case where it is replaced by an encoded filename.
src/tool_cb_hdr.c
Outdated
      /* no match, find next parameter */
      while((p < end) && (*p != ';'))
        p++;
      continue;
    }
    p += 9;
    p += 8;
This needs to be strict. You removed the check for the "=", so "filenamefoo", for example, would be incorrectly accepted.
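To illustrate the strictness being asked for, here is a minimal sketch of a parameter-name check that is case insensitive and still requires the "=" right after the name. The helper name param_value is hypothetical and not part of curl; inside the tool itself, comparing against the full "filename=" string with curl_strnequal would serve the same purpose.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of curl: return a pointer to the value if
   the parameter at p is exactly `name` followed by '=', compared case
   insensitively; otherwise return NULL. `name` must not include the '='. */
static const char *param_value(const char *p, const char *name)
{
  size_t i;
  for(i = 0; name[i]; i++) {
    /* a terminating NUL in p never matches, so we cannot read past it */
    if(tolower((unsigned char)p[i]) != tolower((unsigned char)name[i]))
      return NULL;
  }
  if(p[i] != '=')
    return NULL; /* rejects "filenamefoo" as well as "filename*=" */
  return p + i + 1;
}
```

With this check, param_value("filenamefoo", "filename") returns NULL, while param_value("FileName=foo", "filename") returns a pointer to "foo".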
src/tool_cb_hdr.c
Outdated
@@ -19,6 +19,7 @@
 * KIND, either express or implied.
 *
 ***************************************************************************/
#include <curl/curl.h>
You don't need to include curl.h; it is included automatically by tool_setup.
750e9bc
to
1dc5758
Compare
Good catch. I backed up and went with a simpler approach. I also filled in a missing piece of the "ext-value" from RFC 5987. Both "filename-parm" and "charset" need to be case insensitive (RFC 6266, 5987), so I used curl_strnequal for the new parameter, as well as to replace memcmp for filename=.
1dc5758
to
fc669f6
Compare
If -J, --remote-header-name is specified, and filename* is given in Content-Disposition as utf-8 with a blank language tag, e.g. filename*=utf-8''foo%C2%AE, it is decoded by libcurl's curl_easy_unescape according to RFC 3986 and used as the filename for -O, --remote-name. A UTF-8 filename* takes precedence over an ASCII filename (no asterisk) if both are specified.

Replacing memcmp with curl_strnequal also makes the filename and filename* parameter names case insensitive, as specified in RFC 6266 section 4.3.

Closes curl#1888
Reported-by: devbazilio
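As a rough illustration of the decoding step described in the commit message, the sketch below handles an ext-value of the form utf-8''<percent-encoded>. It is a libcurl-free stand-in written for this example: the function name decode_ext_value is hypothetical, the percent-decoding loop only approximates what curl_easy_unescape does, and the rejection of %00 is an extra assumption made here, not something the PR states.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
  if(c >= '0' && c <= '9')
    return c - '0';
  c = tolower(c);
  if(c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  return -1;
}

/* Hypothetical helper: decode utf-8''<pct-encoded> into a malloc'ed,
   NUL-terminated byte string. Returns NULL if the charset is not utf-8,
   the language tag is not blank, or an escape is malformed. */
static char *decode_ext_value(const char *v)
{
  static const char prefix[] = "utf-8''";
  size_t i, n = sizeof(prefix) - 1;
  char *out, *o;

  /* charset is matched case insensitively; the language tag must be empty */
  for(i = 0; i < n; i++)
    if(tolower((unsigned char)v[i]) != prefix[i])
      return NULL;
  v += n;

  out = malloc(strlen(v) + 1); /* decoding never grows the string */
  if(!out)
    return NULL;
  o = out;
  while(*v) {
    if(*v == '%') {
      int hi = hexval((unsigned char)v[1]);
      int lo = (hi < 0) ? -1 : hexval((unsigned char)v[2]);
      /* assumption of this sketch: also refuse %00, which would embed a
         NUL byte in the filename */
      if(lo < 0 || (hi == 0 && lo == 0)) {
        free(out);
        return NULL;
      }
      *o++ = (char)(hi * 16 + lo);
      v += 3;
    }
    else
      *o++ = *v++;
  }
  *o = '\0';
  return out;
}
```

For the example in the commit message, decode_ext_value("utf-8''foo%C2%AE") yields the bytes "foo" followed by 0xC2 0xAE, the UTF-8 encoding of the registered sign. The caller owns the returned buffer and must free it.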
fc669f6
to
cb4431a
Compare
I wonder how this would work on Windows. I think we would need to set a separate flag is_utf8_filename and then something like this:
diff --git a/src/tool_cb_wrt.c b/src/tool_cb_wrt.c
index 6716ba5..c16b945 100644
--- a/src/tool_cb_wrt.c
+++ b/src/tool_cb_wrt.c
@@ -36,15 +36,32 @@ bool tool_create_output_file(struct OutStruct *outs)
{
struct GlobalConfig *global = outs->config->global;
FILE *file;
+#ifdef WIN32
+ wchar_t *u16filename = NULL;
+#endif
if(!outs->filename || !*outs->filename) {
warnf(global, "Remote filename has no length!\n");
return FALSE;
}
+#ifdef WIN32
+ if(outs->is_utf8_filename) {
+ if(!utf8_to_utf16(outs->filename, &u16filename)) {
+ warnf(global, "Can't convert UTF-8 filename to UTF-16\n");
+ return FALSE;
+ }
+ }
+#endif
+
if(outs->is_cd_filename) {
/* don't overwrite existing files */
- file = fopen(outs->filename, "rb");
+#ifdef WIN32
+ if(u16filename)
+ file = _wfopen(u16filename, L"rb");
+ else
+#endif
+ file = fopen(outs->filename, "rb");
if(file) {
fclose(file);
warnf(global, "Refusing to overwrite %s: %s\n", outs->filename,
@@ -54,7 +71,12 @@ bool tool_create_output_file(struct OutStruct *outs)
}
/* open file for writing */
- file = fopen(outs->filename, "wb");
+#ifdef WIN32
+ if(u16filename)
+ file = _wfopen(u16filename, L"wb");
+ else
+#endif
+ file = fopen(outs->filename, "wb");
if(!file) {
warnf(global, "Failed to create the file %s: %s\n", outs->filename,
strerror(errno));
How does this work on other operating systems, do they just recognize the encoding?
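The diff above calls a utf8_to_utf16 helper without defining it. The following is only a portable sketch of what such a conversion involves, under the assumption that the output is a 0-terminated array of 16-bit units; the name utf8_to_utf16_sketch and its signature are made up for illustration. On Windows itself, the conversion would more likely go through MultiByteToWideChar with CP_UTF8 and a wchar_t buffer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the helper the diff assumes. Returns 1 and
   stores a malloc'ed, 0-terminated UTF-16 string in *out on success;
   returns 0 on malformed UTF-8 or allocation failure. */
static int utf8_to_utf16_sketch(const char *in, uint16_t **out)
{
  const unsigned char *s = (const unsigned char *)in;
  /* each input byte yields at most one UTF-16 unit, plus the terminator */
  uint16_t *buf = malloc((strlen(in) + 1) * sizeof(*buf));
  size_t n = 0;

  if(!buf)
    return 0;
  while(*s) {
    uint32_t cp, min;
    int extra;
    if(*s < 0x80) { cp = *s; extra = 0; min = 0; }
    else if((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; min = 0x80; }
    else if((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; min = 0x800; }
    else if((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; min = 0x10000; }
    else
      goto fail; /* stray continuation byte or invalid lead byte */
    s++;
    while(extra--) {
      if((*s & 0xC0) != 0x80)
        goto fail; /* also catches input truncated mid-sequence */
      cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
    }
    /* reject overlong forms, surrogates and values beyond U+10FFFF */
    if(cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      goto fail;
    if(cp >= 0x10000) {
      cp -= 0x10000; /* encode as a surrogate pair */
      buf[n++] = (uint16_t)(0xD800 | (cp >> 10));
      buf[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
    }
    else
      buf[n++] = (uint16_t)cp;
  }
  buf[n] = 0;
  *out = buf;
  return 1;
fail:
  free(buf);
  return 0;
}
```

Rejecting malformed input rather than passing it through matters here because the bytes come from a remote server's header and end up as a local file name.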
Most Linux systems are set up to mount file systems on the assumption that they use UTF-8. But that's not necessarily true everywhere, and you can certainly mount file systems using other encodings.
curl doesn't work at all in my Windows 10 environment if I build it from the branch of this PR:
$ cd curl\winbuild
$ git checkout reidwagner/filename_encoding_support
$ nmake /f .\Makefile.vc mode=dll
$ ..\builds\libcurl-vc-x86-release-dll-ipv6-sspi-winssl\bin\curl.exe http://example.org/
It prints the example.org content (as intended) and then loops forever, consuming an entire CPU core (not intended). The version from the master branch works for me, so nothing appears to be wrong with my environment; the problem seems to be in the branch's code.
Yeah, this problem is simply harder than what's being done so far in this PR. Pretending everything works fine as-is when the name arrives UTF-8 encoded is not going to work either, since the presumption that the file system uses UTF-8 is wrong too often. Windows in particular typically doesn't encode file names as UTF-8.
Okay, that was a naive assumption in the original PR. So a first-order solution might be to conditionally include code that handles UTF-8 encoding for Windows (like above) and for known POSIX operating systems. In the latter case, statfs() could then be used to determine the filesystem encoding. Or would it just be simpler to make this opt-in, so a user who knows what they're doing can enable UTF-8 support with a command line flag?
I'm unable to reproduce.
Closes #1888.
If -J, --remote-header-name is specified, and filename* is given in Content-Disposition, it is decoded by libcurl's curl_easy_unescape according to RFC 3986 and used as the filename for -O, --remote-name. A UTF-8 filename* takes precedence over an ASCII filename (no asterisk) if both are specified.
I tested this by serving a file with an encoded filename* field in the Content-Disposition header as specified in RFC 6266, and verifying that it saved on my filesystem as expected. I tested this with filename* alone, and both filename and filename*.
I wasn't able to write a test in tests/data that allowed me to save the output file in tests/log, as --remote-name saves specifically in the current directory.