Content-Disposition: Add support for UTF-8 encoded filename* field. #1995

reidwagner · 2017-10-16T03:50:14Z

Closes #1888.

If -J, --remote-header-name is specified, and filename* is given in Content-Disposition, it is decoded by libcurl's curl_easy_unescape according to RFC 3986 and used as the filename for -O, --remote-name. A UTF-8 filename* takes precedence over an ASCII filename (no asterisk) if both are specified.

I tested this by serving a file with an encoded filename* field in the Content-Disposition header as specified in RFC 6266, and verifying that it saved on my filesystem as expected. I tested this with filename* alone, and both filename and filename*.

I wasn't able to write a test in tests/data that allowed me save the output file in tests/log, as --remote-name saves specifically in the current directory.

bagder · 2017-10-16T08:44:42Z

This is still red because of this:

tool_cb_hdr.c: In function ‘tool_header_cb’:
tool_cb_hdr.c:127:9: error: conversion to ‘int’ from ‘size_t’ may alter its value [-Werror=conversion]
         filename = curl_easy_unescape(outs->config->easy, filename, len, NULL);
         ^
cc1: all warnings being treated as errors
make[2]: *** [curl-tool_cb_hdr.o] Error 1

reidwagner · 2017-10-17T00:45:58Z

Okay, I added a cast to int because it's safe to assume that will be large enough. I also added a Curl_safefree on the unencoded filename buffer if it is replaced by an encoded filename.

jay · 2017-10-18T07:16:09Z

src/tool_cb_hdr.c

        /* no match, find next parameter */
        while((p < end) && (*p != ';'))
          p++;
        continue;
      }
-      p += 9;
+      p += 8;


this needs to be strict, you removed the check for the = so filenamefoo for example would be incorrectly accepted

jay · 2017-10-18T07:20:10Z

src/tool_cb_hdr.c

@@ -19,6 +19,7 @@
 * KIND, either express or implied.
 *
 ***************************************************************************/
+#include <curl/curl.h>


you don't need to include curl.h it is included automatically by tool_setup

reidwagner · 2017-10-18T23:39:00Z

Good catch. I backed up and went with a simpler approach. I also filled in a missing piece of the "ext-value" from RFC 5987.

Both "filename-parm" and "charset" need to be case insensitive (RFC 6266, 5987), so I used curl_strnequal for the new parameter, as well as to replace memcmp for filename=.

If -J, --remote-header-name is specified, and filename* is given in Content-Disposition as utf-8 with a blank language tag, e.g. filename*=utf-8''foo%C2%AE, it is decoded by libcurl's curl_easy_unescape according to RFC 3986 and used as the filename for -O, --remote-name. A UTF-8 filename* takes precedence over an ASCII filename (no asterisk) if both are specified. Replacing memcmp with curl_strnequal also adds case insensitivity to filename and filename* parameters as is specified in RFC6266 section 4.3. Closes curl#1888 Reported-by: devbazilio

jay · 2017-11-05T22:57:55Z

I wonder how this would work on Windows. I think we would need to set a separate flag is_utf8_filename and then something like this:

diff --git a/src/tool_cb_wrt.c b/src/tool_cb_wrt.c
index 6716ba5..c16b945 100644
--- a/src/tool_cb_wrt.c
+++ b/src/tool_cb_wrt.c
@@ -36,15 +36,32 @@ bool tool_create_output_file(struct OutStruct *outs)
 {
   struct GlobalConfig *global = outs->config->global;
   FILE *file;
+#ifdef WIN32
+  wchar_t *u16filename = NULL;
+#endif
 
   if(!outs->filename || !*outs->filename) {
     warnf(global, "Remote filename has no length!\n");
     return FALSE;
   }
 
+#ifdef WIN32
+  if(outs->is_utf8_filename) {
+    if(!utf8_to_utf16(filename, &u16filename)) {
+      warnf(global, "Can't convert UTF-8 filename to UTF-16\n");
+      return FALSE;
+    }
+  }
+#endif
+
   if(outs->is_cd_filename) {
     /* don't overwrite existing files */
-    file = fopen(outs->filename, "rb");
+#ifdef WIN32
+    if(u16filename)
+      file = _wfopen(u16filename, L"rb");
+    else
+#endif
+      file = fopen(outs->filename, "rb");
     if(file) {
       fclose(file);
       warnf(global, "Refusing to overwrite %s: %s\n", outs->filename,
@@ -54,7 +71,12 @@ bool tool_create_output_file(struct OutStruct *outs)
   }
 
   /* open file for writing */
-  file = fopen(outs->filename, "wb");
+#ifdef WIN32
+  if(u16filename)
+    file = _wfopen(u16filename, L"wb");
+  else
+#endif
+    file = fopen(outs->filename, "wb");
   if(!file) {
     warnf(global, "Failed to create the file %s: %s\n", outs->filename,
           strerror(errno));

How does this work on other operating systems, do they just recognize the encoding?

bagder · 2017-11-06T08:48:09Z

Most Linux systems are setup to mount and assume that file systems are UTF8. But that's not necessarily true everywhere and you can certainly mount file systems using other encodings.

ForNeVeR · 2018-01-01T13:38:57Z

curl doesn't work at all In my Windows 10 environment if I built if from the branch of this PR:

$ cd curl\winbuild
$ git checkout reidwagner/filename_encoding_support
$ nmake /f .\Makefile.vc mode=dll
$ ..\builds\libcurl-vc-x86-release-dll-ipv6-sspi-winssl\bin\curl.exe http://example.org/

It prints the example.org content (as intended) and then loops forever, entirely eating one CPU core (as not intended).

(At the same time, the version from the master branch works for me, so it looks like nothing's wrong with my environment, and that's a real trouble with the code in the branch.)

bagder · 2018-01-30T12:14:46Z

Yeah, just this problem is harder than what's being done so far in this PR. Pretending everything works fine as-is if it comes UTF-8 encoded is not going to work either since the presumption that the file system is using UTF-8 is wrong too often. Windows specifically typically doesn't encode files UTF-8.

reidwagner · 2018-01-31T03:47:44Z

Okay, that was a naive assumption in the original PR.

So a first order solution might be conditionally including code handling UTF-8 encoding for Windows (like above) and known POSIX operating systems. Then in the latter case, statfs() could be used to determine the filesystem encoding.

Or, would it just be simpler to make this opt-in, so a user that knows what they're doing can include UTF-8 support with a command line flag?

jay · 2018-01-31T06:49:14Z

It prints the example.org content (as intended) and then loops forever, entirely eating one CPU core (as not intended).

I'm unable to reproduce.

bagder added cmdline tool HTTP labels Oct 16, 2017

reidwagner force-pushed the filename_encoding_support branch from af2a0ff to 750e9bc Compare October 17, 2017 00:43

jay reviewed Oct 18, 2017

View reviewed changes

reidwagner force-pushed the filename_encoding_support branch from 750e9bc to 1dc5758 Compare October 18, 2017 23:26

reidwagner force-pushed the filename_encoding_support branch from 1dc5758 to fc669f6 Compare October 21, 2017 17:08

reidwagner force-pushed the filename_encoding_support branch from fc669f6 to cb4431a Compare October 21, 2017 17:10

bagder added the stale label Apr 26, 2018

stale bot closed this May 10, 2018

bagder mentioned this pull request May 15, 2018

curl -O should save file at the url-decoded filename #2573

Closed

lock bot locked as resolved and limited conversation to collaborators Aug 8, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Content-Disposition: Add support for UTF-8 encoded filename* field. #1995

Content-Disposition: Add support for UTF-8 encoded filename* field. #1995

reidwagner commented Oct 16, 2017

bagder commented Oct 16, 2017

reidwagner commented Oct 17, 2017 •

edited

Loading

jay Oct 18, 2017

jay Oct 18, 2017

reidwagner commented Oct 18, 2017

jay commented Nov 5, 2017

bagder commented Nov 6, 2017

ForNeVeR commented Jan 1, 2018 •

edited

Loading

bagder commented Jan 30, 2018

reidwagner commented Jan 31, 2018

jay commented Jan 31, 2018

Content-Disposition: Add support for UTF-8 encoded filename* field. #1995

Content-Disposition: Add support for UTF-8 encoded filename* field. #1995

Conversation

reidwagner commented Oct 16, 2017

bagder commented Oct 16, 2017

reidwagner commented Oct 17, 2017 • edited Loading

jay Oct 18, 2017

Choose a reason for hiding this comment

jay Oct 18, 2017

Choose a reason for hiding this comment

reidwagner commented Oct 18, 2017

jay commented Nov 5, 2017

bagder commented Nov 6, 2017

ForNeVeR commented Jan 1, 2018 • edited Loading

bagder commented Jan 30, 2018

reidwagner commented Jan 31, 2018

jay commented Jan 31, 2018

reidwagner commented Oct 17, 2017 •

edited

Loading

ForNeVeR commented Jan 1, 2018 •

edited

Loading