Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to upload in the case I use multipart upload #787

Closed
KOBA789 opened this issue Feb 3, 2014 · 6 comments
Closed

Failed to upload in the case I use multipart upload #787

KOBA789 opened this issue Feb 3, 2014 · 6 comments
Labels
Milestone

Comments

@KOBA789
Copy link

KOBA789 commented Feb 3, 2014

Using the filename including multibyte chars, it fails to upload it.

I found the content of Initiate Multipart Upload XML is broken like this:

<?xml version="1.0" encoding="UTF-8"?><InitiateMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Bucket>isidai</Bucket><Key>�����/������.m4v</Key><UploadId>d16oiXpDTBCmJhsZpDoRWA==</UploadId></InitiateMultipartUploadResult>

The value of Key element is corrupted.

Some S3 compatible clients use this element value, so it failed to upload files containing multibyte chars.

The cause is here.

    case riak_cs_mp_utils:initiate_multipart_upload(Bucket, list_to_binary(Key),
                                                    ContentType, User, Opts,
                                                    RiakcPid) of
        {ok, UploadId} ->
            XmlDoc = {'InitiateMultipartUploadResult',
                       [{'xmlns', "http://s3.amazonaws.com/doc/2006-03-01/"}],
                       [
                        {'Bucket', [binary_to_list(Bucket)]},
                        {'Key', [Key]},  %% !!HERE!!
                        {'UploadId', [binary_to_list(base64url:encode(UploadId))]}
                       ]
                     },
            Body = riak_cs_xml:export_xml([XmlDoc]),
            RD2 = wrq:set_resp_body(Body, RD),
            {true, RD2, Ctx};

In L.110, Key is not a binary but a list. Because riak_cs_xml:export_xml encodes lists into utf8, Key will be broken, but Key should be a binary. Binaries are not encoded by riak_cs_xml:export_xml.

@isidai
Copy link

isidai commented Feb 3, 2014

Related to this issue
It will resolve this problem.

Cyberduck is effected by this problem.
Cyberduck uses XML's value of response when request last finalizing POST.
so the last request's URL different from the first request and part object requests.

POST /isidai/%E3%83%86%E3%82%B9%E3%83%88%E3%81%A0%E3%81%AC%2F%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%83%86%E3%82%B9%E3%83%88.m4v?uploads HTTP/1.1

PUT /isidai/%E3%83%86%E3%82%B9%E3%83%88%E3%81%A0%E3%81%AC%2F%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%83%86%E3%82%B9%E3%83%88.m4v?uploadId=d16oiXpDTBCmJhsZpDoRWA%3D%3D&partNumber=3 HTTP/1.1

POST /isidai/%C3%A3%C2%83%C2%86%C3%A3%C2%82%C2%B9%C3%A3%C2%83%C2%88%C3%A3%C2%81%C2%A0%C3%A3%C2%81%C2%AC%2F%C3%A6%C2%97%C2%A5%C3%A6%C2%9C%C2%AC%C3%A8%C2%AA%C2%9E%C3%A3%C2%83%C2%86%C3%A3%C2%82%C2%B9%C3%A3%C2%83%C2%88.m4v?uploadId=d16oiXpDTBCmJhsZpDoRWA%3D%3D HTTP/1.1

@shino
Copy link
Contributor

shino commented Feb 3, 2014

@KOBA789 @isidai Thank you for reporting the issue!
I confirmed the bug by using python boto library.

@shino shino added the Bug label Feb 3, 2014
@shino
Copy link
Contributor

shino commented Feb 3, 2014

First-aid dirty diff:

diff --git a/src/riak_cs_wm_object_upload.erl b/src/riak_cs_wm_object_upload.erl
index 4bc0dc4..1c4af88 100644
--- a/src/riak_cs_wm_object_upload.erl
+++ b/src/riak_cs_wm_object_upload.erl
@@ -100,3 +100,3 @@ process_post(RD, Ctx=#context{local_context=LocalCtx,
                         {'Bucket', [binary_to_list(Bucket)]},
-                        {'Key', [Key]},
+                        {'Key', [unicode:characters_to_list(list_to_binary(Key))]},
                         {'UploadId', [binary_to_list(base64url:encode(UploadId))]}
diff --git a/src/riak_cs_wm_object_upload_part.erl b/src/riak_cs_wm_object_upload_part.erl
index 5b16a4e..d450a89 100644
--- a/src/riak_cs_wm_object_upload_part.erl
+++ b/src/riak_cs_wm_object_upload_part.erl
@@ -297,3 +297,3 @@ to_xml(RD, Ctx=#context{local_context=LocalCtx,
                        {'Bucket', [binary_to_list(Bucket)]},
-                       {'Key', [Key]},
+                       {'Key', [unicode:characters_to_list(list_to_binary(Key))]},
                        {'UploadId', [binary_to_list(base64url:encode(UploadId))]},

Should keys are treated as UTF-8 encoded binary() everywhere in riak_cs?
The Key above is a list created by converting UTF-8 binary as if it is Latin-1 encoded one (simply speaking, byte-by-byte mapping).

@kuenishi
Copy link
Contributor

kuenishi commented Feb 3, 2014

This reminds me of very old issue: #317

@shino
Copy link
Contributor

shino commented Feb 3, 2014

@shino
Copy link
Contributor

shino commented Feb 14, 2014

@KOBA789 @isidai This bug has been fixed by #807. Will be included next release 1.4.5.
If you want to test the fix before release, you can use release/1.4 branch.

@shino shino closed this as completed Feb 14, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants