List object fails when non-utf8 key is in the bucket [JIRA: RCS-289] #974

Closed
shino opened this issue Sep 22, 2014 · 9 comments

Comments

@shino
Contributor

shino commented Sep 22, 2014

Updated 2015-09-16:

The original description used the key name rebar.config%FF at the user layer; s3cmd URL-encoded it to %25FF, so the key sent was not actually non-UTF-8. The original behavior came from a combination with the rewrite double-decode bug #910. The right reproduction step is to send %FF at the HTTP layer, after URL encoding.
This can be done with, for example, s3curl.pl.
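
To illustrate the difference between the two layers, here is a minimal sketch of percent-decoding. The module and function names are hypothetical and this is not the Riak CS rewrite code; it only shows why %25FF decodes to the ASCII text %FF after one pass and to the raw byte 0xFF only after a second pass.

```erlang
-module(percent_decode_sketch).
-export([decode/1]).

%% Hypothetical helper, for illustration only.
%% Decodes each %XX escape into the byte it names and copies
%% every other byte through unchanged. Invalid hex digits crash.
decode(<<$%, H, L, Rest/binary>>) ->
    Byte = list_to_integer([H, L], 16),
    <<Byte, (decode(Rest))/binary>>;
decode(<<C, Rest/binary>>) ->
    <<C, (decode(Rest))/binary>>;
decode(<<>>) ->
    <<>>.
```

Decoding `<<"rebar.config%25FF">>` once gives `<<"rebar.config%FF">>`, which is plain ASCII. Decoding that result again gives `<<"rebar.config", 255>>`, which is not valid UTF-8.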

Original description below


Originally zd#8754 (I guess).

Steps to reproduce:

  • Put a non-UTF-8 key into a bucket
  • Send a List Objects request to the bucket
$ s3cmd -c .s3cfg.15018.alice put rebar.config 's3://test2/rebar.config%FF'
$ S3CURL=.s3curl.15018.alice s3curl.pl  --id cs -- -s -x localhost:15018 http://test.s3.amazonaws.com/'' X L
<?xml version="1.0"?>
<html>
  <head>
    <title>500 Internal Server Error</title>
  </head>
  <body><h1>Internal Server Error</h1>The server encountered an error while processing this request</body>
</html>

console.log

01:01:04.857 [error] Lager event handler error_logger_lager_h exited with reason
 {'EXIT',{{badmatch,["/buckets/test/objects",{error,{error,function_clause,
   [{xmerl_lib,expand_attributes,["abc",1,[{error,1},{'Key',1},{'Contents',16},{'ListBucketResult',1}]],
   [{file,"src/xmerl_lib.erl"},{line,234}]},{xmerl_lib,expand_element,4,[{file,"src/xmerl_lib.erl"},{line,179}]},{xmerl_lib,expand_content,4,[{file,"src/xmerl_lib.erl"},{line,224}]},{xmerl_lib,expand_element,4,[{file,"src/xmerl_lib.erl"},{line,187}]},{xmerl_lib,expand_content,4,[{file,"src/xmerl_lib.erl"},{line,224}]},{xmerl_lib,...},...]}},...]},...}}

inspector

$ ./scripts.local/riak_cs_inspector.sh 127.0.0.1 dev1 test2 | od -t x1z
0000000 43 6f 6e 6e 65 63 74 69 6e 67 20 74 6f 20 31 32  >Connecting to 12<
0000020 37 2e 30 2e 30 2e 31 3a 31 30 30 31 37 2e 2e 2e  >7.0.0.1:10017...<
0000040 0a 4b 65 79 20 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d  >.Key ===========<
0000060 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d  >================<
0000100 3d 3a 20 53 69 62 6c 2e 20 3d 3d 20 53 74 61 74  >=: Sibl. == Stat<
0000120 65 20 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 20 55 55 49  >e ========== UUI<
0000140 44 20 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d  >D ==============<
0000160 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 3d 20 43 6f  >============= Co<
0000200 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3d 3d 20 46  >ntent-Length== F<
0000220 69 72 73 74 20 42 6c 6f 63 6b 20 3d 3d 3d 3d 0a  >irst Block ====.<
0000240 72 65 62 61 72 2e 63 6f 6e 66 69 67 ff 20 20 20  >rebar.config.   <
0000260 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20  >                <
0000300 3a 20 31 20 20 20 20 20 20 20 20 61 63 74 69 76  >: 1        activ<
0000320 65 20 20 20 20 20 20 20 20 20 20 20 33 32 66 62  >e           32fb<
0000340 62 32 66 65 37 30 34 30 34 33 64 66 61 34 63 34  >b2fe704043dfa4c4<
0000360 33 31 62 62 64 65 38 64 62 63 66 61 20 20 20 20  >31bbde8dbcfa    <
0000400 20 20 20 20 20 20 20 20 20 32 37 32 33 20 46 6f  >         2723 Fo<
0000420 75 6e 64 20 20 20 20 20 20 20 20 20 20 20 0a 72  >und           .r<
0000440 65 62 61 72 2e 63 6f 6e 66 69 67 ff 20 20 20 20  >ebar.config.    <
0000460 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 3a  >               :<
0000500 20 31 20 20 20 20 20 20 20 20 61 63 74 69 76 65  > 1        active<
0000520 20 20 20 20 20 20 20 20 20 20 20 39 61 35 34 62  >           9a54b<
0000540 62 65 66 61 35 35 30 34 65 36 63 38 32 65 32 61  >befa5504e6c82e2a<
0000560 64 61 61 35 66 30 36 30 38 35 62 20 20 20 20 20  >daa5f06085b     <
0000600 20 20 20 20 20 20 20 20 32 37 32 33 20 46 6f 75  >        2723 Fou<
0000620 6e 64 20 20 20 20 20 20 20 20 20 20 20 0a        >nd           .<
0000636

The results above are from the release/1.5.1 branch, just after the 1.5.1 tag (a19681c).

@shino shino added the Bug label Sep 22, 2014
@shino shino added this to the 2.0.0 milestone Sep 22, 2014
@shino
Contributor Author

shino commented Sep 22, 2014

I have put the "bug" label on this, but I'm not confident it is a bug.
It would be nice to confirm how S3 behaves in this scenario.

Some information about XML and characters

The current implementation that converts a UTF-8 encoded binary to
a list of Unicode characters is:

riak_cs_xml

format_value(Val) when is_binary(Val) ->
    unicode:characters_to_list(Val, unicode);

Should we handle the {error, _, _} case?
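
One possible shape for that handling, as a sketch only: the module name, the tagged return values, and the `invalid_utf8` atom are assumptions, not the actual riak_cs_xml code. `unicode:characters_to_list/2` can also return an `{incomplete, _, _}` tuple for a truncated sequence, so both failure tuples are matched.

```erlang
-module(format_value_sketch).
-export([format_value/1]).

%% Hypothetical variant of format_value/1 that reports invalid
%% input instead of passing an error tuple on to xmerl.
format_value(Val) when is_binary(Val) ->
    case unicode:characters_to_list(Val, unicode) of
        Chars when is_list(Chars) ->
            {ok, Chars};
        {error, _Converted, _Rest} ->
            %% A byte sequence that is not valid UTF-8, e.g. <<255>>.
            {error, invalid_utf8};
        {incomplete, _Converted, _Rest} ->
            %% A multi-byte sequence cut off at the end of the binary.
            {error, invalid_utf8}
    end.
```

With this shape, a key such as `<<"rebar.config", 255>>` yields `{error, invalid_utf8}`, and the caller would decide how to report it.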

@shino
Contributor Author

shino commented Sep 22, 2014

#910 has a similar symptom. Its root cause is different (or more complicated).

@shino
Contributor Author

shino commented Sep 24, 2014

My bad 😅
This is completely a duplicate of #910.

Close.

@shino
Contributor Author

shino commented Sep 10, 2015

Some discussion from chat

http://www.w3.org/TR/xml11/
Char       ::=          [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
https://en.wikipedia.org/wiki/Valid_characters_in_XML

Note that the code point U+0000, assigned to the null control
character, is the only character encoded in Unicode and ISO/IEC 10646
that is always invalid in any XML 1.0 and 1.1 document.
http://www.fileformat.info/info/unicode/char/0000/index.htm
NULL!
2> unicode:characters_to_list(<<0>>).
[0]
3> unicode:characters_to_list(<<128>>).
{error,[],<<128>>}

@Basho-JIRA Basho-JIRA changed the title List object fails when non-utf8 key is in the bucket List object fails when non-utf8 key is in the bucket [JIRA: RCS-289] Sep 10, 2015
@shino
Contributor Author

shino commented Sep 10, 2015

Current conjecture:

Validation for the key byte sequence:

  1. Transform it, assuming it is UTF-8 encoded. If an error occurs, it is invalid. If it succeeds, then
  2. ensure every "code point" after the transformation falls within the "XML 1.1 character" definition [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
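
The two steps above could be sketched as follows. The module and function names are hypothetical; this illustrates the conjecture and is not code from Riak CS.

```erlang
-module(key_validation_sketch).
-export([valid_key/1]).

%% Step 1: decode the key as UTF-8; any error or incomplete
%% result means the key is invalid.
%% Step 2: check every decoded code point against the
%% XML 1.1 Char production.
valid_key(Key) when is_binary(Key) ->
    case unicode:characters_to_list(Key, unicode) of
        CodePoints when is_list(CodePoints) ->
            lists:all(fun is_xml11_char/1, CodePoints);
        _ErrorOrIncomplete ->
            false
    end.

%% [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
is_xml11_char(C) when C >= 16#1, C =< 16#D7FF -> true;
is_xml11_char(C) when C >= 16#E000, C =< 16#FFFD -> true;
is_xml11_char(C) when C >= 16#10000, C =< 16#10FFFF -> true;
is_xml11_char(_) -> false.
```

Under this check `<<0>>` is rejected at step 2 (it decodes to `[0]`, which is outside the range), while `<<128>>` and `<<"rebar.config", 255>>` are rejected at step 1.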

@shino
Contributor Author

shino commented Sep 10, 2015

Not a duplicate of #910. Not resolved yet.

@shino shino reopened this Sep 10, 2015
@shino shino modified the milestones: 2.2.0, 2.0.0 Sep 10, 2015
@shino
Contributor Author

shino commented Sep 29, 2015

The List Objects response XML is not XML 1.1 but XML 1.0, whose character set is a little narrower than that of XML 1.1.
In particular, every control code (from '0x00' to '0x1F') except HT/LF/CR is invalid.

Character Range

[2]     Char ::= #x9 | #xA | #xD    |      /* HT, LF and CR */
                 [#x20-#xD7FF]      |        
                                           /* Excludes high/low surrogate CPs */
                 [#xE000-#xFFFD]    |
                                           /* Excludes two non characters FFFE, FFFF */
                 [#x10000-#x10FFFF]

From - Extensible Markup Language (XML) 1.0 (Fifth Edition) http://www.w3.org/TR/REC-xml/#charsets
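
For comparison with the XML 1.1 range, the Char production quoted above can be written as a predicate. This is a sketch with hypothetical names, assuming keys are decoded as UTF-8 first; it is not taken from Riak CS.

```erlang
-module(xml10_char_sketch).
-export([is_xml10_char/1, valid_for_xml10/1]).

%% Char production [2] of XML 1.0 (Fifth Edition).
is_xml10_char(16#9) -> true;                                 % HT
is_xml10_char(16#A) -> true;                                 % LF
is_xml10_char(16#D) -> true;                                 % CR
is_xml10_char(C) when C >= 16#20, C =< 16#D7FF -> true;
is_xml10_char(C) when C >= 16#E000, C =< 16#FFFD -> true;    % no FFFE, FFFF
is_xml10_char(C) when C >= 16#10000, C =< 16#10FFFF -> true;
is_xml10_char(_) -> false.

%% A key is usable in an XML 1.0 response only if it is valid
%% UTF-8 and every decoded code point is an XML 1.0 Char.
valid_for_xml10(Key) when is_binary(Key) ->
    case unicode:characters_to_list(Key, unicode) of
        CodePoints when is_list(CodePoints) ->
            lists:all(fun is_xml10_char/1, CodePoints);
        _ErrorOrIncomplete ->
            false
    end.
```

The practical difference from XML 1.1 is that a key containing U+0001, such as `<<1>>`, passes the XML 1.1 range but fails here, while a tab inside a key is accepted by both.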

@shino
Contributor Author

shino commented Nov 5, 2015

Additional memo on the Delete Multiple Objects API: it does NOT accept the (pseudo-) numeric character reference &#x1;. An example response on 2015-11-05 was:

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>MalformedXML</Code>
  <Message>The XML you provided was not well-formed or did not validate against our published schema</Message>
  <RequestId>AEE909649A8508A7</RequestId>
  <HostId>/h680JmV8WxdvKlstfH/Ln93m4uvSSOUJzfKIHwxY3KS2MIHtS+da7xyBEIHEIbrS9w960QFOL8=</HostId>
</Error>

Requested XML

<Delete>
  <Object>
    <Key>1</Key>
    <Key>&#x1;</Key>
  </Object>
</Delete>

@kuenishi kuenishi modified the milestones: 2.1.1, 2.2.0 Dec 9, 2015
@kuenishi
Contributor

kuenishi commented Dec 9, 2015

Looks like this will be fixed in 2.1.1 (or 2.0.2) in a kludgy way.
