Allow any bytes (including non-UTF8 ones) in List Objects response XML #1255
This PR addresses #974 (RCS-289)
In some cases, the List Objects response cannot be valid XML. Among the characters up to `#x1F`, XML 1.0 allows only `#x9`, `#xA` and `#xD`. Therefore, a key uploaded by PUT Object with a path like `%01` in URL-encoded form cannot be included in valid XML 1.0.
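
To make the character-range constraint concrete, here is a minimal Python sketch (not code from this PR) that checks whether a URL-encoded key decodes to text consisting only of XML 1.0 characters. The function names `is_xml10_char` and `key_fits_xml10` are hypothetical, and treating keys as UTF-8 is an assumption made for illustration.

```python
from urllib.parse import unquote_to_bytes


def is_xml10_char(cp: int) -> bool:
    """Return True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def key_fits_xml10(url_encoded_key: str) -> bool:
    """Check whether a URL-encoded key can appear verbatim in valid XML 1.0.

    Assumption for this sketch: a key that is not valid UTF-8 is treated
    as not representable.
    """
    raw = unquote_to_bytes(url_encoded_key)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return all(is_xml10_char(ord(ch)) for ch in text)
```

With this sketch, a key such as `abc%01def` is rejected, while `a%09b` (a tab) is accepted.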
AWS S3 does allow such bytes [1]. It turns `%01` into a numeric-character-reference-like, but simply invalid, byte sequence in the XML [2]. Both s3cmd and aws cli fail to parse a response that includes it.

The policy this PR chooses:
- If all keys consist only of characters that are valid XML 1.0 characters, then List Objects responds with content which is valid XML 1.0.
- Otherwise, the response is not valid XML 1.0, but is as =reasonable for humans= as possible in order to deliver information about keys in buckets.
The actual logic is very simple: just return the bytes as they were uploaded, except for XML-escaping `<`, `>` and `&` [3]. (For reviewers: the main commit in this PR is b28fec7; the others are just refactoring.)
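
The escaping rule can be sketched in a few lines of Python. This is only an illustration of the idea, not the Erlang code in this PR; the function names are hypothetical, and the use of the predefined entities `&lt;`, `&gt;` and `&amp;` as escape targets is an assumption.

```python
def escape_key_bytes(key: bytes) -> bytes:
    """Escape only '&', '<' and '>'; every other byte passes through untouched."""
    # '&' must be handled first, otherwise the '&' of '&lt;' / '&gt;'
    # would itself be escaped a second time.
    return (
        key.replace(b"&", b"&amp;")
        .replace(b"<", b"&lt;")
        .replace(b">", b"&gt;")
    )


def key_element(key: bytes) -> bytes:
    """Build the <Key> element for a raw key, keeping non-XML bytes as is."""
    return b"<Key>" + escape_key_bytes(key) + b"</Key>"
```

For the key `%01` (the single byte `0x01`), `key_element(b"\x01")` yields the bytes `<Key>`, `0x01`, `</Key>`.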
For example, assuming the uploaded key was `%01`, the list result includes a binary like `<<"<Key>", 16#01, "</Key>">>` (in Erlang notation). Users can manipulate such a response with grep, sed, or anything else if an XML library fails [4]. All one has to do is:

- Extract the bytes between `<Key>` and `</Key>` (not ambiguous because `<` is escaped)
- Unescape `&*` escape sequences
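
As a rough client-side illustration of these two steps, the following Python sketch works on the raw response bytes instead of using an XML parser. The names `extract_keys` and `unescape_key_bytes` are hypothetical, and it assumes that only `&lt;`, `&gt;` and `&amp;` appear as escapes inside `<Key>` elements.

```python
import re
from typing import List

# Non-greedy match on bytes; DOTALL so that keys containing newlines still match.
KEY_RE = re.compile(rb"<Key>(.*?)</Key>", re.DOTALL)


def unescape_key_bytes(escaped: bytes) -> bytes:
    """Reverse the escaping of '<', '>' and '&' inside a key."""
    # '&amp;' is handled last so that '&amp;lt;' becomes the literal '&lt;'.
    return (
        escaped.replace(b"&lt;", b"<")
        .replace(b"&gt;", b">")
        .replace(b"&amp;", b"&")
    )


def extract_keys(body: bytes) -> List[bytes]:
    """Return the raw keys found in a List Objects response body."""
    return [unescape_key_bytes(m) for m in KEY_RE.findall(body)]
```

Because `<` is always escaped inside a key, the first `</Key>` after `<Key>` is the real closing tag, which is why the non-greedy match is sufficient here.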
references

[1] Example by s3curl to AWS S3

Sidenote: `%00` is NOT allowed. It seems that AWS S3 validation is based on the XML 1.1 character range for PUT requests {shrug}.
[2] Extracted from XML response: `<Contents><Key>abcdef</Key>[snip...]`.

[3] A numeric-reference-like (but not valid) representation is not used. This is because 1. it is still not valid in XML 1.0, since it is outside of the character set, and 2. if one treats it as XML 1.1, then the byte `0x01` (or `<<1>>` in Erlang) is valid as is.

[4] s3curl does a nice job for such lower-level manipulation. AWS CLI is also nice because it outputs the response body to stderr if it fails to parse it as XML. s3cmd can produce such output with the `-d` debug switch.