AWS CLI can't remove objects containing "\n" in Key #675

Closed
j3tm0t0 opened this issue Feb 25, 2014 · 3 comments · Fixed by #700
Labels
bug: This issue is a bug.
investigating: This issue is being investigated and/or work is in progress to resolve the issue.

Comments

j3tm0t0 commented Feb 25, 2014

Here is how to reproduce the issue.

~ $ aws s3 mb s3://j3tm0t0-rmtest
make_bucket: s3://j3tm0t0-rmtest/

# upload a file with a carriage return (^M) in the key name
~ $ aws s3 cp hoge s3://j3tm0t0-rmtest/hoge^Mpuge <-- Ctrl+v Ctrl+m
pugead: ./hoge to s3://j3tm0t0-rmtest/hoge

# file exists
~ $ aws s3 ls s3://j3tm0t0-rmtest/
2014-02-25 21:18:23          5 hoge
puge

# direct API shows "\n" in the key
 $ aws s3api list-objects --bucket j3tm0t0-rmtest --output json --query Contents[].Key
[
    "hoge\npuge"
]

# rb --force claims to delete the object, but it is not actually deleted and removing the bucket fails
~ $ aws s3 rb s3://j3tm0t0-rmtest --force
delete: s3://j3tm0t0-rmtest/hoge
puge
remove_bucket failed: s3://j3tm0t0-rmtest/ A client error (BucketNotEmpty) occurred when calling the DeleteBucket operation: The bucket you tried to delete is not empty

# it still exists
~ $ aws s3 ls s3://j3tm0t0-rmtest/
2014-02-25 21:18:23          5 hoge
puge

# rm --recursive behaves the same way
~ $ aws s3 rm s3://j3tm0t0-rmtest/ --recursive
delete: s3://j3tm0t0-rmtest/hoge
puge
~ $ aws s3api list-objects --bucket j3tm0t0-rmtest --output json --query Contents[].Key
[
    "hoge\npuge"
]

jamesls commented Feb 26, 2014

Looking into this. From what I can tell, it only happens when we do a recursive delete (either s3 rb --force or s3 rm --recursive). Deleting the file directly works for me:

$ aws s3 cp bar.txt s3://bucket/newlinetest/bar^M.txt
$ aws s3 ls bucket/newlinetest/
2014-02-26 07:56:04         11 bar
.txt
$ aws s3 rm s3://bucket/newlinetest/bar^M.txt
.txtte: s3://bucket/newlinetest/bar
$ aws s3 ls bucket/newlinetest/
$

It seems that in the recursive case we send:

"DELETE /newlinetest/bar%0A.txt HTTP/1.1" 204 0

But when I specify the single file directly we send:

"DELETE /newlinetest/bar%0D.txt HTTP/1.1" 204 0

Though it's interesting that S3 sends a 204 response in both cases. I would have expected the first request with %0A to generate a 404.
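For reference, the two percent-encodings correspond to two different control characters: %0D is the carriage return (\r) that was actually typed into the key, while %0A is the newline (\n) that the parsed listing hands back. A minimal Python sketch (the key names are illustrative) of how each variant encodes into the request path:

from urllib.parse import quote

# Key as it was actually uploaded: contains a carriage return (typed as ^M).
print(quote("newlinetest/bar\r.txt"))  # newlinetest/bar%0D.txt -> matches the real object
# Key as the recursive listing reports it: the carriage return has become a newline.
print(quote("newlinetest/bar\n.txt"))  # newlinetest/bar%0A.txt -> no such object exists

DeleteObject returns 204 even when the key does not exist, which would explain why both requests appear to succeed while only the %0D one actually removes the object.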

jamesls commented Mar 4, 2014

So I believe the issue here is our use of cElementTree as an XML parser. I can repro the issue using just the XML parser alone. What's happening is that if we give the parser keys containing a carriage return, it automatically converts them to a newline character. This is why it does not work in the recursive delete case: we need to parse the XML response to know which keys to delete. But when a single file is explicitly specified by the user, we properly encode the character.

>>> from xml.etree import cElementTree
>>> parser = cElementTree.XMLParser(target=cElementTree.TreeBuilder(), encoding='utf-8')
>>> parser.feed(b'<Node>KeyWithCarriageReturn\r.txt</Node>')
>>> tree = parser.close()
>>> tree
<Element 'Node' at 0x10157c390>
>>> tree.text
'KeyWithCarriageReturn\n.txt'

I'll need to do some investigation with our XML parser to see what options we have.
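For context, this normalization is not specific to cElementTree: the XML 1.0 spec requires parsers to translate \r and \r\n in character data to \n, so any conforming parser loses the original carriage return. One way around it is to have the key percent-encoded before it ever enters the XML document, which is the idea behind the encoding-type workaround in the next comment. A small sketch of the difference, using the modern xml.etree.ElementTree (same underlying expat parser):

from urllib.parse import quote, unquote
from xml.etree import ElementTree

key = "KeyWithCarriageReturn\r.txt"

# Raw key in XML: the parser normalizes \r to \n, so the original key is lost.
lost = ElementTree.fromstring("<Key>%s</Key>" % key).text
assert lost == "KeyWithCarriageReturn\n.txt"

# A percent-encoded key survives the round trip (S3 does this encoding
# server-side when the url encoding type is requested; simulated here).
kept = ElementTree.fromstring("<Key>%s</Key>" % quote(key)).text
assert unquote(kept) == key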

jamesls commented Mar 5, 2014

One potential option we have is to use the encoding-type parameter to force the key name to be urlencoded.

I'll need to investigate what, if anything, this does to performance, as we'd now have to urldecode every key name, but I can confirm that it works:

$ aws s3api list-objects --bucket bucket-name --prefix newline --encoding url --query 'Contents[].Key'
[
    "newline/foo%0D.txt"
]
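With the url encoding type, every key in the ListObjects response comes back percent-encoded, so the client has to decode it before displaying it or passing it to another operation such as DeleteObject. A minimal sketch of that decoding step, using the key from the output above:

from urllib.parse import unquote

# Keys returned with the url encoding type are percent-encoded.
encoded_keys = ["newline/foo%0D.txt"]

# Decode before displaying them or handing them to DeleteObject.
decoded_keys = [unquote(k) for k in encoded_keys]
assert decoded_keys == ["newline/foo\r.txt"]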

jamesls added a commit to jamesls/aws-cli that referenced this issue Mar 12, 2014
Botocore's xml parser does not handle control chars properly,
so we need to urlencode the keys in the response so that we're able
to handle them appropriately.

Fixes aws#675.
jamesls added a commit to jamesls/aws-cli that referenced this issue Apr 17, 2014
Fixes aws#749.  This was a regression from the fix for aws#675
where we use the encoding_type of "url" to workaround
the stdlib xmlparser not handling new lines.

The problem is that pagination in s3 uses the last key name as
the marker, and because the keys are returned urlencoded, we
need to urldecode the keys so botocore sends the correct next
marker.  In the case where urldecoded(key) != key we will incorrectly
sync new files.

Also added an integ test for syncing with '+' chars.
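
The pagination detail is the crux of the regression: ListObjects uses the last key as the Marker for the next page, but with the url encoding type the response hands back percent-encoded keys, so the marker has to be urldecoded before it is sent back to S3. A rough sketch of that decode-before-paginating idea (a hypothetical helper for illustration, not the actual botocore code):

from urllib.parse import unquote

def iter_keys(list_objects, bucket):
    # `list_objects` is assumed to call ListObjects with EncodingType=url and
    # return a dict with percent-encoded keys under 'Contents' plus 'IsTruncated'.
    marker = ""
    while True:
        page = list_objects(Bucket=bucket, Marker=marker, EncodingType="url")
        keys = [unquote(obj["Key"]) for obj in page.get("Contents", [])]
        yield from keys
        if not page.get("IsTruncated") or not keys:
            return
        # The next Marker must be the decoded key so the next page starts in the
        # right place; keys where urldecoded(key) != key would otherwise be
        # treated as new files and re-synced.
        marker = keys[-1]
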
thoward-godaddy pushed a commit to thoward-godaddy/aws-cli that referenced this issue Feb 12, 2022