AWS CLI can't remove objects containing "\n" in Key #675

Closed
j3tm0t0 opened this issue Feb 25, 2014 · 3 comments · Fixed by #700
Labels
bug: This issue is a bug.
investigating: This issue is being investigated and/or work is in progress to resolve the issue.

Comments

j3tm0t0 commented Feb 25, 2014

Here is how to reproduce the issue.

~ $ aws s3 mb s3://j3tm0t0-rmtest
make_bucket: s3://j3tm0t0-rmtest/

# upload a file with a carriage return (^M) in the key name
~ $ aws s3 cp hoge s3://j3tm0t0-rmtest/hoge^Mpuge <-- Ctrl+v Ctrl+m
pugead: ./hoge to s3://j3tm0t0-rmtest/hoge

# file exists
~ $ aws s3 ls s3://j3tm0t0-rmtest/
2014-02-25 21:18:23          5 hoge
puge

# direct API shows "\n" in the key
 $ aws s3api list-objects --bucket j3tm0t0-rmtest --output json --query Contents[].Key
[
    "hoge\npuge"
]

# rb --force claims to delete the object, but it is not actually deleted and removing the bucket fails
~ $ aws s3 rb s3://j3tm0t0-rmtest --force
delete: s3://j3tm0t0-rmtest/hoge
puge
remove_bucket failed: s3://j3tm0t0-rmtest/ A client error (BucketNotEmpty) occurred when calling the DeleteBucket operation: The bucket you tried to delete is not empty

# it still exists
~ $ aws s3 ls s3://j3tm0t0-rmtest/
2014-02-25 21:18:23          5 hoge
puge

# rm --recursive behaves the same way
~ $ aws s3 rm s3://j3tm0t0-rmtest/ --recursive
delete: s3://j3tm0t0-rmtest/hoge
puge
~ $ aws s3api list-objects --bucket j3tm0t0-rmtest --output json --query Contents[].Key
[
    "hoge\npuge"
]

jamesls commented Feb 26, 2014

Looking into this. From what I can tell, it only happens when we do a recursive delete (either s3 rb --force or s3 rm --recursive). Deleting the file directly works for me:

$ aws s3 cp bar.txt s3://bucket/newlinetest/bar^M.txt
$ aws s3 ls bucket/newlinetest/
2014-02-26 07:56:04         11 bar
.txt
$ aws s3 rm s3://bucket/newlinetest/bar^M.txt
.txtte: s3://bucket/newlinetest/bar
$ aws s3 ls bucket/newlinetest/
$

It seems that in the recursive case we send:

"DELETE /newlinetest/bar%0A.txt HTTP/1.1" 204 0

But when I specify the single file directly we send:

"DELETE /newlinetest/bar%0D.txt HTTP/1.1" 204 0

Though it's interesting that S3 sends a 204 response in both cases. I would have expected the first request with %0A to generate a 404.
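For reference, the two percent-encodings correspond to two different control characters: %0D is the carriage return (\r) that was actually typed into the key, while %0A is the newline (\n) that the parsed listing hands back. A minimal Python sketch (the key names are illustrative) of how each variant encodes into the request path:

from urllib.parse import quote

# Key as it was actually uploaded: contains a carriage return (typed as ^M).
print(quote("newlinetest/bar\r.txt"))  # newlinetest/bar%0D.txt -> matches the real object
# Key as the recursive listing reports it: the carriage return has become a newline.
print(quote("newlinetest/bar\n.txt"))  # newlinetest/bar%0A.txt -> no such object exists

DeleteObject returns 204 even when the key does not exist, which would explain why both requests appear to succeed while only the %0D one actually removes the object.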

jamesls commented Mar 4, 2014

So I believe the issue here is our use of cElementTree as an XML parser. I can repro the issue using just the XML parser alone. What's happening is that if we give the parser keys containing a carriage return, it automatically converts them to a newline character. This is why it does not work in the recursive delete case: we need to parse the XML response to know which keys to delete. But when a single file is explicitly specified by the user, we properly encode the character.

>>> from xml.etree import cElementTree
>>> parser = cElementTree.XMLParser(target=cElementTree.TreeBuilder(), encoding='utf-8')
>>> parser.feed(b'<Node>KeyWithCarriageReturn\r.txt</Node>')
>>> tree = parser.close()
>>> tree
<Element 'Node' at 0x10157c390>
>>> tree.text
'KeyWithCarriageReturn\n.txt'

I'll need to do some investigation with our XML parser to see what options we have.
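For context, this normalization is not specific to cElementTree: the XML 1.0 spec requires parsers to translate \r and \r\n in character data to \n, so any conforming parser loses the original carriage return. One way around it is to have the key percent-encoded before it ever enters the XML document, which is the idea behind the encoding-type workaround in the next comment. A small sketch of the difference, using the modern xml.etree.ElementTree (same underlying expat parser):

from urllib.parse import quote, unquote
from xml.etree import ElementTree

key = "KeyWithCarriageReturn\r.txt"

# Raw key in XML: the parser normalizes \r to \n, so the original key is lost.
lost = ElementTree.fromstring("<Key>%s</Key>" % key).text
assert lost == "KeyWithCarriageReturn\n.txt"

# A percent-encoded key survives the round trip (S3 does this encoding
# server-side when the url encoding type is requested; simulated here).
kept = ElementTree.fromstring("<Key>%s</Key>" % quote(key)).text
assert unquote(kept) == key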

jamesls commented Mar 5, 2014

One potential option we have is to use the encoding-type parameter to force the key name to be urlencoded.

I'll need to investigate what, if anything, this does to performance, as we'd now have to urldecode every key name, but I can confirm that it works:

$ aws s3api list-objects --bucket bucket-name --prefix newline --encoding url --query 'Contents[].Key'
[
    "newline/foo%0D.txt"
]
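With the url encoding type, every key in the ListObjects response comes back percent-encoded, so the client has to decode it before displaying it or passing it to another operation such as DeleteObject. A minimal sketch of that decoding step, using the key from the output above:

from urllib.parse import unquote

# Keys returned with the url encoding type are percent-encoded.
encoded_keys = ["newline/foo%0D.txt"]

# Decode before displaying them or handing them to DeleteObject.
decoded_keys = [unquote(k) for k in encoded_keys]
assert decoded_keys == ["newline/foo\r.txt"]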

jamesls added a commit to jamesls/aws-cli that referenced this issue Mar 12, 2014
Botocore's xml parser does not handle control chars properly,
so we need to urlencode the keys in the response so that we're able
to handle them appropriately.

Fixes aws#675.
jamesls added a commit to jamesls/aws-cli that referenced this issue Apr 17, 2014
Fixes aws#749.  This was a regression from the fix for aws#675
where we use the encoding_type of "url" to workaround
the stdlib xmlparser not handling new lines.

The problem is that pagination in s3 uses the last key name as
the marker, and because the keys are returned urlencoded, we
need to urldecode the keys so botocore sends the correct next
marker.  In the case where urldecoded(key) != key we will incorrectly
sync new files.

Also added an integ test for syncing with '+' chars.
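
The pagination detail is the crux of the regression: ListObjects uses the last key as the Marker for the next page, but with the url encoding type the response hands back percent-encoded keys, so the marker has to be urldecoded before it is sent back to S3. A rough sketch of that decode-before-paginating idea (a hypothetical helper for illustration, not the actual botocore code):

from urllib.parse import unquote

def iter_keys(list_objects, bucket):
    # `list_objects` is assumed to call ListObjects with EncodingType=url and
    # return a dict with percent-encoded keys under 'Contents' plus 'IsTruncated'.
    marker = ""
    while True:
        page = list_objects(Bucket=bucket, Marker=marker, EncodingType="url")
        keys = [unquote(obj["Key"]) for obj in page.get("Contents", [])]
        yield from keys
        if not page.get("IsTruncated") or not keys:
            return
        # The next Marker must be the decoded key so the next page starts in the
        # right place; keys where urldecoded(key) != key would otherwise be
        # treated as new files and re-synced.
        marker = keys[-1]
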
thoward-godaddy pushed a commit to thoward-godaddy/aws-cli that referenced this issue Feb 12, 2022