I'm seeing "%2F" instead of "/" in remote paths, which causes "s3 sync" to always delete/replace unchanged files #1879

Closed
Xavion opened this issue Mar 26, 2016 · 7 comments
Labels
closing-soon This issue will automatically close in 4 days unless further comments are made.

Comments

@Xavion

Xavion commented Mar 26, 2016

In recent weeks, I've noticed that "aws s3 sync" always deletes and replaces the remote files (even if they haven't changed). This, of course, makes the sync operations much longer than they should be (as everything has to be re-uploaded).

I think it's because the final "/" in the remote path is getting converted to "%2F". This didn't use to happen, so I'm guessing it's due to a bug somewhere in the code. I've tried downgrading "botocore" to v1.3.23 and "aws-cli" to v1.10.1, but this unfortunately doesn't fix the problem.

Below is an example of the output I'm receiving these days. The file hasn't changed in months, so there's no way it should be deleted/replaced. Unfortunately, no matter how many times I run this command, I see the below output. In which component is the bug and how can I work around it?

$ aws s3 sync --endpoint-url https://objects.dreamhost.com --acl public-read --delete two/ s3://one/two/
delete: s3://one/two%2Fthree.txt
upload: two/three.txt to s3://one/two/three.txt
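
To illustrate why a single mismatched character is enough to trigger the delete-and-upload pair above, here is a minimal sketch of a one-way sync plan that matches files by key string only. It is a simplified model, not the actual aws-cli comparison logic; the function name `plan_sync` is hypothetical, and size and timestamp checks are left out.

```python
from urllib.parse import unquote


def plan_sync(local_keys, listed_keys):
    """Return (uploads, deletes) for a simplified one-way sync with --delete.

    A local key is uploaded if the remote listing does not contain it, and
    a listed key is deleted if it has no local counterpart. Size and mtime
    comparisons are deliberately omitted.
    """
    local = set(local_keys)
    listed = set(listed_keys)
    return sorted(local - listed), sorted(listed - local)


# Listing as reported in the output above: "/" comes back as "%2F".
uploads, deletes = plan_sync(["two/three.txt"], ["two%2Fthree.txt"])
print(uploads)  # ['two/three.txt']
print(deletes)  # ['two%2Fthree.txt']

# If the listed key were percent-decoded first, there would be nothing to do.
decoded = [unquote(k) for k in ["two%2Fthree.txt"]]
print(plan_sync(["two/three.txt"], decoded))  # ([], [])
```

Because the listed key never equals the local relative path, every run sees one unknown remote object and one missing upload, which matches the repeated output.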
@jamesls
Member

jamesls commented Mar 28, 2016

I'm not seeing this behavior. We'll need more information to help troubleshoot:

  • Do you have a set of steps you can share that demonstrate the issue?
  • What's the output of aws s3 ls s3://one/two/ after you've run the sync command?

Here's what I tried:

/tmp/repro $ mkdir two
/tmp/repro $ echo "hello world" > two/three.txt
/tmp/repro $ aws s3 sync . s3://bucket-name/one/two/ --delete
upload: two/three.txt to s3://bucket-name/one/two/two/three.txt

# 1. Verify the file is not uploaded with '%2F'
/tmp/repro $ aws s3 ls bucket-name/one/ --recursive
2016-03-28 16:36:31         12 one/two/two/three.txt

# 2. Verify the file is not resync'd:
/tmp/repro $ aws s3 sync . s3://bucket-name/one/two/ --delete
/tmp/repro $

jamesls added the closing-soon label Mar 28, 2016
@Xavion
Author

Xavion commented Mar 29, 2016

Hi James,

I just repeated your steps and the result was as follows:

$ mkdir two
$ echo "hello world" > two/three.txt
$ aws s3 sync --endpoint-url https://objects.dreamhost.com . s3://bucket/one/ --delete
upload: two/three.txt to s3://bucket/one/two/three.txt
$ aws s3 ls --endpoint-url https://objects.dreamhost.com bucket/one/ --recursive
2016-03-29 11:47:43         12 one%2Ftwo%2Fthree.txt
$ aws s3 sync --endpoint-url https://objects.dreamhost.com . s3://bucket/one/ --delete
delete: s3://bucket/one%2Ftwo%2Fthree.txt
upload: two/three.txt to s3://bucket/one/two/three.txt

I then went a step further by issuing the "rm" command manually:

$ aws s3 rm --endpoint-url https://objects.dreamhost.com s3://bucket/one/two/three.txt
delete: s3://bucket/one/two/three.txt

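Deleting by the unencoded key succeeded, so the object itself is stored under the correct name. This means the bug is isolated to the listing that the "ls" command (and therefore "sync") relies on.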

I'm an Arch Linux user. You can see the dependency tree for the "aws-cli" package here.

@robbat2

robbat2 commented Mar 29, 2016

DreamHost Ceph dev here; I think this might be a Ceph bug, but that part of the S3 API docs is unclear...

With the latest master branch of Ceph (http://github.com/ceph/ceph), doing the ls gives this ListBucketResult; whitespace added for readability.

Notice that the / in each Key element is URL-encoded as %2F.

2016-03-28 20:29:29,351 - MainThread - botocore.vendored.requests.packages.urllib3.connectionpool - DEBUG - "GET /awscli-1879?prefix=&encoding-type=url HTTP/1.1" 200 1040
2016-03-28 20:29:29,351 - MainThread - botocore.parsers - DEBUG - Response headers: {'date': 'Tue, 29 Mar 2016 03:29:29 GMT', 'content-length': '1040', 'x-amz-request-id': 'tx0000000000000000057dc-0056f9f699-106d-default', 'content-type': 'application/xml'}
2016-03-28 20:29:29,352 - MainThread - botocore.parsers - DEBUG - Response body:
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>awscli-1879</Name>
  <Prefix/>
  <Marker/>
  <MaxKeys>1000</MaxKeys>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>dir1%2Fdir2%2Ffile1</Key>
    <LastModified>2016-03-29T03:28:54.330Z</LastModified>
    <ETag>"6d2c6aaf55038715886efb02e580d10c"</ETag>
    <Size>906</Size>
    <StorageClass>STANDARD</StorageClass>
    <Owner>
      <ID>test1</ID>
      <DisplayName>test1</DisplayName>
    </Owner>
  </Contents>
  <Contents>
    <Key>dir1%2Ffile1</Key>
    <LastModified>2016-03-29T03:28:50.841Z</LastModified>
    <ETag>"6d2c6aaf55038715886efb02e580d10c"</ETag>
    <Size>906</Size>
    <StorageClass>STANDARD</StorageClass>
    <Owner>
      <ID>test1</ID>
      <DisplayName>test1</DisplayName>
    </Owner>
  </Contents>
  <Contents>
    <Key>file1</Key>
    <LastModified>2016-03-29T03:28:45.454Z</LastModified>
    <ETag>"6d2c6aaf55038715886efb02e580d10c"</ETag>
    <Size>906</Size>
    <StorageClass>STANDARD</StorageClass>
    <Owner>
      <ID>test1</ID>
      <DisplayName>test1</DisplayName>
    </Owner>
  </Contents>
</ListBucketResult>

The API documentation here: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html
says that passing encoding-type=url will get URL-encoded keys in the response, but doesn't specify which characters should be encoded.
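
One detail visible in the response above is that it carries no EncodingType element even though the request asked for encoding-type=url. The sketch below shows what that would mean for a client that decodes keys only when the response says they are encoded. That decoding rule is an assumption made for illustration, not something confirmed in this thread, and the function name `listed_keys` and both sample bodies are hypothetical, trimmed-down versions of a ListBucketResult.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def listed_keys(xml_body):
    """Extract keys, decoding them only if the response flags them as encoded."""
    root = ET.fromstring(xml_body)
    encoding = root.findtext("s3:EncodingType", default=None, namespaces=NS)
    keys = [el.text for el in root.findall("s3:Contents/s3:Key", NS)]
    if encoding == "url":
        keys = [unquote(k) for k in keys]
    return keys


# Encoded keys, but no EncodingType element (as in the response above).
UNFLAGGED = (
    '<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    "<Contents><Key>dir1%2Fdir2%2Ffile1</Key></Contents>"
    "</ListBucketResult>"
)

# Same key, with the response declaring that keys are URL-encoded.
FLAGGED = (
    '<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    "<EncodingType>url</EncodingType>"
    "<Contents><Key>dir1%2Fdir2%2Ffile1</Key></Contents>"
    "</ListBucketResult>"
)

print(listed_keys(UNFLAGGED))  # ['dir1%2Fdir2%2Ffile1']
print(listed_keys(FLAGGED))    # ['dir1/dir2/file1']
```

Under that assumption, the encoded key passes through untouched whenever the flag is missing, which would explain the %2F showing up in "ls" output.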

Here is where the change was introduced in Ceph:
ceph/ceph@180ca7b5

Xavion referenced this issue in ceph/ceph Mar 29, 2016
This change introduces handling for the encoding-type request
parameter on the get bucket operation. An object key may contain
characters which are not supported in XML. Passing the value "url" for
the encoding-type parameter will cause the key to be urlencoded in the
response.

Fixes: #12735
Signed-off-by: Jeff Weber <jweber@cofront.net>
@robbat2

robbat2 commented Mar 29, 2016

For completeness, here is the ListBucketResult response from AWS S3 for the same keys:

2016-03-28 22:31:21,346 - MainThread - botocore.parsers - DEBUG - Response headers: {'x-amz-bucket-region': 'eu-west-1', 'x-amz-id-2': 'f/EMOFQciZ7mHsIq13xHaQtxpMK2plNJqGn3ZDaYAzrsWPDip1dmtIR7qR9HuUhHY2gWpEXIFQo=', 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'x-amz-request-id': 'FEFA3E2825687075', 'date': 'Tue, 29 Mar 2016 05:31:22 GMT', 'content-type': 'application/xml'}
2016-03-28 22:31:21,346 - MainThread - botocore.parsers - DEBUG - Response body:
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>BUCKET</Name>
  <Prefix/>
  <Marker/>
  <MaxKeys>1000</MaxKeys>
  <EncodingType>url</EncodingType>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>dir1/dir2/file1</Key>
    <LastModified>2016-03-29T05:31:10.000Z</LastModified>
    <ETag>"6d2c6aaf55038715886efb02e580d10c"</ETag>
    <Size>906</Size>
    <Owner>
      <ID>XXX</ID>
      <DisplayName>XXX</DisplayName>
    </Owner>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
  <Contents>
    <Key>dir1/file1</Key>
    <LastModified>2016-03-29T05:31:07.000Z</LastModified>
    <ETag>"6d2c6aaf55038715886efb02e580d10c"</ETag>
    <Size>906</Size>
    <Owner>
      <ID>XXX</ID>
      <DisplayName>XX</DisplayName>
    </Owner>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
  <Contents>
    <Key>file1</Key>
    <LastModified>2016-03-29T05:30:59.000Z</LastModified>
    <ETag>"6d2c6aaf55038715886efb02e580d10c"</ETag>
    <Size>906</Size>
    <Owner>
      <ID>XXX</ID>
      <DisplayName>XXX</DisplayName>
    </Owner>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
</ListBucketResult>
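
The two responses disagree on whether "/" counts as a character that needs encoding. As a rough illustration, Python's `urllib.parse.quote` can reproduce both outputs by changing its `safe` argument. This only models the observed key strings; it says nothing about how either server is actually implemented, and the helper name `encode_key` is hypothetical.

```python
from urllib.parse import quote


def encode_key(key, keep_slash):
    """Percent-encode an object key.

    keep_slash=True leaves "/" untouched; keep_slash=False encodes it too.
    """
    return quote(key, safe="/" if keep_slash else "")


key = "dir1/dir2/file1"
print(encode_key(key, keep_slash=False))  # dir1%2Fdir2%2Ffile1
print(encode_key(key, keep_slash=True))   # dir1/dir2/file1
```

The first line matches the Key values in the Ceph response and the second matches the AWS response, so the difference is consistent with the two sides treating "/" differently when encoding.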

@robbat2

robbat2 commented Mar 29, 2016

Tests with an actual Unicode character also show breakage:

Ceph

$ aws s3  cp 我 s3://ceph-unicode/ --endpoint-url CENSORED
upload: ./我 to s3://ceph-unicode/我
$ aws s3  ls s3://ceph-unicode/ --endpoint-url CENSORED
2016-03-28 22:40:07        906 %E6%88%91

Debug Log:
2016-03-28 22:40:16,066 - MainThread - botocore.parsers - DEBUG - Response headers: {'date': 'Tue, 29 Mar 2016 05:40:16 GMT', 'content-length': '526', 'x-amz-request-id': 'tx0000000000000000057e6-0056fa1540-106d-default', 'content-type': 'application/xml'}
2016-03-28 22:40:16,066 - MainThread - botocore.parsers - DEBUG - Response body:
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>ceph-unicode</Name>
  <Prefix/>
  <Marker/>
  <MaxKeys>1000</MaxKeys>
  <Delimiter>/</Delimiter>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>%E6%88%91</Key>
    <LastModified>2016-03-29T05:40:07.686Z</LastModified>
    <ETag>"6d2c6aaf55038715886efb02e580d10c"</ETag>
    <Size>906</Size>
    <StorageClass>STANDARD</StorageClass>
    <Owner>
      <ID>test1</ID>
      <DisplayName>test1</DisplayName>
    </Owner>
  </Contents>
</ListBucketResult>

AWS

$ aws s3  cp 我 s3://BUCKET/ 
upload: ./我 to s3://BUCKET/我
$ aws s3  ls s3://BUCKET/
2016-03-28 22:35:00        906 我

Debug log:
2016-03-28 22:35:14,664 - MainThread - botocore.parsers - DEBUG - Response headers: {'x-amz-bucket-region': 'eu-west-1', 'x-amz-id-2': 'zg6JhOD7IKLsOxaE4oHRtvY0LFjhTQf4qLelRHlx6UaC6/5Q/BMLL9EYwi+nWfM6AaB1vU8QMf0=', 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'x-amz-request-id': 'DCEB4659A4596B34', 'date': 'Tue, 29 Mar 2016 05:35:15 GMT', 'content-type': 'application/xml'}
2016-03-28 22:35:14,664 - MainThread - botocore.parsers - DEBUG - Response body:
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>BUCKET</Name>
  <Prefix/>
  <Marker/>
  <MaxKeys>1000</MaxKeys>
  <Delimiter>/</Delimiter>
  <EncodingType>url</EncodingType>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>%E6%88%91</Key>
    <LastModified>2016-03-29T05:35:00.000Z</LastModified>
    <ETag>"6d2c6aaf55038715886efb02e580d10c"</ETag>
    <Size>906</Size>
    <Owner>
      <ID>XXX</ID>
      <DisplayName>XXX</DisplayName>
    </Owner>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
</ListBucketResult>
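
Both logs contain the same Key text, %E6%88%91, and the only visible difference is the EncodingType element in the AWS response. The sketch below shows why a client cannot simply guess: without that marker, the same wire text could stand for two different object keys. The helper name `wire_form` is hypothetical and the encoding rule (UTF-8 percent-encoding with "/" left alone) is an assumption chosen to match the logs.

```python
from urllib.parse import quote, unquote


def wire_form(key, url_encoded):
    """Return the Key text a listing would carry for an object key."""
    return quote(key, safe="/") if url_encoded else key


unicode_key = "我"          # one character, three UTF-8 bytes
literal_key = "%E6%88%91"   # nine ASCII characters, a legal key in its own right

# Two different objects can produce identical Key text on the wire.
print(wire_form(unicode_key, url_encoded=True))   # %E6%88%91
print(wire_form(literal_key, url_encoded=False))  # %E6%88%91

# With encoding applied consistently, the literal key stays distinguishable.
print(wire_form(literal_key, url_encoded=True))   # %25E6%2588%2591
print(unquote("%25E6%2588%2591"))                 # %E6%88%91
```

This is why decoding blindly on the client side is not a safe workaround: a key that really contains percent signs would be turned into a different key.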

@Xavion
Author

Xavion commented Mar 29, 2016

For the sake of completeness, I've just mentioned this problem on the DreamObjects forum. As you can see, I've also linked to the patch that apparently fixes it. One of the DreamHost Ceph developers will hopefully merge it in soon ;-).

@jamesls
Member

jamesls commented Mar 30, 2016

Looks like there are a few threads tracking this issue. Am I correct in summarizing that the root cause was related to an issue in ceph/ceph and the fix for that issue has been merged?

If that's the case I'm going to go ahead and close out this issue as I don't think there's anything needed on our end. Let me know if I've missed anything and I'll reopen and take another look.

@jamesls jamesls closed this as completed Mar 30, 2016
thoward-godaddy pushed a commit to thoward-godaddy/aws-cli that referenced this issue Feb 12, 2022
Co-authored-by: Jacob Fuss <jfuss@users.noreply.github.com>