This issue was moved to a discussion.

aws s3 cp with --metadata-directive REPLACE not guessing correct Content-Type #6078

Closed
2 tasks done
bersalazar opened this issue Apr 8, 2021 · 9 comments
Labels: closed-for-staleness, guidance, s3mimetype, s3, v2

Comments

bersalazar commented Apr 8, 2021

Confirm by changing [ ] to [x] below to ensure that it's a bug:

Describe the bug
When using awscli v2 and copying a file with aws s3 cp and the --metadata-directive REPLACE parameter, Content-Type is not guessed correctly; files are set to binary/octet-stream after the copy.

This behavior is not present when running the same command from Ubuntu or when using awscli v1 from macOS.

SDK version number
aws-cli/2.1.35 Python/3.8.8 Darwin/19.6.0 exe/x86_64 prompt/off

Platform/OS/Hardware/Device
macOS 10.15.7 (Catalina)

To Reproduce (observed behavior)
aws s3 cp <bucket>/<file> <bucket>/<file> --metadata-directive REPLACE

Expected behavior
Content-Type is guessed correctly by aws s3 cp, even when the source metadata is replaced.

Logs/output
debug-logs-from-aws-s3-cp.txt

Additional context
awscli uses Python's mimetypes module to guess MIME types. On Ubuntu the type map is usually installed at /etc/mime.types, but on macOS that file does not exist; the map lives at /etc/apache2/mime.types instead, which the mimetypes module also lists in knownfiles. Even so, awscli doesn't seem to pick it up.

Python 3.8.5 (default, Feb 16 2021, 11:12:50)
[Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import mimetypes
>>> mimetypes.knownfiles
['/etc/mime.types', '/etc/httpd/mime.types', '/etc/httpd/conf/mime.types', '/etc/apache/mime.types', '/etc/apache2/mime.types', '/usr/local/etc/httpd/conf/mime.types', '/usr/local/lib/netscape/mime.types', '/usr/local/etc/httpd/conf/mime.types', '/usr/local/etc/mime.types']
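
One way to narrow this down is to check, on the affected machine, which of the knownfiles paths actually exist and what the module guesses for a sample name. This is only a minimal sketch using the standard library: the helper names and the sample name index.html are made up for illustration, and the Python interpreter bundled with awscli v2 may behave differently from a system Python.

```python
import mimetypes
import os


def existing_known_files():
    """Return the mimetypes.knownfiles entries that exist on this machine."""
    return [path for path in mimetypes.knownfiles if os.path.isfile(path)]


def guess_content_type(filename, default="binary/octet-stream"):
    """Guess a Content-Type from the file name, falling back to a default."""
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or default


# Hypothetical sample name; any file name with an extension works here.
print("mime.types files found:", existing_known_files())
print("index.html ->", guess_content_type("index.html"))
```

If the guess is already right in a plain interpreter, the difference is more likely in how the CLI handles copies than in the mime.types lookup itself.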

This issue is not present in awscli v1, where Content-Type is guessed correctly when copying with --metadata-directive REPLACE.

bersalazar added the needs-triage label Apr 8, 2021
stobrien89 self-assigned this Apr 10, 2021
stobrien89 added the bug, s3, and s3mimetype labels and removed the needs-triage label Apr 10, 2021
stobrien89 (Member) commented

Hi @bersalazar,

Thanks for pointing this out. I was able to reproduce it, so I'm marking this as a bug for now.

stobrien89 (Member) commented Apr 23, 2021

Hi @bersalazar,

#6115 should take care of this. The next v2 release will hopefully be out next week. Please feel free to reach out if you're still having issues after that!

stobrien89 (Member) commented

Hi @bersalazar,

After reviewing the v1 behavior and the PR I mentioned in my last comment, it looks like we explicitly removed support for guessing mimetypes for copies in v2, now that we copy over metadata. It was only by accident that v1 supports injecting mimetypes on copies. Even the v1 docs for --no-guess-mime-type state that guessing is only supported for uploads.

Would you be able to clarify what your particular use case is for this? We may have a current set of parameters that could work and we're hesitant to add this 'mistake' back if there's not a strong reason for it. Thanks!

stobrien89 added the feature-request and response-requested labels and removed the bug label Apr 29, 2021
bersalazar (Author) commented

Hi @stobrien89

Thanks for the replies. The use case is indeed copying files from bucket to bucket. The source and target buckets are the same in this case; the idea is to use aws s3 cp to inject the --cache-control 'no-cache' header into an HTML file after an aws s3 sync.

github-actions bot removed the response-requested label Apr 30, 2021
stobrien89 (Member) commented

Hi @bersalazar,

I may be missing something here, but I did some testing with aws s3 sync s3://src-bucket s3://dest-bucket --cache-control no-cache and that appears to copy over the correct content-type, as well as the no-cache header. Have you tried this as well?

stobrien89 added the response-requested label May 3, 2021
bersalazar (Author) commented

Hey @stobrien89, aws s3 sync probably works correctly. The issue is with aws s3 cp. The workflow is as follows:

  • aws s3 sync a big directory of static files from local to bucketA
  • aws s3 cp from bucketA to bucketA to set --cache-control no-cache --metadata-directive REPLACE on one specific file

Content-Type is set correctly on sync. On cp, however, the correct Content-Type is not carried over for that one file; it is replaced with binary/octet-stream.

I understand --metadata-directive REPLACE would only set the values specified in the command's parameters, but maybe the ones that are NOT specified should be guessed, or copied? Trying to understand what the correct behavior should be here.

github-actions bot removed the response-requested label May 4, 2021
stobrien89 (Member) commented

Hi @bersalazar,

My apologies, I almost forgot that you need to copy a single file rather than an entire directory.

So the 'correct' behavior is actually the v2 behavior, for the reasons I mentioned before. If you compare debug logs between v1 and v2 for aws s3 cp s3://src-bucket s3://dest-bucket --cache-control 'no-cache' --metadata-directive REPLACE, v1 actually sends the content-type header as part of the copy request. With REPLACE, in theory, the object should only have the metadata values that were specified in the CLI command.

In this case, you may be better off not specifying --metadata-directive REPLACE in v2, as the correct content-type is retained during the copy, or manually specify --content-type as text/html if there are other metadata values that you need to replace.
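
As a rough illustration of the second option, the sketch below guesses the type locally with Python's mimetypes module and builds the aws s3 cp command with an explicit --content-type. The helper name build_replace_copy_command, the bucket name, and the key are hypothetical; only flags already mentioned in this thread are used, and the command is printed rather than run.

```python
import mimetypes
import shlex


def build_replace_copy_command(bucket, key, cache_control="no-cache"):
    """Build an in-place `aws s3 cp` that replaces metadata but pins Content-Type."""
    # Guess from the key's extension; result is None when nothing matches.
    content_type, _encoding = mimetypes.guess_type(key)
    uri = f"s3://{bucket}/{key}"
    command = [
        "aws", "s3", "cp", uri, uri,
        "--metadata-directive", "REPLACE",
        "--cache-control", cache_control,
    ]
    if content_type:
        # Sent explicitly because REPLACE drops values that are not specified.
        command += ["--content-type", content_type]
    return command


# Hypothetical bucket and key, for illustration only.
print(shlex.join(build_replace_copy_command("example-bucket", "index.html")))
```

Passing the argument list to subprocess.run would execute it, assuming the aws executable and credentials are available.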

To test, you can create a blank index.html file (or something to that effect), use cp to upload to your source bucket, call s3api head-object on the uploaded object, copy the file to the destination bucket using cp with cache-control and then call s3api head-object on the file in the destination bucket to display the content-type and cache-control metadata.
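
The steps above can be sketched as a short script that prints the four commands in order. This is an assumption-laden outline, not an official procedure: the function name verification_commands and the bucket names are placeholders, and the commands are printed rather than executed.

```python
import shlex


def verification_commands(src_bucket, dest_bucket, key="index.html"):
    """Return the four CLI steps described above as argument lists."""
    src_uri = f"s3://{src_bucket}/{key}"
    dest_uri = f"s3://{dest_bucket}/{key}"
    return [
        # 1. Upload the local (blank) file to the source bucket.
        ["aws", "s3", "cp", key, src_uri],
        # 2. Inspect the metadata of the uploaded object.
        ["aws", "s3api", "head-object", "--bucket", src_bucket, "--key", key],
        # 3. Copy to the destination with cache-control, without REPLACE.
        ["aws", "s3", "cp", src_uri, dest_uri, "--cache-control", "no-cache"],
        # 4. Inspect content-type and cache-control on the destination object.
        ["aws", "s3api", "head-object", "--bucket", dest_bucket, "--key", key],
    ]


# Placeholder bucket names, for illustration only.
for step in verification_commands("src-bucket", "dest-bucket"):
    print(shlex.join(step))
```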

Hope this helps!

stobrien89 added the closing-soon label and removed the feature-request label May 5, 2021
stobrien89 added the guidance label May 5, 2021
github-actions bot added the closed-for-staleness label and removed the closing-soon label May 9, 2021
github-actions bot closed this as completed May 9, 2021
bersalazar (Author) commented

OK, thanks for the help and clarification @stobrien89, really appreciated!

stobrien89 (Member) commented

Of course. Hope this works for you! Please feel free to reach out at any time if you have any additional questions.

aws locked and limited conversation to collaborators Apr 22, 2022
kdaily converted this issue into discussion #6895 Apr 22, 2022
