
[Fix] s3 ls command fails when UTF-8 is piped #1844

Merged
merged 1 commit into from Mar 31, 2016

Conversation

malthejorgensen (Contributor)

Piping the output of the `aws s3 ls` command on a bucket whose keys contain UTF-8 characters raises an exception, exits the program, and does not list all files.

E.g. `aws s3 ls s3://<BUCKET-WITH-UTF8> | wc -l` will print the error `encode() argument 1 must be string, not None` and exit upon reaching the UTF-8 key.

It fails because `sys.stdout.encoding` is `None` when stdout is being piped.
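This is easy to observe by asking a child interpreter what encoding its piped stdout reports (the quoted error message is a Python 2-style message; Python 3 substitutes the locale encoding instead of leaving it `None`):

```python
import subprocess
import sys

# Ask a child interpreter what encoding its *piped* stdout reports.
# Under Python 2 this prints "None"; Python 3 falls back to the
# locale's encoding, so the attribute is never None when piped.
proc = subprocess.run(
    [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
    stdout=subprocess.PIPE,
)
reported = proc.stdout.decode().strip()
print(reported)
```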

I am aware that the `PYTHONIOENCODING` environment variable can be set to change that, but the current code seems to be trying to default to ASCII and simply failing to do so: awscli/customizations/s3/utils.py, line 398 (latest commit on develop)
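A minimal sketch of the kind of fix being discussed, not the actual awscli implementation: a `uni_print`-style helper that falls back to UTF-8 whenever the output stream reports no encoding, so piped output no longer crashes on non-ASCII keys.

```python
import io
import sys

def uni_print(statement, out_file=None):
    # Hypothetical sketch: write a unicode string to out_file,
    # tolerating streams whose .encoding attribute is None
    # (e.g. piped stdout on Python 2).
    if out_file is None:
        out_file = sys.stdout
    # Fall back to UTF-8 instead of letting encode() receive None.
    encoding = getattr(out_file, 'encoding', None) or 'utf-8'
    data = statement.encode(encoding)
    # Prefer the underlying byte buffer when present (Python 3 text
    # streams); otherwise write the encoded bytes directly.
    buffer = getattr(out_file, 'buffer', None)
    if buffer is not None:
        buffer.write(data)
    else:
        out_file.write(data)

# Demonstrate with an in-memory byte stream that, like a piped
# Python 2 stdout, exposes no usable encoding of its own:
buf = io.BytesIO()
uni_print(u'SomeChars\u2713\u2714OtherChars', buf)
```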

```python
uni_print(u'SomeChars\u2713\u2714OtherChars', out)
# Unicode characters get replaced with their
# UTF-8 byte values.
self.assertEqual(buf.getvalue(), b'SomeChars\xe2\x9c\x93\xe2\x9c\x94OtherChars')
```
Member

I'm probably missing something, but how does this get UTF-8 encoded if your code change above sets the default to ASCII?

malthejorgensen (Contributor, Author)

It was just an observation – I actually don't know :S – maybe it's a Python default when encoding unicode strings to ascii.

malthejorgensen (Contributor, Author)

Okay. I found out (I think). TextIOWrapper sets encoding to UTF-8 when it's passed encoding=None, so this test never hits the UnicodeEncodeError branch of uni_print.

I'm gonna update this branch to fix it.
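The observation above is easy to verify (strictly speaking, `TextIOWrapper` with `encoding=None` picks the locale's preferred encoding via `locale.getpreferredencoding(False)`, which is commonly UTF-8, rather than a hard-coded UTF-8):

```python
import io
import locale

buf = io.BytesIO()
# With encoding=None, TextIOWrapper falls back to the locale's
# preferred encoding instead of leaving .encoding unset, so the
# UnicodeEncodeError branch of uni_print is never reached here.
wrapper = io.TextIOWrapper(buf, encoding=None)
print(wrapper.encoding)
print(locale.getpreferredencoding(False))
```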

jamesls (Member) commented Mar 30, 2016

Thanks for updating. Looks good to me.
