[Fix] `s3 ls` command fails when UTF-8 is piped #1844

Merged

jamesls merged 1 commit into aws:develop from malthejorgensen:fix-s3-pipe-encoding-none on Mar 31, 2016

Conversation

malthejorgensen (Contributor) commented Mar 11, 2016

Piping the output of the `aws s3 ls` command on a bucket whose keys contain UTF-8 characters raises an exception, exits the program, and leaves the listing incomplete.

E.g. `aws s3 ls s3://<BUCKET-WITH-UTF8> | wc -l` will print the error `encode() argument 1 must be string, not None` and exit upon reaching the UTF-8 key.

It fails because `sys.stdout.encoding` is `None` when stdout is being piped.

I am aware that the `PYTHONIOENCODING` environment variable can be set to change that, but it seems the current code is trying to default to `ascii` and simply failing to do so: `awscli/customizations/s3/utils.py`, line 398 (latest commit on develop)
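For context, the failure and the guard can be sketched in a few lines (a hypothetical helper illustrating the idea behind the fix, not awscli's actual `uni_print`):

```python
import io

def uni_print(statement, out):
    # Hypothetical sketch: fall back to UTF-8 when the stream reports
    # no encoding, as sys.stdout does under Python 2 when piped.
    encoding = getattr(out, 'encoding', None) or 'utf-8'
    try:
        out.write(statement)
    except UnicodeEncodeError:
        # Replace characters the stream cannot represent instead of crashing.
        out.write(statement.encode(encoding, 'replace').decode(encoding, 'replace'))

class PipedStream(io.StringIO):
    # Simulates piped stdout under Python 2: encoding is None.
    encoding = None

buf = PipedStream()
uni_print(u'SomeChars\u2713\u2714OtherChars', buf)
print(buf.getvalue())

# Without the `or 'utf-8'` guard, passing None as the encoding is
# exactly the reported failure:
try:
    u'\u2713'.encode(None)
except TypeError as exc:
    print(exc)
```

With the guard in place the piped case degrades gracefully instead of raising the `TypeError` quoted above.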

```python
uni_print(u'SomeChars\u2713\u2714OtherChars', out)
# Unicode characters get replaced with their
# UTF-8 byte value.
self.assertEqual(buf.getvalue(), b'SomeChars\xe2\x9c\x93\xe2\x9c\x94OtherChars')
```
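The assertion above only holds if `out` is a text wrapper around the byte buffer `buf`; a guess at that fixture (assumed names, not the actual test code) behaves like this:

```python
import io

# Assumed fixture: `out` is a UTF-8 TextIOWrapper over the byte buffer
# `buf`, which is why unicode characters come back as UTF-8 bytes.
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding='utf-8')
out.write(u'SomeChars\u2713\u2714OtherChars')
out.flush()
print(buf.getvalue())
# b'SomeChars\xe2\x9c\x93\xe2\x9c\x94OtherChars'
```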

jamesls (Member) commented Mar 11, 2016

I'm probably missing something, but how does this get UTF-8 encoded if your code change above sets the default to `ascii`?

malthejorgensen (Contributor) commented Mar 12, 2016

It was just an observation – I actually don't know :S – maybe it's a Python default when encoding unicode strings to ascii.

malthejorgensen (Contributor) commented Mar 12, 2016

Okay, I found out (I think): `TextIOWrapper` falls back to the locale's preferred encoding (typically UTF-8) when it's passed `encoding=None`, so this test never hits the `UnicodeEncodeError` branch of `uni_print`.

I'll update this branch to fix it.
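That behaviour is easy to check: constructing a `TextIOWrapper` with `encoding=None` substitutes the locale's preferred encoding rather than leaving the attribute unset.

```python
import io

# Passing encoding=None does NOT leave the wrapper without an encoding:
# TextIOWrapper substitutes the locale's preferred encoding instead.
wrapped = io.TextIOWrapper(io.BytesIO(), encoding=None)
print(wrapped.encoding)  # locale default, e.g. 'utf-8' on most systems
assert wrapped.encoding is not None
```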

[Fix] `s3 ls` command fails when UTF-8 is piped
Piping the output of the `aws s3 ls` command on a bucket whose keys
contain UTF-8 characters raises an exception, exits the program, and
leaves the listing incomplete.

E.g. `aws s3 ls s3://<BUCKET-WITH-UTF8> | wc -l` will print the error
`encode() argument 1 must be string, not None` and exit upon reaching
the UTF-8 key.

It fails because when stdout is being piped `sys.stdout.encoding` is
`None`.

@malthejorgensen malthejorgensen force-pushed the malthejorgensen:fix-s3-pipe-encoding-none branch from 35d60d1 to e1c35c2 Mar 12, 2016

jamesls (Member) commented Mar 30, 2016

Thanks for updating. Looks good to me.

@jamesls jamesls merged commit e1c35c2 into aws:develop Mar 31, 2016

1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)