[Fix] `s3 ls` command fails when UTF-8 is piped #1844

Merged
merged 1 commit into aws:develop from malthejorgensen:fix-s3-pipe-encoding-none on Mar 31, 2016

Conversation

@malthejorgensen
Contributor

@malthejorgensen malthejorgensen commented Mar 11, 2016

Piping the output of the `aws s3 ls` command on a bucket whose keys contain UTF-8 characters raises an exception, exits the program, and does not list all files.

E.g. `aws s3 ls s3://<BUCKET-WITH-UTF8> | wc -l` will print the error `encode() argument 1 must be string, not None` and exit upon reaching the UTF-8 key.

It fails because, when stdout is being piped, `sys.stdout.encoding` is `None`.

I am aware that the `PYTHONIOENCODING` environment variable can be set to change that, but the current code is already trying to default to `ascii` and simply failing to do so: awscli/customizations/s3/utils.py, line 398 (latest commit on develop)
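For context, the failing path looks roughly like this – a minimal sketch of the `uni_print` pattern in utils.py, not the file's exact code:

```python
import sys

def uni_print(statement, out_file=None):
    # Sketch of the pattern in awscli/customizations/s3/utils.py.
    if out_file is None:
        out_file = sys.stdout
    try:
        out_file.write(statement)
    except UnicodeEncodeError:
        # When stdout is a pipe, out_file.encoding exists but is None,
        # so getattr's 'ascii' default never applies and
        # statement.encode(None) raises "encode() argument 1 must be
        # string, not None".
        encoding = getattr(out_file, 'encoding', 'ascii')
        out_file.write(statement.encode(encoding))
```

Guarding the lookup – e.g. `getattr(out_file, 'encoding', None) or 'ascii'` – is one way to make the intended ascii default actually take effect. The snippet under review below is the test covering this behavior: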

uni_print(u'SomeChars\u2713\u2714OtherChars', out)
# Unicode characters get replaced with their
# UTF-8 byte value.
self.assertEqual(buf.getvalue(), b'SomeChars\xe2\x9c\x93\xe2\x9c\x94OtherChars')


@jamesls

jamesls Mar 11, 2016
Member

I'm probably missing something, but how does this get UTF-8-encoded if your code change above sets the default to `ascii`?


@malthejorgensen

malthejorgensen Mar 12, 2016
Author Contributor

It was just an observation – I actually don't know :S – maybe it's Python's default behavior when encoding unicode strings to ascii.


@malthejorgensen

malthejorgensen Mar 12, 2016
Author Contributor

Okay, I found out (I think). `TextIOWrapper` falls back to the locale's preferred encoding – UTF-8 here – when it's passed `encoding=None`, so this test never hits the `UnicodeEncodeError` branch of `uni_print`.

I'm gonna update this branch to fix it.
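For anyone following along, this is easy to verify with a small stdlib-only sketch (the buffer name is arbitrary): `io.TextIOWrapper` with `encoding=None` resolves to `locale.getpreferredencoding(False)`, which is UTF-8 on a typical dev machine, so the write succeeds and the `UnicodeEncodeError` branch is never reached:

```python
import io
import locale

buf = io.BytesIO()
# encoding=None makes TextIOWrapper resolve the encoding from the
# locale instead of leaving it None like a piped sys.stdout does.
out = io.TextIOWrapper(buf, encoding=None)

print(out.encoding)                        # e.g. 'UTF-8'
print(locale.getpreferredencoding(False))  # the same value

out.write(u'SomeChars\u2713\u2714OtherChars')
out.flush()
# On a UTF-8 locale: b'SomeChars\xe2\x9c\x93\xe2\x9c\x94OtherChars'
print(buf.getvalue())
```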

@malthejorgensen malthejorgensen force-pushed the malthejorgensen:fix-s3-pipe-encoding-none branch from 35d60d1 to e1c35c2 Mar 12, 2016
@jamesls
Member

@jamesls jamesls commented Mar 30, 2016

Thanks for updating. Looks good to me.

@jamesls jamesls merged commit e1c35c2 into aws:develop Mar 31, 2016
1 check passed
continuous-integration/travis-ci/pr: The Travis CI build passed
@berlic berlic mentioned this pull request Jul 22, 2019