Fix unicode argument processing for py2 #679
In Python 2, sys.argv is a list of bytestrings in whatever encoding the terminal uses. In Python 3, sys.argv is a list of unicode strings. This causes problems because the rest of the code assumes unicode. The fix is to automatically decode to unicode, based on the encoding of sys.stdin, as soon as we parse the args. This was originally reported in aws#593 and boto/botocore#218. I'll need to do more investigation to see whether this problem applies to JSON files loaded via file://; this commit only fixes the case where unicode is specified on the command line.
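For illustration, the decode step described above might look roughly like the sketch below. This is not the code from this PR: `decode_args` is a hypothetical helper name, and the fallback to UTF-8 is an assumption. It decodes only bytestring arguments and leaves values that are already unicode untouched, so the same function behaves sensibly on both Python 2 and Python 3.

```python
import sys


def decode_args(args, encoding=None):
    """Return a copy of args with any bytestrings decoded to unicode.

    Hypothetical helper for illustration only; not part of this PR.
    """
    if encoding is None:
        # Assumption: fall back to UTF-8 when sys.stdin has no usable
        # encoding (attribute missing, or present but set to None).
        encoding = getattr(sys.stdin, 'encoding', None) or 'utf-8'
    decoded = []
    for arg in args:
        # On Python 2, `bytes` is an alias for `str`, which is what
        # sys.argv contains there. On Python 3, sys.argv is already text.
        if isinstance(arg, bytes):
            arg = arg.decode(encoding)
        decoded.append(arg)
    return decoded
```

Passing the encoding explicitly, as in `decode_args([b'caf\xc3\xa9'], encoding='utf-8')`, keeps the behaviour independent of whatever `sys.stdin` happens to be in a test run.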
This happens in our unittest.
@@ -43,6 +45,19 @@ def _check_value(self, action, value):
                 msg.extend(extra)
             raise argparse.ArgumentError(action, '\n'.join(msg))
 
+
+    def parse_known_args(self, args, namespace=None):
+        parsed, remaining = super(CLIArgParser, self).parse_known_args(args, namespace)
+        terminal_encoding = getattr(sys.stdin, 'encoding', 'utf-8')
Do you really want a default value of utf-8 here? I guess I'm not sure what it would mean for sys.stdin not to have an encoding attribute.
I've only encountered it when something patches out sys.stdin with an object that is file-like enough to work but is missing attributes such as encoding. For example, our test runner will do something like:
original = sys.stdin
sys.stdin = cStringIO.StringIO()
try:
    ...
finally:
    sys.stdin = original
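A small sketch of why the default matters in that situation. `FakeStdin` and `terminal_encoding` are hypothetical names used only for this example. It also guards against a related case worth checking: the `encoding` attribute can exist but be `None` (on Python 2 this happens when stdin is not a terminal), in which case `getattr(sys.stdin, 'encoding', 'utf-8')` would return `None` rather than the default.

```python
import sys


class FakeStdin(object):
    """Hypothetical stand-in: file-like, but with no `encoding` attribute."""

    def read(self):
        return ''


def terminal_encoding(default='utf-8'):
    # `or default` covers both a missing attribute and one set to None.
    return getattr(sys.stdin, 'encoding', None) or default


original = sys.stdin
sys.stdin = FakeStdin()
try:
    # With the patched stdin there is no encoding to read, so the
    # default is used instead of raising AttributeError.
    patched_encoding = terminal_encoding()
finally:
    sys.stdin = original
```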
Otherwise, LGTM FWIW
LGTM.
Before:
After: