Fix unicode argument processing for py2 #679

Merged: 4 commits, Mar 3, 2014

jamesls (Member) commented Feb 28, 2014

In Python 2, sys.argv is a list of byte strings in whatever encoding
the terminal uses. In Python 3, sys.argv is a list of unicode
strings. This causes problems because the rest of the code assumes
unicode.

The fix is to automatically decode to unicode based on sys.stdin
as soon as we parse the args.
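
A minimal sketch of that decoding step, assuming the parsed values arrive as byte strings (as they do on Python 2) and that the terminal encoding is already known. `decode_arg_values` is a hypothetical helper for illustration, not the code in this PR:

```python
def decode_arg_values(values, encoding):
    """Return a new list with any byte strings decoded to text.

    Values that are already text are passed through unchanged.
    """
    decoded = []
    for value in values:
        if isinstance(value, bytes):
            value = value.decode(encoding)
        decoded.append(value)
    return decoded


# The check mark from the example below, as the UTF-8 bytes a terminal sends.
raw_args = [b'--application-name', b'\xe2\x9c\x93']
text_args = decode_arg_values(raw_args, 'utf-8')
```

Decoding once, right after parsing, means the rest of the code only ever sees text and never falls back to the implicit 'ascii' codec.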

This was originally reported in #593, and
boto/botocore#218.

I'll need to do more investigation to see if this problem applies
to JSON files via file://. This commit only fixes the case where
unicode is specified on the command line.

Before:

$ aws elasticbeanstalk create-application --application-name ✓

'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

After:

$ aws elasticbeanstalk create-application --application-name ✓
{
    "Application": {
        "ApplicationName": "\u00e2\u009c\u0093",
        "ConfigurationTemplates": [],
        "DateUpdated": "2014-02-28T21:39:52.988Z",
        "Versions": [],
        "DateCreated": "2014-02-28T21:39:52.988Z"
    }
}

This happens in our unittest.
@@ -43,6 +45,19 @@ def _check_value(self, action, value):
                msg.extend(extra)
            raise argparse.ArgumentError(action, '\n'.join(msg))

    def parse_known_args(self, args, namespace=None):
        parsed, remaining = super(CLIArgParser, self).parse_known_args(args, namespace)
        terminal_encoding = getattr(sys.stdin, 'encoding', 'utf-8')
Contributor

Do you really want a default value of utf-8 here? I guess I'm not sure what it would mean for sys.stdin not to have an encoding attribute.

Member Author

I've only encountered it when something patches out sys.stdin, and is file-like enough to work, but is missing attributes such as encoding. For example, our test runner will do something like:

original = sys.stdin
sys.stdin = cStringIO.StringIO()
try:
  ...
finally:
  sys.stdin = original
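
A rough sketch of why the default matters, assuming a stand-in stream like the one the test runner installs. `FakeStdin`, `NoneEncodingStdin`, and `terminal_encoding` are hypothetical names. The `or default` part is an addition that is not in the diff: it also covers a stream whose `encoding` attribute exists but is `None`, which is what Python 2 reports when stdin is piped rather than attached to a terminal.

```python
class FakeStdin(object):
    """Hypothetical patched sys.stdin: file-like, but has no `encoding`."""

    def read(self):
        return ''


class NoneEncodingStdin(object):
    """Hypothetical stream whose `encoding` attribute exists but is None."""

    encoding = None


def terminal_encoding(stream, default='utf-8'):
    # getattr handles a missing attribute; `or` handles an attribute set to None.
    return getattr(stream, 'encoding', None) or default


missing = terminal_encoding(FakeStdin())
unset = terminal_encoding(NoneEncodingStdin())
```

Both calls fall back to the default instead of raising AttributeError or passing None on to a later decode call.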

garnaat (Contributor) commented Mar 2, 2014

Otherwise, LGTM FWIW

toastdriven (Contributor) commented

LGTM. :shipit:

@jamesls jamesls merged commit 0adc501 into aws:develop Mar 3, 2014
@jamesls jamesls deleted the unicode-args branch June 23, 2014 18:29
thoward-godaddy pushed a commit to thoward-godaddy/aws-cli that referenced this pull request Feb 12, 2022