Consider using UTF-8 by default for Azure CLI #28497

Open
doggy8088 opened this issue Mar 2, 2024 · 11 comments
Labels: Account, Auto-Assign, Azure CLI Team, customer-reported, question
Milestone: Backlog


doggy8088 commented Mar 2, 2024

Describe the bug

A bug was reported to me on Stack Overflow: https://stackoverflow.com/q/78008939/910074

When I use UTF-8 as my default console output encoding ([Console]::OutputEncoding), Azure CLI is unable to handle Chinese characters because of an encoding issue. The Chinese characters are either missing or garbled.

Related command

$(az account list -o json)

az account list -o json | jq '.'

Errors

[Screenshot]

Issue script & Debug output

It's an encoding issue.

Expected behavior

I expected Azure CLI to handle Chinese characters correctly.

Environment Summary

azure-cli 2.57.0

core 2.57.0
telemetry 1.1.0

Extensions:
account 0.2.3
azure-devops 0.25.0
front-door 1.0.16
interactive 0.4.5
k8s-extension 1.2.4
managementpartner 0.1.3

Dependencies:
msal 1.26.0
azure-mgmt-resource 23.1.0b2

Python location 'C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe'
Extensions directory 'C:\Users\wakau\.azure\cliextensions'

Python (Windows) 3.11.7 (tags/v3.11.7:fa7a6f2, Dec 4 2023, 19:24:49) [MSC v.1937 64 bit (AMD64)]

Legal docs and information: aka.ms/AzureCliLegal

Your CLI is up-to-date.

Additional context

For now, I have a workaround: edit the C:\Program Files\Microsoft SDKs\Azure\CLI2\wbin\az.cmd file and add -X utf8 to the python arguments:

::
:: Microsoft Azure CLI - Windows Installer - Author file components script
:: Copyright (C) Microsoft Corporation. All Rights Reserved.
::

@IF EXIST "%~dp0\..\python.exe" (
  SET AZ_INSTALLER=MSI
  "%~dp0\..\python.exe" -X utf8 -IBm azure.cli %*
) ELSE (
  echo Failed to load python executable.
  exit /b 1
)
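Outside of az.cmd, the effect of -X utf8 can be checked with any Python 3.7+ interpreter. The following is a minimal sketch, not part of Azure CLI: `sys.executable` stands in for the bundled python.exe, and `run_child` and `PRINT_CHINESE` are hypothetical names. It starts a child interpreter whose stdout is a pipe, as with `>` or `|`, and inspects the raw bytes it writes.

```python
import subprocess
import sys


def run_child(code: str, utf8_mode: bool) -> bytes:
    """Run `code` in a fresh interpreter whose stdout is a pipe; return the raw bytes.

    -I (isolated mode, which az.cmd also passes) makes the child ignore PYTHON*
    environment variables such as PYTHONIOENCODING, so only -X utf8 matters here.
    """
    args = [sys.executable, "-I"]
    if utf8_mode:
        args += ["-X", "utf8"]
    args += ["-c", code]
    return subprocess.run(args, stdout=subprocess.PIPE, check=True).stdout


# ascii() escapes the characters, so the -c argument itself stays pure ASCII.
PRINT_CHINESE = f"print({ascii('测试測試')})"

raw = run_child(PRINT_CHINESE, utf8_mode=True)
# The pipe carries UTF-8 bytes, whatever the ANSI code page is.
print(raw.decode("utf-8").strip() == "测试測試")
# The child reports the encoding it chose for its redirected stdout.
print(run_child("import sys; print(sys.stdout.encoding)", utf8_mode=True))
```

Running the same child without -X utf8 on a Windows machine whose ANSI code page cannot represent these characters is expected to fail instead, which is the situation the workaround avoids.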
doggy8088 added the bug label Mar 2, 2024
yonzhan (Collaborator) commented Mar 2, 2024

Thank you for opening this issue. We will look into it.

microsoft-github-policy-service bot added the customer-reported, Auto-Assign, and ARM labels Mar 2, 2024
microsoft-github-policy-service bot added the Azure CLI Team and question labels Mar 2, 2024
yonzhan added this to the Backlog milestone Mar 3, 2024
yonzhan removed the bug label Mar 3, 2024
yonzhan added the Account label and removed the ARM label Mar 4, 2024
jiasli (Member) commented Mar 4, 2024

I am able to repro with the latest PowerShell 7.4.1. My system locale is English (United States):

[Screenshot]

Printing to console is fine:

> az group show -n testrg
{
  ...
  "tags": {
    ...
    "key1": "测试"
  },
  ...
}

But a warning is shown when redirecting:

> az group show -n testrg > out.txt
WARNING: Unable to encode the output with cp1252 encoding. Unsupported characters are discarded.

(Actually, I wrote that warning in microsoft/knack#178.)
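The warning means the affected characters are dropped rather than written. A minimal sketch of that behavior, assuming a strategy similar to retrying with `errors="ignore"` after a strict encode fails (knack's actual code may differ; `encode_for_stream` is a hypothetical name):

```python
def encode_for_stream(text: str, encoding: str) -> tuple[bytes, bool]:
    """Encode text for an output stream, discarding unsupported characters.

    Returns the encoded bytes and whether anything was discarded, i.e. the
    situation in which a warning like the one above would be shown.
    """
    try:
        return text.encode(encoding), False
    except UnicodeEncodeError:
        # Drop every character the code page cannot represent.
        return text.encode(encoding, errors="ignore"), True


line = '"key1": "测试"'
print(encode_for_stream(line, "cp1252"))  # (b'"key1": ""', True)
print(encode_for_stream(line, "utf-8")[1])  # False: UTF-8 can encode everything
```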

According to https://docs.python.org/3/library/sys.html#sys.stdout:

sys.stdout
Non-character devices such as disk files and pipes use the system locale encoding (i.e. the ANSI codepage).

So changing the console's encoding with [Console]::OutputEncoding = [Text.UTF8Encoding]::new() won't affect the encoding Python uses when its output is redirected to a file or pipe.
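To see which case applies on a given machine, a small diagnostic sketch can report whether stdout is a console and which encodings Python picked (`describe_stdout` is a hypothetical helper). Per the documentation quoted above, on Windows without UTF-8 mode a redirected stdout would be expected to report the ANSI code page, while a console reports utf-8.

```python
import locale
import sys


def describe_stdout() -> dict:
    """Report the facts that decide how print() encodes its output."""
    return {
        # False when stdout is redirected to a file or pipe.
        "isatty": sys.stdout.isatty(),
        # The encoding print() actually uses for sys.stdout.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # The locale encoding (the ANSI code page on Windows, unless UTF-8 mode is on).
        "locale_encoding": locale.getpreferredencoding(False),
        # True when running in UTF-8 mode, e.g. under `-X utf8`.
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


if __name__ == "__main__":
    # Report on stderr so that redirecting stdout (e.g. `> out.txt`) doesn't hide it.
    print(describe_stdout(), file=sys.stderr)
```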

I would recommend changing your system encoding to UTF-8 (follow microsoft/knack#178), so that you won't need to modify the az.cmd entry script every time you update Azure CLI.

Also see: python/cpython#74595

doggy8088 (Author) commented:

Changing the system encoding to UTF-8 is not an option for most people using a non-English locale.

jiasli (Member) commented Mar 4, 2024

Changing the system encoding to UTF-8 is not an option for most people using a non-English locale.

Can you explain why? My personal desktop computer uses UTF-8, and I need to display Chinese (Simplified, China).

[Screenshot]

jiasli (Member) commented Mar 4, 2024

I can confirm that Windows PowerShell 5.1 can't handle UTF-8 correctly:

> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      5.1.22621.2506
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.22621.2506
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1

> [Console]::OutputEncoding

IsSingleByte      : True
BodyName          : IBM437
EncodingName      : OEM United States
HeaderName        : IBM437
WebName           : IBM437
WindowsCodePage   : 1252
IsBrowserDisplay  : False
IsBrowserSave     : False
IsMailNewsDisplay : False
IsMailNewsSave    : False
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : False
CodePage          : 437

> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -c "print('测试測試')" > out.txt ; Get-Content out.txt
测试測試

This can be fixed by setting [Console]::OutputEncoding = [Text.UTF8Encoding]::new():

> [Console]::OutputEncoding = [Text.UTF8Encoding]::new()

> [Console]::OutputEncoding

BodyName          : utf-8
EncodingName      : Unicode (UTF-8)
HeaderName        : utf-8
WebName           : utf-8
WindowsCodePage   : 1200
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : False
CodePage          : 65001

> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -c "print('测试測試')" > out.txt ; Get-Content out.txt
测试測試
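Garbled output of the kind described in the original report appears when bytes are decoded with a different encoding than the one they were written in. A rough model of the Windows PowerShell 5.1 case, assuming the host decodes the child's UTF-8 output using [Console]::OutputEncoding (IBM437 above) before saving it; this simplifies the host's real pipeline, and `reinterpret` is a hypothetical helper:

```python
def reinterpret(raw: bytes, assumed_encoding: str) -> str:
    """Decode raw bytes the way a host that assumes `assumed_encoding` would."""
    return raw.decode(assumed_encoding, errors="replace")


original = "测试測試"
raw = original.encode("utf-8")  # what `python -X utf8` writes: 3 bytes per character here

print(reinterpret(raw, "utf-8") == original)  # True: host and child agree on UTF-8
garbled = reinterpret(raw, "cp437")  # host assumes OEM United States (code page 437)
print(garbled == original, len(garbled))  # False 12: each byte became its own character
```

In this model, setting [Console]::OutputEncoding to UTF-8 corresponds to making the host decode with "utf-8", so the round trip is lossless.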

https://stackoverflow.com/a/78023334/2199657 mentions that PowerShell 7.4 no longer interprets redirected data.

https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_redirection?view=powershell-7.4#redirecting-output-from-native-commands

PowerShell 7.4 changed the behavior of the redirection operators when used to redirect the stdout stream of a native command. The redirection operators now preserve the byte-stream data when redirecting output from a native command. PowerShell doesn't interpret the redirected data or add any additional formatting.

Simply calling python -X utf8 will work:

> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -c "print('测试測試')" > out.txt ; Get-Content out.txt
测试測試

The same approach can be used to call Azure CLI:

> & "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe" -X utf8 -IBm azure.cli group show -n testrg > out.txt ; Get-Content out.txt
{
  ...
  "tags": {
    ...
    "key1": "测试測試"
  },
  ...
}

jiasli (Member) commented Mar 4, 2024

Wait. You are already using cp950, which is big5: "ANSI/OEM Traditional Chinese (Taiwan; Hong Kong SAR, PRC); Chinese Traditional (Big5)" according to https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers. So I guess you are trying to handle characters that are not in cp950. Which Chinese characters are causing the problem?
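Which characters a code page can represent is easy to check directly. A minimal sketch, using Python's cp950 codec to stand for the Big5 code page (`unencodable` is a hypothetical helper):

```python
def unencodable(text: str, encoding: str) -> list[str]:
    """Return the characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Simplified 测试 lies outside Big5, while Traditional 測試 is inside it.
missing = unencodable("测试測試", "cp950")
print(len(missing), missing == ["测", "试"])  # 2 True
```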

doggy8088 (Author) commented:

cp950 works fine for me in both Windows PowerShell and PowerShell 7+.

It's because I use Oh-My-Posh in PowerShell inside Windows Terminal, so I have to use UTF-8 in the console. That's why I need az.cmd to output UTF-8 by default.

jiasli (Member) commented Mar 5, 2024

It's because I use Oh-My-Posh in PowerShell inside Windows Terminal, so I have to use UTF-8 in the console.

I don't understand the relationship between Oh-My-Posh and encoding. Could you give more context on this? I don't think Oh-My-Posh is what causes the encoding error. Which Chinese characters are causing the problem?

doggy8088 (Author) commented:

It doesn't matter which Chinese characters they are. All Chinese characters are stripped from the output.

To clarify: Oh-My-Posh can use special Unicode font glyphs to display symbols in the prompt, like this:

[Screenshot]

So my console output encoding must be UTF-8. That's why I don't set cp950 for the console.

jiasli (Member) commented Mar 7, 2024

I don't think this has anything to do with Oh-My-Posh when redirection is involved. Without redirection, as with a plain az account list, the output is indeed in UTF-8.

https://docs.python.org/3/library/sys.html#sys.stdout

On Windows, UTF-8 is used for the console device.

> python -c "import sys; print(sys.stdout.encoding)"
utf-8

In your original screenshot, Azure CLI is trying to encode its output with cp950, but certain characters can't be encoded by cp950, so they are reported as "unsupported":

[Screenshot]

You can also reproduce this issue with plain Python, without Azure CLI:

> python -c "print('测试')" > out.txt
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\jiasli\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to <undefined>

> python -c "import sys; print(sys.stdout.encoding)" > out.txt ; Get-Content out.txt
cp1252
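The same failure can be reproduced in-process by wrapping an in-memory byte buffer in a text stream bound to the code page a redirected stdout would get. A minimal sketch (`write_like_redirected_stdout` is a hypothetical helper; `newline=""` only disables newline translation so the bytes are identical on every platform):

```python
import io


def write_like_redirected_stdout(text: str, encoding: str) -> bytes:
    """Write a line through a text stream bound to `encoding` and return the bytes.

    sys.stdout is also a text stream; when it is redirected on Windows it uses
    the ANSI code page with the default 'strict' error handler, as the
    traceback above shows.
    """
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    stream.write(text + "\n")
    stream.flush()
    return buffer.getvalue()


print(write_like_redirected_stdout("test", "cp1252"))  # b'test\n'
try:
    write_like_redirected_stdout("测试", "cp1252")
except UnicodeEncodeError as exc:
    # Same failure as the traceback above: neither character maps to cp1252.
    print(exc.encoding, exc.start, exc.end)  # charmap 0 2
```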

doggy8088 (Author) commented Mar 7, 2024

Here is my test:

[Screenshot]
