Formatting Emoji Symbol messages in Azure Devops with Windows agent #455

Closed
sasi143 opened this issue Sep 7, 2022 · 7 comments
Labels
bug Something isn't working

Comments

@sasi143

sasi143 commented Sep 7, 2022

Expected Behavior

The command below fails when run in Azure DevOps on a Windows agent (it works fine on a Linux agent):
dbx deploy --deployment-file config/adb_deployment.yaml --workflow training-pipeline

I suspect the emoji (the snake symbol) in the log output is causing this, since the Windows system is unable to encode it.

Current Behavior

   16
   17 class IncrementalEncoder(codecs.IncrementalEncoder):
   18     def encode(self, input, final=False):
 > 19         return codecs.charmap_encode(input,self.errors,encoding_table
   20
   21 class IncrementalDecoder(codecs.IncrementalDecoder):
   22     def decode(self, input, final=False):

 locals:
   final = False
   input = '[dbx][2022-09-07 07:39:26.949] \U0001f40d Building a Python-based project\r\n'
   self = <encodings.cp1252.IncrementalEncoder object at 0x0000021BF10E9160>

UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f40d' in position 31: character maps to <undefined>

Steps to Reproduce (for bugs)

Running Latest dbx==0.7.4 package in Azure DevOps with Windows Agent

Your Environment

Azure DevOps with Windows Latest Agent

  • dbx version used: 0.7.4
  • Databricks Runtime version: 10.4.x-scala2.12
@renardeinside renardeinside added the bug Something isn't working label Sep 7, 2022
@sasi143
Author

sasi143 commented Sep 7, 2022

In testing, versions up to dbx==0.6.12 work well. The issue starts with dbx==0.7.0 and persists through the latest version (0.7.4).

@NodeJSmith

NodeJSmith commented Nov 6, 2022

Fwiw, I've seen this issue in Azure Devops pipelines with windows agents in other OSS, notably Prefect. I ended up monkey patching their code to remove the offending character before it was printed. Here is the offending line in their code, also a unicode character.
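
A workaround of that kind can be sketched as follows. This is not the actual Prefect patch, just a minimal illustration; the helper name `strip_unencodable` is hypothetical, and cp1252 is assumed as the target encoding because that is what the traceback above shows.

```python
def strip_unencodable(text, encoding="cp1252"):
    """Drop every character that the target encoding cannot represent."""
    # errors="ignore" silently discards characters missing from the code page,
    # so the round trip leaves only text that is safe to print.
    return text.encode(encoding, errors="ignore").decode(encoding)


message = "[dbx] \U0001f40d Building a Python-based project"
cleaned = strip_unencodable(message)
print(cleaned.encode("cp1252"))  # encodes without raising
```

The trade-off is that the character is lost rather than displayed, which is why the pipeline-level fixes discussed later in this thread are usually preferable.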

@renardeinside
Contributor

I'm not sure what's the root cause of this. Linking the ticket in the relevant library.

@NodeJSmith

I know a ticket was opened on Typer's GH, but I just thought it was worth pointing this out - this same bug happens in the build pipeline for this repo. It just gets handled better so it doesn't cause the pipeline to fail.


@NodeJSmith

I've been chasing this down all week and have determined that:

a) this is a Windows-only problem, and it mainly shows up on Windows agents (as opposed to a typical Python user's workstation) because commands run by the Windows agent have their output redirected to a file. This affects the Windows agents used by Azure DevOps and GitHub Actions; I can't speak to any other CI tool's images.

b) there are a few solutions you can use to resolve this, but they all have to happen in the pipeline, and cannot be implemented by libraries (so far as I can tell, at least)

  • You can add the environment variable 'PYTHONUTF8' to your pipeline with the value of 1
  • You can add the environment variable 'PYTHONIOENCODING' to your pipeline with the value of 'utf8'
  • You can call python/python.exe with the argument -X utf8 before your script (e.g. python -X utf8 ./path/to/python_file.py)

Note: If using an environment variable you will need to set this on the pipeline itself, not in a run command.

Azure DevOps example:

variables:
- name: PYTHONUTF8
  value: 1

Github Actions example:

env:
  PYTHONIOENCODING: "utf8"
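
To check what these settings do, the sketch below launches a child interpreter with its stdout piped (similar to how an agent captures output) and reports the encoding the child picks for stdout. The helper name `child_stdout_encoding` is made up for illustration, and the default result depends on the machine's locale, so treat it as a quick diagnostic rather than a definitive test.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(extra_env=None, interpreter_args=()):
    """Run a child Python with piped stdout and return its stdout encoding."""
    # Start from the current environment, minus the two variables under test,
    # so a value already set on this machine cannot mask the result.
    env = {
        key: value
        for key, value in os.environ.items()
        if key not in ("PYTHONUTF8", "PYTHONIOENCODING")
    }
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c",
         "import sys; print(sys.stdout.encoding)"],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    # Normalize spellings such as "utf8" and "UTF-8" to one canonical name.
    return codecs.lookup(result.stdout.decode("ascii").strip()).name


print("default:              ", child_stdout_encoding())
print("PYTHONUTF8=1:         ", child_stdout_encoding({"PYTHONUTF8": "1"}))
print("PYTHONIOENCODING=utf8:", child_stdout_encoding({"PYTHONIOENCODING": "utf8"}))
print("-X utf8:              ", child_stdout_encoding(interpreter_args=("-X", "utf8")))
```

On a Windows machine with a legacy code page, the first line would be expected to differ from the other three.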

Quick explanation

This is caused by the Windows agent running commands in a way that pipes their output to a file. You can trigger the same error in your own terminal (if you have a Windows machine) by running a Python command or a Python library's CLI and using > to redirect the output to a file.

PS C:\Users\UserName> python -c "print('└')"
└

PS C:\Users\UserName> python -c "print('└')" > test_file.txt
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2514' in position 0: character maps to <undefined>

Because the output is sent to a file and not to the Windows console, none of the tricks that libraries like Click, Typer, and Rich employ to print Unicode on Windows consoles apply. And because the file that the output is redirected to is opened outside of user control, you cannot specify an encoding of UTF8 to resolve this. The file is opened using the preferred locale encoding (locale.getpreferredencoding(False)), which is usually not a Unicode-compatible code page. For the hosted Windows agents it is cp1252, which is why the error messages show something similar to File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode.
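
The same failure can be reproduced on any platform by wrapping a byte buffer in a text stream that uses cp1252. This is only a rough simulation: the `BytesIO` buffer stands in for the file the agent redirects output to, and the message is modeled on the dbx log line from the original report.

```python
import io

message = "[dbx] \U0001f40d Building a Python-based project"

# Simulate redirected stdout opened with the legacy Windows code page.
cp1252_raw = io.BytesIO()
cp1252_stream = io.TextIOWrapper(cp1252_raw, encoding="cp1252")

try:
    cp1252_stream.write(message)
    cp1252_stream.flush()
    failed = False
except UnicodeEncodeError as exc:
    # cp1252 has no mapping for the snake emoji, so encoding fails.
    failed = True
    print("cp1252 stream failed:", exc.reason)

# The same message is written without error once the stream uses UTF-8.
utf8_raw = io.BytesIO()
utf8_stream = io.TextIOWrapper(utf8_raw, encoding="utf-8")
utf8_stream.write(message)
utf8_stream.flush()
print(len(utf8_raw.getvalue()), "bytes written as UTF-8")
```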

This could be resolved if the Windows agents did not redirect the output of commands to files or if Windows had a default code page that was unicode compatible. I believe the latter may be happening in Windows 11, although it seems to be introducing problems of its own.

@sasi143
Author

sasi143 commented Dec 9, 2022

Thanks a lot, @NodeJSmith & Kevin Deldyck . Your solution helped me a lot by adding the below in the pipeline task
env:
  PYTHONIOENCODING: "utf8"

@sasi143 sasi143 closed this as completed Dec 9, 2022
@truonghoanglam

truonghoanglam commented Jan 16, 2023

Thanks a lot, @NodeJSmith & Kevin Deldyck . Your solution helped me a lot by adding the below in the pipeline task env: PYTHONIOENCODING: "utf8"

Hi @sasi143, can you please show me how to add this to a task in Prefect? I don't know how to add that.
