
unicode error when view source file which is not utf-8 encoded #292

Open
kuna opened this issue Sep 10, 2019 · 3 comments

kuna commented Sep 10, 2019

I encountered this issue when I attempted to open a CP949-encoded file. I think the same issue would occur with Shift-JIS-encoded files, and it can easily be reproduced.

Output:
'utf-8' codec can't decode byte 0xc0 in position ~
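Since the error is easy to reproduce, here is a minimal sketch of that, assuming Python 3. It writes some illustrative Korean text to a temporary file encoded as CP949 (the sample string and the helper names are hypothetical, not taken from gdbgui), then reads it back with open() and an explicit UTF-8 encoding, which fails with the same kind of error, while reading with cp949 succeeds.

```python
import os
import tempfile

# Illustrative Korean sample text (hypothetical; any Hangul text will do).
SAMPLE = "안녕하세요, 세계"


def write_cp949_file() -> str:
    """Write SAMPLE to a temporary file encoded as CP949 and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(SAMPLE.encode("cp949"))
    return path


def try_read(path: str, encoding: str) -> str:
    """Read the whole file as text using the given encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


if __name__ == "__main__":
    path = write_cp949_file()
    try:
        try_read(path, "utf-8")
    except UnicodeDecodeError as exc:
        # Same class of error as reported: CP949 bytes are not valid UTF-8.
        print("utf-8 failed:", exc)
    print("cp949 ok:", try_read(path, "cp949"))
    os.remove(path)
```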

The problem is in the backend.py:read_file method. It reads the source file with a plain open() call and no encoding argument, so Python decodes the file with its default encoding, which is UTF-8 here. The exception occurs when the file is not valid UTF-8. I think codecs.open with the correct encoding option is needed. The correct encoding should either be passed as a gdbgui parameter or taken from a session environment variable.

As a workaround, I modified line 689 of backend.py to take the encoding from my environment variable, and it now works without problems.

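...
# Take the encoding from the LC_ALL environment variable, defaulting to UTF-8.
# Note: this assumes LC_ALL holds a bare codec name (e.g. "cp949"); a full locale
# name such as "ko_KR.cp949" is not a valid codec name and raises LookupError.
sys_enc = os.getenv('LC_ALL', 'utf-8')
with codecs.open(path, "r", encoding=sys_enc) as f:
...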
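A slightly more defensive variant of this patch could resolve the encoding in order: an explicit setting (for example, a gdbgui option, if one were added), then the codeset part of LC_ALL, LC_CTYPE or LANG, then UTF-8. This is a sketch under those assumptions; read_file's signature here and the helpers codeset_from_locale and resolve_encoding are hypothetical, not gdbgui's actual code. It also uses the built-in open(), which accepts encoding= directly in Python 3, so codecs.open is not required.

```python
import codecs
import os
from typing import Mapping, Optional


def codeset_from_locale(value: str) -> Optional[str]:
    """Return the normalized codec name in a locale string, or None.

    "ko_KR.cp949" -> "cp949", "ja_JP.SJIS" -> "shift_jis". A bare codec
    name such as "cp949" is also accepted.
    """
    name = value.split("@", 1)[0]  # drop an optional @modifier
    candidate = name.split(".", 1)[1] if "." in name else name
    if not candidate:
        return None
    try:
        return codecs.lookup(candidate).name
    except LookupError:  # e.g. "ko_KR" or "C": no usable codeset
        return None


def resolve_encoding(explicit: Optional[str] = None,
                     environ: Optional[Mapping[str, str]] = None) -> str:
    """Choose the source-file encoding: explicit setting, then locale variables, then UTF-8."""
    if explicit:
        # Validate eagerly so a typo in the setting fails loudly.
        return codecs.lookup(explicit).name
    env = os.environ if environ is None else environ
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            # The first non-empty variable decides; without a usable codeset
            # fall back to UTF-8 (a simplification of real locale rules).
            return codeset_from_locale(value) or "utf-8"
    return "utf-8"


def read_file(path: str, encoding: str) -> str:
    """Read a source file as text using the resolved encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()
```

For instance, resolve_encoding(environ={"LC_ALL": "ko_KR.cp949"}) returns "cp949", whereas passing that raw LC_ALL value straight to codecs.open would raise LookupError.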

Environment:
Ubuntu 14.04
gdbgui 0.13.2.0 (installed from pip)
gdb 8.2
Firefox 66.0.3

Thanks.

P.S. I fixed the error message, as the previous one was incorrect.

kuna changed the title from "UnicodeDecodeError when view source file which is not utf-8 encoded" to "unicode error when view source file which is not utf-8 encoded" on Sep 10, 2019
ruoruo220 commented

Hi kuna, I'm facing the same problem. I tried to follow your solution, but I couldn't find backend.py. When you have time, could you give me some advice? Thanks!


GitMensch commented Dec 30, 2021

It looks like you are searching for this line:

with open(path, "r") as f:

Just out of interest: what is the output of locale?
According to https://docs.python.org/3.9/library/functions.html#open, when no encoding is given Python uses the preferred user encoding (https://docs.python.org/3.9/library/locale.html#locale.getpreferredencoding), so setting up LANG and friends should help, shouldn't it?
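This question can be checked directly on a given machine. A small diagnostic sketch, assuming Python 3.7 or later (the script and function names are made up for illustration), prints the encoding that open() actually picks when encoding= is omitted, next to locale.getpreferredencoding(False) and whether Python's UTF-8 mode is on. Running it as, say, LANG=ko_KR.cp949 python3 check_encoding.py shows whether a particular LANG value really changes the default.

```python
import codecs
import locale
import os
import sys
import tempfile


def preferred_encoding() -> str:
    """Normalized name of the locale's preferred encoding (per the 3.9 docs, open()'s default)."""
    return codecs.lookup(locale.getpreferredencoding(False)).name


def open_default_encoding() -> str:
    """Normalized name of the encoding open() picks when encoding= is omitted."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "r") as f:  # no encoding argument, as in backend.py
            return codecs.lookup(f.encoding).name
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("LANG           =", os.environ.get("LANG"))
    print("preferred      =", preferred_encoding())
    print("open() default =", open_default_encoding())
    print("UTF-8 mode     =", bool(sys.flags.utf8_mode))
```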

It looks like #347 is related to this.


kuna commented Mar 19, 2022

Hi @GitMensch,
Your suggestion interested me, so I ran a quick test:

LANG=ko_KR.cp949 python test.py test_cp949.txt
ko_KR.cp949
Traceback (most recent call last):
  File "/Users/dongwon/dev/test.py", line 8, in <module>
    for l in f.readlines():
  File "/Users/dongwon/.pyenv/versions/3.10.2/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 18: invalid start byte

LANG=cp949 didn't work either, so LANG does not seem to be effective in this case.
Also, from the 'locale' documentation, it seems the encoding is always set to UTF-8, so that wouldn't help.
The methods in this link (a Korean-language document) might work, but either way the code should be changed as proposed.
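If the code is changed, one option besides a single configured encoding is to try a short list of candidates: UTF-8 first, then configured legacy encodings such as cp949 or shift_jis, and finally UTF-8 with errors="replace" so the viewer shows replacement characters instead of failing. This is a heuristic sketch, not what gdbgui does; decode_source, read_source and the candidate list are hypothetical. Legacy multibyte encodings overlap, so a Shift-JIS file can decode "successfully" as CP949 into the wrong characters, which is why the candidate order should come from user configuration rather than guesswork.

```python
from typing import Iterable, Tuple


def decode_source(data: bytes, candidates: Iterable[str] = ("utf-8",)) -> Tuple[str, str]:
    """Decode raw file bytes, trying each candidate encoding in order.

    Returns (text, encoding_used). If every candidate fails, falls back to
    UTF-8 with replacement characters so the caller never gets a decode error.
    An unknown encoding name still raises LookupError (a configuration bug).
    """
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    return data.decode("utf-8", errors="replace"), "utf-8 (replaced)"


def read_source(path: str, candidates: Iterable[str]) -> Tuple[str, str]:
    """Read a file in binary mode and decode it with decode_source."""
    with open(path, "rb") as f:
        return decode_source(f.read(), candidates)
```

Usage would look like text, used = read_source("main.c", ("utf-8", "cp949")), where main.c stands for any source file shown in the viewer.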

kubouch added a commit to kubouch/gdbgui that referenced this issue Mar 21, 2022
This PR implements a fix suggested in cs01#407.

There are also other UTF-8 related issues:
cs01#130
cs01#347
cs01#292
cs01 pushed a commit that referenced this issue Oct 18, 2023
This PR implements a fix suggested in #407.

There are also other UTF-8 related issues:
#130
#347
#292