UnicodeDecodeError: 'utf-8' codec can't decode byte #75

Closed
paulocoutinhox opened this issue Nov 17, 2023 · 8 comments · Fixed by #78

Comments

@paulocoutinhox

Hi,

I'm getting this error on render.com:

Traceback (most recent call last):
Nov 17 03:33:06 AM    File "kaktos.py", line 6, in <module>
Nov 17 03:33:06 AM      system.process_command()
Nov 17 03:33:06 AM    File "/opt/render/project/src/modules/system.py", line 122, in process_command
Nov 17 03:33:06 AM      run(command_params)
Nov 17 03:33:06 AM    File "/opt/render/project/src/modules/commands/build.py", line 12, in run
Nov 17 03:33:06 AM      system.build_pages()
Nov 17 03:33:06 AM    File "/opt/render/project/src/modules/system.py", line 85, in build_pages
Nov 17 03:33:06 AM      assets.build_js()
Nov 17 03:33:06 AM    File "/opt/render/project/src/modules/assets.py", line 44, in build_js
Nov 17 03:33:06 AM      b.write(minify_js(og.read()))
Nov 17 03:33:06 AM    File "/opt/render/project/src/modules/assets.py", line 21, in minify_js
Nov 17 03:33:06 AM      result = str(es5(babel_compile(str(code))["code"]))
Nov 17 03:33:06 AM    File "/opt/render/project/src/.venv/lib/python3.8/site-packages/dukpy/babel.py", line 13, in babel_compile
Nov 17 03:33:06 AM      return evaljs(
Nov 17 03:33:06 AM    File "/opt/render/project/src/.venv/lib/python3.8/site-packages/dukpy/evaljs.py", line 138, in evaljs
Nov 17 03:33:06 AM      return JSInterpreter().evaljs(code, **kwargs)
Nov 17 03:33:06 AM    File "/opt/render/project/src/.venv/lib/python3.8/site-packages/dukpy/evaljs.py", line 31, in __init__
Nov 17 03:33:06 AM      self._init_process()
Nov 17 03:33:06 AM    File "/opt/render/project/src/.venv/lib/python3.8/site-packages/dukpy/evaljs.py", line 87, in _init_process
Nov 17 03:33:06 AM      self.evaljs("process = {}; process.env = dukpy.environ", environ=dict(os.environ))
Nov 17 03:33:06 AM    File "/opt/render/project/src/.venv/lib/python3.8/site-packages/dukpy/evaljs.py", line 61, in evaljs
Nov 17 03:33:06 AM      return json.loads(res.decode('utf-8'))
Nov 17 03:33:06 AM  UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 6510: invalid continuation byte

Do you know what could be wrong?

@amol-
Owner

amol- commented Mar 8, 2024

Since this does not seem to have occurred again, I'll be closing this one unless someone has a JS snippet that can reproduce the issue.

@robinvandernoord
Contributor

I get the same error when running the example:

python3.11 -m venv venv
. venv/bin/activate
pip install dukpy
python
import dukpy
dukpy.typescript_compile("console.log('hi')") # or any other code
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/ts/venv/lib/python3.11/site-packages/dukpy/tsc.py", line 11, in typescript_compile
    return evaljs(
           ^^^^^^^
  File "/tmp/ts/venv/lib/python3.11/site-packages/dukpy/evaljs.py", line 138, in evaljs
    return JSInterpreter().evaljs(code, **kwargs)
           ^^^^^^^^^^^^^^^
  File "/tmp/ts/venv/lib/python3.11/site-packages/dukpy/evaljs.py", line 31, in __init__
    self._init_process()
  File "/tmp/ts/venv/lib/python3.11/site-packages/dukpy/evaljs.py", line 87, in _init_process
    self.evaljs("process = {}; process.env = dukpy.environ", environ=dict(os.environ))
  File "/tmp/ts/venv/lib/python3.11/site-packages/dukpy/evaljs.py", line 61, in evaljs
    return json.loads(res.decode('utf-8'))
                      ^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 3095: invalid continuation byte

Using the CLI gives the same error.

@amol-
Owner

amol- commented May 17, 2024

I'm unable to reproduce the issue locally. Is there anything specific to the system that might be influencing the encoding? Maybe the system locale isn't UTF-8 or something like that (even though that shouldn't matter)?

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dukpy
>>> dukpy.typescript_compile("console.log('hi')") # or any other code
"System.register([], function(exports_1) {\n    return {\n        setters:[],\n        execute: function() {\n            console.log('hi');\n        }\n    }\n});\n"
>>> 

@robinvandernoord
Contributor

I've tried it on another machine and it works there. Both machines are running Linux Mint (based on Ubuntu 22.04) with $LC_NAME = nl_NL.UTF-8.

I downloaded the repo and think I found out what is going wrong:
self.evaljs("process = {}; process.env = dukpy.environ", environ=dict(os.environ))

os.environ contains PS1. My Bash prompt (PS1) is pretty customized and contains an emoji on my desktop (to indicate which machine I'm on when running multiple shells over ssh), which should be valid UTF-8 but seems to be the cause of this issue anyway.

The emoji is represented in the res variable as
\xed\xa0\xbc\xed\xbf\xa0, but when I convert it to utf-8 bytes myself, it is
\xf0\x9f\x8f\xa0.
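
The two byte sequences above look like the same character in two encodings: the six-byte form matches what you get when each half of the UTF-16 surrogate pair for U+1F3E0 is encoded separately (often called CESU-8), while the four-byte form is standard UTF-8. The sketch below only illustrates that relationship with the standard library. It is not what dukpy does internally, and the helper names are made up for this example.

```python
# Bytes reported in `res` for the house emoji, and the expected UTF-8 form.
cesu8 = b"\xed\xa0\xbc\xed\xbf\xa0"
expected_utf8 = "\U0001F3E0".encode("utf-8")  # b"\xf0\x9f\x8f\xa0"


def strict_decode_fails(data: bytes) -> bool:
    """Return True if a strict UTF-8 decode rejects the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def decode_cesu8(data: bytes) -> str:
    """Hypothetical helper: decode bytes whose non-BMP characters are
    stored as separately encoded surrogate halves."""
    # Step 1: let the surrogate halves through as lone code points.
    with_surrogates = data.decode("utf-8", "surrogatepass")
    # Step 2: a UTF-16 round trip joins each pair into one character.
    return with_surrogates.encode("utf-16", "surrogatepass").decode("utf-16")


print(strict_decode_fails(cesu8))
print(decode_cesu8(cesu8).encode("utf-8") == expected_utf8)
```

Input that contains an unpaired surrogate would still fail in the final `decode("utf-16")` step, so this is a diagnostic aid rather than a general fix.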

I see there's a test function:

    def test_unicode(self):
        s = dukpy.evaljs("dukpy.c + 'A'", c="華")
        assert s == '華A'

If you change the Unicode character 華 to something like 🏠, I predict you'll get the same exception.
I tried to look at the code and saw some Unicode/encoding logic in duktape.c, but my C knowledge doesn't go nearly far enough to know what's happening in that file.
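
For anyone who wants to check whether their own environment is affected in the way described above, here is a minimal sketch that lists environment variables containing characters outside the Basic Multilingual Plane (such as most emoji). It assumes, based only on the PS1 observation in this comment, that such characters are the trigger. `non_bmp_entries` is a hypothetical helper, not part of dukpy.

```python
import os
from typing import Dict, Mapping


def non_bmp_entries(environ: Mapping[str, str]) -> Dict[str, str]:
    """Return the entries whose name or value contains a code point above U+FFFF."""
    return {
        name: value
        for name, value in environ.items()
        if any(ord(ch) > 0xFFFF for ch in name + value)
    }


if __name__ == "__main__":
    # Print only the variable names, to avoid dumping secrets from the environment.
    for name in non_bmp_entries(os.environ):
        print(name)
```

If a variable such as PS1 shows up, unsetting it before running dukpy is one way to test whether it is the cause.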

@amol-
Owner

amol- commented May 20, 2024

Thanks, this is helpful. I'll try to debug it as soon as I can.

@robinvandernoord
Contributor

I think I have (partly) solved the issue: #78
However, if you use an emoji in the code itself, the error still occurs, and I haven't been able to fix that yet. I think it happens somewhere in eval_string.

If you'd rather close my PR and debug it yourself, I understand, of course!

amol- closed this as completed in #78 May 29, 2024
@amol-
Owner

amol- commented Jun 4, 2024

@robinvandernoord @paulocoutinhox would you mind testing with https://github.com/amol-/dukpy/pull/79/files?
That might address the encoding issues.

@paulocoutinhox
Author

Can you create a new version/release?
