tikzplot raises UnicodeDecodeError and terminates process #51

Open
nyanpasu64 opened this issue Jun 10, 2019 · 3 comments

@nyanpasu64

nyanpasu64 commented Jun 10, 2019

  • The final while True loop should catch every exception, print its stack trace, and keep running. I know swallowing exceptions is generally a bad idea (arguably only when you don't print or log them and the program cannot possibly continue). But a single failing command should not corrupt internal state, and losing analysis data built up over an hour is a terrible user experience.
  • Maybe you should add an option to save (e.g. with pickle/cPickle) the 7 GB of data held in RAM to a file, and reload it later on.
  • The crash below.
    • Notes: NTFS stores names as UTF-16 (unpaired surrogates allowed), which can be encoded as WTF-8. Python 2 has 8-bit bytes/str and arbitrary-bit unicode. For a minimal change to your code, you could try latin1 instead of ascii.
> tikzplot 67
Traceback (most recent call last):
  File "main.py", line 385, in <module>
    main()
  File "main.py", line 382, in main
    interpret(cmd, arguments, parts, shorthands, args.outputdir)
  File "main.py", line 171, in interpret
    print utils.tikz_part(part)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 310, in tikz_part
    lines += [tikz_child(entry, 4)[0] for entry in (part.root, part.lost)]
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 283, in tikz_child
    content, number = tikz_child(entry, padding+4)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 283, in tikz_child
    content, number = tikz_child(entry, padding+4)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 283, in tikz_child
    content, number = tikz_child(entry, padding+4)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 283, in tikz_child
    content, number = tikz_child(entry, padding+4)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 280, in tikz_child
    lines = [r'%schild {%s' % (pad, _tikz_repr(directory))]
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 273, in _tikz_repr
    _ltx_clean(node.index), _ltx_clean(node.name)
  File "/home/jimbo1qaz/Dropbox/encrypted/code/pypy/RecuperaBit/recuperabit/utils.py", line 263, in _ltx_clean
    clean = str(label).replace('$', r'\$').replace('_', r'\_')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 11: ordinal not in range(128)

Git master 18090ab
Python 2.7.13 (8cdda8b8cdb8ff29d9e620cccd6c5edd2f2a23ec, Apr 16 2019, 18:25:57)
[PyPy 7.1.1 with GCC 8.2.0]
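
The first suggestion above (keep the command loop alive when one command fails) can be sketched roughly as follows. This is only an illustration, not RecuperaBit's actual loop: `run_command_loop`, `read_command` and `interpret` are hypothetical names, and the real `interpret` in main.py takes more arguments.

```python
import traceback


def run_command_loop(read_command, interpret):
    """Run commands until read_command() returns None.

    Both arguments are hypothetical callables used for illustration:
    read_command() returns the next command line, or None to quit;
    interpret(line) executes one command and may raise.
    """
    failures = 0
    while True:
        line = read_command()
        if line is None:
            break
        try:
            interpret(line)
        except Exception:
            # Report the failure but keep the session (and its data) alive.
            traceback.print_exc()
            failures += 1
    return failures
```

Catching `Exception` rather than using a bare `except:` leaves `KeyboardInterrupt` and `SystemExit` untouched, so Ctrl-C and a normal quit still end the session.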

@Lazza
Owner

Lazza commented Jun 24, 2019

All your comments are spot-on and I agree with what you say. I am actually surprised that somebody else tried the TikzPlot; it was left there only because I needed some figures for my thesis. 😄

> Maybe you should add an option to save (e.g. pickle/cPickle) the 7GB of RAM to a file

Looking ahead to a future 2.0 version, I was thinking about using a more advanced file format instead of the current savefile, which is quite limited. It could be based on a SQLite DB, so RAM usage would drop considerably as well.
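
To illustrate the SQLite idea, here is a minimal sketch using Python's standard `sqlite3` module. The `nodes` table, its columns, and the function names are assumptions made up for this example; nothing here reflects an actual RecuperaBit file format.

```python
import sqlite3


def save_nodes(db_path, nodes):
    """Store (idx, parent, name) tuples in a hypothetical 'nodes' table."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS nodes ("
                "idx INTEGER PRIMARY KEY, parent INTEGER, name TEXT)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO nodes (idx, parent, name) "
                "VALUES (?, ?, ?)",
                nodes,
            )
    finally:
        conn.close()


def children_of(db_path, parent):
    """Return the names of one node's direct children, read on demand."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM nodes WHERE parent = ? ORDER BY idx", (parent,)
        )
        return [name for (name,) in rows]
    finally:
        conn.close()
```

Because each query reads only the rows it needs, the whole tree would not have to stay in memory at once.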

> Python 2 has 8-bit bytes/str and arbitrary-bit unicode

True. The fact that RecuperaBit is currently written in Python 2 causes several issues with Unicode; it should definitely be ported to Python 3.
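
For context: in Python 2, `str(label)` on a `unicode` value implicitly encodes it with the ASCII codec, which is what fails in the traceback above. In Python 3, `str` is already Unicode text, so the same cleaning step can be sketched like this. The name `ltx_clean_text` is hypothetical, and decoding raw bytes as UTF-8 with replacement is an assumption, not the project's behaviour.

```python
def ltx_clean_text(label):
    """Escape '$' and '_' for LaTeX without forcing the label through ASCII."""
    if isinstance(label, bytes):
        # Assumption: raw bytes are UTF-8; undecodable bytes become U+FFFD.
        label = label.decode("utf-8", errors="replace")
    else:
        # Covers text as well as non-string labels such as integer indexes.
        label = str(label)
    return label.replace("$", r"\$").replace("_", r"\_")
```

For example, `ltx_clean_text("a\xf1o_1")` keeps the `ñ` and only escapes the underscore.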

Unfortunately, I do not have a lot of free time these days and thus I am not in a position to provide time estimates for this task.

@mirh

mirh commented Feb 4, 2020

Sigh, I guess this was also responsible for my crash with tree

Traceback (most recent call last):
  File "main.py", line 384, in <module>
    main()
  File "main.py", line 381, in main
    interpret(cmd, arguments, parts, shorthands, args.outputdir)
  File "main.py", line 118, in interpret
    print utils.tree_folder(part.lost)
  File "python-2_7_17_amd64\lib\codecs.py", line 369, in write
    data, consumed = self.encode(object, self.errors)
  File "python-_7_17_amd64\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 11295777-11295778: character maps to <undefined>
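
This second traceback fails while writing to a cp1252 console rather than inside the tool's own string handling. One possible workaround, sketched here for Python 3 with a hypothetical helper name `safe_print`, is to escape whatever the output stream's encoding cannot represent instead of raising.

```python
import sys


def safe_print(text, stream=None):
    """Write text, escaping characters the stream's encoding cannot represent."""
    stream = stream if stream is not None else sys.stdout
    # Assumption: fall back to UTF-8 when the stream reports no encoding.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # backslashreplace turns unencodable characters into \xNN or \uNNNN escapes.
    printable = text.encode(encoding, errors="backslashreplace").decode(encoding)
    stream.write(printable + "\n")
```

The output is lossy for the escaped characters, but a long listing such as the `tree` output would still be printed in full.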

@Lazza
Owner

Lazza commented Jan 2, 2021

@nyanpasu64 I was wondering if, by any chance, you could try the newly released v1.1.2 (for Python 3) and see if it still gives you those errors.

@mirh some feedback from you would be great as well.

Thanks to both of you for your time.
