New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Beets crashing with UnicodeError on initial import #2585

Closed
prg318 opened this Issue Jun 5, 2017 · 5 comments

Comments

Projects
None yet
3 participants
@prg318

prg318 commented Jun 5, 2017

Problem

I was running an initial import (no autotag, "-A") of a large mp3 library and got the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/beets/util/__init__.py", line 352, in bytestring_path
    return path.encode(_fsencoding())
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce4' in position 42: surrogates not allowed
 
During handling of the above exception, another exception occurred:
 
Traceback (most recent call last):
  File "/usr/bin/beet", line 11, in <module>
    load_entry_point('beets==1.4.4', 'console_scripts', 'beet')()
  File "/usr/lib/python3.6/site-packages/beets/ui/__init__.py", line 1228, in main
    _raw_main(args)
  File "/usr/lib/python3.6/site-packages/beets/ui/__init__.py", line 1215, in _raw_main
    subcommand.func(lib, suboptions, subargs)
  File "/usr/lib/python3.6/site-packages/beets/ui/commands.py", line 933, in import_func
    import_files(lib, paths, query)
  File "/usr/lib/python3.6/site-packages/beets/ui/commands.py", line 885, in import_files
    if not os.path.exists(syspath(normpath(path))):
  File "/usr/lib/python3.6/site-packages/beets/util/__init__.py", line 133, in normpath
    return bytestring_path(path)
  File "/usr/lib/python3.6/site-packages/beets/util/__init__.py", line 354, in bytestring_path
    return path.encode('utf-8')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce4' in position 42: surrogates not allowed

I am running on a unicode system which I believe is configured properly:

$ locale
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=

Setup

  • OS: Linux
  • Python version: 3.6.1
  • beets version: 1.4.3 and latest git
  • Turning off plugins made problem go away (yes/no): no

My configuration (output of beet config) is:

directory: /media/nas/audio
library: /media/nas/audio.db

import:
    copy: no
    write: no
@sampsyo

This comment has been minimized.

Show comment
Hide comment
@sampsyo

sampsyo Jun 5, 2017

Member

Hmm; thanks for filing! It sounds like this is an issue with handling the encoding of the top-level path you were trying to import (i.e., a path you specified on the command line). Could that be right? Were there non-ASCII characters in that path?

[Technical details warning.] If so, I think the problem is that, on Python 3, the import command handler (beets.ui.commands.import_func) gets the paths list from the command-line arguments, so they're Unicode strings. (On Python 2, they're bytes.) We need to get these into bytes form. This was hidden from most people, but your system seems to not be specifying its filesystem encoding, and the path was non-UTF-8. This was bound to happen eventually. The right solution is to convert these paths back to bytes in import_func, before handing them off to import_files.

Edit: To revise that prediction a bit, I think the problem is not that your OS isn't reporting an encoding; it's just that the file path doesn't fit in that encoding—which isn't all that rare. Our exception handler in bytestring_path is getting triggered. We need something like path.encode(_fsencoding(), 'surrogateescape').

To confirm, can you also please try running this command to see what Python thinks your filesystem encoding is?

$ python3 -c 'import sys; print(sys.getfilesystemencoding())'
Member

sampsyo commented Jun 5, 2017

Hmm; thanks for filing! It sounds like this is an issue with handling the encoding of the top-level path you were trying to import (i.e., a path you specified on the command line). Could that be right? Were there non-ASCII characters in that path?

[Technical details warning.] If so, I think the problem is that, on Python 3, the import command handler (beets.ui.commands.import_func) gets the paths list from the command-line arguments, so they're Unicode strings. (On Python 2, they're bytes.) We need to get these into bytes form. This was hidden from most people, but your system seems to not be specifying its filesystem encoding, and the path was non-UTF-8. This was bound to happen eventually. The right solution is to convert these paths back to bytes in import_func, before handing them off to import_files.

Edit: To revise that prediction a bit, I think the problem is not that your OS isn't reporting an encoding; it's just that the file path doesn't fit in that encoding—which isn't all that rare. Our exception handler in bytestring_path is getting triggered. We need something like path.encode(_fsencoding(), 'surrogateescape').

To confirm, can you also please try running this command to see what Python thinks your filesystem encoding is?

$ python3 -c 'import sys; print(sys.getfilesystemencoding())'

@sampsyo sampsyo added the needinfo label Jun 5, 2017

@prg318

This comment has been minimized.

Show comment
Hide comment
@prg318

prg318 Jun 5, 2017

Hi! The toplevel path I was importing was actually plain ASCII (/media/nas/audio). I'm somewhat upset that I am unable to reproduce this bug! To get more debugging information, I ran my initial import with "-vv" ("beets -vv import -A /media/nas/audio") and it worked! Furthermore, I have cleared my database again and am now reimporting again without the "-vv" flags, and this is also working!

Maybe there was a transient issue going on somewhere in my system? My audio is mounted via NFS so there could been something funky going on with the NFS server at the time perhaps? Maybe I launched my beets import from a really funky terminal where the locale was screwed up (very unlikely but possible!)

Until this issue is reproducable, I'd say maybe we can close it until someone can reproduce it again?

Thanks for your feedback and looking into it - I will report back and re-open if I run into something like this again in the future. Sorry I don't have any debugging information to provide!

prg318 commented Jun 5, 2017

Hi! The toplevel path I was importing was actually plain ASCII (/media/nas/audio). I'm somewhat upset that I am unable to reproduce this bug! To get more debugging information, I ran my initial import with "-vv" ("beets -vv import -A /media/nas/audio") and it worked! Furthermore, I have cleared my database again and am now reimporting again without the "-vv" flags, and this is also working!

Maybe there was a transient issue going on somewhere in my system? My audio is mounted via NFS so there could been something funky going on with the NFS server at the time perhaps? Maybe I launched my beets import from a really funky terminal where the locale was screwed up (very unlikely but possible!)

Until this issue is reproducable, I'd say maybe we can close it until someone can reproduce it again?

Thanks for your feedback and looking into it - I will report back and re-open if I run into something like this again in the future. Sorry I don't have any debugging information to provide!

@prg318 prg318 closed this Jun 5, 2017

@sampsyo

This comment has been minimized.

Show comment
Hide comment
@sampsyo

sampsyo Jun 6, 2017

Member

That certainly is very strange! I still think the fix I outlined above is probably the right one, but let's chalk it up to a flaky NAS connection for now and see if this reappears.

Member

sampsyo commented Jun 6, 2017

That certainly is very strange! I still think the fix I outlined above is probably the right one, but let's chalk it up to a flaky NAS connection for now and see if this reappears.

@itheof

This comment has been minimized.

Show comment
Hide comment
@itheof

itheof Jan 16, 2018

Hi! I just had the same issue with a folder containing é' characters. Changing it to a regulare' made it work. Please tell me if there is any information I could provide to help narrowing the real issue

itheof commented Jan 16, 2018

Hi! I just had the same issue with a folder containing é' characters. Changing it to a regulare' made it work. Please tell me if there is any information I could provide to help narrowing the real issue

@sampsyo

This comment has been minimized.

Show comment
Hide comment
@sampsyo

sampsyo Jan 16, 2018

Member

Hmm; that's troubling, @itheof! If it's something you can reproduce reliably, please open another bug.

Member

sampsyo commented Jan 16, 2018

Hmm; that's troubling, @itheof! If it's something you can reproduce reliably, please open another bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment