Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

break our builds with a non-utf8 locale #2158

Closed
wants to merge 2 commits into from
Closed

break our builds with a non-utf8 locale #2158

wants to merge 2 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Aug 7, 2016

This is just to show how we do (not too well). If this is something
we wanna take care of outside of windows, then it would fall under

env:
  matrix:
    - LANG: en_US.utf-8
    - LANG: de_DE

in the travis config

@ghost
Copy link
Author

ghost commented Aug 7, 2016

that's interesting. I must not be configuring something wrong ocally for this test, because I have a ton of failures on py2.7 even when running it locally. I guess this wasn't as useful as i wanted it to be. I must be missing a setting.

@ghost
Copy link
Author

ghost commented Aug 7, 2016

looks like our build no longer breaks after fixing the edit plugin. What exactly do you wanna support @sampsyo? Considering that once we go with click, we definitely wont' be supporting C.

@sampsyo
Copy link
Member

sampsyo commented Aug 7, 2016

Good question. I'm still sad about Click bailing with an error when LC_ALL=C on Python 3—it seems unnecessarily user-hostile—but there may be no way around it. Barring that, I think we should do our best to support any configuration, with the knowledge that some locales might lead to missing special characters.

@ghost
Copy link
Author

ghost commented Aug 7, 2016

even BIG5 or or SHIFT-JIS? I have no idea how often they used today in practice though.

@sampsyo
Copy link
Member

sampsyo commented Aug 7, 2016

I also don't. It seems like the kind options for beets would be, even for weird and exotic encodings, one of these two:

  1. Respect the encoding, except when we can't (i.e., use the 'replace' encoding error handler).
  2. Ignore the encoding and use UTF-8 instead.

Click's approach, just exiting with an error, is less friendly.

@ghost
Copy link
Author

ghost commented Aug 10, 2016

if 2 was possible, why wouldn't we just do that? I didn't think it'd be that easy though. If it was that easy, then why wouldn't click do it?

@ghost
Copy link
Author

ghost commented Aug 10, 2016

finally got the errors i was seeing locally to show up. Uggh, that took way too long! Our builds are finally broken!

@ghost ghost mentioned this pull request Aug 11, 2016
@ghost
Copy link
Author

ghost commented Aug 13, 2016

some highlights/questions:

The results here assume beets is run with a non-utf8 encoding:

  • beets.util.hidden.is_hidden explicitly decodes utf-8 paths, but they aren't utf-8 encoded

  • _normalize_path in importer.py also does the same thing

  • the play plugin assumes writing m3u files as utf8 is ok

  • others are output stuff like:

    Traceback (most recent call last):
    
    File "/home/travis/build/beetbox/beets/test/test_library.py", line 415, in test_unicode_extension_in_fragment
    
      self.assertEqual(dest, u'foo.caf\xe9')
    
    nose.proxy.AssertionError: 'foo.caf' != 'foo.caf�'
    
    - foo.caf
    
    + foo.caf�
    
    ?        +

    In total: FAILED (SKIP=21, errors=18, failures=3)
    Fixing is_hidden handles about half of it.

Some thoughts/data on m3u:
The m3u "spec" neither suggests nor specifies an encoding. Lots of players will read utf-8 encoded playlists with no problem, but also other common ones won't. I imagine it'd be rare to run into this problem on Linux or Mac OS. but many well known players on Windows, or other portable devices will not like them. There's no real good answer here! It'd probably be best to encode them in the local filesystem encoding by default, but with an encoding switch (maybe?)

NOTE: m3u8 is something that exists, but it doesn't seem well supported.

All this brings to mind a question though.. Are the paths in the beets database currently portable between systems with different encodings?

@ghost
Copy link
Author

ghost commented Aug 13, 2016

I don't think we can currently use _fsencoding in beets.util.hidden, because beets.util includes beets.util.hidden, so circular dependency. Any suggestions on that?

@sampsyo
Copy link
Member

sampsyo commented Aug 13, 2016

It should be fine, as long as we use import beets.util or from beets import util instead of from beets.util import _fsencoding. Here's an example:

$ cat a.py
import b
def fa():
    print('a')
    b.fb()
def fa2():
    print('a2')
$ cat b.py
import a
def fb():
    print('b')
    a.fa2()
$ python
Python 2.7.11 (default, Jan 15 2016, 18:35:21) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import a
>>> a.fa()
a
b
a2
>>> 
$ python3
Python 3.5.2 (default, Jun 29 2016, 13:43:58) 
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import a
>>> a.fa()
a
b
a2
>>>

The deal here is that you're free to do circular imports, but the namespaces won't actually be populated until the modules finish executing. So as long as all the references to util._fsencoding only occur in functions that aren't immediately executed—which is the usual case—we're fine.

edit: updated with a better example

@ghost
Copy link
Author

ghost commented Aug 13, 2016

in the moment i forgot that bytes objects had startswith :(

@ghost
Copy link
Author

ghost commented Aug 13, 2016

@sampsyo : with your fixes to is_hidden

FAILED (SKIP=21, errors=9, failures=3)

I guess the key here for the ui going over the places where we have ignore as the error handler and switching out to something else on PY3.

@ghost
Copy link
Author

ghost commented Oct 15, 2016

using util._fsencoding() in test_importer.py even fails on Python 2.7. with non utf8 locales

@jtpavlock
Copy link
Contributor

Is this still relevant?

@stale
Copy link

stale bot commented Nov 18, 2020

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 18, 2020
@stale stale bot closed this Nov 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants