Possible Issues with natsort.humansorted or ns.LOCALE
In addition to modifying how characters are sorted, ns.LOCALE will take
into account locale-dependent thousands separators (and locale-dependent
decimal separators if ns.FLOAT is enabled). This means that if you are in a
locale that uses commas as the thousands separator, a number like 123,456
will be interpreted as 123456. If this is not what you want, you may consider
using ns.LOCALEALPHA, which will only enable locale-aware sorting for
non-numbers (similarly, ns.LOCALENUM enables locale-aware sorting only for
numbers).
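To judge whether the thousands-separator behavior described above could affect you, it may help to check which separators the active locale reports. The following is a minimal sketch using only the standard library; the helper name current_separators is hypothetical, and natsort may determine separators differently internally, so treat the result as a hint rather than a guarantee.

```python
import locale


def current_separators():
    """Return (thousands_sep, decimal_point) as reported by the active locale."""
    conv = locale.localeconv()
    return conv["thousands_sep"], conv["decimal_point"]


# Call this after locale.setlocale(); Python starts in the "C" locale,
# which reports an empty thousands separator.
thousands, decimal = current_separators()
if thousands == ",":
    print("ns.LOCALE would read '123,456' as one number; consider ns.LOCALEALPHA")
else:
    print(f"thousands separator: {thousands!r}, decimal point: {decimal!r}")
```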
Regenerate Key With natsort_keygen() After Changing Locale
When natsort_keygen() is called it returns a key function that
hard-codes the provided settings. This means that the key returned when
ns.LOCALE is used contains the settings specified by the locale
loaded at the time the key is generated. If you change the locale,
you should regenerate the key to account for the new locale.
Corollary: Do Not Reuse natsort_keygen() After Changing Locale
If you change locale, the old key function will not work as expected.
The locale library works with a global state. When
natsort_keygen() is called it does the best job that it can to
make the returned function as static as possible and independent of the global
state, but the locale.strxfrm() function must access this global state to
work; therefore, if you change locale and use ns.LOCALE then you should
discard the old key.
NOTE: If you use PyICU then you may be able to reuse keys after changing locale.
The locale Module From the StdLib Has Issues
natsort will use PyICU for humansorted() or ns.LOCALE if it is installed.
If not, it will fall back on the locale library from the Python stdlib.
If you do not have PyICU installed, please keep the following known
problems and issues in mind.
NOTE: Remember, if you have PyICU installed you shouldn't need to worry about any of these.
I have found that unless you explicitly set a locale, the sorted order may not
be what you expect. Setting this is straightforward
(in the below example I use 'en_US.UTF-8', but you should use your locale):
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
'en_US.UTF-8'
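If the requested locale is not installed, locale.setlocale() raises locale.Error. The following is a small sketch of a fallback, assuming you are willing to accept a second-choice locale; the helper name set_first_available and the candidate list are illustrative only.

```python
import locale


def set_first_available(candidates):
    """Set LC_ALL to the first locale name the system accepts.

    Returns the string reported by setlocale(), or None if none worked.
    """
    for name in candidates:
        try:
            return locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            continue  # not accepted on this system; try the next candidate
    return None


# Prefer en_US.UTF-8, but fall back to the always-present "C" locale.
active = set_first_available(["en_US.UTF-8", "C"])
print("active locale:", active)
```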
The locale Module Is Broken on Mac OS X
It's not Python's fault, but the OS's: the locale library for OS X (and possibly some other BSD systems) is broken. See the following links:
- https://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help
- https://bugs.python.org/issue23195
- https://github.com/SethMMorton/natsort/issues/21 (contains instructions on installing)
- https://stackoverflow.com/questions/33459384/unicode-character-not-in-range-when-calling-locale-strxfrm
- https://github.com/SethMMorton/natsort/issues/34
Of course, installing PyICU fixes this, but if you don't want to or cannot install it, there is some hope.
- As of natsort version 4.0.0, natsort is configured to compensate for a broken locale library. When sorting non-numbers it will handle case as you expect, but it will still not be able to comprehend non-ASCII characters properly. Additionally, it has a built-in lookup table of thousands separators that are incorrect on OS X/BSD (it is possible that this table is not complete; please file an issue if you find that it is not).
- Use a "*.ISO8859-1" locale (e.g. 'en_US.ISO8859-1') rather than a "*.UTF-8" locale. I have found that these have fewer issues than "UTF-8", but your mileage may vary.