Skip to content

Commit

Permalink
Add FreeBSD fix for locale failure
Browse files Browse the repository at this point in the history
FreeBSD seems to throw an OSError when locale.strxfrm is given 'Å',
which is surprising behavior. Well, maybe not, considering how many bugs
I have found with FreeBSD's implementation of locale over the course of
natsort development.

Anyway, we just ignore any input that causes locale.strxfrm to barf in
our tests.
  • Loading branch information
SethMMorton committed Mar 2, 2023
1 parent 0f993d0 commit def5928
Show file tree
Hide file tree
Showing 2 changed files with 17 additions and 2 deletions.
2 changes: 1 addition & 1 deletion natsort/compat/fake_fastnumbers.py
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@ def fast_float(
String to attempt to convert to a float.
key : callable
Single-argument function to apply to *x* if conversion fails.
nan : object
nan : float
Value to return instead of NaN if NaN would be returned.
Returns
Expand Down
17 changes: 16 additions & 1 deletion tests/test_string_component_transform_factory.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
from typing import Any, Callable, FrozenSet, Union

import pytest
from hypothesis import example, given
from hypothesis import assume, example, given
from hypothesis.strategies import floats, integers, text
from natsort.compat.fastnumbers import try_float, try_int
from natsort.compat.locale import get_strxfrm
Expand All @@ -32,6 +32,20 @@ def no_null(x: str) -> bool:
return "\0" not in x


def input_is_ok_with_locale(x: str) -> bool:
"""Ensure this input won't cause locale.strxfrm to barf"""
# On FreeBSD, locale.strxfrm raises an OSError on input like 'Å'.

This comment has been minimized.

Copy link
@mgorny

mgorny Mar 3, 2023

Is this reported somewhere in FreeBSD?

This comment has been minimized.

Copy link
@SethMMorton

SethMMorton Mar 3, 2023

Author Owner

🤷

This comment has been minimized.

Copy link
@SethMMorton

SethMMorton Mar 3, 2023

Author Owner

I did not report it, no. I have found even more egregious issues with the locale library across BSD instances (including macos) and so have made the assumption that having a properly functioning locale library was not a priority for BSD. Are you suggesting this be reported?

This comment has been minimized.

Copy link
@mgorny

mgorny Mar 3, 2023

I'm pretty sure locales are supposed to work on FreeBSD, and it certainly would be worthwhile to look into the issue in detail. That said, I'm not sure if this is a problem with FreeBSD itself or CPython on FreeBSD.

This comment has been minimized.

Copy link
@SethMMorton

SethMMorton Mar 15, 2023

Author Owner

@mgorny I have created a repo that demonstrates conclusively the error is on BSD-like operating systems and not with Python. The CI is here: https://github.com/SethMMorton/broken-locale-demo/actions/runs/4423000939.

Recommendations for where to report?

This comment has been minimized.

Copy link
@mgorny

mgorny Mar 15, 2023

Do note that if you use en_US.UTF-8 rather than en_US.UTF8, then it passes. To be honest, I don't think this is a bug — rather than you're setting an incorrect locale for the platform in question.

While POSIX leaves locale names implementation-defined, IANA specifies the charset as UTF-8, not UTF8. See https://unix.stackexchange.com/a/605777/22833.

Note:

$ LC_ALL=en_US.UTF8 locale
LANG=C.UTF-8
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=en_US.UTF8
$ LC_ALL=en_US.UTF-8 locale
LANG=C.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=en_US.UTF-8

This comment has been minimized.

Copy link
@SethMMorton

SethMMorton Mar 15, 2023

Author Owner

Very interesting. I made this update, and it does fix the specific issue described here, but does not fix the incorrect sort ordering. So there still is something fishy.

Also, I'm not sure I would say that because I was using the incorrect locale string that there still was no bug. Ideally, if the locale I was using was invalid, it would have thrown an error on setlocale, and/or would have thrown an error for every input given to *xfrm(). Instead, it only threw the error for the Angstrom character. I would have expected consistent behavior.

This comment has been minimized.

Copy link
@mgorny

mgorny Mar 15, 2023

Very interesting. I made this update, and it does fix the specific issue described here, but does not fix the incorrect sort ordering. So there still is something fishy.

If you give me an example for that, I can try figuring it out.

Also, I'm not sure I would say that because I was using the incorrect locale string that there still was no bug. Ideally, if the locale I was using was invalid, it wold have thrown an error on setlocale, and/or would have thrown an error for every input given to *xfrm(). Instead, it only threw the error for the Angstrom character. I would have expected consistent behavior.

Actually, it did throw the error in setlocale(). If you do:

errno = 0;
::setlocale(LC_ALL, "en_US.UTF8");
EXPECT_EQ(errno, 0);

you'd notice this is where errno was set. ::wcsxfrm() does not use errno, and you did not reset it, so your test got the value from setlocale() call. If you add errno = 0 above ::wcsxfrm() call, it doesn't trigger.

This comment has been minimized.

Copy link
@SethMMorton

SethMMorton Mar 16, 2023

Author Owner

@mgorny

OK - I spoke too soon WRT to the sorting - it fixes it on FreeBSD to use the correct locale string. The odd one out is macos, and I don't even know if it's possible to report issues to Apple like this.


Anyway, I still wish I would have seen more consistent results from FreeBSD when an invalid locale string was given, either consistently doing nothing (like it did for "a" and "A") or consistently failing (like it did for "Å"). I would be happy to report that if you point me to the best place to do so.

This comment has been minimized.

Copy link
@SethMMorton

SethMMorton Mar 16, 2023

Author Owner

@mgorny OK - last one. It seems that it does fail for e.g. "a" as well. So, I guess FreeBSD doesn't have an issue (assuming the user uses a valid locale string).

So, my bad. I'll clarify the language in the source code with a mea culpa in the commit message, and also clarify some language in the documentation that the bad sorting is macos specific, not all of BSD OSes.

This comment has been minimized.

Copy link
@mgorny

mgorny Mar 16, 2023

Glad to hear the matter's resolved with FreeBSD. If you have any problems in the future, I'd probably go for their Bugzilla or perhaps mailing lists. In general, you could also set up a VM to experiment with a little bit — I needed one in the past, and found it really easy to install FreeBSD (in the usual enter-enter-enter… manner) in qemu.

As for macOS, I'm afraid I know very little of it. If you don't own the hardware, macOS-Simple-KVM used to help but I don't think it works anymore. Beyond that, I suppose you need to find someone more familiar with Darwin.

# You read that right - an *OSError* for invalid input.
# We cannot really fix that, so we just filter out any value
# that could cause locale.strxfrm to barf with this function.
try:
get_strxfrm()(x)
except OSError:
return False
else:
return True


@pytest.mark.parametrize(
"alg, example_func",
[
Expand Down Expand Up @@ -77,6 +91,7 @@ def test_string_component_transform_factory(
) -> None:
string_component_transform_func = string_component_transform_factory(alg)
x = str(x)
assume(input_is_ok_with_locale(x)) # handle broken locale lib on BSD.
try:
assert list(string_component_transform_func(x)) == list(example_func(x))
except ValueError as e: # handle broken locale lib on BSD.
Expand Down

0 comments on commit def5928

Please sign in to comment.