TST: test_table_group_by[True] and test_group_by_masked[True] failed with numpy 1.25rc1 #14882

Closed
pllim opened this issue May 30, 2023 · 9 comments · Fixed by #14907

@pllim
Member

pllim commented May 30, 2023

I see this in the predeps job that pulls in numpy 1.25rc1. Example log: https://github.com/astropy/astropy/actions/runs/5117103756/jobs/9199883166

These two are hard to pick out among the 100+ other failures from #14881, and I do not understand why we didn't catch this earlier in devdeps. @mhvk, does this look familiar to you?

def test_table_group_by(T1):

__________________________ test_table_group_by[True] ___________________________

T1 = <QTable length=8>
  a    b      c      d      q   
                            m   
int64 str1 float64 int64 float64
-...   0.0     4     4.0
    1    b     3.0     5     5.0
    1    a     2.0     6     6.0
    1    a     1.0     7     7.0

    def test_table_group_by(T1):
        """
        Test basic table group_by functionality for possible key types and for
        masked/unmasked tables.
        """
        for masked in (False, True):
            t1 = QTable(T1, masked=masked)
            # Group by a single column key specified by name
            tg = t1.group_by("a")
            assert np.all(tg.groups.indices == np.array([0, 1, 4, 8]))
            assert str(tg.groups) == "<TableGroups indices=[0 1 4 8]>"
            assert str(tg["a"].groups) == "<ColumnGroups indices=[0 1 4 8]>"
    
            # Sorted by 'a' and in original order for rest
>           assert tg.pformat() == [
                " a   b   c   d   q ",
                "                 m ",
                "--- --- --- --- ---",
                "  0   a 0.0   4 4.0",
                "  1   b 3.0   5 5.0",
                "  1   a 2.0   6 6.0",
                "  1   a 1.0   7 7.0",
                "  2   c 7.0   0 0.0",
                "  2   b 5.0   1 1.0",
                "  2   b 6.0   2 2.0",
                "  2   a 4.0   3 3.0",
            ]
E           AssertionError: assert [' a   b   c ...  5 5.0', ...] == [' a   b   c ...  6 6.0', ...]
E             At index 4 diff: '  1   a 1.0   7 7.0' != '  1   b 3.0   5 5.0'
E             Full diff:
E               [
E                ' a   b   c   d   q ',
E                '                 m ',
E                '--- --- --- --- ---',
E                '  0   a 0.0   4 4.0',
E             +  '  1   a 1.0   7 7.0',
E                '  1   b 3.0   5 5.0',
E                '  1   a 2.0   6 6.0',
E             -  '  1   a 1.0   7 7.0',
E             ?     ^     ^     ^^^
E             +  '  2   a 4.0   3 3.0',
E             ?     ^     ^     ^^^
E             +  '  2   b 6.0   2 2.0',
E             +  '  2   b 5.0   1 1.0',
E                '  2   c 7.0   0 0.0',
E             -  '  2   b 5.0   1 1.0',
E             -  '  2   b 6.0   2 2.0',
E             -  '  2   a 4.0   3 3.0',
E               ]

astropy/table/tests/test_groups.py:49: AssertionError

def test_group_by_masked(T1):

__________________________ test_group_by_masked[True] __________________________

T1 = <QTable length=8>
  a    b      c      d      q   
                            m   
int64 str1 float64 int64 float64
-...   0.0     4     4.0
    1    b     3.0     5     5.0
    1    a     2.0     6     6.0
    1    a     1.0     7     7.0

    def test_group_by_masked(T1):
        t1m = QTable(T1, masked=True)
        t1m["c"].mask[4] = True
        t1m["d"].mask[5] = True
>       assert t1m.group_by("a").pformat() == [
            " a   b   c   d   q ",
            "                 m ",
            "--- --- --- --- ---",
            "  0   a  --   4 4.0",
            "  1   b 3.0  -- 5.0",
            "  1   a 2.0   6 6.0",
            "  1   a 1.0   7 7.0",
            "  2   c 7.0   0 0.0",
            "  2   b 5.0   1 1.0",
            "  2   b 6.0   2 2.0",
            "  2   a 4.0   3 3.0",
        ]
E       AssertionError: assert [' a   b   c ... -- 5.0', ...] == [' a   b   c ...  6 6.0', ...]
E         At index 4 diff: '  1   a 1.0   7 7.0' != '  1   b 3.0  -- 5.0'
E         Full diff:
E           [
E            ' a   b   c   d   q ',
E            '                 m ',
E            '--- --- --- --- ---',
E            '  0   a  --   4 4.0',
E         +  '  1   a 1.0   7 7.0',
E            '  1   b 3.0  -- 5.0',
E            '  1   a 2.0   6 6.0',
E         -  '  1   a 1.0   7 7.0',
E         ?     ^     ^     ^^^
E         +  '  2   a 4.0   3 3.0',
E         ?     ^     ^     ^^^
E         +  '  2   b 6.0   2 2.0',
E         +  '  2   b 5.0   1 1.0',
E            '  2   c 7.0   0 0.0',
E         -  '  2   b 5.0   1 1.0',
E         -  '  2   b 6.0   2 2.0',
E         -  '  2   a 4.0   3 3.0',
E           ]

astropy/table/tests/test_groups.py:330: AssertionError
@pllim
Member Author

pllim commented Jun 1, 2023

I cannot reproduce this locally. 🤯 In the error log above, it looks like some rows moved around within their groups, but I don't see why that would happen.

Also, to run this in an interactive session:

import numpy as np
from astropy import units as u
from astropy.table import QTable

T = QTable.read(
        [
            " a b c d",
            " 2 c 7.0 0",
            " 2 b 5.0 1",
            " 2 b 6.0 2",
            " 2 a 4.0 3",
            " 0 a 0.0 4",
            " 1 b 3.0 5",
            " 1 a 2.0 6",
            " 1 a 1.0 7",
        ],
        format="ascii",
)
T["q"] = np.arange(len(T)) * u.m
T.meta.update({"ta": 1})
T["c"].meta.update({"a": 1})
T["c"].description = "column c"
T.add_index("a")

t1 = QTable(T, masked=True)
tg = t1.group_by("a")
>>> tg
<QTable length=8>
  a    b      c      d      q
                            m
int64 str1 float64 int64 float64
----- ---- ------- ----- -------
    0    a     0.0     4     4.0
    1    b     3.0     5     5.0
    1    a     2.0     6     6.0
    1    a     1.0     7     7.0
    2    c     7.0     0     0.0
    2    b     5.0     1     1.0
    2    b     6.0     2     2.0
    2    a     4.0     3     3.0

@taldcroft
Member

@pllim - I also cannot reproduce the problem locally on my Mac. What to do?

@pllim
Member Author

pllim commented Jun 1, 2023

@taldcroft, does the order matter? I am guessing yes?

Maybe this test somehow triggers a race condition that only shows up in CI, or a global variable set by a different test is interfering. But I don't know enough about the internals to make a more educated guess.

@taldcroft
Member

What about this: could the sort order have changed? Is the test failure on an "AVX-512 enabled processor"?

https://numpy.org/devdocs/release/1.25.0-notes.html#faster-np-sort-on-avx-512-enabled-processors

@pllim
Member Author

pllim commented Jun 1, 2023

@taldcroft
Member

@pllim - Grouping is supposed to maintain the original order within a group. That depends on the numpy sorting doing the same, which depends on the specific sort algorithm.
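
To illustrate the point about order within a group, here is a minimal sketch using plain NumPy rather than the astropy grouping code (which may work differently internally). The arrays `a` and `d` are copied from the example table in this issue; `kind="stable"` is requested explicitly so that rows with equal keys keep their original relative order.

```python
import numpy as np

# Key column "a" and column "d" from the example table in this issue.
# Column "d" happens to equal the original row number.
a = np.array([2, 2, 2, 2, 0, 1, 1, 1])
d = np.array([0, 1, 2, 3, 4, 5, 6, 7])

# A stable sort keeps rows with equal keys in their original relative order.
stable_idx = np.argsort(a, kind="stable")
grouped_d = d[stable_idx]

print(grouped_d)  # [4 5 6 7 0 1 2 3]
```

This matches the "d" column that the tests expect. With a sort that is not stable, the rows inside each group may come back in a different order.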

@taldcroft
Member

So that looks promising. Let me remind myself of the code in there...

@pllim
Member Author

pllim commented Jun 1, 2023

So if we decide that numpy/numpy#22315 is changing our result, is that a numpy bug?

@taldcroft
Member

It turns out this is likely in the table indexing code, which is never easy to understand. It does look like this is the issue: indexing appears to use the numpy default sort, which is quicksort. Quicksort is not guaranteed to be stable, so the test may have been passing by accident before.
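
As a rough way to check this hypothesis against the CI log, the sketch below tests whether a given row ordering is a stable sort of the key column. The helper `is_stable_order` is hypothetical (it is not part of numpy or astropy), and `ci_order` is read off the "d" column of the failing output above. The last lines show one way to get a stable ordering regardless of the sort algorithm, by using the original row number as a tie-breaker with `np.lexsort`; this is an illustration only, not necessarily the fix that was adopted.

```python
import numpy as np


def is_stable_order(keys, order):
    """Hypothetical helper: True if `order` sorts `keys` and keeps ties in original order."""
    keys = np.asarray(keys)
    order = np.asarray(order)
    sorted_keys = keys[order]
    # Keys must be non-decreasing after applying the ordering.
    if np.any(sorted_keys[1:] < sorted_keys[:-1]):
        return False
    # Within a run of equal keys, original positions must be increasing.
    ties = sorted_keys[1:] == sorted_keys[:-1]
    return bool(np.all(order[1:][ties] > order[:-1][ties]))


a = np.array([2, 2, 2, 2, 0, 1, 1, 1])

# Row order seen in the failing CI output (values of column "d").
ci_order = np.array([4, 7, 5, 6, 3, 2, 1, 0])
print(is_stable_order(a, ci_order))  # False

# Tie-break on the original row number; the last key passed to lexsort is the primary key.
tie_broken = np.lexsort((np.arange(len(a)), a))
print(is_stable_order(a, tie_broken))  # True
```

The CI ordering is a valid sort of "a" but not a stable one, which is consistent with an unstable sort being used somewhere in the grouping or indexing path.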
