[Python] Test failures on 32-bit x86 #40153

Closed
mgorny opened this issue Feb 20, 2024 · 16 comments · Fixed by #40158 or #40165
Comments

@mgorny
Contributor

mgorny commented Feb 20, 2024

Describe the bug, including details regarding any error messages, version, and platform.

When running the test suite on 32-bit x86, I'm getting the following test failures:

FAILED tests/test_array.py::test_dictionary_to_numpy - TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'
FAILED tests/test_io.py::test_python_file_large_seeks - assert 5 == ((2 ** 32) + 5)
FAILED tests/test_io.py::test_memory_map_large_seeks - OSError: Read out of bounds (offset = 4294967301, size = 5) in file of size 10
FAILED tests/test_pandas.py::TestConvertStructTypes::test_from_numpy_nested - AssertionError: assert 8 == 12
FAILED tests/test_schema.py::test_schema_sizeof - assert 28 > 30
FAILED tests/interchange/test_conversion.py::test_pandas_roundtrip_string - OverflowError: Python int too large to convert to C ssize_t
FAILED tests/interchange/test_conversion.py::test_pandas_roundtrip_large_string - OverflowError: Python int too large to convert to C ssize_t
FAILED tests/interchange/test_conversion.py::test_pandas_roundtrip_string_with_missing - OverflowError: Python int too large to convert to C ssize_t
FAILED tests/interchange/test_conversion.py::test_pandas_roundtrip_categorical - OverflowError: Python int too large to convert to C ssize_t
FAILED tests/interchange/test_conversion.py::test_empty_dataframe - OverflowError: Python int too large to convert to C ssize_t
Tracebacks
============================================================== FAILURES ===============================================================
______________________________________________________ test_dictionary_to_numpy _______________________________________________________

obj = array([13.7, 11. ]), method = 'take', args = (array([0, 1, 1, 0], dtype=int64),)
kwds = {'axis': None, 'mode': 'raise', 'out': None}, bound = <built-in method take of numpy.ndarray object at 0xeaca6ad0>

    def _wrapfunc(obj, method, *args, **kwds):
        bound = getattr(obj, method, None)
        if bound is None:
            return _wrapit(obj, method, *args, **kwds)
    
        try:
>           return bound(*args, **kwds)
E           TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'

args       = (array([0, 1, 1, 0], dtype=int64),)
bound      = <built-in method take of numpy.ndarray object at 0xeaca6ad0>
kwds       = {'axis': None, 'mode': 'raise', 'out': None}
method     = 'take'
obj        = array([13.7, 11. ])

/usr/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59: TypeError

During handling of the above exception, another exception occurred:

    def test_dictionary_to_numpy():
        expected = pa.array(
            ["foo", "bar", None, "foo"]
        ).to_numpy(zero_copy_only=False)
        a = pa.DictionaryArray.from_arrays(
            pa.array([0, 1, None, 0]),
            pa.array(['foo', 'bar'])
        )
        np.testing.assert_array_equal(a.to_numpy(zero_copy_only=False),
                                      expected)
    
        with pytest.raises(pa.ArrowInvalid):
            # If this would be changed to no longer raise in the future,
            # ensure to test the actual result because, currently, to_numpy takes
            # for granted that when zero_copy_only=True there will be no nulls
            # (it's the decoding of the DictionaryArray that handles the nulls and
            # this is only activated with zero_copy_only=False)
            a.to_numpy(zero_copy_only=True)
    
        anonulls = pa.DictionaryArray.from_arrays(
            pa.array([0, 1, 1, 0]),
            pa.array(['foo', 'bar'])
        )
        expected = pa.array(
            ["foo", "bar", "bar", "foo"]
        ).to_numpy(zero_copy_only=False)
        np.testing.assert_array_equal(anonulls.to_numpy(zero_copy_only=False),
                                      expected)
    
        with pytest.raises(pa.ArrowInvalid):
            anonulls.to_numpy(zero_copy_only=True)
    
        afloat = pa.DictionaryArray.from_arrays(
            pa.array([0, 1, 1, 0]),
            pa.array([13.7, 11.0])
        )
        expected = pa.array([13.7, 11.0, 11.0, 13.7]).to_numpy()
>       np.testing.assert_array_equal(afloat.to_numpy(zero_copy_only=True),
                                      expected)

a          = <pyarrow.lib.DictionaryArray object at 0xeafe6ed0>

-- dictionary:
  [
    "foo",
    "bar"
  ]
-- indices:
  [
    0,
    1,
    null,
    0
  ]
afloat     = <pyarrow.lib.DictionaryArray object at 0xeafe6fb0>

-- dictionary:
  [
    13.7,
    11
  ]
-- indices:
  [
    0,
    1,
    1,
    0
  ]
anonulls   = <pyarrow.lib.DictionaryArray object at 0xeafe6e60>

-- dictionary:
  [
    "foo",
    "bar"
  ]
-- indices:
  [
    0,
    1,
    1,
    0
  ]
expected   = array([13.7, 11. , 11. , 13.7])

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/test_array.py:823: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pyarrow/array.pxi:1590: in pyarrow.lib.Array.to_numpy
    ???
/usr/lib/python3.11/site-packages/numpy/core/fromnumeric.py:192: in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
        a          = array([13.7, 11. ])
        axis       = None
        indices    = array([0, 1, 1, 0], dtype=int64)
        mode       = 'raise'
        out        = None
/usr/lib/python3.11/site-packages/numpy/core/fromnumeric.py:68: in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
        args       = (array([0, 1, 1, 0], dtype=int64),)
        bound      = <built-in method take of numpy.ndarray object at 0xeaca6ad0>
        kwds       = {'axis': None, 'mode': 'raise', 'out': None}
        method     = 'take'
        obj        = array([13.7, 11. ])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

obj = array([13.7, 11. ]), method = 'take', args = (array([0, 1, 1, 0], dtype=int64),)
kwds = {'axis': None, 'mode': 'raise', 'out': None}, wrap = <built-in method __array_wrap__ of numpy.ndarray object at 0xeaca6ad0>

    def _wrapit(obj, method, *args, **kwds):
        try:
            wrap = obj.__array_wrap__
        except AttributeError:
            wrap = None
>       result = getattr(asarray(obj), method)(*args, **kwds)
E       TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'

args       = (array([0, 1, 1, 0], dtype=int64),)
kwds       = {'axis': None, 'mode': 'raise', 'out': None}
method     = 'take'
obj        = array([13.7, 11. ])
wrap       = <built-in method __array_wrap__ of numpy.ndarray object at 0xeaca6ad0>

/usr/lib/python3.11/site-packages/numpy/core/fromnumeric.py:45: TypeError
____________________________________________________ test_python_file_large_seeks _____________________________________________________

    def test_python_file_large_seeks():
        def factory(filename):
            return pa.PythonFile(open(filename, 'rb'))
    
>       check_large_seeks(factory)

factory    = <function test_python_file_large_seeks.<locals>.factory at 0xe13b6de8>

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/test_io.py:262: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

file_factory = <function test_python_file_large_seeks.<locals>.factory at 0xe13b6de8>

    def check_large_seeks(file_factory):
        if sys.platform in ('win32', 'darwin'):
            pytest.skip("need sparse file support")
        try:
            filename = tempfile.mktemp(prefix='test_io')
            with open(filename, 'wb') as f:
                f.truncate(2 ** 32 + 10)
                f.seek(2 ** 32 + 5)
                f.write(b'mark\n')
            with file_factory(filename) as f:
>               assert f.seek(2 ** 32 + 5) == 2 ** 32 + 5
E               assert 5 == ((2 ** 32) + 5)
E                +  where 5 = <bound method NativeFile.seek of <pyarrow.PythonFile closed=False own_file=False is_seekable=True is_writable=False is_readable=True>>(((2 ** 32) + 5))
E                +    where <bound method NativeFile.seek of <pyarrow.PythonFile closed=False own_file=False is_seekable=True is_writable=False is_readable=True>> = <pyarrow.PythonFile closed=False own_file=False is_seekable=True is_writable=False is_readable=True>.seek

f          = <pyarrow.PythonFile closed=True own_file=False is_seekable=True is_writable=False is_readable=True>
file_factory = <function test_python_file_large_seeks.<locals>.factory at 0xe13b6de8>
filename   = '/var/tmp/portage/dev-python/pyarrow-15.0.0/temp/test_ioj_p6zuld'

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/test_io.py:49: AssertionError
_____________________________________________________ test_memory_map_large_seeks _____________________________________________________

    def test_memory_map_large_seeks():
>       check_large_seeks(pa.memory_map)


../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/test_io.py:1140: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/test_io.py:51: in check_large_seeks
    assert f.read(5) == b'mark\n'
        f          = <pyarrow.MemoryMappedFile closed=True own_file=False is_seekable=True is_writable=False is_readable=True>
        file_factory = <cyfunction memory_map at 0xf228e778>
        filename   = '/var/tmp/portage/dev-python/pyarrow-15.0.0/temp/test_iozl2wxbou'
pyarrow/io.pxi:409: in pyarrow.lib.NativeFile.read
    ???
pyarrow/error.pxi:154: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OSError: Read out of bounds (offset = 4294967301, size = 5) in file of size 10


pyarrow/error.pxi:91: OSError
____________________________________________ TestConvertStructTypes.test_from_numpy_nested ____________________________________________

self = <pyarrow.tests.test_pandas.TestConvertStructTypes object at 0xeb535d90>

    def test_from_numpy_nested(self):
        # Note: an object field inside a struct
        dt = np.dtype([('x', np.dtype([('xx', np.int8),
                                       ('yy', np.bool_)])),
                       ('y', np.int16),
                       ('z', np.object_)])
        # Note: itemsize is not a multiple of sizeof(object)
>       assert dt.itemsize == 12
E       AssertionError: assert 8 == 12
E        +  where 8 = dtype([('x', [('xx', 'i1'), ('yy', '?')]), ('y', '<i2'), ('z', 'O')]).itemsize

dt         = dtype([('x', [('xx', 'i1'), ('yy', '?')]), ('y', '<i2'), ('z', 'O')])
self       = <pyarrow.tests.test_pandas.TestConvertStructTypes object at 0xeb535d90>

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/test_pandas.py:2604: AssertionError
_________________________________________________________ test_schema_sizeof __________________________________________________________

    def test_schema_sizeof():
        schema = pa.schema([
            pa.field('foo', pa.int32()),
            pa.field('bar', pa.string()),
        ])
    
>       assert sys.getsizeof(schema) > 30
E       assert 28 > 30
E        +  where 28 = <built-in function getsizeof>(foo: int32\nbar: string)
E        +    where <built-in function getsizeof> = sys.getsizeof

schema     = foo: int32
bar: string

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/test_schema.py:684: AssertionError
____________________________________________________ test_pandas_roundtrip_string _____________________________________________________

    @pytest.mark.pandas
    def test_pandas_roundtrip_string():
        # See https://github.com/pandas-dev/pandas/issues/50554
        if Version(pd.__version__) < Version("1.6"):
            pytest.skip("Column.size() bug in pandas")
    
        arr = ["a", "", "c"]
        table = pa.table({"a": pa.array(arr)})
    
        from pandas.api.interchange import (
            from_dataframe as pandas_from_dataframe
        )
    
        pandas_df = pandas_from_dataframe(table)
>       result = pi.from_dataframe(pandas_df)

arr        = ['a', '', 'c']
pandas_df  =    a
0  a
1   
2  c
pandas_from_dataframe = <function from_dataframe at 0xebbaa398>
table      = pyarrow.Table
a: string
----
a: [["a","","c"]]

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/interchange/test_conversion.py:159: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:113: in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy),
        allow_copy = True
        df         =    a
0  a
1   
2  c
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:136: in _from_dataframe
    batch = protocol_df_chunk_to_pyarrow(chunk, allow_copy)
        allow_copy = True
        batches    = []
        chunk      = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda1c61f0>
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda1c61f0>
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:182: in protocol_df_chunk_to_pyarrow
    columns[name] = column_to_array(col, allow_copy)
        allow_copy = True
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda1c65b0>
        columns    = {}
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda1c61f0>
        dtype      = <DtypeKind.STRING: 21>
        name       = 'a'
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:214: in column_to_array
    data = buffers_to_array(buffers, data_type,
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 2, 'ptr': 3879523528, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 32, 'ptr': 1530035680, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 3, 'ptr': 1529980112, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda1c65b0>
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:396: in buffers_to_array
    data_pa_buffer = pa.foreign_buffer(data_buff.ptr, data_buff.bufsize,
        _          = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 2, 'ptr': 3879523528, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 32, 'ptr': 1530035680, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 3, 'ptr': 1529980112, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        data_buff  = PandasBuffer({'bufsize': 2, 'ptr': 3879523528, 'device': 'CPU'})
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        describe_null = (<ColumnNullType.USE_BYTEMASK: 4>, 0)
        length     = 3
        offset     = 0
        offset_buff = PandasBuffer({'bufsize': 32, 'ptr': 1530035680, 'device': 'CPU'})
        offset_dtype = (<DtypeKind.INT: 0>, 64, 'l', '=')
        validity_buff = PandasBuffer({'bufsize': 3, 'ptr': 1529980112, 'device': 'CPU'})
        validity_dtype = (<DtypeKind.BOOL: 20>, 8, 'b', '=')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OverflowError: Python int too large to convert to C ssize_t


pyarrow/io.pxi:1990: OverflowError
_________________________________________________ test_pandas_roundtrip_large_string __________________________________________________

    @pytest.mark.pandas
    def test_pandas_roundtrip_large_string():
        # See https://github.com/pandas-dev/pandas/issues/50554
        if Version(pd.__version__) < Version("1.6"):
            pytest.skip("Column.size() bug in pandas")
    
        arr = ["a", "", "c"]
        table = pa.table({"a_large": pa.array(arr, type=pa.large_string())})
    
        from pandas.api.interchange import (
            from_dataframe as pandas_from_dataframe
        )
    
        if Version(pd.__version__) >= Version("2.0.1"):
            pandas_df = pandas_from_dataframe(table)
>           result = pi.from_dataframe(pandas_df)

arr        = ['a', '', 'c']
pandas_df  =   a_large
0       a
1        
2       c
pandas_from_dataframe = <function from_dataframe at 0xebbaa398>
table      = pyarrow.Table
a_large: large_string
----
a_large: [["a","","c"]]

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/interchange/test_conversion.py:189: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:113: in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy),
        allow_copy = True
        df         =   a_large
0       a
1        
2       c
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:136: in _from_dataframe
    batch = protocol_df_chunk_to_pyarrow(chunk, allow_copy)
        allow_copy = True
        batches    = []
        chunk      = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda103a10>
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda103a10>
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:182: in protocol_df_chunk_to_pyarrow
    columns[name] = column_to_array(col, allow_copy)
        allow_copy = True
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda1033d0>
        columns    = {}
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda103a10>
        dtype      = <DtypeKind.STRING: 21>
        name       = 'a_large'
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:214: in column_to_array
    data = buffers_to_array(buffers, data_type,
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 2, 'ptr': 3879522800, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 32, 'ptr': 1480303312, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 3, 'ptr': 1478277616, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda1033d0>
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:396: in buffers_to_array
    data_pa_buffer = pa.foreign_buffer(data_buff.ptr, data_buff.bufsize,
        _          = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 2, 'ptr': 3879522800, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 32, 'ptr': 1480303312, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 3, 'ptr': 1478277616, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        data_buff  = PandasBuffer({'bufsize': 2, 'ptr': 3879522800, 'device': 'CPU'})
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        describe_null = (<ColumnNullType.USE_BYTEMASK: 4>, 0)
        length     = 3
        offset     = 0
        offset_buff = PandasBuffer({'bufsize': 32, 'ptr': 1480303312, 'device': 'CPU'})
        offset_dtype = (<DtypeKind.INT: 0>, 64, 'l', '=')
        validity_buff = PandasBuffer({'bufsize': 3, 'ptr': 1478277616, 'device': 'CPU'})
        validity_dtype = (<DtypeKind.BOOL: 20>, 8, 'b', '=')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OverflowError: Python int too large to convert to C ssize_t


pyarrow/io.pxi:1990: OverflowError
______________________________________________ test_pandas_roundtrip_string_with_missing ______________________________________________

    @pytest.mark.pandas
    def test_pandas_roundtrip_string_with_missing():
        # See https://github.com/pandas-dev/pandas/issues/50554
        if Version(pd.__version__) < Version("1.6"):
            pytest.skip("Column.size() bug in pandas")
    
        arr = ["a", "", "c", None]
        table = pa.table({"a": pa.array(arr),
                          "a_large": pa.array(arr, type=pa.large_string())})
    
        from pandas.api.interchange import (
            from_dataframe as pandas_from_dataframe
        )
    
        if Version(pd.__version__) >= Version("2.0.2"):
            pandas_df = pandas_from_dataframe(table)
>           result = pi.from_dataframe(pandas_df)

arr        = ['a', '', 'c', None]
pandas_df  =      a a_large
0    a       a
1             
2    c       c
3  NaN     NaN
pandas_from_dataframe = <function from_dataframe at 0xebbaa398>
table      = pyarrow.Table
a: string
a_large: large_string
----
a: [["a","","c",null]]
a_large: [["a","","c",null]]

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/interchange/test_conversion.py:227: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:113: in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy),
        allow_copy = True
        df         =      a a_large
0    a       a
1             
2    c       c
3  NaN     NaN
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:136: in _from_dataframe
    batch = protocol_df_chunk_to_pyarrow(chunk, allow_copy)
        allow_copy = True
        batches    = []
        chunk      = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda15b850>
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda15b850>
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:182: in protocol_df_chunk_to_pyarrow
    columns[name] = column_to_array(col, allow_copy)
        allow_copy = True
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda103210>
        columns    = {}
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xda15b850>
        dtype      = <DtypeKind.STRING: 21>
        name       = 'a'
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:214: in column_to_array
    data = buffers_to_array(buffers, data_type,
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 2, 'ptr': 3879523744, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 40, 'ptr': 1469510752, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 4, 'ptr': 1475420176, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda103210>
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:396: in buffers_to_array
    data_pa_buffer = pa.foreign_buffer(data_buff.ptr, data_buff.bufsize,
        _          = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 2, 'ptr': 3879523744, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 40, 'ptr': 1469510752, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 4, 'ptr': 1475420176, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        data_buff  = PandasBuffer({'bufsize': 2, 'ptr': 3879523744, 'device': 'CPU'})
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        describe_null = (<ColumnNullType.USE_BYTEMASK: 4>, 0)
        length     = 4
        offset     = 0
        offset_buff = PandasBuffer({'bufsize': 40, 'ptr': 1469510752, 'device': 'CPU'})
        offset_dtype = (<DtypeKind.INT: 0>, 64, 'l', '=')
        validity_buff = PandasBuffer({'bufsize': 4, 'ptr': 1475420176, 'device': 'CPU'})
        validity_dtype = (<DtypeKind.BOOL: 20>, 8, 'b', '=')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OverflowError: Python int too large to convert to C ssize_t


pyarrow/io.pxi:1990: OverflowError
__________________________________________________ test_pandas_roundtrip_categorical __________________________________________________

    @pytest.mark.pandas
    def test_pandas_roundtrip_categorical():
        if Version(pd.__version__) < Version("2.0.2"):
            pytest.skip("Bitmasks not supported in pandas interchange implementation")
    
        arr = ["Mon", "Tue", "Mon", "Wed", "Mon", "Thu", "Fri", "Sat", None]
        table = pa.table(
            {"weekday": pa.array(arr).dictionary_encode()}
        )
    
        from pandas.api.interchange import (
            from_dataframe as pandas_from_dataframe
        )
        pandas_df = pandas_from_dataframe(table)
>       result = pi.from_dataframe(pandas_df)

arr        = ['Mon', 'Tue', 'Mon', 'Wed', 'Mon', 'Thu', 'Fri', 'Sat', None]
pandas_df  =   weekday
0     Mon
1     Tue
2     Mon
3     Wed
4     Mon
5     Thu
6     Fri
7     Sat
8     NaN
pandas_from_dataframe = <function from_dataframe at 0xebbaa398>
table      = pyarrow.Table
weekday: dictionary<values=string, indices=int32, ordered=0>
----
weekday: [  -- dictionary:
["Mon","Tue","Wed","Thu","Fri","Sat"]  -- indices:
[0,1,0,2,0,3,4,5,null]]

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/interchange/test_conversion.py:257: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:113: in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy),
        allow_copy = True
        df         =   weekday
0     Mon
1     Tue
2     Mon
3     Wed
4     Mon
5     Thu
6     Fri
7     Sat
8     NaN
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:136: in _from_dataframe
    batch = protocol_df_chunk_to_pyarrow(chunk, allow_copy)
        allow_copy = True
        batches    = []
        chunk      = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xd9e217f0>
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xd9e217f0>
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:186: in protocol_df_chunk_to_pyarrow
    columns[name] = categorical_column_to_dictionary(col, allow_copy)
        allow_copy = True
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda180550>
        columns    = {}
        df         = <pandas.core.interchange.dataframe.PandasDataFrameXchg object at 0xd9e217f0>
        dtype      = <DtypeKind.CATEGORICAL: 23>
        name       = 'weekday'
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:293: in categorical_column_to_dictionary
    dictionary = column_to_array(cat_column)
        allow_copy = True
        cat_column = <pandas.core.interchange.column.PandasColumn object at 0xda1801d0>
        categorical = {'categories': <pandas.core.interchange.column.PandasColumn object at 0xda1801d0>,
 'is_dictionary': True,
 'is_ordered': False}
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda180550>
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:214: in column_to_array
    data = buffers_to_array(buffers, data_type,
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 18, 'ptr': 3659006432, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 56, 'ptr': 1466456352, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 6, 'ptr': 1477427216, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        col        = <pandas.core.interchange.column.PandasColumn object at 0xda1801d0>
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:396: in buffers_to_array
    data_pa_buffer = pa.foreign_buffer(data_buff.ptr, data_buff.bufsize,
        _          = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        allow_copy = True
        buffers    = {'data': (PandasBuffer({'bufsize': 18, 'ptr': 3659006432, 'device': 'CPU'}),
          (<DtypeKind.STRING: 21>, 8, 'u', '=')),
 'offsets': (PandasBuffer({'bufsize': 56, 'ptr': 1466456352, 'device': 'CPU'}),
             (<DtypeKind.INT: 0>, 64, 'l', '=')),
 'validity': (PandasBuffer({'bufsize': 6, 'ptr': 1477427216, 'device': 'CPU'}),
              (<DtypeKind.BOOL: 20>, 8, 'b', '='))}
        data_buff  = PandasBuffer({'bufsize': 18, 'ptr': 3659006432, 'device': 'CPU'})
        data_type  = (<DtypeKind.STRING: 21>, 8, 'u', '=')
        describe_null = (<ColumnNullType.USE_BYTEMASK: 4>, 0)
        length     = 6
        offset     = 0
        offset_buff = PandasBuffer({'bufsize': 56, 'ptr': 1466456352, 'device': 'CPU'})
        offset_dtype = (<DtypeKind.INT: 0>, 64, 'l', '=')
        validity_buff = PandasBuffer({'bufsize': 6, 'ptr': 1477427216, 'device': 'CPU'})
        validity_dtype = (<DtypeKind.BOOL: 20>, 8, 'b', '=')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OverflowError: Python int too large to convert to C ssize_t


pyarrow/io.pxi:1990: OverflowError
________________________________________________________ test_empty_dataframe _________________________________________________________

    def test_empty_dataframe():
        schema = pa.schema([('col1', pa.int8())])
        df = pa.table([[]], schema=schema)
        dfi = df.__dataframe__()
>       assert pi.from_dataframe(dfi) == df

df         = pyarrow.Table
col1: int8
----
col1: [[]]
dfi        = <pyarrow.interchange.dataframe._PyArrowDataFrame object at 0xd98381d0>
schema     = col1: int8

../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/tests/interchange/test_conversion.py:522: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:113: in from_dataframe
    return _from_dataframe(df.__dataframe__(allow_copy=allow_copy),
        allow_copy = True
        df         = <pyarrow.interchange.dataframe._PyArrowDataFrame object at 0xd98381d0>
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:140: in _from_dataframe
    batch = protocol_df_chunk_to_pyarrow(df)
        allow_copy = True
        batches    = []
        df         = <pyarrow.interchange.dataframe._PyArrowDataFrame object at 0xd96e41b0>
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:182: in protocol_df_chunk_to_pyarrow
    columns[name] = column_to_array(col, allow_copy)
        allow_copy = True
        col        = <pyarrow.interchange.column._PyArrowColumn object at 0xd96a6650>
        columns    = {}
        df         = <pyarrow.interchange.dataframe._PyArrowDataFrame object at 0xd96e41b0>
        dtype      = <DtypeKind.INT: 0>
        name       = 'col1'
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:214: in column_to_array
    data = buffers_to_array(buffers, data_type,
        allow_copy = True
        buffers    = {'data': (PyArrowBuffer({'bufsize': 0, 'ptr': 4122363392, 'device': 'CPU'}),
          (<DtypeKind.INT: 0>, 8, 'c', '=')),
 'offsets': None,
 'validity': None}
        col        = <pyarrow.interchange.column._PyArrowColumn object at 0xd96a6650>
        data_type  = (<DtypeKind.INT: 0>, 8, 'c', '=')
../work/apache-arrow-15.0.0/python-python3_11/install/usr/lib/python3.11/site-packages/pyarrow/interchange/from_dataframe.py:396: in buffers_to_array
    data_pa_buffer = pa.foreign_buffer(data_buff.ptr, data_buff.bufsize,
        _          = (<DtypeKind.INT: 0>, 8, 'c', '=')
        allow_copy = True
        buffers    = {'data': (PyArrowBuffer({'bufsize': 0, 'ptr': 4122363392, 'device': 'CPU'}),
          (<DtypeKind.INT: 0>, 8, 'c', '=')),
 'offsets': None,
 'validity': None}
        data_buff  = PyArrowBuffer({'bufsize': 0, 'ptr': 4122363392, 'device': 'CPU'})
        data_type  = (<DtypeKind.INT: 0>, 8, 'c', '=')
        describe_null = (<ColumnNullType.NON_NULLABLE: 0>, None)
        length     = 0
        offset     = 0
        offset_buff = None
        validity_buff = None
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OverflowError: Python int too large to convert to C ssize_t


pyarrow/io.pxi:1990: OverflowError

Full build & test log (2.5M): pyarrow.txt

This is arrow 15.0.0 on Gentoo, in an x86 systemd-nspawn container. I used -O2 -march=pentium-m -mfpmath=sse -pipe as compiler flags to rule out i387-specific issues.

>>> pyarrow.show_info()
pyarrow version info
--------------------
Package kind              : not indicated
Arrow C++ library version : 15.0.0  
Arrow C++ compiler        : GNU 13.2.1
Arrow C++ compiler flags  : -O2 -march=pentium-m -mfpmath=sse -pipe
Arrow C++ git revision    :         
Arrow C++ git description :         
Arrow C++ build type      : relwithdebinfo

Platform:
  OS / Arch           : Linux x86_64
  SIMD Level          : avx2    
  Detected SIMD Level : avx2    

Memory:
  Default backend     : system  
  Bytes allocated     : 0 bytes 
  Max memory          : 0 bytes 
  Supported Backends  : system  

Optional modules:
  csv                 : Enabled 
  cuda                : -       
  dataset             : Enabled 
  feather             : Enabled 
  flight              : -       
  fs                  : Enabled 
  gandiva             : -       
  json                : Enabled 
  orc                 : -       
  parquet             : Enabled 

Filesystems:
  GcsFileSystem       : -       
  HadoopFileSystem    : Enabled 
  S3FileSystem        : -       

Compression Codecs:
  brotli              : Enabled 
  bz2                 : Enabled 
  gzip                : Enabled 
  lz4_frame           : Enabled 
  lz4                 : Enabled 
  snappy              : Enabled 
  zstd                : Enabled 

Some of these might be problems inside pandas. I'm going to file a bug about the test failures there in a minute, and link it here afterwards.

Component(s)

Python

@mgorny
Contributor Author

mgorny commented Feb 20, 2024

pandas counterpart: pandas-dev/pandas#57523

@pitrou
Member

pitrou commented Feb 20, 2024

Perhaps you want to submit a PR for this? From what I can tell, this should be mostly a matter of fixing the tests.

@mgorny
Contributor Author

mgorny commented Feb 20, 2024

I can try, but I can't promise I'll come up with anything that makes sense. I have almost no knowledge of NumPy, and none of Arrow/pandas, so it'll be all guesswork.

mgorny added a commit to mgorny/apache-arrow that referenced this issue Feb 20, 2024
…t platforms

Use `uintptr_t` rather than `intptr_t` to fix `OverflowError`, visible
e.g. when running `tests/interchange/test_conversion.py` tests on 32-bit
platforms.
@pitrou
Member

pitrou commented Feb 20, 2024

Also cc @jorisvandenbossche

@mgorny
Contributor Author

mgorny commented Feb 20, 2024

Looking at the other failures:

FAILED tests/test_array.py::test_dictionary_to_numpy - TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'

I haven't figured this one out yet; it might be a bug in pandas. From what I can see, we're constructing two arrays, and they end up having different types inside.

FAILED tests/test_io.py::test_python_file_large_seeks - assert 5 == ((2 ** 32) + 5)
FAILED tests/test_io.py::test_memory_map_large_seeks - OSError: Read out of bounds (offset = 4294967301, size = 5) in file of size 10

This could be a bug in pyarrow (or Arrow itself): either it wasn't built with Large File Support, or it doesn't use the correct types somewhere in between.

FAILED tests/test_pandas.py::TestConvertStructTypes::test_from_numpy_nested - AssertionError: assert 8 == 12
FAILED tests/test_schema.py::test_schema_sizeof - assert 28 > 30

I think the first one is because np.object_ is 32-bit rather than 64-bit, but I'm not sure about the second one.
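
Both guesses are easy to sanity-check from plain Python/NumPy; a minimal sketch (the arithmetic mirrors the failing asserts above):

import numpy as np

# 32-bit truncation reproduces both large-seek failures exactly:
offset = 2 ** 32 + 5
print(offset & 0xFFFFFFFF)  # 5 -> matches "assert 5 == ((2 ** 32) + 5)"
print(offset)               # 4294967301 -> the out-of-bounds read offset

# Object fields are pointer-sized, so on a 32-bit build the nested struct
# dtype from test_from_numpy_nested packs into 8 bytes instead of 12:
print(np.dtype(np.object_).itemsize)  # 8 on 64-bit builds, 4 on 32-bit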

mgorny added a commit to mgorny/apache-arrow that referenced this issue Feb 20, 2024
Update the size assumptions in tests to account for size differences
on 32-bit platforms: `np.object_` is 4 bytes rather than 8 bytes,
and `pa.schema` is half the size.
@kou kou changed the title from "Test failures on 32-bit x86" to "[Python] Test failures on 32-bit x86" on Feb 20, 2024
pitrou pushed a commit that referenced this issue Feb 20, 2024
…forms (#40158)

Use `uintptr_t` rather than `intptr_t` to fix `OverflowError`, visible e.g. when running `tests/interchange/test_conversion.py` tests on 32-bit platforms.

### Rationale for this change

This fixes the `OverflowError`s from #40153, and makes `pyarrow/tests/interchange/` all pass on 32-bit x86.

### What changes are included in this PR?

- change the type used to store pointers from `intptr_t` to `uintptr_t` to provide coverage for pointers above `0x80000000`.

### Are these changes tested?

These changes are covered by the tests in `pyarrow/tests/interchange`.

### Are there any user-facing changes?

It fixes an `OverflowError` that can be triggered by working with pandas data types, and possibly more (though I'm not sure whether this qualifies as a "crash").

* Closes: #40153

Authored-by: Michał Górny <mgorny@gentoo.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
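
For illustration, a minimal sketch of the failure mode, assuming a 32-bit build where intptr_t is 32 bits wide (the address below is taken from the traceback):

ptr = 3879523528  # a data buffer address from the traceback, above 2**31 - 1

INT32_MAX = 2 ** 31 - 1
UINT32_MAX = 2 ** 32 - 1

# A signed 32-bit intptr_t cannot hold this value, so the conversion raises
# "OverflowError: Python int too large to convert to C ssize_t":
print(ptr > INT32_MAX)    # True -> overflows a signed 32-bit type
print(ptr <= UINT32_MAX)  # True -> fits an unsigned 32-bit uintptr_t
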
pitrou added a commit that referenced this issue Feb 20, 2024
### What changes are included in this PR?

Add a Debian-based i386 test build for Python, similar to the existing one for C++.

### Are these changes tested?

Yes. The test suite step in the new build will fail until GH-40153 is entirely fixed.

### Are there any user-facing changes?

No.
* Closes: #40159

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@mgorny
Contributor Author

mgorny commented Feb 21, 2024

Should we reopen the bug for the remaining issues or should I file a new one?

@pitrou
Member

pitrou commented Feb 21, 2024

Oops, sorry, I didn't mean to close it. Let's reopen it.

@pitrou pitrou reopened this Feb 21, 2024
pitrou pushed a commit that referenced this issue Feb 21, 2024
### Rationale for this change

This fixes two tests on 32-bit platforms (tested on x86 specifically).

### What changes are included in this PR?

- update the `np.object_` size assumption to 4 bytes on 32-bit platforms
- update the `pa.schema` size assumptions to be half as large on 32-bit platforms

### Are these changes tested?

The changes fix tests.

### Are there any user-facing changes?

Only test fixes.

* Closes: #40153

Authored-by: Michał Górny <mgorny@gentoo.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
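
A sketch of what such platform-conditional expectations can look like in test code (the exact form used by the PR is an assumption here):

import sys
import numpy as np

IS_64BIT = sys.maxsize > 2 ** 32  # False on 32-bit builds

# np.object_ fields are pointer-sized, so the nested struct dtype shrinks:
dt = np.dtype([('x', np.dtype([('xx', np.int8), ('yy', np.bool_)])),
               ('y', np.int16), ('z', np.object_)])
assert dt.itemsize == (12 if IS_64BIT else 8)
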
@pitrou pitrou reopened this Feb 21, 2024
@pitrou
Member

pitrou commented Feb 21, 2024

I've opened #40176 for the large file-related failures.

@mgorny
Contributor Author

mgorny commented Feb 21, 2024

By the way, I've gotten a reply that the two cases of `ValueError: putmask: output array is read-only` in pandas-dev/pandas#57523 "could potentially be related to bugs in Arrow".

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue Feb 28, 2024
…n build (apache#40176)

### Rationale for this change

Python large file tests fail on 32-bit platforms.

### What changes are included in this PR?

1. Fix passing `int64_t` position to the Python file methods when a Python file object is wrapped in an Arrow `RandomAccessFile`
2. Disallow creating a `MemoryMappedFile` spanning more than the `size_t` maximum, instead of silently truncating its length

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* GitHub Issue: apache#40153

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
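
A rough Python-level sketch of the second guard; the helper name is hypothetical, and the real change lives in the C++ MemoryMappedFile:

import sys

SIZE_T_MAX = 2 * sys.maxsize + 1  # the platform's size_t maximum

def check_mappable(length):
    # Refuse lengths that a 32-bit size_t would silently truncate,
    # rather than mapping a shorter region than requested.
    if length > SIZE_T_MAX:
        raise OSError(f"requested mapping of {length} bytes exceeds size_t")
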
@pitrou
Member

pitrou commented Feb 29, 2024

Taken together, PRs #40293, #40294 and #40295 should fix the remaining failures.

@mgorny
Contributor Author

mgorny commented Feb 29, 2024

Thanks a lot! I can confirm that they fix the remaining issues.

@thesamesam

Thank you for putting the time into this.

pitrou added a commit that referenced this issue Feb 29, 2024
### Rationale for this change

`Array.to_numpy` calls `np.take` to linearize dictionary arrays. This fails on 32-bit NumPy builds because we hand NumPy 64-bit indices that it cannot safely downcast to its 32-bit index type.

### What changes are included in this PR?

Avoid calling `np.take`, instead using our own dictionary decoding routine.

### Are these changes tested?

Yes. A test failure is fixed on 32-bit.

### Are there any user-facing changes?

No.
* GitHub Issue: #40153

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
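
The underlying incompatibility is easy to reproduce with NumPy alone; a sketch (this succeeds on 64-bit builds, where np.intp is int64):

import numpy as np

values = np.array([13.7, 11.0])
indices = np.array([0, 1, 1, 0], dtype=np.int64)

# On 32-bit builds np.intp is int32, so int64 indices fail the 'safe' cast:
#   TypeError: Cannot cast array data from dtype('int64') to dtype('int32')
print(values.take(indices))
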
jorisvandenbossche pushed a commit that referenced this issue Mar 5, 2024
…ms (#40294)

### Rationale for this change

`Tensor.__getbuffer__` would silently assume that `Py_ssize_t` is the same width as `int64_t`, which is true only on 64-bit platforms.

### What changes are included in this PR?

Create an internal buffer of `Py_ssize_t` values mirroring a Tensor's shape and strides, to avoid relying on the aforementioned assumption.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* GitHub Issue: #40153

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
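
A Python-level sketch of the mirroring idea, assuming 64-bit shape/stride values coming in (the actual fix allocates the Py_ssize_t array in Cython):

import sys

def mirror_as_ssize(values):
    # Copy 64-bit extents into native-width integers, refusing values that
    # would not round-trip instead of silently truncating them.
    out = []
    for v in values:
        if not (-sys.maxsize - 1 <= v <= sys.maxsize):
            raise OverflowError(f"{v} does not fit in Py_ssize_t")
        out.append(v)
    return out

shape, strides = mirror_as_ssize([3, 4]), mirror_as_ssize([32, 8])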
pitrou added a commit that referenced this issue Mar 5, 2024
### Rationale for this change

`test_gdb.py` would fail on 32-bit platforms because the gdb extension errors out when a timestamp value is larger than the platform's `time_t` can represent.

### What changes are included in this PR?

1. Catch `OverflowError` from the Python datetime module when trying to format a timestamp
2. Tweak the expected test results on 32-bit

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #40153

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
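
Roughly what the formatting fallback amounts to (a sketch; the function name is assumed, not the extension's actual code):

import datetime

def format_timestamp(seconds):
    # Values beyond the platform's time_t range make the datetime module
    # raise OverflowError (or OSError); fall back to the raw value.
    try:
        return datetime.datetime.utcfromtimestamp(seconds).isoformat()
    except (OverflowError, OSError, ValueError):
        return f"<timestamp out of range: {seconds}>"
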
@pitrou pitrou added this to the 16.0.0 milestone Mar 5, 2024
@pitrou
Copy link
Member

pitrou commented Mar 5, 2024

Issue resolved by pull request #40293

@pitrou pitrou closed this as completed Mar 5, 2024