Describe the bug, including details regarding any error messages, version, and platform.
Version: pyarrow 24.0.0 (regression vs. 23.0.1)
Platform: Linux (python 3.12), but not platform-specific
In pyarrow 24.0.0, pyarrow.lib.Scalar gained arithmetic dunder methods (__add__, __sub__, __mul__, __truediv__, __pow__, __neg__, and bitwise ops) in python/pyarrow/scalar.pxi:
def __add__(self, object other):
    return _pc().call_function('add_checked', [self, other])
These implementations unconditionally dispatch to pyarrow.compute.call_function, which raises TypeError via _pack_compute_args when other is not a recognized pyarrow / list / tuple / ndarray type:
TypeError: Got unexpected argument type <class 'MyCustomColumn'> for compute function
Because a raised TypeError does NOT trigger Python's reflected-operator fallback (only a returned NotImplemented does; see the Python data model, §3.3.8 "Emulating numeric types"), any custom class that previously relied on its own __radd__ / __rmul__ / __rsub__ / __rtruediv__ to handle pyarrow.Scalar + my_obj is now broken. Users have no workaround on their side: pyarrow.lib.Scalar is an immutable extension type and cannot be monkey-patched, and virtual subclass registration is not honored by CPython's binary-op dispatch (which uses PyType_IsSubtype at the C level).
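The dispatch rule can be demonstrated without pyarrow. In this minimal sketch, NoAdd, RaisesTypeError and ReturnsNotImplemented are hypothetical stand-ins for a left operand that has no __add__ (pyarrow 23), one whose __add__ raises like Scalar.__add__ in pyarrow 24, and one that returns NotImplemented as requested in this issue; MyCol mirrors the reproducer.

```python
class MyCol:
    """Right-hand operand that handles addition via its reflected method."""

    def __radd__(self, other):
        return "MyCol.__radd__ called"


class NoAdd:
    """Left operand with no __add__ at all (like Scalar in pyarrow <= 23)."""


class RaisesTypeError:
    """Left operand whose __add__ raises on unknown types (like pyarrow 24)."""

    def __add__(self, other):
        raise TypeError(
            f"Got unexpected argument type {type(other)} for compute function"
        )


class ReturnsNotImplemented:
    """Left operand whose __add__ declines unknown types, per the data model."""

    def __add__(self, other):
        if not isinstance(other, (int, float)):
            return NotImplemented  # lets Python try other.__radd__(self)
        return "ReturnsNotImplemented.__add__ called"


def try_add(left, right):
    """Return left + right, or the TypeError message if the addition raised."""
    try:
        return left + right
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_add(NoAdd(), MyCol()))                  # MyCol.__radd__ called
print(try_add(ReturnsNotImplemented(), MyCol()))  # MyCol.__radd__ called
print(try_add(RaisesTypeError(), MyCol()))        # TypeError: Got unexpected argument type ...
```

Only the raising variant prevents MyCol.__radd__ from ever being consulted.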
Reproducer
import pyarrow

class MyCol:
    def __radd__(self, other):
        return "MyCol.__radd__ called"

s = pyarrow.scalar(5)
c = MyCol()
# Works on pyarrow <= 23 (Scalar had no __add__, so Python dispatches to MyCol.__radd__)
# Fails on pyarrow >= 24 with:
# TypeError: Got unexpected argument type <class '__main__.MyCol'> for compute function
print(s + c)
Expected: "MyCol.__radd__ called" (or at least a NotImplemented return from Scalar.__add__ so Python can fall back).
Actual: TypeError from _pack_compute_args.
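One possible shape for a fix, sketched in pure Python so it runs without pyarrow. FakeScalar, _ACCEPTED and _unwrap are hypothetical stand-ins for pyarrow.lib.Scalar, the operand types the compute layer can pack, and the call_function('add_checked', ...) dispatch; they are not pyarrow APIs. The idea is to check the operand type up front and return NotImplemented for anything unrecognized, rather than letting the compute layer raise.

```python
class FakeScalar:
    """Hypothetical stand-in for pyarrow.lib.Scalar wrapping a plain number."""

    # Hypothetical: the non-scalar operand types the compute layer accepts.
    _ACCEPTED = (int, float)

    def __init__(self, value):
        self.value = value

    @classmethod
    def _is_operand(cls, other):
        return isinstance(other, (cls, *cls._ACCEPTED))

    def __add__(self, other):
        if not self._is_operand(other):
            return NotImplemented  # defer to other.__radd__(self)
        # Stand-in for call_function('add_checked', [self, other]).
        return FakeScalar(_unwrap(self) + _unwrap(other))

    def __radd__(self, other):
        if not self._is_operand(other):
            return NotImplemented
        return FakeScalar(_unwrap(other) + _unwrap(self))


def _unwrap(x):
    """Extract the wrapped value from a FakeScalar, or pass a plain value through."""
    return x.value if isinstance(x, FakeScalar) else x


class MyCol:
    def __radd__(self, other):
        return "MyCol.__radd__ called"


print((FakeScalar(5) + 2).value)  # 7
print((3 + FakeScalar(5)).value)  # 8
print(FakeScalar(5) + MyCol())    # MyCol.__radd__ called
```

If neither operand handles the operation (for example FakeScalar(5) + "x"), Python itself still raises TypeError ("unsupported operand type(s) for +"), so genuinely unsupported combinations keep failing loudly. An up-front type check is used here instead of catching the TypeError from the compute call, since a blanket catch could also swallow legitimate type errors raised by the kernel.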
Why this matters
Libraries that wrap pyarrow arrays with a richer Python class (like our Data Curator library with its DataColumn class, but also other downstream projects) have historically been able to make pyarrow.Scalar + custom_column work by implementing __radd__ on their class (and likewise for the other reflected operators). Upgrading to 24.0.0 silently breaks this pattern, with no opt-out and no Python-level workaround.
Component(s)
Python