ValueError with latest commit when using __or__ between pyarrow boolean columns with fillna(False) #12033

@MarcoGorelli

import pandas as pd
import dask.dataframe as dd

df = dd.from_pandas(pd.DataFrame({'a': [True, False], 'b': [False, False]}).convert_dtypes(dtype_backend='pyarrow'))

print(df.assign(c=df['a'].fillna(False) | df['b'].fillna(False)).compute())

Traceback (most recent call last):
  File "/home/marcogorelli/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask/dataframe/dispatch.py", line 126, in make_meta
    return make_meta_dispatch(x, index=index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask/utils.py", line 781, in __call__
    return meth(arg, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask/dataframe/backends.py", line 201, in _
    values = pa.chunked_array([v.array]).combine_chunks()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 1543, in pyarrow.lib.chunked_array
TypeError: Cannot convert pyarrow.lib.ChunkedArray to pyarrow.lib.Array

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/marcogorelli/polars-api-compat-dev/t.py", line 8, in <module>
    print(df.assign(c=df['a'].fillna(False) | df['b'].fillna(False)).compute())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask/dataframe/dask_expr/_collection.py", line 2842, in assign
    result = new_collection(expr.Assign(result, *args))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask/_collections.py", line 8, in new_collection
    meta = expr._meta
           ^^^^^^^^^^
  File "/home/marcogorelli/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/functools.py", line 995, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask/dataframe/dask_expr/_expr.py", line 1985, in _meta
    return make_meta(self.operation(*args, **self._kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask/dataframe/dispatch.py", line 135, in make_meta
    return func(x, index=index)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/marcogorelli/polars-api-compat-dev/.venv/lib/python3.12/site-packages/dask/dataframe/backends.py", line 315, in make_meta_object
    raise ValueError(f"Expected iterable of tuples of (name, dtype), got {x}")
ValueError: Expected iterable of tuples of (name, dtype), got Empty DataFrame
Columns: [a, b, c]
Index: []

This came up in the Narwhals CI.

>>> dask.__version__
'2025.7.0+2.g7751beb21'
>>> pandas.__version__
'3.0.0.dev0+2247.g6a6a1bab4e'
