[Python] Implement unification of null dictionaries #39927

@alanhdu

Describe the enhancement requested

Currently, if you try to do something like:

import pyarrow as pa

a = pa.array([None, None, None]).cast(pa.dictionary(pa.int8(), pa.null()))
b = pa.array([None, None, None]).cast(pa.dictionary(pa.int8(), pa.null()))

pa.chunked_array([a, b]).unify_dictionaries()

PyArrow will raise an exception like:

File pyarrow/table.pxi:1206, in pyarrow.lib.ChunkedArray.unify_dictionaries()

File pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()

File pyarrow/error.pxi:91, in pyarrow.lib.check_status()

ArrowNotImplementedError: Unification of null dictionaries is not implemented

Would it be possible to implement this? Manually calling .combine_chunks() works fine, so the logic appears to be implemented somewhere already.

a = pa.array([None, None, None]).cast(pa.dictionary(pa.int8(), pa.null()))
b = pa.array([None, None, None]).cast(pa.dictionary(pa.int8(), pa.null()))

pa.chunked_array([a, b]).combine_chunks()

I admit this might be a very niche feature. I am not sure why I have Feather files with these dictionary-encoded nulls, but I do, and it would be nice to be able to handle them.

Versions

I am testing this on Python 3.10 and pyarrow 15.

Component(s)

Python
