
[Python] cannot create a chunked_array from dictionary_encoding result #23210

@asfimport

Description

I've run into a strange error when applying pa.chunked_array directly to the indices of a dictionary_encode() result (code below). Wrapping the indices in a view with the same type, indices.view(indices.type), works around the problem.

import pyarrow as pa
ca = pa.array(['a', 'a', 'b', 'b', 'c'])
fca = ca.dictionary_encode()
fca.indices
<pyarrow.lib.Int32Array object at 0x1250fb888>
[
  0,
  0,
  1,
  1,
  2
]

pa.chunked_array([fca.indices])
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-44-71ca3b877e1c> in <module>
----> 1 pa.chunked_array([fca.indices])

~/Projects/miniconda3/envs/pyarrow/lib/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.chunked_array()

~/Projects/miniconda3/envs/pyarrow/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Unexpected dictionary values in array of type int32

# taking a zero-copy view of the indices with the same type works
pa.chunked_array([fca.indices.view(fca.indices.type)])
Out[45]:
<pyarrow.lib.ChunkedArray object at 0x12508dc78>
[
  [
    0,
    0,
    1,
    1,
    2
  ]
]

Reporter: Artem KOZHEVNIKOV / @artemru
Assignee: Joris Van den Bossche / @jorisvandenbossche

PRs and other links:

Note: This issue was originally created as ARROW-6882. Please see the migration documentation for further details.
