[Python] DictionaryArray.from_buffers should not crash #30052

Closed
asfimport opened this issue Oct 28, 2021 · 1 comment

@asfimport

From https://stackoverflow.com/questions/69746789/how-to-make-a-pyarrow-dictionaryarray-with-extensiontype-using-from-buffers-us

Trying to create a DictionaryArray with from_buffers crashes:

>>> import pyarrow as pa
>>> a = pa.array(["one", "two", "three", "two", "one"]).dictionary_encode()
>>> b = pa.DictionaryArray.from_buffers(a.type, len(a), a.indices.buffers())
../src/arrow/array/array_dict.cc:83:  Check failed: (data->dictionary) != (nullptr) 
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0x11bcb26)[0x7fa850076b26]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0x11bcaa4)[0x7fa850076aa4]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0x11bcac6)[0x7fa850076ac6]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN5arrow4util8ArrowLogD1Ev+0x47)[0x7fa850076e25]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN5arrow15DictionaryArrayC2ERKSt10shared_ptrINS_9ArrayDataEE+0x1b9)[0x7fa84fad33fb]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN9__gnu_cxx13new_allocatorIN5arrow15DictionaryArrayEE9constructIS2_JRKSt10shared_ptrINS1_9ArrayDataEEEEEvPT_DpOT0_+0x49)[0x7fa84fc0f9f5]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt16allocator_traitsISaIN5arrow15DictionaryArrayEEE9constructIS1_JRKSt10shared_ptrINS0_9ArrayDataEEEEEvRS2_PT_DpOT0_+0x38)[0x7fa84fc0d44d]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt23_Sp_counted_ptr_inplaceIN5arrow15DictionaryArrayESaIS1_ELN9__gnu_cxx12_Lock_policyE2EEC2IJRKSt10shared_ptrINS0_9ArrayDataEEEEES2_DpOT_+0xaf)[0x7fa84fc0a027]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EEC2IN5arrow15DictionaryArrayESaIS5_EJRKSt10shared_ptrINS4_9ArrayDataEEEEERPT_St20_Sp_alloc_shared_tagIT0_EDpOT1_+0xb2)[0x7fa84fc04560]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt12__shared_ptrIN5arrow15DictionaryArrayELN9__gnu_cxx12_Lock_policyE2EEC1ISaIS1_EJRKSt10shared_ptrINS0_9ArrayDataEEEEESt20_Sp_alloc_shared_tagIT_EDpOT0_+0x4c)[0x7fa84fbffcdc]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZNSt10shared_ptrIN5arrow15DictionaryArrayEEC2ISaIS1_EJRKS_INS0_9ArrayDataEEEEESt20_Sp_alloc_shared_tagIT_EDpOT0_+0x39)[0x7fa84fbfd8f9]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZSt15allocate_sharedIN5arrow15DictionaryArrayESaIS1_EJRKSt10shared_ptrINS0_9ArrayDataEEEES3_IT_ERKT0_DpOT1_+0x38)[0x7fa84fbfb500]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZSt11make_sharedIN5arrow15DictionaryArrayEJRKSt10shared_ptrINS0_9ArrayDataEEEES2_IT_EDpOT0_+0x54)[0x7fa84fbf7be6]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0xd36104)[0x7fa84fbf0104]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(+0xd2f2f8)[0x7fa84fbe92f8]
/home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.700(_ZN5arrow9MakeArrayERKSt10shared_ptrINS_9ArrayDataEE+0x99)[0x7fa84fbe1d3d]

I don't know whether this can ever work with the current signature, since you can only pass buffers and not the dictionary itself (which is not part of the buffers). In C++, there is an ArrayData::Make overload that additionally takes a dictionary. I think we should add a custom from_buffers on DictionaryArray in Cython that uses it, instead of relying on the base class from_buffers implementation.

Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Miles Granger / @milesgranger

Note: This issue was originally created as ARROW-14495. Please see the migration documentation for further details.

@asfimport

Antoine Pitrou / @pitrou:
Issue resolved by pull request 13989
#13989
