[Python] Creating Array with explicit string type fails on Python 2.7 #15886

Closed
asfimport opened this issue Nov 27, 2018 · 5 comments

@asfimport

PyArrow string arrays can no longer be created from NumPy string arrays with pyarrow>=0.8.0 (this includes pyarrow==0.11.1).

Please find below a quick repro:

import numpy as np
import pyarrow as pa
vec = np.array(["toto", "tata"])
pa.array(vec, pa.string())

Running this, I get the following:

---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-4-e753fb3a8193> in <module>()
----> 1 pa.array(vec, pa.string())

/usr/local/lib/python2.7/dist-packages/pyarrow/lib.so in pyarrow.lib.array()

/usr/local/lib/python2.7/dist-packages/pyarrow/lib.so in pyarrow.lib._ndarray_to_array()

/usr/local/lib/python2.7/dist-packages/pyarrow/lib.so in pyarrow.lib.check_status()

ArrowInvalid: 'utf32' codec can't decode bytes in position 0-3: code point not in range(0x110000)

However, this code snippet was working fine with pyarrow==0.7.1.

Did the handling of strings change in pyarrow after 0.7.1?
Do you have any workaround for this?

Jacques

Reporter: jacques
Assignee: Wes McKinney / @wesm

PRs and other links:

Note: This issue was originally created as ARROW-3890. Please see the migration documentation for further details.

@asfimport

Wes McKinney / @wesm:
Hm, this works fine for me

In [1]: paste
import numpy as np
import pyarrow as pa
vec = np.array(["toto", "tata"])
pa.array(vec, pa.string())

## -- End pasted text --
Out[1]: 
<pyarrow.lib.StringArray object at 0x7f777aa4b728>
[
  "toto",
  "tata"
]

Could you let us know more information about your environment?

@asfimport

Mathieu DESPRIEE:
OK, that's a Python 2 vs. Python 3 problem.
There is indeed a regression on Python 2, while it works fine on Python 3.

@asfimport

Wes McKinney / @wesm:
Thanks, I renamed the issue. PRs welcome

@asfimport

Wes McKinney / @wesm:
This is actually an issue converting NumPy binary arrays. Here is the trace with -DARROW_EXTRA_ERROR_CONTEXT=on:

>   raise ArrowInvalid(message)
E   ArrowInvalid: ../src/arrow/python/numpy_to_arrow.cc:795 code: converter.Convert()
E   ../src/arrow/python/numpy_to_arrow.cc:660 code: AppendUTF32(data, itemsize_, byteorder, &builder)
E   ../src/arrow/python/numpy_to_arrow.cc:620 code: CheckPyError()
E   'utf32' codec can't decode bytes in position 0-3: code point not in range(0x110000)
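
The trace above points at a UTF-32 decode step being applied to binary data. The following is only an illustration of that failure mode in plain Python, not the Arrow C++ code: it assumes the converter read the one-byte-per-character buffer of a NumPy binary element as if it were UTF-32. The helper name `try_decode_utf32` is hypothetical.

```python
# Four ASCII characters stored one byte each, as in one element of a
# NumPy binary ("S4") array built from Python 2 str literals.
raw = b"toto"


def try_decode_utf32(buf):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return buf.decode("utf-32-le")
    except UnicodeDecodeError as exc:
        return str(exc)


# The four bytes are read as a single 32-bit code point (0x6f746f74),
# which is above the Unicode maximum of 0x10FFFF, so decoding fails.
message = try_decode_utf32(raw)
print(message)

# The same text encoded as real UTF-32 (four bytes per character) decodes fine.
assert try_decode_utf32(u"toto".encode("utf-32-le")) == u"toto"
```

The printed message ends with "code point not in range(0x110000)", the same reason reported in the trace.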

@asfimport

Wes McKinney / @wesm:
Issue resolved by pull request 3063
#3063

@asfimport asfimport added this to the 0.12.0 milestone Jan 11, 2023