[Python] accept pyarrow values / scalars in constructor functions ? #21761

Closed
asfimport opened this issue May 9, 2019 · 4 comments · Fixed by #36162


asfimport commented May 9, 2019

Currently, functions like `pyarrow.array` accept neither pyarrow Arrays nor the values they contain:

In [42]: arr = pa.array([1, 2, 3])

In [43]: pa.array(arr)
...
ArrowInvalid: Could not convert 1 with type pyarrow.lib.Int64Value: did not recognize Python value type when inferring an Arrow data type

In [44]: pa.array(list(arr))
...
ArrowInvalid: Could not convert 1 with type pyarrow.lib.Int64Value: did not recognize Python value type when inferring an Arrow data type

Do we want to allow or recognize those here? The first case could even have a fast path, since the conversion would not need to go element by element.

Scalars are not supported either:

In [46]: type(arr.sum())
Out[46]: pyarrow.lib.Int64Scalar

In [47]: pa.array([arr.sum()])
...
ArrowInvalid: Could not convert 6 with type pyarrow.lib.Int64Scalar: did not recognize Python value type when inferring an Arrow data type

Other functions don't accept Arrow scalars / values either:

In [48]: string = pa.array(['a'])[0]

In [49]: type(string)
Out[49]: pyarrow.lib.StringValue

In [50]: pa.field(string, pa.int64())
...
TypeError: expected bytes, pyarrow.lib.StringValue found

Reporter: Joris Van den Bossche / @jorisvandenbossche

Note: This issue was originally created as ARROW-5295. Please see the migration documentation for further details.


Antoine Pitrou / @pitrou:
I think allowing Arrays is fine. I'm not sure we want to make Scalars first-class citizens like Numpy scalars, though. At least from an implementation POV, it may add a lot of code all over the place...


Joris Van den Bossche / @jorisvandenbossche:
Additional case (from ARROW-6222): pyarrow Arrays are also not recognized as list-like when converting/inferring a list array:

In [43]: pa.array([np.array([1, 1]), np.array([2, 2, 2])])
Out[43]: 
<pyarrow.lib.ListArray object at 0x7f258fa9d0a0>
[
  [
    1,
    1
  ],
  [
    2,
    2,
    2
  ]
]

In [44]: pa.array([pa.array([1, 1]), pa.array([2, 2, 2])])
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-44-a22e0a500750> in <module>
----> 1 pa.array([pa.array([1, 1]), pa.array([2, 2, 2])])

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Could not convert [
  1,
  1
] with type pyarrow.lib.Int64Array: did not recognize Python value type when inferring an Arrow data type

So a list (or array) of numpy arrays works, but a list of pyarrow arrays does not. Again, this is not the most typical use case for pyarrow Arrays, so I'm not sure we should add this capability.

(Although we might want to find a general solution for array-like objects, e.g. pytorch Tensors (see ARROW-6222); such a solution, perhaps trying to coerce to a numpy array, might also solve the case of a list of arrow arrays.)
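
To make the idea of a general array-like coercion concrete, here is a minimal sketch. It is not what pyarrow does internally: instead of coercing to a numpy array as suggested above, it coerces each element to a plain Python list via duck typing. The helper names `to_nested_list` and `coerce_list_of_arraylikes` are hypothetical and exist only for illustration.

```python
def to_nested_list(obj):
    """Return a plain Python list for common array-like objects.

    Hypothetical helper: relies on duck typing only, so it does not
    need to import pyarrow, numpy, or torch.
    """
    if hasattr(obj, "to_pylist"):
        # pyarrow Arrays expose to_pylist()
        return obj.to_pylist()
    if hasattr(obj, "tolist"):
        # numpy arrays and torch Tensors expose tolist()
        return obj.tolist()
    # Anything else (e.g. an ordinary list) is passed through unchanged
    return obj


def coerce_list_of_arraylikes(values):
    """Coerce every element of a sequence with to_nested_list."""
    return [to_nested_list(v) for v in values]
```

With pyarrow installed, the failing call above could then be written as `pa.array(coerce_list_of_arraylikes([pa.array([1, 1]), pa.array([2, 2, 2])]))`. Note that this copies every element into Python objects, so it trades performance for generality.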


Wes McKinney / @wesm:
A "simple" workaround would be to invoke the Scalars' `as_py` method if they're passed in. That would add performance overhead, though, since we'd need to do isinstance checks.

Another option is to "sanitize" inputs (using a helper function) only in the case of failure on the initial try, so the normal use case won't be affected.
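
The two options above can be sketched together as follows. This is only an illustration under stated assumptions, not the fix that was eventually merged: `unwrap_scalars` and `build_with_fallback` are hypothetical names, and the check uses `hasattr(v, "as_py")` rather than an isinstance check so that the sketch does not need to import pyarrow. The `except` clause relies on `pyarrow.ArrowInvalid` being a `ValueError` subclass and `pyarrow.ArrowTypeError` being a `TypeError` subclass.

```python
def unwrap_scalars(values):
    """Replace any object exposing as_py() with its plain Python value."""
    return [v.as_py() if hasattr(v, "as_py") else v for v in values]


def build_with_fallback(constructor, values):
    """Call constructor(values); sanitize the input only if that fails.

    The common case (plain Python values) pays no extra cost because
    the per-element check runs only after the first attempt raised.
    """
    try:
        return constructor(values)
    except (TypeError, ValueError):
        # Second attempt with scalars converted via as_py()
        return constructor(unwrap_scalars(values))
```

With pyarrow installed, a call such as `build_with_fallback(pa.array, [arr.sum()])` would take the fallback path on versions that reject Scalars and the direct path otherwise.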

@jorisvandenbossche
Member

The part about accepting an Array object is solved now with #34275. Leaving this issue open for the Scalar case.

@AlenkaF AlenkaF added this to the 12.0.0 milestone Mar 21, 2023
AlenkaF added a commit to AlenkaF/arrow that referenced this issue Mar 23, 2023
@AlenkaF AlenkaF removed this from the 12.0.0 milestone Apr 6, 2023
@AlenkaF AlenkaF added this to the 13.0.0 milestone Jun 15, 2023
jorisvandenbossche added a commit that referenced this issue Jul 5, 2023
### Rationale for this change

Currently, `pyarrow.array` doesn't accept a list of pyarrow Scalars; this PR adds a check to allow that.

* Closes: #21761

Lead-authored-by: AlenkaF <frim.alenka@gmail.com>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>