[Python] ListArray.from_arrays does not check validity of input arrays #22527
Comments
Antoine Pitrou / @pitrou: |
Wes McKinney / @wesm: |
Joris Van den Bossche / @jorisvandenbossche: |
Antoine Pitrou / @pitrou: |
Joris Van den Bossche / @jorisvandenbossche: Not knowingly in detail how
|
Antoine Pitrou / @pitrou: Perhaps we need a separate Edit: oh, you're right about |
Antoine Pitrou / @pitrou: |
From #4979 (comment).
When creating a ListArray from offsets and values in Python, there is no validation that the offsets start with 0 and end with the length of the values array. (Is that required? The docs seem to indicate so: https://github.com/apache/arrow/blob/master/docs/source/format/Layout.rst#list-type says "The first value in the offsets array is 0, and the last element is the length of the values array.")
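For illustration, the check described above can be sketched in plain Python. This is not pyarrow's implementation: `check_list_offsets` is a hypothetical helper, the sample offsets are made up, and the non-decreasing test is an extra assumption added on top of the rule quoted from the layout docs.

```python
def check_list_offsets(offsets, values_length):
    """Hypothetical helper: return a list of problems found in `offsets`.

    Follows the rule quoted from the layout docs (first offset is 0, last
    offset equals the length of the values array) and additionally assumes
    that offsets must never decrease.
    """
    if len(offsets) == 0:
        return ["offsets must contain at least one element"]

    problems = []
    if offsets[0] != 0:
        problems.append(f"first offset is {offsets[0]}, expected 0")
    if offsets[-1] != values_length:
        problems.append(
            f"last offset is {offsets[-1]}, expected {values_length}"
        )
    # Each list slot spans offsets[i - 1]:offsets[i], so a decreasing pair
    # would describe a slot with negative length.
    for i in range(1, len(offsets)):
        if offsets[i] < offsets[i - 1]:
            problems.append(f"offset {i} is smaller than offset {i - 1}")
    return problems


# Made-up inputs: five values, one consistent and one inconsistent offsets list.
good = check_list_offsets([0, 2, 5], 5)
bad = check_list_offsets([1, 3, 10], 5)
print(good)
print(bad)
```

Returning a list of problems instead of raising on the first one is only a choice made for this sketch; a real constructor would more likely raise an error.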
The resulting array "seems" OK (judging by its repr), but converting it to Python objects or flattening it goes wrong:
Calling `validate` manually correctly raises:
In C++ the main constructors are not safe, and as the caller you need to ensure that the data is correct or call a safe (slower) constructor. But do we want to use the unsafe / fast constructors without validation in Python by default as well? Or should we call `validate` here?
A quick search seems to indicate that `pa.Array.from_buffers` does validation, but the other `from_arrays` methods don't seem to do this explicitly.
Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Joris Van den Bossche / @jorisvandenbossche
PRs and other links:
Note: This issue was originally created as ARROW-6132. Please see the migration documentation for further details.