I don't know of a workaround. Converting to pylist and back is too slow. Is there a way to copy the slice to a new offset-0 StringArray that I could then dictionary-encode? Otherwise, I'm considering building the buffers by hand.
This seems to be specific to the string type, as I don't see a similar bug for the integer type:
In [7]: a = pa.array(['a', 'b', 'c', 'b'])
In [9]: a[1:].dictionary_encode()
Out[9]:
<pyarrow.lib.DictionaryArray object at 0x7f677975e128>
-- dictionary:
[
"c",
"b",
""
]
-- indices:
[
0,
1,
2
]
In [10]: a = pa.array([1, 2, 3, 2])
In [12]: a[1:].dictionary_encode()
Out[12]:
<pyarrow.lib.DictionaryArray object at 0x7f6776f5f208>
-- dictionary:
[
2,
3
]
-- indices:
[
0,
1,
0
]
> Is there a way to copy the slice to a new offset-0 StringArray that I could then dictionary-encode?
At least in the current pyarrow API, I don't think such functionality is exposed (apart from getting the buffers, slicing/copying them, and recreating an array).
Steps to reproduce:
Expected results:
Actual results:
Environment: Docker on Linux 5.2.18-200.fc30.x86_64; Python 3.7.4
Reporter: Adam Hooper / @adamhooper
Assignee: Antoine Pitrou / @pitrou
PRs and other links:
Note: This issue was originally created as ARROW-7266. Please see the migration documentation for further details.