ARROW-838: [Python] Expand pyarrow.array to handle NumPy arrays not originating in pandas

This unifies the ingest path for 1D data into `pyarrow.array`. I added the argument `from_pandas` to turn null sentinel checking (e.g. treating NaN as null) on or off:

```
In [8]: arr = np.random.randn(10000000)

In [9]: arr[::3] = np.nan

In [10]: arr2 = pa.array(arr)

In [11]: arr2.null_count
Out[11]: 0

In [12]: %timeit arr2 = pa.array(arr)
The slowest run took 5.43 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 68.4 µs per loop

In [13]: arr2 = pa.array(arr, from_pandas=True)

In [14]: arr2.null_count
Out[14]: 3333334

In [15]: %timeit arr2 = pa.array(arr, from_pandas=True)
1 loop, best of 3: 228 ms per loop
```

When the data is contiguous, the conversion is always zero-copy, but when `from_pandas=True` and no null mask is passed, a null bitmap is constructed and populated.

This also permits reading Python sequences into integer types smaller than int64:

```
In [17]: pa.array([1, 2, 3, 4], type='i1')
Out[17]:
<pyarrow.lib.Int8Array object at 0x7ffa1c1c65e8>
[
  1,
  2,
  3,
  4
]
```

Oh, I also added NumPy-like string type aliases:

```
In [18]: pa.int32() == 'i4'
Out[18]: True
```

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #1146 from wesm/expand-py-array-method and squashes the following commits:

1570e52 [Wes McKinney] Code review comments
d3bbb3c [Wes McKinney] Handle type aliases in cast, too
797f015 [Wes McKinney] Allow null checking to be skipped with from_pandas=False in pyarrow.array
f2802fc [Wes McKinney] Cleaner codepath for numpy->arrow conversions
587c575 [Wes McKinney] Add direct types sequence converters for more data types
cf40b76 [Wes McKinney] Add type aliases, some unit tests
7b530e4 [Wes McKinney] Consolidate both sequence and ndarray/Series/Index conversion in pyarrow.Array
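The gap between 68.4 µs and 228 ms in the session is consistent with the null-sentinel scan: with `from_pandas=True` every value must be inspected to populate the bitmap, while the default path can wrap the contiguous buffer as-is. The pure-Python sketch below shows what such a scan produces, assuming Arrow's bitmap layout (one bit per slot, least-significant bit first, set bit = valid). The helper names `build_validity_bitmap` and `is_valid` are hypothetical; this is not pyarrow's C++ code path.

```python
import math
from typing import Sequence, Tuple


def build_validity_bitmap(values: Sequence[float]) -> Tuple[bytearray, int]:
    """Hypothetical helper: build an Arrow-style validity bitmap, treating NaN as null.

    One bit per slot, least-significant bit first within each byte;
    a set bit means the slot is valid, a cleared bit means null.
    """
    bitmap = bytearray((len(values) + 7) // 8)  # all bits start cleared (null)
    null_count = 0
    for i, v in enumerate(values):
        if isinstance(v, float) and math.isnan(v):
            null_count += 1  # NaN sentinel: leave bit i cleared
        else:
            bitmap[i // 8] |= 1 << (i % 8)  # mark slot i as valid
    return bitmap, null_count


def is_valid(bitmap: bytearray, i: int) -> bool:
    """Hypothetical helper: read bit i of the bitmap."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


# Mirror the session's pattern on a small array: every third value is NaN.
values = [float("nan") if i % 3 == 0 else float(i) for i in range(10)]
bitmap, null_count = build_validity_bitmap(values)
print(null_count)    # 4 (slots 0, 3, 6, 9)
print(list(bitmap))  # [182, 1]
```

The same rule gives ceil(n / 3) nulls for `arr[::3] = np.nan` on n elements; for n = 10,000,000 that is 3,333,334, matching `Out[14]`.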
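One way to think about the string aliases is as a lookup that normalizes a NumPy-style code to a canonical type name before comparing. The sketch below is a hypothetical pure-Python model of that idea: `DataType`, `resolve_type`, and the alias table are illustrative assumptions, not pyarrow internals. The canonical names follow Arrow's type names, in which 16-, 32- and 64-bit floats are `halffloat`, `float` and `double`.

```python
from typing import Dict

# Hypothetical alias table: NumPy-style codes -> canonical Arrow type names.
_ALIASES: Dict[str, str] = {
    "i1": "int8", "i2": "int16", "i4": "int32", "i8": "int64",
    "u1": "uint8", "u2": "uint16", "u4": "uint32", "u8": "uint64",
    "f2": "halffloat", "f4": "float", "f8": "double",
}
_CANONICAL = frozenset(_ALIASES.values())


def resolve_type(spec: str) -> str:
    """Map an alias or canonical name to the canonical name; reject anything else."""
    if spec in _ALIASES:
        return _ALIASES[spec]
    if spec in _CANONICAL:
        return spec
    raise ValueError(f"unrecognized type string: {spec!r}")


class DataType:
    """Hypothetical stand-in for a type object that compares equal to its aliases."""

    __slots__ = ("name",)

    def __init__(self, spec: str) -> None:
        self.name = resolve_type(spec)  # normalize once, on construction

    def __eq__(self, other: object) -> bool:
        if isinstance(other, DataType):
            return self.name == other.name
        if isinstance(other, str):
            try:
                return self.name == resolve_type(other)
            except ValueError:
                return False  # unknown strings simply compare unequal
        return NotImplemented

    def __hash__(self) -> int:
        return hash(self.name)  # consistent with __eq__ between DataType objects

    def __repr__(self) -> str:
        return f"DataType({self.name!r})"


print(DataType("int32") == "i4")  # True
print(DataType("i1") == "int8")   # True
print(DataType("int32") == "i8")  # False
```

Because both construction and comparison go through `resolve_type`, `DataType('i4')` and `DataType('int32')` are indistinguishable; a cast function could accept aliases the same way by resolving its target argument first.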
Showing 16 changed files with 698 additions and 455 deletions.