[Python] Inferring / converting nested Numpy array is very slow #18790

@asfimport

Description

Converting a nested NumPy array walks over the NumPy data as Python objects, even if the dtype is not "object". This makes it pointlessly slow compared to the non-nested case, and even compared to the Python list case:

>>> %%timeit data = list(range(10000))
...:pa.array(data)
...:
746 µs ± 8.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %%timeit data = np.arange(10000)
...:pa.array(data)
...:
81.1 µs ± 57.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %%timeit data = [np.arange(10000)]
...:pa.array(data)
...:
3.39 ms ± 6.27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Reporter: Antoine Pitrou / @pitrou
Assignee: Antoine Pitrou / @pitrou

Note: This issue was originally created as ARROW-2514. Please see the migration documentation for further details.
