ARROW-6570: [Python] Use Arrow's allocators for creating NumPy array instead of leaving it to NumPy #5398

Closed
wesm wants to merge 5 commits into apache:master from wesm:ARROW-6570

Member

wesm commented Sep 17, 2019

This has some benefits:

  • Moves pandas-related memory allocations to the same default allocator as the rest of the Arrow platform, rather than mixing jemalloc and the system allocator as is currently the case
  • NumPy/pandas-related memory allocations are now accounted for in pyarrow.total_allocated_bytes()
  • Better performance (10+% faster, from quick benchmarks) when using libraries built with ARROW_JEMALLOC=ON

There are a couple of other uses of the system allocator in arrow_to_pandas.cc, but they are for smaller internal pieces of pandas-related data ("placement" arrays). These can be fixed later if they prove bothersome.

Member Author

wesm commented Sep 18, 2019

+1. If there are any follow-ups on this, I'll be happy to address them.
