@wesm wesm commented Feb 4, 2019

This blog post shows how we were able to significantly improve performance and memory use in common cases when converting from the Arrow string memory layout to pandas's native memory model based on NumPy arrays of Python objects.

Change-Id: I6e87debdd41565bf8921ea05b997568aa4eb0fa3

wesm commented Feb 4, 2019

Here's a published version https://wesm.github.io/arrow-site-test/blog/2019/02/04/python-string-memory-0.12/

Test-publishing the website really isn't that easy, and a number of things are broken...

Change-Id: I6e8cd52878c06474ce45f4feb9713bb3f74cda53
@fsaintjacques

LGTM

@xhochy xhochy left a comment

Looks good, some small improvements but then this can go out tomorrow.

We can use the `memory_profiler` Python package to easily get process memory
usage within a running Python application.
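
A minimal sketch of what such a measurement might look like. `memory_profiler.memory_usage()` is the package's call for sampling the current process; the helper names `process_memory_mib` and `memory_delta` are hypothetical and introduced here only for illustration.

```python
def process_memory_mib():
    """Return the current process memory use in MiB via memory_profiler."""
    # Imported lazily so memory_delta() below can also be used with another probe.
    import memory_profiler  # third-party: pip install memory-profiler

    # memory_usage() samples the current process and returns a list of readings.
    return memory_profiler.memory_usage()[0]


def memory_delta(func, mem=process_memory_mib):
    """Call func() and return (result, change in memory reported by mem)."""
    before = mem()
    result = func()  # keep the result alive so its memory is still counted
    after = mem()
    return result, after - before


# Example use (hypothetical workload):
# strings, delta = memory_delta(lambda: [str(i) for i in range(1_000_000)])
# print(f"{delta:.1f} MiB")
```

Passing the probe in as `mem` is a choice made for this sketch: it keeps the before/after bookkeeping independent of how memory is actually sampled.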

```

Suggested change
```
```python


## Memory and Performance Benchmarks

We can use the `memory_profiler` Python package to easily get process memory

Suggested change
We can use the `memory_profiler` Python package to easily get process memory
We can use the [`memory_profiler`][2] Python package to easily get process memory

provide fast and memory-efficient interoperability with pandas and other
popular libraries.

[1]: https://www.slideshare.net/xhochy/extending-pandas-using-apache-arrow-and-numba
\ No newline at end of file

Suggested change
[1]: https://www.slideshare.net/xhochy/extending-pandas-using-apache-arrow-and-numba
[1]: https://www.slideshare.net/xhochy/extending-pandas-using-apache-arrow-and-numba
[2]: https://pypi.org/project/memory-profiler/

Change-Id: I872f47f6ba079c6f06e1f520a0cb7747411c897a
@wesm wesm closed this in 9af5a70 Feb 5, 2019
@wesm wesm deleted the python-string-memory-0.12 branch February 5, 2019 15:07

wesm commented Feb 5, 2019

Oops, sorry, I missed these edits @xhochy

xhochy pushed a commit that referenced this pull request Feb 8, 2019
…in Arrow 0.12

This blog post shows how we were able to significantly improve performance and memory use in common cases when converting from the Arrow string memory layout to pandas's native memory model based on NumPy arrays of Python objects.

Author: Wes McKinney <wesm+git@apache.org>

Closes #3553 from wesm/python-string-memory-0.12 and squashes the following commits:

f0d684d <Wes McKinney> Update publication date
2bbb92d <Wes McKinney> Fix some base urls
c624e55 <Wes McKinney> Draft blog post about string memory use work in Arrow 0.12