Optimize server conversions for client-server array transfers #880

Merged
merged 1 commit into from Jul 14, 2021


@ronawho ronawho commented Jul 12, 2021

Improve the performance of `ak.array()` and `pdarray.to_ndarray()` by
optimizing how the server converts between bytes and pdarrays.
Previously, the server wrote to and read from a big-endian memory-mapped
file to convert between bytes and arrays, which is fairly slow.
Optimize this conversion by directly interpreting the underlying
memory as the type we're converting to. In the `ak.array()` case, we
create the local array with `makeArrayFromPtr`, which creates an
array from the existing bytes without any copies. In the `to_ndarray()`
case, `makeArrayFromPtr` is also used on some local memory, which is then
reinterpreted as bytes with `createBytesWithOwnedBuffer`.
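
As a rough illustration of the idea only (this is a client-language analogy in plain Python, not the Chapel server code from this PR), the difference between decoding big-endian values and reinterpreting a buffer in place can be sketched as follows. The function names `decode_big_endian` and `reinterpret_native` are hypothetical, and the sketch assumes 64-bit integer elements.

```python
import struct


def decode_big_endian(buf: bytes) -> list:
    """Decode big-endian int64 values into a new list (copies every element)."""
    count = len(buf) // 8
    return list(struct.unpack(f">{count}q", buf))


def reinterpret_native(buf: bytearray) -> memoryview:
    """View the same memory as native-endian int64 values without copying."""
    return memoryview(buf).cast("q")


values = [1, -2, 3_000_000_000]

# Old style (analogy): bytes are big-endian and must be converted on the way in.
big_endian_bytes = struct.pack(f">{len(values)}q", *values)
decoded = decode_big_endian(big_endian_bytes)

# New style (analogy): bytes are already in native layout, so the buffer is
# simply reinterpreted as the element type.
native_buffer = bytearray(struct.pack(f"={len(values)}q", *values))
view = reinterpret_native(native_buffer)

# Writing through the view changes the original buffer, which shows that the
# view shares memory with it rather than holding a copy.
view[0] = 42
first_element = struct.unpack("=q", native_buffer[:8])[0]
```

The reinterpreting path does no per-element work, which is the general reason this kind of change can raise transfer throughput. It only applies when both sides agree on the in-memory layout.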

Here's the performance improvement for 16-node-xc:

| config | to_ndarray | ak.array   |
| ------ | ---------: | ---------: |
| before |   33 MiB/s |   50 MiB/s |
| after  |  410 MiB/s |  175 MiB/s |

`ak.array()` is slower than `to_ndarray()` because it involves more
copies on both the server and client side. Optimizing those
copies out is future work.
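
For reference, the speedup factors implied by the reported 16-node-xc throughput figures can be worked out with a few lines of Python. The variable names are illustrative; the numbers are the ones reported above.

```python
# Reported throughput in MiB/s for the 16-node-xc configuration.
throughput_mib_s = {
    "to_ndarray": {"before": 33, "after": 410},
    "ak.array": {"before": 50, "after": 175},
}

# Speedup is the ratio of the new throughput to the old throughput.
speedups = {
    op: rates["after"] / rates["before"]
    for op, rates in throughput_mib_s.items()
}

for op, factor in speedups.items():
    print(f"{op}: {factor:.1f}x faster")
```

This works out to roughly 12.4x for `to_ndarray()` and 3.5x for `ak.array()`.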

Part of #794

@glitch glitch linked an issue Jul 14, 2021 that may be closed by this pull request

@glitch glitch left a comment


(Caveat: for as much as I understand the Chapel stuff) this looks good.


@reuster986 reuster986 left a comment


looks great!

Successfully merging this pull request may close these issues.

Investigate performance of client-server array transfers