Build Comm for ucx-py #2344

Closed
mrocklin opened this issue Nov 8, 2018 · 5 comments

Comments

@mrocklin
Member

commented Nov 8, 2018

The ucx-py library provides Python bindings for the OpenUCX library, which wraps lower-level, high-performance networking transports such as InfiniBand.

There appears to be an asyncio-compatible implementation in ucx-py in development branches. Here is an example. It would be interesting to implement the Dask Comm API on top of this interface, both as a possible future high-performance networking solution for Dask and to help drive development upstream in ucx-py.

On the Dask side this is motivated by a few things:

  1. We can possibly reduce our latency drastically. Early ucx-py benchmarks show 20 µs inter-node latencies on nice networks. We currently operate at a few milliseconds (though some of this is serialization, etc.).
  2. We can possibly offload socket handling from Python. Dask stress benchmarks show that we can spend a lot of time in Python's socket.send. Ideally we can offload this to ucx-py running in another native thread.
  3. UCX can also handle GPU-GPU direct communication, though I would recommend that we handle that only after we get something basic working.

For someone looking to solve this problem I recommend looking at the following resources:

  1. The Communications documentation
  2. Current implementations, notably TCP/TLS and inproc (uses Tornado queues)
  3. An abandoned PR implementing comms for asyncio
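
For orientation, the shape of a comm can be sketched without ucx-py at all. What follows is a minimal sketch, assuming an asyncio-style interface with `read`, `write`, `close`, and `closed`. The names `QueueComm`, `make_pair`, and `CommClosedError` are hypothetical here; this is neither Dask's actual Comm base class nor any ucx-py API. It pairs two in-process endpoints over `asyncio.Queue`, loosely in the spirit of the queue-based inproc comm mentioned above.

```python
import asyncio


class CommClosedError(Exception):
    """Raised on read/write against a closed comm (hypothetical name)."""


_CLOSE = object()  # sentinel that tells the peer this side has closed


class QueueComm:
    """Hypothetical in-process comm: one endpoint of a two-way message channel."""

    def __init__(self, inbox, outbox):
        self._inbox = inbox    # messages written by the peer arrive here
        self._outbox = outbox  # messages we write are placed here
        self._closed = False

    async def write(self, msg):
        if self._closed:
            raise CommClosedError("write on closed comm")
        await self._outbox.put(msg)

    async def read(self):
        if self._closed:
            raise CommClosedError("read on closed comm")
        msg = await self._inbox.get()
        if msg is _CLOSE:
            # The peer closed its side, so mark this side closed as well.
            self._closed = True
            raise CommClosedError("peer closed the comm")
        return msg

    async def close(self):
        if not self._closed:
            self._closed = True
            await self._outbox.put(_CLOSE)

    def closed(self):
        return self._closed


def make_pair():
    """Return two connected endpoints; what one writes, the other reads."""
    a_to_b, b_to_a = asyncio.Queue(), asyncio.Queue()
    return QueueComm(b_to_a, a_to_b), QueueComm(a_to_b, b_to_a)


async def demo():
    client, server = make_pair()
    await client.write({"op": "ping"})
    request = await server.read()
    await server.write({"op": "pong", "reply_to": request["op"]})
    reply = await client.read()
    await client.close()
    return request, reply, client.closed()
```

Running `asyncio.run(demo())` exercises one request/reply round trip. A UCX-backed comm would presumably keep the same surface and replace the queues with ucx-py send and receive calls, whatever those turn out to look like once the library stabilizes.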

From an API perspective we would want to be able to do the following:

dask-scheduler --host ucx://localhost
Scheduler started at ucx://localhost

>>> client = Client('ucx://localhost')
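
To make the `ucx://` idea concrete: the scheme in front of the address is what would select the comm backend. Below is a minimal sketch of that dispatch, assuming a plain dictionary registry and a `tcp` default for addresses without a scheme. `BACKENDS`, `DEFAULT_SCHEME`, `parse_address`, and `get_backend` are illustrative names defined here, not Dask's actual registry code.

```python
# Hypothetical registry mapping URI schemes to backend descriptions.
# In a real system the values would be backend objects, not strings.
BACKENDS = {
    "tcp": "TCP backend",
    "tls": "TLS backend",
    "inproc": "in-process backend",
    "ucx": "UCX backend",  # the backend this issue proposes
}

DEFAULT_SCHEME = "tcp"  # assumption: addresses without a scheme fall back to TCP


def parse_address(addr):
    """Split 'scheme://location' into (scheme, location)."""
    scheme, sep, location = addr.rpartition("://")
    if not sep:
        scheme = DEFAULT_SCHEME
    return scheme, location


def get_backend(addr):
    """Look up the backend registered for the scheme of the given address."""
    scheme, _ = parse_address(addr)
    try:
        return BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"unknown address scheme {scheme!r} in {addr!r}") from None
```

With this sketch, `parse_address("ucx://localhost")` yields `("ucx", "localhost")`, so supporting the commands above would amount to registering one more entry under the `ucx` scheme.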

ucx-py is experimental and undergoing pretty heavy churn. My guess is that the work proposed in this issue would live in a development branch for a while until ucx-py stabilizes a bit. Starting this early, though, will probably be important in order to get ucx-py to a stable point.

cc @TomAugspurger @Akshay-Venkatesh

@jhamman

Member

commented Nov 8, 2018

I get a 404 on https://github.com/Akshay-Venkatesh/ucx-py. Is this a private repo?

@mrocklin

Member Author

commented Nov 8, 2018

Hrm, could be. My apologies to @Akshay-Venkatesh if so. Akshay, please let me know if I should delete this issue from history.

@Akshay-Venkatesh

commented Nov 8, 2018

@mrocklin It's private at the moment. I think you can keep the issue. Let me get back to you on whether we plan to keep the repo private.

@quasiben

Member

commented Jul 8, 2019

@mrocklin I think we can close this issue

@mrocklin

Member Author

commented Jul 8, 2019

Thanks @quasiben

@mrocklin mrocklin closed this Jul 8, 2019
