Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #209: improve socket transport performance #223

Closed
wants to merge 7 commits into from

Conversation

kwen2501
Copy link
Contributor

Split transfers over multiple sockets
Launch multiple threads to drive sockets
Detect AWS NICs and set nsockets/nthreads accordingly

Split transfers over multiple sockets
Launch multiple threads to drive sockets
Detect AWS NICs and set nsockets/nthreads accordingly
@myleott
Copy link

myleott commented Jun 7, 2019

Hey, just wanted to know what is the status of this PR? If it fixes the performance issues reported in #209, when can we expect it to be merged and a new NCCL release?

@kwen2501
Copy link
Contributor Author

kwen2501 commented Jun 8, 2019

Hi @myleott, I have just pushed a small commit to improve the CPU utilization of this PR. We will merge the PR into master branch in about two weeks depending on feedback. This will be part of the next NCCL release, but we do not have a concrete date set for the release yet. Please let us know if you are blocked on a release date. Thanks!

@xw285cornell
Copy link

HI @kwen2501, do you know when this fix will be part of the NCCL release? It seems to be pretty critical for performance. Is there any known downsides for this patch?

@kwen2501
Copy link
Contributor Author

kwen2501 commented Jun 20, 2019

Hi @xw285cornell, as previously mentioned, we would merge this PR into master depending on the feedback. We have some feedback for small changes, and will test it in the next few days. Once that's done, we will merge it into master. Thank you!

@myleott
Copy link

myleott commented Jun 21, 2019

FWIW, I observe 30% speedups on certain configurations just patching this PR.

@kwen2501
Copy link
Contributor Author

Good to hear that! Thanks for sharing!

@kwen2501
Copy link
Contributor Author

@myleott @xw285cornell This PR has been merged into master.

@christopherhesse
Copy link
Contributor

Why is this PR still open?

@sjeaugey
Copy link
Member

The manual squash we did did not reflect the PR as closed... we just need to close it manually :-)
Closing.

@sjeaugey sjeaugey closed this Jul 11, 2019
@sjeaugey sjeaugey deleted the dev/kwen/multi-socket branch July 11, 2019 23:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants