Theoretical time of collectives when reduction in network enabled #1242

Open
yupatrick22 opened this issue Apr 1, 2024 · 0 comments

Consider a communicator of size p in which each participant has unidirectional bandwidth B and the message size is S.

With reduction in network disabled, and ignoring both latency and computation, the theoretical time for gather, scatter, all_gather, and reduce_scatter is (p-1)/p * S/B. For all_reduce it should be 2 * (p-1)/p * S/B.
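To make the baseline concrete, here is a minimal sketch that evaluates the formulas above. The function name `baseline_time`, the units (S in bytes, B in bytes per second), and the example numbers are my own assumptions for illustration, not anything taken from a library.

```python
def baseline_time(collective: str, p: int, S: float, B: float) -> float:
    """Bandwidth-only time in seconds, ignoring latency and computation.

    Assumes S is the message size in bytes and B is the unidirectional
    bandwidth per participant in bytes per second (hypothetical units).
    """
    if p < 1 or B <= 0:
        raise ValueError("p must be >= 1 and B must be > 0")
    # Common term shared by all the collectives listed above.
    step = (p - 1) / p * (S / B)
    if collective in {"gather", "scatter", "all_gather", "reduce_scatter"}:
        return step
    if collective == "all_reduce":
        # Modeled as twice the single-phase term, as in the formula above.
        return 2 * step
    raise ValueError(f"unknown collective: {collective}")


if __name__ == "__main__":
    # Illustrative numbers only: 8 ranks, 1e9 bytes, 100e9 bytes/s.
    p, S, B = 8, 1e9, 100e9
    for name in ("gather", "scatter", "all_gather", "reduce_scatter", "all_reduce"):
        print(f"{name:15s} {baseline_time(name, p, S, B) * 1e3:.3f} ms")
```

With p = 1 every term is zero, which matches the intuition that a single participant has nothing to transfer.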

However, if reduction in network is enabled, what is the theoretical time for the above collectives?

I think that for all_reduce it should be S/B: each participant only needs to send S and receive S, and the send and the receive can happen simultaneously (i.e. pipelining). This would mean that reduction in network effectively increases the bandwidth B by a factor of 2(p-1)/p, which approaches 2 for large p.
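The all_reduce estimate above can be checked numerically with a short sketch. The S/B figure for the in-network case is only my guess from this post, not a confirmed result, and the names `all_reduce_time` and `effective_speedup` are hypothetical.

```python
def all_reduce_time(p: int, S: float, B: float, in_network: bool) -> float:
    """Bandwidth-only all_reduce time in seconds.

    in_network=True uses the S/B estimate from this post (an assumption);
    in_network=False uses the 2 * (p-1)/p * S/B baseline.
    """
    if p < 1 or B <= 0:
        raise ValueError("p must be >= 1 and B must be > 0")
    if in_network:
        return S / B
    return 2 * (p - 1) / p * (S / B)


def effective_speedup(p: int) -> float:
    """Ratio of the baseline time to the in-network estimate: 2 * (p-1)/p."""
    return 2 * (p - 1) / p


if __name__ == "__main__":
    # The ratio grows toward 2 as the communicator gets larger.
    for p in (2, 4, 8, 64, 1024):
        print(f"p={p:5d}  speedup={effective_speedup(p):.4f}")
```

Under these assumptions the gain is close to 2X only for large p. For p = 2 the two estimates coincide, so the model predicts no bandwidth benefit there.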

But how should the theoretical time be calculated for the other collectives, and how can the benefit of reduction in network be evaluated for them?
