Selection of Network Resources and creating worker/endpoint pair #9586

Open
98luks opened this issue Jan 9, 2024 · 4 comments

98luks commented Jan 9, 2024

In my config I provide two selectable network resources, which can be seen in the context dump below. Afterwards I create two worker/endpoint pairs, but both select the same network resource. Is there a way to force the second pair to use the other network resource?

#
# UCP context
#
#     component 0  :  self
#     component 1  :  tcp
#     component 2  :  sysv
#     component 3  :  posix
#     component 4  :  cuda_cpy
#     component 5  :  cuda_ipc
#     component 6  :  gdr_copy
#     component 7  :  ib
#     component 8  :  rdmacm
#     component 9  :  cma
#     component 10 :  knem
#
#            md 0  :  component 0  self 
#            md 1  :  component 2  sysv 
#            md 2  :  component 3  posix 
#            md 3  :  component 4  cuda_cpy 
#            md 4  :  component 5  cuda_ipc 
#            md 5  :  component 6  gdr_copy 
#            md 6  :  component 7  mlx5_0 
#            md 7  :  component 7  mlx5_2 
#            md 8  :  component 9  cma 
#            md 9  :  component 10 knem 
#
#      resource 0  :  md 0  dev 0  flags -- self/memory
#      resource 1  :  md 1  dev 0  flags -- sysv/memory
#      resource 2  :  md 2  dev 0  flags -- posix/memory
#      resource 3  :  md 3  dev 1  flags -- cuda_copy/cuda
#      resource 4  :  md 4  dev 1  flags -- cuda_ipc/cuda
#      resource 5  :  md 5  dev 1  flags -- gdr_copy/cuda
#      resource 6  :  md 6  dev 2  flags -- dc_mlx5/mlx5_0:1
#      resource 7  :  md 6  dev 2  flags -- rc_verbs/mlx5_0:1
#      resource 8  :  md 6  dev 2  flags -- rc_mlx5/mlx5_0:1
#      resource 9  :  md 6  dev 2  flags -- ud_verbs/mlx5_0:1
#      resource 10 :  md 6  dev 2  flags -- ud_mlx5/mlx5_0:1
#      resource 11 :  md 7  dev 3  flags -- dc_mlx5/mlx5_2:1
#      resource 12 :  md 7  dev 3  flags -- rc_verbs/mlx5_2:1
#      resource 13 :  md 7  dev 3  flags -- rc_mlx5/mlx5_2:1
#      resource 14 :  md 7  dev 3  flags -- ud_verbs/mlx5_2:1
#      resource 15 :  md 7  dev 3  flags -- ud_mlx5/mlx5_2:1
#      resource 16 :  md 8  dev 0  flags -- cma/memory
#      resource 17 :  md 9  dev 0  flags -- knem/memory
#

Setup and versions

  • UCX 1.15.0

98luks added the Bug label Jan 9, 2024

yosefe (Contributor) commented Jan 13, 2024

Currently, there is no API to select different resources per worker; resources can only be selected per context.

shamisp (Contributor) commented Jan 13, 2024

You may try creating two different contexts. Whether this should be supported per worker is an interesting discussion.

98luks (Author) commented Jan 13, 2024

Thank you for your response. I now have the functionality I want with two separate contexts, each with one selectable network resource. So far everything works and they do not interfere with each other. However, I just read that creating only one context is recommended. What problems may occur if two contexts are created?

shamisp (Contributor) commented Jan 13, 2024

Having two contexts is not a problem (this is how UCX was designed). With two workers under the same context, however, some of the resource utilization (memory specifically) is more efficient than with two contexts.
