Use hybrid MR mode with the cxi provider #19148
Regarding I confirmed that |
As described by @gbtitus:

The libfabric cxi provider for Cassini-based networks now supports the so-called "hybrid" local memory registration mode. For functions that accept a local MR descriptor, this allows the client to pass the non-NULL MR descriptor (pointer) if the local address is already registered, or NULL to indicate that the local memory is not registered and the provider should use its internal MR cache. (Absent this hybrid mode the provider always uses its internal MR cache, which is a source of contention for multi-threaded clients such as Chapel and in certain cases SHMEM.)

Signed-off-by: John H. Hartman <jhh67@users.noreply.github.com>
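The descriptor choice described above can be sketched in plain C. This is only an illustration, not the Chapel runtime's code: `reg_region_t` and `local_desc_for` are hypothetical names, and the table of client-registered regions is an assumed data structure. The function returns the stored descriptor when the buffer lies entirely inside a region the client registered, and NULL otherwise, which in hybrid mode leaves the lookup to the provider's internal MR cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one memory region the client registered itself. */
typedef struct {
    uintptr_t base;  /* start address of the registered region */
    size_t    len;   /* length of the region in bytes */
    void     *desc;  /* local MR descriptor saved at registration time */
} reg_region_t;

/*
 * Return the descriptor of the region that fully contains
 * [addr, addr + len), or NULL if no registered region covers it.
 * In hybrid MR mode a NULL descriptor tells the provider to fall
 * back to its internal MR cache.
 */
void *local_desc_for(const reg_region_t *tab, size_t n,
                     const void *addr, size_t len)
{
    assert(tab != NULL || n == 0);
    uintptr_t a = (uintptr_t)addr;
    for (size_t i = 0; i < n; i++) {
        /* Written to avoid overflow in base + len style comparisons. */
        if (a >= tab[i].base && len <= tab[i].len &&
            a - tab[i].base <= tab[i].len - len) {
            return tab[i].desc;
        }
    }
    return NULL;
}
```

A caller would pass the result as the local descriptor argument of whichever operation it is about to issue. Buffers that straddle the end of a registered region get NULL here rather than a partially valid descriptor.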
The `CHPL_RT_COMM_OFI_CXI_HYBRID_MR` environment variable controls whether or not the cxi hybrid MR mode is used. It is enabled by default.

Signed-off-by: John H. Hartman <jhh67@users.noreply.github.com>
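As a rough sketch of an enabled-by-default switch like this one, the following C reads the variable with the standard `getenv` call. The helper names (`parse_bool_setting`, `env_flag`, `use_cxi_hybrid_mr`) and the accepted spellings ("0", "false", "no", "1", "true", "yes") are assumptions made for illustration; the runtime's actual parsing may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Interpret a textual setting; unset, empty, or unrecognized values
 * yield the default. The accepted spellings are an assumption. */
bool parse_bool_setting(const char *val, bool dflt)
{
    if (val == NULL || *val == '\0') return dflt;
    if (!strcmp(val, "0") || !strcmp(val, "false") || !strcmp(val, "no"))
        return false;
    if (!strcmp(val, "1") || !strcmp(val, "true") || !strcmp(val, "yes"))
        return true;
    return dflt;
}

/* Look up an environment variable and interpret it as a boolean. */
bool env_flag(const char *name, bool dflt)
{
    assert(name != NULL);
    return parse_bool_setting(getenv(name), dflt);
}

/* Hybrid MR mode is on unless the user explicitly turns it off. */
bool use_cxi_hybrid_mr(void)
{
    return env_flag("CHPL_RT_COMM_OFI_CXI_HYBRID_MR", true);
}
```

Falling back to the default on unrecognized input keeps a typo in the variable from silently disabling the mode.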
Looks good!
Full `-multilocale-only` testing was clean. See https://github.com/Cray/chapel-private/issues/3153 for instructions on running testing going forward.
Performance also looks good. This significantly improves thread scaling for fetching atomics and GETs. Here are the results for the comm-ops microbenchmark:
Fetching AMO:
| cores | main | hybrid |
|---|---|---|
| 1 | 1.55s (0.32 Mops/s) | 1.47s (0.34 Mops/s) |
| 4 | 2.32s (0.86 Mops/s) | 1.50s (1.33 Mops/s) |
| 16 | 8.56s (0.93 Mops/s) | 1.53s (5.23 Mops/s) |
| 64 | 45.73s (0.70 Mops/s) | 1.60s (19.98 Mops/s) |
GET:
| cores | main | hybrid |
|---|---|---|
| 1 | 1.47s (0.34 Mops/s) | 1.42s (0.35 Mops/s) |
| 4 | 2.28s (0.88 Mops/s) | 1.47s (1.36 Mops/s) |
| 16 | 8.43s (0.95 Mops/s) | 1.50s (5.35 Mops/s) |
| 64 | 44.79s (0.71 Mops/s) | 1.56s (20.54 Mops/s) |
And here are the results for 16-node fine-grained indexgather:
| cores | main | hybrid |
|---|---|---|
| 1 | 0.008 GB/s/node | 0.008 GB/s/node |
| 4 | 0.014 GB/s/node | 0.018 GB/s/node |
| 16 | 0.016 GB/s/node | 0.071 GB/s/node |
| 64 | 0.011 GB/s/node | 0.236 GB/s/node |
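To put the scaling numbers in one place, here is a small C sketch that turns a pair of throughput figures into a speedup ratio. The helper names `speedup` and `print_speedups` are made up for this example; the rates a caller would feed in are the ones reported in the tables above.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of hybrid throughput to main throughput; both must be positive. */
double speedup(double main_rate, double hybrid_rate)
{
    assert(main_rate > 0.0 && hybrid_rate > 0.0);
    return hybrid_rate / main_rate;
}

/* Print one line per core count for a single benchmark. */
void print_speedups(const char *name, const int *cores,
                    const double *main_rate, const double *hybrid_rate,
                    int n)
{
    for (int i = 0; i < n; i++) {
        printf("%s, %d cores: %.1fx\n", name, cores[i],
               speedup(main_rate[i], hybrid_rate[i]));
    }
}
```

For example, `speedup(0.70, 19.98)` for the 64-core fetching AMO row and `speedup(0.011, 0.236)` for the 64-core indexgather row come out at roughly 28x and 21x respectively, while the single-core rows stay close to 1x.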
Resolves Cray/chapel-private#2957.
Signed-off-by: John H. Hartman <jhh67@users.noreply.github.com>