RFC: set gasnetex by default when building with USE_GASNET/Legion_USE_GASNet #1508

Open
elliottslaughter opened this issue Jul 12, 2023 · 8 comments

Comments

@elliottslaughter
Contributor

Realm's gasnetex backend has been stable for some time now. I am wondering if it makes sense to make it the default, i.e., when USE_GASNET=1 (Makefile build) or Legion_USE_GASNet=ON (CMake build) is set, we would set REALM_NETWORKS=gasnetex / Legion_NETWORKS=gasnetex instead of gasnet1.

If you were already setting REALM_NETWORKS/Legion_NETWORKS this would have no effect.

Thoughts?

Here's a sample MR based on this approach: https://gitlab.com/StanfordLegion/legion/-/merge_requests/840

@mariodirenzo

I'm not sure if the issue has been fixed, but AFAIK the gasnetex layer requires setting the runtime flag -gex:obcount to a large value, which depends on the number of nodes involved in a calculation. The last time I saw an execution fail because -gex:obcount was missing, there was no error message, and it was up to the user to figure out that the flag was required. I believe that switching to the gasnetex backend by default without fixing this issue could break many applications that need to run at large scale.

@syamajala
Contributor

I have run S3D on 2048 nodes (16k ranks) on Frontier using gasnetex without needing -gex:obcount. This could be application-dependent, though.

@eddy16112
Contributor

Yes, I have seen hangs without -gex:obcount as well.

@manopapad
Contributor

I'm generally in favor. gasnetex has shown better performance than gasnet1 for legate, but we have run into some papercuts when using gasnetex that don't happen with gasnet1. These are a bit old, so they may have been fixed since.

  • Slow memory registration at startup: gasnetex would try to register all of -ll:csize with the NIC, and that could lead to multi-minute delays at startup.

  • -gex:obcount issue: With gasnet1 there was only a small number of endpoints per rank, each requiring the allocation of an output buffer. With gasnetex there are now also 2 endpoints per GPU (one for fsize and one for ib_fsize, because we can have direct communication between GPUs), so the default limit may not be large enough if we need to instantiate all endpoints, which happens in cases of all-to-all communication. Realm can't calculate a good setting at GASNet initialization time, because it doesn't know at that point how many GPU processors are going to be created. Setting -gex:obcount to (4 + 2 * gpus/node) * nodes should be sufficient for the worst-case scenario (see the worked example after this list), but this pessimistic setting will almost always be overkill if you don't have true all-to-all communication. Previously this would result in a hang.

  • Too many simultaneous local client threads: With -gex:immediate 1 (the default), meta-tasks inject AM requests directly into GASNet. GASNet records every thread ID that has ever called into it in a statically-sized data structure, and that limit can be reached at larger scales, producing an error message like GASNet Extended API: Too many simultaneous local client threads (limit=256). To raise this limit, configure GASNet using --with-max-pthreads-per-node=N. We work around this by setting -gex:immediate 0.
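
For concreteness, here is a small worked example of the pessimistic -gex:obcount formula from the second bullet above; the node and GPU counts are made-up values, not from this thread:

```cpp
#include <cstdio>

int main() {
  // Hypothetical job size, purely for illustration.
  const int nodes = 64;
  const int gpus_per_node = 8;

  // Worst-case formula quoted above: (4 + 2 * gpus/node) * nodes.
  // 4 covers the non-GPU endpoints per rank; each GPU adds two
  // endpoints (one for fsize and one for ib_fsize).
  const int obcount = (4 + 2 * gpus_per_node) * nodes;

  // For 64 nodes x 8 GPUs this prints 1280, i.e. you would pass
  // "-gex:obcount 1280" on the command line.
  std::printf("-gex:obcount %d\n", obcount);
  return 0;
}
```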

@streichler
Contributor

> Slow memory registration at startup

Fixed.

> -gex:obcount issue

Not fixed. Fixing should be a gate to switching gasnetex to the default.

> Too many simultaneous local client threads

Requires both -gex:immediate 1 (the default) and also -ll:force_kthreads (not the default, but a very common opt-in) to occur. Should eventually be fixed, but probably doesn't need to be a blocker.

@elliottslaughter
Contributor Author

> -gex:obcount issue
>
> Not fixed. Fixing should be a gate to switching gasnetex to the default.

Could someone explain what this issue actually is? What causes it, and what options do we have for fixing it? Is there a timeline on a fix?

> Too many simultaneous local client threads
>
> Requires both -gex:immediate 1 (the default) and also -ll:force_kthreads (not the default, but a very common opt-in) to occur. Should eventually be fixed, but probably doesn't need to be a blocker.

Same.

@lightsighter
Contributor

> Could someone explain what this issue actually is? What causes it, and what options do we have for fixing it? Is there a timeline on a fix?

I talked with @streichler. My understanding of this issue is that Realm makes a certain number of output buffers, in the registered memory segment for GASNet, for receiving messages from senders. Right now Realm assigns these buffers to senders as it receives messages, but once a buffer is assigned to a sender it is always associated with that sender from then on. If you run out of buffers because they've all been assigned to other senders, then nothing can receive the incoming message and we hang. As @streichler suggested, there are two solutions:

  1. Allocate enough of these buffers on every node, using the formula that Manolis gave above, to ensure that every sender can have its own buffer and you'll never run out. This is overly pessimistic and results in memory consumption on each node that grows with the scale of the machine, so while it is a sound workaround, it's not a reasonable one if we're going to care about scalability.
  2. Have a way to temporally multiplex (share) buffers between endpoints so that multiple endpoints can receive messages into the same buffer in a non-interfering way (a minimal sketch of the idea follows this list).
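
Purely to illustrate option 2 (this is not Realm's actual code; all names here are hypothetical), a minimal sketch of an output-buffer pool that endpoints borrow per message and return afterwards, instead of each sender owning a buffer forever:

```cpp
#include <cstddef>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical pool: a fixed set of output buffers shared by all
// endpoints, rather than one buffer permanently bound to each sender.
class ObufPool {
public:
  explicit ObufPool(size_t count) {
    for (size_t i = 0; i < count; ++i) free_.push_back(i);
  }

  // An endpoint borrows a buffer only for the lifetime of one message.
  std::optional<size_t> acquire() {
    std::lock_guard<std::mutex> g(lock_);
    if (free_.empty()) return std::nullopt;  // caller must queue or retry
    size_t idx = free_.back();
    free_.pop_back();
    return idx;
  }

  // Once the message is flushed, the buffer returns to the pool and can
  // be reused by a different endpoint.
  void release(size_t idx) {
    std::lock_guard<std::mutex> g(lock_);
    free_.push_back(idx);
  }

private:
  std::mutex lock_;
  std::vector<size_t> free_;
};
```

As Sean notes below, the hard part is doing this inside the existing aggressively multi-threaded send path without breaking it or slowing it down; the sketch ignores that entirely.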

Sean says:

> I think (2) is the better answer, but it requires messing with a bunch of finicky and aggressive multi-threaded code and neither breaking it nor slowing it down.

I think @eddy16112 is going to take a look at this now that he has a reproducer, although it's likely going to take a while to get all the details right and get it tested well.

> Same.

GASNet currently has a build parameter that puts a static upper bound on the number of threads that can send active messages, presumably because it keeps some per-thread data structures for performance. If you run with -gex:immediate 1, Realm sends messages immediately instead of putting them in a queue for the active message sender threads to push out, which means that pretty much any Realm thread can send an active message for things like event triggers or DMAs or whatever. If you also run with -ll:force_kthreads, then Realm creates and destroys many more threads for running tasks on processors. From GASNet's perspective each of those threads is a new sender thread (GASNet doesn't track when threads are destroyed; once it has seen a thread it assumes it's alive forever). So if you run with both -gex:immediate 1 and -ll:force_kthreads, it's not surprising that you quickly exceed the static upper bound GASNet sets on the number of active-message-sending threads. I don't think we need to fix this one, other than maybe automatically switching -gex:immediate to 0 if -ll:force_kthreads is set.
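
A minimal sketch of what that automatic fallback could look like (hypothetical configuration code, not Realm's actual flag handling; the names are made up):

```cpp
// Hypothetical command-line handling: if the user forces kernel threads,
// fall back to queued (non-immediate) active message injection so that
// only a bounded set of dedicated sender threads ever calls into GASNet.
struct GexConfig {
  bool immediate = true;        // corresponds to -gex:immediate 1 (default)
  bool force_kthreads = false;  // corresponds to -ll:force_kthreads
};

void apply_thread_limit_workaround(GexConfig& cfg) {
  if (cfg.force_kthreads && cfg.immediate) {
    // Avoid exceeding GASNet's static per-node thread limit
    // (--with-max-pthreads-per-node) by routing AMs through the
    // dedicated sender threads instead of the calling thread.
    cfg.immediate = false;
  }
}
```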

@eddy16112
Contributor

I am trying to figure out if we can turn the obcount hang into an error message telling people to increase the obcount. I reproduced the hang with the memspeed test. Apparently, the program falls into the overflow code path and never sends the packet, which is where the hang comes from. https://gitlab.com/StanfordLegion/legion/-/blob/master/runtime/realm/gasnetex/gasnetex_internal.cc?ref_type=heads#L1841-1854 @streichler is this code path typical for the obcount hang? Is there any other case that will fall into the same code path? If not, can we throw an error message like "Insufficient obcount" here?
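
A rough sketch of the kind of guard being proposed (hypothetical; this is not the code at the linked lines, and the names are invented):

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical guard in the overflow path: if a message cannot be
// assigned an output buffer because every buffer is already bound to
// some other endpoint, fail loudly instead of silently retrying forever.
void handle_obuf_overflow(size_t assigned_obufs, size_t total_obufs) {
  if (assigned_obufs >= total_obufs) {
    std::fprintf(stderr,
                 "FATAL: insufficient output buffers (obcount=%zu); "
                 "rerun with a larger -gex:obcount\n",
                 total_obufs);
    std::abort();
  }
  // Otherwise fall through to the existing overflow/queueing logic.
}
```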
