Realm Instance Redistricting #1078
As discussed offline, we are going to evaluate the scope of work to make sure it's planned ahead so that it won't block the control_replication merge.
Just to be clear, you guys won't be blocking the control replication merge no matter what. This is needed for the refactoring of how Legion manages "temporary" memory for tasks, which will be worked on after control replication merges. So you're not on the critical path. 😉
@lightsighter I am just re-writing your original problem statement here just to see whether I got it.
I obviously made a lot of assumptions. Please drop comments on what you agree/disagree with, and we can gradually hash out the spec for what we are going to design and implement here.
No, at the point of redistricting, instance A ceases to exist (its instance ID should be reclaimed, and its meta-data deleted). Instance B and C then come into existence as completely new and independent instances, just using the same memory that instance A used to occupy.
After an instance is redistricted, the new instances that come into existence should be normal instances, and when they are deleted their memory should be reclaimed and available for new instances to be made from it. That should be completely independent for B and C, though, so B should be able to be deleted and its memory reused independently of the lifetime of C. I think that should greatly simplify the design. Instances should be instances; there shouldn't be any special kinds of them. We're just deleting one instance and then reusing its memory to create one or more new instances, potentially with different layouts from the original instance.
Are we going to keep any of the data of A in B or C? If not, why do we explicitly want to reuse the memory of A for B and C?
No, at least not explicitly. Realm doesn't have to go through and reinitialize the memory when it does the redistricting so whatever data was there when A existed will still happen to be there when B and C are created, but that won't be important to these semantics.
To avoid resource deadlocks. Legion needs to reserve memory in advance of running a task for the various objects that might need to create instances of their own during the execution of the task (e.g. deferred values/buffers, output regions, futures, etc.). Legion doesn't know in advance what those things are or what their layouts look like, so we can't make instances for them before the task starts. So instead we reserve a single instance to be the memory pool for the task, and then as the task runs, we'll create new instances from the memory pool instance using redistricting. If we tried to make instances on the fly we would be exposing ourselves to resource deadlocks. We work around those deadlocks today with the two-pools approach, but we've already determined that is non-viable going forward for a number of reasons. So Legion is going to make a 1-D instance to be the memory pool of each task (size controlled by the task variant/mapper) and then gradually make new instances by breaking off parts of that 1-D pool as the task is running, giving the new instances different layouts from the 1-D one-field layout of the pool instance.
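The pool-carving pattern described above can be sketched with a toy model. This is plain C++ for illustration only, not the real Legion/Realm API; the names `TaskPool` and `carve` are invented. The task pre-reserves one 1-D pool sized by the application/mapper, and each dynamic allocation breaks a piece off, shrinking the remainder:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Toy sketch of the per-task memory pool (hypothetical names, not the
// real Legion/Realm API). The task pre-reserves `capacity` bytes up
// front; each dynamic allocation "redistricts" a piece off the pool.
struct TaskPool {
  size_t remaining;

  explicit TaskPool(size_t capacity) : remaining(capacity) {}

  // Try to break off `bytes` for a new instance (deferred value,
  // output region, future, ...). Returning nullopt here, instead of
  // falling back to a deferred allocation, is what avoids creating a
  // back-edge in the dependence graph.
  std::optional<size_t> carve(size_t bytes) {
    if (bytes > remaining) return std::nullopt;
    remaining -= bytes;
    return bytes;  // stand-in for the newly created instance
  }
};
```

Because the pool was sized before the task started, a failed `carve` surfaces as an out-of-pool-memory condition for this task rather than a deferred allocation that could participate in a cycle.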
OK, the reason for avoiding resource deadlocks makes sense to me. I have two other questions:
BTW, to avoid resource deadlocks, there could be another, simpler solution than the nested region instances. We can propose a new API named
Redistricting instance A should completely destroy it. It should be as if we had called Note: we could optionally make redistricting a deferrable operation just like
I don't think it matters what the layout of A was. For the purposes of redistricting it is just a contiguous block of memory. Even compact-sparse instance layouts guarantee that the pieces of the instance are compact in memory. So Realm is just going to take that block of memory, split it up between the new instances, and then reclaim any leftover memory. If there is not enough space for all the new instances to be satisfied, that can be an error. The layout of A, though, should be meaningless when it comes to doing redistricting; all that matters is whether there is enough space in A to satisfy the instances B, C, etc.
The ability to create grouped instances is not actually the problem from a deadlock perspective. Deadlocks occur when Realm does a deferred allocation (meaning the instance won't be available until some other instance is deleted) for an earlier operation that depends on the deletion of an instance allocated for a later operation in program order. Say we have two tasks A and B, with B depending on the results of A (a true data dependence). Task A maps and starts running. Then task B maps and allocates an instance X that it will use when it runs. Next, as task A is running, it decides it needs to create an on-the-fly eager instance. It asks Realm to do this, but memory is tight. Realm can see that instance X will be deleted after task B runs, so it happily says to task A: here is instance Y for you to use, but you can only use it once this event triggers, and I'll trigger it as soon as instance X is deleted. We've now created a resource deadlock cycle: B has a true data dependence on A, but A cannot continue until B runs and deletes instance X. To avoid this, Legion needs to pre-reserve an instance during the mapping stage of the pipeline for task A to use as a memory pool for any dynamic allocations that A needs to do while it is running, to avoid creating the back-edge in the dependence graph illustrated above. Now you might say: why not make an allocation interface in Realm that only succeeds if the allocation can be immediately satisfied with no deferred allocations? It's true that it would avoid the cycle, but remember that memory was tight in the example above. Realm couldn't satisfy A's request for an instance eagerly, which is why it had to do a deferred allocation. So if you make an interface that only succeeds for eager allocations, then the example above will still fail, just with a perceived out-of-memory error. Better than a deadlock, but still suboptimal.
This program can actually fit in memory; we just need A to pre-reserve its memory so that B is the one that gets the deferred allocation. Legion will pre-reserve the memory pool for task A, but then it needs to be able to split that pool up into new instances with arbitrary layouts.
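The cycle in the scenario above can be made concrete as a small waits-for graph. This is an illustrative model only (node names like "A_finishes" are invented, not Realm API), where an edge u -> v means "u cannot happen until v happens":

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy waits-for graph for the deadlock scenario. Edges:
//   B_runs     -> A_finishes  (true data dependence: B needs A's results)
//   A_finishes -> Y_ready     (A blocks on its on-the-fly instance Y)
//   Y_ready    -> X_deleted   (deferred allocation: Y reuses X's memory)
//   X_deleted  -> B_runs      (X is only deleted after B runs)
using Graph = std::map<std::string, std::vector<std::string>>;

// Depth-first reachability with a visited set to terminate on cycles.
bool reaches(const Graph& g, const std::string& from, const std::string& to,
             std::set<std::string>& seen) {
  if (from == to) return true;
  if (!seen.insert(from).second) return false;
  auto it = g.find(from);
  if (it == g.end()) return false;
  for (const auto& next : it->second)
    if (reaches(g, next, to, seen)) return true;
  return false;
}

// A node is on a deadlock cycle if it can wait, transitively, on itself.
bool has_cycle_through(const Graph& g, const std::string& node) {
  auto it = g.find(node);
  if (it == g.end()) return false;
  for (const auto& next : it->second) {
    std::set<std::string> seen;
    if (reaches(g, next, node, seen)) return true;
  }
  return false;
}
```

Pre-reserving A's memory pool removes the "A_finishes -> Y_ready" edge (A never asks for a deferred allocation mid-run), which breaks the cycle; B's allocation may then be the deferred one, which is safe because B is later in program order.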
Yeah, I understand the resource deadlock case. So I assume that the Legion runtime does not even know that there is an eager instance during the mapping stage of task A, right? I am wondering whether we could let every Realm Memory have an eager and a deferred pool, such that all the
I don't want to ask users to enumerate all the ways that they will need to create an instance during the execution of the task. Asking them to put an upper bound on how much dynamic memory they will need during the execution of the task seems like a reasonable ask though.
That doesn't actually fix the problem. The problem, as @manopapad and @magnatelee will tell you, is that having two pools is absolutely terrible for the user experience. The whole reason we're doing this is to get rid of the two pools, so users don't need to decide how to split their memories up; they really don't know how to do that well and are almost guaranteed to make the wrong split of their memory into the two pools.
It means Legion will allocate a fixed-size eager instance for all tasks. I think it may waste resources, because not all tasks will need an eager memory pool of the same size. If we have two global pools for each type of memory, then each task can allocate instances when necessary. Actually, I do not think it is terrible for the user experience, because applications will only need to use a separate API (
No, Legion will never guess how much dynamic memory to allocate for a task.
That's why Legion will never guess how much dynamic memory a task will require: it will mandate that the application/mapper specify an upper bound. For every single task either the application will need to put a static upper bound on the amount of dynamic memory a task variant requires when it is registered with Legion, or the mapper will have to specify an upper bound when it maps the task before it runs. Either way, how much "waste" there is will be a function of how tight the upper bound provided by the application/mapper is to the actual amount of memory used by the task. Applications/mappers can be as precise/imprecise as they want to be in bounding the amount of dynamic memory they might ultimately need. Legion will never guess how much dynamic memory a task might need; it will always be under the control of the application/mapper.
That's not the only thing they need to do. Before the program starts, every single memory in the machine where eager-only instances need to be made has to be partitioned into two pools. Specifying the percentage of memory that goes into each of the two pools is very hard and is often different for many different memories across the machine. Furthermore, users don't understand what it is for and therefore they almost never size it correctly. This is what we have today and the Legate team has already said categorically that it is intolerable from a user experience perspective and needs to be abolished. You're welcome to try to convince them otherwise. 😉 I personally think having two pools is ugly from an implementation side of things, but I probably wouldn't have prioritized addressing it for quite a while since it works. However the user experience is so bad and so many people have complained about it that the Legate team has mandated that removing two pools should be the next most important thing to do in Legion after merging the control replication branch, which should give you some indication of how horrible it is for them. I'll also say that I'm not opposed to having Realm add a way to only allocate an instance if it can be allocated immediately (e.g. |
The redistricting operation should be deferred, in my opinion, like most of the operations Realm does today - it is consistent with the programming model. You will be able to turn it off by providing a
Let's not go down this road. I don't think we need a two-pool system here and, most importantly, it does not solve the problem that we are trying to solve. Assuming the available memory for the overall application is tight, the eager pool will only "steal" parts of it from a regular pool (used by task
@lightsighter Can you think of any other examples that would make good case studies? I think we should document as many as possible right now to make sure we design it all the right way from the start.
Yes, I'm fine with the interface being deferrable and then in the Legion case I'll just call the version with a NO_EVENT precondition so it happens immediately.
At least, all of the Legion examples are some variation on the one I presented above. Deferred values/buffers, output regions, futures: they all have the same flavor as the case above. I suppose at least one other case that I've considered, although I don't have a plan to implement it right now, is to allow mappers to try to "realloc" an instance from an existing instance, which would also involve redistricting, although in that case the current interface would work as well. Other than that, I can't think of any other use cases off the top of my head that couldn't be handled by the interface described above.
We are getting close to getting the MR through review and merging: |
In several cases, we have a need for taking an existing Realm instance and splitting it apart into new Realm instances with different layouts. The API for this feature is flexible. The only requirements are that it be able to produce new instances from an existing one with specified layout constraints. Whether the prior instance is completely destroyed or is left with remaining memory is open for discussion. We can also discuss whether this should be a deferred operation in Realm or whether it can be done only eagerly.
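One possible shape for such an interface can be sketched as self-contained C++. Every name here (`Instance`, `LayoutConstraints`, `redistrict`, the `Event` precondition) is a placeholder invented for this sketch, not the actual Realm API; deciding the real signature is precisely what this issue is for. The sketch encodes the semantics discussed above: the source instance is completely destroyed, the new instances are ordinary and independent, insufficient space is an error, and a precondition event would allow a deferred variant:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the real Realm interface.
struct Event { bool has_triggered = true; };        // stand-in precondition
struct LayoutConstraints { size_t bytes = 0; };     // stand-in layout spec

struct Instance {
  size_t capacity = 0;
  bool exists = true;

  // Destroy *this and carve new instances out of its memory, one per
  // requested layout. A real deferred variant would wait on
  // `precondition`; this toy version ignores it and runs immediately.
  std::vector<Instance> redistrict(const std::vector<LayoutConstraints>& pieces,
                                   Event precondition = Event{}) {
    (void)precondition;
    size_t total = 0;
    for (const auto& p : pieces) total += p.bytes;
    if (total > capacity) return {};  // not enough space: error, source survives
    exists = false;                   // the source instance ceases to exist
    std::vector<Instance> out;
    for (const auto& p : pieces) out.push_back(Instance{p.bytes, true});
    return out;                       // ordinary, independently deletable instances
  }
};
```

In the Legion use case described above, the caller would pass a no-op precondition (the `NO_EVENT` behavior mentioned earlier in the thread) so the split happens immediately; leftover space past the requested pieces would be reclaimed by the memory.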