Realm/Legion: Add support for structured (~affine) indirect copies #705
@streichler How would you feel about Realm supporting a "mixed mode" indirect copy? For example, one where the source indirection is a transform and the destination indirection is a field. See this Legate issue:
The intent is that each indirection used can independently be chosen as structured or unstructured, so this case should work.
Legate.numpy would also appreciate it if you could supply both an unstructured and a structured transformation on each direction, to be applied one after the other. The motivating use-case would be something like this:
where we need to use
After thinking further about this, we realized cuNumeric actually requires up to 3 transformations in each direction:
To implement this with a single indirect copy, we would need to proceed as follows:
Alternatively, we could attach affine transformations to each field on the copy (source, target, source indirection, target indirection), to mirror how we attach affine transformations to Stores passed to tasks (in that case we do it by defining custom accessors).
Right, so we effectively need the ability to apply an affine transform to any of the region requirements of a copy operation in Legion. Where this gets interesting is that there will now effectively need to be an "abstract" iteration space for any kind of copy, because the source/destination indirection region requirements no longer implicitly represent the iteration space for the copy. We'll need a way to specify that in the new interface, although I suppose that was going to be true anyway just to support affine transforms on the source/indirect parts.
The iteration space of the copy has always been explicit (
We're just going to do what we do with the other indirection fields in that case: have the application provide region requirements that may over-approximate the data needed for that field, just as we do for the indirection field of a current gather/scatter copy. The application can provide bounds as tight as it wants, but over-approximating is fine, especially since the indirection fields are read-only.
Here is a working merge request for an interface for this issue: https://gitlab.com/StanfordLegion/legion/-/merge_requests/467 Also, @manopapad worked through a small example to understand how to map NumPy copies down to the interface:
which would translate to the interface as:
Translated: for each point in the copy index space, apply the source indirect transform to the point; use the result to read a value out of the source indirect array; apply the source transform to that value; then use the transformed value to read from the source array and write into the destination array. @streichler and @apryakhin, please verify that this conforms with your expectations of what Realm will eventually be able to support.
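To make the order of those steps concrete, here is a minimal pure-Python model of a gather copy with a transform on the source indirection and a transform on the source. It is a hypothetical sketch, not Legion or Realm code: arrays are modeled as dicts from points (tuples) to values, transforms as plain functions, and it assumes there is no destination indirection or destination transform, so each result is written at the copy point itself.

```python
from typing import Callable, Dict, Iterable, Tuple

Point = Tuple[int, ...]
Transform = Callable[[Point], Point]


def transformed_gather_copy(
    copy_space: Iterable[Point],
    src_indirect: Dict[Point, Point],   # source indirection field: point -> point in src
    src: Dict[Point, float],            # source array
    dst: Dict[Point, float],            # destination array, written in place
    src_indirect_transform: Transform,  # applied to the copy point before reading src_indirect
    src_transform: Transform,           # applied to the value read out of src_indirect
) -> None:
    """Hypothetical model of the copy semantics; not a Legion/Realm API.

    Assumes no destination indirection/transform: results land at the copy point p.
    """
    for p in copy_space:
        q = src_indirect[src_indirect_transform(p)]  # transform p, read the indirection
        dst[p] = src[src_transform(q)]               # transform q, read src, write dst


# 1-D example: the indirection is read at p + 1, and the source is offset by 10.
dst: Dict[Point, float] = {}
transformed_gather_copy(
    copy_space=[(0,), (1,), (2,)],
    src_indirect={(1,): (4,), (2,): (0,), (3,): (2,)},
    src={(i + 10,): float(i * i) for i in range(5)},
    dst=dst,
    src_indirect_transform=lambda p: (p[0] + 1,),
    src_transform=lambda q: (q[0] + 10,),
)
# dst == {(0,): 16.0, (1,): 0.0, (2,): 4.0}
```

Because the copy space is passed explicitly, nothing in this model ties the iteration space to the extent of either indirection field, which is the point raised earlier about needing an explicit iteration space.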
Correcting my example, after some clarifications from @lightsighter:
becomes the following copy operation:
which works as follows:
@manopapad Can you extend your example above by writing down what the arguments to the index copy launcher in Legion would be for this case from cuNumeric's perspective? Let's assume we do an index copy launch with two point copy operations. I'm betting that Legion's current semantics won't quite do what we want here, but I'd like to see how you want it to work. Feel free to describe any partitions you want on the data; there's no need to write down explicit projection functions, just give names to the subregions of the arrays and say which subregions each point copy operation is going to use.
Here is a possible execution. We have all the required information at the cuNumeric level to partition the indirection fields such that each point copy operation will find all its indirection information locally (even if there are transformations on the indirection fields). Whether or not the partitions of the

Copy index space partitions:

Partitions of
Partitions of
Partitions of

Point copy 0:

The way the source array

Point copy 1:

The way the source array
Right, this is kind of what I was expecting to happen. I suspect we'll need some extra support in Legion for "collective" behavior of the point copy operations when there is a transform on the indirection fields. It will require more preimages, but that's the price we'll pay for supporting the additional functionality.
Thanks for the discussion! I think we need to reach a consensus on what functionality Realm needs to support in the end state. We are getting close to merging this PR, which adds native support for structured indirect copies in Realm. The set of use cases can be found here: When the PR is merged, Realm will be able to apply any kind of affine transform to both the src and dst index spaces, with a single level of indirection. It will take the fast path, since the transform is not applied to each point individually but directly to the lo/hi points of the index space bounds. The example below is effectively a structured copy with an affine transform on array c, "S(ATc)", and an unstructured copy with an affine transform on a set of points through the source indirect field, "U(ATa)", so S(ATc)U(ATa). At this point, the existing API and the data path (with the PR applied) do not support combining structured and unstructured copies.
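As a rough illustration of why transforming the bounds is cheaper than transforming every point, the sketch below (plain Python, not Realm code) computes the bounding box of the affine image A * p + b of a rectangle from its lo/hi corners alone, and compares it with the per-point slow path. The function names are hypothetical, and the per-row interval-arithmetic formulation is an assumption made for this sketch rather than a description of how the PR does it.

```python
from itertools import product
from typing import Sequence, Set, Tuple

Matrix = Sequence[Sequence[int]]
Point = Tuple[int, ...]


def map_point(A: Matrix, b: Sequence[int], p: Sequence[int]) -> Point:
    """Map a single point: A * p + b."""
    return tuple(sum(a * x for a, x in zip(row, p)) + bi for row, bi in zip(A, b))


def transform_points(A: Matrix, b: Sequence[int], lo: Point, hi: Point) -> Set[Point]:
    """Slow path: transform every point of the rectangle [lo, hi] individually."""
    ranges = [range(l, h + 1) for l, h in zip(lo, hi)]
    return {map_point(A, b, p) for p in product(*ranges)}


def transform_bounds(A: Matrix, b: Sequence[int], lo: Point, hi: Point) -> Tuple[Point, Point]:
    """Fast path: bounding box of the image, computed from lo/hi only.

    Each term a * x_j with x_j in [lo_j, hi_j] lies between a * lo_j and a * hi_j,
    so summing the per-term minima (maxima) gives the exact lower (upper) bound.
    """
    new_lo, new_hi = [], []
    for row, bi in zip(A, b):
        new_lo.append(bi + sum(min(a * l, a * h) for a, l, h in zip(row, lo, hi)))
        new_hi.append(bi + sum(max(a * l, a * h) for a, l, h in zip(row, lo, hi)))
    return tuple(new_lo), tuple(new_hi)


# Transpose plus translation of the 3x4 rectangle [(0, 0), (2, 3)].
A, b = [[0, 1], [1, 0]], [10, 20]
fast = transform_bounds(A, b, (0, 0), (2, 3))   # ((10, 20), (13, 22))
slow = transform_points(A, b, (0, 0), (2, 3))   # all 12 points of that box
```

For a transpose plus translation (or a sign flip such as A = [[-1]]) the image is exactly the computed box. For a scaling such as A = [[2]] the bounds are still correct, but the image is strided rather than dense, so a dense-rectangle fast path presumably only covers transforms like permutations, sign flips, and translations.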
An even simpler version which is also not supported:
A structured copy with an affine transform followed by an unstructured copy using the set of transformed points. After giving it some thought, I think it can be done, but I have several questions:
Assuming we need this going forward, we can certainly design a Realm API that lets us add functionality incrementally without having to update the call sites (e.g. Legion -> Legate -> cuNumeric).
Could you walk us through an example with some small arrays, showing what we can do with the new interface?
I have updated nv-legate/cunumeric#41 with the logic followed in the current implementation. As you suggest, we can handle such cases by making intermediate copies, but we could avoid some copies if Realm could handle affine transformations directly (which AFAIU is also useful for other use cases). We certainly don't need Realm to support the full range of cases.
So far we haven't seen any "real-world" instances of mixed indexing (but haven't really seen many workloads that use advanced indexing in general).
The cases listed on nv-legate/cunumeric#41 are the only ones that should arise in cuNumeric, assuming we are not fusing operations.
Yes, certainly, but please note that there is no new API (yet). The PR is based on the existing Realm API for affine indirect copies.
Thanks! Effectively, we don't have any data points yet. Would it nevertheless be possible to estimate which operations are going to be the most critical? By critical I mean things such as good performance, low memory footprint, and frequency of use in applications. I believe the description below would be a gather, and step 2 materializes a new intermediate array?
For example:
I took a look at the tests provided by tests/integration/test_advanced_indexing.py. I think most of the cases come down to the following patterns?
Am I missing anything here? @manopapad @lightsighter I don't quite understand how we are going to ensure that dimensionality matches between the different transforms of a single copy operation. Is that something Legion/cuNumeric should/can handle? In addition, for cases where we deal with distributed data, I assume it's a precondition that a copy is split into independent operations on local data only before it's fed to the Realm API?
Yes, it will do a gather operation at 3) and a copy of the
This is correct, but keep in mind that an indirection can have multiple transforms.
I don't think we can make a meaningful estimate with the information that we have.
You can assume that the dimensionality of arrays and transforms will be checked before the operation makes it to Legion/Realm (e.g. Legate will never emit a copy where
As analyzed in #705 (comment), I think we can guarantee at the Legate level that each point copy operation will find all the relevant indirection information locally. However, AFAIK we are not planning to inspect the indirection data at the Legate level, so Legion and/or Realm will have to handle out-of-partition accesses.
I think we are operating under the assumption that all transformations can be collapsed into a single affine transformation.
Yes, I believe that multiple affine transforms can be folded into a single one.
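Folding works because the composition of two affine maps is affine: applying A1 * p + b1 and then A2 * q + b2 gives (A2 * A1) * p + (A2 * b1 + b2). Below is a small pure-Python sketch (hypothetical helper names, plain lists instead of any Realm type) that folds a chain of transforms, including one that changes dimensionality from 2-D to 1-D.

```python
from functools import reduce
from typing import List, Sequence, Tuple

Matrix = List[List[int]]
Vector = List[int]
Affine = Tuple[Matrix, Vector]  # (A, b) representing p -> A * p + b


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(A: Matrix, v: Sequence[int]) -> Vector:
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def apply_affine(t: Affine, p: Sequence[int]) -> Vector:
    A, b = t
    return [x + y for x, y in zip(matvec(A, p), b)]


def compose(first: Affine, second: Affine) -> Affine:
    """Return the single transform equivalent to applying `first`, then `second`."""
    A1, b1 = first
    A2, b2 = second
    return matmul(A2, A1), [x + y for x, y in zip(matvec(A2, b1), b2)]


def fold(transforms: Sequence[Affine]) -> Affine:
    """Collapse a chain of transforms (applied left to right) into one."""
    return reduce(compose, transforms)


T1: Affine = ([[1, 0], [0, 1]], [1, 0])  # shift the first coordinate by 1
T2: Affine = ([[0, 1], [1, 0]], [0, 0])  # transpose
T3: Affine = ([[0, 1]], [3])             # 2-D -> 1-D: keep the second coordinate, add 3
folded = fold([T1, T2, T3])              # ([[1, 0]], [4]), i.e. p -> p[0] + 4
```

The matrix shapes also make the dimensionality question concrete: a chain only folds if each matrix's column count matches the previous matrix's row count, which is the kind of check Legate would perform before emitting the copy.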
This is the API Realm provides today (C++):
It enables us to handle the following cases (common patterns for structured copy with an affine transform).
Essentially, we can express any transform as A * p + b, where:
In addition, the API allows us to express an unstructured indirect copy (however without an affine transform). For example:
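For reference, the semantics of the three unstructured forms that come up in this thread (gather, scatter, and scatter-gather, none with an affine transform) can be modeled in a few lines of plain Python. This is a hypothetical 1-D model using lists, not the Realm API.

```python
from typing import List, Sequence


def gather(src: Sequence[float], src_ind: Sequence[int]) -> List[float]:
    """dst[i] = src[src_ind[i]]: one result per entry of the index array."""
    return [src[j] for j in src_ind]


def scatter(dst: List[float], dst_ind: Sequence[int], src: Sequence[float]) -> None:
    """dst[dst_ind[i]] = src[i]."""
    for i, j in enumerate(dst_ind):
        dst[j] = src[i]


def scatter_gather(dst: List[float], dst_ind: Sequence[int],
                   src: Sequence[float], src_ind: Sequence[int]) -> None:
    """dst[dst_ind[i]] = src[src_ind[i]]: indirection on both sides."""
    for s, d in zip(src_ind, dst_ind):
        dst[d] = src[s]


src = [10.0, 20.0, 30.0, 40.0]
print(gather(src, [3, 0, 2]))      # [40.0, 10.0, 30.0]
out = [0.0] * 4
scatter_gather(out, [1, 3], src, [0, 2])
print(out)                         # [0.0, 10.0, 0.0, 30.0]
```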
Besides the mixed indirect copies, can you think of any examples that are required for advanced indexing but will not be handled by the current API? @ipdemes @lightsighter @manopapad Thanks!
Note that simply declaring a variable like

I can think of a couple of cases where having direct support for one affine indirection on each direction of a copy (but not mixed unstructured + affine) would help us:
Just wanted to summarize what we have discussed today:
Please find more links below:
@apryakhin:
Thanks Irina, there are effectively:
We should be able to handle all 4 cases when we merge an "affine" branch into master.
@ipdemes: based on what @magnatelee showed in the meeting yesterday, there are cases in the TorchSWE codebase where we have basic indexing mixed with advanced indexing, e.g. something like:
AFAIU the way we handle this internally results in a scatter-gather copy with a transform on the
and then we emit the scatter-gather copy for:
Currently, because Legion copies cannot handle transformations, the array

Is it possible to collect information on how often this sort of pattern occurs? Also, in an expression like this

It may not be possible to extract this level of detail by analyzing the text of the code. Assuming we were able to run TorchSWE on one rank, it might be feasible to add some instrumentation to
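A 1-D illustration of why transform support would help with mixed basic/advanced indexing: a basic slice `x[start::step]` followed by a gather with an index array gives the same result as a gather straight from `x` with the affine map i -> step * i + start applied to the index values, so the sliced intermediate never has to be materialized. This is a plain-Python sketch with hypothetical function names. It assumes start >= 0, step > 0, and non-negative, in-range indices (negative indices wrap differently).

```python
from typing import List, Sequence


def gather_after_slice(x: Sequence[int], start: int, step: int, idx: Sequence[int]) -> List[int]:
    """Current approach: materialize the slice x[start::step], then gather from it."""
    view = list(x[start::step])   # intermediate copy
    return [view[i] for i in idx]


def gather_with_affine_index(x: Sequence[int], start: int, step: int, idx: Sequence[int]) -> List[int]:
    """Fold the slice into an affine map on the index values: i -> step * i + start."""
    return [x[step * i + start] for i in idx]


x = list(range(100, 120))
idx = [0, 3, 2]
print(gather_after_slice(x, 2, 3, idx))        # [102, 111, 108]
print(gather_with_affine_index(x, 2, 3, idx))  # [102, 111, 108]
```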
I went through the TorchSWE code once, and here is an updated list of the advanced indexing operations used there:
@magnatelee for visibility, I am going to be resurrecting this GitHub issue.
The work is done on the Realm side; however, we still need Legion plumbing and testing.
@apryakhin Can you provide a pointer to the branch where the Realm code is ready?
With unstructured scatter/gather just about done, we still need to handle the structured cases for things like stencils, GMG, etc.