PR #2105 implements operand analysis to determine vectorization of gmem loads. Quote:
Are we supporting a general pointwise epilogue fusion, or do we only support biases? For example, if I have a "bias" whose shape is [N, M], transpose it to [M, N], and add it to the matmul result, can this be handled? It is very important that what we accept into the scheduler is compatible with what we assume here. Building out a general vectorization analysis should refer to the pointwise scheduler and use the code in https://github.com/NVIDIA/Fuser/blob/main/csrc/scheduler/vectorize_helper.h; this is what #807 is doing.
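To make the transposed-bias case concrete, here is a minimal NumPy sketch of the epilogue in question (illustrative only, not nvFuser code):

```python
import numpy as np

M, N, K = 4, 6, 8
A = np.random.rand(M, K).astype(np.float32)
B = np.random.rand(K, N).astype(np.float32)
bias = np.random.rand(N, M).astype(np.float32)  # "bias" stored as [N, M]

# The epilogue transposes bias to [M, N] before adding it to the matmul
# result. After the transpose, the bias's innermost (fast) axis in memory
# is M rather than N, so a vectorization analysis that assumes the bias is
# contiguous along N would pick an invalid vector width for this input.
out = A @ B + bias.T
assert out.shape == (M, N)
```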
Regarding this, I believe we have two options:
Make sure that we accept only very limited cases of epilogue fusion (i.e. just bias and activation) into the scheduler, and use this simple analysis.
Use vectorize_helper to build out a complete analysis for pointwise epilogue like the pointwise scheduler.
Whichever option we take, I don't think either is easy or well tested. For option 1, we need to review the scheduler's canScheduleCompileTime code and brainstorm more adversarial examples; for option 2, we need to copy some code from the pointwise scheduler, as in #807 (Matmul, enable epilogue input vectorization). @drzejan2 do you remember the status of #807?
But anyway, epilogue vectorization is a much more difficult task than vectorizing the A and B operands. Can we move it to a separate PR?
Option 1 is tracked in #2167. This issue corresponds to option 2 listed above. We might additionally need to update MatmulParams::SupportedVectorization if we support different vectorization factors for the different input and output tensors.
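If MatmulParams::SupportedVectorization were extended per tensor, it might conceptually look like the following sketch. This is a hypothetical Python analogue purely for discussion; the real struct is C++, and all field names here are assumptions, not nvFuser's API:

```python
from dataclasses import dataclass, field

@dataclass
class SupportedVectorization:
    # Illustrative analogue of MatmulParams::SupportedVectorization with
    # per-role factors; names and defaults are assumptions.
    a: int = 8            # vector width for operand A gmem loads
    b: int = 8            # vector width for operand B gmem loads
    epilogue: int = 4     # shared width for epilogue inputs and outputs

    # A generalization could instead track one factor per tensor,
    # allowing different widths for each epilogue input and output.
    per_tensor: dict = field(default_factory=dict)  # tensor name -> width
```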
This PR restricts the accepted matmul segments for the nvfuser matmul
scheduler to only those containing pointwise epilogues. Additionally, it
rules out cases for which we cannot yet reliably determine epilogue input
vectorization due to transposes (TODO, see #2169).
Note that this check can be lifted when more epilogue cases are
supported, e.g. #2213.
Fixes #2167.
This is stacked on #2175 and its follow-up PR introducing LinearOp,
because currently segmentation fails for matmuls unless the complete
fusion can be scheduled (see #1707). The MatmulOp and LinearOp IR nodes
remove the need to inspect operand producer branches, so segmentation
should work fine once that work is merged. This PR will be marked as
draft until then.
This PR does the following:
1. Rename `RolesMap` to `TensorRolesMap` and introduce `DimRolesMap`
which is a mapping from `ValGroup` to `MatmulDomain`.
2. Compute a canonical dim ordering on `ValGroup`s based on allocation
domains of inputs and outputs. This is used to compute vectorization
properly but can be used for canonicalization of loop domains in
scheduleMatmul in a future PR.
3. Properly infer vectorization for every operand, epilogue input, and
output based on canonical dim ordering.
This is in preparation for further generalization to accommodate multiple
MmaOps in a single Fusion.
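The per-tensor inference in step 3 essentially bounds the vector width by the innermost contiguous extent (in the canonical dim ordering) and the 16-byte (128-bit) hardware load width. A simplified sketch of that bound, not the actual implementation:

```python
def max_vectorization(inner_extent: int, dtype_bytes: int,
                      max_load_bytes: int = 16) -> int:
    """Largest power-of-two vector width that both divides the innermost
    contiguous extent and fits in one max_load_bytes (128-bit) access."""
    width = max_load_bytes // dtype_bytes  # e.g. 8 elements for fp16
    # Halve the width until it evenly divides the contiguous inner extent.
    while width > 1 and inner_extent % width != 0:
        width //= 2
    return max(width, 1)

# fp16 operand with inner extent 64 -> full 8-wide (128-bit) loads
assert max_vectorization(64, 2) == 8
# fp32 epilogue input with inner extent 6 -> only 2-wide loads
assert max_vectorization(6, 4) == 2
```

Note that a transposed epilogue input changes which dimension is innermost in its allocation domain, which is why the canonical dim ordering in step 2 is needed before this bound can be applied per tensor.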
Fixes #2169.
---------
Co-authored-by: Gao, Xiang <qasdfgtyuiop@gmail.com>
Originally posted by @zasdfgbnm in #2105 (comment). See #807 and #682.