-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixup AIRRt lowerings to support convolution examples #620
Fixup AIRRt lowerings to support convolution examples #620
Conversation
… offsets via strides instead of memref shapes
// If the last dimension's stride value is not 1, then for AIE2 we use the | ||
// second dimension of shim dma bd to implement the last dimension. | ||
if (*lastStrideConst != 1) { | ||
offsets.push_back(zero_idx); | ||
wraps.push_back(one_idx); | ||
strides.push_back(one_idx); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow the description. Are we doing something AIE2 specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that in this line https://github.com/Xilinx/mlir-aie/blob/d850560c77799af96c6361a79e34cb0a8e842c50/lib/Dialect/AIEX/Transforms/AIEDmaToNpu.cpp#L302 the d0_stride is always set to 0, meaning that the innermost dimension must always be accessing data in "row-major, consecutive" order. I do not know if this was intended for constraint of any specific architecture version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's set to zero because it is lowering airrt.dma_memcpy_nd
which implicitly sets it to zero (or actual - 1). The op literally does not have a d0 stride operand (that's why there's three strides vs. four wraps/offsets in the c++ code). See here: https://github.com/Xilinx/mlir-air/blob/main/mlir/include/air/Dialect/AIRRt/AIRRtOps.td#L131. Why? Because AIE1 could only do 1d contiguous memcpy and everything else was emulated in firmware. Then npu lowering was bolted onto that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the information on why this is the case. I reverted this change in #626
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Xilinx/mlir-aie#1584 is intended to simplify the situation
// In aiex.npu ops, stride value 0 means 1; only the highest dimension stride | ||
// value 0 really means repeat. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we doing something runtime (npu) specific here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this line https://github.com/Xilinx/mlir-aie/blob/d850560c77799af96c6361a79e34cb0a8e842c50/lib/Dialect/AIEX/Transforms/AIEDmaToNpu.cpp#L305 aiex.npu.dma has two schemes on assigning registers, depending on if stride[i] is zero or not. I think this is to get around with the fact that in the hardware the register field is taking "strides[i] - 1".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand what this code is trying to accomplish. The lowering from air
to airrt
shouldn't be concerned with the semantics of the aiex.npu
op it might eventually lower to. The lowering should only care about the semantics of airrt.dma_memcpy_nd
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense. Reverted in #626
## Tiling | ||
################################################ | ||
|
||
air_tiled_ir_string = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to generate this via tiling from linalg or using the python bindings? And if not, why not have a conv2d.mlir input file instead of embedding as a string? Then one can at least look at it with syntax highlighting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this test should definitely be possible to be converted to having input IR generated by Transform Dialect or python bindings. Either choice makes great sense to me. This board test is just a placeholder to verify the functionality of convolution from tiled IR downwards. We should definitely come back to the test and change the input IR format.
SmallVector<int64_t> | ||
staticOffsets; // Note: for static offsets we compose one single offset | ||
// at the last dimension. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the motivation for this? Should it be a canonicalization in mlir-aie instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed an issue in mlir-aie, where the multi-dimensional offsets were composed to the BD's base offset using memref shape: https://github.com/Xilinx/mlir-aie/blob/d850560c77799af96c6361a79e34cb0a8e842c50/lib/Dialect/AIEX/Transforms/AIEDmaToNpu.cpp#L284
I think this would not work when the memref shape and data movement wraps do not match. We have never had this issue before because with matmul they would always match; direct code-generated conv2d starts to trigger this issue.
So to get around it I compose the offsets in mlir-air, into one single static offset at the lowest dimension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we fix the mlir-aie code to work in the general case instead of adding workarounds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense. Seems related: Xilinx/mlir-aie#1578
-air-to-std
were modified because inair.channel.put/get
we followmemref.subview
convention for strides, where zero means repeating.