Fixup AIRRt lowerings to support convolution examples #620

erwei-xilinx · 2024-06-26T22:20:16Z

Some existing tests for -air-to-std were modified because air.channel.put/get now follows the memref.subview convention for strides, where a stride of zero means repeating.
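To make the stride-zero convention above concrete, here is a minimal sketch in Python of how an N-dimensional access pattern expands into linear element offsets. The function name `access_offsets` and its parameters are hypothetical and exist only for illustration; they are not part of mlir-air.

```python
from itertools import product


def access_offsets(offset, sizes, strides):
    """Return the linear element offsets visited by an N-d access pattern.

    Dimensions are listed outermost first. A stride of 0 leaves the address
    unchanged along that dimension, so the inner pattern is simply repeated.
    (Hypothetical helper, not an mlir-air API.)
    """
    return [
        offset + sum(i * s for i, s in zip(idx, strides))
        for idx in product(*(range(n) for n in sizes))
    ]


# Outer dimension has size 2 and stride 0: the inner 4-element row is read twice.
print(access_offsets(0, sizes=[2, 4], strides=[0, 1]))
# → [0, 1, 2, 3, 0, 1, 2, 3]
```

With a non-zero outer stride the same call would walk distinct rows instead of revisiting the first one.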

mlir/test/Conversion/AIRLowering/air_channel_get_put.mlir

fifield · 2024-06-27T13:53:41Z

mlir/lib/Conversion/AIRLoweringPass.cpp

+  // If the last dimension's stride value is not 1, then for AIE2 we use the
+  // second dimension of shim dma bd to implement the last dimension.
+  if (*lastStrideConst != 1) {
+    offsets.push_back(zero_idx);
+    wraps.push_back(one_idx);
+    strides.push_back(one_idx);
+  }


I don't follow the description. Are we doing something AIE2 specific?

I noticed that in this line https://github.com/Xilinx/mlir-aie/blob/d850560c77799af96c6361a79e34cb0a8e842c50/lib/Dialect/AIEX/Transforms/AIEDmaToNpu.cpp#L302 the d0_stride is always set to 0, meaning that the innermost dimension must always access data in "row-major, consecutive" order. I do not know whether this was intended as a constraint for a specific architecture version.

It's set to zero because it is lowering airrt.dma_memcpy_nd, which implicitly sets it to zero (or actual - 1). The op literally does not have a d0 stride operand (that's why there are three strides vs. four wraps/offsets in the C++ code). See here: https://github.com/Xilinx/mlir-air/blob/main/mlir/include/air/Dialect/AIRRt/AIRRtOps.td#L131. Why? Because AIE1 could only do 1D contiguous memcpy and everything else was emulated in firmware. Then the npu lowering was bolted onto that.

Thanks for the information on why this is the case. I reverted this change in #626

Xilinx/mlir-aie#1584 is intended to simplify the situation

fifield · 2024-06-27T13:54:11Z

mlir/lib/Conversion/AIRLoweringPass.cpp

+  // In aiex.npu ops, stride value 0 means 1; only the highest dimension stride
+  // value 0 really means repeat.


Are we doing something runtime (npu) specific here?

In this line https://github.com/Xilinx/mlir-aie/blob/d850560c77799af96c6361a79e34cb0a8e842c50/lib/Dialect/AIEX/Transforms/AIEDmaToNpu.cpp#L305 aiex.npu.dma has two schemes for assigning registers, depending on whether stride[i] is zero or not. I think this is to work around the fact that the hardware register field takes "strides[i] - 1".

I still don't understand what this code is trying to accomplish. The lowering from air to airrt shouldn't be concerned with the semantics of the aiex.npu op it might eventually lower to. The lowering should only care about the semantics of airrt.dma_memcpy_nd.

Makes sense. Reverted in #626.
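The "stride value 0 means 1" behaviour discussed in this thread can be illustrated with a small sketch. It assumes, as suggested above, that the register field stores `stride - 1`; the function `encode_stride_field` is hypothetical and does not reproduce the actual code in AIEDmaToNpu.cpp.

```python
def encode_stride_field(stride):
    """Encode a logical stride into a register field that holds `stride - 1`.

    Assumption for illustration only: a stride of 0 cannot be decremented,
    so it is written as field value 0, which is the same value a stride of 1
    produces. Under this scheme the two are indistinguishable downstream.
    """
    if stride < 0:
        raise ValueError("stride must be non-negative")
    return stride - 1 if stride > 0 else 0


for s in (0, 1, 4):
    print(s, "->", encode_stride_field(s))
```

Because 0 and 1 collapse to the same field value in such a scheme, a lowering that wants "repeat" semantics has to express it some other way, which is why the reviewer argues this concern belongs to the aiex.npu lowering rather than to the air-to-airrt lowering.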

fifield · 2024-06-27T13:58:01Z

test/xrt/13_conv2d_i32/aie.py

+    ## Tiling
+    ################################################
+
+    air_tiled_ir_string = """


Is it possible to generate this via tiling from linalg or using the python bindings? And if not, why not have a conv2d.mlir input file instead of embedding as a string? Then one can at least look at it with syntax highlighting.

Yeah, it should definitely be possible to convert this test so that its input IR is generated by the Transform Dialect or the python bindings. Either choice makes great sense to me. This board test is just a placeholder to verify the functionality of convolution from tiled IR downwards. We should definitely come back to the test and change the input IR format.

fifield · 2024-06-27T14:05:35Z

mlir/lib/Conversion/AIRRtToNpuPass.cpp

+    SmallVector<int64_t>
+        staticOffsets; // Note: for static offsets we compose one single offset
+                       // at the last dimension.


What is the motivation for this? Should it be a canonicalization in mlir-aie instead?

I noticed an issue in mlir-aie, where the multi-dimensional offsets were composed to the BD's base offset using memref shape: https://github.com/Xilinx/mlir-aie/blob/d850560c77799af96c6361a79e34cb0a8e842c50/lib/Dialect/AIEX/Transforms/AIEDmaToNpu.cpp#L284

I think this would not work when the memref shape and the data movement wraps do not match. We have never had this issue before because with matmul they always match; directly code-generated conv2d starts to trigger this issue.

So to get around it, I compose the offsets in mlir-air into one single static offset at the lowest dimension.

Why don't we fix the mlir-aie code to work in the general case instead of adding workarounds?

Makes sense. Seems related: Xilinx/mlir-aie#1578
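The mismatch described in this thread can be sketched as follows. Both helpers are hypothetical illustrations of the two ways to fold multi-dimensional offsets into one linear offset; they are not the mlir-aie or mlir-air implementations, and the shapes and strides are made-up numbers.

```python
def offset_from_shape(offsets, shape):
    """Linearize offsets assuming a dense row-major layout of `shape`."""
    linear = 0
    for off, dim in zip(offsets, shape):
        linear = linear * dim + off
    return linear


def offset_from_strides(offsets, strides):
    """Linearize offsets using the strides of the access pattern itself."""
    return sum(off * stride for off, stride in zip(offsets, strides))


offsets = [1, 2]
memref_shape = [8, 8]  # dense row-major strides would be [8, 1]

# When the access pattern matches the memref layout, both methods agree.
print(offset_from_shape(offsets, memref_shape))   # → 10
print(offset_from_strides(offsets, [8, 1]))       # → 10

# When the pattern uses different strides, the shape-based result no longer
# points at the element the pattern intends to start from.
print(offset_from_strides(offsets, [16, 2]))      # → 20
```

Composing via strides keeps the base offset consistent with the access pattern regardless of the memref shape, which is the behaviour the second commit in this PR switches to.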

erwei-xilinx added 2 commits June 26, 2024 16:08

Refactor lowering to airrt.dma to fixup bug when wrap and memref shape do not match

bb0dacb

Only using the last dimension for static offsets; switch to composing offsets via strides instead of memref shapes

fef825d

erwei-xilinx changed the title ~~Fixup airrt npu for conv~~ Fixup AIRRt lowerings to support convolution examples Jun 26, 2024

erwei-xilinx added 3 commits June 26, 2024 16:26

Remove unused variable

f1c0a13

Convolution board test

396b83e

Formatting

9719104

erwei-xilinx merged commit 4559217 into Xilinx:main Jun 26, 2024
11 checks passed

erwei-xilinx deleted the fixup_airrt_npu_for_conv branch June 26, 2024 23:58

fifield reviewed Jun 27, 2024

hunhoffe mentioned this pull request Jun 27, 2024

Single Core DMA/Channel Matrix Scalar Add Examples Broken #623

Closed
