Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixup AIRRt lowerings to support convolution examples #620

Merged
merged 5 commits into from
Jun 26, 2024

Conversation

erwei-xilinx
Copy link
Collaborator

  • Some existing tests for -air-to-std were modified because in air.channel.put/get we follow memref.subview convention for strides, where zero means repeating.

@erwei-xilinx erwei-xilinx changed the title Fixup airrt npu for conv Fixup AIRRt lowerings to support convolution examples Jun 26, 2024
@erwei-xilinx erwei-xilinx merged commit 4559217 into Xilinx:main Jun 26, 2024
11 checks passed
@erwei-xilinx erwei-xilinx deleted the fixup_airrt_npu_for_conv branch June 26, 2024 23:58
Comment on lines +515 to +521
// If the last dimension's stride value is not 1, then for AIE2 we use the
// second dimension of shim dma bd to implement the last dimension.
if (*lastStrideConst != 1) {
offsets.push_back(zero_idx);
wraps.push_back(one_idx);
strides.push_back(one_idx);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow the description. Are we doing something AIE2 specific?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that in this line https://github.com/Xilinx/mlir-aie/blob/d850560c77799af96c6361a79e34cb0a8e842c50/lib/Dialect/AIEX/Transforms/AIEDmaToNpu.cpp#L302 the d0_stride is always set to 0, meaning that the innermost dimension must always be accessing data in "row-major, consecutive" order. I do not know if this was intended for constraint of any specific architecture version.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's set to zero because it is lowering airrt.dma_memcpy_nd which implicitly sets it to zero (or actual - 1). The op literally does not have a d0 stride operand (that's why there's three strides vs. four wraps/offsets in the c++ code). See here: https://github.com/Xilinx/mlir-air/blob/main/mlir/include/air/Dialect/AIRRt/AIRRtOps.td#L131. Why? Because AIE1 could only do 1d contiguous memcpy and everything else was emulated in firmware. Then npu lowering was bolted onto that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the information on why this is the case. I reverted this change in #626

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Xilinx/mlir-aie#1584 is intended to simplify the situation

Comment on lines +538 to +539
// In aiex.npu ops, stride value 0 means 1; only the highest dimension stride
// value 0 really means repeat.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we doing something runtime (npu) specific here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this line https://github.com/Xilinx/mlir-aie/blob/d850560c77799af96c6361a79e34cb0a8e842c50/lib/Dialect/AIEX/Transforms/AIEDmaToNpu.cpp#L305 aiex.npu.dma has two schemes on assigning registers, depending on if stride[i] is zero or not. I think this is to get around with the fact that in the hardware the register field is taking "strides[i] - 1".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand what this code is trying to accomplish. The lowering from air to airrt shouldn't be concerned with the semantics of the aiex.npu op it might eventually lower to. The lowering should only care about the semantics of airrt.dma_memcpy_nd.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. Reverted in #626

## Tiling
################################################

air_tiled_ir_string = """
Copy link
Collaborator

@fifield fifield Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to generate this via tiling from linalg or using the python bindings? And if not, why not have a conv2d.mlir input file instead of embedding as a string? Then one can at least look at it with syntax highlighting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this test should definitely be possible to be converted to having input IR generated by Transform Dialect or python bindings. Either choice makes great sense to me. This board test is just a placeholder to verify the functionality of convolution from tiled IR downwards. We should definitely come back to the test and change the input IR format.

Comment on lines +140 to +142
SmallVector<int64_t>
staticOffsets; // Note: for static offsets we compose one single offset
// at the last dimension.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation for this? Should it be a canonicalization in mlir-aie instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed an issue in mlir-aie, where the multi-dimensional offsets were composed to the BD's base offset using memref shape: https://github.com/Xilinx/mlir-aie/blob/d850560c77799af96c6361a79e34cb0a8e842c50/lib/Dialect/AIEX/Transforms/AIEDmaToNpu.cpp#L284

I think this would not work when the memref shape and data movement wraps do not match. We have never had this issue before because with matmul they would always match; direct code-generated conv2d starts to trigger this issue.

So to get around it I compose the offsets in mlir-air, into one single static offset at the lowest dimension.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we fix the mlir-aie code to work in the general case instead of adding workarounds?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. Seems related: Xilinx/mlir-aie#1578

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants