[Unity][Dlight] Fix DecodeGeMV rule for spatial-inner with grouping (#15340)

This PR fixes a bug in the DecodeGeMV dlight rule that occurs when the innermost tensor dimension is spatial and is split by `unroll_factor` (for example, the grouping used in group quantization). Prior to this PR, a reduction loop bound to threadIdx was reordered to sit outside a split spatial loop, which prevented the TIR LowerCrossThreadReduction pass from being applied because of one of its safety-guard requirements. This PR fixes the issue by not reordering the split spatial loop after the reduction loop, so that the pass can be applied. This is safe because the order of thread-binding loops does not matter.
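The claim that the nesting order does not affect the result can be illustrated with a small plain-Python model. This is only a sketch under stated assumptions, not the dlight rule or any TVM API: `partial_sums`, `cross_thread_reduce`, `num_threads`, and `unroll` are hypothetical names, the "threads" are simulated with an ordinary loop, and decoding of quantized weights is left out. Each simulated thread accumulates a strided slice of the reduction axis, and the per-thread partial sums are then combined, which is the role a cross-thread reduction plays.

```python
def partial_sums(x, w, num_threads, unroll, reduction_outside):
    """Toy GeMV y[s] = sum_k x[k] * w[k][s] with a split spatial axis.

    Hypothetical illustration only. `reduction_outside=True` puts the
    simulated thread-bound reduction loop outside the inner spatial loop;
    `False` uses the opposite nesting.
    """
    num_k, num_s = len(x), len(w[0])
    assert num_s % unroll == 0, "spatial extent must be divisible by unroll"
    # partial[t][s] holds what simulated thread t accumulates for output s.
    partial = [[0] * num_s for _ in range(num_threads)]
    for s_outer in range(num_s // unroll):
        if reduction_outside:
            order = [(t, si) for t in range(num_threads) for si in range(unroll)]
        else:
            order = [(t, si) for si in range(unroll) for t in range(num_threads)]
        for t, s_inner in order:
            s = s_outer * unroll + s_inner
            # Thread t handles reduction indices t, t + num_threads, ...
            for k in range(t, num_k, num_threads):
                partial[t][s] += x[k] * w[k][s]
    return partial


def cross_thread_reduce(partial):
    """Sum the per-thread partial results for every output element."""
    return [sum(column) for column in zip(*partial)]


# Small integer inputs, so the comparison is exact.
x = [1, 2, 3, 4, 5, 6, 7, 8]
w = [[(k + 1) * (s + 1) % 7 for s in range(4)] for k in range(8)]

y_a = cross_thread_reduce(partial_sums(x, w, num_threads=4, unroll=2, reduction_outside=True))
y_b = cross_thread_reduce(partial_sums(x, w, num_threads=4, unroll=2, reduction_outside=False))
y_ref = [sum(x[k] * w[k][s] for k in range(len(x))) for s in range(len(w[0]))]
```

Both nestings visit the same set of (thread, spatial index) pairs, so `y_a`, `y_b`, and `y_ref` agree. The model says nothing about which nesting the LowerCrossThreadReduction pass accepts; it only shows that the computed values do not depend on the order.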